Due: Tuesday 2020-10-06 at 5pm
03-lab-logistic
In this lab we will work with four packages: ISLR, which is a package that accompanies your textbook; tidyverse, which is a collection of packages for doing data analysis in a “tidy” way; tidymodels, a collection of packages for statistical modeling; and GGally, a package to help us visualize the data.
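As a minimal sketch, the four packages could be loaded at the top of the .Rmd file like this. It assumes each package is already installed; the comments are short reminders, not a full description of each package.

```r
# Load the four packages used in this lab.
# Assumes each is already installed (install.packages() is a one-time step).
library(ISLR)        # provides the Smarket data
library(tidyverse)   # data wrangling and plotting
library(tidymodels)  # modeling tools, including logistic_reg() and vfold_cv()
library(GGally)      # ggpairs() for pairwise plots
```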
For this lab, we are using the Smarket data from the ISLR package.
Equation (1)
\[p(X)=\frac{e^{\beta_0+\beta_1X}}{1+e^{\beta_0+\beta_1X}}\]
Equation (2)
\[\textrm{log}\left(\frac{p(X)}{1-p(X)}\right)=\beta_0+\beta_1X\]
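The two equations describe the same model on different scales. From Equation (1), 1 - p(X) equals 1 / (1 + e^{\beta_0+\beta_1X}), so the odds p(X) / (1 - p(X)) equal e^{\beta_0+\beta_1X}, and taking the log gives Equation (2). The sketch below checks this numerically in base R. The coefficient values and x values are hypothetical, chosen only for illustration.

```r
# Hypothetical coefficients and predictor values, for illustration only
beta0 <- -1
beta1 <- 0.5
x <- c(-2, 0, 2, 4)

# Equation (1): probability as a function of x
p <- exp(beta0 + beta1 * x) / (1 + exp(beta0 + beta1 * x))

# Equation (2): the log-odds recover the linear predictor
log_odds <- log(p / (1 - p))
all.equal(log_odds, beta0 + beta1 * x)

# Base R's plogis() computes the same logistic function
all.equal(p, plogis(beta0 + beta1 * x))
```

Because the log-odds are linear in x, a one-unit increase in x changes the log-odds by beta1 and multiplies the odds by exp(beta1).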
For this lab we are using the Smarket data. Examine this data set: how many observations are there? How many columns? What are the variables?
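One possible way to start examining the data is sketched below. It assumes the ISLR and dplyr packages are installed; the output is left for you to read and interpret.

```r
library(ISLR)   # provides Smarket
library(dplyr)  # provides glimpse()

dim(Smarket)      # number of rows (observations) and columns
names(Smarket)    # variable names
glimpse(Smarket)  # type and first few values of each column
```

Running ?Smarket in the Console opens the help page that describes each variable.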
Let’s look at the correlation between all of the variables. To do this, if you haven’t done so already, we need to install the GGally package. Run the following code in your Console one time.
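The installation step would presumably be the standard call below. This is a reconstruction, since the original code chunk did not survive in this copy of the lab.

```r
# Run once in the Console, not in the .Rmd file
install.packages("GGally")
```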
Then add the code below to your .Rmd file. What can you learn from this visualization? Which pair of variables has the highest correlation?
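A minimal sketch of the plotting code, assuming the intended function is ggpairs() from GGally (the original chunk is missing here):

```r
library(ISLR)
library(GGally)

# Pairwise scatterplots, densities, and correlations for every column
pairs_plot <- ggpairs(Smarket)
pairs_plot
```

The upper triangle of the plot reports the correlation for each pair of numeric variables, which is where to look when answering the question above.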
5. Inference: Fit a logistic regression model to predict Direction using Lag1, Lag2, Lag3, Lag4, Lag5, and Volume. Show a table that contains the coefficients and p-values along with the confidence intervals for each of the 6 predictors. Which predictor has the smallest p-value? Interpret the coefficient, confidence interval, and p-value for this predictor.
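One way to fit this model with tidymodels is sketched below. The object names logistic_spec and fit_all are placeholders, and the "glm" engine is an assumption about the intended setup.

```r
library(ISLR)
library(tidymodels)

# Specify a logistic regression fit with base R's glm() as the engine
logistic_spec <- logistic_reg() %>%
  set_engine("glm")

# Direction has levels "Down" and "Up"; glm models the probability of "Up"
fit_all <- logistic_spec %>%
  fit(Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5 + Volume,
      data = Smarket)

# Coefficients (log-odds scale), p-values, and 95% confidence intervals
coef_table <- tidy(fit_all, conf.int = TRUE)
coef_table
```

The estimate column is on the log-odds scale of Equation (2), so each coefficient is the change in the log-odds of Direction being "Up" for a one-unit increase in that predictor, holding the others fixed.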
6. Inference: Exponentiate the results from Exercise 5. Interpret the odds ratio for the same predictor you selected in Exercise 5.
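Exponentiating moves the coefficients from the log-odds scale to the odds-ratio scale. A sketch, refitting the six-predictor model so the block runs on its own (fit_all and odds_ratios are placeholder names):

```r
library(ISLR)
library(tidymodels)

fit_all <- logistic_reg() %>%
  set_engine("glm") %>%
  fit(Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5 + Volume,
      data = Smarket)

# exponentiate = TRUE reports exp(estimate) and exponentiated interval bounds
odds_ratios <- tidy(fit_all, exponentiate = TRUE, conf.int = TRUE)
odds_ratios
```

An odds ratio above 1 means the odds of "Up" increase as the predictor increases; below 1 means they decrease. A confidence interval that contains 1 corresponds to a log-odds interval that contains 0.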
7. Prediction: Using 5-fold cross-validation, fit the same logistic regression model as Exercise 5. What is the test accuracy for this model? Interpret this result.
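A sketch of 5-fold cross-validation using vfold_cv() and fit_resamples() from tidymodels. The seed value is arbitrary and only makes the fold assignment reproducible; folds and cv_results are placeholder names.

```r
library(ISLR)
library(tidymodels)

set.seed(363)  # arbitrary seed so the folds are reproducible
folds <- vfold_cv(Smarket, v = 5)

logistic_spec <- logistic_reg() %>%
  set_engine("glm")

# Fit on 4 folds and assess on the held-out fold, 5 times
cv_results <- fit_resamples(
  logistic_spec,
  Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5 + Volume,
  resamples = folds,
  metrics = metric_set(accuracy)
)

# Accuracy averaged over the 5 held-out folds
cv_metrics <- collect_metrics(cv_results)
cv_metrics
```

The mean column is the proportion of held-out observations classified correctly, averaged across folds. A useful reference point when interpreting it is the accuracy of always predicting the more common class.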
8. Inference: Fit a logistic regression model to predict Direction using only Lag1 and Lag2. Show a table that contains the coefficients and p-values along with the confidence intervals for each of the 2 predictors. Which predictor has the smallest p-value? Interpret the coefficient, confidence interval, and p-value for this predictor.
9. Inference: Exponentiate the results from Exercise 8. Interpret the odds ratio for the same predictor you selected in Exercise 8.
10. Prediction: Using 5-fold cross-validation, fit the same logistic regression model as Exercise 8. What is the test accuracy for this model? Interpret this result.
11. If you had to choose between the model fit in Exercise 5 and the one fit in Exercise 8, which would you choose? Why?
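If prediction is the goal, one way to put the six-predictor and two-predictor models on equal footing is to evaluate both on the same folds. The sketch below does that; cv_accuracy() is a hypothetical helper written for this illustration, not a tidymodels function, and the seed is arbitrary.

```r
library(ISLR)
library(tidymodels)

set.seed(363)  # arbitrary seed; both models share these folds
folds <- vfold_cv(Smarket, v = 5)
logistic_spec <- logistic_reg() %>%
  set_engine("glm")

# Hypothetical helper: cross-validated accuracy for one model formula
cv_accuracy <- function(model_formula) {
  fit_resamples(logistic_spec, model_formula,
                resamples = folds,
                metrics = metric_set(accuracy)) %>%
    collect_metrics()
}

comparison <- bind_rows(
  six_predictors = cv_accuracy(Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5 + Volume),
  two_predictors = cv_accuracy(Direction ~ Lag1 + Lag2),
  .id = "model"
)
comparison
```

The std_err column gives a rough sense of whether a difference in mean accuracy is larger than fold-to-fold variation. Accuracy is only one input to the choice; the inference results and model simplicity matter as well.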