Lab 03 - Logistic Regression

Due: Tuesday 2020-10-06 at 5pm

Getting started

Packages

In this lab we will work with four packages: ISLR, which accompanies your textbook; tidyverse, a collection of packages for doing data analysis in a “tidy” way; tidymodels, a collection of packages for statistical modeling; and GGally, which helps us visualize the data.

library(tidyverse) 
library(tidymodels)
library(ISLR)
library(GGally)

Data

For this lab, we are using the Smarket data from the ISLR package.
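As a quick orientation, the snippet below is a minimal sketch of one way to take a first look at the data once the packages are loaded. It assumes ISLR and dplyr (part of the tidyverse) are installed; `glimpse()` is just one of several reasonable choices.

```r
library(ISLR)   # provides the Smarket data
library(dplyr)  # provides glimpse(); also attached by library(tidyverse)

# Compact overview: one row per column, with its type and first few values
glimpse(Smarket)

# Column names only
names(Smarket)
```

Base R alternatives such as `str(Smarket)` or `?Smarket` (the help page for the data) give similar information.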

Exercises

Conceptual questions

  1. Using algebra, rearrange equation (1) below to get equation (2). Show all steps. Be sure to use LaTeX; remember that you can insert a LaTeX equation by wrapping it in $$.

Equation (1)

\[p(X)=\frac{e^{\beta_0+\beta_1X}}{1+e^{\beta_0+\beta_1X}}\]

Equation (2)

\[\textrm{log}\left(\frac{p(X)}{1-p(X)}\right)=\beta_0+\beta_1X\]

  2. Suppose that an individual has a 23% chance of defaulting on their credit card payment. What are the odds that they will default?
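As a reminder of how probability, odds, and log-odds relate, here is a small sketch in R. It uses a made-up probability of 0.20 (not the value from the exercise), and the variable names are purely illustrative.

```r
# Hypothetical probability, chosen only for illustration
p <- 0.20

# Odds are the probability of the event divided by the probability of no event
odds <- p / (1 - p)
odds  # 0.25, i.e. odds of 1 to 4

# The log-odds is the left-hand side of equation (2)
log_odds <- log(odds)

# plogis() is the inverse of the log-odds transform: exp(x) / (1 + exp(x)),
# which has the same form as equation (1)
p_back <- plogis(log_odds)
all.equal(p_back, p)  # TRUE
```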

Logistic Regression

  3. For this lab we are using the Smarket data. Examine this data set: how many observations are there? How many columns? What are the variables?

  4. Let’s look at the correlations between all pairs of variables. To do this, we need the GGally package. If you haven’t installed it already, run the following code in your Console one time.

install.packages("GGally")

Then add the code below to your .Rmd file. What can you learn from this visualization? Which pair of variables has the highest correlation?

ggpairs(Smarket, lower = list(combo = wrap(ggally_facethist, binwidth = 0.5)), progress = FALSE)

  5. Inference: Fit a logistic regression model to predict Direction using Lag1, Lag2, Lag3, Lag4, Lag5, and Volume. Show a table that contains the coefficients and p-values along with the confidence intervals for each of the 6 predictors. Which predictor has the smallest p-value? Interpret the coefficient, confidence interval, and p-value for this predictor.

  6. Inference: Exponentiate the results from Exercise 5. Interpret the odds ratio for the same predictor you selected in Exercise 5.

  7. Prediction: Using 5-fold cross-validation, fit the same logistic regression model as in Exercise 5. What is the test accuracy for this model? Interpret this result.

  8. Inference: Fit a logistic regression model to predict Direction using only Lag1 and Lag2. Show a table that contains the coefficients and p-values along with the confidence intervals for each of the 2 predictors. Which predictor has the smallest p-value? Interpret the coefficient, confidence interval, and p-value for this predictor.

  9. Inference: Exponentiate the results from Exercise 8. Interpret the odds ratio for the same predictor you selected in Exercise 8.

  10. Prediction: Using 5-fold cross-validation, fit the same logistic regression model as in Exercise 8. What is the test accuracy for this model? Interpret this result.

  11. If you had to choose between the model fit in Exercise 5 and the one fit in Exercise 8, which would you choose? Why?
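The Inference and Prediction exercises above follow a common pattern, which might be sketched in tidymodels roughly as below. This is only one possible approach, not a required solution. The formula deliberately uses a different predictor set (Lag1 and Volume) from either model requested in the exercises, and the object names (`lr_spec`, `example_formula`, `lr_fit`, `folds`, `cv_results`) and the seed are arbitrary choices made for illustration.

```r
library(tidymodels)
library(ISLR)

# Model specification: logistic regression fit with base R's glm()
lr_spec <- logistic_reg() %>%
  set_engine("glm")

# Illustrative formula only (not the model asked for in the exercises)
example_formula <- Direction ~ Lag1 + Volume

# Inference: fit on the full data set, then tabulate the estimates
lr_fit <- lr_spec %>%
  fit(example_formula, data = Smarket)

# Coefficients, p-values, and confidence intervals on the log-odds scale
tidy(lr_fit, conf.int = TRUE)

# The same table on the odds-ratio scale
tidy(lr_fit, conf.int = TRUE, exponentiate = TRUE)

# Prediction: estimate test accuracy with 5-fold cross-validation
set.seed(1)  # arbitrary seed so the fold assignment is reproducible
folds <- vfold_cv(Smarket, v = 5)

cv_results <- fit_resamples(
  lr_spec,
  example_formula,
  resamples = folds,
  metrics = metric_set(accuracy)
)

# Accuracy averaged over the 5 held-out folds
collect_metrics(cv_results)
```

Keeping the specification (`lr_spec`) separate from the formula means the same code can be reused for a different set of predictors by changing only the formula.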