Due: Tuesday 2020-11-03 at 5:00pm
In this lab we will work with three packages: ISLR for the data, tidyverse, which is a collection of packages for doing data analysis in a “tidy” way, and tidymodels for statistical modeling.
If you’d like to run your code in the Console as well, you’ll also need to load the packages there. To do so, run the following in the Console.
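A minimal sketch of those commands, assuming the three packages named above are already installed (if not, `install.packages()` would be needed first):

```r
# Load the three packages used in this lab.
# Assumes ISLR, tidyverse, and tidymodels are already installed.
library(ISLR)        # provides the Wage data
library(tidyverse)   # data wrangling and plotting
library(tidymodels)  # modeling framework
```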
Note that the packages are also loaded with the same commands in your R Markdown document.
For this lab, we are using the Wage data from the ISLR package.
Examine the Wage data set from the ISLR package. What are the variables? How many observations are there?
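One possible way to inspect the data is sketched below. The choice of `glimpse()` and `dim()` is an assumption; other tools such as `str()` or the help page `?Wage` would work equally well.

```r
library(ISLR)
library(tidyverse)

# Compact overview: one row per variable, with its type and first few values
glimpse(Wage)

# Number of rows (observations) and columns (variables)
dim(Wage)
```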
Create a linear model specification, setting the engine to lm. Call this model specification linear_spec.
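As a sketch, a specification of this kind is typically built with `linear_reg()` and `set_engine()` from the parsnip package (loaded by tidymodels):

```r
library(tidymodels)

# Linear regression specification fit by ordinary least squares via lm()
linear_spec <- linear_reg() %>%
  set_engine("lm")

linear_spec
```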
Create a recipe using the Wage data from the ISLR package. We want to predict the variable wage from age, health_ins, jobclass, education, and race. We want age to be fit using a natural spline. Use tune() to decide how many degrees of freedom to use for the age variable. (In this exercise, you are just creating the recipe.)
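A rough sketch of such a recipe follows. It assumes `step_ns()` from the recipes package is the intended natural spline step, and the object name `wage_rec_spline` is hypothetical.

```r
library(ISLR)
library(tidymodels)

# Recipe with a natural spline on age; deg_free is left as a tuning
# placeholder, so nothing is estimated at this point.
# The name wage_rec_spline is illustrative only.
wage_rec_spline <- recipe(
  wage ~ age + health_ins + jobclass + education + race,
  data = Wage
) %>%
  step_ns(age, deg_free = tune())

wage_rec_spline
```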
Use tune_grid() to fit the linear model specified in Exercise 2 with the recipe created in Exercise 3 using 10-fold cross-validation, similar to Lab 04. Choose the model with the lowest RMSE. How many degrees of freedom were used for the age natural spline in this best model? Report the RMSE for this model as well as the chosen degrees of freedom.
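The tuning step might look roughly like the sketch below. The seed, the candidate range of 1 to 10 degrees of freedom, and all object names other than `linear_spec` are assumptions made for illustration; Lab 04 may have used a different setup.

```r
library(ISLR)
library(tidymodels)

# Model specification and recipe, repeated so the sketch is self-contained
linear_spec <- linear_reg() %>%
  set_engine("lm")

wage_rec_spline <- recipe(
  wage ~ age + health_ins + jobclass + education + race,
  data = Wage
) %>%
  step_ns(age, deg_free = tune())

# 10-fold cross-validation; the seed value is arbitrary
set.seed(1)
wage_cv <- vfold_cv(Wage, v = 10)

# Candidate degrees of freedom (assumed range, not prescribed by the lab)
spline_grid <- tibble(deg_free = 1:10)

# Fit the model once per fold and per candidate value
spline_results <- workflow() %>%
  add_recipe(wage_rec_spline) %>%
  add_model(linear_spec) %>%
  tune_grid(resamples = wage_cv, grid = spline_grid)

# Candidates ordered by cross-validated RMSE, best first
spline_results %>%
  show_best(metric = "rmse")
```

`collect_metrics()` returns the full table of cross-validated metrics if you want to see every candidate rather than only the top few.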
Create a recipe using the Wage data from the ISLR package. We want to predict the variable wage from age, health_ins, jobclass, education, and race. We want age to be fit using a polynomial. Use tune() to decide how many degrees the polynomial should have for the age variable. (In this exercise, you are just creating the recipe.)
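A sketch of the polynomial version, assuming `step_poly()` from the recipes package is the intended step. The object name `wage_rec_poly` is hypothetical.

```r
library(ISLR)
library(tidymodels)

# Recipe with an orthogonal polynomial in age; degree is left as a
# tuning placeholder. The name wage_rec_poly is illustrative only.
wage_rec_poly <- recipe(
  wage ~ age + health_ins + jobclass + education + race,
  data = Wage
) %>%
  step_poly(age, degree = tune())

wage_rec_poly
```

When this recipe is tuned, the grid column would need to be named `degree` to match the tuning placeholder.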
Use tune_grid() to fit the linear model specified in Exercise 2 with the recipe created in Exercise 5 using 10-fold cross-validation, similar to Lab 04. Choose the model with the lowest RMSE. What degree polynomial was used for age in this best model? Report the RMSE for this model as well as the chosen degree.
If the goal is prediction, which model would you prefer, the one fit in Exercise 4 or the one fit in Exercise 6? Using your chosen model, examine whether ridge, lasso, or elastic net would provide a better fit. Describe your results.
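One way to explore penalized fits is sketched below. It assumes the glmnet package is installed and a reasonably recent recipes version (for `all_nominal_predictors()`). The fixed `deg_free = 4` is a placeholder rather than a result; substitute whatever your chosen model used. All object names are hypothetical.

```r
library(ISLR)
library(tidymodels)

# Elastic net specification: mixture = 0 is ridge, mixture = 1 is lasso,
# and values in between are elastic net. Requires the glmnet package.
penalized_spec <- linear_reg(penalty = tune(), mixture = tune()) %>%
  set_engine("glmnet")

# glmnet needs numeric predictors on a common scale, so the factors are
# converted to dummy variables and all predictors are normalized.
# deg_free = 4 is a placeholder, not a tuned value.
penalized_rec <- recipe(
  wage ~ age + health_ins + jobclass + education + race,
  data = Wage
) %>%
  step_ns(age, deg_free = 4) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_normalize(all_predictors())

# Regular grid over penalty and mixture; the range and number of levels
# are assumptions
penalized_grid <- grid_regular(
  penalty(),
  mixture(range = c(0, 1)),
  levels = 5
)

set.seed(1)  # arbitrary seed
wage_cv <- vfold_cv(Wage, v = 10)

penalized_results <- workflow() %>%
  add_recipe(penalized_rec) %>%
  add_model(penalized_spec) %>%
  tune_grid(resamples = wage_cv, grid = penalized_grid)

# Compare cross-validated RMSE across ridge, lasso, and elastic net settings
penalized_results %>%
  show_best(metric = "rmse")
```

For a fair comparison with the unpenalized model, use the same cross-validation folds for both.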