linear_reg() %>% set_engine("lm")
linear_reg() %>% set_engine("glmnet")
linear_reg() %>% set_engine("spark")
decision_tree() %>% set_engine("rpart")
Specify Model
Write a pipe that creates a model that uses lm()
to fit a linear regression using tidymodels. Save it as lm_spec
and look at the object. What does it return?
Hint: you'll need https://www.tidymodels.org
lm_spec <- linear_reg() %>%    # Pick linear regression
  set_engine(engine = "lm")    # Set engine
lm_spec
## Linear Regression Model Specification (regression)
## 
## Computational engine: lm
Train the model with the fit() function:
fit(lm_spec, mpg ~ horsepower, data = Auto)
## parsnip model object
## 
## Fit time:  7ms 
## 
## Call:
## stats::lm(formula = mpg ~ horsepower, data = data)
## 
## Coefficients:
## (Intercept)   horsepower  
##     39.9359      -0.1578
Fit Model
Fit the model:
library(ISLR)
lm_fit <- fit(lm_spec, mpg ~ horsepower, data = Auto)
lm_fit
Does this give the same results as calling lm() directly?
lm(mpg ~ horsepower, data = Auto)
lm_fit %>%
  predict(new_data = Auto)
Predictions come from the predict() function.
Note that the argument new_data has an underscore.
To keep the original columns next to the predictions, bind them on:
lm_fit %>%
  predict(new_data = Auto) %>%
  bind_cols(Auto)
## # A tibble: 392 x 10
##    .pred   mpg cylinders displacement horsepower weight acceleration  year
##  * <dbl> <dbl>     <dbl>        <dbl>      <dbl>  <dbl>        <dbl> <dbl>
##  1 19.4     18         8          307        130   3504         12      70
##  2 13.9     15         8          350        165   3693         11.5    70
##  3 16.3     18         8          318        150   3436         11      70
##  4 16.3     16         8          304        150   3433         12      70
##  5 17.8     17         8          302        140   3449         10.5    70
##  6  8.68    15         8          429        198   4341         10      70
##  7  5.21    14         8          454        220   4354          9      70
##  8  6.00    14         8          440        215   4312          8.5    70
##  9  4.42    14         8          455        225   4425         10      70
## 10  9.95    15         8          390        190   3850          8.5    70
## # … with 382 more rows, and 2 more variables: origin <dbl>, name <fct>
Get predictions
Edit the code below to add the original data to the predicted data.
mpg_pred <- lm_fit %>% predict(new_data = Auto) %>% ---
mpg_pred <- lm_fit %>%
  predict(new_data = Auto) %>%
  bind_cols(Auto)
mpg_pred
## # A tibble: 392 x 10
##    .pred   mpg cylinders displacement horsepower weight acceleration  year
##  * <dbl> <dbl>     <dbl>        <dbl>      <dbl>  <dbl>        <dbl> <dbl>
##  1 19.4     18         8          307        130   3504         12      70
##  2 13.9     15         8          350        165   3693         11.5    70
##  3 16.3     18         8          318        150   3436         11      70
##  4 16.3     16         8          304        150   3433         12      70
##  5 17.8     17         8          302        140   3449         10.5    70
##  6  8.68    15         8          429        198   4341         10      70
##  7  5.21    14         8          454        220   4354          9      70
##  8  6.00    14         8          440        215   4312          8.5    70
##  9  4.42    14         8          455        225   4425         10      70
## 10  9.95    15         8          390        190   3850          8.5    70
## # … with 382 more rows, and 2 more variables: origin <dbl>, name <fct>
mpg_pred %>% rmse(truth = mpg, estimate = .pred)
## # A tibble: 1 x 3
##   .metric .estimator .estimate
##   <chr>   <chr>          <dbl>
## 1 rmse    standard        4.89
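To make the rmse() call less of a black box, here is a minimal base R sketch of the same calculation. It only uses the first ten rows of .pred and mpg as printed in the tibble above, so its result is not expected to match the full-data estimate; the variable names are illustrative.

```r
# First ten rows of .pred and mpg, copied from the printed tibble.
pred <- c(19.4, 13.9, 16.3, 16.3, 17.8, 8.68, 5.21, 6.00, 4.42, 9.95)
mpg  <- c(18, 15, 18, 16, 17, 15, 14, 14, 14, 15)

# RMSE: square root of the mean squared difference between truth and estimate.
rmse_by_hand <- sqrt(mean((mpg - pred)^2))
rmse_by_hand
```

Here mpg plays the role of truth and pred the role of estimate in the rmse() call.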
What is this estimate? (training error? testing error?)
Auto_split <- initial_split(Auto, prop = 0.5)
Auto_split
## <Analysis/Assess/Total>
## <196/196/392>
training(Auto_split)
testing(Auto_split)
Auto_train <- training(Auto_split)
Auto_train
## # A tibble: 196 x 9
##      mpg cylinders displacement horsepower weight acceleration  year origin
##    <dbl>     <dbl>        <dbl>      <dbl>  <dbl>        <dbl> <dbl>  <dbl>
##  1    14         8          454        220   4354          9      70      1
##  2    15         8          383        170   3563         10      70      1
##  3    14         8          340        160   3609          8      70      1
##  4    14         8          455        225   3086         10      70      1
##  5    24         4          113         95   2372         15      70      3
##  6    18         6          199         97   2774         15.5    70      1
##  7    21         6          200         85   2587         16      70      1
##  8    25         4          110         87   2672         17.5    70      2
##  9    24         4          107         90   2430         14.5    70      2
## 10    25         4          104         95   2375         17.5    70      2
## # … with 186 more rows, and 1 more variable: name <fct>
Validation Set
Copy the code below, fill in the blanks to fit a model on the training data then calculate the test RMSE.
set.seed(100)
Auto_split <- ________
Auto_train <- ________
Auto_test <- ________
lm_fit <- fit(lm_spec, mpg ~ horsepower, data = ________)
mpg_pred <- ________ %>%
  predict(new_data = ________) %>%
  bind_cols(________)
rmse(________, truth = ________, estimate = ________)
A shortcut: use last_fit() and specify the split.
It fits the model on the train data from the split.
Instead of specifying which metric to calculate (with rmse as before), you can just use collect_metrics(), and it will automatically calculate the metrics on the test data from the split.
set.seed(100)
Auto_split <- initial_split(Auto, prop = 0.5)
lm_fit <- last_fit(lm_spec, mpg ~ horsepower, split = Auto_split)
lm_fit %>%
  collect_metrics()
## # A tibble: 2 x 3
##   .metric .estimator .estimate
##   <chr>   <chr>          <dbl>
## 1 rmse    standard       4.87 
## 2 rsq     standard       0.625
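As a rough illustration of the steps last_fit() and collect_metrics() bundle together, the following base R sketch does a train/test split by hand. It is a sketch under assumptions, not the tidymodels implementation: it uses the built-in mtcars data with hp standing in for horsepower (the slides use ISLR's Auto), and the object names are made up for this example.

```r
set.seed(100)

# Assumption: mtcars (ships with R) replaces Auto so the sketch needs no packages.
n <- nrow(mtcars)
train_idx <- sample(n, size = floor(0.5 * n))  # a 50/50 split, like prop = 0.5
car_train <- mtcars[train_idx, ]
car_test  <- mtcars[-train_idx, ]

# Fit on the training half only.
fit_train <- lm(mpg ~ hp, data = car_train)

# Predict on the held-out half, then compute the two metrics by hand.
# Note: base R's predict() takes newdata, without an underscore.
pred_test <- predict(fit_train, newdata = car_test)
test_rmse <- sqrt(mean((car_test$mpg - pred_test)^2))
test_rsq  <- cor(car_test$mpg, pred_test)^2    # rsq as squared correlation

c(rmse = test_rmse, rsq = test_rsq)
```

Because the model never sees car_test while fitting, both numbers are estimates of test error rather than training error.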
Auto_cv <- vfold_cv(Auto, v = 5)
Auto_cv
## #  5-fold cross-validation 
## # A tibble: 5 x 2
##   splits           id   
##   <list>           <chr>
## 1 <split [313/79]> Fold1
## 2 <split [313/79]> Fold2
## 3 <split [314/78]> Fold3
## 4 <split [314/78]> Fold4
## 5 <split [314/78]> Fold5
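The split sizes in that output follow from dividing 392 rows into five folds as evenly as possible. A minimal base R sketch of the idea is below; it is one way to assign folds, not necessarily how vfold_cv() does it internally, and the names fold_id, assess_n, and analysis_n are illustrative.

```r
set.seed(1)
n <- 392   # number of rows in Auto, per the output
v <- 5

# Give every row a fold label 1..v, as evenly as possible, in random order.
fold_id <- sample(rep(seq_len(v), length.out = n))

assess_n   <- as.integer(table(fold_id))  # rows held out in each fold
analysis_n <- n - assess_n                # rows left for fitting
data.frame(fold = seq_len(v), analysis = analysis_n, assess = assess_n)
```

Two folds hold out 79 rows and three hold out 78, which gives the 313/79 and 314/78 pairs shown in the splits column.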
fit_resamples(lm_spec, mpg ~ horsepower, resamples = Auto_cv)
## #  5-fold cross-validation 
## # A tibble: 5 x 4
##   splits           id    .metrics         .notes          
##   <list>           <chr> <list>           <list>          
## 1 <split [313/79]> Fold1 <tibble [2 × 3]> <tibble [0 × 1]>
## 2 <split [313/79]> Fold2 <tibble [2 × 3]> <tibble [0 × 1]>
## 3 <split [314/78]> Fold3 <tibble [2 × 3]> <tibble [0 × 1]>
## 4 <split [314/78]> Fold4 <tibble [2 × 3]> <tibble [0 × 1]>
## 5 <split [314/78]> Fold5 <tibble [2 × 3]> <tibble [0 × 1]>
How do we get the metrics out? With collect_metrics() again!
results <- fit_resamples(lm_spec, mpg ~ horsepower, resamples = Auto_cv)
results %>%
  collect_metrics()
## # A tibble: 2 x 5
##   .metric .estimator  mean     n std_err
##   <chr>   <chr>      <dbl> <int>   <dbl>
## 1 rmse    standard   4.93      5  0.0779
## 2 rsq     standard   0.611     5  0.0277
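The mean, n, and std_err columns summarise one metric value per fold. The base R sketch below mimics that for RMSE, assuming the built-in mtcars data in place of Auto and assuming std_err is the standard deviation of the fold values divided by the square root of n; the object names are illustrative.

```r
set.seed(1)
v <- 5

# Assumption: mtcars (ships with R) replaces Auto; hp stands in for horsepower.
fold_id <- sample(rep(seq_len(v), length.out = nrow(mtcars)))

# One RMSE per fold: fit on the other folds, evaluate on the held-out fold.
fold_rmse <- sapply(seq_len(v), function(k) {
  held_out <- fold_id == k
  fit_k  <- lm(mpg ~ hp, data = mtcars[!held_out, ])
  pred_k <- predict(fit_k, newdata = mtcars[held_out, ])
  sqrt(mean((mtcars$mpg[held_out] - pred_k)^2))
})

# Summary in the shape of the collect_metrics() columns.
data.frame(
  .metric = "rmse",
  mean    = mean(fold_rmse),
  n       = length(fold_rmse),
  std_err = sd(fold_rmse) / sqrt(length(fold_rmse))
)
```

Every row is held out exactly once, so the mean is an average over five separate test-set estimates.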
K-fold cross validation
Edit the code below to get the 5-fold cross validation error rate for the following model:
$$\text{mpg} = \beta_0 + \beta_1 \, \text{horsepower} + \beta_2 \, \text{horsepower}^2 + \epsilon$$
Auto_cv <- vfold_cv(Auto, v = 5)
results <- fit_resamples(lm_spec, ----, resamples = ---)
results %>%
  collect_metrics()
What do you think rsq is?