Cross validation

Dr. D’Agostino McGowan

1 / 22

Cross validation

💡 Big idea

  • We have determined that it is sensible to use a test set to calculate metrics like prediction error

Why?

2 / 22

Cross validation

💡 Big idea

  • We have determined that it is sensible to use a test set to calculate metrics like prediction error

How have we done this so far?

3 / 22

Cross validation

💡 Big idea

  • We have determined that it is sensible to use a test set to calculate metrics like prediction error
  • What if we don't have a separate data set to test our model on?
  • 🎉 We can use resampling methods to estimate the test-set prediction error
4 / 22

Training error versus test error

What is the difference? Which is typically larger?

  • The training error is calculated by using the same observations used to fit the statistical learning model
  • The test error is calculated by using a statistical learning method to predict the response of new observations
  • The training error rate typically underestimates the true prediction error rate
5 / 22
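
To make the comparison concrete, here is a minimal sketch on synthetic data (not from the slides): a flexible model is fit to half of the observations, and its MSE is computed both on those same observations and on the held-out half. The degree-10 polynomial and scikit-learn are illustrative choices.

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures

    rng = np.random.default_rng(0)
    x = rng.uniform(0, 1, size=(200, 1))
    y = 3 + 2 * x.ravel() + rng.normal(scale=0.3, size=200)

    # fit a flexible (degree-10 polynomial) model to half of the observations
    x_train, y_train = x[:100], y[:100]
    x_new, y_new = x[100:], y[100:]
    fit = make_pipeline(PolynomialFeatures(degree=10),
                        LinearRegression()).fit(x_train, y_train)

    # training error: same observations used to fit the model
    print("training MSE:", mean_squared_error(y_train, fit.predict(x_train)))
    # test error: new observations the model has not seen
    print("test MSE:    ", mean_squared_error(y_new, fit.predict(x_new)))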

Estimating prediction error

  • Best case scenario: We have a large data set to test our model on
  • This is not always the case!

💡 Let's instead find a way to estimate the test error by holding out a subset of the training observations from the model fitting process, and then applying the statistical learning method to those held out observations

7 / 22

Approach #1: Validation set

  • Randomly divide the available set of samples into two parts: a training set and a validation set
  • Fit the model on the training set, calculate the prediction error on the validation set

If we have a quantitative outcome, what metric would we use to calculate this test error?

  • Often we use Mean Squared Error (MSE)
8 / 22

Approach #1: Validation set

  • Randomly divide the available set of samples into two parts: a training set and a validation set
  • Fit the model on the training set, calculate the prediction error on the validation set

If we have a qualitative outcome (classification), what metric would we use to calculate this test error?

  • Often we use misclassification rate
9 / 22

Approach #1: Validation set

$$\text{MSE}_{\text{test-split}} = \text{Ave}_{i \in \text{test-split}}\left[y_i - \hat{f}(x_i)\right]^2$$

$$\text{Err}_{\text{test-split}} = \text{Ave}_{i \in \text{test-split}} \, I\left[y_i \neq \hat{C}(x_i)\right]$$

10 / 22

Approach #1: Validation set

Auto example:

  • We have 392 observations
  • Trying to predict mpg from horsepower
  • We can split the data in half and use 196 to fit the model and 196 to test

11 / 22
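
A rough sketch of this split in Python, assuming the Auto data sit in a file named Auto.csv with mpg and horsepower columns (the file name and the use of scikit-learn are assumptions, not part of the slides):

    import pandas as pd
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split

    auto = pd.read_csv("Auto.csv")  # hypothetical file with mpg and horsepower columns

    # randomly split the 392 observations into 196 training / 196 validation
    train, valid = train_test_split(auto, train_size=196, test_size=196,
                                    random_state=1)

    # fit on the training half, compute MSE on the validation half
    fit = LinearRegression().fit(train[["horsepower"]], train["mpg"])
    pred = fit.predict(valid[["horsepower"]])
    print("validation-set MSE:", mean_squared_error(valid["mpg"], pred))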

Approach #1: Validation set

[Figure: repeated validation splits of the data, each yielding its own $\text{MSE}_{\text{test-split}}$ estimate]

12 / 22

Approach #1: Validation set

Auto example:

  • We have 392 observations
  • Trying to predict mpg from horsepower
  • We can split the data in half and use 196 to fit the model and 196 to test - what if we did this many times?

13 / 22
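
A sketch of what doing this many times could look like, under the same assumptions as before (hypothetical Auto.csv, scikit-learn): each random 196/196 split yields a somewhat different MSE estimate.

    import pandas as pd
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split

    auto = pd.read_csv("Auto.csv")  # hypothetical file, as above

    # repeat the 196/196 split with different random seeds;
    # each split gives a different estimate of the test MSE
    for seed in range(5):
        train, valid = train_test_split(auto, train_size=196, test_size=196,
                                        random_state=seed)
        fit = LinearRegression().fit(train[["horsepower"]], train["mpg"])
        mse = mean_squared_error(valid["mpg"], fit.predict(valid[["horsepower"]]))
        print(f"split {seed}: validation MSE = {mse:.2f}")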

Approach #1: Validation set (Drawbacks)

  • The validation estimate of the test error can be highly variable, depending on which observations are included in the training set and which observations are included in the validation set
  • In the validation approach, only a subset of the observations (those that are included in the training set rather than in the validation set) are used to fit the model
  • Therefore, the validation set error may tend to overestimate the test error for the model fit on the entire data set
14 / 22

Approach #2: K-fold cross validation

💡 The idea is to do the following:

  • Randomly divide the data into K equal-sized parts
  • Leave out part $k$, fit the model to the other $K - 1$ parts (combined)
  • Obtain predictions for the left-out kth part
  • Do this for each part $k = 1, 2, \ldots, K$, and then combine the results
15 / 22

K-fold cross validation

[Figure: the data split into folds, with each fold held out in turn to give $\text{MSE}_{\text{test-split-1}}, \text{MSE}_{\text{test-split-2}}, \text{MSE}_{\text{test-split-3}}, \text{MSE}_{\text{test-split-4}}$]

Take the mean of the k MSE values

16 / 22

Estimating prediction error (quantitative outcome)

  • Split the data into K parts, where $C_1, C_2, \ldots, C_K$ indicate the indices of observations in part $k$

$$CV_{(K)} = \sum_{k=1}^{K} \frac{n_k}{n} \text{MSE}_k$$

  • $\text{MSE}_k = \sum_{i \in C_k} (y_i - \hat{y}_i)^2 / n_k$
  • $n_k$ is the number of observations in group $k$
  • $\hat{y}_i$ is the fit for observation $i$ obtained from the data with part $k$ removed
  • If we set $K = n$, we'd have $n$-fold cross validation, which is the same as leave-one-out cross validation (LOOCV)
17 / 22
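
A sketch of the $CV_{(K)}$ calculation above on synthetic data: fit on the other K - 1 folds, compute MSE_k on the held-out fold k, and combine the folds with n_k / n weights. The data and model are placeholders, and scikit-learn's KFold is just one way to generate the folds.

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import KFold

    rng = np.random.default_rng(0)
    X = rng.uniform(0, 1, size=(200, 1))
    y = 3 + 2 * X.ravel() + rng.normal(scale=0.3, size=200)

    K = 5
    n = len(y)
    cv_estimate = 0.0
    for train_idx, test_idx in KFold(n_splits=K, shuffle=True,
                                     random_state=1).split(X):
        # fit to the other K - 1 parts, predict the held-out part k
        fit = LinearRegression().fit(X[train_idx], y[train_idx])
        mse_k = mean_squared_error(y[test_idx], fit.predict(X[test_idx]))
        cv_estimate += (len(test_idx) / n) * mse_k  # weight by n_k / n

    print("CV_(K) estimate of test MSE:", cv_estimate)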

Leave-one-out cross validation

18 / 22
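
A sketch of LOOCV (K = n) on synthetic data: each observation is predicted from a model fit to the other n - 1 observations, and the squared errors are averaged. scikit-learn's LeaveOneOut splitter is an assumed convenience, not part of the slides.

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import LeaveOneOut, cross_val_score

    rng = np.random.default_rng(0)
    X = rng.uniform(0, 1, size=(100, 1))
    y = 3 + 2 * X.ravel() + rng.normal(scale=0.3, size=100)

    # each of the n folds holds out a single observation;
    # scikit-learn reports negative MSE, so flip the sign and average
    scores = cross_val_score(LinearRegression(), X, y,
                             cv=LeaveOneOut(),
                             scoring="neg_mean_squared_error")
    print("LOOCV estimate of test MSE:", -scores.mean())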

Picking K

  • K can vary from 2 (splitting the data in half each time) to n (LOOCV)
  • LOOCV is sometimes useful, but usually the estimates from each fold are highly correlated, so their average can have high variance
  • A better choice tends to be K=5 or K=10
19 / 22

Bias variance trade-off

  • Since each training set is only $(K - 1)/K$ as big as the original training set, the estimates of prediction error will typically be biased upward
  • This bias is minimized when K=n (LOOCV), but this estimate has a high variance
  • K=5 or K=10 provides a nice compromise for the bias-variance trade-off
20 / 22

Approach #2: K-fold Cross Validation

Auto example:

  • We have 392 observations
  • Trying to predict mpg from horsepower

21 / 22
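
A sketch of 10-fold cross validation for this example, under the same assumption as earlier that the Auto data live in a hypothetical Auto.csv with mpg and horsepower columns:

    import pandas as pd
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import KFold, cross_val_score

    auto = pd.read_csv("Auto.csv")  # hypothetical file, as in the earlier sketches

    # 10-fold cross validation for mpg ~ horsepower
    scores = cross_val_score(LinearRegression(),
                             auto[["horsepower"]], auto["mpg"],
                             cv=KFold(n_splits=10, shuffle=True, random_state=1),
                             scoring="neg_mean_squared_error")
    print("10-fold CV estimate of test MSE:", -scores.mean())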

Estimating prediction error (qualitative outcome)

  • The premise is the same as cross validation for quantitative outcomes
  • Split the data into K parts, where $C_1, C_2, \ldots, C_K$ indicate the indices of observations in part $k$

$$CV_{(K)} = \sum_{k=1}^{K} \frac{n_k}{n} \text{Err}_k$$

  • $\text{Err}_k = \sum_{i \in C_k} I(y_i \neq \hat{y}_i) / n_k$ (misclassification rate)
  • $n_k$ is the number of observations in group $k$
  • $\hat{y}_i$ is the fit for observation $i$ obtained from the data with part $k$ removed
22 / 22
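
A sketch of the qualitative-outcome version on synthetic data: Err_k is the misclassification rate on the held-out fold, and the folds are combined with n_k / n weights. The logistic regression classifier is an illustrative choice, not part of the slides.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import KFold

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

    K = 5
    n = len(y)
    cv_error = 0.0
    for train_idx, test_idx in KFold(n_splits=K, shuffle=True,
                                     random_state=1).split(X):
        fit = LogisticRegression().fit(X[train_idx], y[train_idx])
        # Err_k: misclassification rate on the held-out fold
        err_k = np.mean(fit.predict(X[test_idx]) != y[test_idx])
        cv_error += (len(test_idx) / n) * err_k  # weight by n_k / n

    print("CV estimate of the misclassification rate:", cv_error)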
