
Decision trees - Regression tree building

Dr. D’Agostino McGowan

1 / 7

The tree building process

  • Divide the predictor space (the set of possible values for $X_1, X_2, \dots, X_p$) into $J$ distinct, non-overlapping regions, $R_1, R_2, \dots, R_J$
  • For every observation that falls into the region $R_j$, we make the same prediction: the mean response value for the training observations in $R_j$ (see the sketch below)

2 / 7
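To make the prediction rule concrete, here is a minimal Python sketch. The toy region labels and responses are hypothetical; in practice the regions come from the tree-building process described on the following slides.

```python
import numpy as np

# Hypothetical toy data: region labels for 8 training observations.
regions = np.array([1, 1, 2, 2, 2, 3, 3, 3])
y = np.array([4.0, 6.0, 10.0, 12.0, 11.0, 1.0, 2.0, 3.0])

# The prediction for any observation falling in region R_j is the mean
# response of the training observations in R_j.
region_means = {int(j): float(y[regions == j].mean())
                for j in np.unique(regions)}
print(region_means)  # {1: 5.0, 2: 11.0, 3: 2.0}
```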

The tree building process

  • The regions could have any shape, but we choose to divide the predictor space into high-dimensional boxes for simplicity and ease of interpretation
  • The goal is to find boxes, $R_1, \dots, R_J$, that minimize the RSS, given by

$$\sum_{j=1}^{J} \sum_{i \in R_j} \left(y_i - \hat{y}_{R_j}\right)^2$$

where $\hat{y}_{R_j}$ is the mean response for the training observations within the $j$th box (see the short sketch below).

3 / 7
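A short sketch of the objective itself, reusing the hypothetical toy data above; the rss function is an illustrative helper, not part of any particular library.

```python
import numpy as np

def rss(y, regions):
    """Sum over boxes j of the squared deviations of each training
    response from the mean response within its box."""
    total = 0.0
    for j in np.unique(regions):
        y_j = y[regions == j]
        total += ((y_j - y_j.mean()) ** 2).sum()
    return total

regions = np.array([1, 1, 2, 2, 2, 3, 3, 3])
y = np.array([4.0, 6.0, 10.0, 12.0, 11.0, 1.0, 2.0, 3.0])
print(rss(y, regions))  # 6.0 (2.0 from each of the three boxes)
```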

The tree building process

  • It is often computationally infeasible to consider every possible partition of the feature space into $J$ boxes
  • Therefore, we take a top-down, greedy approach known as recursive binary splitting
  • It is top-down because it begins at the top of the tree and then successively splits the predictor space, two branches at a time
  • It is greedy because the best split is made at each particular step, rather than looking ahead and picking a split that might lead to a better tree in a future step (a single greedy step is sketched below)

4 / 7
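A minimal sketch of one greedy step, assuming a NumPy feature matrix X and response vector y; best_split is a hypothetical helper that scans every predictor and candidate cutpoint and keeps the pair with the lowest total RSS.

```python
import numpy as np

def best_split(X, y):
    """Hypothetical helper: scan every predictor j and cutpoint s and
    return the (j, s) pair whose split {X_j < s} / {X_j >= s} yields
    the smallest total RSS. No look-ahead -- this is the greedy part."""
    best_j, best_s, best_rss = None, None, np.inf
    for j in range(X.shape[1]):
        for s in np.unique(X[:, j]):
            left, right = y[X[:, j] < s], y[X[:, j] >= s]
            if len(left) == 0 or len(right) == 0:
                continue  # a split must leave observations on both sides
            split_rss = (((left - left.mean()) ** 2).sum()
                         + ((right - right.mean()) ** 2).sum())
            if split_rss < best_rss:
                best_j, best_s, best_rss = j, s, split_rss
    return best_j, best_s, best_rss
```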

The tree building process

  • First select the predictor $X_j$ and the cutpoint $s$ such that splitting the predictor space into $\{X \mid X_j < s\}$ and $\{X \mid X_j \geq s\}$ leads to the greatest possible reduction in RSS
  • We repeat this process, looking for the best predictor and cutpoint to split the data within each of the resulting regions
  • Now, instead of splitting the entire predictor space, we split one of the two previously identified regions, giving us three regions
  • Then we look to split one of these three regions so as to minimize the RSS
  • This process continues until some stopping criterion is met 🛑 e.g., we could stop once we have created a fixed number of regions, or we could keep going until no region contains more than 5 observations (a full recursive sketch follows this list)

5 / 7
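Putting the steps together, a sketch of the full recursive procedure, assuming the hypothetical best_split above; it stops once a region contains no more than min_obs observations.

```python
def grow_tree(X, y, min_obs=5):
    """Recursive binary splitting: greedily split the current region
    with best_split (defined above), then recurse on each half until
    no region contains more than min_obs observations."""
    if len(y) <= min_obs:
        return {"predict": y.mean()}       # leaf: mean response in region
    j, s, _ = best_split(X, y)
    if j is None:                          # no valid split exists
        return {"predict": y.mean()}
    keep = X[:, j] < s
    return {"var": j, "cut": s,
            "left": grow_tree(X[keep], y[keep], min_obs),
            "right": grow_tree(X[~keep], y[~keep], min_obs)}
```

Calling grow_tree(X, y) on a NumPy feature matrix returns a nested dictionary whose leaves hold the mean response of their region, matching the prediction rule from the second slide.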

Draw a partition

Draw an example of a partition of a two-dimensional feature space that could result from recursive binary splitting with six regions. Label your figure with the regions, $R_1, \dots, R_6$, as well as the cutpoints $t_1, t_2, \dots$. Draw a decision tree corresponding to this partition.

6 / 7
