class: center, middle, inverse, title-slide

# Variable Importance
### Dr. D’Agostino McGowan

---
layout: true

<div class="my-footer">
<span>
Dr. Lucy D'Agostino McGowan <i>adapted from slides by Hastie & Tibshirani</i>
</span>
</div>

---

## Variable importance

* For bagged or random forest _regression trees_, we can record the total amount that the _RSS_ is decreased by splits on a given predictor, `\(X_i\)`, averaged over all `\(B\)` trees

--

* A large value indicates that the variable is _important_

---

## Variable importance

* For bagged or random forest _classification trees_, we can add up the total amount that the Gini Index is decreased by splits on a given predictor, `\(X_i\)`, averaged over all `\(B\)` trees

---

## Variable importance in R

.small[
```r
# rand_forest(), set_engine(), fit(), and %>% are available via tidymodels
library(tidymodels)

rf_spec <- rand_forest(
  mode = "classification",
  mtry = 3
) %>%
  set_engine(
    "ranger",
*   importance = "impurity")

model <- fit(rf_spec,
             HD ~ Age + Sex + ChestPain + RestBP + Chol + Fbs +
               RestECG + MaxHR + ExAng + Oldpeak + Slope + Ca + Thal,
             data = heart)
```
]

```r
ranger::importance(model$fit)
```

```
##        Age        Sex  ChestPain     RestBP       Chol        Fbs    RestECG
##  8.9760549  4.2966894 15.7186880  7.1357392  7.3301372  0.6465737  1.7371211
##      MaxHR      ExAng    Oldpeak      Slope         Ca       Thal
## 13.2162856  6.3648232 13.5063487  5.9506314 17.6392069 14.5557538
```

---

## Variable importance

.small[
```r
library(ranger)
```

```
## Warning: package 'ranger' was built under R version 3.5.2
```

```r
importance(model$fit)
```

```
##        Age        Sex  ChestPain     RestBP       Chol        Fbs    RestECG
##  8.9760549  4.2966894 15.7186880  7.1357392  7.3301372  0.6465737  1.7371211
##      MaxHR      ExAng    Oldpeak      Slope         Ca       Thal
## 13.2162856  6.3648232 13.5063487  5.9506314 17.6392069 14.5557538
```
]

--

.small[
```r
var_imp <- ranger::importance(model$fit)
```
]

---

## Plotting variable importance

.small[
```r
# One row per predictor: its name and its impurity-based importance
var_imp_df <- data.frame(
  variable = names(var_imp),
  importance = var_imp
)

var_imp_df %>%
  ggplot(aes(x = variable, y = importance)) +
  geom_col()
```
![](20-variable-importance_files/figure-html/unnamed-chunk-7-1.png)<!-- -->
]

--

.question[
How could we make this plot better?
]

---

## Plotting variable importance

.small[
```r
var_imp_df %>%
  ggplot(aes(x = variable, y = importance)) +
  geom_col() +
  coord_flip()
```

![](20-variable-importance_files/figure-html/unnamed-chunk-8-1.png)<!-- -->
]

.question[
How could we make this plot better?
]

---

## Plotting variable importance

.small[
```r
var_imp_df %>%
  mutate(variable = factor(variable,
                           levels = variable[order(var_imp_df$importance)])) %>%
  ggplot(aes(x = variable, y = importance)) +
  geom_col() +
  coord_flip()
```

![](20-variable-importance_files/figure-html/unnamed-chunk-9-1.png)<!-- -->
]

---
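
## Plotting variable importance

The ordering step on the previous slide can also be sketched with base R's `reorder()`, swapped in here for the `factor(..., levels = ...)` call. This is only an illustration: to keep it self-contained, the importance scores are typed in from the output printed earlier rather than taken from a fitted model.

```r
# Importance scores copied from the ranger output shown earlier in these
# slides (hard-coded here as an assumption, so no fitted model is needed)
var_imp <- c(
  Age = 8.9760549, Sex = 4.2966894, ChestPain = 15.7186880,
  RestBP = 7.1357392, Chol = 7.3301372, Fbs = 0.6465737,
  RestECG = 1.7371211, MaxHR = 13.2162856, ExAng = 6.3648232,
  Oldpeak = 13.5063487, Slope = 5.9506314, Ca = 17.6392069,
  Thal = 14.5557538
)

var_imp_df <- data.frame(
  variable = names(var_imp),
  importance = var_imp
)

# reorder() turns `variable` into a factor whose levels are sorted by
# increasing `importance`
var_imp_df$variable <- reorder(var_imp_df$variable, var_imp_df$importance)

# Levels now run from the least to the most important predictor
levels(var_imp_df$variable)
```

The reordered `var_imp_df` can be passed to the same `ggplot()` call as before. With these scores, `Fbs` is the first level and `Ca` is the last.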