class: center, middle, inverse, title-slide

# Logistic regression: Odds Ratios
### Dr. D’Agostino McGowan

---
layout: true

<div class="my-footer">
<span>
Dr. Lucy D'Agostino McGowan
</span>
</div>

---

## A bit about "odds"

* The "odds" tell you how likely an event is
* 🌂 Let's say there is a 60% chance of rain today
* What is the probability that it will rain?

--

* `\(p = 0.6\)`

--

* What is the probability that it **won't** rain?

--

* `\(1-p = 0.4\)`

--

* What are the **odds** that it will rain?

--

* 3 to 2, 3:2, `\(\frac{0.6}{0.4} = 1.5\)`

---

## Transforming logs

* How do you "undo" a `\(\log\)` base `\(e\)`?

--

* Use `\(e\)`! For example:
* `\(e^{\log(10)} = 10\)`

--

* `\(e^{\log(1283)} = 1283\)`

--

* `\(e^{\log(x)} = x\)`

---

## Transforming logs

.question[
How would you get the **odds** from the **log(odds)**?
]

* How do you "undo" a `\(\log\)` base `\(e\)`?
* Use `\(e\)`! For example:
* `\(e^{\log(10)} = 10\)`
* `\(e^{\log(1283)} = 1283\)`
* `\(e^{\log(x)} = x\)`

--

* `\(e^{\log(odds)}\)` = odds

---

## Transforming odds

* odds = `\(\frac{\pi}{1-\pi}\)`
* Solving for `\(\pi\)`
* `\(\pi = \frac{\textrm{odds}}{1+\textrm{odds}}\)`

--

* Plugging in `\(e^{\log(odds)}\)` = odds

--

* `\(\pi = \frac{e^{\log(odds)}}{1+e^{\log(odds)}}\)`

--

* Plugging in `\(\log(odds) = \beta_0 + \beta_1x\)`

--

* `\(\pi = \frac{e^{\beta_0 + \beta_1x}}{1+e^{\beta_0 + \beta_1x}}\)`

---

## The logistic model

* ✌️ forms

| Form | Model |
|------|-------|
| Logit form | `\(\log\left(\frac{\pi}{1-\pi}\right) = \beta_0 + \beta_1x\)` |
| Probability form | `\(\Large\pi = \frac{e^{\beta_0 + \beta_1x}}{1+e^{\beta_0 + \beta_1x}}\)` |

---

## The logistic model

| probability | odds | log(odds) |
|---|---|---|
| `\(\pi\)` | `\(\frac{\pi}{1-\pi}\)` | `\(\log\left(\frac{\pi}{1-\pi}\right)=l\)` |

--

.center[
# ⬅️
]

| log(odds) | odds | probability |
|---|---|---|
| `\(l\)` | `\(e^l\)` | `\(\frac{e^l}{1+e^l} = \pi\)` |

---

## The logistic model

* ✌️ forms

--

* **log(odds)**: `\(l \approx \beta_0 + \beta_1x\)`

--

* **P(Outcome = Yes)**:
`\(\Large\pi \approx\frac{e^{\beta_0 + \beta_1x}}{1+e^{\beta_0 + \beta_1x}}\)`

---

# Odds ratios

A study investigated whether a handheld device that sends a magnetic pulse into a person’s head might be an effective treatment for migraine headaches.

* Researchers recruited 200 subjects who suffered from migraines
* Subjects were randomly assigned to receive either the TMS (transcranial magnetic stimulation) treatment or a placebo treatment
* Subjects were instructed to apply the device at the onset of migraine symptoms and then assess how they felt two hours later (either **Pain-free** or **Not pain-free**)

---

# Odds ratios

.question[
What is the explanatory variable?
]

A study investigated whether a handheld device that sends a magnetic pulse into a person’s head might be an effective treatment for migraine headaches.

* Researchers recruited 200 subjects who suffered from migraines
* Subjects were randomly assigned to receive either the TMS (transcranial magnetic stimulation) treatment or a placebo treatment
* Subjects were instructed to apply the device at the onset of migraine symptoms and then assess how they felt two hours later (either **Pain-free** or **Not pain-free**)

---

# Odds ratios

.question[
What type of variable is this?
]

A study investigated whether a handheld device that sends a magnetic pulse into a person’s head might be an effective treatment for migraine headaches.

* Researchers recruited 200 subjects who suffered from migraines
* Subjects were randomly assigned to receive either the TMS (transcranial magnetic stimulation) treatment or a placebo treatment
* Subjects were instructed to apply the device at the onset of migraine symptoms and then assess how they felt two hours later (either **Pain-free** or **Not pain-free**)

---

# Odds ratios

.question[
What is the outcome variable?
]

A study investigated whether a handheld device that sends a magnetic pulse into a person’s head might be an effective treatment for migraine headaches.
* Researchers recruited 200 subjects who suffered from migraines
* Subjects were randomly assigned to receive either the TMS (transcranial magnetic stimulation) treatment or a placebo treatment
* Subjects were instructed to apply the device at the onset of migraine symptoms and then assess how they felt two hours later (either **Pain-free** or **Not pain-free**)

---

# Odds ratios

.question[
What type of variable is this?
]

A study investigated whether a handheld device that sends a magnetic pulse into a person’s head might be an effective treatment for migraine headaches.

* Researchers recruited 200 subjects who suffered from migraines
* Subjects were randomly assigned to receive either the TMS (transcranial magnetic stimulation) treatment or a placebo treatment
* Subjects were instructed to apply the device at the onset of migraine symptoms and then assess how they felt two hours later (either **Pain-free** or **Not pain-free**)

---

# Odds ratios

| | TMS | Placebo | Total |
|---|---|---|---|
| Pain-free two hours later | 39 | 22 | 61 |
| Not pain-free two hours later | 61 | 78 | 139 |
| Total | 100 | 100 | 200 |

--

* We can compare the results using **odds**

--

* What are the odds of being pain-free for the placebo group?

--

* `\((22/100)/(78/100) = 22/78 = 0.282\)`

--

* What are the odds of being pain-free for the treatment group?

--

* `\(39/61 = 0.639\)`

--

* Comparing the **odds**, what can we conclude?

--

* TMS increases the likelihood of success

---

# Odds ratios

| | TMS | Placebo | Total |
|---|---|---|---|
| Pain-free two hours later | 39 | 22 | 61 |
| Not pain-free two hours later | 61 | 78 | 139 |
| Total | 100 | 100 | 200 |

* We can summarize this relationship with an **odds ratio**: the ratio of the two odds

--

`\(\Large OR = \frac{39/61}{22/78} = \frac{0.639}{0.282} = 2.27\)`

--

_"the odds of being pain-free were 2.27 times as high with TMS as with the placebo"_

---

# Odds ratios

.question[
What if we wanted to calculate this in terms of _Not pain-free_ (with _pain-free_ as the referent)?
]

| | TMS | Placebo | Total |
|---|---|---|---|
| Pain-free two hours later | 39 | 22 | 61 |
| Not pain-free two hours later | 61 | 78 | 139 |
| Total | 100 | 100 | 200 |

--

`\(\Large OR = \frac{61/39}{78/22} = \frac{1.564}{3.545} = 0.441\)`

--

_the odds of still being in pain for the TMS group are 0.441 times the odds of still being in pain for the placebo group_

---

# Odds ratios

.question[
What changed here?
]

| | TMS | Placebo | Total |
|---|---|---|---|
| Pain-free two hours later | 39 | 22 | 61 |
| Not pain-free two hours later | 61 | 78 | 139 |
| Total | 100 | 100 | 200 |

`\(\Large OR = \frac{78/22}{61/39} = \frac{3.545}{1.564} = 2.27\)`

_the odds of still being in pain for the placebo group are 2.27 times the odds of still being in pain for the TMS group_

---

# Odds ratios

* In general, it's more natural to interpret odds ratios > 1; you can flip the referent to do so

| | TMS | Placebo | Total |
|---|---|---|---|
| Pain-free two hours later | 39 | 22 | 61 |
| Not pain-free two hours later | 61 | 78 | 139 |
| Total | 100 | 100 | 200 |

`\(\Large OR = \frac{78/22}{61/39} = \frac{3.545}{1.564} = 2.27\)`

_the odds of still being in pain for the placebo group are 2.27 times the odds of still being in pain for the TMS group_

---

## <i class="fas fa-laptop"></i> `Application Exercise`

Roughly 5000 women were enrolled in a study and were randomly assigned to receive either letrozole or a placebo. The primary response variable of interest was disease-free survival.

| | letrozole | placebo | total |
|---|---|---|---|
| death or disease | 185 | 341 | 526 |
| no death or disease | 2390 | 2241 | 4631 |
| total | 2575 | 2582 | 5157 |

* Calculate the odds ratio of death or disease in the placebo group versus the treatment group
* Calculate the odds ratio of no death or disease in the treatment group versus the placebo group
* Calculate the odds ratio of death or disease in the treatment group versus the placebo group

---

## Odds ratios

* Let's look at some Titanic data. We are interested in whether the sex of the passenger is related to whether they survived.

| | Female | Male | Total |
|---|---|---|---|
| Survived | 308 | 142 | 450 |
| Died | 154 | 709 | 863 |
| Total | 462 | 851 | 1313 |

---

## Odds ratios

.question[
What is the odds ratio of surviving for females versus males?
]

* Let's look at some Titanic data. We are interested in whether the sex of the passenger is related to whether they survived.

| | Female | Male | Total |
|---|---|---|---|
| Survived | 308 | 142 | 450 |
| Died | 154 | 709 | 863 |
| Total | 462 | 851 | 1313 |

--

.center[
`\(\Large OR = \frac{308/154}{142/709} = \frac{2}{0.2003} = 9.99\)`
]

---

## Odds ratios

.question[
How do you interpret this?
]

| | Female | Male | Total |
|---|---|---|---|
| Survived | 308 | 142 | 450 |
| Died | 154 | 709 | 863 |
| Total | 462 | 851 | 1313 |

.center[
`\(\Large OR = \frac{308/154}{142/709} = \frac{2}{0.2003} = 9.99\)`
]

--

_the odds of surviving for the female passengers were 9.99 times the odds of surviving for the male passengers_

---

## Odds ratios

.question[
What if we wanted to fit a **model**? What would the equation be?
]

| | Female | Male | Total |
|---|---|---|---|
| Survived | 308 | 142 | 450 |
| Died | 154 | 709 | 863 |
| Total | 462 | 851 | 1313 |

--

.center[
`\(\Large \log(\textrm{odds of survival}) \approx \beta_0 + \beta_1 \textrm{Sex}\)`
]

---

## Odds ratios

.center[
`\(\Large \log(\textrm{odds of survival}) \approx \beta_0 + \beta_1 \textrm{Sex}\)`
]

.small[
```r
logistic_reg() %>%
  set_engine("glm") %>%
  fit(Survived ~ Sex, data = Titanic) %>%
  tidy()
```

```
## # A tibble: 2 x 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)    -1.61    0.0919     -17.5 1.70e-68
## 2 Sexfemale       2.30    0.135       17.1 2.91e-65
```
]

--

.question[
What is my referent category?
]

---

## Odds ratios

.question[
How do you interpret this result?
]

.small[
```r
logistic_reg() %>%
  set_engine("glm") %>%
  fit(Survived ~ Sex, data = Titanic) %>%
  tidy()
```

```
## # A tibble: 2 x 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)    -1.61    0.0919     -17.5 1.70e-68
## 2 Sexfemale       2.30    0.135       17.1 2.91e-65
```
]

---

## Odds ratios

.question[
How do you interpret this result?
]

.small[
```r
logistic_reg() %>%
  set_engine("glm") %>%
  fit(Survived ~ Sex, data = Titanic) %>%
* tidy(exponentiate = TRUE)
```

```
## # A tibble: 2 x 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)    0.200    0.0919     -17.5 1.70e-68
## 2 Sexfemale      9.99     0.135       17.1 2.91e-65
```

```r
exp(2.301176)
```

```
## [1] 9.99
```
]

_the odds of surviving for the female passengers were 9.99 times the odds of surviving for the male passengers_

---

## Odds ratios

* What if the explanatory variable is continuous?

--

.small[
```r
data("MedGPA")
MedGPA$Acceptance <- as.factor(MedGPA$Acceptance)
```

```r
logistic_reg() %>%
  set_engine("glm") %>%
  fit(Acceptance ~ GPA, data = MedGPA) %>%
  tidy()
```

```
## # A tibble: 2 x 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)   -19.2       5.63     -3.41 0.000644
## 2 GPA             5.45      1.58      3.45 0.000553
```
]

--

_A one-unit increase in GPA yields a 5.45 increase in the log odds of acceptance_

---

## Odds ratios

* What if the explanatory variable is continuous?

.small[
```r
logistic_reg() %>%
  set_engine("glm") %>%
  fit(Acceptance ~ GPA, data = MedGPA) %>%
* tidy(exponentiate = TRUE)
```

```
## # A tibble: 2 x 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)  4.56e-9      5.63     -3.41 0.000644
## 2 GPA          2.34e+2      1.58      3.45 0.000553
```
]

--

_A one-unit increase in GPA yields a 234-fold increase in the odds of acceptance_

--

* 😱 That seems huge! **Remember:** the interpretation of these coefficients depends on your units (the same as in ordinary linear regression).
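
A rough sketch of what a "234-fold" odds ratio does and does not say, assuming the rounded coefficients from the output above. The helper names `odds_at()`, `odds_ratio()`, and `prob_at()` are made up for illustration and are not part of any package.

```r
# Coefficients copied (rounded) from the fitted model output above
b0 <- -19.2  # intercept, on the log-odds scale
b1 <- 5.45   # change in log odds per one-unit increase in GPA

# Hypothetical helper: odds of acceptance at a given GPA
odds_at <- function(gpa) exp(b0 + b1 * gpa)

# Hypothetical helper: odds ratio for moving from `gpa` to `gpa + delta`
odds_ratio <- function(gpa, delta) odds_at(gpa + delta) / odds_at(gpa)

# Hypothetical helper: probability form of the model
prob_at <- function(gpa) plogis(b0 + b1 * gpa)

odds_ratio(3.0, 1)  # equals exp(b1)
odds_ratio(3.5, 1)  # same value: the odds ratio does not depend on the starting GPA

# The change in *probability* for the same one-unit step does depend on where you start
prob_at(3.0)
prob_at(4.0)
```

The odds ratio is constant across starting values, which is why a single number can summarize the slope; the corresponding change in probability is not constant.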
---

## Odds ratios

.question[
How could we get the odds ratio associated with increasing GPA by 0.1?
]

.small[
```r
logistic_reg() %>%
  set_engine("glm") %>%
  fit(Acceptance ~ GPA, data = MedGPA) %>%
  tidy()
```

```
## # A tibble: 2 x 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)   -19.2       5.63     -3.41 0.000644
## 2 GPA             5.45      1.58      3.45 0.000553
```
]

--

.small[
```r
exp(5.454) ## a one unit increase in GPA
```

```
## [1] 234
```

```r
exp(5.454 * 0.1) ## a 0.1 increase in GPA
```

```
## [1] 1.73
```
]

--

_A one-tenth unit increase in GPA yields a 1.73-fold increase in the odds of acceptance_

---

## Odds ratios

.question[
How could we get the odds ratio associated with increasing GPA by 0.1?
]

.small[
```r
MedGPA <- MedGPA %>%
  mutate(GPA_10 = GPA * 10)

logistic_reg() %>%
  set_engine("glm") %>%
  fit(Acceptance ~ GPA_10, data = MedGPA) %>%
  tidy(exponentiate = TRUE)
```

```
## # A tibble: 2 x 5
##   term             estimate std.error statistic  p.value
##   <chr>               <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept) 0.00000000456     5.63      -3.41 0.000644
## 2 GPA_10      1.73              0.158      3.45 0.000553
```
]

_A one-tenth unit increase in GPA yields a 1.73-fold increase in the odds of acceptance_

---
class: inverse

## <svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 640 512"><path d="M512 64v256H128V64h384m16-64H112C85.5 0 64 21.5 64 48v288c0 26.5 21.5 48 48 48h416c26.5 0 48-21.5 48-48V48c0-26.5-21.5-48-48-48zm100 416H389.5c-3 0-5.5 2.1-5.9 5.1C381.2 436.3 368 448 352 448h-64c-16 0-29.2-11.7-31.6-26.9-.5-2.9-3-5.1-5.9-5.1H12c-6.6 0-12 5.4-12 12v36c0 26.5 21.5 48 48 48h544c26.5 0 48-21.5 48-48v-36c0-6.6-5.4-12-12-12z"/></svg> `Application Exercise`

Using the `Default` data from the `ISLR` package, fit a logistic regression model predicting whether a customer defaults (`default`) from whether they are a `student` and their current `balance`.

Here is some code to get you started:

```r
library(ISLR)
data("Default")
```