Lecture 02 AE: Review of multiple linear regression
library(tidyverse)
library(tidymodels)
library(knitr)
derby <- read_csv("data/derbyplus.csv")
Introduction
Today’s data is from the Kentucky Derby, an annual 1.25-mile horse race held at the Churchill Downs race track in Louisville, KY. The data is in the file derbyplus.csv in the data folder and contains information on the races from 1896 to 2017.
Response variable
speed
: Average speed of the winner in feet per second (ft/s)
Additional variable
winner
: Winning horse
Predictor variables
year
: Year of the race
condition
: Condition of the track (good, fast, slow)
starters
: Number of horses who raced
Goal: Understand variability in average winner speed based on characteristics of the race.
Part 1
Model 1: Main effects model
model1 <- lm(speed ~ starters + year + condition, data = derby)
tidy(model1) |>
kable(digits = 3)
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 8.197 | 4.508 | 1.818 | 0.072 |
| starters | -0.005 | 0.017 | -0.299 | 0.766 |
| year | 0.023 | 0.002 | 9.766 | 0.000 |
| conditiongood | -0.443 | 0.231 | -1.921 | 0.057 |
| conditionslow | -1.543 | 0.161 | -9.616 | 0.000 |
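Because Model 1 has no interaction terms, track condition only shifts the predicted speed up or down by a constant amount, and the year and starters slopes are the same for every condition. A minimal sketch to check this uses a hypothetical helper, `predict_by_condition()` (not part of the original exercise), that predicts speed for one fixed race under each condition with base R's `predict()`. It assumes condition is stored as text, as `read_csv()` does by default, so "fast" is the baseline level.

```r
# Hypothetical helper (not from the original AE): predicted winner speed
# under each track condition, holding starters and year fixed.
predict_by_condition <- function(model, starters, year) {
  new_races <- data.frame(
    starters = starters,
    year = year,
    condition = c("fast", "good", "slow")
  )
  preds <- predict(model, newdata = new_races)
  names(preds) <- new_races$condition
  preds
}

# Usage with the fitted model from above:
# speeds <- predict_by_condition(model1, starters = 20, year = 2000)
# speeds - speeds["fast"]  # gaps relative to a fast track
```

In a main effects model, those gaps should reproduce the conditiongood (-0.443) and conditionslow (-1.543) estimates in the table, whatever values of starters and year you plug in.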
[add response here]
[add response here]
[add response here]
Model 2: Main effects + quadratic effect for year
[add response here]
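Model 2 adds a squared year term to the Model 1 predictors. A minimal sketch, wrapped in a hypothetical helper `fit_model2()` (not part of the original exercise) so it works on any data frame with the same columns as `derby`:

```r
# Hypothetical helper (not from the original AE): fit Model 2, the main effects
# plus a quadratic term for year, to a data frame shaped like derby.
fit_model2 <- function(data) {
  # I(year^2) squares year. Without I(), year^2 is read as formula crossing,
  # which for a single variable collapses back to year and drops the quadratic term.
  lm(speed ~ starters + year + I(year^2) + condition, data = data)
}

# Usage, after derby has been read in:
# model2 <- fit_model2(derby)
# tidy(model2) |> kable(digits = 3)
```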
Suppose you have the following model:
\[\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 ~ x_1 + \hat{\beta}_2 ~ x_2 + \hat{\beta}_3 ~ x_2^2\]
When the model includes a quadratic term for \(x_2\), the effect of \(x_2\) on \(y\) depends on which values of \(x_2\) are being compared, so it is interpreted as follows:
“When \(x_2\) increases from a to b, \(y\) is expected to change by \(\hat{\beta}_2(b - a) + \hat{\beta}_3(b^2 - a^2)\), holding \(x_1\) constant.”
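The formula above can be coded directly to see why the expected change depends on the starting value. The sketch below uses a hypothetical helper, `quad_change()`, with made-up coefficients \(\hat{\beta}_2 = 2\) and \(\hat{\beta}_3 = -0.5\); these are illustrative values, not estimates from the derby data.

```r
# Hypothetical helper: change in y-hat when x2 moves from a to b, holding x1
# constant, for y-hat = b0 + b1*x1 + b2*x2 + b3*x2^2 (b0 and b1*x1 cancel).
quad_change <- function(beta2, beta3, a, b) {
  beta2 * (b - a) + beta3 * (b^2 - a^2)
}

# Made-up coefficients, chosen only for illustration
quad_change(beta2 = 2, beta3 = -0.5, a = 1, b = 3)  # 2*2 - 0.5*8  = 0
quad_change(beta2 = 2, beta3 = -0.5, a = 3, b = 5)  # 2*2 - 0.5*16 = -4
```

Both calls move \(x_2\) by 2 units, yet the expected change is 0 in one case and -4 in the other, so no single slope summarizes the effect of \(x_2\).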
[add response here]
Model 3: Main effects + interaction between year and condition
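In R's formula syntax, `year * condition` is shorthand for `year + condition + year:condition`, which lets the effect of year differ by track condition. A minimal sketch, using hypothetical helpers `fit_model3()` and `year_slopes()` (neither is part of the original exercise), fits the model and recovers the year slope within each condition. It assumes fast is the baseline level, as it is in Model 1.

```r
# Hypothetical helper (not from the original AE): fit Model 3, the main effects
# plus a year-by-condition interaction, to a data frame shaped like derby.
fit_model3 <- function(data) {
  # year * condition expands to year + condition + year:condition,
  # so each track condition gets its own slope for year.
  lm(speed ~ starters + year * condition, data = data)
}

# Hypothetical helper: year slope within each track condition,
# assuming "fast" is the baseline level of condition.
year_slopes <- function(model) {
  b <- coef(model)
  c(
    fast = unname(b["year"]),
    good = unname(b["year"] + b["year:conditiongood"]),
    slow = unname(b["year"] + b["year:conditionslow"])
  )
}

# Usage, after derby has been read in:
# model3 <- fit_model3(derby)
# tidy(model3) |> kable(digits = 3)
# year_slopes(model3)
```

With fast as the baseline, the year coefficient is the slope for fast tracks, and each year:condition coefficient is the difference between that condition's slope and the fast-track slope.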
[add response here]
[add response here]
[add response here]
Part 2
[add response here]
[add response here]