library(tidyverse)
library(tidymodels)
library(knitr)
derby <- read_csv("data/derbyplus.csv")

Lecture 02 AE: Review of multiple linear regression
Introduction
Today’s data is from the Kentucky Derby, an annual 1.25-mile horse race held at the Churchill Downs race track in Louisville, KY. The data is in the file derbyplus.csv in the data folder. It contains information on races from 1896 to 2017.
Response variable
speed: Average speed of the winner in feet per second (ft/s)
Additional variable
winner: Winning horse
Predictor variables
year: Year of the race
condition: Condition of the track (good, fast, slow)
starters: Number of horses who raced
Goal: Understand variability in average winner speed based on characteristics of the race.
Part 1
Model 1: Main effects model
model1 <- lm(speed ~ starters + year + condition, data = derby)
tidy(model1) |>
  kable(digits = 3)

| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 8.197 | 4.508 | 1.818 | 0.072 |
| starters | -0.005 | 0.017 | -0.299 | 0.766 |
| year | 0.023 | 0.002 | 9.766 | 0.000 |
| conditiongood | -0.443 | 0.231 | -1.921 | 0.057 |
| conditionslow | -1.543 | 0.161 | -9.616 | 0.000 |
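To see how the Model 1 estimates combine into a fitted value, here is a minimal sketch that plugs the rounded coefficients from the table into the regression equation by hand. The race used here (2017, 20 starters) is a hypothetical input chosen for illustration, not a row taken from the data. Since `conditiongood` and `conditionslow` are the only condition terms in the table, `fast` is treated as the baseline level.

```r
# Coefficient estimates copied from the Model 1 table (rounded to 3 decimals).
coefs <- c(
  intercept     = 8.197,
  starters      = -0.005,
  year          = 0.023,
  conditiongood = -0.443,
  conditionslow = -1.543
)

# Hypothetical race: 2017, 20 starters, fast track.
# Fast is the baseline level, so both condition indicators are 0.
new_race <- c(
  intercept     = 1,
  starters      = 20,
  year          = 2017,
  conditiongood = 0,
  conditionslow = 0
)

# Fitted speed = sum of (coefficient * predictor value).
pred_fast <- sum(coefs * new_race)

# Same race on a slow track: only the conditionslow indicator changes.
slow_race <- replace(new_race, "conditionslow", 1)
pred_slow <- sum(coefs * slow_race)

pred_fast              # fitted speed (ft/s) on a fast track
pred_slow - pred_fast  # equals the conditionslow coefficient
```

Because the `year` estimate is multiplied by a value near 2000, rounding it to three decimals can shift the hand-computed fitted value noticeably. For actual predictions, `predict(model1, newdata = ...)` uses the unrounded estimates.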
[add response here]
[add response here]
[add response here]
Model 2: Main effects + quadratic effect for year
[add response here]
Suppose you have the following model:
The interpretation of a variable’s effect when there is a quadratic term in the model is
“When
[add response here]
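As a sketch of why a quadratic term changes the interpretation, assume a generic fitted model with one predictor $x$ and its square, holding all other predictors constant. The symbols $x$, $a$, and $b$ are illustrative and are not taken from the exercise. Comparing the fitted values at $x = a$ and $x = b$ gives:

```latex
\begin{aligned}
\hat{y}(x) &= \hat{\beta}_0 + \hat{\beta}_1 x + \hat{\beta}_2 x^2 \\
\hat{y}(b) - \hat{y}(a) &= \hat{\beta}_1 (b - a) + \hat{\beta}_2 (b^2 - a^2)
\end{aligned}
```

Under this assumption, the expected change depends on the starting value $a$ as well as on the size of the increase. For a one-unit increase ($b = a + 1$) it simplifies to $\hat{\beta}_1 + \hat{\beta}_2 (2a + 1)$.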
Model 3: Main effects + interaction between year and condition
[add response here]
[add response here]
[add response here]
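One way to see what the interaction between year and condition adds to the model is to inspect the design-matrix columns that R builds from the formula. The sketch below uses a small hypothetical data frame (`toy`, not rows from derbyplus.csv) purely to show the column names. With `fast` as the baseline level, each non-baseline condition gets its own intercept shift and its own adjustment to the year slope.

```r
# Hypothetical toy rows, used only to show the design-matrix columns;
# these are not rows from derbyplus.csv.
toy <- data.frame(
  year      = c(1900, 1950, 2000),
  condition = factor(c("fast", "good", "slow"))
)

# year * condition expands to year + condition + year:condition.
X <- model.matrix(~ year * condition, data = toy)

# Main-effect columns come first, followed by one year:condition
# column for each non-baseline level of condition.
colnames(X)
```

In a fitted model, the coefficient on `year` would then describe the slope for fast tracks, and each `year:condition...` coefficient would describe how much that slope differs for the named condition.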
Part 2
[add response here]
[add response here]