library(tidyverse)
library(tidymodels)
library(knitr)
derby <- read_csv("data/derbyplus.csv")
Lecture 02 AE: Review of multiple linear regression
Introduction
Today’s data is from the Kentucky Derby, an annual 1.25-mile horse race held at the Churchill Downs race track in Louisville, KY. The data is in the file derbyplus.csv in the data folder. It contains information for races from 1896 to 2017.
Response variable
speed: Average speed of the winner in feet per second (ft/s)
Additional variable
winner: Winning horse
Predictor variables
year: Year of the race
condition: Condition of the track (good, fast, slow)
starters: Number of horses who raced
Goal: Understand variability in average winner speed based on characteristics of the race.
Part 1
Model 1: Main effects model
model1 <- lm(speed ~ starters + year + condition, data = derby)
tidy(model1) |>
  kable(digits = 3)
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | 8.197 | 4.508 | 1.818 | 0.072 |
starters | -0.005 | 0.017 | -0.299 | 0.766 |
year | 0.023 | 0.002 | 9.766 | 0.000 |
conditiongood | -0.443 | 0.231 | -1.921 | 0.057 |
conditionslow | -1.543 | 0.161 | -9.616 | 0.000 |
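Reading the estimates off the table, the fitted main effects model can be written out as an equation. This is a sketch under two assumptions: that fast is the baseline track condition (it is the only level without its own row in the output), and that good and slow stand for 0/1 indicator variables. The coefficients are the rounded values from the table.

```latex
\[
\widehat{\text{speed}} = 8.197 - 0.005\,\text{starters} + 0.023\,\text{year}
  - 0.443\,\text{good} - 1.543\,\text{slow}
\]
```

Under this equation, holding the number of starters and the track condition fixed, each additional year is associated with an expected increase of about 0.023 ft/s in the winner's average speed.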
[add response here]
[add response here]
[add response here]
Model 2: Main effects + quadratic effect for year
[add response here]
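One possible way to specify a quadratic effect for year in an R formula is sketched below. So that the block runs on its own, it uses a small simulated data frame named toy with the same column names as derby; the values in toy are made up for illustration and are not Kentucky Derby results. The names toy and model2_sketch are hypothetical.

```r
# Toy stand-in for `derby`: simulated values, NOT real race results
set.seed(310)
n <- 122
toy <- data.frame(
  year      = 1896:2017,
  starters  = sample(5:20, n, replace = TRUE),
  condition = factor(sample(c("fast", "good", "slow"), n, replace = TRUE))
)
t <- toy$year - 1896
toy$speed <- 51 + 0.03 * t - 0.0001 * t^2 + rnorm(n, sd = 0.5)

# Main effects plus a quadratic term for year.
# I() is needed so that ^ is treated as arithmetic rather than formula syntax.
model2_sketch <- lm(speed ~ starters + year + I(year^2) + condition, data = toy)

# Named vector of estimates, including one for I(year^2)
coef(model2_sketch)
```

With the real data, the same formula would be used with data = derby, and the result could be displayed with tidy() and kable(digits = 3) as for Model 1.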
Suppose you have the following model:
The interpretation of a variable’s effect when there is a quadratic term in the model is
“When
[add response here]
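As a generic illustration of how a quadratic term changes interpretation, consider a model with a single predictor and its square. The symbols x, a, and b below are assumptions chosen for this sketch, not notation taken from the exercise.

```latex
\[
\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x + \hat{\beta}_2 x^2
\]
\[
\hat{y}(b) - \hat{y}(a) = \hat{\beta}_1 (b - a) + \hat{\beta}_2 \left(b^2 - a^2\right)
\]
```

The expected change in the response therefore depends on the starting value a as well as the size of the increase b - a, so the effect of x cannot be summarized by a single slope.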
Model 3: Main effects + interaction between year and condition
[add response here]
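One way to write the interaction model as an equation is sketched below. It assumes fast is the baseline track condition, as in the Model 1 output, and that good and slow are 0/1 indicator variables; the numbering of the coefficients is an assumption made for this sketch.

```latex
\[
\begin{aligned}
\text{speed} ={}& \beta_0 + \beta_1\,\text{starters} + \beta_2\,\text{year}
  + \beta_3\,\text{good} + \beta_4\,\text{slow} \\
 & + \beta_5\,(\text{year} \times \text{good})
  + \beta_6\,(\text{year} \times \text{slow}) + \epsilon
\end{aligned}
\]
\[
\text{slope of year} =
\begin{cases}
\beta_2 & \text{fast track} \\
\beta_2 + \beta_5 & \text{good track} \\
\beta_2 + \beta_6 & \text{slow track}
\end{cases}
\]
```

Written this way, the interaction coefficients describe how the yearly trend in winning speed on good and slow tracks differs from the trend on fast tracks.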
[add response here]
[add response here]
Part 2
[add response here]
[add response here]