```r
library(tidyverse)
library(broom)
library(knitr)
# add other packages as needed
```
Lecture 08: Logistic regression
Binomial responses and overdispersion
Data: Supporting railroads in the 1870s
The data set `RR_Data_Hale.csv` contains information on support for referendums related to railroad subsidies for 11 communities in Hale County, Alabama, in the 1870s. The data were originally collected from the US Census by historian Michael Fitzgerald and analyzed as part of a thesis project by a student at St. Olaf College. The variables in the data are:
`pctBlack`
: percentage of Black residents in the county

`distance`
: distance the proposed railroad is from the community (in miles)

`YesVotes`
: number of “yes” votes in favor of the proposed railroad line

`NumVotes`
: number of votes cast in the election
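```r
# Read the community-level data
rr <- read_csv("data/RR_Data_Hale.csv")

rr <- rr |>
  mutate(pctYes = YesVotes / NumVotes,                  # observed proportion of "yes" votes
         emp_logit = log(pctYes / (1 - pctYes)),        # empirical logit (infinite if pctYes is 0 or 1)
         inFavor = if_else(pctYes > 0.5, "Yes", "No"))  # did a majority vote "yes"?
```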
Part 1
```r
# Binomial response: (number of "yes" votes, number of "no" votes) per community
rr_model <- glm(cbind(YesVotes, NumVotes - YesVotes) ~ distance + pctBlack,
                data = rr, family = binomial)

tidy(rr_model, conf.int = TRUE) |>
  kable(digits = 3)
```
term | estimate | std.error | statistic | p.value | conf.low | conf.high |
---|---|---|---|---|---|---|
(Intercept) | 4.222 | 0.297 | 14.217 | 0.000 | 3.644 | 4.809 |
distance | -0.292 | 0.013 | -22.270 | 0.000 | -0.318 | -0.267 |
pctBlack | -0.013 | 0.004 | -3.394 | 0.001 | -0.021 | -0.006 |
Alternate model syntax
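```r
# Equivalent fit: proportion of "yes" votes as the response,
# with the number of votes cast supplied as prior weights
rr_model_alt <- glm(pctYes ~ distance + pctBlack, data = rr,
                    family = binomial, weights = NumVotes)

tidy(rr_model_alt, conf.int = TRUE) |>
  kable(digits = 3)
```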
term | estimate | std.error | statistic | p.value | conf.low | conf.high |
---|---|---|---|---|---|---|
(Intercept) | 4.222 | 0.297 | 14.217 | 0.000 | 3.644 | 4.809 |
distance | -0.292 | 0.013 | -22.270 | 0.000 | -0.318 | -0.267 |
pctBlack | -0.013 | 0.004 | -3.394 | 0.001 | -0.021 | -0.006 |
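The two tables match because both syntaxes describe the same binomial likelihood: a two-column (successes, failures) response, or a proportion response with the number of trials passed as prior weights. A small self-contained check of that equivalence, using hypothetical simulated data (the `sim` data frame and every value in it are made up for illustration, not the Hale County data):

```r
# Hypothetical simulated data: 11 made-up communities (NOT the Hale County data)
set.seed(2024)
sim <- data.frame(
  distance = runif(11, min = 0, max = 20),
  pctBlack = runif(11, min = 20, max = 80),
  NumVotes = sample(200:1500, 11)
)
# Log-odds used only to generate fake "yes" counts
eta <- 4 - 0.3 * sim$distance - 0.01 * sim$pctBlack
sim$YesVotes <- rbinom(11, size = sim$NumVotes, prob = plogis(eta))
sim$pctYes <- sim$YesVotes / sim$NumVotes

# Syntax 1: two-column matrix of (successes, failures)
fit_counts <- glm(cbind(YesVotes, NumVotes - YesVotes) ~ distance + pctBlack,
                  data = sim, family = binomial)

# Syntax 2: observed proportion as the response, number of trials as weights
fit_props <- glm(pctYes ~ distance + pctBlack,
                 data = sim, family = binomial, weights = NumVotes)

# Same likelihood, so the estimates and standard errors agree
cbind(est_counts = coef(fit_counts), est_props = coef(fit_props),
      se_counts = sqrt(diag(vcov(fit_counts))),
      se_props  = sqrt(diag(vcov(fit_props))))
```

Choosing between them is mostly convenience: the `cbind()` form keeps the counts explicit, while the proportion form is handy when the data already store `pctYes`.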
Exercise 1: Interpret the coefficient of `distance` in the context of the data.
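The estimates in the table are on the log-odds scale. One way to start the interpretation is to exponentiate the `distance` estimate and its confidence limits, which gives an odds ratio for a one-mile increase in distance with `pctBlack` held fixed. The sketch below only does that arithmetic on the rounded values copied from the table (the variable names are illustrative), so the results are approximate; with the fitted model available, `tidy(rr_model, conf.int = TRUE, exponentiate = TRUE)` returns the same quantities from the unrounded estimates.

```r
# Rounded estimate and 95% CI for distance, copied from the tidy() table
# (log-odds scale)
est_distance <- -0.292
ci_distance  <- c(conf.low = -0.318, conf.high = -0.267)

# exp() turns an additive change in log-odds into a multiplicative change
# in odds: the odds ratio for a one-mile increase in distance
or_distance <- exp(est_distance)
or_ci       <- exp(ci_distance)

# Percent change in the odds of a "yes" vote per additional mile
pct_change <- 100 * (or_distance - 1)

round(c(odds_ratio = or_distance, or_ci, pct_change = pct_change), 3)
```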
Exercise 2: Use a likelihood ratio test (also called a drop-in-deviance test) to determine whether the interaction between `distance` and `pctBlack` should be added to the model.
```r
# code to test interaction
```
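A minimal sketch of the drop-in-deviance mechanics, run on hypothetical simulated data rather than `rr` so that it is self-contained (`sim`, `reduced`, and `full` are illustrative names). The reduction in residual deviance from adding the interaction is compared with a chi-square distribution whose degrees of freedom equal the number of added parameters; `anova()` reports the same test.

```r
# Hypothetical simulated data (NOT the Hale County data)
set.seed(101)
n_comm <- 30
sim <- data.frame(
  distance = runif(n_comm, min = 0, max = 20),
  pctBlack = runif(n_comm, min = 20, max = 80),
  NumVotes = sample(200:1500, n_comm, replace = TRUE)
)
eta <- 4 - 0.3 * sim$distance - 0.01 * sim$pctBlack +
  0.002 * sim$distance * sim$pctBlack
sim$YesVotes <- rbinom(n_comm, size = sim$NumVotes, prob = plogis(eta))

# Reduced model (main effects only) and full model (adds the interaction)
reduced <- glm(cbind(YesVotes, NumVotes - YesVotes) ~ distance + pctBlack,
               data = sim, family = binomial)
full    <- glm(cbind(YesVotes, NumVotes - YesVotes) ~ distance * pctBlack,
               data = sim, family = binomial)

# Drop in deviance, compared with chi-square(df = number of added terms)
drop_dev <- deviance(reduced) - deviance(full)
df_diff  <- df.residual(reduced) - df.residual(full)
p_value  <- pchisq(drop_dev, df = df_diff, lower.tail = FALSE)
c(drop_in_deviance = drop_dev, df = df_diff, p_value = p_value)

# The same likelihood ratio test via anova()
lrt <- anova(reduced, full, test = "Chisq")
lrt
```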
Exercise 3: Use the model selected in the previous exercise. Interpret the effect of the demographics (`pctBlack`) for a community that is:

- right on the proposed railroad (`distance = 0`)
- 15 miles away from the proposed railroad (`distance = 15`)
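If the interaction is kept, the effect of `pctBlack` depends on distance. Writing the interaction model with generic coefficients makes this explicit (the $\hat\beta$ labels are placeholders introduced here, not fitted values reported in this handout):

$$
\log\left(\frac{\hat{p}}{1-\hat{p}}\right) = \hat\beta_0 + \hat\beta_1\,\text{distance} + \hat\beta_2\,\text{pctBlack} + \hat\beta_3\,(\text{distance} \times \text{pctBlack})
$$

Holding distance fixed at $d$, a one-point increase in `pctBlack` changes the log-odds by $\hat\beta_2 + \hat\beta_3 d$, so the corresponding odds ratios are

$$
\begin{aligned}
d = 0: &\quad \text{OR} = e^{\hat\beta_2} \\
d = 15: &\quad \text{OR} = e^{\hat\beta_2 + 15\hat\beta_3}
\end{aligned}
$$

If the main-effects model is selected instead, $\hat\beta_3 = 0$ and the two odds ratios coincide.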
Exercise 4: Conduct the appropriate test to assess whether the model selected in Exercise 2 is a good fit for the data.
```r
# code for goodness-of-fit test
```
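For binomial counts with many trials per community, one common check is the residual deviance goodness-of-fit test: if the model is adequate, the residual deviance is approximately chi-square with degrees of freedom equal to the residual df, so a small p-value signals lack of fit. The Pearson chi-square statistic is an alternative with the same reference distribution. A sketch on hypothetical simulated data (not `rr`; `sim` and `fit` are illustrative names):

```r
# Hypothetical simulated data (NOT the Hale County data)
set.seed(7)
n_comm <- 30
sim <- data.frame(
  distance = runif(n_comm, min = 0, max = 20),
  pctBlack = runif(n_comm, min = 20, max = 80),
  NumVotes = sample(200:1500, n_comm, replace = TRUE)
)
eta <- 4 - 0.3 * sim$distance - 0.01 * sim$pctBlack
sim$YesVotes <- rbinom(n_comm, size = sim$NumVotes, prob = plogis(eta))

fit <- glm(cbind(YesVotes, NumVotes - YesVotes) ~ distance + pctBlack,
           data = sim, family = binomial)

# Residual deviance test. H0: the model is adequate;
# compare the residual deviance to chi-square(residual df)
gof_stat <- deviance(fit)
gof_df   <- df.residual(fit)
gof_p    <- pchisq(gof_stat, df = gof_df, lower.tail = FALSE)

# Pearson chi-square statistic, compared with the same distribution
pearson_stat <- sum(residuals(fit, type = "pearson")^2)
pearson_p    <- pchisq(pearson_stat, df = gof_df, lower.tail = FALSE)

c(deviance = gof_stat, pearson = pearson_stat, df = gof_df,
  p_deviance = gof_p, p_pearson = pearson_p)
```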
Part 2
Exercise 5: Fit the quasibinomial model. How did the coefficients change from the original model? How did the standard errors change?
```r
# code to fit quasibinomial model
```
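A sketch of what to look for, on hypothetical simulated data where extra community-level variation (an assumption built into the simulation) makes the counts overdispersed. The quasibinomial fit keeps the binomial point estimates and multiplies every standard error by $\sqrt{\hat\phi}$, where $\hat\phi$ is the estimated dispersion parameter (the Pearson chi-square statistic divided by the residual df). All object names are illustrative.

```r
# Hypothetical simulated data (NOT the Hale County data)
set.seed(2)
n_comm <- 30
sim <- data.frame(
  distance = runif(n_comm, min = 0, max = 20),
  pctBlack = runif(n_comm, min = 20, max = 80),
  NumVotes = sample(200:1500, n_comm, replace = TRUE)
)
# Extra community-level noise on the logit scale (an assumption of this
# simulation) makes the counts more variable than a binomial allows
eta <- 4 - 0.3 * sim$distance - 0.01 * sim$pctBlack + rnorm(n_comm, sd = 0.5)
sim$YesVotes <- rbinom(n_comm, size = sim$NumVotes, prob = plogis(eta))

model_formula <- cbind(YesVotes, NumVotes - YesVotes) ~ distance + pctBlack
fit_binom <- glm(model_formula, data = sim, family = binomial)
fit_quasi <- glm(model_formula, data = sim, family = quasibinomial)

# Estimated dispersion reported by the quasibinomial fit
phi_hat <- summary(fit_quasi)$dispersion
# Cross-check: Pearson chi-square / residual df from the binomial fit
pearson_phi <- sum(residuals(fit_binom, type = "pearson")^2) / df.residual(fit_binom)

se_binom <- sqrt(diag(vcov(fit_binom)))
se_quasi <- sqrt(diag(vcov(fit_quasi)))

# Same point estimates; standard errors scaled by sqrt(phi_hat)
round(cbind(est_binom = coef(fit_binom), est_quasi = coef(fit_quasi),
            se_binom, se_quasi, se_ratio = se_quasi / se_binom), 4)
c(phi_hat = phi_hat, pearson_phi = pearson_phi, sqrt_phi = sqrt(phi_hat))
```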
Exercise 6: Based on the results from Exercise 5, what might be your next step in the analysis? If possible, conduct that step below.
```r
# code for next step
```
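One possible next step, assuming Exercise 5 points to overdispersion ($\hat\phi$ noticeably above 1), is to redo the interaction test from Exercise 2 with an F-based drop-in-deviance test, since the chi-square version treats the dispersion as fixed at 1. The F statistic divides the drop in deviance per added parameter by $\hat\phi$ from the larger model. A sketch on hypothetical simulated, overdispersed data (not `rr`; names are illustrative):

```r
# Hypothetical simulated, overdispersed data (NOT the Hale County data)
set.seed(11)
n_comm <- 30
sim <- data.frame(
  distance = runif(n_comm, min = 0, max = 20),
  pctBlack = runif(n_comm, min = 20, max = 80),
  NumVotes = sample(200:1500, n_comm, replace = TRUE)
)
eta <- 4 - 0.3 * sim$distance - 0.01 * sim$pctBlack +
  0.002 * sim$distance * sim$pctBlack + rnorm(n_comm, sd = 0.5)
sim$YesVotes <- rbinom(n_comm, size = sim$NumVotes, prob = plogis(eta))

reduced_q <- glm(cbind(YesVotes, NumVotes - YesVotes) ~ distance + pctBlack,
                 data = sim, family = quasibinomial)
full_q    <- glm(cbind(YesVotes, NumVotes - YesVotes) ~ distance * pctBlack,
                 data = sim, family = quasibinomial)

# F-based drop-in-deviance test; anova() scales by the dispersion
# estimated from the larger model
f_test <- anova(reduced_q, full_q, test = "F")
f_test

# The same F statistic and p-value by hand
phi_full <- summary(full_q)$dispersion
df_diff  <- df.residual(reduced_q) - df.residual(full_q)
F_stat   <- ((deviance(reduced_q) - deviance(full_q)) / df_diff) / phi_full
p_F      <- pf(F_stat, df1 = df_diff, df2 = df.residual(full_q), lower.tail = FALSE)
c(F = F_stat, df1 = df_diff, df2 = df.residual(full_q), p_value = p_F)
```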