Feb 19, 2024
Quiz 02: Tue, Feb 20, 9am - Thu, Feb 22, 12pm (noon)
Covers readings & lectures: Jan 24 - Feb 12
Poisson regression, unifying framework for GLMs, logistic regression, proportional odds models, probit regression
Read Sadler and Miller (2010) as part of Feb 26 prepare assignment
Correlation among outcomes within the same group (teacher, patient, household) is called intraclass correlation
We can think of correlated data as a multilevel structure
Population elements are aggregated into groups
There are observational units and measurements at each level
For now we will focus on data with two levels:
Example: education study
Level one
Observational units: students
Level-one covariates: test scores (response), year in school, demographics
Level two
Observational units: teachers
Level-two covariates:years of experience
Researchers are interested in understanding the effect social media has on opinions about a proposed economic plan. They randomly select 1000 households. They ask each adult in the household how many minutes they spend on social media daily and whether they support the proposed economic plan.
Radon is a carcinogen – a naturally occurring radioactive gas whose decay products are also radioactive – known to cause lung cancer in high concentrations. The EPA sampled more than 80,000 homes across the U.S. Each house came from a randomly selected county and measurements were made on each level of each home. Uranium measurements at the county level were included to improve the radon estimates.1
What are the level one and level two observational units?
What is the response variable and what is its type (normal, Poisson, etc.)?
Describe the within-group correlation.
What are the fixed effects? What are the random effects?
Today’s data are simulated results of an experiment with 24 dams (mother rats) randomly divided into four groups that received different doses of teratogen, a substance that could potentially cause harm to developing fetuses. The four groups are
Each dam produced 10 rat pups and the presence of a deformity was noted for each pup.
Goal: Understand the association between teratogen exposure and the probability a pup is born with a deformity.
Dose effect: Studying whether different dose levels are associated with different probabilities of birth defects in the pups.
Dam (litter) effect: Different dams may have different propensity to produce pups with defects, i.e. pups from same litter are more likely to be similar than pups from different litters.
Pup-to-pup variability: Within litter pup differences. This is the unexplained variability.
Assume dose has no effect on, \(p\), the probability of a pup born with a deformity.
Scenario 1a.: \(p = 0.5\) for each dam
Scenario 1b.: \(p \sim Beta(0.5, 0.5)\) (expected value = 0.5)
Would you expect the number of pups with a deformity for dams in Scenario 1a to follow a distribution similar to the binomial distribution with \(n=10\) and \(p=0.5\)? Why or why not?
Would you expect the number of pups with a deformity for dams in Scenario 1b to follow a distribution similar to the binomial distribution with \(n=10\) and \(p=0.5\)? Why or why not?
Which scenario do you think is more realistic - Scenario 1a or 1b?
mean_1a | sd_1a | mean_1b | sd_1b |
---|---|---|---|
5.166667 | 1.493949 | 5.666667 | 4.103727 |
Let’s take a look at a binomial and quasibinomial model for Scenarios 1a and 1b.
08:00
Now we will consider the effect of the dose of teratogen on the probability of a pup born with a deformity. The 24 pups have been randomly divided into four groups:
dose = 3
)dose = 2
)dose = 1
)dose = 0
)We will assume the true relationship between \(p\) and dose is the following:
\[\log\Big(\frac{p}{1-p}\Big) = -2 + 1.33 ~ dose\]
Scenario 2a.
\[p = \frac{e^{-2 + 1.33 ~ dose}}{1 + e^{-2 + 1.33 ~ dose}}\]
Scenario 2b.
\[p \sim Beta\Big(\frac{2p}{(1-p)}, 2\Big)\]
On average, dams who receive dose \(x\) have the same probability of pup born with deformity as dams with dose \(x\) under Scenario 2a.
dose = 1
, the probability of a pup born with deformity is 0.338 in Scenario 2a and the mean is 0.338 in Scenario 2b.
Scenario 2a
|
Scenario 2b
|
|||||||
---|---|---|---|---|---|---|---|---|
Dosage | Mean p | SD p | Mean Count | SD Count | Mean p | SD p | Mean Count | SD Count |
0 | 0.119 | 0 | 1.333 | 1.366 | 0.061 | 0.069 | 0.500 | 0.837 |
1 | 0.339 | 0 | 3.167 | 1.835 | 0.239 | 0.208 | 3.500 | 2.881 |
2 | 0.661 | 0 | 5.833 | 1.472 | 0.615 | 0.195 | 5.833 | 1.941 |
3 | 0.881 | 0 | 8.833 | 1.169 | 0.872 | 0.079 | 8.833 | 1.169 |
The estimated effect of dose and the 95% CI from the binomial and quasibinomial models are below:
Scenario 2a
Odds Ratio | 95% CI | |
---|---|---|
Binomial | 3.536 | (2.604, 4.958) |
Quasibinomial | 3.536 | (2.512, 5.186) |
Scenario 2b
Odds Ratio | 95% CI | |
---|---|---|
Binomial | 4.311 | (3.086, 6.271) |
Quasibinomial | 4.311 | (2.735, 7.352) |
library(lme4)
library(broom.mixed)
random_effect_model <- glmer(deformity ~ dose + (1|dam),
family = binomial, data = scenario2_raw)
tidy(random_effect_model, conf.int = TRUE) |>
kable(digits = 3)
effect | group | term | estimate | std.error | statistic | p.value | conf.low | conf.high |
---|---|---|---|---|---|---|---|---|
fixed | NA | (Intercept) | -2.819 | 0.528 | -5.343 | 0 | -3.853 | -1.785 |
fixed | NA | dose | 1.691 | 0.282 | 5.992 | 0 | 1.138 | 2.244 |
ran_pars | dam | sd__(Intercept) | 0.834 | NA | NA | NA | NA | NA |
The structure of the data set may imply correlation between observations.
Correlated observations provide less information than independent observations; we need to account for this reduction in information.
Failing to account for this reduction could result in underestimating standard error, thus resulting in overstating significance and the precision of the estimates.
We showed how we can account for this by incorporating the dispersion parameter or a random effect.