STA 310 - Spring 2024 - Multilevel Generalized Linear Models

Data: College basketball referees

game	visitor	hometeam	foul.num	foul.home	foul.vis	foul.diff	foul.type	time
1	IA	MN	1	0	1	0	Personal	14.167
1	IA	MN	2	1	0	-1	Personal	11.433
1	IA	MN	3	1	0	0	Personal	10.233
1	IA	MN	4	0	1	1	Personal	9.733
1	IA	MN	5	0	1	0	Shooting	7.767
1	IA	MN	6	0	1	-1	Shooting	5.567
1	IA	MN	7	1	0	-2	Shooting	2.433
1	IA	MN	8	1	0	-1	Offensive	1.000
2	MI	MIST	1	0	1	0	Shooting	18.983
2	MI	MIST	2	1	0	-1	Personal	17.200

Model 1 coefficients

effect	group	term	estimate	std.error	statistic	p.value
fixed	NA	(Intercept)	-0.157	0.046	-3.382	0.001
fixed	NA	foul.diff	-0.285	0.038	-7.440	0.000
ran_pars	game	sd__(Intercept)	0.542	NA	NA	NA
ran_pars	game	cor__(Intercept).foul.diff	-1.000	NA	NA	NA
ran_pars	game	sd__foul.diff	0.035	NA	NA	NA

Model 2 coefficients

effect	group	term	estimate	std.error	statistic	p.value
fixed	NA	(Intercept)	-0.187	0.044	-4.213	0
fixed	NA	foul.diff	-0.272	0.040	-6.713	0
ran_pars	game	sd__(Intercept)	0.518	NA	NA	NA
ran_pars	game.1	sd__foul.diff	0.043	NA	NA	NA

Crossed random effects

The Level Two covariates are the home team and visiting team
The EDA gives some evidence that the probability of a foul differs depending on the home team
We will account for this difference by treating home team and visiting team as random effects in the model
- Issue: Home team and visiting team are not nested within game, since the same team can appear in multiple games
The random effects for game, home team, and visiting team are therefore crossed random effects
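A quick way to see why the team effects are crossed rather than nested is to tabulate a schedule. The sketch below uses a small hypothetical `games` data frame: the first two rows match the data excerpt (IA at MN, MI at MIST), while games 3 and 4 are made up for illustration only.

```r
# Hypothetical schedule: one row per game (games 3 and 4 are illustrative, not from the data)
games <- data.frame(
  game     = 1:4,
  visitor  = c("IA", "MI", "MN", "IA"),
  hometeam = c("MN", "MIST", "MI", "MI")
)

# If every team hosted exactly one game, home team would carry no information
# beyond game. Counts above 1 mean a home team level spans several games.
games_per_home <- table(games$hometeam)
games_per_home
any(games_per_home > 1)

# Teams that appear both as a home team and as a visitor
both_roles <- intersect(games$hometeam, games$visitor)
both_roles

# Cross-tabulation of home team by visitor: each home team can meet several visitors
xtabs(~ hometeam + visitor, data = games)
```

Applied to the real data, the same tabulation would show how many games each team contributes to the home team and visiting team random effects.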

Notation

$Y_{i [g h] j}$: Random variable indicating whether the $j^{th}$ foul in Game $i$ was called on home team $h$ instead of visiting team $g$

$Y_{i [g h] j} \sim \text{Bernoulli}(p_{i [g h] j})$

where $p_{i [g h] j}$ is the true probability that the $j^{th}$ foul in Game $i$ was called on home team $h$ instead of visiting team $g$

Model 3: Models by level

Level One

$\log \left(\frac{p_{i [g h] j}}{1 - p_{i [g h] j}}\right) = a_{i} + b_{i} \, \text{foul.diff}_{i j}$

Level Two

$\begin{aligned} a_{i} &= α_{0} + u_{i} + v_{h} + w_{g} \\ b_{i} &= β_{0} \end{aligned}$

$u_{i} \sim N (0, σ_{u}^{2}), \quad v_{h} \sim N (0, σ_{v}^{2}), \quad w_{g} \sim N (0, σ_{w}^{2})$

Model 3: Composite model

$\log \left(\frac{p_{i [g h] j}}{1 - p_{i [g h] j}}\right) = α_{0} + β_{0} \, \text{foul.diff}_{i j} + [u_{i} + v_{h} + w_{g}]$

$u_{i} \sim N (0, σ_{u}^{2}), \quad v_{h} \sim N (0, σ_{v}^{2}), \quad w_{g} \sim N (0, σ_{w}^{2})$

Why add additional random effects?

Get more precise estimates of fixed effects
Can make comparisons of game-to-game and team-to-team variability
Can get estimated random effects for each team and use them to compare odds of a foul on the home team for different teams

Model 3 in R

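library(lme4)  # provides glmer()

# Crossed random intercepts for game, home team, and visiting team
model3 <- glmer(foul.home ~ foul.diff +
                  (1 | game) + (1 | hometeam) + (1 | visitor),
                data = basketball, family = binomial)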

effect	group	term	estimate	std.error	statistic	p.value
fixed	NA	(Intercept)	-0.188	0.063	-2.967	0.003
fixed	NA	foul.diff	-0.264	0.039	-6.795	0.000
ran_pars	game	sd__(Intercept)	0.414	NA	NA	NA
ran_pars	hometeam	sd__(Intercept)	0.261	NA	NA	NA
ran_pars	visitor	sd__(Intercept)	0.152	NA	NA	NA
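To make the fixed effects in this output concrete, the sketch below converts them to an odds ratio and to probabilities. It hard-codes the two estimates from the table rather than extracting them from `model3`, and it assumes all random effects are set to 0 (a "typical" game, home team, and visitor). Judging from the data rows, `foul.diff` appears to be home fouls minus visitor fouls before the current foul; treat that reading as an assumption.

```r
# Fixed-effect estimates copied from the Model 3 output table
alpha0 <- -0.188  # (Intercept)
beta0  <- -0.264  # foul.diff

# Multiplicative change in the odds of a home foul per one-unit increase in foul.diff
or_foul_diff <- exp(beta0)
round(or_foul_diff, 3)

# Probability the next foul is called on the home team,
# with u_i = v_h = w_g = 0 (assumption: typical game and teams)
p_home <- function(foul_diff) plogis(alpha0 + beta0 * foul_diff)

round(p_home(c(-2, 0, 2)), 3)
```

The odds ratio is below 1, so under this model the more fouls the home team has relative to the visitors, the lower the estimated odds that the next foul goes against the home team.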

Model 3 coefficients

effect	group	term	estimate	std.error	statistic	p.value
fixed	NA	(Intercept)	-0.188	0.063	-2.967	0.003
fixed	NA	foul.diff	-0.264	0.039	-6.795	0.000
ran_pars	game	sd__(Intercept)	0.414	NA	NA	NA
ran_pars	hometeam	sd__(Intercept)	0.261	NA	NA	NA
ran_pars	visitor	sd__(Intercept)	0.152	NA	NA	NA

About what percent of the variability in the intercepts is due to…

game-to-game differences?
differences among home teams?
differences among visiting teams?
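One way to approach these questions is to square each estimated standard deviation and divide by the total intercept variance. The sketch below hard-codes the three standard deviations from the Model 3 table; the object names are illustrative.

```r
# Random-intercept standard deviations copied from the Model 3 output table
sd_int <- c(game = 0.414, hometeam = 0.261, visitor = 0.152)

# Variances are squared standard deviations
var_int <- sd_int^2

# Share of the total intercept variance attributed to each source
share <- var_int / sum(var_int)
round(100 * share, 1)
```

With these estimates, roughly 65% of the intercept variability comes from game-to-game differences, about 26% from home teams, and about 9% from visiting teams.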

Keep the crossed random effects?

Given that a large proportion of the variability in the intercepts is explained by game-to-game differences, we can assess whether the random effects for home team and visiting team provide useful information.
To do so, we will compare the following models:

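# Model A: random intercept for game only
modela <- glmer(foul.home ~ foul.diff + (1 | game),
                data = basketball, family = binomial)

# Model B: adds crossed random intercepts for home team and visiting team
modelb <- glmer(foul.home ~ foul.diff +
                  (1 | game) + (1 | hometeam) + (1 | visitor),
                data = basketball, family = binomial)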

What parameters are being tested?
Write the null and alternative hypotheses.
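One possible way to write the hypotheses, using the variance components from the Model 3 notation and noting that modela is modelb with the two team variances set to zero:

$H_{0}: σ_{v}^{2} = σ_{w}^{2} = 0$

$H_{a}: \text{at least one of } σ_{v}^{2}, σ_{w}^{2} \text{ is greater than } 0$

Under $H_{0}$ both parameters sit on the boundary of the parameter space, since a variance cannot be negative. This is a sketch of the setup, not the only valid formulation.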

Keep the crossed random effects?

We can use the following methods to assess if it is useful to keep the crossed random effects in the model:

Parametric bootstrap confidence intervals
Compare models with and without the random effects using AIC or BIC

Additional methods to use with caution:

Likelihood ratio test based on $χ^{2}$ (unreliable when testing random effects)
Parametric bootstrap likelihood ratio test (can have very long computational time)

Parametric bootstrap CI

# Parametric bootstrap intervals for modelb; kable() is from knitr
set.seed(310)
confint(modelb, method = "boot", oldNames = FALSE) |>
  kable(digits = 3)

term	2.5 %	97.5 %
sd_(Intercept)|game	0.281	0.493
sd_(Intercept)|hometeam	0.137	0.362
sd_(Intercept)|visitor	0.000	0.255
(Intercept)	-0.306	-0.069
foul.diff	-0.299	-0.227

AIC and BIC

glance(modela) |> kable(digits = 3)

nobs	sigma	logLik	AIC	BIC	deviance	df.residual
4972	1	-3393.27	6792.54	6812.075	6397.136	4969

glance(modelb) |> kable(digits = 3)

nobs	sigma	logLik	AIC	BIC	deviance	df.residual
4972	1	-3385.233	6780.466	6813.024	6420.534	4967
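The AIC and BIC values in these two tables can be reproduced by hand from the log-likelihoods, which makes the penalty for the two extra variance parameters explicit. The sketch below hard-codes the reported values. The parameter counts `k` (3 for modela, 5 for modelb) are inferred from `nobs` minus `df.residual` rather than stated in the source.

```r
n <- 4972  # nobs from the glance() output

fits <- data.frame(
  model  = c("modela", "modelb"),
  logLik = c(-3393.27, -3385.233),
  k      = c(3, 5)  # inferred: 2 fixed effects + 1 or 3 random-intercept SDs
)

# AIC = -2 logLik + 2k;  BIC = -2 logLik + log(n) k
fits$AIC <- -2 * fits$logLik + 2 * fits$k
fits$BIC <- -2 * fits$logLik + log(n) * fits$k
fits

# Differences (modela minus modelb): positive values favor modelb
delta_aic <- fits$AIC[1] - fits$AIC[2]
delta_bic <- fits$BIC[1] - fits$BIC[2]
round(c(delta_aic = delta_aic, delta_bic = delta_bic), 3)

# Likelihood ratio statistic for the two added variance components
lrt <- 2 * (fits$logLik[2] - fits$logLik[1])
lrt
```

AIC prefers modelb by about 12 points, while BIC is nearly tied and slightly prefers modela, reflecting its heavier penalty of `log(n)` per parameter.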

Full model

$\begin{aligned} \log \left(\frac{p_{i [g h] j}}{1 - p_{i [g h] j}}\right) &= α_{0} + β_{0} \, \text{foul.diff}_{i j} + γ_{0} \, \text{score.diff}_{i j} \\ &+ ϕ_{0} \, \text{time}_{i j} + κ_{0} \, \text{offensive}_{i j} + λ_{0} \, \text{personal}_{i j} \\ &+ μ_{0} \, \text{foul.diff}_{i j}{:}\text{offensive}_{i j} + ν_{0} \, \text{foul.diff}_{i j}{:}\text{personal}_{i j} \\ &+ ω_{0} \, \text{foul.diff}_{i j}{:}\text{time}_{i j} \\ &+ [u_{i} + v_{h} + w_{g}] \end{aligned}$

$u_{i} \sim N (0, σ_{u}^{2}), \quad v_{h} \sim N (0, σ_{v}^{2}), \quad w_{g} \sim N (0, σ_{w}^{2})$

# Full model: fixed effects and interactions plus crossed random intercepts
full_model <- glmer(foul.home ~ foul.diff + score.diff + time +
                      offensive + personal + foul.diff:offensive +
                      foul.diff:personal + foul.diff:time +
                      (1 | game) + (1 | hometeam) + (1 | visitor),
                    family = binomial, data = basketball)

Full model

effect	group	term	estimate	std.error	statistic	p.value
fixed	NA	(Intercept)	-0.336	0.100	-3.347	0.001
fixed	NA	foul.diff	-0.169	0.046	-3.699	0.000
fixed	NA	score.diff	0.035	0.006	6.241	0.000
fixed	NA	time	0.005	0.006	0.854	0.393
fixed	NA	offensive	-0.077	0.111	-0.696	0.487
fixed	NA	personal	0.073	0.065	1.114	0.265
fixed	NA	foul.diff:offensive	-0.102	0.054	-1.894	0.058
fixed	NA	foul.diff:personal	-0.056	0.032	-1.753	0.080
fixed	NA	foul.diff:time	-0.009	0.003	-2.764	0.006
ran_pars	game	sd__(Intercept)	0.425	NA	NA	NA
ran_pars	hometeam	sd__(Intercept)	0.278	NA	NA	NA
ran_pars	visitor	sd__(Intercept)	0.208	NA	NA	NA
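Because `foul.diff` interacts with foul type and time, its effect in the full model is not a single number. The sketch below hard-codes the relevant estimates from the table and computes the `foul.diff` slope for a given foul type and time. It assumes shooting fouls are the reference level (the model has indicators only for offensive and personal fouls) and that `time` is on the scale shown in the data; the function name `foul_diff_slope` is illustrative.

```r
# Fixed-effect estimates copied from the full-model output table
b_foul_diff    <- -0.169
b_fd_offensive <- -0.102  # foul.diff:offensive
b_fd_personal  <- -0.056  # foul.diff:personal
b_fd_time      <- -0.009  # foul.diff:time
b_score_diff   <-  0.035

# Slope of foul.diff on the log-odds scale for one foul type and time value.
# Assumption: "shooting" is the reference foul type.
foul_diff_slope <- function(time, type = c("shooting", "offensive", "personal")) {
  type <- match.arg(type)
  b_foul_diff +
    b_fd_offensive * (type == "offensive") +
    b_fd_personal  * (type == "personal") +
    b_fd_time      * time
}

# Odds ratios per one-unit increase in foul.diff at time = 10, by foul type
types <- c("shooting", "offensive", "personal")
or_at_10 <- sapply(types, function(tp) exp(foul_diff_slope(10, tp)))
round(or_at_10, 3)

# Odds ratio per one-point increase in score.diff
round(exp(b_score_diff), 3)
```

Evaluating the slope at a few values of `time` and each foul type gives a compact summary of how the `foul.diff` effect varies, which can support the conclusions drawn from the table.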

Conclusions from full model

Based on the full model, what are your conclusions about the factors that impact the odds of a foul on the home team?

See Section 11.3.1 of Roback and Legler (2021) or full codebook.
