Define the covariance structure of observations for a given model
Understand how the covariance structure of observations differs from the covariance structure of error terms
Calculate variance and covariance from model estimates
Data: Charter schools in MN
Today’s data set contains standardized test scores and demographic information for schools in Minneapolis, MN from 2008 to 2010. The data were collected by the Minnesota Department of Education. Understanding the effectiveness of charter schools is of particular interest, since they often incorporate unique methods of instruction and learning that differ from those of non-charter public schools.
MathAvgScore: Average MCA-II score for all 6th grade students in a school (response variable)
urban: urban (1) or rural (0) school location
charter: charter school (1) or a non-charter public school (0)
schPctfree: proportion of students who receive free or reduced lunches in a school (based on 2010 figures).
year08: Years since 2008
Data
| schoolName | year08 | urban | charter | schPctfree | MathAvgScore |
|---|---|---|---|---|---|
| RIPPLESIDE ELEMENTARY | 0 | 0 | 0 | 0.363 | 652.8 |
| RIPPLESIDE ELEMENTARY | 1 | 0 | 0 | 0.363 | 656.6 |
| RIPPLESIDE ELEMENTARY | 2 | 0 | 0 | 0.363 | 652.6 |
| RICHARD ALLEN MATH&SCIENCE ACADEMY | 0 | 1 | 1 | 0.545 | NA |
| RICHARD ALLEN MATH&SCIENCE ACADEMY | 1 | 1 | 1 | 0.545 | NA |
| RICHARD ALLEN MATH&SCIENCE ACADEMY | 2 | 1 | 1 | 0.545 | 631.2 |
Exploratory data analysis
Model
We will use Model C1: Uncontrolled effects for school type.
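For reference, here is a sketch of the model, assuming Model C1 takes the form of the "uncontrolled effects of school type" model in Roback and Legler (2021): a random intercept and a random slope for year08 within each school, with charter as the only Level Two predictor. \(Y_{ij}\) is MathAvgScore for School \(i\) at time \(j\).

\[
\begin{aligned}
\text{Level One:}\quad & Y_{ij} = a_i + b_i\,\text{year08}_{ij} + \epsilon_{ij} \\
\text{Level Two:}\quad & a_i = \alpha_0 + \alpha_1\,\text{charter}_i + u_i \\
& b_i = \beta_0 + \beta_1\,\text{charter}_i + v_i
\end{aligned}
\]

Here \(\epsilon_{ij} \sim N(0, \sigma^2)\) independently, and \((u_i, v_i)\) is bivariate normal with mean zero, variances \(\sigma_u^2\) and \(\sigma_v^2\), and covariance \(\sigma_{uv} = \rho_{uv}\sigma_u\sigma_v\), independent of the \(\epsilon_{ij}\). Substituting the Level Two equations into Level One gives the composite model, with a fixed part and a random part:

\[
Y_{ij} = [\alpha_0 + \alpha_1\,\text{charter}_i + \beta_0\,\text{year08}_{ij} + \beta_1\,\text{charter}_i\,\text{year08}_{ij}] + [u_i + v_i\,\text{year08}_{ij} + \epsilon_{ij}]
\]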
Previously, we saw how to use the intraclass correlation coefficient (ICC) to get an idea of the average correlation between observations nested in the same Level Two group (school).
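A minimal sketch of that calculation follows. It assumes a random-intercepts-only (unconditional means) model, where \(\sigma_u^2\) is the between-school variance and \(\sigma^2\) is the within-school error variance. The function name and the numbers are hypothetical placeholders, not estimates from the MN schools data.

```python
def intraclass_correlation(sigma2_u: float, sigma2: float) -> float:
    """ICC for a random-intercepts-only (unconditional means) model.

    sigma2_u: between-school variance of the random intercepts u_i
    sigma2:   within-school (Level One) error variance
    """
    # Share of the total variance that comes from differences between schools.
    return sigma2_u / (sigma2_u + sigma2)


# Hypothetical variance components (NOT estimates from the MN data).
icc = intraclass_correlation(sigma2_u=40.0, sigma2=10.0)
print(f"ICC = {icc:.2f}")  # prints ICC = 0.80
```

Under this model the same correlation applies to every pair of years from the same school. Models with random slopes relax that assumption.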
Questions we want to answer
Now we want to be able to answer more specific questions about the covariance (and correlation) structure of observations at different levels.
How does the variability in 2008 and 2010 scores from the same school compare?
What is the correlation between 2008 and 2009 scores from the same school? What is the correlation between 2009 and 2010 scores? 2008 and 2010?
Covariance structure
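The covariance structure of the three time points (2008, 2009, 2010) for School \(i\) is
\[
\operatorname{Cov}(\mathbf{Y}_i) =
\begin{bmatrix}
\operatorname{Var}(Y_{i1}) & \operatorname{Cov}(Y_{i1}, Y_{i2}) & \operatorname{Cov}(Y_{i1}, Y_{i3}) \\
\operatorname{Cov}(Y_{i1}, Y_{i2}) & \operatorname{Var}(Y_{i2}) & \operatorname{Cov}(Y_{i2}, Y_{i3}) \\
\operatorname{Cov}(Y_{i1}, Y_{i3}) & \operatorname{Cov}(Y_{i2}, Y_{i3}) & \operatorname{Var}(Y_{i3})
\end{bmatrix}
\]
where \(Y_{i1}\), \(Y_{i2}\), and \(Y_{i3}\) are the 2008, 2009, and 2010 scores for School \(i\).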
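Suppose \(Y_1 = a_1 X_1 + a_2 X_2 + a_3\) and \(Y_2 = b_1 X_1 + b_2 X_2 + b_3\), where \(X_1\) and \(X_2\) are random variables and \(a_i\) and \(b_i\) are constants for \(i = 1, 2, 3\). Then we know from probability theory that
\[
\begin{aligned}
\operatorname{Var}(Y_1) &= a_1^2 \operatorname{Var}(X_1) + a_2^2 \operatorname{Var}(X_2) + 2 a_1 a_2 \operatorname{Cov}(X_1, X_2) \\
\operatorname{Cov}(Y_1, Y_2) &= a_1 b_1 \operatorname{Var}(X_1) + a_2 b_2 \operatorname{Var}(X_2) + (a_1 b_2 + a_2 b_1) \operatorname{Cov}(X_1, X_2)
\end{aligned}
\]
The constants \(a_3\) and \(b_3\) do not affect either quantity.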
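The following sketch shows how these rules produce the entries of the covariance matrix of School \(i\)'s scores. It assumes Model C1 has the composite form of Roback and Legler's (2021) Model C. The random part of that model is \(u_i + v_i t_{ij} + \epsilon_{ij}\), where \(t_{ij} = \text{year08}_{ij} \in \{0, 1, 2\}\), \((u_i, v_i)\) has variances \(\sigma_u^2\), \(\sigma_v^2\) and covariance \(\sigma_{uv}\), and \(\epsilon_{ij}\) has variance \(\sigma^2\).

Take \(X_1 = u_i\), \(X_2 = v_i\), \(a_1 = b_1 = 1\), \(a_2 = t_{ij}\), and \(b_2 = t_{ik}\). The fixed part plays the role of the constants \(a_3\) and \(b_3\). The Level One error is independent of \((u_i, v_i)\) and across years, so it adds \(\sigma^2\) to each variance and nothing to covariances between different years:

\[
\begin{aligned}
\operatorname{Var}(Y_{ij}) &= \sigma_u^2 + t_{ij}^2\,\sigma_v^2 + 2t_{ij}\,\sigma_{uv} + \sigma^2 \\
\operatorname{Cov}(Y_{ij}, Y_{ik}) &= \sigma_u^2 + t_{ij}t_{ik}\,\sigma_v^2 + (t_{ij} + t_{ik})\,\sigma_{uv}, \qquad j \neq k
\end{aligned}
\]

For example, the 2008 variance (\(t = 0\)) is \(\sigma_u^2 + \sigma^2\), while the 2010 variance (\(t = 2\)) is \(\sigma_u^2 + 4\sigma_v^2 + 4\sigma_{uv} + \sigma^2\). The two are equal only when \(\sigma_v^2 + \sigma_{uv} = 0\). The 2008 to 2010 covariance is \(\sigma_u^2 + 2\sigma_{uv}\).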
We often observe higher correlation between observations that are closer in time.
Is this the case in the MN schools data?
We often observe similar variability at all time points.
Is this the case in the MN schools data?
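To check both patterns numerically, the sketch below builds \(\operatorname{Cov}(\mathbf{Y}_i) = Z D Z^\top + \sigma^2 I\) and converts it to a correlation matrix. Each row of \(Z\) is \((1, \text{year08})\), and \(D\) is the covariance matrix of \((u_i, v_i)\). The sketch assumes Model C1 has a random intercept and a random slope on year08. The function names and the variance-component values are hypothetical placeholders, not estimates from the MN schools data. Substituting the fitted values from the actual model would answer the questions for these data.

```python
import numpy as np


def school_covariance(times, sigma2_u, sigma2_v, sigma_uv, sigma2):
    """Covariance matrix of one school's scores (random intercept + slope model).

    Entry (j, k) equals
        sigma2_u + t_j * t_k * sigma2_v + (t_j + t_k) * sigma_uv + sigma2 * [j == k],
    which is exactly Z D Z^T + sigma2 * I with rows of Z equal to (1, t_j).
    """
    t = np.asarray(times, dtype=float)
    Z = np.column_stack([np.ones_like(t), t])   # loadings on (u_i, v_i)
    D = np.array([[sigma2_u, sigma_uv],
                  [sigma_uv, sigma2_v]])        # Cov(u_i, v_i)
    return Z @ D @ Z.T + sigma2 * np.eye(len(t))


def to_correlation(cov):
    """Divide each covariance by the product of the two standard deviations."""
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)


# year08 = 0, 1, 2 for 2008, 2009, 2010; variance components are hypothetical.
cov = school_covariance([0, 1, 2], sigma2_u=35.0, sigma2_v=0.5,
                        sigma_uv=-1.0, sigma2=9.0)
corr = to_correlation(cov)

print("Variances by year:", np.diag(cov))
print("Corr(2008, 2009):", round(corr[0, 1], 3))
print("Corr(2009, 2010):", round(corr[1, 2], 3))
print("Corr(2008, 2010):", round(corr[0, 2], 3))
```

With these particular placeholder values, the adjacent-year correlations come out larger than the 2008 to 2010 correlation, and the variances differ slightly across years. Other values of \(\sigma_v^2\) and \(\sigma_{uv}\) can reverse either pattern.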
The two-level model structure is very flexible: the time points do not need to be evenly spaced, and each school does not need to have the same number of measurements.
These concepts apply to all multilevel models, not just those for longitudinal data.
Other multilevel data
Recall the data from Sadler and Miller (2010) on musicians and performance anxiety, along with the multilevel model we fit to those data.
Each musician will have their own covariance matrix. It depends on the number of performances and on whether each performance was in a large or small ensemble.
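The same \(Z D Z^\top + \sigma^2 I\) construction shows why each musician's matrix differs. The sketch below assumes a model with a random intercept and a random slope for a large-ensemble indicator (1 = large ensemble, 0 = small ensemble), so each row of \(Z\) is \((1, \text{indicator})\). The function name, the performance sequences, and the parameter values are all hypothetical.

```python
import numpy as np


def musician_covariance(large_ens, sigma2_u, sigma2_v, sigma_uv, sigma2):
    """Covariance matrix for one musician's performances.

    large_ens: sequence of 0/1 indicators, one per performance (1 = large ensemble).
    The matrix size equals the number of performances, and each entry depends
    only on the ensemble types of the two performances involved.
    """
    x = np.asarray(large_ens, dtype=float)
    Z = np.column_stack([np.ones_like(x), x])   # loadings on (u_i, v_i)
    D = np.array([[sigma2_u, sigma_uv],
                  [sigma_uv, sigma2_v]])        # Cov(u_i, v_i)
    return Z @ D @ Z.T + sigma2 * np.eye(len(x))


# Hypothetical variance components shared by all musicians.
params = dict(sigma2_u=5.0, sigma2_v=0.8, sigma_uv=-1.5, sigma2=22.0)

musician_a = musician_covariance([0, 1, 1], **params)        # 3 performances
musician_b = musician_covariance([1, 0, 0, 1, 0], **params)  # 5 performances
print(musician_a.shape, musician_b.shape)
print(musician_a)
```

Under this assumed model, two small-ensemble performances have covariance \(\sigma_u^2\). A small and a large one have \(\sigma_u^2 + \sigma_{uv}\), and two large ones have \(\sigma_u^2 + \sigma_v^2 + 2\sigma_{uv}\). Different sequences of performances therefore give different matrices.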
Alternative covariance structures
The standard covariance structure calculated from the multilevel model is useful in most situations. Sometimes, however, a different covariance structure may fit the data better. A few alternatives are:
Unstructured: Every variance and covariance term for observations within each Level Two unit (e.g., school) is a separate parameter and is estimated uniquely. No patterns among variances or correlations are assumed. Very flexible, but requires the estimation of many parameters.
Compound Symmetry: Assume variance is constant across all Level One observations and correlation is constant across all pairs of Level One observations. Restrictive, but few parameters to estimate.
Autoregressive: Assume constant variance across all time points, but correlation decreases in a systematic way such that closer time points are more correlated than those further apart.
Toeplitz: Similar to autoregressive, but with no imposed structure on how correlation decreases for time points further apart; each time gap gets its own correlation parameter.
Heterogeneous variances: Allows for unequal variances across time points. Requires additional parameters to be estimated to allow for the unequal variances.
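The sketch below makes these alternatives concrete by building each structure for equally spaced time points with NumPy. The function names and example values are illustrative assumptions. Mixed-model software parameterizes these structures in its own way. Heterogeneous variances are shown combined with an AR(1) correlation pattern, which is one common pairing.

```python
import numpy as np


def lag_matrix(n):
    """|j - k| for every pair of time points 0..n-1."""
    idx = np.arange(n)
    return np.abs(np.subtract.outer(idx, idx))


def compound_symmetry(n, sigma2, rho):
    """Constant variance; every pair shares correlation rho (2 parameters)."""
    return sigma2 * ((1 - rho) * np.eye(n) + rho * np.ones((n, n)))


def ar1(n, sigma2, rho):
    """Constant variance; correlation rho**|j - k| shrinks with the gap (2 parameters)."""
    return sigma2 * rho ** lag_matrix(n)


def toeplitz(sigma2, lag_corrs):
    """Constant variance; a separate correlation for each lag (n parameters).

    lag_corrs[k - 1] is the correlation between observations k steps apart.
    """
    r = np.concatenate(([1.0], lag_corrs))
    return sigma2 * r[lag_matrix(len(r))]


def heterogeneous_ar1(sds, rho):
    """AR(1) correlations with a separate SD per time point (n + 1 parameters)."""
    sds = np.asarray(sds, dtype=float)
    return np.outer(sds, sds) * ar1(len(sds), 1.0, rho)


def n_params_unstructured(n):
    """Unstructured: every variance and covariance is its own parameter."""
    return n * (n + 1) // 2


# Illustrative values for three time points (e.g., 2008, 2009, 2010).
cs = compound_symmetry(3, sigma2=2.0, rho=0.5)
ar = ar1(3, sigma2=2.0, rho=0.5)
tp = toeplitz(2.0, [0.6, 0.2])
het = heterogeneous_ar1([1.0, 2.0, 3.0], rho=0.5)

for name, m in [("Compound symmetry", cs), ("AR(1)", ar),
                ("Toeplitz", tp), ("Heterogeneous AR(1)", het)]:
    print(name, m, sep="\n")
print("Unstructured parameters for 3 time points:", n_params_unstructured(3))
```

The parameter counts show the trade-off described above. With \(n\) time points, unstructured needs \(n(n+1)/2\) parameters, while compound symmetry and AR(1) need only two.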
Trying different covariance structures
There is generally little difference in estimates of fixed effects, and the impact on standard errors tends to be minimal.
If the primary analysis objective is inference and conclusions for fixed effects, it is often not worth spending too much time modeling different covariance structures.
If the analysis is also focused on the random effects and the estimated variance components, the covariance structure can make a difference, and it is worth trying different covariance structures.
Tip
See “Fitting Linear Mixed Models in R” for details on R packages and code for multilevel models with a predetermined covariance structure.
References
Roback, Paul, and Julie Legler. 2021. Beyond Multiple Linear Regression: Applied Generalized Linear Models and Multilevel Models in R. CRC Press.
Sadler, Michael E., and Christopher J. Miller. 2010. “Performance Anxiety: A Longitudinal Study of the Roles of Personality and Experience in Musicians.” Social Psychological and Personality Science 1 (3): 280–87.