Covariance structure of observations

Prof. Maria Tackett

Mar 18, 2024

Announcements

  • HW 04 due Wed, March 20 at 11:59pm

  • Project 02

    • Draft report due Friday at noon

Topics

  • Define the covariance structure of observations for a given model
  • Understand how the covariance structure of observations differs from the covariance structure of error terms
  • Calculate variance and covariance from model estimates

Notes are based on Section 9.7 of Roback and Legler (2021).

Data: Charter schools in MN

Today’s data set contains standardized test scores and demographic information for schools in Minneapolis, MN from 2008 to 2010. The data were collected by the Minnesota Department of Education. Understanding the effectiveness of charter schools is of particular interest, since they often incorporate unique methods of instruction and learning that differ from those of non-charter public schools.

  • MathAvgScore: Average MCA-II score for all 6th grade students in a school (response variable)
  • urban: urban (1) or rural (0) school location
  • charter: charter school (1) or a non-charter public school (0)
  • schPctfree: proportion of students who receive free or reduced lunches in a school (based on 2010 figures).
  • year08: Years since 2008

Data

| schoolName | year08 | urban | charter | schPctfree | MathAvgScore |
|---|---|---|---|---|---|
| RIPPLESIDE ELEMENTARY | 0 | 0 | 0 | 0.363 | 652.8 |
| RIPPLESIDE ELEMENTARY | 1 | 0 | 0 | 0.363 | 656.6 |
| RIPPLESIDE ELEMENTARY | 2 | 0 | 0 | 0.363 | 652.6 |
| RICHARD ALLEN MATH&SCIENCE ACADEMY | 0 | 1 | 1 | 0.545 | NA |
| RICHARD ALLEN MATH&SCIENCE ACADEMY | 1 | 1 | 1 | 0.545 | NA |
| RICHARD ALLEN MATH&SCIENCE ACADEMY | 2 | 1 | 1 | 0.545 | 631.2 |

Exploratory data analysis

Model

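We will use Model C: Uncontrolled effects for school type (from Section 9.6.1 of Roback and Legler (2021)).

$$
\begin{aligned}
Y_{ij} &= \alpha_0 + \alpha_1 \text{Charter}_i + \beta_0 \text{Year08}_{ij} + \beta_1 \text{Charter}_i \text{Year08}_{ij} + u_i + v_i \text{Year08}_{ij} + \epsilon_{ij} \\
\epsilon_{ij} &\sim N(0, \sigma^2) \\
\begin{bmatrix} u_i \\ v_i \end{bmatrix} &\sim N\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \sigma_u^2 & \sigma_{uv} \\ \sigma_{uv} & \sigma_v^2 \end{bmatrix}\right)
\end{aligned}
$$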

What we’ve done

So far we have discussed…

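  • the covariance structure between error terms at a given level, e.g. the joint distribution of $u_i$ and $v_i$ from a Level Two model:

$$
\begin{bmatrix} u_i \\ v_i \end{bmatrix} \sim N\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \sigma_u^2 & \sigma_{uv} \\ \sigma_{uv} & \sigma_v^2 \end{bmatrix}\right)
$$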

  • how to use the intraclass correlation coefficient to get an idea of the average correlation between observations nested in the same Level Two group (school)

Questions we want to answer

Now we want to be able to answer more specific questions about the covariance (and correlation) structure of observations at different levels.

  • How does the variability in 2008 and 2010 scores from the same school compare?

  • What is the correlation between 2008 and 2009 scores from the same school? What is the correlation between 2009 and 2010 scores? 2008 and 2010?

Covariance structure

The covariance structure of the three time points (2008, 2009, 2010) for School i is

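$$
Cov(Y_i) = \begin{bmatrix}
Var(Y_{i1}) & Cov(Y_{i1}, Y_{i2}) & Cov(Y_{i1}, Y_{i3}) \\
Cov(Y_{i1}, Y_{i2}) & Var(Y_{i2}) & Cov(Y_{i2}, Y_{i3}) \\
Cov(Y_{i1}, Y_{i3}) & Cov(Y_{i2}, Y_{i3}) & Var(Y_{i3})
\end{bmatrix}
$$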


Do you expect the covariances to be positive or negative? Why?

Covariance structure and error terms

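Note that the covariance structure of the observations is not the same as the error structure at Level Two. The first is a 3 × 3 matrix for the three observations from School i; the second is the 2 × 2 covariance matrix of the random effects.

$$
Cov(Y_i) \neq Cov\left(\begin{bmatrix} u_i \\ v_i \end{bmatrix}\right) = \begin{bmatrix} \sigma_u^2 & \sigma_{uv} \\ \sigma_{uv} & \sigma_v^2 \end{bmatrix}
$$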

Calculating variance and covariance

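Suppose $Y_1 = a_1 X_1 + a_2 X_2 + a_3$ and $Y_2 = b_1 X_1 + b_2 X_2 + b_3$, where $X_1$ and $X_2$ are random variables and $a_i$ and $b_i$ are constants for $i = 1, 2, 3$. Then we know from probability theory that

$$
\begin{aligned}
Var(Y_1) &= a_1^2 Var(X_1) + a_2^2 Var(X_2) + 2 a_1 a_2 Cov(X_1, X_2) \\
Cov(Y_1, Y_2) &= a_1 b_1 Var(X_1) + a_2 b_2 Var(X_2) + (a_1 b_2 + a_2 b_1) Cov(X_1, X_2)
\end{aligned}
$$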

Note

This extends beyond two random variables

We will use these properties to define the covariance structure of the observations in the model.
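As a quick numerical check of the two rules above, the sketch below compares the expanded formulas with the equivalent matrix forms $a^\top \Sigma a$ and $a^\top \Sigma b$. The covariance matrix `Sigma` and the constants `a` and `b` are hypothetical values chosen only for illustration; they do not come from the charter schools data.

```r
# Hypothetical covariance matrix of (X1, X2): Var(X1) = 4, Var(X2) = 9, Cov = 1.5
Sigma <- matrix(c(4,   1.5,
                  1.5, 9), nrow = 2, byrow = TRUE)

# Hypothetical constants; a3 and b3 only shift the mean, so they are omitted
a <- c(2, -1)    # a1, a2
b <- c(0.5, 3)   # b1, b2

# Expanded formulas from the slide
var_y1_formula <- a[1]^2 * Sigma[1, 1] + a[2]^2 * Sigma[2, 2] +
  2 * a[1] * a[2] * Sigma[1, 2]
cov_y1y2_formula <- a[1] * b[1] * Sigma[1, 1] + a[2] * b[2] * Sigma[2, 2] +
  (a[1] * b[2] + a[2] * b[1]) * Sigma[1, 2]

# Equivalent matrix forms, which extend to more than two random variables
var_y1_matrix <- drop(t(a) %*% Sigma %*% a)
cov_y1y2_matrix <- drop(t(a) %*% Sigma %*% b)

c(var_y1_formula, var_y1_matrix)      # both 19
c(cov_y1y2_formula, cov_y1y2_matrix)  # both -14.75
```

The matrix form is the one that generalizes most easily when there are more than two random variables.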

Variance and covariance for Model C

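$$
\begin{aligned}
Var(Y_{ij}) &= \sigma_u^2 + t_{ij}^2 \sigma_v^2 + \sigma^2 + 2 t_{ij} \sigma_{uv} \\
Cov(Y_{ij}, Y_{ik}) &= \sigma_u^2 + t_{ij} t_{ik} \sigma_v^2 + (t_{ij} + t_{ik}) \sigma_{uv} \quad \text{for } j \neq k
\end{aligned}
$$

where $t_{ij}$ is the $j^{th}$ time period (the value of Year08) for school $i$.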

Let’s see how these equations were derived.
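A sketch of the derivation, applying the variance and covariance rules for linear combinations. It assumes, as is standard for this model, that $\epsilon_{ij}$ is independent of $(u_i, v_i)$ and that the $\epsilon_{ij}$ are independent across time points. The fixed effects are constants, so they contribute nothing to the variance or covariance.

$$
\begin{aligned}
Var(Y_{ij}) &= Var(u_i + t_{ij} v_i + \epsilon_{ij}) \\
&= Var(u_i) + t_{ij}^2 Var(v_i) + 2 t_{ij} Cov(u_i, v_i) + Var(\epsilon_{ij}) \\
&= \sigma_u^2 + t_{ij}^2 \sigma_v^2 + 2 t_{ij} \sigma_{uv} + \sigma^2 \\[6pt]
Cov(Y_{ij}, Y_{ik}) &= Cov(u_i + t_{ij} v_i + \epsilon_{ij},\; u_i + t_{ik} v_i + \epsilon_{ik}) \\
&= Var(u_i) + t_{ij} t_{ik} Var(v_i) + (t_{ij} + t_{ik}) Cov(u_i, v_i) + Cov(\epsilon_{ij}, \epsilon_{ik}) \\
&= \sigma_u^2 + t_{ij} t_{ik} \sigma_v^2 + (t_{ij} + t_{ik}) \sigma_{uv} \quad \text{for } j \neq k
\end{aligned}
$$

The residual variance $\sigma^2$ appears only in the variance because $Cov(\epsilon_{ij}, \epsilon_{ik}) = 0$ when $j \neq k$.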

Model estimates

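Get the estimates for $\rho_{uv}$, $\sigma$, $\sigma_u$, and $\sigma_v$ from the model output.

library(lme4)         # lmer()
library(broom.mixed)  # tidy() method for mixed models
library(knitr)        # kable()
model <- lmer(MathAvgScore ~ charter + year08 + charter:year08 +
                (year08|schoolid), data = charter)
tidy(model) |> kable(digits = 3)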

| effect | group | term | estimate | std.error | statistic |
|---|---|---|---|---|---|
| fixed | NA | (Intercept) | 652.058 | 0.284 | 2291.998 |
| fixed | NA | charter1 | -6.018 | 0.866 | -6.953 |
| fixed | NA | year08 | 1.197 | 0.094 | 12.698 |
| fixed | NA | charter1:year08 | 0.856 | 0.314 | 2.723 |
| ran_pars | schoolid | sd__(Intercept) | 5.986 | NA | NA |
| ran_pars | schoolid | cor__(Intercept).year08 | 0.880 | NA | NA |
| ran_pars | schoolid | sd__year08 | 0.362 | NA | NA |
| ran_pars | Residual | sd__Observation | 2.964 | NA | NA |

Estimated variances and covariances

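Within-school variance for the 2008 time point ($t_{i1} = 0$)

$$
\widehat{Var}(Y_{i1}) = 5.986^2 + 0^2 \times 0.362^2 + 2.964^2 + 2 \times 0 \times (0.880 \times 5.986 \times 0.362) = 44.617
$$

Within-school covariance between 2008 and 2009 ($t_{i1} = 0$, $t_{i2} = 1$)

$$
\widehat{Cov}(Y_{i1}, Y_{i2}) = 5.986^2 + 0 \times 1 \times 0.362^2 + (0 + 1)(0.880 \times 5.986 \times 0.362) = 37.739
$$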

Estimated covariance structure

$$
\widehat{Cov}(Y_i) = \begin{bmatrix}
44.62 & 37.74 & 39.65 \\
37.74 & 48.56 & 41.81 \\
39.65 & 41.81 & 52.77
\end{bmatrix}
$$

Correlation between observations

$$
Corr(Y_1, Y_2) = \frac{Cov(Y_1, Y_2)}{\sqrt{Var(Y_1) Var(Y_2)}}
$$

$$
\widehat{Corr}(Y_{i1}, Y_{i2}) = \frac{37.74}{\sqrt{44.62 \times 48.56}} = 0.811
$$

Write the within-school correlation matrix.
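The hand calculations above can be reproduced in a few lines of base R. This is a minimal sketch that types in the four estimates from the model output rather than extracting them from the fitted object; the names `sd_u`, `sd_v`, `rho_uv`, `sd_e`, `G`, and `Z` are introduced here for illustration. It uses the matrix form $Z G Z^\top + \sigma^2 I$, which yields the same entries as the variance and covariance formulas for Model C.

```r
# Estimates copied from the model output table
sd_u   <- 5.986   # sd__(Intercept)
sd_v   <- 0.362   # sd__year08
rho_uv <- 0.880   # cor__(Intercept).year08
sd_e   <- 2.964   # residual sd

# Covariance matrix of the random effects (u_i, v_i)
sigma_uv <- rho_uv * sd_u * sd_v
G <- matrix(c(sd_u^2,   sigma_uv,
              sigma_uv, sd_v^2), nrow = 2, byrow = TRUE)

# Design matrix for the random effects: intercept and year08 = 0, 1, 2
times <- c(0, 1, 2)
Z <- cbind(1, times)

# Covariance of the three observations: Z G Z' plus residual variance on the diagonal
cov_y <- Z %*% G %*% t(Z) + sd_e^2 * diag(length(times))
dimnames(cov_y) <- list(c("2008", "2009", "2010"), c("2008", "2009", "2010"))

# Convert the covariance matrix to a correlation matrix
corr_y <- cov2cor(cov_y)

round(cov_y, 2)
round(corr_y, 3)
```

`cov2cor()` divides each covariance by the product of the two corresponding standard deviations, which is the correlation formula applied to every pair of time points at once.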

Notes on covariance and correlation matrices

  • Often observe higher correlation between observations that are closer in time.

    • Is this the case in the MN schools data?
  • Often observe similar variability in all time points.

    • Is this the case in the MN schools data?
  • Two-level model structure is very flexible. Note that the time points do not need to be evenly spaced nor does each school have to have the same number of measurements.

  • These concepts apply to all multilevel models, not just those for longitudinal data.

Other multilevel data

Recall the data from Sadler and Miller (2010) on musicians and performance anxiety and the model

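$$
\begin{aligned}
Y_{ij} &= (\alpha_0 + \alpha_1 \text{Orchestra}_i + \beta_0 \text{LargeEnsemble}_{ij} + \beta_1 \text{Orchestra}_i : \text{LargeEnsemble}_{ij}) + (u_i + v_i \text{LargeEnsemble}_{ij} + \epsilon_{ij}) \\
\epsilon_{ij} &\sim N(0, \sigma^2) \\
\begin{bmatrix} u_i \\ v_i \end{bmatrix} &\sim N\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \sigma_u^2 & \sigma_{uv} \\ \sigma_{uv} & \sigma_v^2 \end{bmatrix}\right)
\end{aligned}
$$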

  1. Write the equation for Var(Yij).
  2. Write the equation for Cov(Yij,Yik).

Other multilevel data

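$$
Var(Y_{ij}) = \begin{cases}
\sigma^2 + \sigma_u^2 & \text{if } \text{LargeEnsemble}_{ij} = 0 \\
\sigma^2 + \sigma_u^2 + \sigma_v^2 + 2\sigma_{uv} & \text{if } \text{LargeEnsemble}_{ij} = 1
\end{cases}
$$

$$
Cov(Y_{ij}, Y_{ik}) = \begin{cases}
\sigma_u^2 & \text{if } \text{LargeEnsemble}_{ij} = \text{LargeEnsemble}_{ik} = 0 \\
\sigma_u^2 + \sigma_{uv} & \text{if } \text{LargeEnsemble}_{ij} = 0,\ \text{LargeEnsemble}_{ik} = 1 \text{ or vice versa} \\
\sigma_u^2 + \sigma_v^2 + 2\sigma_{uv} & \text{if } \text{LargeEnsemble}_{ij} = \text{LargeEnsemble}_{ik} = 1
\end{cases}
$$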

Note

Every musician will have a unique covariance matrix depending on the number of performances and whether each performance is with a large or small ensemble.
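To see how the covariance matrix changes from musician to musician, the sketch below builds it from a vector of large-ensemble indicators. The function name `musician_cov` and all variance component values are hypothetical, chosen only for illustration; they are not estimates from the Sadler and Miller (2010) data.

```r
# Covariance matrix of one musician's observations.
# large: 0/1 vector, one entry per performance (1 = large ensemble)
musician_cov <- function(large, sigma_u2, sigma_v2, sigma_uv, sigma2) {
  Z <- cbind(1, large)                      # random intercept and random slope
  G <- matrix(c(sigma_u2, sigma_uv,
                sigma_uv, sigma_v2), nrow = 2, byrow = TRUE)
  Z %*% G %*% t(Z) + sigma2 * diag(length(large))
}

# Hypothetical musician with one small and two large ensemble performances
example_cov <- musician_cov(large = c(0, 1, 1),
                            sigma_u2 = 5, sigma_v2 = 1,
                            sigma_uv = -0.5, sigma2 = 20)
example_cov
```

In this illustration the off-diagonal entries are 4.5 for a small and large pair ($\sigma_u^2 + \sigma_{uv}$) and 5 for the two large ensemble performances ($\sigma_u^2 + \sigma_v^2 + 2\sigma_{uv}$), matching the piecewise formulas. A musician with a different number or mix of performances gets a matrix of a different size and pattern.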

Alternative covariance structures

The standard covariance structure calculated from the multilevel model is useful in most situations. Sometimes, however, there may be a different covariance structure that better fits the data. A few alternatives are

Unstructured: Every variance and covariance term for observations within each Level Two group is a separate parameter and is uniquely estimated. No patterns among variances or correlations are assumed. Very flexible but requires the estimation of many parameters.

Compound Symmetry: Assume variance is constant across all Level One observations and correlation is constant across all pairs of Level One observations. Restrictive but few parameters to estimate.

Alternative covariance structures

Autoregressive: Assume constant variance across all time points, but correlation reduces in a systematic way such that closer time points are more correlated than those further apart.

Toeplitz: Similar to autoregressive but there is no imposed structure on the decreased correlation for time periods further apart.

Heterogeneous variances: Allows for unequal variances across time points. Requires additional parameters to be estimated to allow for the unequal variances.
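The patterns these structures impose are easier to see as matrices. The sketch below builds an example of each for three time points using base R. All correlation and standard deviation values are hypothetical, chosen only to show the shape of each structure.

```r
times <- 0:2
n <- length(times)

# Compound symmetry: one common correlation for every pair of time points
corr_cs <- matrix(0.8, n, n)
diag(corr_cs) <- 1

# Autoregressive, AR(1): correlation is rho^(lag), so it decays with distance
rho <- 0.8
corr_ar1 <- rho^abs(outer(times, times, "-"))

# Toeplitz: one free correlation per lag, with no imposed decay pattern
corr_toep <- toeplitz(c(1, 0.8, 0.7))

# Heterogeneous variances: a correlation pattern scaled by unequal SDs
sds <- c(6.5, 7.0, 7.3)
cov_het_ar1 <- diag(sds) %*% corr_ar1 %*% diag(sds)

corr_cs
corr_ar1
corr_toep
round(cov_het_ar1, 2)
```

An unstructured matrix would instead treat all three variances and all three covariances as separate parameters.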

Trying different covariance structures

  • There is generally little difference in estimates of fixed effects, and the impact on standard errors tends to be minimal.

  • If the primary analysis objective is inference and conclusions for fixed effects, it is often not worth spending too much time modeling different covariance structures.

  • If the random effects and estimated variance components are also of primary interest, then the covariance structure can make a difference, and it is worth modeling different covariance structures.

Tip

See “Fitting Linear Mixed Models in R” for details on R packages and code for multilevel models with a predetermined covariance structure.
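As one illustration of specifying a covariance structure directly, the sketch below uses `gls()` from the nlme package with `corAR1()` and `corCompSymm()`. It makes several assumptions: the data are simulated rather than the charter schools data, the true values (`rho_true`, `sigma_true`, the intercept and slope) are hypothetical, and `gls()` fits a marginal model with no random effects, so it is a simplification rather than a refit of the lecture's `lmer()` model.

```r
library(nlme)

set.seed(310)
n_schools  <- 60
times      <- 0:2
rho_true   <- 0.6   # hypothetical AR(1) correlation
sigma_true <- 5     # hypothetical residual SD

# Lower-triangular factor L with L L' equal to the AR(1) covariance matrix
ar1_cov <- sigma_true^2 * rho_true^abs(outer(times, times, "-"))
chol_factor <- t(chol(ar1_cov))

# Simulate three correlated scores per school
sim <- do.call(rbind, lapply(seq_len(n_schools), function(i) {
  data.frame(
    schoolid = i,
    year08 = times,
    score = 650 + 1.2 * times + as.vector(chol_factor %*% rnorm(length(times)))
  )
}))
sim$schoolid <- factor(sim$schoolid)

# Same mean model, two different within-school correlation structures
fit_ar1 <- gls(score ~ year08, data = sim,
               correlation = corAR1(form = ~ 1 | schoolid))
fit_cs <- gls(score ~ year08, data = sim,
              correlation = corCompSymm(form = ~ 1 | schoolid))

# Estimated AR(1) correlation on its natural scale
rho_hat <- coef(fit_ar1$modelStruct$corStruct, unconstrained = FALSE)
rho_hat

# Information criteria for comparing the two structures
AIC(fit_ar1, fit_cs)
```

Both fits use the same fixed effects, so comparing them with AIC isolates the effect of the correlation structure. With `form = ~ 1 | schoolid`, the order of rows within each school is taken as the time order.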

References

Roback, Paul, and Julie Legler. 2021. Beyond multiple linear regression: applied generalized linear models and multilevel models in R. CRC Press.
Sadler, Michael E, and Christopher J Miller. 2010. “Performance Anxiety: A Longitudinal Study of the Roles of Personality and Experience in Musicians.” Social Psychological and Personality Science 1 (3): 280–87.

🔗 STA 310 - Spring 2024
