HW 03: Logistic regression + Multilevel models
- This assignment is due on Wednesday, February 28 at 11:59pm, with a grace period (i.e., no late penalty) until Thursday, February 29 at noon (12pm).
- Your access to the repo will be removed at the end of the grace period. If you wish to submit the HW late, please email me and I will extend your access to the repo.
- You will have access to your HW repo again when grades are returned.
Instructions
- Write all narrative using full sentences. Write all interpretations and conclusions in the context of the data.
- Be sure all analysis code is displayed in the rendered pdf.
- If you are fitting a model, display the model output in a neatly formatted table. (The `tidy` and `kable` functions can help!)
- If you are creating a plot, use clear and informative labels and titles.
- Render, commit, and push your work to GitHub regularly, at least after each exercise. Write short and informative commit messages.
- When you’re done, we should be able to render the final version of the Quarto document in your GitHub repo to fully reproduce your pdf.
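As a minimal sketch of the table-formatting instruction above, the following fits a model to the built-in `mtcars` data (not the homework data) and displays its output with `tidy` and `kable`. It assumes the `broom` and `knitr` packages are installed; the model itself is purely illustrative.

```r
# Illustrative only: mtcars is a built-in R data set, not the homework data.
library(broom)  # provides tidy()
library(knitr)  # provides kable()

fit <- lm(mpg ~ wt + hp, data = mtcars)

# tidy() returns one row per coefficient; kable() renders it as a table
# in the rendered document.
tidy(fit, conf.int = TRUE) |>
  kable(digits = 3)
```

The same two calls should work for most model objects that `broom` supports, so the pattern can be reused across exercises.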
Exercises
Exercise 1
The data for this exercise come from a study about the satisfaction with housing conditions in Copenhagen. The data were obtained from Dobson and Barnett (2018) and are originally from Madsen (1976). Residents in selected areas living in rented homes built between 1960 and 1968 were asked about their satisfaction and the level of contact with other residents.
The data are in `madsen1971.csv` in the `data` folder of your GitHub repo. The variables in the data are

- `satisfaction`: Overall satisfaction with housing conditions (Low, Medium, High)
- `contact`: Level of contact with other residents (Low, High)
- `type`: Type of housing unit (Tower block, Apartment, House)

The goal of the analysis is to use `type` and `contact` to understand `satisfaction`.
(a) Visualize the relationship of the response versus each predictor variable. Write 1 - 2 observations from the visualizations.
(b) Do you think it would be appropriate to fit an ordinal model for this analysis? Briefly explain.
(c) Fit the proportional odds model. Display the model.
(d) Calculate the following probabilities for a resident living in a house who has high contact with other residents:
- \(P(\text{Satisfaction} \leq \text{Medium})\)
- \(P(\text{Satisfaction} = \text{Medium})\)
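The sketch below illustrates how cumulative and single-category probabilities relate in a proportional odds model. It assumes the parameterization used by `MASS::polr()`, in which logit P(Y ≤ j) = ζ_j − η. The cutpoints `zeta` and linear predictor `eta` are hypothetical numbers chosen for illustration, not estimates from the housing data.

```r
# Hypothetical values for illustration; NOT estimates from the housing data.
zeta <- c("Low|Medium" = -0.5, "Medium|High" = 0.7)  # cutpoints (intercepts)
eta  <- 0.4                                          # linear predictor for one resident

# Under the MASS::polr() parameterization, logit P(Y <= j) = zeta_j - eta.
p_le_low    <- plogis(zeta[["Low|Medium"]] - eta)
p_le_medium <- plogis(zeta[["Medium|High"]] - eta)

# A single category's probability is the difference of adjacent
# cumulative probabilities.
p_medium <- p_le_medium - p_le_low
p_high   <- 1 - p_le_medium

c(low = p_le_low, medium = p_medium, high = p_high)
```

If the model is fit with `polr()`, `predict(fit, newdata, type = "probs")` returns the category probabilities directly, which is a useful check on a hand calculation.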
Exercise 2
Ibanez and Roussel (2022) conducted an experiment to understand the impact of watching a nature documentary on pro-environmental behavior. The researchers randomly assigned the 113 participants to watch a video about architecture in NYC (control) or a video about Yellowstone National Park (treatment). As part of the experiment, participants played a game in which they had an opportunity to donate to an environmental organization.
The data set is available in `nature-experiment.csv` in the `data` folder. We will use the following variables:

- `donation_binary`:
  - 1 - participant donated to environmental organization
  - 0 - participant did not donate
- `Age`: Age in years
- `Gender`: Participant’s reported gender
- `Treatment`:
  - “Urban (T1)” - the control group
  - “Nature (T2)” - the treatment group
- `NEP_high`:
  - 1 - score of 4 or higher on the New Ecological Paradigm (NEP)
  - 0 - score less than 4
See the Introduction and Methods sections of Ibanez and Roussel (2022) for more detail about the variables.
The paper is available on Canvas.
(a) Figure 2 on pg. 9 of the article visualizes the relationship between donation amount and treatment. Use the visualization to describe the relationship between donating and the treatment.
(b) Fit a probit regression model using `age`, `gender`, `treatment`, `nep_high`, and the interaction between `nep_high` and `treatment` to predict the likelihood of donating. (Note: Your model will be similar to, but not exactly the same as, the “Likelihood” model in Table 5 on pg. 11.) Display the model. Describe the effect of watching the documentary on the likelihood of donating.
(c) Based on the model, what is the predicted probability of donating for a 20-year-old female in the treatment group with a NEP score of 3?
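The following sketch shows how a predicted probability is obtained from a probit model: the linear predictor is passed through the standard normal CDF. The data are simulated and the names `x`, `group`, and `y` are hypothetical; nothing here uses the experiment data or reproduces the model in the paper.

```r
# Simulated data for illustration only; names and values are hypothetical.
set.seed(1)
n <- 200
sim <- data.frame(
  x     = rnorm(n, mean = 30, sd = 8),
  group = factor(sample(c("control", "treatment"), n, replace = TRUE))
)
sim$y <- rbinom(n, size = 1,
                prob = pnorm(-1 + 0.02 * sim$x + 0.5 * (sim$group == "treatment")))

fit <- glm(y ~ x + group, data = sim, family = binomial(link = "probit"))

new_obs <- data.frame(x = 25,
                      group = factor("treatment", levels = levels(sim$group)))

# The probit link maps the linear predictor to a probability through the
# standard normal CDF, pnorm().
eta    <- predict(fit, newdata = new_obs, type = "link")
p_hand <- pnorm(eta)
p_pred <- predict(fit, newdata = new_obs, type = "response")

all.equal(unname(p_hand), unname(p_pred))
```

Computing the probability both by hand and with `type = "response"` is a simple way to confirm that the new observation was coded the way the model expects.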
Exercise 3
Brown and Uyar (2004) describe “A Hierarchical Linear Model Approach for Assessing the Effects of House and Neighborhood Characteristics on Housing Prices”.
(a) Give the observational units at Level One and Level Two based on the title of the paper.
(b) Why can’t we assume all houses in the data set are independent?
(c) Suppose we have the following set of predictors: square footage (`sqft`), rating of neighborhood schools (`rating`), and median neighborhood housing price (`medprice`). Write the two-level model for predicting housing prices. Write the full model such that (1) the Level Two predictors are used to estimate the intercept and slope for each Level One predictor, and (2) there are random effects for the slopes and intercepts.
(d) Write the composite model corresponding to part (c).
(e) List the fixed effects that must be estimated in the model in part (d).
(f) Describe the variance components that must be estimated in the model in part (d).
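As a reference for notation, here is a generic two-level template with a single Level One predictor \(x_{ij}\) and a single Level Two predictor \(w_i\), for observation \(j\) within group \(i\). All symbols are illustrative assumptions and are not tied to the housing predictors; the exercise model will have more terms.

```latex
% Generic template: observation j within group i. All symbols are illustrative.
\begin{align*}
\text{Level One:}\quad & Y_{ij} = a_i + b_i x_{ij} + \epsilon_{ij},
    \qquad \epsilon_{ij} \sim N(0, \sigma^2) \\
\text{Level Two:}\quad & a_i = \alpha_0 + \alpha_1 w_i + u_i \\
                       & b_i = \beta_0 + \beta_1 w_i + v_i \\
& \begin{pmatrix} u_i \\ v_i \end{pmatrix} \sim
  N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},
  \begin{pmatrix} \sigma_u^2 & \rho_{uv}\sigma_u\sigma_v \\
                  \rho_{uv}\sigma_u\sigma_v & \sigma_v^2 \end{pmatrix} \right) \\
\text{Composite:}\quad & Y_{ij} = \alpha_0 + \alpha_1 w_i + \beta_0 x_{ij}
    + \beta_1 w_i x_{ij} + u_i + v_i x_{ij} + \epsilon_{ij}
\end{align*}
```

The composite line is obtained by substituting the Level Two equations for \(a_i\) and \(b_i\) into the Level One equation and collecting the fixed terms separately from the random terms.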
Exercise 4
(a) Try to reproduce Model 2 presented in Table 2 of Sadler and Miller (2010). You can expect small differences in the parameter estimates, since the authors use SAS (with an unstructured covariance structure) instead of R. The data and codebook are available in `music-data.csv` in the `data` folder.
(b) How do the parameter estimates, standard errors, and AIC compare between your model in part (a) and the results in Table 2 of the paper?
Sadler and Miller (2010) is available on Canvas.
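The sketch below shows one way to pull fixed-effect estimates, standard errors, variance components, and AIC from a multilevel model in R. It assumes the `lme4` package is installed and uses its bundled `sleepstudy` data, not the music data, and the formula is not Model 2 from the paper. Whether the paper's AIC comes from ML or REML estimation is something to check in the article; the sketch fits both.

```r
# Illustrative only: sleepstudy ships with lme4 and is not the music data.
library(lme4)

# Random intercepts and random slopes for Days within Subject. Putting both
# in one (Days | Subject) term also estimates their covariance.
fit_reml <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

# Fixed-effect estimates with their standard errors
coef(summary(fit_reml))

# Estimated variance components (standard deviations and correlation)
VarCorr(fit_reml)

# Same model fit by maximum likelihood, so AIC uses the ML log-likelihood
fit_ml <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy,
               REML = FALSE)
AIC(fit_ml)
```

Differences in estimation method (REML versus ML) and in optimizer defaults are common reasons for small discrepancies between software packages.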
Exercise 5
The goal of this analysis is to build a model that can be used to describe the factors that explain the success of NBA teams. The data from Kaggle (2018) include various statistics and winning percentages for NBA teams in the 2017-18 season. The data set and codebook are available in the `data` folder in your GitHub repo.
Include the following in your analysis:
- Comprehensive exploratory data analysis
- Create the best model you can that can be used to predict winning percentage. Be sure to consider collinearity between predictors as you select the final model.
- Use the EDA and model results to describe the factors that help explain an NBA team’s success.
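One common way to check collinearity is the variance inflation factor (VIF). The sketch below computes it from its definition in base R using the built-in `mtcars` data, not the NBA data; the choice of predictors is arbitrary and for illustration only.

```r
# Illustrative only: mtcars is a built-in data set, not the NBA data.
predictors <- mtcars[, c("wt", "hp", "disp", "qsec")]

# The VIF for predictor j is 1 / (1 - R^2_j), where R^2_j comes from
# regressing predictor j on all of the other predictors.
vif_by_hand <- sapply(names(predictors), function(v) {
  others <- setdiff(names(predictors), v)
  r2 <- summary(lm(reformulate(others, response = v),
                   data = predictors))$r.squared
  1 / (1 - r2)
})

round(vif_by_hand, 2)

# Pairwise correlations give a complementary view during EDA.
round(cor(predictors), 2)
```

Packages such as `car` provide a `vif()` function that works directly on a fitted model; the by-hand version is shown here only to make the definition explicit.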
Submission
To submit the assignment, push your final changes to your GitHub repo. Then, you’re done! We will grade the latest versions of the files that were pushed to the GitHub repo by the deadline, unless you have notified us that you wish to submit late work.
Grading
| Component | Points |
|---|---|
| Ex 1 | 9 |
| Ex 2 | 10 |
| Ex 3 | 12 |
| Ex 4 | 6 |
| Ex 5 | 10 |
| Workflow & formatting | 3 |
| Total | 50 |
The “Workflow & formatting” grade is based on the organization of the assignment write-up along with the reproducible workflow. This includes having an organized write-up with neat and readable headers, code, and narrative, including properly rendered mathematical notation. It also includes having a reproducible Quarto document that can be rendered to reproduce the submitted PDF, along with implementing version control using multiple commits with informative commit messages.