Project 02: Multilevel models + Reproducibility

For this project you and your team will read and evaluate a scholarly article that uses a multilevel model in the analysis. You will also use data from the article to reproduce one model and conduct additional analysis.

The learning objectives are to

Explore how multilevel models are used in practice.
Understand how statistical results are presented in scholarly research.
Recognize what information is needed for an analysis to be reproducible.
Gain experience analyzing real-world multilevel data.

Team assignments

You will work in the same teams as in Project 01. A reminder of those team assignments is available in Slack. Before getting started, I encourage you to briefly discuss the following as a group in case anything has changed since the first project:

Come up with a plan to communicate and work together outside of lab.
Come up with a plan for remote work if some team members are unable to attend lab or other in-person team meetings.

Workflow

Project Week 01 (week of Feb 26): Select the article and accompanying data set. Write and submit the proposal.
Project Week 02 (week of March 4): Read and evaluate the article. Develop an analysis plan.
Spring Break: March 11 - 15
Project Week 03 (week of March 18): Work on analysis and submit draft report.
Project Week 04 (week of March 25): Finalize analysis results, make presentation, and write report.
Project Week 05 (week of April 1): Present during lecture and submit final report.

Due dates

Note

All work will be submitted in your team’s project GitHub repo.

Proposal: due Friday, March 1 at 12pm (noon)
Analysis plan (optional): due Friday, March 8 at 12pm (noon)
- Optional submission to receive feedback from the teaching team.
Draft report: due Friday, March 22 at 12pm (noon)
Presentation: due Wednesday, April 3 at 3:30pm
Written report: due Thursday, Apr 4 at 9pm

Article + data set

The article for this project must be published in a scholarly journal. Please ask a member of the teaching team if you are unsure whether the article is published in a scholarly journal. The article must

Use one or more multilevel models in the analysis.

Have the original data set or a comparable data set available.

See the Tips on finding articles section for guidance on searching databases to find articles and data.

Proposal

The main goal of the proposal is to ensure you have an article that will set you up for a successful project. The proposal should include the following:

The citation for the article. If you’re using a .bib file you can use the default citation format in R Markdown (Chicago author-date format). Otherwise, use MLA format.
Brief summary about why you chose this article.
Brief summary of the article’s primary research objective.
A description of the data analyzed in the article. Include
- A description of the observational units at each level. (Note: Most articles will have level-one and level-two observational units, but some may have more levels to the data structure.)
- A description of the response variable.
- A description of within-group variability.
- A description of the fixed and random effects.
A glimpse of the data set
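To make the vocabulary in the list above concrete, here is a generic two-level random intercepts model. It is only an illustrative sketch and is not taken from any particular article; the symbols $Y_{ij}$, $X_{ij}$, and $Z_i$ are placeholders for the response of level-one unit $j$ in level-two unit $i$, a level-one predictor, and a level-two predictor.

$$
\begin{aligned}
\text{Level one:} \quad & Y_{ij} = a_i + \beta X_{ij} + \epsilon_{ij}, & \epsilon_{ij} &\sim N(0, \sigma^2) \\
\text{Level two:} \quad & a_i = \alpha_0 + \alpha_1 Z_i + u_i, & u_i &\sim N(0, \sigma_u^2) \\
\text{Composite:} \quad & Y_{ij} = \alpha_0 + \alpha_1 Z_i + \beta X_{ij} + u_i + \epsilon_{ij}
\end{aligned}
$$

In this sketch the fixed effects are $\alpha_0$, $\alpha_1$, and $\beta$, the random effect is $u_i$, and $\sigma^2$ describes the within-group variability. In the version of the model with no predictors, the share of total variance that lies between groups is $\sigma_u^2 / (\sigma_u^2 + \sigma^2)$. The model in your article may have more levels, random slopes, or a non-normal response.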

Write the proposal in the file proposal.qmd, then push the .qmd and rendered PDF to the GitHub repo for submission.

Important

The proposal is due on Friday, March 1 at noon.

You will not be able to commit new work to your GitHub repo after the deadline until we have completed grading. If your group needs to submit your work late, please message me on Slack or by email so I can reopen the repo.

Grading criteria

The proposal will be graded based on the following:

All required components of the proposal are included and accurate (10 pts)
Data set is in the data folder of the GitHub repo (2 pts)
All team members have contributed (3 pts)
- This will be assessed based on the repo’s commit history.

Analysis plan (optional)

The goal of the analysis plan is for your team to outline your approach for the two analysis components of the project. These components are

Try to reproduce one multilevel model in the article.
Conduct original analysis using the data from the article and compare your results and conclusions to those from the article. The original analysis may include
- Fitting a model that incorporates new variables and/or data
- Fitting a new model (e.g., using different parameter adjustments, transforming the response variable, etc.)
- Assessing the authors’ choice of any adjustments to the model (e.g., adding a penalty term)
- …other areas of exploration

Questions to consider in your analysis plan

Below are a few questions to consider as you outline the approach to reproduce one model in the paper:

What data cleaning (if any) is required to prepare the data for modeling? Did the authors remove any observations? Did they create or transform variables?
Did the authors conduct model selection? If so, what was their approach?
Did the authors make any adjustments to the model (e.g., including weighting or a penalty term)? If so, how will you incorporate these adjustments in your model? What questions do you have about these additional model adjustments?
Did the authors assess the model fit or performance? If so, how? Otherwise, how will you assess the model fit or performance?

Below are a few questions to consider as you outline the approach for the original analysis:

What question do you want to explore as part of your original analysis?
Do you need to add new data and/or create new variables? If so, what do you need to add?
What model will you use? How will you assess model fit and performance?
How do your results compare to the results in the article?

Submit the analysis plan (optional)

You may turn in the analysis plan along with any initial results to receive feedback from the teaching team. To do so,

Write your analysis plan in analysis-plan.qmd. Render and push the updated document to your team’s GitHub repo.
Open a new issue called “Analysis Plan”. In the body of the issue add the tag @sta310-sp24/teaching-team. If you have any specific questions you’d like the teaching team to address, add those in the body of the issue as well.

Important

The analysis plan is optional. If you would like to receive feedback from the teaching team, you must open the issue and submit the analysis plan on GitHub by Friday, March 8 at 12pm (noon).

I recommend sketching an analysis plan before diving into the analysis even if you don’t turn it in for feedback.

Draft report

The goal of the draft is to get initial feedback on your analysis and report.

At a minimum, the draft should include the following:

Summary of the article and the primary research questions
Exploratory data analysis for the response variable
Description of the process to reproduce the model
First attempt to reproduce the model with description of any differences between your results and results in the article
Description of what you’re exploring in the original analysis
First attempt at original analysis with initial conclusions

Write the draft in written-report.qmd. Push the .qmd and rendered PDF to the GitHub repo for submission.

Important

The draft is due on Friday, March 22 at noon.

Grading criteria

The draft will be graded based on the following:

All required components of the draft are included and accurate (12 pts)
All team members have contributed (3 pts)
- This will be assessed based on the repo’s commit history.

Written report

The final written report should include the sections below. There is a 10-page limit.

You are welcome to include an appendix with additional work at the end of the written report document; however, grading will overwhelmingly be based on the content in the main body of the report. You should assume the reader will not see the material in the appendix unless prompted to view it in the main body of the report. The appendix should be neatly formatted and easy for the reader to navigate. It is not included in the 10-page limit.

Introduction

This section includes a brief summary of the article and its primary research objective. It will also include a description of the data set and relevant variables. This section should be written as if the reader has neither read the article nor seen the data dictionary in your GitHub repo. You do not need to include a description of every variable, but you want to provide enough information that the reader has an idea of the type of information in the data set.

Reproducing the model

This section will include a description of the model you’re reproducing along with a description of the response variable and any relevant descriptive statistics and visualizations. Describe the process you used to reproduce the model (data cleaning or preparation, model selection, etc.) and note any places where your process differed from that in the original article.

Include the output from the model and the conclusions from the model. These conclusions can include those from the original paper and/or any conclusions your group derived that were not in the paper.

Original analysis

This section will include a summary and results from your original analysis. Describe the question you’re exploring in this analysis and your motivation for choosing this question. Describe the analysis process (data cleaning, model selection, model evaluation, etc.). Include the relevant output from your results and a summary of the conclusions from this analysis. Note any conclusions that may have differed from those in the original article.

Discussion

This section will include a summary of your conclusions along with any limitations to the data and/or analysis. Also include any challenges your group may have faced with reproducing the model and suggestions to improve the reproducibility of the analysis.

Write the report in written-report.qmd. Push the .qmd and rendered PDF to the GitHub repo for submission.

Important

The final written report is due on Thursday, April 4 at 9pm.

Grading criteria

The written report (35 pts) will be assessed on how clearly, comprehensively and accurately each section is written. Additionally, the report will be assessed on the following:

Thoroughness of analysis
- This is an assessment of whether a thorough approach was taken in at least one of the primary analyses in the report: reproducing the model or the original analysis. Does the report demonstrate an in-depth approach to reproducing the model, an in-depth evaluation of the authors’ choices, an in-depth exploration of new variables, model types, etc., or an in-depth exploration in some other part of the analysis?
Formatting
- This is an assessment of the overall presentation and formatting of the written report. This includes neatly formatted text and tables, appropriate labels on figures, suppressing all code and extraneous output, properly rendered LaTeX, etc.
Reproducibility
- This is an assessment of the reproducibility of the report. Is the PDF produced by rendering the .qmd document?
All team members have contributed
- This will be assessed based on the repo’s commit history.

Presentation

You will present on Wednesday, April 3 during lecture. Each team will have 10 minutes for the presentation along with a few minutes for questions. Every team member is expected to speak for roughly an equal amount of time during the presentation.

You can make the presentation slides using the software of your choice. You can use as many slides as you wish; just be mindful of what can reasonably be presented in 10 minutes. A suggested outline is

Introduction to the article and research question
Description of the data
Description of the process, challenges and results from reproducing the model
Description and results from original analysis
Discussion and conclusion

The presentation order and peer review assignments will be given closer to the presentation date.

The presentation is worth 20 points. The points are broken down as follows:

Teaching team scores: 15 pts
Peer scores: 5 pts

Grading criteria - Teaching Team (15 pts)

This portion of the grade will be the average of the scores from the members of the teaching team.

Professionalism (3 pts)
- Was the team prepared for the presentation? Did each team member have a meaningful contribution to the presentation?
- Was the time reasonably divided among team members? Was the presentation within the time limit?
- Did the team present a unified story?
Content (8 pts)
- Is the content presented in a clear and accurate way? This includes clearly and accurately describing the components described in the presentation outline.
Slides (4 pts)
- Are the slides well organized, readable, not full of text, featuring figures with legible labels, legends, etc.?

Grading criteria - Peers (5 pts)

This portion of the grade will be the average of the scores from the peer reviewers. Peer review assignments will be posted on Slack.

Professionalism (1 pt)
- Was the team prepared for the presentation? Did each team member have a meaningful contribution to the presentation?
- Did the team present a unified story?
Content (2 pts)
- Is the content presented in a clear and accurate way? This includes clearly and accurately describing the components described in the presentation outline.
Slides (2 pts)
- Are the slides well organized, readable, not full of text, featuring figures with legible labels, legends, etc.?

Presentation peer review

You will be assigned two presentations to peer review. Submitting the scores for your assigned presentations is worth 5 points. This portion of the grade will be individually assessed.

GitHub repo organization

You should have the following files and folders in the project repo. The repo and brief summary in the README should be updated by Thursday, April 4 at 9pm.

README.md: Title and 3 - 5 sentence summary of the project
*.bib: BibTeX file for references (optional)
proposal.qmd
proposal.pdf
analysis-plan.qmd (you can remove this file if you did not write an analysis plan)
analysis-plan.pdf (you can remove this file if you did not write an analysis plan)
written-report.qmd
written-report.pdf
/presentation
- /presentation/*: Presentation file (if not linked in README)
- /presentation/README.md: Link to presentation (if the file is not in the presentation folder)
/data
- /data/*: File containing data set
- /data/README.md: Codebook for data set.
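Before the final deadline, it can help to confirm the repo matches the list above. The following is a minimal sketch, not an official grading script: the function name `missing_items` and the `REQUIRED` list are assumptions made for illustration, and the optional files (`*.bib`, `analysis-plan.*`) and the conditional presentation files are deliberately not checked.

```python
from pathlib import Path

# Entries taken from the list above. Optional items (*.bib, analysis-plan.*)
# and the conditional presentation files are intentionally left out.
REQUIRED = [
    "README.md",
    "proposal.qmd",
    "proposal.pdf",
    "written-report.qmd",
    "written-report.pdf",
    "presentation",
    "data/README.md",
]


def missing_items(repo_root):
    """Return the required files or folders that are absent from repo_root."""
    root = Path(repo_root)
    missing = [name for name in REQUIRED if not (root / name).exists()]

    # The data folder needs at least one file besides the codebook.
    data_dir = root / "data"
    has_data = data_dir.is_dir() and any(
        p.is_file() and p.name != "README.md" for p in data_dir.iterdir()
    )
    if not has_data:
        missing.append("data/<data set file>")
    return missing


if __name__ == "__main__":
    # Run from the top level of the project repo.
    for item in missing_items("."):
        print("Missing:", item)
```

Running the script from the top level of the repo prints one line per missing item and prints nothing when every checked item is present.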

Grading (100 points)

Component	Points
Proposal	15 pts
Draft report	15 pts
Written report	35 pts
Presentation	20 pts
Presentation peer review (individually assessed)	5 pts
Repo organization	5 pts
Teamwork evaluation (individually assessed)	5 pts
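As a quick arithmetic check on the table above, the component maximums can be added up in a few lines. This is only a sketch: the helper `project_score` is hypothetical, and treating an unreported component as zero points is an assumption made for illustration, not a course policy.

```python
# Maximum points per component, copied from the table above.
POINTS = {
    "Proposal": 15,
    "Draft report": 15,
    "Written report": 35,
    "Presentation": 20,
    "Presentation peer review": 5,
    "Repo organization": 5,
    "Teamwork evaluation": 5,
}


def project_score(earned):
    """Add up earned points after checking each value against its maximum.

    Components missing from `earned` count as zero (an assumption made
    for this sketch only).
    """
    for name, value in earned.items():
        if name not in POINTS:
            raise ValueError(f"Unknown component: {name}")
        if not 0 <= value <= POINTS[name]:
            raise ValueError(f"{name} must be between 0 and {POINTS[name]}")
    return sum(earned.values())


if __name__ == "__main__":
    print("Total available:", sum(POINTS.values()))
```

For example, `project_score({"Proposal": 13, "Draft report": 15})` adds the two reported components and ignores the rest.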

Tips on finding articles

Below are tips to help you find articles based on information from Jodi Psoter, the former Librarian for Chemistry and Statistical Science at Duke Libraries and current Head Librarian for the Marine Lab Library.

PubMed

Articles in health-related fields

The PubMed heading tree lets you search by topic. The link will direct you to the results under the category of “Statistics as a Topic”.

Click on the model or distribution of interest, e.g. “Logistic Models”.
Click “Add to search builder” under the PubMed Search Builder in the top right corner. You should now see the model/analysis type you chose in the search box.
Click “Search PubMed”, and a page of search results will appear.
There are options to narrow your results on the left-hand side. Under Article Attributes, check “Associated Data” to limit the results to articles with data sets available.

PsycInfo

Articles in psychology

PsycInfo will allow users to search by analysis type.

Put the name of the model in the search bar, e.g., “Poisson Regression”. Then, in the drop down menu next to the search bar, select “DE Subjects [exact]”. Click Search.
You can use the options on the left-hand side to narrow down the search results.

Web of Science

Articles on all topics

Web of Science Data Citation Index lets you search for data sets based on the topic of interest.

Use the search bar to search based on a topic of interest. You can also search for the model or distribution name.
On the left-hand side, check “Data Set” under Content Type and check “Dataset” under Data Types. Click “Refine” to limit the results.
Click on the article of interest.
You can use the options on the left-hand side to narrow down the search results.

Additional tips

PLOS ONE is an open-access peer-reviewed journal with articles from a variety of disciplines. Many of the articles include the data set as part of the “Supplemental Data”.
You can use terms such as “multilevel models”, “mixed effects models”, “random effects models”, “longitudinal models”, and “hierarchical models” to find models for multilevel data.

Acknowledgements

Grading criteria and the repo organization for this project were adapted from Project 1 on vizdata.org.