For this project you and your team will be reading and evaluating a scholarly article that incorporates generalized linear models (GLMs) in the analysis.
The learning objectives of the project are to
- Explore how GLMs are used in practice.
- Understand how statistical results are presented in scholarly research.
Team assignments
You will work in small teams for this project. You will find the team assignments in the #project-01 channel in Slack.
Before getting started, I encourage you to discuss the following as a group:
- Come up with a plan to communicate and work together outside of lab.
- Come up with a plan for remote work if some team members are unable to attend lab or other in-person team meetings.
Workflow
- Project Week 01 (week of Mon, Jan 22): Select article and submit proposal.
- Project Week 02 (week of Mon, Jan 29): Read article and complete article evaluation.
- Project Week 03 (week of Mon, Feb 05): Work on draft reports and presentations.
- Project Week 04 (week of Mon, Feb 12): Presentations and submit report.
Due dates
All work will be submitted in your team’s project GitHub repo.
Proposal: due Fri, Jan 26 at noon
Article evaluation: due Sun, Feb 04 at noon
Presentation: due Wed, Feb 14 at 3:05pm
Written report: due Thu, Feb 15 at 9pm
Article
The article for this project must be published in a scholarly journal. Please ask a member of the teaching team if you are unsure whether the article is published in a scholarly journal. The article must use one or more generalized linear models, other than a linear regression model, in the analysis.
- Common GLMs include Poisson regression, logistic regression, probit regression, and negative binomial regression.
- You can also look for models based on the distribution of the response variable: Binary, Binomial, Poisson, Exponential, Gamma, Geometric.
- See Section 3.6 in Beyond Multiple Linear Regression for a list of types of GLMs.
The model used in the paper does not have to be one we discuss in class. I’d encourage you to explore articles that use modeling beyond the scope of the class!
Below are a few useful places to search for articles:
See the Tips on finding articles section for guidance on searching journal databases to find an article.
Proposal
The main goal of the proposal is to ensure you have an article that will set you up for a successful project. Include the following in the proposal.
The citation for the article. If you’re using a .bib file you can use the default citation format in Quarto (Chicago author-date format). Otherwise, use MLA format.
Brief summary about why you chose this article.
Brief summary of the article’s primary research objective.
Name of the GLM(s) used in the article and a short description of the response variable for each model.
You are only required to write the proposal for one article. Write the proposal in the file proposal.qmd, then push the .qmd and rendered PDF to the GitHub repo for submission.
The proposal is due on Fri, Jan 26 at noon.
You will not be able to commit new work to your GitHub repo after the deadline until we have completed grading. If your group needs to submit your work late, please send me a message on Slack or email to reopen the repo.
Grading criteria
The proposal will be graded based on the following:
All required components of the proposal are included and accurate (8 pts)
All team members have made a meaningful contribution, as determined by Git commit history (2 pts)
Article evaluation
The purpose of the article evaluation is for you to begin describing and evaluating the statistical analysis and argument in the article. Write your responses to the following questions in article-evaluation.qmd. The anticipated length is about 1 - 2 pages and should be no more than 4 pages. There is no minimum page requirement, as long as each section is comprehensively addressed.
Grading criteria
The article evaluation will be graded based on the following:
All required components of the article evaluation are included and accurate (15 pts)
The team has worked collaboratively using GitHub and all team members have made a meaningful contribution, as determined by Git commit history (5 pts)
The article evaluation is due on Sun, Feb 04 at noon.
You will not be able to commit new work to your GitHub repo after the deadline until we have completed grading. If your group needs to submit your work late, please send me a message on Slack or email to reopen the repo.
Write up
The anticipated length is about 5 pages. There is no minimum or maximum page requirement as long as each section is accurately and comprehensively addressed.
Introduction
Briefly summarize the article, the research objective and purpose, and key conclusions. Also include a description of the data used for the analysis.
Methods
Describe the GLM used for the analysis. Describe the response variable and its distribution. Describe the predictor variables. Write the equation of the statistical model(s) using mathematical notation.
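As an illustration of the notation, a Poisson regression with a log link could be written as shown below. This is only a sketch: the response \(Y_i\) and the predictors \(x_{i1}\) and \(x_{i2}\) are hypothetical placeholders, so replace them with the distribution, link function, and variables used in your article.

```latex
Y_i \sim \text{Poisson}(\lambda_i), \qquad
\log(\lambda_i) = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2}
```

Stating the distribution of the response and the link function together makes clear which quantity the linear predictor describes.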
Results
Interpret the results from the model. Write the interpretations / conclusions from the model, using estimates from the article when possible. If the article does not include estimates for some or all of the estimated effects, you can write general interpretations using the appropriate mathematical symbol (e.g., \(\hat{\beta}_1\)) in place of the estimated value.
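For models with a log or logit link, one common approach is to exponentiate a coefficient so it can be read as a multiplicative change in the mean (log link) or in the odds (logit link) for a one-unit increase in the predictor, holding the other predictors constant. The Python sketch below shows that arithmetic. The function name `multiplicative_effect` and the numeric values are hypothetical and are not taken from any article; it also assumes the article reports a standard error and that an approximate Wald interval is appropriate.

```python
import math


def multiplicative_effect(beta_hat, std_error, z=1.96):
    """Exponentiate a coefficient and its approximate Wald interval.

    Returns (estimate, lower, upper) on the multiplicative scale, e.g. a
    rate ratio for a log link or an odds ratio for a logit link.
    """
    lower = beta_hat - z * std_error
    upper = beta_hat + z * std_error
    return math.exp(beta_hat), math.exp(lower), math.exp(upper)


# Hypothetical values for illustration only; not from any article.
estimate, lower, upper = multiplicative_effect(beta_hat=0.25, std_error=0.10)
print(f"Multiplicative change per one-unit increase: {estimate:.3f}")
print(f"Approximate 95% interval: ({lower:.3f}, {upper:.3f})")
```

If the article reports only the sign or significance of an effect, the same reasoning applies with \(e^{\hat{\beta}_1}\) left in symbolic form.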
Communication
The objective of this section of the written report is to assess the authors’ argument and communication. Reading and identifying how others communicate statistical results is a key way to develop your statistical writing skills. This section will include an assessment of the following:
Audience: Describe the primary audience for the article.
Methods: Consider the detail in the data and methods sections. What aspects of the analysis are mentioned in detail? What aspects are mentioned without detail? How does the level of detail correlate to the statistical background of the primary audience?
Graphs and figures: How are the graphs, figures, and tables used to support the findings? How are they used for exploratory data analysis? How are they used to assess or support modeling results? Would additional graphs, figures, or tables be helpful? If so, what kind?
- Identify one key graph. Where is it located in the article? What message does it convey with respect to the objective and conclusion of the study? If there are no graphs in the article, describe one key graph you would include and how it would be used in the article (e.g., support conclusions, provide clarity, etc.).
Limitations: Are there limitations or difficulties in generalizing beyond the data? How are these limitations noted, if at all? Do you have any other concerns about the study?
Impact: According to the author, how does the study advance knowledge in the field? Taking into account the year the article was published, do the author’s claims seem adequately justified, overblown, or unduly cautious?
You can use these questions as a guide to shape the narrative. This section should still be written in narrative form, not as a list of questions and answers.
The questions in this section are adapted from Communicating with Data: The Art of Writing for Data Science by Deborah Nolan and Sara Stoudt. Click here for more details about the questions and how to read scientific articles. You can borrow a copy of the book from Duke Libraries.
Grading criteria
Each section will be assessed on whether the components of the section are clearly, comprehensively, and accurately discussed in the report. (35 pts total)
The report will also be assessed based on the following:
- Formatting & reproducibility: 3 pts
- This is an assessment of the overall presentation and formatting of the written report, along with reproducibility. This includes neatly formatted text and tables, appropriate labels on figures, suppressing all code and extraneous output, properly rendered LaTeX, and being able to obtain the PDF by rendering the Quarto document.
- Collaboration: 3 pts
- The team has worked collaboratively using GitHub and all team members have made a meaningful contribution, as determined by Git commit history.
The written report is due on Thu, Feb 15 at 9pm.
You will not be able to commit new work to your GitHub repo after the deadline until we have completed grading. If your group needs to submit your work late, please send me a message on Slack or email to reopen the repo.
Presentation
You will present on Wed, Feb 14 during lecture. Each team will have up to 10 minutes for the presentation along with a few minutes for questions, and every team member should speak for about an equal amount of time during the presentation.
You can make the presentation slides using the software of your choice. You can use as many slides as you wish; just be mindful of what can reasonably be presented in the time frame. A suggested outline is
- 1 slide to introduce article
- 1 - 2 slides to describe the model
- 1 - 2 slides for key interpretations and results
- 1 slide for key highlights about the communication and writing (e.g., what the authors did particularly well or areas of improvement)
You will be assigned two presentations to peer review. You must submit the peer review scores for both presentations to have the “Peers” scores for your team’s presentation included in your presentation grade.
The presentation order and peer review assignments will be given closer to the presentation date.
The presentation is worth 25 points. The points are broken down as follows:
- Teaching team grading: 18 pts
- Peer grading: 4 pts
- Providing presentation comments: 3 pts
Grading criteria - Teaching Team (18 pts)
This portion of the grade will be the average of the scores from the members of the teaching team.
- Professionalism (3 pts)
- Was the team prepared for the presentation? Did each team member have a meaningful contribution to the presentation?
- Was the time reasonably divided among team members? Was the presentation within the time limit?
- Did the team present a unified story?
- Content (10 pts): Is the content presented in a clear and accurate way? This includes clearly and accurately describing the components described in the presentation outline.
- Slides (5 pts): Are the slides well organized, readable, not full of text, featuring figures with legible labels, legends, etc.?
Grading criteria - Peers (4 pts)
This portion of the grade will be the average of the scores from the peer reviewers. Peer review assignments will be posted on Slack.
Introduction (1 pt)
- Did the team clearly describe the primary research objective, primary takeaways, and intended audience for the article?
Data (1 pt)
- Did the team clearly describe the data used in the analysis?
Model (1 pt)
- Did the team clearly and accurately describe the model?
Slides (1 pt)
- Are the slides well organized, readable, not full of text, featuring figures with legible labels, legends, etc.?
GitHub repo organization
You should have the following files and folders in the project repo. The repo and brief summary in the README should be updated by Thu, Feb 15 at pm.
- README.md: 3 - 5 sentence summary of the project and citation for the article.
- proposal.qmd
- proposal.pdf
- article-evaluation.qmd
- article-evaluation.pdf
- writeup.qmd
- writeup.pdf
- /presentation
- /presentation/*: Presentation file (if not linked in README)
- /presentation/README.md: Link to project (if not in presentation folder)
Optional
Grading (100 points)
Proposal | 10 pts
Article evaluation | 20 pts
Written report | 35 pts
Presentation | 25 pts
Repo organization | 5 pts
Teamwork evaluation (individually assessed) | 5 pts
Tips on finding articles
Below are tips to help you find articles based on information from Jodi Psoter, the former Librarian for Chemistry and Statistical Science at Duke Libraries and current Head Librarian for the Marine Lab Library.
PubMed
Articles in health-related fields
The PubMed heading tree lets you search by topic. The link will direct you to the results under the category of “Statistics as a Topic”.
- Click on the model or distribution of interest, e.g. “Logistic Models”.
- Click “Add to search builder” under the PubMed Search Builder in the top right corner. You should now see the model/analysis type you chose in the search box.
- Click “Search PubMed”, and a page of search results will appear.
- There are options to narrow your results on the left-hand side based on your team’s interest.
PsycInfo
Articles in psychology
PsycInfo will allow users to search by analysis type.
Put the name of the model in the search bar, e.g., “Poisson Regression”. Then, in the drop down menu next to the search bar, select “DE Subjects [exact]”. Click Search.
You can use the options on the left-hand side to narrow down the search results.
Web of Science
Articles on all topics
Web of Science Data Citation Index lets you search for data sets based on the topic of interest.
Use the search bar to search based on a topic of interest. You can also search for the model or distribution name.
On the left-hand side, check “Data Set” under Content Type and check “Dataset” under Data Types. Click “Refine” to limit the results.
Click on the article of interest.
- You can use the options on the left-hand side to narrow down the search results.
Acknowledgements
- Grading criteria and the repo organization for this project were adapted from Project 1 on vizdata.org.
- Some questions for the Article Evaluation are adapted from “How to Evaluate Journal Articles”.
- Questions in the “Communication” section of the Written Report are adapted from Communicating with Data: The Art of Writing for Data Science by Deborah Nolan and Sara Stoudt.