Project 01: Generalized Linear Models

For this project you and your team will be reading and evaluating a scholarly article that incorporates generalized linear models (GLMs) in the analysis.

The learning objectives of the project are to

Team assignments

You will work in small teams for this project. You will find the team assignments in the #project-01 channel in Slack.

Before getting started, I encourage you to discuss the following as a group:

  • Come up with a plan to communicate and work together outside of lab.
  • Come up with a plan for remote work if some team members are unable to attend lab or other in-person team meetings.

Workflow

  • Project Week 01 (week of Mon, Jan 22): Select article and submit proposal.
  • Project Week 02 (week of Mon, Jan 29): Read article and complete article evaluation.
  • Project Week 03 (week of Mon, Feb 05): Work on draft reports and presentations.
  • Project Week 04 (week of Mon, Feb 12): Presentations and submit report.

Due dates

Note

All work will be submitted in your team’s project GitHub repo.

  • Proposal: due Fri, Jan 26 at noon

  • Article evaluation: due Sun, Feb 04 at noon

  • Presentation: due Wed, Feb 14 at 3:05pm

  • Written report: due Thu, Feb 15 at 9pm

Article

The article for this project must be published in a scholarly journal. Please ask a member of the teaching team if you are unsure whether the article is published in a scholarly journal. The analysis in the article must use one or more generalized linear models other than a linear regression model.

  • Common GLMs are Poisson regression, Logistic regression, Probit regression, and Negative binomial regression.
  • You can also look for models based on the distribution of the response variable: Binary, Binomial, Poisson, Exponential, Gamma, Geometric.
  • See Section 3.6 in Beyond Multiple Linear Regression for a list of types of GLMs.

The model used in the paper does not have to be one we discuss in class. I’d encourage you to explore articles that use modeling beyond the scope of the class!

Below are a few useful places to search for articles:

See Tips on finding articles for guidance on searching journal databases to find an article.

Proposal

The main goal of the proposal is to ensure you have an article that will set you up for a successful project. Include the following in the proposal.

  • The citation for the article. If you’re using a .bib file you can use the default citation format in Quarto (Chicago author-date format). Otherwise, use MLA format.

  • Brief summary about why you chose this article.

  • Brief summary of the article’s primary research objective.

  • Name of the GLM(s) used in the article and a short description of the response variable for each model.

You are only required to write the proposal for one article. Write the proposal in the file proposal.qmd, then push the .qmd and rendered PDF to the GitHub repo for submission.

Important

The proposal is due on Fri, Jan 26 at noon.

You will not be able to commit new work to your GitHub repo after the deadline until we have completed grading. If your group needs to submit your work late, please send me a message on Slack or email to reopen the repo.

Grading criteria

The proposal will be graded based on the following:

  • All required components of the proposal are included and accurate (8 pts)

  • All team members have made a meaningful contribution, as determined by Git commit history (2 pts)

Article evaluation

The purpose of the article evaluation is for you to begin describing and evaluating the statistical analysis and argument in the article. Write your responses to the following questions in article-evaluation.qmd. The anticipated length is about 1 - 2 pages and should be no more than 4 pages. There is no minimum page requirement, as long as each section is comprehensively addressed.

  • Audience and purpose

    • Who is the primary audience for this article, i.e., for what type of readers are the authors writing?
    • What is the general purpose of the article, e.g., to persuade the reader to do something, to prove something, to inform the reader, etc.?
  • Data

    • How were the data generated, e.g., from an experiment, online survey, interviews, etc.?
    • Under what conditions were the data collected, e.g., the time period, location, how subjects were selected, response rate / drop-out rate, etc.?
  • Graphs and tables

    • Describe the types of visualizations and tables used in the article.
    • How are they primarily used - for exploratory data analysis, to support a candidate, etc.?
    • What visualizations or tables might you add to the article? Briefly explain.
  • Generalized Linear Model

    If your paper has multiple GLMs, you only have to write this section up for one model.

    • What is the response variable, and what is its distribution?
    • What are the predictor variables? Which predictor(s) are of particular interest in the research?
    • Write the statistical model in mathematical notation.
  • Overall argument

    • Are there limitations or difficulties with generalizing beyond the data? Briefly explain.
    • When was the article published? Are the findings up-to-date, out-dated, or timeless? Briefly explain.
    • How does the study advance knowledge in the field?

Grading criteria

The article evaluation will be graded based on the following:

  • All required components of the article evaluation are included and accurate (15 pts)

  • The team has worked collaboratively using GitHub and all team members have made a meaningful contribution, as determined by Git commit history (5 pts)

Important

The article evaluation is due on Sun, Feb 04 at noon.

You will not be able to commit new work to your GitHub repo after the deadline until we have completed grading. If your group needs to submit your work late, please send me a message on Slack or email to reopen the repo.

Write up

The anticipated length is about 5 pages. There is no minimum or maximum page requirement as long as each section is accurately and comprehensively addressed.

Introduction

Briefly summarize the article, the research objective and purpose, and key conclusions. Also include a description of the data used for the analysis.

Methods

Describe the GLM used for the analysis. Describe the response variable and its distribution. Describe the predictor variables. Write the equation of the statistical model using mathematical notation.
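As one illustration of the notation this section asks for, a Poisson regression with a log link could be written as shown below. This is only a sketch: the Poisson response, the log link, and the symbols \(Y_i\), \(\lambda_i\), \(x_{ij}\), and \(p\) are assumptions chosen for the example, and the model in your article may use a different distribution or link function.

```latex
\[
Y_i \sim \text{Poisson}(\lambda_i), \qquad
\log(\lambda_i) = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip}
\]
```

Here \(Y_i\) is the response for observation \(i\), \(\lambda_i\) is its expected value, and \(x_{i1}, \dots, x_{ip}\) are the predictor variables.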

Results

Interpret the results from the model. Write the interpretations / conclusions from the model, using estimates from the article when possible. If the article does not include estimates for some or all of the estimated effects, you can write general interpretations using the appropriate mathematical symbol (e.g., \(\hat{\beta}_1\)) in place of the estimated value.
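When an article reports a coefficient on the log or logit scale, exponentiating it often gives a more readable interpretation. The sketch below assumes a model with a log or logit link; the helper name `multiplicative_effect`, the coefficient 0.25, and the standard error 0.10 are hypothetical values chosen for illustration, not results from any article.

```python
import math


def multiplicative_effect(beta_hat: float, se: float, z: float = 1.96):
    """Exponentiate a coefficient and its approximate 95% Wald interval.

    Assumes beta_hat comes from a model with a log link (e.g., Poisson)
    or a logit link (e.g., logistic), so exp(beta_hat) is a rate ratio
    or an odds ratio, respectively.
    """
    estimate = math.exp(beta_hat)
    lower = math.exp(beta_hat - z * se)
    upper = math.exp(beta_hat + z * se)
    return estimate, lower, upper


# Hypothetical estimate and standard error, for illustration only.
beta_hat_1 = 0.25
se_1 = 0.10

estimate, lower, upper = multiplicative_effect(beta_hat_1, se_1)
print(f"exp(beta_hat_1) = {estimate:.3f}, approx. 95% CI ({lower:.3f}, {upper:.3f})")
```

Under these assumptions, a one-unit increase in the predictor multiplies the expected count (log link) or the odds (logit link) by \(\exp(\hat{\beta}_1)\), holding the other predictors constant. If the article reports no estimate, the same sentence can be written with \(\exp(\hat{\beta}_1)\) left as a symbol.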

Communication

The objective of this section of the written report is to assess the authors’ argument and communication. Reading and identifying how others communicate statistical results is a key way to develop your statistical writing skills. This section will include an assessment of the following:

  • Audience: Describe the primary audience for the article.

  • Methods: Consider the detail in the data and methods sections. What aspects of the analysis are mentioned in detail? What aspects are mentioned without detail? How does the level of detail correspond to the statistical background of the primary audience?

  • Graphs and figures: How are the graphs, figures, and tables used to support the findings? How are they used for exploratory data analysis? How are they used to assess or support modeling results? Would additional graphs, figures, or tables be helpful? If so, what kind?

    • Identify one key graph. Where is it located in the article? What message does it convey with respect to the objective and conclusion of the study? If there are no graphs in the article, describe one key graph you would include and how it would be used in the article (e.g., support conclusions, provide clarity, etc.).
  • Limitations: Are there limitations or difficulties in generalizing beyond the data? How are these limitations noted, if at all? Do you have any other concerns about the study?

  • Impact: According to the author, how does the study advance knowledge in the field? Taking into account the year the article was published, do the author’s claims seem adequately justified, overblown, or unduly cautious?

You can use these questions as a guide to shape the narrative. This section should still be written in narrative form, not as a list of questions and answers.

The questions in this section are adapted from Communicating with Data: The Art of Writing for Data Science by Deborah Nolan and Sara Stoudt. Click here for more details about the questions and how to read scientific articles. You can borrow a copy of the book from Duke Libraries.

Grading criteria

Each section will be assessed on whether the components of the section are clearly, comprehensively, and accurately discussed in the report. (35 pts total)

The report will also be assessed based on the following:

  • Formatting & reproducibility: 3 pts
    • This is an assessment of the overall presentation and formatting of the written report, along with reproducibility. This includes neatly formatted text and tables, appropriate labels on figures, suppressing all code and extraneous output, properly rendered LaTeX, and being able to obtain the PDF by rendering the Quarto document.
  • Collaboration: 3 pts
    • The team has worked collaboratively using GitHub and all team members have made a meaningful contribution, as determined by Git commit history

Important

The written report is due on Thu, Feb 15 at 9pm.

You will not be able to commit new work to your GitHub repo after the deadline until we have completed grading. If your group needs to submit your work late, please send me a message on Slack or email to reopen the repo.

Presentation

You will present on Wed, Feb 14 during lecture. Each team will have up to 10 minutes for the presentation along with a few minutes for questions, and every team member should speak about an equal amount of time during the presentation.

You can make the presentation slides using the software of your choice. You can use as many slides as you wish; just be mindful of what can reasonably be presented in the time frame. A suggested outline is

  • 1 slide to introduce article
  • 1 - 2 slides to describe the model
  • 1 - 2 slides for key interpretations and results
  • 1 slide for key highlights about the communication and writing (e.g., what the authors did particularly well or areas of improvement)

You will be assigned two presentations to peer review. You must submit the peer review scores for both presentations to have the “Peers” scores for your team’s presentation included in your presentation grade.

The presentation order and peer review assignments will be given closer to the presentation date.

The presentation is worth 25 points. The points are broken down as follows:

  • Teaching team grading: 18 pts
  • Peer grading: 4 pts
  • Providing presentation comments: 3 pts

Grading criteria - Teaching Team (18 pts)

This portion of the grade will be the average of the scores from the members of the teaching team.

  • Professionalism (3 pts)
    • Was the team prepared for the presentation? Did each team member have a meaningful contribution to the presentation?
    • Was the time reasonably divided among team members? Was the presentation within the time limit?
    • Did the team present a unified story?
  • Content (10 pts): Is the content presented in a clear and accurate way? This includes clearly and accurately describing the components described in the presentation outline.
  • Slides (5 pts): Are the slides well organized, readable, not full of text, featuring figures with legible labels, legends, etc.?

Grading criteria - Peers (4 pts)

This portion of the grade will be the average of the scores from the peer reviewers. Peer review assignments will be posted on Slack.

  • Introduction (1 pt)

    • Did the team clearly describe the primary research objective, primary takeaways, and intended audience for the article?
  • Data (1 pt)

    • Did the team clearly describe the data used in the analysis?
  • Model (1 pt)

    • Did the team clearly and accurately describe the model?
  • Slides (1 pt)

    • Are the slides well organized, readable, not full of text, featuring figures with legible labels, legends, etc.?

GitHub repo organization

You should have the following files and folders in the project repo. The repo and brief summary in the README should be updated by Thu, Feb 15 at pm.

  • README.md: 3 - 5 sentence summary of the project and citation for the article.

  • proposal.qmd

  • proposal.pdf

  • article-evaluation.qmd

  • article-evaluation.pdf

  • writeup.qmd

  • writeup.pdf

  • /presentation

    • /presentation/*: Presentation file (if not linked in README)
    • /presentation/README.md: Link to project (if not in presentation folder)

Optional

  • *.bib: BibTeX file for references

  • /data/: Folder containing data

Grading (100 points)

Component                                      Points
Proposal                                       10 pts
Article evaluation                             20 pts
Written report                                 35 pts
Presentation                                   25 pts
Repo organization                              5 pts
Teamwork evaluation (individually assessed)    5 pts

Tips on finding articles

Below are tips to help you find articles based on information from Jodi Psoter, the former Librarian for Chemistry and Statistical Science at Duke Libraries and current Head Librarian for the Marine Lab Library.

PubMed

Articles in health-related fields

The PubMed heading tree lets you search by topic. The link will direct you to the results under the category of “Statistics as a Topic”.

  1. Click on the model or distribution of interest, e.g. “Logistic Models”.
  2. Click “Add to search builder” under the PubMed Search Builder in the top right corner. You should now see the model/analysis type you chose in the search box.
  3. Click “Search PubMed”, and a page of search results will appear.
  4. There are options to narrow your results on the left-hand side based on your team’s interest.

PsycInfo

Articles in psychology

PsycInfo will allow users to search by analysis type.

  1. Put the name of the model in the search bar, e.g., “Poisson Regression”. Then, in the drop down menu next to the search bar, select “DE Subjects [exact]”. Click Search.

  2. You can use the options on the left-hand side to narrow down the search results.

Web of Science

Articles on all topics

Web of Science Data Citation Index lets you search for data sets based on the topic of interest.

  1. Use the search bar to search based on a topic of interest. You can also search for the model or distribution name.

  2. On the left-hand side, check “Data Set” under Content Type and check “Dataset” under Data Types. Click “Refine” to limit the results.

  3. Click on the article of interest.

  4. You can use the options on the left-hand side to narrow down the search results.

Acknowledgements

  • Grading criteria and the repo organization for this project were adapted from Project 1 on vizdata.org.
  • Some questions for the Article Evaluation adapted from “How to Evaluate Journal Articles”.
  • Questions in the “Communication” section of the Written Report are adapted from Communicating with Data: The Art of Writing for Data Science by Deborah Nolan and Sara Stoudt.