Generalized Linear Models
Jan 10, 2024
Teaching assistant
Hun Kang
PhD student in statistics
Lectures
Mondays and Wednesdays, 3:05 - 4:20pm, Physics 205
Labs
Lab 01: Thursdays, 3:05 - 4:20pm, Link #5
Lab 02: Thursdays, 4:45 - 5:55pm, Link #5
In statistics, a generalized linear model (GLM) is a flexible generalization of ordinary linear regression. The GLM generalizes linear regression by allowing the linear model to be related to the response variable via a link function and by allowing the magnitude of the variance of each measurement to be a function of its predicted value.¹
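To make the variance clause concrete: in a GLM the variance is written as Var(y) = φ·V(μ), where μ is the predicted mean, V is the family's variance function, and φ is a dispersion parameter (σ² for the normal family, 1 for Poisson and Bernoulli responses). The Python sketch below lists V(μ) for three standard families; the function names are illustrative and not taken from any particular library.

```python
# Variance functions V(mu) for three common GLM families, where Var(y) = phi * V(mu).
# Function names are illustrative only.

def gaussian_variance(mu: float) -> float:
    return 1.0  # constant: does not depend on the mean (ordinary linear regression)

def poisson_variance(mu: float) -> float:
    return mu  # variance equals the mean

def bernoulli_variance(mu: float) -> float:
    return mu * (1.0 - mu)  # mu = P(y = 1); largest at mu = 0.5

# Compare how the variance changes with the predicted mean under each family.
for mu in (0.2, 0.5):
    print(mu, gaussian_variance(mu), poisson_variance(mu), bernoulli_variance(mu))
```

Only the normal family has a variance that ignores μ, which is exactly the assumption ordinary linear regression makes.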
Example: Logistic regression
\[\begin{aligned}\pi = P(y = 1 | x) \hspace{2mm} &\Rightarrow \hspace{2mm} \text{Link function: } \log\big(\frac{\pi}{1-\pi}\big) \\ &\Rightarrow \log\big(\frac{\pi}{1-\pi}\big) = \beta_0 + \beta_1~x\end{aligned}\]
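A minimal Python sketch of the logit link and its inverse for the model above. The coefficient values in the usage line are made up for illustration; in practice β₀ and β₁ are estimated from data (for example, by maximum likelihood).

```python
import math

def logit(pi: float) -> float:
    """Link function: log-odds log(pi / (1 - pi)) for pi = P(y = 1 | x)."""
    return math.log(pi / (1.0 - pi))

def inv_logit(eta: float) -> float:
    """Inverse link: maps the linear predictor eta back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def predicted_probability(x: float, beta0: float, beta1: float) -> float:
    """pi such that log(pi / (1 - pi)) = beta0 + beta1 * x."""
    eta = beta0 + beta1 * x
    return inv_logit(eta)

# Hypothetical coefficients: beta0 = -1, beta1 = 0.5.
print(predicted_probability(2.0, beta0=-1.0, beta1=0.5))  # 0.5
```

With these hypothetical values, x = 2 gives log-odds 0 and therefore π = 0.5. Because the model is linear on the log-odds scale, a one-unit increase in x multiplies the odds π/(1 - π) by exp(β₁).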
By the end of the semester, you will be able to …
Generalized Linear Models
Modeling correlated data
Introduce multilevel models for correlated and longitudinal data
Estimation, interpretation, and inference
Mathematical details, particularly diving into covariance structures
“…we used negative binomial regression to model the association between the number of questions produced, race, and group after adjusting for the additional covariates age and years of education. Poisson and zero-inflated Poisson regression models were also considered…the negative binomial model was a good fit for the data given the overdispersion in the distribution of number of questions asked.”¹
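The quote's reasoning rests on a mean-variance comparison: a Poisson model forces Var(y) = E(y), while a negative binomial model allows Var(y) = μ + μ²/θ (the common "NB2" parameterization), which exceeds the mean. A rough first check, sketched below in Python under the simplifying assumption of no covariates, is the ratio of the sample variance to the sample mean. The counts are hypothetical; with predictors in the model, overdispersion is usually judged from the fitted model (for example, via Pearson residuals) rather than from the raw counts.

```python
from statistics import mean, variance

def dispersion_ratio(counts: list[int]) -> float:
    """Sample variance divided by sample mean (ignores covariates).

    A Poisson model implies a ratio near 1; values well above 1 suggest overdispersion.
    """
    return variance(counts) / mean(counts)

def negbin_variance(mu: float, theta: float) -> float:
    """NB2 variance mu + mu**2 / theta; approaches the Poisson variance mu as theta grows."""
    return mu + mu ** 2 / theta

# Hypothetical counts (e.g., questions asked per participant), not data from the quoted study.
counts = [0, 0, 1, 2, 7]
print(dispersion_ratio(counts))  # 4.25
```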
“…a logistic regression model is used to test how the likelihood of a foul is affected by which team is the home team, the foul differential, and the score differential…The logistic regression was run under several specifications … using clustered observation standard errors, with each game as a cluster. This is done as an attempt to adjust for the fact that observations may not be independent as required under the logistic specification.”¹
Get in groups of 2 - 3
Each person in the group…
Everyone will introduce one person from your group to the class
Pre-reqs
STA 210 and STA 230 / STA 240
Background knowledge
Statistical methods
Computing
Canvas: canvas.duke.edu/courses/25310
GitHub: github.com/sta310-sp24
Slack (link in Canvas)
Lectures
Labs (start January 18)
Primary textbook: Beyond Multiple Linear Regression by Roback and Legler
Other texts:
R for Data Science (2nd edition) by Wickham, Çetinkaya-Rundel, and Grolemund
Tidy Modeling with R by Kuhn and Silge
Articles and videos periodically assigned
1️⃣ Install R and RStudio on your laptop
or
2️⃣ Access RStudio through a Docker container provided by Duke OIT
GitHub course organization: github.com/sta310-sp24
Will receive and submit assignments through a private GitHub repo in the course GitHub organization
Will receive assignment feedback as a GitHub issue. Final grades on each assignment will be available in Canvas
All work and feedback are private
Online discussion forum (like Piazza, Ed Discussion, etc.)
Platform to ask questions about course content, logistics, assignments, etc.
Content organized by channels. Before posting, please browse previous posts to see if your question has already been answered. If not, please post your question in the relevant channel.
Questions about grades, absences, and other private matters should be emailed to me with “STA 310” in the subject line.
6 individual online quizzes
Covers content since the previous quiz, including readings, lecture notes, in-class activities, and homework
Lowest quiz grade is dropped
Project 01 (Team project, 10%)
Project 02 (Team project, 10%)
Final project (20%)
Individual project to apply what you’ve learned to analyze correlated data
Includes a write-up
Final grades will be calculated as follows:
| Category | Percentage |
|---|---|
| Homework | 40% |
| Project 01 | 10% |
| Project 02 | 10% |
| Final project | 20% |
| Quizzes | 20% |
See syllabus for letter grade thresholds.
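A minimal sketch of the weighted-average calculation in Python, assuming each category has already been summarized as a 0-100 score and that the lowest of the six quiz scores is dropped before averaging. How scores are averaged within homework or projects is not specified here, so that part is left to the caller, and the syllabus letter-grade thresholds are not modeled. All names are hypothetical.

```python
# Category weights from the grading table (they sum to 1).
WEIGHTS = {
    "homework": 0.40,
    "project_01": 0.10,
    "project_02": 0.10,
    "final_project": 0.20,
    "quizzes": 0.20,
}

def quiz_average(quiz_scores: list[float]) -> float:
    """Average quiz score after dropping the single lowest quiz."""
    kept = sorted(quiz_scores)[1:]
    return sum(kept) / len(kept)

def final_grade(scores: dict[str, float], quiz_scores: list[float]) -> float:
    """Weighted course grade on a 0-100 scale.

    `scores` holds 0-100 summaries for every category except quizzes.
    """
    combined = dict(scores, quizzes=quiz_average(quiz_scores))
    return sum(WEIGHTS[category] * combined[category] for category in WEIGHTS)

# Hypothetical student: six quizzes, the 50 is dropped, leaving a quiz average of 80.
example = {"homework": 90, "project_01": 80, "project_02": 100, "final_project": 85}
print(round(final_grade(example, [50, 80, 90, 100, 70, 60]), 1))  # 87.0
```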
Uphold the Duke Community Standard:
- I will not lie, cheat, or steal in my academic endeavors;
- I will conduct myself honorably in all my endeavors;
- I will act if the Standard is compromised.
Commit to respect, honor, and celebrate our diverse community
Commit to being part of a learning environment that is welcoming and accessible to everyone
The Student Disability Access Office (SDAO) is available to ensure that students are able to engage with their courses and related assignments.
If you have documented accommodations from SDAO, please send the documentation as soon as possible.
I am committed to making all course activities and materials accessible. If any course component is not accessible to you in any way, please don’t hesitate to let me know.
Office hours to meet with a member of the teaching team.
Slack for questions about course logistics, content, and assignments
Email for questions not appropriate for Slack, e.g., regarding personal matters or grades
See the syllabus and support page for additional academic, mental health, and wellness resources
Please do not come to class if you have tested positive for COVID-19, have possible symptoms and have not yet been tested, or are otherwise ill.
Read and follow the university guidelines regarding COVID-19 at coronavirus.duke.edu.
Homework will be accepted up to 48 hours after the deadline. There will be a 5% deduction for each 24-hour period the assignment is late.
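One way to read the homework policy as arithmetic, sketched in Python. Two assumptions are not stated in the policy and are labeled in the code: a partially elapsed 24-hour period counts as a full period, and each 5% deduction is taken as 5 points on a 0-100 scale. The function name is hypothetical.

```python
import math
from typing import Optional

def late_homework_score(raw_score: float, hours_late: float) -> Optional[float]:
    """Homework score after the late deduction, or None if the work is not accepted."""
    if hours_late <= 0:
        return raw_score  # on time: no deduction
    if hours_late > 48:
        return None  # not accepted more than 48 hours after the deadline
    # ASSUMPTION: any started 24-hour period counts as a full period (so 1 or 2 periods).
    periods = math.ceil(hours_late / 24)
    # ASSUMPTION: "5% deduction" means 5 points off a 0-100 score per period.
    return max(raw_score - 5 * periods, 0.0)

print(late_homework_score(92, hours_late=30))  # 82
```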
No late work is accepted on quizzes, and there are no makeups for missed quizzes.
Late policy for projects:
Presentation: Late presentations are not accepted and there are no makeups for missed presentations.
Write-up: There will be a 5% deduction for write-ups submitted late but the same day, a 10% deduction for write-ups submitted the next day, and a 15% deduction for write-ups submitted two days late (by 11:59pm). No credit is given for write-ups submitted more than 2 days after the deadline.
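The write-up schedule can likewise be sketched in Python. This assumes "same day", "next day", and "two days late" refer to calendar dates relative to the deadline's date, and it returns the deduction in percent, with 100 standing for no credit. The function name and the deadline in the usage line are hypothetical.

```python
from datetime import datetime

# Percent deducted, keyed by calendar days after the deadline's date.
WRITEUP_DEDUCTIONS = {0: 5, 1: 10, 2: 15}

def writeup_deduction(deadline: datetime, submitted: datetime) -> int:
    """Percent deducted from a project write-up; 100 means no credit."""
    if submitted <= deadline:
        return 0  # on time
    # ASSUMPTION: lateness is counted in calendar days, so "two days late" runs to 11:59pm.
    days_late = (submitted.date() - deadline.date()).days
    return WRITEUP_DEDUCTIONS.get(days_late, 100)

deadline = datetime(2024, 3, 1, 17, 0)  # hypothetical 5:00pm deadline
print(writeup_deduction(deadline, datetime(2024, 3, 2, 9, 30)))  # 10 (next day)
```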
Peer evaluation: No late work is accepted on peer evaluations.
You should treat AI tools, such as ChatGPT, the same as other online resources.
There are two guiding principles that govern how you can use AI in this course:¹
(1) Cognitive dimension: Working with AI should not reduce your ability to think clearly. We will practice using AI to facilitate—rather than hinder—learning.
(2) Ethical dimension: Students using AI should be transparent about their use and make sure it aligns with academic integrity.
✅ AI tools for code: You may make use of the technology for coding examples on assignments; if you do so, you must explicitly cite where you obtained the code.
❌ No AI tools for narrative: Unless instructed otherwise, AI is not permitted for writing narrative on assignments.
Important
In general, you may use AI as a resource as you complete assignments but not to answer the exercises for you. You are ultimately responsible for the work you turn in; it should reflect your understanding of the course content.
See the announcement on Canvas and complete the following:
(Wednesday, January 17)
Understand statistical models
Review multiple linear regression