library(tidyverse)
library(tidymodels)
library(knitr)
hh_data <- read_csv("data/fHH1.csv")Lecture 04 AE: Estimating coefficients for Poisson Regression
Introduction
The data in fHH1.csv come from the 2015 Family Income and Expenditure Survey conducted by the Philippine Statistics Authority.
The overall goal is understand the association between household size and various characteristics of the household.
Response:
total: Number of people in the household other than the head
Predictors:
location: Where the house is locatedage: Age of the head of householdroof: Type of roof on the residence (proxy for wealth)
Other variables:
numLT5: Number in the household under 5 years old
We will focus on the relationship between total and age for this AE.
Iteratively reweighted least squares
Step 1. Initial estimates of lambda
Using total + 0.1 instead of total as initial starting values to avoid dividing by 0 or taking the log of 0.
hh_data <- hh_data |>
mutate(lambda = hh_data$total + 5)Step 2. Calculate the working response values .
For Poisson regression,
hh_data <- hh_data |>
mutate(z = log(lambda) + (hh_data$total - lambda) / lambda)Step 3. Calculate the working weights
For Poisson regression,
hh_data <- hh_data |>
mutate(w = lambda)hh_data |>
select(total, lambda, z, w) |>
slice(1:5)# A tibble: 5 × 4
total lambda z w
<dbl> <dbl> <dbl> <dbl>
1 0 5 0.609 5
2 3 8 1.45 8
3 4 9 1.64 9
4 3 8 1.45 8
5 3 8 1.45 8
Step 4. Find the coefficient estimates of the weighted least squares model
wls <- lm(z ~ age, weight = w, data = hh_data)
tidy(wls) |>
kable(digits = 4)| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 1.7594 | 0.0419 | 41.9664 | 0e+00 |
| age | -0.0026 | 0.0008 | -3.4172 | 6e-04 |
Use this model to get new estimates of lambda
hh_data <- hh_data |>
mutate(lambda = exp(predict(wls)))Repeat steps 2 - 4 until the estimates
IRLS automated
nreps <- 10
hh_data <- hh_data %>%
mutate(lambda = hh_data$total + 5)
betas <- tibble(beta0 = rep(0, nreps),
beta1 = rep(0, nreps))
for(i in 1:nreps){
hh_data <- hh_data %>%
mutate(z = log(lambda) + (hh_data$total - lambda)/ lambda,
w = lambda)
wls <- lm(z ~ age, weight = w, data = hh_data)
betas$beta0[i] = tidy(wls)$estimate[1]
betas$beta1[i] = tidy(wls)$estimate[2]
hh_data <- hh_data %>%
mutate(lambda = exp(predict(wls)))
}
betas# A tibble: 10 × 2
beta0 beta1
<dbl> <dbl>
1 1.76 -0.00265
2 1.57 -0.00414
3 1.55 -0.00468
4 1.55 -0.00471
5 1.55 -0.00471
6 1.55 -0.00471
7 1.55 -0.00471
8 1.55 -0.00471
9 1.55 -0.00471
10 1.55 -0.00471
Fit using the glm function
## add code here