library(tidyverse)
library(tidymodels)
library(knitr)
<- read_csv("data/fHH1.csv") hh_data
Lecture 04 AE: Estimating coefficients for Poisson Regression
Introduction
The data in fHH1.csv
come from the 2015 Family Income and Expenditure Survey conducted by the Philippine Statistics Authority.
The overall goal is understand the association between household size and various characteristics of the household.
Response:
total
: Number of people in the household other than the head
Predictors:
location
: Where the house is locatedage
: Age of the head of householdroof
: Type of roof on the residence (proxy for wealth)
Other variables:
numLT5
: Number in the household under 5 years old
We will focus on the relationship between total
and age
for this AE.
Iteratively reweighted least squares
Step 1. Initial estimates of lambda
Using total + 0.1
instead of total
as initial starting values to avoid dividing by 0 or taking the log of 0.
<- hh_data |>
hh_data mutate(lambda = hh_data$total + 5)
Step 2. Calculate the working response values \(z_i\).
For Poisson regression, \(z_i = \log(\lambda) + \frac{(y_i - \lambda_i)}{\lambda_i}\)
<- hh_data |>
hh_data mutate(z = log(lambda) + (hh_data$total - lambda) / lambda)
Step 3. Calculate the working weights \(W_i\)
For Poisson regression, \(W_i = \lambda_i\) .
<- hh_data |>
hh_data mutate(w = lambda)
|>
hh_data select(total, lambda, z, w) |>
slice(1:5)
# A tibble: 5 × 4
total lambda z w
<dbl> <dbl> <dbl> <dbl>
1 0 5 0.609 5
2 3 8 1.45 8
3 4 9 1.64 9
4 3 8 1.45 8
5 3 8 1.45 8
Step 4. Find the coefficient estimates of the weighted least squares model
<- lm(z ~ age, weight = w, data = hh_data)
wls tidy(wls) |>
kable(digits = 4)
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | 1.7594 | 0.0419 | 41.9664 | 0e+00 |
age | -0.0026 | 0.0008 | -3.4172 | 6e-04 |
Use this model to get new estimates of lambda
<- hh_data |>
hh_data mutate(lambda = exp(predict(wls)))
Repeat steps 2 - 4 until the estimates \(\hat{\beta}_0\) and \(\hat{\beta}_1\) converge to their respective values.
IRLS automated
<- 10
nreps <- hh_data %>%
hh_data mutate(lambda = hh_data$total + 5)
<- tibble(beta0 = rep(0, nreps),
betas beta1 = rep(0, nreps))
for(i in 1:nreps){
<- hh_data %>%
hh_data mutate(z = log(lambda) + (hh_data$total - lambda)/ lambda,
w = lambda)
<- lm(z ~ age, weight = w, data = hh_data)
wls $beta0[i] = tidy(wls)$estimate[1]
betas$beta1[i] = tidy(wls)$estimate[2]
betas
<- hh_data %>%
hh_data mutate(lambda = exp(predict(wls)))
}
betas
# A tibble: 10 × 2
beta0 beta1
<dbl> <dbl>
1 1.76 -0.00265
2 1.57 -0.00414
3 1.55 -0.00468
4 1.55 -0.00471
5 1.55 -0.00471
6 1.55 -0.00471
7 1.55 -0.00471
8 1.55 -0.00471
9 1.55 -0.00471
10 1.55 -0.00471
Fit using the glm
function
## add code here