Lecture 04 AE: Estimating coefficients for Poisson Regression

Published

January 24, 2024

library(tidyverse)
library(tidymodels)
library(knitr)

hh_data <- read_csv("data/fHH1.csv")

Introduction

The data in fHH1.csv come from the 2015 Family Income and Expenditure Survey conducted by the Philippine Statistics Authority.

The overall goal is understand the association between household size and various characteristics of the household.

Response:

  • total: Number of people in the household other than the head

Predictors:

  • location: Where the house is located
  • age: Age of the head of household
  • roof: Type of roof on the residence (proxy for wealth)

Other variables:

  • numLT5: Number in the household under 5 years old

We will focus on the relationship between total and age for this AE.

Iteratively reweighted least squares

Step 1. Initial estimates of lambda

Using total + 0.1 instead of total as initial starting values to avoid dividing by 0 or taking the log of 0.

hh_data <- hh_data |>
  mutate(lambda = hh_data$total + 5)

Step 2. Calculate the working response values \(z_i\).

For Poisson regression, \(z_i = \log(\lambda) + \frac{(y_i - \lambda_i)}{\lambda_i}\)

hh_data <- hh_data |>
  mutate(z = log(lambda) + (hh_data$total - lambda) / lambda)

Step 3. Calculate the working weights \(W_i\)

For Poisson regression, \(W_i = \lambda_i\) .

hh_data <- hh_data |>
  mutate(w = lambda)
hh_data |>
  select(total, lambda, z, w) |>
  slice(1:5)
# A tibble: 5 × 4
  total lambda     z     w
  <dbl>  <dbl> <dbl> <dbl>
1     0      5 0.609     5
2     3      8 1.45      8
3     4      9 1.64      9
4     3      8 1.45      8
5     3      8 1.45      8

Step 4. Find the coefficient estimates of the weighted least squares model

wls <- lm(z ~ age, weight = w, data = hh_data)
tidy(wls) |>
  kable(digits = 4)
term estimate std.error statistic p.value
(Intercept) 1.7594 0.0419 41.9664 0e+00
age -0.0026 0.0008 -3.4172 6e-04

Use this model to get new estimates of lambda

hh_data <- hh_data |>
  mutate(lambda = exp(predict(wls)))

Repeat steps 2 - 4 until the estimates \(\hat{\beta}_0\) and \(\hat{\beta}_1\) converge to their respective values.

IRLS automated

nreps <- 10
hh_data <- hh_data %>%
  mutate(lambda = hh_data$total + 5)


betas <- tibble(beta0 = rep(0, nreps), 
                beta1 = rep(0, nreps))

for(i in 1:nreps){
  hh_data <- hh_data %>%
    mutate(z = log(lambda) + (hh_data$total - lambda)/ lambda, 
         w = lambda)
  wls <- lm(z ~ age, weight = w, data = hh_data)
  betas$beta0[i] = tidy(wls)$estimate[1]
  betas$beta1[i] = tidy(wls)$estimate[2]
  
  hh_data <- hh_data %>% 
    mutate(lambda = exp(predict(wls)))
}
  
betas
# A tibble: 10 × 2
   beta0    beta1
   <dbl>    <dbl>
 1  1.76 -0.00265
 2  1.57 -0.00414
 3  1.55 -0.00468
 4  1.55 -0.00471
 5  1.55 -0.00471
 6  1.55 -0.00471
 7  1.55 -0.00471
 8  1.55 -0.00471
 9  1.55 -0.00471
10  1.55 -0.00471

Fit using the glm function

## add code here