Module 2 Cheat Sheet

Generalized Linear Models (GLMs)

What is a GLM?

A Generalized Linear Model extends linear regression to handle:

  • Non-normal outcomes (e.g., binary, count)
  • Non-constant variance (heteroscedasticity)
  • A nonlinear relationship between the mean and the linear predictor, specified through a link function

Components of a GLM

  1. Random component: Distribution of the response (from the exponential family)
  2. Systematic component: Linear predictor \(\eta_i = x_i^T \beta\)
  3. Link function: Relates expected value \(\mu_i = E[Y_i|x_i]\) to \(\eta_i\) \[ g(\mu_i) = \eta_i = x_i^T \beta \]
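As a minimal sketch of how these three components appear in an R model fit (the data frame df and the variables y, x1, x2 are hypothetical placeholders):

# Random component: family = binomial (Bernoulli response)
# Systematic component: the formula y ~ x1 + x2 defines the linear predictor
# Link function: link = "logit" relates E[Y | x] to the linear predictor
fit <- glm(y ~ x1 + x2, family = binomial(link = "logit"), data = df)
summary(fit)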

Exponential Family Form

A distribution belongs to the exponential family if it can be written as: \[ f(y; \theta, \phi) = \exp \left[ \frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi) \right] \]

Mean and variance:

  • \(E[Y] = b'(\theta)\)
  • \(Var[Y] = b''(\theta) \cdot a(\phi)\)
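For example, the Bernoulli distribution is in this family: its probability mass function can be written as \[ f(y; p) = p^y (1 - p)^{1 - y} = \exp\left[ y \log\left(\frac{p}{1-p}\right) + \log(1 - p) \right], \] so \(\theta = \log\left(\frac{p}{1-p}\right)\), \(b(\theta) = \log(1 + e^{\theta})\), \(a(\phi) = 1\), and \(c(y, \phi) = 0\). Then \(b'(\theta) = \frac{e^{\theta}}{1 + e^{\theta}} = p\) and \(b''(\theta) \cdot a(\phi) = p(1 - p)\), recovering the Bernoulli mean and variance.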


Common GLM Examples

Logistic Regression (Binary outcome)

\[ Y_i \sim Bernoulli(p_i), \quad \log\left(\frac{p_i}{1 - p_i}\right) = x_i^T \beta \]

  • \(p_i = P(Y_i = 1)\)
  • Link: logit
  • \(\beta_j\): change in the log-odds per one-unit increase in \(x_j\)
  • \(\exp(\beta_j)\): odds ratio (OR)
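A brief sketch of fitting and interpreting a logistic regression in R (df, y, x1, x2 are hypothetical; y is coded 0/1):

# Fit the logistic regression
fit_bin <- glm(y ~ x1 + x2, family = binomial, data = df)

# Coefficients are on the log-odds scale; exponentiate for odds ratios
# (confint.default gives Wald confidence intervals)
exp(cbind(OR = coef(fit_bin), confint.default(fit_bin)))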

Poisson Regression (Count outcome)

\[ Y_i \sim Poisson(\lambda_i), \quad \log(\lambda_i) = x_i^T \beta \]

  • \(E[Y_i] = Var[Y_i] = \lambda_i\)
  • Link: log
  • \(\exp(\beta_j)\): incidence rate ratio (IRR)
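A corresponding sketch for a Poisson count model (count and x1 are hypothetical column names):

# Fit the Poisson regression
fit_count <- glm(count ~ x1, family = poisson, data = df)

# Exponentiated coefficients are incidence rate ratios
exp(coef(fit_count))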

With an Offset:

\[ \log(\lambda_i) = \log(N_i) + x_i^T \beta \]

Used when modeling rates (counts per unit of population or time); the offset \(\log(N_i)\) enters the linear predictor with its coefficient fixed at 1.
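A sketch of the offset in R (count, x1, and the exposure N are hypothetical column names):

# Model the rate count / N; log(N) is an offset, not an estimated coefficient
fit_rate <- glm(count ~ x1 + offset(log(N)), family = poisson, data = df)
exp(coef(fit_rate))   # rate ratios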


Model Assumptions

  • Logistic: independent observations, correct link, \(Var(Y_i) = p_i(1 - p_i)\)
  • Poisson: independent observations, mean = variance (no overdispersion)
  • All GLMs: correct link and linear predictor, response from an exponential family distribution

Generalized Linear Mixed Models (GLMMs)

Why GLMMs?

Use GLMMs when:

  • Observations are grouped or clustered (e.g., repeated measures, schools, hospitals)
  • There is within-group correlation
  • You need to model between-group variability

GLMM Structure

\[ g(\mu_{ij}) = x_{ij}^T \beta + z_{ij}^T u_j \]

Where:

  • \(x_{ij}\): fixed-effects covariate vector
  • \(z_{ij}\): random-effects design vector
  • \(u_j\): random effect(s) for group \(j\); for a single random intercept, \(u_j \sim N(0, \sigma_u^2)\)
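For example, a random-intercept logistic GLMM for a binary outcome \(Y_{ij}\) (observation \(i\) in group \(j\)) is \[ Y_{ij} \mid u_j \sim Bernoulli(p_{ij}), \quad \log\left(\frac{p_{ij}}{1 - p_{ij}}\right) = x_{ij}^T \beta + u_j, \quad u_j \sim N(0, \sigma_u^2) \]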


Types of Random Effects

Random-effects syntax (lme4 formula notation) and meaning:

  • (1 | group): random intercept by group
  • (0 + slope | group): random slope for a covariate by group, with no random intercept
  • (1 + slope | group): random intercept and random slope by group; in lme4, (slope | group) is shorthand for this
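A short sketch of where these terms go in a glmer() formula (y, x, group, and df are hypothetical):

library(lme4)

# Random intercept only
glmer(y ~ x + (1 | group), family = binomial, data = df)

# Random intercept and random slope for x
glmer(y ~ x + (1 + x | group), family = binomial, data = df)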

Estimation & Interpretation

  • Fit using glmer() from the lme4 package
  • Interpretation of \(\beta\): effects conditional on the random effects (cluster-specific); for non-identity links these are not the same as population-averaged effects
  • Random-effect variances tell you how much variability is due to group differences (see the extraction sketch after the examples below)

Examples

library(lme4)

# Random-intercept logistic model
m_logit <- glmer(y ~ x1 + (1 | group_id), family = binomial, data = df)

# Poisson model with an offset (rate per unit of exposure)
m_pois <- glmer(count ~ x1 + offset(log(exposure)) + (1 | site), family = poisson, data = df)
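
A minimal sketch of extracting estimates from the fitted objects above (m_logit and m_pois are the names assigned in the examples; assumes the models converged):

# Fixed effects on the link scale, then exponentiated
fixef(m_logit)
exp(fixef(m_logit))    # odds ratios
exp(fixef(m_pois))     # incidence rate ratios

# Random-effect variances: between-group variability
VarCorr(m_logit)

# Wald confidence intervals on the link scale
confint(m_logit, method = "Wald")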

Common Pitfalls & Notes

  • Check for overdispersion in Poisson models; a common rough check is the Pearson chi-square statistic divided by the residual degrees of freedom (see the sketch below)
  • For binary outcomes with few clusters, estimates may be biased (variance components especially)
  • GLMMs are sensitive to missing data and can have convergence problems

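The overdispersion check mentioned above, as a minimal sketch; m_pois is the hypothetical fitted Poisson model from the examples (the same idea works for glm() fits):

# Ratio of the Pearson chi-square statistic to the residual degrees of freedom;
# values well above 1 suggest overdispersion
pearson_res <- residuals(m_pois, type = "pearson")
sum(pearson_res^2) / df.residual(m_pois)
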
Interpretation Cheat Sheet

  • Logistic: \(\beta_j\) = change in log-odds; \(\exp(\beta_j)\) = odds ratio (OR)
  • Poisson: \(\beta_j\) = change in log count; \(\exp(\beta_j)\) = incidence rate ratio (IRR)
  • GLMM: \(\beta_j\) = fixed effect conditional on the random effects (cluster-specific); \(\exp(\beta_j)\) = cluster-specific OR or IRR