Module 2 Cheat Sheet
Generalized Linear Models (GLMs)
What is a GLM?
A Generalized Linear Model extends linear regression to handle:
- Non-normal outcomes (e.g., binary, count)
- Non-constant variance (heteroscedasticity)
- A nonlinear relationship between the mean and the linear predictor, via a link function
Components of a GLM
- Random component: Distribution of the response (from the exponential family)
- Systematic component: Linear predictor \(\eta_i = x_i^T \beta\)
- Link function: Relates expected value \(\mu_i = E[Y_i|x_i]\) to \(\eta_i\) \[ g(\mu_i) = \eta_i = x_i^T \beta \]
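A minimal sketch of how these components map onto R's `glm()` (assuming a hypothetical data frame `df` with a binary outcome `y` and covariates `x1`, `x2`):

```r
# family = random component; link = link function; formula RHS = systematic component
fit <- glm(y ~ x1 + x2, family = binomial(link = "logit"), data = df)
summary(fit)  # coefficients are reported on the link (log-odds) scale
```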
Exponential Family Form
A distribution belongs to the exponential family if it can be written as: \[ f(y; \theta, \phi) = \exp \left[ \frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi) \right] \]
Mean and variance:
- \(E[Y] = b'(\theta)\)
- \(Var[Y] = b''(\theta) \cdot a(\phi)\)
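For example, the Poisson(\(\lambda\)) pmf can be written in this form:
\[ f(y; \lambda) = \exp\left[ y \log\lambda - \lambda - \log(y!) \right], \quad \theta = \log\lambda, \quad b(\theta) = e^{\theta}, \quad a(\phi) = 1 \]
so \(E[Y] = b'(\theta) = e^{\theta} = \lambda\) and \(Var[Y] = b''(\theta)\, a(\phi) = \lambda\), matching the Poisson regression facts below.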
Common GLM Examples
Logistic Regression (Binary outcome)
\[ Y_i \sim \text{Bernoulli}(p_i), \quad \log\left(\frac{p_i}{1 - p_i}\right) = x_i^T \beta \]
- \(p_i = P(Y_i = 1)\)
- Link: logit
- \(\beta_j\) = change in log-odds
- \(\exp(\beta_j)\) = odds ratio (OR)
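A minimal sketch in R (hypothetical data frame `df` with binary `y` and a covariate `x1`):

```r
fit_logit <- glm(y ~ x1, family = binomial, data = df)
coef(fit_logit)       # beta_j: change in log-odds per one-unit increase in x1
exp(coef(fit_logit))  # odds ratios (OR)
```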
Poisson Regression (Count outcome)
\[ Y_i \sim \text{Poisson}(\lambda_i), \quad \log(\lambda_i) = x_i^T \beta \]
- \(E[Y_i] = Var[Y_i] = \lambda_i\)
- Link: log
- \(\exp(\beta_j)\) = incidence rate ratio (IRR)
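The analogous sketch for counts (hypothetical `count` and `x1` columns in `df`):

```r
fit_pois <- glm(count ~ x1, family = poisson, data = df)
exp(coef(fit_pois))  # incidence rate ratios (IRR)
```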
With an Offset:
\[ \log(\lambda_i) = \log(N_i) + x_i^T \beta \]
Used when modeling rates, with \(N_i\) the exposure (e.g., population size or time at risk).
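In R the offset enters as a fixed term with coefficient 1 (a sketch; `N` is a hypothetical exposure column such as population or person-time):

```r
fit_rate <- glm(count ~ x1 + offset(log(N)), family = poisson, data = df)
exp(coef(fit_rate))  # rate ratios per unit exposure
```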
Model Assumptions
| Model | Assumptions |
|---|---|
| Logistic | Independent observations, correct link, \(Var(Y_i) = p_i(1 - p_i)\) |
| Poisson | Mean = variance, independence, no overdispersion |
| All GLMs | Correct link and linear predictor, exponential family distribution |
Generalized Linear Mixed Models (GLMMs)
Why GLMMs?
Use GLMMs when:
- Observations are grouped or clustered (e.g., repeated measures, schools, hospitals)
- There is within-group correlation
- You need to model between-group variability
GLMM Structure
\[ g(\mu_{ij}) = x_{ij}^T \beta + z_{ij}^T u_j \]
Where:
- \(x_{ij}\) = fixed-effects covariates
- \(z_{ij}\) = random-effects design vector
- \(u_j \sim N(0, \sigma^2)\) = random effect for group \(j\)
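For example, with a single covariate and a random intercept only (\(z_{ij} = 1\)), this reduces to:
\[ g(\mu_{ij}) = \beta_0 + \beta_1 x_{ij} + u_j, \quad u_j \sim N(0, \sigma^2) \]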
Types of Random Effects
| Random Effects Syntax | Meaning |
|---|---|
| `(1 \| group)` | Random intercept by group |
| `(0 + slope \| group)` | Random slope only (no random intercept) by group |
| `(1 + slope \| group)` | Random intercept and random slope by group (equivalent to `(slope \| group)`) |
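In lme4 these translate directly into model formulas (a sketch assuming a covariate `x1`, grouping factor `group`, and binary outcome `y` in `df`):

```r
library(lme4)
m1 <- glmer(y ~ x1 + (1 | group),      family = binomial, data = df)  # random intercept
m2 <- glmer(y ~ x1 + (0 + x1 | group), family = binomial, data = df)  # random slope only
m3 <- glmer(y ~ x1 + (1 + x1 | group), family = binomial, data = df)  # correlated intercept + slope
```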
Estimation & Interpretation
- Fit using `glmer()` from the lme4 package
- Interpretation of \(\beta\): effects conditional on the random effects (cluster-specific), not population-averaged
- Random effect variances tell you how much variability is due to group differences
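After fitting, the pieces above can be extracted directly (a sketch assuming a fitted `glmer()` object `fit`):

```r
fixef(fit)    # fixed effects beta on the link scale
VarCorr(fit)  # random-effect variances/SDs: between-group variability
ranef(fit)    # predicted random effects u_j for each group
```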
Examples
```r
library(lme4)

# Random intercept logistic model
glmer(y ~ x1 + (1 | group_id), family = binomial, data = df)

# Poisson model with offset
glmer(count ~ x1 + offset(log(exposure)) + (1 | site), family = poisson, data = df)
```
Common Pitfalls & Notes
- Check for overdispersion in Poisson models (see the sketch after this list)
- For binary outcomes with few clusters, estimates may be biased
- GLMMs are sensitive to missing data and can have convergence problems (check optimizer warnings)
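A rough overdispersion check for a fitted Poisson `glm()` (a sketch; `fit_pois` is the hypothetical model from the Poisson example above; ratios well above 1 suggest overdispersion):

```r
# Pearson chi-square statistic divided by residual degrees of freedom
sum(residuals(fit_pois, type = "pearson")^2) / df.residual(fit_pois)
```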
Interpretation Cheat Sheet
| Model | Coefficient \(\beta_j\) | \(\exp(\beta_j)\) |
|---|---|---|
| Logistic | Change in log-odds | Odds Ratio (OR) |
| Poisson | Change in log count | Incidence Rate Ratio (IRR) |
| GLMM | Conditional (cluster-specific) effect on the link scale | Cluster-specific OR or IRR |
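A sketch of turning fitted coefficients into the quantities in this table (`fit_logit` from the logistic example above, `fit` a hypothetical `glmer()` fit):

```r
exp(cbind(OR = coef(fit_logit), confint.default(fit_logit)))  # ORs with Wald CIs
exp(fixef(fit))                                               # cluster-specific OR/IRR from a GLMM
```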