Module 2 Cheat Sheet

Generalized Linear Models (GLMs)

What is a GLM?

A Generalized Linear Model extends linear regression to handle:

  • Non-normal outcomes (e.g., binary, count)
  • Non-constant variance (heteroscedasticity)
  • A nonlinear relationship between the mean and the linear predictor, specified through a link function

Components of a GLM

  1. Random component: Distribution of the response (from the exponential family)
  2. Systematic component: Linear predictor \(\eta_i = x_i^T \beta\)
  3. Link function: Relates expected value \(\mu_i = E[Y_i|x_i]\) to \(\eta_i\) \[ g(\mu_i) = \eta_i = x_i^T \beta \]
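As a minimal sketch of how these three components appear in an R model fit (the data frame df and the variables y, x1, x2 are hypothetical placeholders):

# Random component: family = binomial (Bernoulli response)
# Systematic component: the formula y ~ x1 + x2 defines the linear predictor
# Link function: link = "logit" relates E[Y | x] to the linear predictor
fit <- glm(y ~ x1 + x2, family = binomial(link = "logit"), data = df)
summary(fit)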

Exponential Family Form

A distribution belongs to the exponential family if it can be written as: \[ f(y; \theta, \phi) = \exp \left[ \frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi) \right] \]

Mean and variance:

  • \(E[Y] = b'(\theta)\)
  • \(Var[Y] = b''(\theta) \cdot a(\phi)\)
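For example, the Bernoulli distribution is in this family: its probability mass function can be written as \[ f(y; p) = p^y (1 - p)^{1 - y} = \exp\left[ y \log\left(\frac{p}{1-p}\right) + \log(1 - p) \right], \] so \(\theta = \log\left(\frac{p}{1-p}\right)\), \(b(\theta) = \log(1 + e^{\theta})\), \(a(\phi) = 1\), and \(c(y, \phi) = 0\). Then \(b'(\theta) = \frac{e^{\theta}}{1 + e^{\theta}} = p\) and \(b''(\theta) \cdot a(\phi) = p(1 - p)\), recovering the Bernoulli mean and variance.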


Common GLM Examples

Logistic Regression (Binary outcome)

\[ Y_i \sim Bernoulli(p_i), \quad \log\left(\frac{p_i}{1 - p_i}\right) = x_i^T \beta \]

  • \(p_i = P(Y_i = 1)\)
  • Link: logit
  • \(\beta_j\): change in the log-odds per one-unit increase in \(x_j\)
  • \(\exp(\beta_j)\): odds ratio (OR)
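A brief sketch of fitting and interpreting a logistic regression in R (df, y, x1, x2 are hypothetical; y is coded 0/1):

# Fit the logistic regression
fit_bin <- glm(y ~ x1 + x2, family = binomial, data = df)

# Coefficients are on the log-odds scale; exponentiate for odds ratios
# (confint.default gives Wald confidence intervals)
exp(cbind(OR = coef(fit_bin), confint.default(fit_bin)))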

Poisson Regression (Count outcome)

\[ Y_i \sim Poisson(\lambda_i), \quad \log(\lambda_i) = x_i^T \beta \]

  • \(E[Y_i] = Var[Y_i] = \lambda_i\)
  • Link: log
  • \(\exp(\beta_j)\): incidence rate ratio (IRR)
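A corresponding sketch for a Poisson count model (count and x1 are hypothetical column names):

# Fit the Poisson regression
fit_count <- glm(count ~ x1, family = poisson, data = df)

# Exponentiated coefficients are incidence rate ratios
exp(coef(fit_count))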

With an Offset:

\[ \log(\lambda_i) = \log(N_i) + x_i^T \beta \]

Used when modeling rates (counts per unit of population or time); the offset \(\log(N_i)\) enters the linear predictor with its coefficient fixed at 1.
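A sketch of the offset in R (count, x1, and the exposure N are hypothetical column names):

# Model the rate count / N; log(N) is an offset, not an estimated coefficient
fit_rate <- glm(count ~ x1 + offset(log(N)), family = poisson, data = df)
exp(coef(fit_rate))   # rate ratios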


Model Assumptions

  • Logistic: independent observations, correct link, \(Var(Y_i) = p_i(1 - p_i)\)
  • Poisson: independent observations, mean = variance (no overdispersion)
  • All GLMs: correct link and linear predictor, response from an exponential family distribution

Generalized Linear Mixed Models (GLMMs)

Why GLMMs?

Use GLMMs when:

  • Observations are grouped or clustered (e.g., repeated measures, schools, hospitals)
  • There is within-group correlation
  • You need to model between-group variability

GLMM Structure

\[ g(\mu_{ij}) = x_{ij}^T \beta + z_{ij}^T u_j \]

Where:

  • \(x_{ij}\): fixed-effects covariate vector
  • \(z_{ij}\): random-effects design vector
  • \(u_j\): random effect(s) for group \(j\); for a single random intercept, \(u_j \sim N(0, \sigma_u^2)\)
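For example, a random-intercept logistic GLMM for a binary outcome \(Y_{ij}\) (observation \(i\) in group \(j\)) is \[ Y_{ij} \mid u_j \sim Bernoulli(p_{ij}), \quad \log\left(\frac{p_{ij}}{1 - p_{ij}}\right) = x_{ij}^T \beta + u_j, \quad u_j \sim N(0, \sigma_u^2) \]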


Types of Random Effects

Random-effects syntax (lme4 formula notation) and meaning:

  • (1 | group): random intercept by group
  • (0 + slope | group): random slope for a covariate by group, with no random intercept
  • (1 + slope | group): random intercept and random slope by group; in lme4, (slope | group) is shorthand for this
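A short sketch of where these terms go in a glmer() formula (y, x, group, and df are hypothetical):

library(lme4)

# Random intercept only
glmer(y ~ x + (1 | group), family = binomial, data = df)

# Random intercept and random slope for x
glmer(y ~ x + (1 + x | group), family = binomial, data = df)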

Estimation & Interpretation

  • Fit using glmer() from the lme4 package
  • Interpretation of \(\beta\): effects conditional on the random effects (cluster-specific); for non-identity links these are not the same as population-averaged effects
  • Random-effect variances tell you how much variability is due to group differences (see the extraction sketch after the examples below)

Examples

library(lme4)

# Random-intercept logistic model
m_logit <- glmer(y ~ x1 + (1 | group_id), family = binomial, data = df)

# Poisson model with an offset (rate per unit of exposure)
m_pois <- glmer(count ~ x1 + offset(log(exposure)) + (1 | site), family = poisson, data = df)
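
A minimal sketch of extracting estimates from the fitted objects above (m_logit and m_pois are the names assigned in the examples; assumes the models converged):

# Fixed effects on the link scale, then exponentiated
fixef(m_logit)
exp(fixef(m_logit))    # odds ratios
exp(fixef(m_pois))     # incidence rate ratios

# Random-effect variances: between-group variability
VarCorr(m_logit)

# Wald confidence intervals on the link scale
confint(m_logit, method = "Wald")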

Common Pitfalls & Notes

  • Check for overdispersion in Poisson models; a common rough check is the Pearson chi-square statistic divided by the residual degrees of freedom (see the sketch below)
  • For binary outcomes with few clusters, estimates may be biased (variance components especially)
  • GLMMs are sensitive to missing data and can have convergence problems

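The overdispersion check mentioned above, as a minimal sketch; m_pois is the hypothetical fitted Poisson model from the examples (the same idea works for glm() fits):

# Ratio of the Pearson chi-square statistic to the residual degrees of freedom;
# values well above 1 suggest overdispersion
pearson_res <- residuals(m_pois, type = "pearson")
sum(pearson_res^2) / df.residual(m_pois)
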
Interpretation Cheat Sheet

  • Logistic: \(\beta_j\) = change in log-odds; \(\exp(\beta_j)\) = odds ratio (OR)
  • Poisson: \(\beta_j\) = change in log count; \(\exp(\beta_j)\) = incidence rate ratio (IRR)
  • GLMM: \(\beta_j\) = fixed effect conditional on the random effects (cluster-specific); \(\exp(\beta_j)\) = cluster-specific OR or IRR