G The Generalized Linear Model

G.1 The General Formulation

First we provide a brief review of the linear regression model \[ Y_i = \boldsymbol{x}_i^{\top} \boldsymbol{\beta} + \epsilon_i \, \mbox{ with } \, \epsilon_i \sim \mathop{\mathrm{N}}(0, \sigma^2) \quad i=1,\dots,n, \] with explanatory variables \(\boldsymbol{x}_i \in \mathbb{R}^p\) and regression coefficients \(\boldsymbol{\beta} \in \mathbb{R}^p\), rewritten as \[Y_i \sim \mathop{\mathrm{N}}(\mu_i=\boldsymbol{ x}_i^{\top} \boldsymbol{ \beta}, \sigma^2)\] with mean \(\mu_i\) and variance \(\sigma^2\).

This can be generalized in two ways: first, we can consider distributions other than the normal distribution for the response variable \(Y_i\) (for example, Bernoulli, Binomial, Poisson, \(\ldots\)). Second, the mean \(\mathop{\mathrm{\mathsf{E}}}(Y_i)=\mu_i\) is linked to the linear predictor \(\eta_i = \boldsymbol{ x}_i^{\top} \boldsymbol{ \beta}\) using a (monotone) link function \(g = h^{-1}\): \[ g(\mu_i) = \eta_i \, \mbox{ so } \, \mu_i = h(\eta_i). \] Notice how this differs from general linear models (Appendix F.2): general linear models assume a normal distribution of the residuals, while generalized linear models relax this assumption and allow for other distributions from the exponential family. A linear model can be viewed as a generalized linear model with normal errors and identity link.
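For illustration, here is a minimal sketch in R (with simulated data and illustrative variable names) showing that a linear model and the corresponding Gaussian generalized linear model with identity link give identical coefficient estimates:

set.seed(1)
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100)
coef(lm(y ~ x))                      # ordinary linear model
coef(glm(y ~ x, family = gaussian))  # same estimates: identity link, normal errors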

G.1.1 Logistic regression

The first generalization replaces the normality assumption on \(Y_i\) in the linear model by other distributions for \(Y_i\).

For logistic regression, consider \[\begin{align*} Y_i &\sim \mathop{\mathrm{Bin}}(1, \mu_i=\pi_i) \\ \mathop{\mathrm{logit}}(\pi_i) &= \boldsymbol{x}_i^{\top} \boldsymbol{\beta}\\ \pi_i &= \text{expit}(\boldsymbol{ x}_i^{\top} \boldsymbol{ \beta}). \end{align*}\] So here the link function is \[ g(\pi_i) = \mathop{\mathrm{logit}}(\pi_i) = \log \frac{\pi_i}{1-\pi_i}. \]

Except for the intercept, we can interpret \(\exp(\beta_k)\), \(k=1,\ldots,p\), as an odds ratio in this model. Also notice that the variance is a function of the mean: \(\mathop{\mathrm{Var}}(Y_i)=\pi_i(1-\pi_i)\).

If the covariates \(\boldsymbol{x}_i\) are identical for a group of \(n_j\) observations \(Y_i\), then aggregation to the binomial form \(Y_j = \sum_{i} Y_i \sim \mathop{\mathrm{Bin}}(n_j, \pi_j)\) is possible.

An example of a logistic regression can be found in Section 8.2.1.
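For illustration, the following minimal sketch (with simulated data and illustrative variable names) fits a logistic regression with glm() and exponentiates the coefficients to obtain odds ratios:

set.seed(1)
n <- 500
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 0.8 * x))  # plogis() is the expit function
fit <- glm(y ~ x, family = binomial)                     # logit link is the default
exp(coef(fit))  # exp(beta_k): odds ratios (the intercept gives the baseline odds)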

G.1.2 Log-linear Poisson regression

In addition to binary data, we may consider count data, for which the random variable \(Y_i\) follows a Poisson distribution.

Poisson regression assumes \[\begin{align*} Y_i &\sim \mathop{\mathrm{Po}}(\mu_i),\\ \log(\mu_i) &= \boldsymbol{x}_i^{\top} \boldsymbol{\beta},\\ \mu_i &= \exp( \boldsymbol{x}_i^{\top} \boldsymbol{\beta} ), \end{align*}\] so the link function in this case is \[g(\mu_i) = \log(\mu_i).\] As a consequence, \(\exp(\beta_k)\), \(k=1,\ldots,p\), can be interpreted as a multiplicative rate ratio for the expected count. It is also important to note that the variance increases with the mean in this model: \(\mathop{\mathrm{Var}}(Y_i)=\mathop{\mathrm{\mathsf{E}}}(Y_i)=\mu_i\).

If several observations share identical covariate values \(\boldsymbol{x}_i\), then they can be combined into a single grouped Poisson response with outcome \(Y_j = \sum_i Y_i\) and corresponding mean \(\mu_j = \sum_i \mu_i\).
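For illustration, a minimal sketch of a log-linear Poisson regression on simulated data (variable names are illustrative) is:

set.seed(1)
n <- 500
x <- rnorm(n)
y <- rpois(n, lambda = exp(0.3 + 0.6 * x))  # log link: mu_i = exp(eta_i)
fit <- glm(y ~ x, family = poisson)         # log link is the default
exp(coef(fit))                              # multiplicative rate ratios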

G.2 Generalized estimating equations

G.2.1 Quasi-likelihood

Quasi-likelihood is used to deal with overdispersion, which means that the variance of the outcome is greater than what is assumed by the model. For example, in a Poisson regression model we have the assumption \(\mathop{\mathrm{Var}}(Y_i)=\mathop{\mathrm{\mathsf{E}}}(Y_i)=\lambda_i\), but a more flexible quasi-likelihood model for the variance of \(Y_i\) is \(\mathop{\mathrm{Var}}^*(Y_i)=\phi \cdot \lambda_i\). When \(\phi>1\), there is overdispersion.

The score equation in the ordinary generalized linear model is given by \[\begin{equation} {S}(\boldsymbol{\beta}) = \sum_{i=1}^m \left(\frac{\partial {\mu}_i}{\partial \boldsymbol{\beta}}\right) \mathop{\mathrm{Var}}(Y_i)^{-1} (Y_i - \mu_i) = {0}. \tag{G.1} \end{equation}\] To increase flexibility, the quasi-likelihood approach considers the generalized estimating equation with some working variance function \(\mathop{\mathrm{Var}}^*(Y_i)\). \[ {S}^*(\boldsymbol{\beta}) = \sum_{i=1}^m \left(\frac{\partial {\mu}_i}{\partial \boldsymbol{\beta}}\right) {\mathop{\mathrm{Var}}}^*(Y_i)^{-1} (Y_i - \mu_i). \] Then the estimate \(\widehat{\boldsymbol{\beta}}\) is the root of \({S}^*(\boldsymbol{\beta})\).

G.2.1.1 Quasi-Poisson and quasi-binomial models

Two widely used extensions of the quasi-likelihood framework are the quasi-Poisson and quasi-binomial models.

Quasi-Poisson:

The quasi-Poisson model relaxes the standard assumption by introducing a dispersion parameter \(\phi\), so that \[ {\mathop{\mathrm{Var}}}^*(Y_i) = \phi \cdot \mathop{\mathrm{Var}}(Y_i), \quad \phi > 0. \] This allows the model to handle overdispersion (\(\phi > 1\)) or underdispersion (\(\phi < 1\)) in count data, while keeping the mean structure unchanged. Notice that the estimates of \(\boldsymbol{\beta}\) are the same in the models with and without overdispersion if \({\mathop{\mathrm{Var}}}^*(Y_i) = \phi \cdot \mathop{\mathrm{Var}}(Y_i)\), but the standard errors of \(\hat{\boldsymbol{\beta}}\) are multiplied by \(\sqrt{\hat{\phi}}\) and the variances by \(\hat{\phi}\).

We can estimate the dispersion parameter \(\phi\) by \(\hat{\phi}=W/(n-p)\), where \(W\) is a goodness-of-fit statistic, typically the Pearson statistic \(\sum_{i=1}^n (y_i - \hat{\mu}_i)^2/\hat{\mu}_i\) or the deviance (a likelihood ratio statistic), and \(n-p\) are the associated degrees of freedom.
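As a minimal sketch (with simulated, deliberately overdispersed counts and illustrative variable names), the quasi-Poisson model can be fitted in R with the quasipoisson family; the point estimates agree with the Poisson fit, but the dispersion is estimated from the data:

set.seed(1)
n <- 200
x <- rnorm(n)
y <- rnbinom(n, mu = exp(1 + 0.5 * x), size = 2)  # counts with extra-Poisson variation
fit.pois  <- glm(y ~ x, family = poisson)
fit.quasi <- glm(y ~ x, family = quasipoisson)
cbind(coef(fit.pois), coef(fit.quasi))  # identical point estimates
summary(fit.quasi)$dispersion           # estimate of phi, well above 1 here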

Quasi-binomial:

In standard binomial regression with \(n_i\) trials, the variance is \(\mathop{\mathrm{Var}}(Y_i) = n_i \pi_i (1-\pi_i)\), where \(\pi_i\) is the success probability. The quasi-binomial model generalizes this to \[ {\mathop{\mathrm{Var}}}^*(Y_i) = \phi \cdot n_i \pi_i (1-\pi_i), \] again introducing a dispersion parameter \(\phi\). This is particularly useful in cases where the observed proportion of successes shows greater variability than expected under the binomial model.
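A corresponding sketch for grouped binomial data (again simulated, with illustrative variable names) uses the quasibinomial family:

set.seed(1)
m <- 50                                # number of groups
size <- rep(20, m)                     # n_i trials per group
p <- plogis(0.2 + rnorm(m, sd = 0.8))  # group-level variation in pi_i
succ <- rbinom(m, size = size, prob = p)
fit.qb <- glm(cbind(succ, size - succ) ~ 1, family = quasibinomial)
summary(fit.qb)$dispersion             # estimate of phi, well above 1 here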

G.2.2 Multivariate quasi-likelihood

For longitudinal data, a multivariate quasi-likelihood approach explicitly allows for correlation between observations made on the same individual.

Recall from Appendix F.2.1, Equation (F.4), that the weighted least-squares estimate is \[\widehat{\boldsymbol{\beta}}_{\boldsymbol{W}} = \left( \sum_{i=1}^m \boldsymbol{X}_i^{\top}\boldsymbol{W}_0 \boldsymbol{X}_i \right)^{-1} \sum_{i=1}^m \boldsymbol{X}_i^{\top}\boldsymbol{W}_0 \boldsymbol{Y}_i\] when the working correlation matrix \(\boldsymbol{W}^{-1}\) is block-diagonal with entries \(\boldsymbol{W}_0^{-1}\).

\(\hat{\boldsymbol{\beta}}_{\boldsymbol{W}}\) can be seen as solution of the multivariate score equation \[\begin{equation} \boldsymbol{S}(\boldsymbol{\beta}) = \sum_{i=1}^m \left(\frac{\partial \boldsymbol{\mu}_i}{\partial \boldsymbol{\beta}}\right) \boldsymbol{W}_0 (\boldsymbol{Y}_i - \boldsymbol{\mu}_i)=\boldsymbol{0}, \tag{G.2} \end{equation}\] where \(\boldsymbol{\mu}_i=\mathop{\mathrm{\mathsf{E}}}(\boldsymbol{Y}_i)=\boldsymbol{X}_i \boldsymbol{\beta}\) and therefore \(\left(\frac{\partial \boldsymbol{\mu}_i}{\partial\boldsymbol{\beta}}\right)=\boldsymbol{X}_i^{\top}\).

The score function (G.2) differs from the score equation in the ordinary generalized linear model (G.1) by replacing \(\mathop{\mathrm{Var}}(Y_i)^{-1}\) with a weight matrix \(\boldsymbol{W}_0\).

The basic idea is to use the multivariate quasi-likelihood approach for non-normal data to take into account correlation between components of \(Y_i\). Therefore we have the generalized estimating equation \[\begin{equation} \boldsymbol{S}(\boldsymbol{\beta}) = \sum_{i=1}^m \left(\frac{\partial \boldsymbol{\mu}_i}{\partial \boldsymbol{\beta}}\right) \mathop{\mathrm{Cov}}(\boldsymbol{Y}_i)^{-1} (\boldsymbol{Y}_i - \boldsymbol{\mu}_i)=\boldsymbol{0}, \tag{G.3} \end{equation}\] where the multivariate \(\mathop{\mathrm{Cov}}(\boldsymbol{Y}_i)^{-1}\) allows for dependence and can be replaced with some working correlation matrix \(\boldsymbol{W}^{-1}\). This approach is called quasi-likelihood because typically no true likelihood function exists with (G.3) as the score equation.

It is possible to obtain robust standard errors using the sandwich formula. The estimates \(\hat{\boldsymbol{\beta}}\) (and the estimates \(\hat{\boldsymbol{\alpha}}\) of any parameters in the working correlation matrix) are asymptotically normal.

The estimate \(\hat{\boldsymbol{\beta}}\) is consistent (for \(m \rightarrow \infty\)) even if the working correlation structure is misspecified, that is, if \(\mathop{\mathrm{Corr}}(\boldsymbol{Y}_i) \neq \boldsymbol{W}^{-1}\); it is efficient if \(\mathop{\mathrm{Corr}}(\boldsymbol{Y}_i) \approx \boldsymbol{W}^{-1}\).
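As a sketch of how such a model might be fitted in practice (assuming the geepack package is installed and that dat is a hypothetical data frame with a binary response y, a covariate x and a subject identifier id), one could use:

library(geepack)
## exchangeable working correlation; summary() reports robust (sandwich) standard errors
fit.gee <- geeglm(y ~ x, family = binomial, data = dat, id = id,
                  corstr = "exchangeable")
summary(fit.gee)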

G.2.3 Bounds of correlation between binary variables

Here we discuss the bounds of the correlation between binary variables. Consider two binary variables \(Y_1\) and \(Y_2\) with success probabilities \(\pi_1\) and \(\pi_2\), say. One can show \[\max(0,\pi_1+\pi_2-1) \leq \operatorname{\mathsf{Pr}}(Y_1=1,Y_2=1) \leq \min(\pi_1,\pi_2).\] This implies certain constraints on the correlation \[\begin{align*} \rho(Y_1,Y_2) &= \frac{\mathop{\mathrm{\mathsf{E}}}(Y_1 Y_2) - \mathop{\mathrm{\mathsf{E}}}(Y_1) \mathop{\mathrm{\mathsf{E}}}(Y_2)}{\{\mathop{\mathrm{Var}}(Y_1)\mathop{\mathrm{Var}}(Y_2)\}^{1/2}} \\ &=\frac{\operatorname{\mathsf{Pr}}(Y_1=1,Y_2=1) - \pi_1 \pi_2}{\{\pi_1(1-\pi_1)\pi_2(1-\pi_2)\}^{1/2}}. \end{align*}\]

For example, suppose \(\pi_1=0.5\) and \(\pi_2=0.9\), then the correlation between \(Y_1\) and \(Y_2\) cannot be larger than 1/3 in absolute value:

p1 <- 0.5
p2 <- 0.9
# bounds for the joint probability Pr(Y1 = 1, Y2 = 1)
(lim.joint <- c(max(c(0, p1+p2-1)), min(c(p1, p2))))
## [1] 0.4 0.5
# corresponding bounds for the correlation rho(Y1, Y2)
(lim.corr <- (lim.joint - p1*p2) / sqrt(p1*(1-p1)*p2*(1-p2)))
## [1] -0.3333333  0.3333333

Therefore, correlation as a measure of association may not be optimal for binary variables. Alternatively, we can use the marginal odds ratio \[\mbox{OR}(Y_1,Y_2) = \frac{\operatorname{\mathsf{Pr}}(Y_1=1,Y_2=1)\operatorname{\mathsf{Pr}}(Y_1=0,Y_2=0)}{\operatorname{\mathsf{Pr}}(Y_1=1,Y_2=0)\operatorname{\mathsf{Pr}}(Y_1=0,Y_2=1)}.\]
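As a small numerical illustration, with hypothetical joint probabilities chosen to be consistent with \(\pi_1=0.5\) and \(\pi_2=0.9\) from above:

## joint probabilities Pr(Y1 = i, Y2 = j): rows i = 0, 1; columns j = 0, 1
pjoint <- matrix(c(0.08, 0.42,
                   0.02, 0.48), nrow = 2, byrow = TRUE)
(OR <- (pjoint[2, 2] * pjoint[1, 1]) / (pjoint[2, 1] * pjoint[1, 2]))
## [1] 4.571429
(rho <- (pjoint[2, 2] - p1*p2) / sqrt(p1*(1-p1)*p2*(1-p2)))
## [1] 0.2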

G.3 Generalized Linear Models with Random Effects

Conditional on random effects \(\boldsymbol{U}_i \sim \mathop{\mathrm{N}}_q(\boldsymbol{0}, \boldsymbol{G})\), \(Y_{ij}\) follows a generalized linear model with conditional mean \[\mu_{ij} = \mathop{\mathrm{\mathsf{E}}}(Y_{ij} \,\vert\, \boldsymbol{U}_i) = h(\eta_{ij})\] and linear predictor \[ \eta_{ij}=\boldsymbol{x}_{ij}^T\boldsymbol{\beta}^* + \boldsymbol{d}_{ij}^T\boldsymbol{U}_i,\] where \(\boldsymbol{\beta}^*\) are the conditional coefficients. In this conditional model, parameter estimation is based on the marginal distribution of \(\boldsymbol{Y}_i\), which can be obtained through numerical integration.
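As a sketch (assuming the lme4 package is installed and dat is a hypothetical data frame with binary response y, covariate x and grouping variable id), such a model can be fitted by numerical integration over the random effects, for example with adaptive Gauss-Hermite quadrature:

library(lme4)
## nAGQ > 1 requests adaptive Gauss-Hermite quadrature for the marginal likelihood
fit.glmm <- glmer(y ~ x + (1 | id), family = binomial, data = dat, nAGQ = 20)
fixef(fit.glmm)  # conditional coefficients beta*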

G.3.1 Logistic regression model with random intercept

Consider a logistic regression model with random intercept \[\mathop{\mathrm{logit}}\operatorname{\mathsf{Pr}}(Y_{ij}=1\,\vert\,U_i) = \boldsymbol{x}_{ij}^T\boldsymbol{\beta}^* + U_i, \quad U_i \sim \mathop{\mathrm{N}}(0, \nu^2).\] In a marginal model, we directly specify \[\mathop{\mathrm{logit}}\operatorname{\mathsf{Pr}}(Y_{ij}=1) = \boldsymbol{x}_{ij}^T\boldsymbol{\beta}.\] It can be shown that \[\begin{equation} \boldsymbol{\beta} \approx \underbrace{(c^2 \nu^2 + 1)^{-1/2}}_{q(\nu^2)} \boldsymbol{\beta}^*, \mbox{ where } c \approx 10/17. \tag{G.4} \end{equation}\] Therefore, marginal coefficients \(\boldsymbol{\beta}\) and conditional coefficients \(\boldsymbol{\beta}^*\) have the same sign with \(|\boldsymbol{\beta}| \leq |\boldsymbol{\beta}^*|\). Moreover, their ratio \(q(\nu^2)\) decreases with increasing variance \(\nu^2\), as shown in Table G.1.

Table G.1: Values of \(q(\nu^2)\) for different \(\nu^2\).
\(\nu^2\)      0.00  0.10  0.50  1.00  2.00  5.00
\(q(\nu^2)\)   1.00  0.98  0.92  0.86  0.77  0.61
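The entries of Table G.1 can be reproduced directly from (G.4):

nu2 <- c(0, 0.1, 0.5, 1, 2, 5)
cc <- 10/17                         # the constant c in (G.4)
round(1 / sqrt(cc^2 * nu2 + 1), 2)  # q(nu^2)
## [1] 1.00 0.98 0.92 0.86 0.77 0.61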

Examples of logistic regression with random effects can be found in Sections 13.1.1.1 and 14.2.2.