Lecture 6 Bayesian Modelling
6.1 Overview
In this part of the lecture, we delve deeper into Bayesian inference by exploring various modeling techniques. Using the example of linear models, we present different approaches for deriving the full conditionals of the parameters involved. Our main focus will be on the Gibbs Sampling algorithm. To help clarify the different components and hyperparameters of the algorithm, we include an interactive Shiny app.
6.2 Example: Linear Model
6.2.1 Setup
For the Bayesian modeling approaches, we take the linear model that was introduced in Lecture 3.

Reminder: Dimensions
\[\begin{eqnarray} \boldsymbol{Y} = \underbrace{\left( \begin{array}{c} y_1 \\ \vdots \\ y_n \end{array} \right)}_{n\times 1} \end{eqnarray}\]
\[\begin{eqnarray} \boldsymbol{\mu_Y} = \boldsymbol{X\beta} = \underbrace{\left(\begin{array}{cccc} 1 & x_{11} & \cdots & x_{1p} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & \cdots & x_{np} \end{array} \right)}_{n \times (p+1)} \underbrace{\left(\begin{array}{c} \beta_0 \\ \vdots \\ \beta_p \end{array} \right)}_{(p+1) \times 1} \end{eqnarray}\]
\[\begin{eqnarray} \boldsymbol{\Sigma_Y} = \sigma^2\boldsymbol{I} = \underbrace{\left(\begin{array}{ccc} \sigma^2 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \sigma^2 \end{array} \right)}_{n \times n} \end{eqnarray}\]
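As a quick check of these dimensions, here is a minimal R sketch with hypothetical values \(n = 5\) and \(p = 2\) (all numbers are illustrative only):

```r
set.seed(1)
n <- 5; p <- 2

X <- cbind(1, matrix(rnorm(n * p), n, p))  # n x (p+1) design matrix with intercept column
beta <- rnorm(p + 1)                       # (p+1) x 1 coefficient vector
sigma2 <- 1.5                              # error variance (illustrative value)

mu_Y    <- X %*% beta                      # n x 1 mean vector
Sigma_Y <- sigma2 * diag(n)                # n x n covariance matrix

dim(X); length(mu_Y); dim(Sigma_Y)         # 5 x 3, 5, 5 x 5
```

6.2.2 Bayesian Considerations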
In Bayesian inference we assume that parameters have distributions. In the linear model the random parameters for which we need distributions are \(\bb\) and \(\sigma^2\). To find suitable distributions, we have to consider the domain for each of the parameters:
- The parameter vector \(\bb\) is defined on the entire real line, i.e. \(-\infty < \beta_j < \infty\) for each component. Since the vector may contain more than one random variable, the Multivariate Normal Distribution is a good choice.
- The variance \(\sigma^2\) must be strictly positive, so its prior needs support on the positive real line: \(\sigma^2 > 0\). Candidate distributions include the Chi-Squared, Gamma, and Inverse Gamma distributions. We choose the latter since it is conjugate to the Multivariate Normal Distribution (see the short sketch below).
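As a quick illustration of these supports, here is a small R sketch with arbitrary hyperparameter values; Inverse Gamma draws are generated as reciprocals of Gamma draws:

```r
library(MASS)   # mvrnorm() for multivariate normal draws
set.seed(42)

## Multivariate Normal prior draws span all of R^2
beta_draws <- mvrnorm(1000, mu = c(0, 0), Sigma = diag(2))
range(beta_draws)        # contains both negative and positive values

## Inverse Gamma draws via the reciprocal of Gamma draws: IG(2, 1) here
sigma2_draws <- 1 / rgamma(1000, shape = 2, rate = 1)
range(sigma2_draws)      # strictly positive, as required for a variance
```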
6.2.3 Bayesian Ingredients
Every Bayesian Model needs a well-defined Likelihood and Prior distribution(s) for the parameter(s):
Likelihood:
- \(\boldsymbol{Y}\sim\MVN(\boldsymbol{X}\boldsymbol{\beta}, \sigma^2\boldsymbol{I})\)
Priors:
- \(\bb|\sigma^2\sim\MVN(\boldsymbol{m},\sigma^2\boldsymbol{M})\)
- \(\sigma^2\sim \IG(a_0,b_0)\)
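Combining these ingredients, Bayes' theorem gives the joint posterior up to proportionality:

\[ p(\boldsymbol{\beta}, \sigma^2 \mid \boldsymbol{Y}) \propto p(\boldsymbol{Y} \mid \boldsymbol{\beta}, \sigma^2)\, p(\boldsymbol{\beta} \mid \sigma^2)\, p(\sigma^2) \]

Each full conditional used by the Gibbs sampler is proportional to this joint density, with all other parameters held fixed.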
Question: What would an uninformative prior for the coefficient vector look like?
When we have no prior knowledge about the possible values the coefficient vector \(\bb\) could take, a typical approach is to set:
\[ \boldsymbol{m} = \left(\begin{array}{c} 0 \\ \vdots \\ 0 \end{array} \right) \] and
\[ \boldsymbol{M} = \left(\begin{array}{ccc} \psi & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \psi \end{array} \right) \] where \(\psi\) is an arbitrarily large number, such as 100,000. This choice effectively results in a flat prior distribution for \(\bb\).
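To make the Gibbs sampler concrete, here is a minimal R sketch for this model. It assumes the standard Normal-Inverse-Gamma full conditionals that follow from the conjugate setup above; the data and the hyperparameter values are all hypothetical:

```r
set.seed(123)
library(MASS)   # mvrnorm() for multivariate normal draws

## --- Simulated data (hypothetical) ---
n <- 100; p <- 2
X <- cbind(1, matrix(rnorm(n * p), n, p))           # n x (p+1) design matrix
beta_true <- c(1, -2, 0.5); sigma2_true <- 4
Y <- X %*% beta_true + rnorm(n, sd = sqrt(sigma2_true))

## --- Hyperparameters: uninformative choices from above ---
m  <- rep(0, p + 1)                                 # prior mean of beta
M  <- diag(1e5, p + 1)                              # psi = 100,000 on the diagonal
a0 <- 0.01; b0 <- 0.01                              # vague IG prior (example values)
Minv <- solve(M)

## --- Gibbs sampler ---
n_iter <- 5000
beta_draws   <- matrix(NA_real_, n_iter, p + 1)
sigma2_draws <- numeric(n_iter)
beta <- rep(0, p + 1); sigma2 <- 1                  # starting values

for (it in 1:n_iter) {
  ## Full conditional of beta: MVN(m_star, sigma2 * M_star)
  M_star <- solve(crossprod(X) + Minv)              # (X'X + M^-1)^-1
  m_star <- M_star %*% (crossprod(X, Y) + Minv %*% m)
  beta   <- as.vector(mvrnorm(1, m_star, sigma2 * M_star))

  ## Full conditional of sigma2: IG(a_n, b_n), drawn as 1 / Gamma
  resid <- Y - X %*% beta
  a_n <- a0 + (n + p + 1) / 2
  b_n <- b0 + 0.5 * (sum(resid^2) +
                     as.numeric(t(beta - m) %*% Minv %*% (beta - m)))
  sigma2 <- 1 / rgamma(1, shape = a_n, rate = b_n)

  beta_draws[it, ] <- beta
  sigma2_draws[it] <- sigma2
}

## Posterior means after discarding a burn-in of 1,000 iterations
colMeans(beta_draws[-(1:1000), ])
mean(sigma2_draws[-(1:1000)])
```

With \(\psi\) this large the prior is essentially flat, so the posterior means of \(\bb\) should land close to the least-squares estimates, which serves as a useful sanity check.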