Chair of Spatial Data Science and Statistical Learning

Lecture 6 Bayesian Modelling

6.1 Overview

In this part of the lecture, we delve deeper into Bayesian inference by exploring different modeling techniques. Using the linear model as an example, we present approaches for deriving the full conditionals of the parameters involved. Our main focus is the Gibbs Sampling algorithm. To help clarify its components and hyperparameters, we include an interactive Shiny app.

6.2 Example: Linear Model

6.2.1 Setup

For the Bayesian modeling approaches, we take the linear model that was introduced in Lecture 3.
\[ \begin{eqnarray*} \boldsymbol{y} &=& \boldsymbol{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}\\ \varepsilon_i&\overset{iid}{\sim}&\N(0,\sigma^2), \quad i = 1,\ldots,n\\ Y_i&\overset{ind}{\sim}&\N(\beta_0+\sum_{j=1}^{p}\beta_jx_{ij},\sigma^2)\\ \boldsymbol{Y}&\sim&\MVN(\boldsymbol{X}\boldsymbol{\beta}, \sigma^2\boldsymbol{I}) \end{eqnarray*} \]
Reminder: Dimensions

\[\begin{eqnarray} \boldsymbol{Y} = \underbrace{\left( \begin{array}{c} y_1 \\ \vdots \\ y_n \end{array} \right)}_{n\times 1} \end{eqnarray}\]

\[\begin{eqnarray} \boldsymbol{\mu_Y} = \boldsymbol{X\beta} = \underbrace{\left(\begin{array}{cccc} 1 & x_{11} & \cdots & x_{1p} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & \cdots & x_{np} \end{array} \right)}_{n \times (p+1)} \underbrace{\left(\begin{array}{c} \beta_0 \\ \vdots \\ \beta_p \end{array} \right)}_{(p+1) \times 1} \end{eqnarray}\]

\[\begin{eqnarray} \boldsymbol{\Sigma_Y} = \sigma^2\boldsymbol{I} = \underbrace{\left(\begin{array}{ccc} \sigma^2 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \sigma^2 \end{array} \right)}_{n \times n} \end{eqnarray}\]
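
The following R sketch simulates data from this model; the values for \(n\), \(p\), \(\boldsymbol{\beta}\), and \(\sigma^2\) are purely illustrative and not part of the lecture.

```r
# Simulate data from the linear model (illustrative values)
set.seed(1)
n <- 100                      # number of observations
p <- 2                        # number of covariates (without intercept)
beta <- c(1, 2, -0.5)         # (beta_0, beta_1, beta_2)
sigma2 <- 1.5                 # error variance

X <- cbind(1, matrix(rnorm(n * p), n, p))       # n x (p+1) design matrix
y <- X %*% beta + rnorm(n, sd = sqrt(sigma2))   # response vector
```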

6.2.2 Bayesian Considerations

In Bayesian inference we assume that parameters have distributions. In the linear model the random parameters for which we need distributions are \(\bb\) and \(\sigma^2\). To find suitable distributions, we have to consider the domain for each of the parameters:

  1. The parameter vector \(\bb\) is defined on the entire real line: \(-\infty < \beta_j < \infty\). Since the vector may contain more than one random variable, the Multivariate Normal Distribution is a good choice.
  2. The variance must be positive: \(\sigma^2 > 0\), so its distribution must have support on the positive real numbers. Candidates include the Chi-Squared, Gamma, and Inverse Gamma distributions. We choose the latter since it is conjugate to the (Multivariate) Normal likelihood; its density is recalled below.
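
As a reminder, in the common shape-scale parameterisation (which we assume here), the \(\IG(a_0,b_0)\) density is positive only for \(\sigma^2 > 0\):

\[ p(\sigma^2) = \frac{b_0^{a_0}}{\Gamma(a_0)}\,(\sigma^2)^{-a_0-1}\exp\!\left(-\frac{b_0}{\sigma^2}\right), \quad \sigma^2 > 0. \]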

6.2.3 Bayesian Ingredients

Every Bayesian Model needs a well-defined Likelihood and Prior distribution(s) for the parameter(s):

Likelihood:

  • \(\boldsymbol{Y}\sim\MVN(\boldsymbol{X}\boldsymbol{\beta}, \sigma^2\boldsymbol{I})\)

Priors:

  • \(\bb|\sigma^2\sim\MVN(\boldsymbol{m},\sigma^2\boldsymbol{M})\)
  • \(\sigma^2\sim \IG(a_0,b_0)\)
Question: What would an uninformative prior for the coefficient vector look like?

When we have no prior knowledge about the possible values of the coefficient vector \(\bb\), a typical approach is to set:

\[ \boldsymbol{m} = \left(\begin{array}{c} 0 \\ \vdots \\ 0 \end{array} \right) \] and

\[ \boldsymbol{M} = \left(\begin{array}{ccc} \psi & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \psi \end{array} \right) \] where \(\psi\) is an arbitrarily large number, such as 100,000. This choice effectively results in a flat prior distribution for \(\bb\).
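
In code, such a weakly informative prior specification could look as follows; this is only a sketch, and the values for \(\psi\), \(a_0\), and \(b_0\) are illustrative.

```r
# Prior hyperparameters (illustrative values)
p   <- 2                     # number of covariates
m   <- rep(0, p + 1)         # prior mean of beta: zero vector
psi <- 1e5                   # large prior variance -> effectively flat
M   <- diag(psi, p + 1)      # prior covariance scale of beta
a0  <- 0.001                 # vague Inverse Gamma shape
b0  <- 0.001                 # vague Inverse Gamma scale
```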

6.3 Bayesian Inference

Overview:

  • analytical calculation of the posterior distribution
  • Markov Chain Monte Carlo (MCMC) techniques:
      • Gibbs Sampler
      • Metropolis-Hastings Sampler

6.3.1 Gibbs Sampler


6.3.1.1 General Algorithm
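
In general, the Gibbs sampler constructs a Markov chain whose stationary distribution is the joint posterior by repeatedly drawing each parameter (or parameter block) from its full conditional distribution, given the current values of all other parameters. For a parameter vector \(\boldsymbol{\theta} = (\theta_1, \ldots, \theta_k)\), iteration \(t\) consists of:

  1. Draw \(\theta_1^{(t)} \sim p(\theta_1 \mid \theta_2^{(t-1)}, \ldots, \theta_k^{(t-1)}, \boldsymbol{y})\).
  2. Draw \(\theta_2^{(t)} \sim p(\theta_2 \mid \theta_1^{(t)}, \theta_3^{(t-1)}, \ldots, \theta_k^{(t-1)}, \boldsymbol{y})\).
  3. Continue until \(\theta_k^{(t)} \sim p(\theta_k \mid \theta_1^{(t)}, \ldots, \theta_{k-1}^{(t)}, \boldsymbol{y})\).

After discarding an initial burn-in phase, the draws \(\boldsymbol{\theta}^{(t)}\) are treated as (approximately) dependent samples from the joint posterior.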

6.3.1.2 Example: Multivariate Normal Distribution

6.3.1.2.1 Full Conditional: \(\bb\)
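
With the Likelihood and Priors from Section 6.2.3, a standard conjugate calculation (sketched here without the full derivation) yields

\[ \bb \mid \sigma^2, \boldsymbol{y} \sim \MVN\left(\boldsymbol{m}^*, \sigma^2\boldsymbol{M}^*\right), \qquad \boldsymbol{M}^* = \left(\boldsymbol{M}^{-1} + \boldsymbol{X}^\top\boldsymbol{X}\right)^{-1}, \qquad \boldsymbol{m}^* = \boldsymbol{M}^*\left(\boldsymbol{M}^{-1}\boldsymbol{m} + \boldsymbol{X}^\top\boldsymbol{y}\right). \]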
6.3.1.2.2 Full Conditional: \(\sigma^2\)
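
Because the prior covariance of \(\bb\) is scaled by \(\sigma^2\), the quadratic form of the prior also enters the full conditional of \(\sigma^2\) (again only the standard result, stated without derivation):

\[ \sigma^2 \mid \bb, \boldsymbol{y} \sim \IG\left(a_0 + \frac{n+p+1}{2},\; b_0 + \frac{1}{2}\left[(\boldsymbol{y}-\boldsymbol{X}\bb)^\top(\boldsymbol{y}-\boldsymbol{X}\bb) + (\bb-\boldsymbol{m})^\top\boldsymbol{M}^{-1}(\bb-\boldsymbol{m})\right]\right). \]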
6.3.1.2.3 Application: Algorithm
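
A minimal R sketch of the resulting Gibbs sampler is given below. It assumes the simulated data and the prior hyperparameters from the sketches above; the function and variable names are illustrative, not part of the lecture code.

```r
# Gibbs sampler for the Bayesian linear model (sketch)
gibbs_lm <- function(y, X, m, M, a0, b0, n_iter = 5000, sigma2_init = 1) {
  n <- nrow(X); k <- ncol(X)          # k = p + 1 coefficients
  M_inv <- solve(M)
  XtX   <- crossprod(X)               # X'X
  Xty   <- crossprod(X, y)            # X'y

  beta_draws   <- matrix(NA_real_, n_iter, k)
  sigma2_draws <- numeric(n_iter)
  sigma2 <- sigma2_init

  for (it in seq_len(n_iter)) {
    ## 1. beta | sigma2, y  ~  MVN(m_star, sigma2 * M_star)
    M_star <- solve(M_inv + XtX)
    m_star <- M_star %*% (M_inv %*% m + Xty)
    beta   <- as.vector(m_star + t(chol(sigma2 * M_star)) %*% rnorm(k))

    ## 2. sigma2 | beta, y  ~  IG(a_new, b_new)
    resid  <- y - X %*% beta
    a_new  <- a0 + (n + k) / 2
    b_new  <- b0 + 0.5 * (sum(resid^2) +
                          t(beta - m) %*% M_inv %*% (beta - m))
    sigma2 <- 1 / rgamma(1, shape = a_new, rate = as.numeric(b_new))

    beta_draws[it, ] <- beta
    sigma2_draws[it] <- sigma2
  }
  list(beta = beta_draws, sigma2 = sigma2_draws)
}

# Usage with the objects defined earlier:
# draws <- gibbs_lm(y, X, m, M, a0, b0)
# colMeans(draws$beta[-(1:1000), ])    # posterior means after burn-in
```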
6.3.1.2.4 Interactive Shiny App