2.6 A simple working example
We will illustrate some conceptual differences between the Bayesian and Frequentist statistical approaches by performing inference on a random sample \(\mathbf{Y} = [Y_1, Y_2, \dots, Y_N]\), where \(Y_i \stackrel{iid}{\sim} N(\mu, \sigma^2)\) for \(i = 1, 2, \dots, N\).
In particular, we set \(\pi(\mu, \sigma) = \pi(\mu) \pi(\sigma) \propto \frac{1}{\sigma}\); that is, we assume that \(\mu\) and \(\sigma\) are independent a priori, with a flat prior on \(\mu\). This is a standard non-informative improper prior (the Jeffreys prior; see Chapter 3), so the posterior is driven by the sample information.
\[ \begin{aligned} \pi(\mu,\sigma|\mathbf{y}) &\propto \frac{1}{\sigma}\times (\sigma^2)^{-N/2}\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^N (y_i-\mu)^2\right\} \\ &= \frac{1}{\sigma}\times (\sigma^2)^{-N/2}\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^N \left((y_i-\bar{y}) - (\mu-\bar{y})\right)^2\right\} \\ &= \frac{1}{\sigma}\exp\left\{-\frac{N}{2\sigma^2}(\mu-\bar{y})^2\right\}\times (\sigma)^{-N}\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^N (y_i-\bar{y})^2\right\} \\ &= \frac{1}{\sigma}\exp\left\{-\frac{N}{2\sigma^2}(\mu-\bar{y})^2\right\}\times (\sigma)^{-(\alpha_n+1)}\exp\left\{-\frac{\alpha_n\hat{\sigma}^2}{2\sigma^2}\right\}, \end{aligned} \]
where \(\bar{y} = \frac{\sum_{i=1}^N y_i}{N}\), \(\alpha_n = N-1\), and \(\hat{\sigma}^2 = \frac{\sum_{i=1}^N (y_i-\bar{y})^2}{N-1}\).
The first term in the last expression is the kernel of a normal density, so \(\mu|\sigma,\mathbf{y} \sim N(\bar{y}, \sigma^2 / N)\). The second term is the kernel of an inverted gamma density (Zellner 1996), so \(\sigma|\mathbf{y} \sim IG(\alpha_n, \hat{\sigma}^2)\). Therefore,
\[ \pi(\mu|\sigma,\mathbf{y}) = \frac{1}{\sqrt{2\pi \sigma^2 / N}} \exp\left\{-\frac{N}{2\sigma^2}(\mu-\bar{y})^2\right\}, \]
and
\[ \pi(\sigma|\mathbf{y}) = \frac{2}{\Gamma(\alpha_n / 2)} \left(\frac{\alpha_n \hat{\sigma}^2}{2}\right)^{\alpha_n / 2} \frac{1}{\sigma^{\alpha_n+1}} \exp\left\{-\frac{\alpha_n \hat{\sigma}^2}{2 \sigma^2}\right\}. \]
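These two densities suggest a simple way to simulate the joint posterior by composition: draw \(\sigma\) from its marginal (equivalently, draw \(\sigma^2\) as \(\alpha_n\hat{\sigma}^2/\chi^2_{\alpha_n}\)), and then draw \(\mu\) from its conditional normal. The following is a minimal R sketch, assuming a hypothetical simulated sample (the values in `rnorm` are placeholders for illustration):

# Sketch: sampling (mu, sigma) from the joint posterior by composition
set.seed(10)
y <- rnorm(50, mean = 100, sd = 3) # hypothetical sample for illustration
N <- length(y); y_bar <- mean(y)
alpha_n <- N - 1; sig2_hat <- var(y)
S <- 10000 # number of posterior draws
# sigma|y above is equivalent to sigma^2 = alpha_n*sig2_hat/chi2(alpha_n)
sigma2 <- alpha_n * sig2_hat / rchisq(S, df = alpha_n)
mu <- rnorm(S, mean = y_bar, sd = sqrt(sigma2 / N)) # mu | sigma, y
quantile(mu, c(0.025, 0.975)) # 95% credible interval for mu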
Observe that \(\mathbb{E}[\mu | \sigma, \mathbf{y}] = \bar{y}\); this is also the maximum likelihood (Frequentist) point estimate of \(\mu\) in this setting. In addition, treating \(\sigma\) as known, the Frequentist \(100(1-\alpha)\%\) confidence interval and the Bayesian \(100(1-\alpha)\%\) credible interval have exactly the same form, \(\bar{y} \pm |z_{\alpha/2}| \frac{\sigma}{\sqrt{N}}\), where \(z_{\alpha/2}\) is the \(\alpha/2\) percentile of the standard normal distribution. However, the interpretations are entirely different. The confidence interval has a probabilistic interpretation under sampling variability of \(\bar{Y}\): in repeated sampling, \(100(1-\alpha)\%\) of the intervals \(\bar{Y} \pm |z_{\alpha/2}| \frac{\sigma}{\sqrt{N}}\) would include \(\mu\). However, given an observed realization of \(\bar{Y}\), say \(\bar{y}\), the probability that \(\bar{y} \pm |z_{\alpha/2}| \frac{\sigma}{\sqrt{N}}\) includes \(\mu\) is either 1 or 0; this is why we speak of \(100(1-\alpha)\%\) confidence rather than probability. On the other hand, \(\bar{y} \pm |z_{\alpha/2}| \frac{\sigma}{\sqrt{N}}\) has a straightforward probabilistic interpretation in the Bayesian framework: there is a \(100(1-\alpha)\%\) probability that \(\mu\) lies within this interval.
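The repeated-sampling reading of the confidence interval is easy to verify by simulation. The following is a minimal sketch, assuming hypothetical population values \(\mu = 0\), \(\sigma = 1\), and \(N = 50\):

# Sketch: coverage of the known-sigma interval under repeated sampling
set.seed(10)
mu <- 0; sigma <- 1; N <- 50 # hypothetical population values
M <- 10000 # number of repeated samples
covers <- replicate(M, {
  y_bar <- mean(rnorm(N, mu, sigma))
  half <- abs(qnorm(0.025)) * sigma / sqrt(N)
  (y_bar - half <= mu) & (mu <= y_bar + half)
})
mean(covers) # close to 0.95, as the theory predicts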
To obtain the marginal posterior density of \(\mu\), we integrate \(\sigma\) out of the joint posterior:
\[\begin{align*} \pi(\mu|\mathbf{y})&=\int_{0}^{\infty} \pi(\mu,\sigma|\mathbf{y}) d\sigma\\ &\propto \int_{0}^{\infty} \frac{1}{\sigma}\times (\sigma^2)^{-N/2}\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^N (y_i-\mu)^2\right\} d\sigma\\ &= \int_{0}^{\infty} \left(\frac{1}{\sigma}\right)^{N+1} \exp\left\{-\frac{N}{2\sigma^2}\frac{\sum_{i=1}^N (y_i-\mu)^2}{N}\right\} d\sigma\\ &=\left[\frac{2}{\Gamma(N/2)}\left(\frac{N\sum_{i=1}^N (y_i-\mu)^2}{2N}\right)^{N/2}\right]^{-1}\\ &\propto \left[\sum_{i=1}^N (y_i-\mu)^2\right]^{-N/2}\\ &=\left[\sum_{i=1}^N ((y_i-\bar{y})-(\mu-\bar{y}))^2\right]^{-N/2}\\ &=[\alpha_n\hat{\sigma}^2+N(\mu-\bar{y})^2]^{-N/2}\\ &\propto \left[1+\frac{1}{\alpha_n}\left(\frac{\mu-\bar{y}}{\hat{\sigma}/\sqrt{N}}\right)^2\right]^{-(\alpha_n+1)/2}. \end{align*}\]
The fourth line follows because the integrand in the third line is the kernel of an inverted gamma density with \(N\) degrees of freedom, so the integral equals the reciprocal of its normalizing constant (Zellner 1996).
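This identity is easy to confirm numerically. The following is a quick sketch in which \(N = 5\) and \(\sum_{i=1}^N (y_i-\mu)^2 = 3\) are arbitrary placeholder values:

# Sketch: numerical check of the inverted gamma integral identity
N <- 5; A <- 3 # arbitrary placeholder values
num <- integrate(function(s) s^(-(N + 1)) * exp(-A / (2 * s^2)),
                 lower = 0, upper = Inf)$value
exact <- (gamma(N / 2) / 2) * (2 / A)^(N / 2)
c(num, exact) # the two values coincide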
The last expression represents the kernel of a Student's \(t\)-distribution with \(\alpha_n = N - 1\) degrees of freedom, location \(\bar{y}\), and scale parameter \(\hat{\sigma}^2/N\); its mean is \(\bar{y}\) and its variance is \(\frac{\hat{\sigma}^2}{N} \left( \frac{\alpha_n}{\alpha_n - 2} \right)\). Therefore, \(\mu | \mathbf{y} \sim t \left( \bar{y}, \frac{\hat{\sigma}^2}{N} \left( \frac{\alpha_n}{\alpha_n - 2} \right), \alpha_n \right)\).
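As a check on the derivation, the composition draws of \(\mu\) from the earlier sketch can be compared with this scaled-\(t\) marginal, whose density is given by metRology::dt.scaled with location \(\bar{y}\) and scale \(\hat{\sigma}/\sqrt{N}\). A minimal sketch, reusing the same hypothetical sample as before:

# Sketch: compare composition draws of mu with the Student's t marginal
set.seed(10)
y <- rnorm(50, mean = 100, sd = 3) # same hypothetical sample as above
N <- length(y); y_bar <- mean(y)
alpha_n <- N - 1; sig_hat <- sd(y)
sigma2 <- alpha_n * sig_hat^2 / rchisq(10000, df = alpha_n)
mu <- rnorm(10000, mean = y_bar, sd = sqrt(sigma2 / N))
hist(mu, breaks = 50, freq = FALSE, main = "Posterior draws of mu")
curve(metRology::dt.scaled(x, df = alpha_n, mean = y_bar,
                           sd = sig_hat / sqrt(N)),
      add = TRUE, col = "red") # analytic marginal overlays the histogram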
Observe that a \(100(1-\alpha)\%\) confidence interval and a \(100(1-\alpha)\%\) credible interval again have exactly the same form, \(\bar{y} \pm |t_{\alpha/2}^{\alpha_n}| \frac{\hat{\sigma}}{\sqrt{N}}\), where \(t_{\alpha/2}^{\alpha_n}\) is the \(\alpha/2\) percentile of a Student's \(t\)-distribution with \(\alpha_n\) degrees of freedom. However, the interpretations are entirely different.
The numerical coincidence between the Frequentist and Bayesian interval expressions in this example is a consequence of using a non-informative improper prior; with an informative prior, the two would generally differ.
Example: Math test
You have a random sample of math scores of size \(N = 50\) from a normal distribution, \(Y_i \sim N(\mu, \sigma^2)\). The sample mean and variance are 102 and 10, respectively. Assuming the improper prior \(\pi(\mu,\sigma) \propto \frac{1}{\sigma}\), we proceed with the following tasks:
- Compute the 95% confidence and credible intervals for \(\mu\).
- Determine the posterior probability that \(\mu > 103\).
We use the fact that \(\mu | \mathbf{y} \sim t\left(\bar{y}, \frac{\hat{\sigma}^2}{N} \left( \frac{\alpha_n}{\alpha_n - 2} \right), \alpha_n \right)\), which implies that the confidence and credible intervals for \(\mu\) are given by
\[ \begin{aligned} \bar{y} \pm |t_{\alpha/2}^{\alpha_n}| \frac{\hat{\sigma}}{\sqrt{N}}, \end{aligned} \]
where \(\bar{y} = 102\), \(\hat{\sigma}^2 = 10\), and \(\alpha_n = 49\). Thus, the 95% confidence and credible intervals for \(\mu\) are the same, namely \((101.1, 102.9)\), and the posterior probability that \(\mu > 103\) is 1.49% given the sample information.
N <- 50        # Sample size
y_bar <- 102   # Sample mean
s2 <- 10       # Sample variance
alpha <- N - 1 # Degrees of freedom
serror <- (s2/N)^0.5 # Standard error of the sample mean
# Lower bound of the 95% interval
LimInf <- y_bar - abs(qt(0.025, alpha)) * serror
LimInf
## [1] 101.1013
# Upper bound of the 95% interval
LimSup <- y_bar + abs(qt(0.025, alpha)) * serror
LimSup
## [1] 102.8987
# Posterior probability that mu > 103 using the scaled Student's t
y.cut <- 103
P <- 1 - metRology::pt.scaled(y.cut, df = alpha, mean = y_bar, sd = serror)
P
## [1] 0.01496694
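As a quick cross-check (a sketch reusing alpha, y_bar, serror, and y.cut from the code above), the same probability can be approximated by Monte Carlo with draws from the scaled Student's \(t\) marginal via metRology::rt.scaled:

# Monte Carlo cross-check of P(mu > 103) using scaled-t draws
set.seed(10)
mu.draws <- metRology::rt.scaled(100000, df = alpha, mean = y_bar,
                                 sd = serror)
mean(mu.draws > y.cut) # approximately 0.015, matching the analytic result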