5 Day 5 (February 4)

5.1 Announcements

  • Please read (and re-read) Chs. 3 and 4 in the BBM2L book.

  • Selected questions/clarifications from journals

    • How to choose/select a distribution
      • Definition of a model
      • Combining data and models/assumptions gives us predictions/forecasts/inference.
    • Sample size questions
      • n = Inf
      • n = 0
      • Power analysis
      • Anxiety/statistical therapy
      • Adaptive designs
  • Good reading from The American Statistician link

5.2 Building our first statistical model

  • The backstory
  • Building a statistical model using a likelihood-based (classical) approach
    • Specify (write out) the likelihood
    • Select an approach to estimate unknown parameters (e.g., maximum likelihood)
    • Quantify uncertainty in unknown parameters (e.g., using normal approximation, see here)
  • Building a statistical model using a Bayesian approach
    • Specify (write out) the likelihood/data model
    • Specify the parameter model (or prior) including hyper-parameters
    • Select an approach to obtain the posterior distribution
      • Analytically (i.e., pencil and paper)
      • Simulation-based (e.g., Metropolis-Hastings, MCMC, importance sampling, ABC, etc.)
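
The two workflows above can be sketched in R as follows. This is a minimal illustration, not the course's worked example: the data (y = 7 successes in n = 20 trials), the binomial data model, the Beta(1, 1) prior, the proposal standard deviation, and all object names are assumptions chosen for illustration. The simulation-based step uses a basic random-walk Metropolis-Hastings sampler; other samplers would serve equally well.

```r
# Hypothetical data (assumed for illustration): y successes in n trials
y <- 7
n <- 20

## Likelihood-based (classical) approach
# 1. Specify the likelihood: y ~ Binomial(n, p); use the negative log-likelihood
nll <- function(p) -dbinom(y, size = n, prob = p, log = TRUE)

# 2. Estimate p by maximum likelihood (minimize the negative log-likelihood)
p.hat <- optimize(nll, interval = c(0.001, 0.999))$minimum

# 3. Quantify uncertainty with the normal approximation:
#    the variance is approximated by the inverse Hessian of nll at p.hat
se <- sqrt(1 / optimHess(p.hat, nll)[1, 1])
ci.mle <- p.hat + c(-1, 1) * qnorm(0.975) * se

## Bayesian approach
# 1. Data model: y ~ Binomial(n, p)
# 2. Parameter model (prior): p ~ Beta(a, b), hyper-parameters a = b = 1 (assumed)
a <- 1
b <- 1

# 3a. Analytical posterior (conjugacy): p | y ~ Beta(a + y, b + n - y)
post.mean.exact <- (a + y) / (a + b + n)

# 3b. Simulation-based posterior: random-walk Metropolis-Hastings
log.post <- function(p) {
  if (p <= 0 || p >= 1) return(-Inf)  # zero posterior density outside (0, 1)
  dbinom(y, size = n, prob = p, log = TRUE) + dbeta(p, a, b, log = TRUE)
}
set.seed(42)           # arbitrary seed, for reproducibility only
K <- 20000             # number of iterations (assumed)
p.chain <- numeric(K)
p.chain[1] <- 0.5      # starting value (assumed)
for (k in 2:K) {
  p.star <- p.chain[k - 1] + rnorm(1, mean = 0, sd = 0.2)  # symmetric proposal
  log.r <- log.post(p.star) - log.post(p.chain[k - 1])     # log acceptance ratio
  p.chain[k] <- if (log(runif(1)) < log.r) p.star else p.chain[k - 1]
}
post.mean.mh <- mean(p.chain[-(1:1000)])  # discard first 1000 draws as burn-in

c(mle = p.hat, se = se,
  post.mean.exact = post.mean.exact, post.mean.mh = post.mean.mh)
```

With a conjugate prior the analytical and simulation-based posterior means should roughly agree, which gives a simple check on the sampler before moving to models where no analytical posterior is available.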

5.3 Numerical Integration

  • Why do we need integrals to do Bayesian statistics?

    • Example using Bayes theorem to estimate prevalence rate of rabies
    • Why it is important to keep track of what we are calculating (i.e., clarity in what is being estimated)
  • Numerical approximation vs. analytical solutions

  • Definition of a definite integral \[\int_{a}^b f(z)dz = \lim_{Q\to\infty} \sum_{q=1}^{Q}\Delta q f(z_q)\] where \(\Delta q =\frac{b-a}{Q}\) and \(z_q = a + \frac{2q - 1}{2}\Delta q\).

  • Riemann approximation (midpoint rule) \[\int_{a}^b f(z)dz \approx \sum_{q=1}^{Q}\Delta q f(z_q)\] where \(\Delta q =\frac{b-a}{Q}\) and \(z_q = a + \frac{2q - 1}{2}\Delta q\).

  • Using a similar approach in R (adaptive quadrature)

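    # Integrand: standard normal density
    fn <- function(y){dnorm(y,0,1)}
    # Adaptive quadrature over [-4, 4]; subdivisions is the maximum number of subintervals
    integrate(f=fn,lower=-4,upper=4,subdivisions=10)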
    ## 0.9999367 with absolute error < 4.8e-12
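
The midpoint rule above can also be coded directly, which may help show what the numerical approximation is doing. The sketch below is an illustration under stated assumptions: the function name `midpoint`, the choices of Q, and the binomial example (y = 7 successes in n = 20 trials with a Uniform(0, 1) prior) are hypothetical and not taken from the course materials. The last part evaluates the kind of integral that appears in the denominator of Bayes theorem.

```r
# Midpoint (Riemann) rule: Q equal-width subintervals on [a, b]
midpoint <- function(f, a, b, Q) {
  delta <- (b - a) / Q                        # width of each subinterval
  z <- a + (2 * seq_len(Q) - 1) / 2 * delta   # midpoints z_1, ..., z_Q
  sum(delta * f(z))
}

# Same integrand and limits as in the integrate() call above
fn <- function(y) dnorm(y, 0, 1)
approx.10   <- midpoint(fn, a = -4, b = 4, Q = 10)
approx.1000 <- midpoint(fn, a = -4, b = 4, Q = 1000)
exact       <- pnorm(4) - pnorm(-4)           # closed form via the normal CDF
c(Q10 = approx.10, Q1000 = approx.1000, exact = exact)

# Why Bayesian statistics needs integrals: the denominator of Bayes theorem.
# Hypothetical example (assumed): y = 7 successes in n = 20 trials,
# binomial data model, Uniform(0, 1) prior on p.
marg <- midpoint(function(p) dbinom(7, size = 20, prob = p) * dunif(p),
                 a = 0, b = 1, Q = 1000)
marg   # compare with the analytical value 1 / (20 + 1)
```

Unlike `integrate()`, this version uses a fixed grid rather than adapting the subintervals to the integrand, so accuracy is controlled only through Q.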

5.4 Monte Carlo Integration

  • Deterministic vs. stochastic methods to approximate integrals

    • Stochastic (Monte Carlo) methods work well for high-dimensional multiple integrals
    • Stochastic methods are easy to program
  • Monte Carlo integration

    • \[\begin{eqnarray} \text{E}(g(y)) &=& \int g(y)[y|\theta]dy\\ &\approx& \frac{1}{Q}\sum_{q=1}^{Q}g(y_q) \end{eqnarray}\] where \(y_1,\ldots,y_Q\) are independent draws from \([y|\theta]\).
    • Examples:
    1. \[\text{E}(y) = \int_{-\infty}^\infty y\frac{1}{\sqrt{2\pi\sigma^2}}\textit{e}^{-\frac{1}{2\sigma^2}(y - \mu)^2}dy\]
    y <- rnorm(n = 10^6, mean = 2, sd = 3)
    mean(y)
    ## [1] 1.999428
    2. \[\text{E}((y-\mu)^2) = \int_{-\infty}^\infty (y-\mu)^2\frac{1}{\sqrt{2\pi\sigma^2}}\textit{e}^{-\frac{1}{2\sigma^2}(y - \mu)^2}dy\]
    y <- rnorm(n = 10^6, mean = 2, sd = 3)
    mean((y - 2)^2)
    ## [1] 9.014978
    3. \[\text{E}(\frac{1}{y}) = \int_{-\infty}^\infty \frac{1}{y}\frac{1}{\sqrt{2\pi\sigma^2}}\textit{e}^{-\frac{1}{2\sigma^2}(y - \mu)^2}dy\]
    y <- rnorm(n = 10^6, mean = 2, sd = 4)
    mean(1/y)
    ## [1] 0.4597593
  • Questions about activity 2?

  • Live example using bat and coin data/model
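
A Monte Carlo estimate is itself random, so it can help to report how much it would vary from run to run. The sketch below is a hedged illustration for the Monte Carlo integration examples above: the seed, the number of draws Q, and the object names are assumptions, and the standard error is the usual sample standard deviation of g(y_q) divided by the square root of Q. The second part repeats the E(1/y) estimate on fresh draws; for a normal y with positive density at zero, the integral defining E(1/y) does not converge absolutely, so a single sample mean may not be a reliable estimate.

```r
set.seed(2)   # arbitrary seed, for reproducibility only
Q <- 10^5     # number of Monte Carlo draws (assumed)
mu <- 2
sigma <- 3

# Estimate E((y - mu)^2), which equals sigma^2 = 9 for a normal distribution
y <- rnorm(n = Q, mean = mu, sd = sigma)
g <- (y - mu)^2
est <- mean(g)               # Monte Carlo estimate
mc.se <- sd(g) / sqrt(Q)     # Monte Carlo standard error of the estimate
c(estimate = est, mc.se = mc.se)

# Caveat check for E(1/y): repeat the estimate on fresh draws.
# Because 1/y is unbounded near y = 0, the five values need not agree closely.
reps <- replicate(5, mean(1 / rnorm(n = Q, mean = 2, sd = 4)))
reps
```

Comparing repeated runs in this way is a quick diagnostic: estimates that agree to within a few standard errors suggest the approximation has stabilized, while estimates that jump around suggest the expectation may not be well behaved.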