Chapter 13 Bootstrap Resampling in Modelling

In Chapter 11, we introduced the bootstrap, an exceptionally versatile technique. While it is often applied to estimate the standard error of an estimator when direct calculation is difficult or impossible, here we encounter it in a very different role: as a tool to enhance model estimation.


Motivation: Most tests related to model building (e.g., testing the significance of parameters) assume normality and/or large samples. These conditions do not always hold.


For example, classical simple linear regression under the Gaussian model assumes that the errors are normally distributed. That is:

\[ y_i = \beta_0+\beta x_i +\varepsilon_i\\ \varepsilon_i \overset{iid}{\sim} N(0,\sigma^2) \]

This in turn implies that the OLS slope estimator has the following sampling distribution:

\[ \hat{\beta} = \frac{\sum(y_i-\bar{y})(x_i-\bar{x})}{\sum(x_i-\bar{x})^2} \sim N\left(\beta,\frac{\sigma^2}{\sum(x_i-\bar{x})^2}\right) \]

And finally,

\[ \frac{\hat{\beta}-\beta}{\widehat{se(\hat{\beta})}} \sim t_{\nu = n-2} \]

where \(\widehat{se(\hat{\beta})}=\sqrt{MSE/\sum(x_i-\bar{x})^2}\).

Inferences about the coefficients \(\beta\) that are based on the t-distribution (e.g., confidence intervals and t-test p-values) may be invalid if the error terms are not normally distributed, especially in small samples.

Bootstrap counterparts of these procedures may be used when the data fail to meet these assumptions or data requirements.
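
As a point of reference, this is how the classical t-based inference looks in R, using the mtcars data that reappears in the exercise later in this chapter; both outputs below rely on the normality assumption above.

```r
# Classical t-based inference for simple linear regression.
# Both the p-values in summary() and the intervals from confint()
# rely on the assumption of normally distributed errors.
fit <- lm(mpg ~ wt, data = mtcars)
summary(fit)                  # t-test p-values for the coefficients
confint(fit, level = 0.95)   # t-based 95% confidence intervals
```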

13.1 Non-parametric Bootstrap in Modelling

Suppose \(y_i\) is the value of the response for the \(i^{th}\) observation, and \(x_{ij}\) is the value of the \(j^{th}\) predictor for the \(i^{th}\) observation. Here, the \(y_i\)s are assumed to be independent, with:

\[ E(y_i|x_i) = \sum_{j=1}^px_{ij}\beta_j+\beta_0 \]

This means the data are cross-sectional, with rows independent of one another.

Let’s say that the model being fitted is:

\[ y_i=\sum_{j=1}^p x_{ij}\beta_j+ \beta_0 +\varepsilon_i \]

where \(E(\varepsilon_i)=0\), \(Var(\varepsilon_i)=\sigma^2\), and \(cov(\varepsilon_i,\varepsilon_j) = 0\) for \(i \neq j\)

Do the following to perform inference on \(\beta_j\) via the nonparametric bootstrap:

  1. Take a simple random sample of size \(n\) with replacement from the data set. (this is your bootstrap resample)

  2. Using OLS (or whatever fitting procedure applies), fit a model, and compute the estimates \(\hat{\beta}_j^*\)

  3. Repeat steps 1 and 2 \(B\) times. (\(B\) must be large)

  4. Collect all \(B\) of the \(\hat{\beta}_j^*\)s and compute the summary measures of interest.
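
A minimal R sketch of steps 1 to 4 for a single slope, using the mtcars data from the exercise below; the object names are illustrative.

```r
set.seed(1)     # for reproducibility
B <- 2000       # number of bootstrap resamples (B must be large)
n <- nrow(mtcars)

# Steps 1-3: resample rows (cases) with replacement and refit by OLS
boot_slopes <- replicate(B, {
  idx <- sample(n, n, replace = TRUE)              # bootstrap resample
  coef(lm(mpg ~ wt, data = mtcars[idx, ]))["wt"]   # slope estimate beta_j*
})
# Step 4: boot_slopes now holds the B values of beta_j*
```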

For POINT estimation

  • The average of \(\hat{\beta}_j^*\)s is the bootstrap estimate

  • The estimated standard error is the standard deviation of the \(\hat{\beta}_j^*\)s

  • Note: the method of averaging the \(\hat{\beta}_j^*\)s is also referred to as “bagging” (see Section 13.3)

For INTERVAL estimation

  • The simplest approach for constructing a \((1 − \alpha)100\%\) Confidence Interval Estimate is using Percentiles.

  • \((P_{\alpha/2},P_{1-\alpha/2})\), where \(P_k\) is the \(k^{th}\) quantile of the bootstrap estimates \(\hat{\beta}_j^*\).
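
Continuing the sketch above, the point estimate, standard error, and percentile interval come straight from the vector of \(\hat{\beta}_j^*\)s:

```r
mean(boot_slopes)   # bootstrap point estimate of the slope
sd(boot_slopes)     # bootstrap estimate of the standard error

alpha <- 0.05
quantile(boot_slopes, probs = c(alpha/2, 1 - alpha/2))  # percentile CI
```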

For interval-based HYPOTHESIS TEST

  • The usual hypothesis is \(H_0: \beta_j=0\) vs \(H_a:\beta_j\neq0\)

  • You can use the computed C.I. estimate.

  • At the \(\alpha\) level of significance, reject \(H_0\) when 0 is not in the \((1-\alpha)100\%\) interval estimate.
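
The interval-based decision rule, continuing with the objects from the sketches above:

```r
ci <- quantile(boot_slopes, probs = c(alpha/2, 1 - alpha/2))
reject_H0 <- ci[1] > 0 | ci[2] < 0   # TRUE when 0 is outside the interval
reject_H0
```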

13.2 Semiparametric Bootstrap

Semi-parametric bootstrap resampling in modelling is also known as “residual resampling”. The process is semi-parametric because the first step is to fit a model, commonly using a parametric method such as OLS or MLE; the residuals from that fit are then resampled nonparametrically.

The following algorithm implements residual bootstrapping:

  1. Fit the model using the original data to obtain:

    • the estimates \(\hat{\beta}_0=\bar{y}-\hat{\beta}\bar{x}\) and \(\hat{\beta}=\frac{\sum_{i=1}^n(y_i-\bar{y})(x_i-\bar{x})}{\sum_{i=1}^n(x_i-\bar{x})^2}\)

    • the fitted values \(\hat{y}_i=\hat{\beta}_0 + \hat{\beta}x_i\)

    • the residuals \(e_i=y_i-\hat{y}_i\)

  2. From the residuals \((e_1,...,e_n)\), sample with replacement to obtain bootstrap residuals \((e_1^*,e_2^*,...e_n^*)\).

  3. Using the resampled residuals, create a synthetic response variable \(y_i^*=\hat{y}_i+e_i^*\)

  4. Using the synthetic response variable \(y_i^*\), refit the model to obtain the bootstrap estimates of the coefficients, \(\hat{\beta}_0^*\) and \(\hat{\beta}^*\)

  5. Repeat steps 2, 3, and 4 \(B\) times to obtain \(B\) values of \(\hat{\beta}_0^*\) and \(\hat{\beta}^*\).

Interval estimation and hypothesis testing follow the same approach as in the nonparametric case (Section 13.1).
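
A minimal sketch of the residual bootstrap for a simple linear regression, again with illustrative object names; the exercise below asks you to wrap this logic in a reusable function, so treat this only as the core idea.

```r
set.seed(1)
fit  <- lm(mpg ~ wt, data = mtcars)   # Step 1: fit on the original data
yhat <- fitted(fit)                   # fitted values
e    <- residuals(fit)                # residuals
n <- length(e)
B <- 2000

# Steps 2-5: resample residuals, build synthetic responses, refit
boot_coefs <- replicate(B, {
  e_star <- sample(e, n, replace = TRUE)   # Step 2: bootstrap residuals
  y_star <- yhat + e_star                  # Step 3: synthetic response
  coef(lm(y_star ~ mtcars$wt))             # Step 4: refit the model
})
# boot_coefs is a 2 x B matrix: row 1 holds beta0*, row 2 holds beta*
```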

Exercise

Create a function boot_reg that implements the residual bootstrap for simple linear regression. It should have the following parameters:

  • y - a vector of the dependent variable

  • x - a vector of the independent variable

  • alpha - the \(\alpha\) level of significance

  • B - the number of bootstrap replicates

It should return a list with the following components:

  • reg_table: a data frame summarizing, for each coefficient, the bootstrap estimate, standard error, and \((1-\alpha)100\%\) confidence limits

  • intercept_vec: the vector of estimated intercepts \(\hat{\beta}_0^*\)

  • betahat_vec: the vector of estimated slopes \(\hat{\beta}^*\)

Apply this function to the vectors mtcars$mpg and mtcars$wt as the dependent and independent variables, respectively, with \(B = 2000\) and \(\alpha=0.05\).
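
Assuming boot_reg follows the specification above, the call would look something like this (the inspection lines are illustrative):

```r
res <- boot_reg(y = mtcars$mpg, x = mtcars$wt, alpha = 0.05, B = 2000)
res$reg_table            # bootstrap regression summary table
head(res$intercept_vec)  # first few bootstrap intercepts
head(res$betahat_vec)    # first few bootstrap slopes
```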

13.3 Bagging Algorithm

Bootstrap aggregating (or bagging) is a useful technique to improve the predictive performance of models, e.g., for additive models with high-dimensional predictors.

  • The idea is to generate several models via the bootstrap and aggregate their predicted values by averaging.

  • Bagging is commonly used to improve tree models or decision trees (combined with random predictor selection, this method is called a random forest)

  • It is ideal for minimizing the instability, i.e., the variance, of a model's predictions

Basic Bagging Algorithm for Predicted Values

Let \((\textbf{y},\textbf{X})\) be the data matrix. Note that each row is an independent observation.

  • DO the following \(B\) times:

    • GENERATE a bootstrap sample \((\textbf{y}^*_b,\textbf{X}^*_b)\)

    • FIT a model \(\hat{f}(\textbf{X}_b^*)=\textbf{X}_b^*\hat{\boldsymbol{\beta}}_b\)

    • COMPUTE predicted values on the original predictors, \(\widehat{\textbf{y}_b}=\textbf{X}\hat{\boldsymbol{\beta}}_b\), so that predictions from different bootstrap fits align row by row

  • END

  • COMPUTE “Bagged” predicted value \(\widehat{\textbf{y}}_{bag}=\frac{1}{B}\sum_{b=1}^B\widehat{\textbf{y}_b}\)
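
A minimal R sketch of the bagging loop for a linear model; every bootstrap fit predicts on the original data so that the \(B\) prediction vectors can be averaged row by row (object and variable names are illustrative):

```r
set.seed(1)
B <- 500
n <- nrow(mtcars)

# Each column of pred_mat holds predictions on the ORIGINAL data
# from one bootstrap fit
pred_mat <- replicate(B, {
  idx   <- sample(n, n, replace = TRUE)             # (y*_b, X*_b)
  fit_b <- lm(mpg ~ wt + hp, data = mtcars[idx, ])  # fit on the resample
  predict(fit_b, newdata = mtcars)                  # predict on original X
})

y_bag <- rowMeans(pred_mat)   # "bagged" predicted values
```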

Random Subsampling

Random Subsets of Predictors

A random subset of the predictors is selected for each model, so some models in the ensemble omit some predictors. This is appealing when the set of relevant predictors varies across different “groups” or “types” of observations.
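
One way to add random predictor subsets to the bagging sketch above (reusing B and n from it); the candidate pool and the subset size m are illustrative choices:

```r
vars <- c("wt", "hp", "disp", "qsec")   # candidate predictors (illustrative)
m    <- 2                               # predictors sampled per model

pred_mat <- replicate(B, {
  idx  <- sample(n, n, replace = TRUE)          # bootstrap rows
  sub  <- sample(vars, m)                       # random subset of predictors
  form <- reformulate(sub, response = "mpg")    # e.g. mpg ~ wt + hp
  fit_b <- lm(form, data = mtcars[idx, ])
  predict(fit_b, newdata = mtcars)              # predict on the original data
})
y_bag <- rowMeans(pred_mat)   # average over fits that omit some predictors
```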

Suggestions for Research using Bagging

  • explore the bagging algorithm for a real predictive modeling task (e.g., improving a linear model via bagging)

  • develop a bagging algorithm for modeling high-dimensional data (i.e., \(p \gg n\)), where only a subset of variables can be considered at a time

  • design a strategic randomization method and/or aggregation strategy for bagging a certain model

  • explore the impact score as a tool for variable selection

