5.2 Bandwidth selection

Cross-validatory bandwidth selection, as studied in Section 4.3, extends neatly to the mixed multivariate case. For the fully continuous case, the least-squares cross-validation selector is defined as

\[\begin{align*} \mathrm{CV}(\mathbf{h})&:=\frac{1}{n}\sum_{i=1}^n(Y_i-\hat{m}_{-i}(\mathbf{X}_i;q,\mathbf{h}))^2,\\ \hat{\mathbf{h}}_\mathrm{CV}&:=\arg\min_{h_1,\ldots,h_p>0}\mathrm{CV}(\mathbf{h}). \end{align*}\]

The cross-validation objective function becomes more challenging to minimize as \(p\) grows, which is why employing several starting values in its optimization (as np does) is advisable.
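For instance, the number of restarts can be controlled through the nmulti argument of np::npregbw. A minimal illustration follows; the data-generating process is an arbitrary choice made just for the demo.

```r
# Minimal illustration: multiple optimization restarts in np::npregbw.
# The simulated model below is arbitrary, chosen only for the demo.
library(np)
set.seed(123)
df <- data.frame(X1 = rnorm(100), X2 = rnorm(100))
df$Y <- df$X1^2 + df$X2 + rnorm(100, sd = 0.5)

# nmulti = number of restarts from different starting bandwidths;
# bwmethod = "cv.ls" requests least-squares cross-validation
bw <- np::npregbw(Y ~ X1 + X2, data = df, regtype = "lc",
                  bwmethod = "cv.ls", nmulti = 5)
bw$bw
```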

The mixed case is defined in a completely analogous manner by just replacing continuous kernels \(K_h(\cdot)\) with categorical \(l_u(\cdot,\cdot;\lambda)\) or ordered discrete \(l_o(\cdot,\cdot;\eta)\) kernels.
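In practice, it suffices to declare the discrete predictors as factor (unordered) or ordered variables; np::npregbw then detects the variable types and selects the \(\lambda\)'s and \(\eta\)'s jointly with the \(h\)'s. A minimal sketch with synthetic data:

```r
# Minimal sketch: bandwidth selection with mixed predictors in np.
# The data below is synthetic and only meant to show the mechanics.
library(np)
set.seed(123)
n <- 200
df <- data.frame(X1 = rnorm(n),                                       # continuous
                 X2 = factor(sample(c("a", "b", "c"), n, replace = TRUE)), # unordered
                 X3 = ordered(sample(1:3, n, replace = TRUE)))         # ordered
df$Y <- df$X1 + (df$X2 == "a") + as.numeric(df$X3) + rnorm(n)

# One bandwidth per predictor: h for X1, lambda for X2, eta for X3
bw <- np::npregbw(Y ~ X1 + X2 + X3, data = df, regtype = "lc")
bw$bw
```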

Importantly, the trick described in Proposition 4.1 holds with obvious modifications, also in the mixed case and for the Nadaraya–Watson estimator.

Proposition 5.1 For \(q=0,1,\) the weights of the leave-one-out estimator \(\hat{m}_{-i}(\mathbf{x};q,\mathbf{h})=\sum_{\substack{j=1\\j\neq i}}^nW_{-i,j}^q(\mathbf{x})Y_j\) can be obtained from those of \(\hat{m}(\mathbf{x};q,\mathbf{h})=\sum_{i=1}^nW_{i}^q(\mathbf{x})Y_i\):

\[\begin{align} W_{-i,j}^q(\mathbf{x})=\frac{W^q_j(\mathbf{x})}{\sum_{\substack{k=1\\k\neq i}}^nW_k^q(\mathbf{x})}=\frac{W^q_j(\mathbf{x})}{1-W_i^q(\mathbf{x})}.\tag{5.13} \end{align}\]

This implies that

\[\begin{align} \mathrm{CV}(\mathbf{h})=\frac{1}{n}\sum_{i=1}^n\left(\frac{Y_i-\hat{m}(\mathbf{X}_i;q,\mathbf{h})}{1-W_i^q(\mathbf{X}_i)}\right)^2.\tag{5.14} \end{align}\]

Remark. As in the univariate case, computing (5.14) requires evaluating the local polynomial estimator at the sample \(\{\mathbf{X}_i\}_{i=1}^n\) and obtaining \(\{W_i^q(\mathbf{X}_i)\}_{i=1}^n\) (which are needed to evaluate \(\hat{m}(\mathbf{X}_i;q,\mathbf{h})\)). Both tasks can be achieved simultaneously from the \(n\times n\) matrix \(\big(W_{i}^q(\mathbf{X}_j)\big)_{ij}.\) Evaluating \(\hat{m}_{-i}(\mathbf{x};q,\mathbf{h}),\) thanks to (5.13), only requires the weights \(\{W_i^q(\mathbf{x})\}_{i=1}^n.\)
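For concreteness, below is a minimal sketch of how (5.14) can be computed for the local constant estimator (\(q=0\)) with a product of normal kernels. This is an illustrative implementation, not np's internal one, and the function name cv_nw is made up.

```r
# Minimal sketch (illustrative, not np's code): least-squares CV for the
# local constant estimator with a product of normal kernels, via (5.14).
cv_nw <- function(X, Y, h) {
  X <- as.matrix(X)
  n <- nrow(X)
  # K[i, j] = prod_k phi((X_ik - X_jk) / h_k); the 1 / h_k normalizing
  # constants cancel after row normalization, so they are omitted
  K <- matrix(1, n, n)
  for (k in seq_len(ncol(X))) {
    K <- K * dnorm(outer(X[, k], X[, k], "-") / h[k])
  }
  W <- K / rowSums(K)       # W[i, j] = W_j^0(X_i)
  m_hat <- drop(W %*% Y)    # \hat{m}(X_i; 0, h), i = 1, ..., n
  mean(((Y - m_hat) / (1 - diag(W)))^2) # (5.14); diag(W) = W_i^0(X_i)
}
```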

Exercise 5.6 Implement an R function to compute (5.14) for the local constant estimator with a multivariate (continuous) predictor. The function must receive as arguments the sample \((\mathbf{X}_1,Y_1),\ldots,(\mathbf{X}_n,Y_n)\) and the bandwidth vector \(\mathbf{h}.\) Use the normal kernel. Test your implementation by:

  1. Simulating a random sample from a regression model with two predictors.
  2. Computing its cross-validation bandwidths via np::npregbw.
  3. Plotting a contour of the function \((h_1,h_2)\mapsto \mathrm{CV}(h_1,h_2)\) and checking that the minimizers and minimum of this surface coincide with the solution given by np::npregbw.

Consider several regression models for testing the implementation; a possible testing skeleton is sketched below.
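A hedged skeleton of such a test, relying on the cv_nw sketch above; the model, sample size, and bandwidth grid are arbitrary choices:

```r
# Possible testing skeleton (illustrative choices throughout).
# Assumes the cv_nw() sketch defined above.
library(np)
set.seed(42)
n <- 200
df <- data.frame(X1 = rnorm(n), X2 = rnorm(n))
df$Y <- df$X1^2 - df$X2 + rnorm(n, sd = 0.5)

# Step 2: CV bandwidths from np
bw <- np::npregbw(Y ~ X1 + X2, data = df, regtype = "lc", bwmethod = "cv.ls")

# Step 3: contour of (h1, h2) -> CV(h1, h2) around the np solution
h1 <- seq(0.25 * bw$bw[1], 3 * bw$bw[1], length.out = 30)
h2 <- seq(0.25 * bw$bw[2], 3 * bw$bw[2], length.out = 30)
CV <- outer(seq_along(h1), seq_along(h2), Vectorize(
  function(i, j) cv_nw(X = cbind(df$X1, df$X2), Y = df$Y,
                       h = c(h1[i], h2[j]))))
contour(h1, h2, CV, nlevels = 20, xlab = "h1", ylab = "h2")
points(bw$bw[1], bw$bw[2], pch = 16) # np's minimizer should sit at the minimum
```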

Exercise 5.7 Perform a simulation study similar to that of Exercise 4.19 to illustrate the erratic behavior of the local constant and local linear estimators at “holes” in the support of two predictors \((X_1,X_2).\)

  1. Design a distribution pattern for \((X_1,X_2)\) that features an internal “hole”. An example of such distribution is the “oval” density simulated in Section 3.5.4.
  2. Define a regression function \(m\) that is neither constant nor linear in both predictors, and that behaves differently at different sides of the hole, or at the hole.
  3. Simulate \(n\) observations from a regression model \(Y=m(X_1,X_2)+\varepsilon\) for a sample size of your choice.
  4. Compute the CV bandwidths and the associated local constant and linear fits.
  5. Plot the fits as surfaces. Trick: adjust the transparency of each surface for better visualization.
  6. Repeat Steps 2–4 \(M=50\) times.

Comment on the obtained results.
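A possible construction for Step 1 samples \((X_1,X_2)\) uniformly on an annulus, which features an internal hole by design. This is just one option, the “oval” density of Section 3.5.4 being another; the helper name r_hole is made up.

```r
# One possible "hole" design for Step 1: (X1, X2) uniform on an annulus.
# Illustrative sampler; r_hole is a made-up helper name.
r_hole <- function(n, r_min = 0.5, r_max = 1) {
  # Radii drawn with density proportional to r, so that the resulting
  # distribution is uniform over the ring's area
  r <- sqrt(runif(n, r_min^2, r_max^2))
  theta <- runif(n, 0, 2 * pi)
  cbind(X1 = r * cos(theta), X2 = r * sin(theta))
}
set.seed(42)
X <- r_hole(n = 500)
plot(X, pch = 16, cex = 0.5) # the support shows an internal hole
```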