Example 1

Context

The admissions committee of a comprehensive state university selected at random the records of 200 second-semester freshmen. The results, first-semester college GPA and SAT scores, are stored in the data frame GRADES.

The GRADES data frame is part of the PASWR2 package, so loading the package makes the data available:

library(PASWR2)

The admissions committee wants to study the linear relationship between first-semester college grade point average (gpa) and scholastic aptitude test (sat) scores. Assume that the requirements of the model are satisfied.


Questions

Question 1)

Find the variance-covariance matrix for \(\hat \beta\) using the formula below:

\[\hat{\sigma}^2(\mathbf{X}'\mathbf{X})^{-1} = MSE(\mathbf{X}'\mathbf{X})^{-1} = \begin{bmatrix} s^2_{\hat{\beta}_0} & s_{\hat{\beta}_0, \hat{\beta}_1} & \cdots & s_{\hat{\beta}_0, \hat{\beta}_{p-1}} \\ s_{\hat{\beta}_1, \hat{\beta}_0} & s^2_{\hat{\beta}_1} & \cdots & s_{\hat{\beta}_1, \hat{\beta}_{p-1}} \\ \vdots & \vdots & \ddots & \vdots \\ s_{\hat{\beta}_{p-1}, \hat{\beta}_0} & s_{\hat{\beta}_{p-1}, \hat{\beta}_1} & \cdots & s^2_{\hat{\beta}_{p-1}} \end{bmatrix} = \boldsymbol{s}^2_{\hat{\beta}} \]

Question 2)

Test whether there is a linear relationship at the \(\alpha\) = 0.10 significance level.

Question 3)

Construct 90% confidence intervals for \(\beta_0\) and \(\beta_1\).


Solutions

Question 1)

Recall that \(\hat \epsilon_i = Y_i-\hat Y_i\), \(\hat \sigma^2 = \sum^n_{i=1} \frac{\hat \epsilon_i^2}{n-p}\), and the variance-covariance matrix is \(s^2_{\hat \beta} = \hat \sigma^2 (X'X)^{-1}\).

The first method below builds the matrix from two components of the summary object: the residual standard error (sigma) and the unscaled covariance matrix (cov.unscaled). The second method is a shortcut.

model.lm <- lm(gpa ~ sat, data = GRADES)
model.sum <- summary(model.lm)       # compute the summary once and reuse it
XTXI <- model.sum$cov.unscaled       # unscaled covariance matrix, (X'X)^{-1}
MSE <- model.sum$sigma^2             # estimate of sigma^2 (mean squared error)

var.cov.b <- MSE * XTXI              # MSE * (X'X)^{-1}
var.cov.b
##               (Intercept)           sat
## (Intercept)  4.948408e-02 -4.290866e-05
## sat         -4.290866e-05  3.781665e-08

\[\boldsymbol{s}^2_{\hat{\beta}} = \begin{bmatrix} 4.948 \times 10^{-2} & -4.291 \times 10^{-5} \\ -4.291 \times 10^{-5} & 3.782 \times 10^{-8} \end{bmatrix} \]
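
For the two-parameter model fitted here, the same matrix can be written entry by entry. As a sketch, with \(x_i\) denoting the sat scores, \(\bar{x}\) their mean, and \(S_{xx}=\sum^n_{i=1}(x_i-\bar{x})^2\) (notation introduced only for this illustration):

\[\hat{\sigma}^2(\mathbf{X}'\mathbf{X})^{-1} = \hat{\sigma}^2 \begin{bmatrix} \frac{1}{n}+\frac{\bar{x}^2}{S_{xx}} & -\frac{\bar{x}}{S_{xx}} \\ -\frac{\bar{x}}{S_{xx}} & \frac{1}{S_{xx}} \end{bmatrix}\]

Since sat scores are positive, \(\bar{x}>0\) and the covariance between \(\hat\beta_0\) and \(\hat\beta_1\) is negative, which agrees with the sign of the off-diagonal entries in the R output above.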

To compute \(s^2_{\hat \beta}\) directly, type

vcov(model.lm)
##               (Intercept)           sat
## (Intercept)  4.948408e-02 -4.290866e-05
## sat         -4.290866e-05  3.781665e-08
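
The formula \(\hat \sigma^2 (X'X)^{-1}\) can also be evaluated from the design matrix itself, without relying on the summary object. The following is a minimal sketch under the assumption that the design matrix has full column rank; manual.vcov is a hypothetical helper name, not a function from PASWR2 or base R.

```r
# Hypothetical helper (not part of PASWR2 or base R): rebuild the
# variance-covariance matrix of the coefficients from the design matrix.
manual.vcov <- function(fit) {
  X <- model.matrix(fit)                  # design matrix, including the intercept column
  n <- nrow(X)                            # number of observations
  p <- ncol(X)                            # number of coefficients (assumes full column rank)
  MSE <- sum(residuals(fit)^2) / (n - p)  # estimate of sigma^2
  MSE * solve(crossprod(X))               # MSE * (X'X)^{-1}; crossprod(X) is t(X) %*% X
}
```

Calling manual.vcov(model.lm) on the fit above should reproduce the matrix printed by vcov(model.lm), up to floating-point rounding.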

Question 2)

The five-step procedure is used to test for a linear relationship between sat and gpa.

Step 1: Hypotheses — \(H_0 : \beta_1=0\) versus \(H_1: \beta_1 \neq 0\)

Step 2: Test Statistic — \(\hat \beta_1=0.0031\) is the test statistic. Assuming the assumptions of the model are satisfied,

\(\hat \beta_1 \sim N(\beta_1,\sigma^2_{\hat \beta_1})\)

The standardized test statistic under the assumption that \(H_0\) is true and its distribution are

\(\frac{\hat \beta_1-\beta_1}{s_{\hat \beta_1}} \sim t_{200-2}\)

Step 3: Rejection Region Calculations — Because the standardized test statistic is distributed \(t_{198}\) and \(H_1\) is a two-sided hypothesis, the rejection region is \(|t_{obs}| > t_{0.95;198} = 1.6526\). The value of the standardized test statistic, computed from the unrounded estimate and standard error, is \(t_{obs} = \frac{0.00309427-0}{0.000194465}=15.9117\).

The same values can be read from the output of the summary function.

Step 4: Statistical Conclusion — The p-value is \(2 \times P(t_{198} \ge 15.9117) \approx 2.92 \times 10^{-37}\), which is essentially 0.

  • From the rejection region, reject \(H_0\) because |15.9117| is greater than 1.6526.

  • From the p-value, reject \(H_0\) because the p-value (approximately 0) is less than 0.10.

Both criteria lead to the same conclusion: the test statistic falls in the rejection region, and the p-value (the Pr(>|t|) column in summary(model.lm)) is below 0.10.
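
As a rough check of Steps 3 and 4, the critical value, the observed statistic, and the p-value can be recomputed in base R. This sketch hardcodes the estimate and standard error for sat as reported in the coefficient table of summary(model.lm); the variable names are illustrative and not from the original.

```r
# Values copied from the sat row of the summary(model.lm)$coef table
b1    <- 0.00309427    # estimate of beta_1
s.b1  <- 0.000194465   # standard error of the estimate
df    <- 200 - 2       # n - p degrees of freedom
alpha <- 0.10          # significance level

t.obs   <- (b1 - 0) / s.b1                  # standardized statistic under H0, about 15.9117
t.crit  <- qt(1 - alpha / 2, df)            # two-sided critical value, about 1.6526
p.value <- 2 * pt(abs(t.obs), df, lower.tail = FALSE)  # two-sided p-value
reject  <- abs(t.obs) > t.crit              # TRUE when t.obs is in the rejection region

c(t.obs = t.obs, t.crit = t.crit, p.value = p.value)
reject
```

Using lower.tail = FALSE rather than 1 - pt(...) keeps the tiny tail probability from rounding to exactly 0 for a statistic this large.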

Step 5: English Conclusion — There is evidence to suggest a linear relationship between sat and gpa.

To see the test statistics and their p-values for model.lm, enter

summary(model.lm)$coef
##                Estimate  Std. Error  t value     Pr(>|t|)
## (Intercept) -1.19206381 0.222450180 -5.35879 2.316666e-07
## sat          0.00309427 0.000194465 15.91171 2.922995e-37

Because the p-value for sat in the output above is smaller than 0.10, the result is statistically significant and the null hypothesis is rejected.

Question 3)

The 90% confidence intervals for \(\beta_0\) and \(\beta_1\) are

\(CI_{0.90}(\beta_0)\)=[ \(\hat\beta_0\) -\(t_{0.95;n-p}\cdot s_{\hat \beta_0}\) , \(\hat\beta_0\)+\(t_{0.95;n-p}\cdot s_{\hat\beta_0}\) ]

\(CI_{0.90}(\beta_0)\)=[\(-1.1921-1.6526(0.2225),-1.1921+1.6526(0.2225)\)]

\(CI_{0.90}(\beta_0)\)=[\(-1.5597,-0.8244\)]

and

\(CI_{0.90}(\beta_1)\)=[ \(\hat\beta_1\) -\(t_{0.95;n-p}\cdot s_{\hat \beta_1}\) , \(\hat\beta_1\)+\(t_{0.95;n-p}\cdot s_{\hat\beta_1}\) ]

\(CI_{0.90}(\beta_1)\)=[\(0.0031-1.6526(0.0002),0.0031+1.6526(0.0002)\)]

\(CI_{0.90}(\beta_1)\)=[\(0.0028,0.0034\)]

The R code below computes the requested confidence intervals, first by extracting the pieces needed and then directly with the function confint().

b0 <- coef(summary(model.lm))[1, 1]
s.b0 <- coef(summary(model.lm))[1, 2]
b1 <- coef(summary(model.lm))[2, 1]
s.b1 <- coef(summary(model.lm))[2, 2]
ct <- qt(1 - 0.1/2, 198) # alpha = 0.10
CI.B0 <- b0 + c(-1, 1) * ct * s.b0
CI.B0
## [1] -1.5596818 -0.8244458
CI.B1 <- b1 + c(-1, 1) * ct * s.b1
CI.B1
## [1] 0.00277290 0.00341564

Or directly from the fitted model with confint():

confint(model.lm, level = 0.9)
##                    5 %        95 %
## (Intercept) -1.5596818 -0.82444581
## sat          0.0027729  0.00341564

Hence a 90% confidence interval for the coefficient of sat is (0.00277, 0.00342). Because this interval does not contain 0, we conclude that sat is a statistically significant predictor of gpa at the \(\alpha\) = 0.10 level. At the 90% confidence level, the coefficient for sat is estimated to lie between 0.00277 and 0.00342.