Example 1

Context

The admissions committee of a comprehensive state university selected at random the records of 200 second-semester freshmen. The results, first-semester college GPA and SAT scores, are stored in the data frame GRADES.

Load the PASWR2 package, which contains the GRADES data frame:

library(PASWR2)

The admissions committee wants to study the linear relationship between first-semester college grade point average (gpa) and scholastic aptitude test (sat) scores. Assume that the requirements of the linear regression model are satisfied.

Questions

Question 1)

Find the variance-covariance matrix for $\hat \beta$ using the formula below:

\[\hat{\sigma}^2(\mathbf{X}'\mathbf{X})^{-1} = MSE(\mathbf{X}'\mathbf{X})^{-1} = \begin{bmatrix} s^2_{\hat{\beta}_0} & s_{\hat{\beta}_0, \hat{\beta}_1} & \cdots & s_{\hat{\beta}_0, \hat{\beta}_{p-1}} \\ s_{\hat{\beta}_1, \hat{\beta}_0} & s^2_{\hat{\beta}_1} & \cdots & s_{\hat{\beta}_1, \hat{\beta}_{p-1}} \\ \vdots & \vdots & \ddots & \vdots \\ s_{\hat{\beta}_{p-1}, \hat{\beta}_0} & s_{\hat{\beta}_{p-1}, \hat{\beta}_1} & \cdots & s^2_{\hat{\beta}_{p-1}} \end{bmatrix} = \boldsymbol{s}^2_{\hat{\beta}} \]

Question 2)

Test whether there is a linear relationship at the $\alpha$ = 0.10 significance level.

Question 3)

Construct 90% confidence intervals for $\beta_0$ and $\beta_1$.

Solutions

Question 1)

Recall that $\hat \epsilon_i = Y_i-\hat Y_i$, $\hat \sigma^2 = \sum^n_{i=1} \frac{\hat \epsilon_i^2}{n-p}$, and the variance-covariance matrix is $s^2_{\hat \beta} = \hat \sigma^2 (X'X)^{-1}$.

The first method below builds the matrix from two pieces of the model summary: the residual standard error (sigma) and the unscaled covariance matrix (cov.unscaled). The second method, shown afterwards, is a shortcut.

model.lm <- lm(gpa ~ sat, data = GRADES)  # fit gpa on sat
XTXI <- summary(model.lm)$cov.unscaled    # unscaled covariance matrix, (X'X)^{-1}
MSE <- summary(model.lm)$sigma^2          # squared residual standard error

var.cov.b <- MSE * XTXI                   # MSE * (X'X)^{-1}
var.cov.b

##               (Intercept)           sat
## (Intercept)  4.948408e-02 -4.290866e-05
## sat         -4.290866e-05  3.781665e-08

\[\boldsymbol{s}^2_{\hat{\beta}} = \begin{bmatrix} 4.948 \times 10^{-2} & -4.291 \times 10^{-5} \\ -4.291 \times 10^{-5} & 3.782 \times 10^{-8} \end{bmatrix} \]
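The formula $\hat \sigma^2 (X'X)^{-1}$ can also be evaluated from scratch, without the summary object. The following is a minimal sketch on simulated stand-in data (the variables x and y and the object fit are hypothetical and are not the GRADES data), so its numbers will not match the matrix above; it only illustrates the computation.

```r
# Hypothetical stand-in data (NOT the GRADES data)
set.seed(1)
n <- 50
x <- rnorm(n, mean = 10, sd = 2)
y <- -1 + 0.3 * x + rnorm(n, sd = 0.4)
fit <- lm(y ~ x)

X <- model.matrix(fit)                # n x p design matrix (intercept, x)
p <- ncol(X)
res <- residuals(fit)                 # epsilon-hat_i = Y_i - Y-hat_i
sigma2.hat <- sum(res^2) / (n - p)    # MSE
manual <- sigma2.hat * solve(t(X) %*% X)  # MSE * (X'X)^{-1}
manual

# Compare with the summary-based computation used in the first method
from.summary <- summary(fit)$sigma^2 * summary(fit)$cov.unscaled
all.equal(manual, from.summary, check.attributes = FALSE, tolerance = 1e-6)
```

Forming $X'X$ explicitly mirrors the formula but is less numerically stable than the QR-based route R uses internally, which is one reason to prefer the built-in extractors in practice.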

To compute $s^2_{\hat \beta}$ directly, use vcov():

vcov(model.lm)

##               (Intercept)           sat
## (Intercept)  4.948408e-02 -4.290866e-05
## sat         -4.290866e-05  3.781665e-08
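The diagonal of this matrix holds the squared standard errors of $\hat \beta_0$ and $\hat \beta_1$, so their square roots are the standard errors used in Questions 2 and 3. A minimal sketch, with the entries copied by hand from the output above into a hypothetical matrix V (in practice one would apply the same two functions to vcov(model.lm)):

```r
# Entries copied by hand from the vcov(model.lm) output (illustration only)
V <- matrix(c( 4.948408e-02, -4.290866e-05,
              -4.290866e-05,  3.781665e-08),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("(Intercept)", "sat"),
                            c("(Intercept)", "sat")))

# Standard errors are the square roots of the diagonal entries
se.b <- sqrt(diag(V))
se.b
```

The off-diagonal entry is the estimated covariance between the intercept and slope estimates; it is negative here, as is typical when the predictor values are all positive.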

Question 2)

The five-step procedure is used to test for a linear relationship between sat and gpa.

Step 1: Hypotheses — $H_0 : \beta_1=0$ versus $H_1: \beta_1 \neq 0$

Step 2: Test Statistic — $\hat \beta_1=0.0031$ is the test statistic. Assuming the model assumptions are satisfied,

$\hat \beta_1 \sim N(\beta_1,\sigma^2_{\hat \beta_1})$

The standardized test statistic under the assumption that $H_0$ is true, and its distribution, are

$\frac{\hat \beta_1-\beta_1}{s_{\hat \beta_1}} \sim t_{200-2}$

Step 3: Rejection Region Calculations — Because the standardized test statistic is distributed $t_{198}$ and $H_1$ is a two-sided hypothesis, the rejection region is $|t_{obs}| > t_{0.95;198} = 1.6526$. The value of the standardized test statistic is $t_{obs} = \frac{0.00309427-0}{0.000194465} = 15.9117$.

These values can also be read from the output of summary(model.lm).

Step 4: Statistical Conclusion — The p-value is $2 \times P(t_{198} \ge 15.9117) \approx 2.92 \times 10^{-37} \approx 0$.

From the rejection region, reject $H_0$ because |15.9117| is greater than 1.6526.
From the p-value, reject $H_0$ because the p-value is approximately 0 and therefore less than 0.10.

Both criteria agree: the test statistic is in the rejection region, and the p-value (the Pr(>|t|) column in summary(model.lm)) is below $\alpha$.

Step 5: English Conclusion — There is evidence to suggest a linear relationship between sat and gpa.
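The quantities in Steps 3 and 4 can be reproduced with qt() and pt(). This is a minimal sketch that assumes the estimate and standard error are typed in by hand from the coefficient table of summary(model.lm) rather than extracted from the fitted model; the names t.obs, t.crit, and p.val are chosen here for illustration.

```r
# Values copied by hand from the coefficient table of summary(model.lm)
b1    <- 0.00309427     # estimated slope for sat
s.b1  <- 0.000194465    # standard error of the slope
df    <- 200 - 2        # n - p
alpha <- 0.10

t.obs  <- (b1 - 0) / s.b1                             # standardized test statistic under H0
t.crit <- qt(1 - alpha / 2, df)                       # two-sided critical value
p.val  <- 2 * pt(abs(t.obs), df, lower.tail = FALSE)  # two-sided p-value

c(t.obs = t.obs, t.crit = t.crit, p.value = p.val)
abs(t.obs) > t.crit   # TRUE means reject H0 by the rejection region
p.val < alpha         # TRUE means reject H0 by the p-value
```

Using lower.tail = FALSE rather than 1 - pt(...) avoids the upper-tail probability being rounded to exactly 0 for a statistic this large.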

To see the test statistics and their p-values for model.lm, enter summary(model.lm)$coef:

summary(model.lm)$coef

##                Estimate  Std. Error  t value     Pr(>|t|)
## (Intercept) -1.19206381 0.222450180 -5.35879 2.316666e-07
## sat          0.00309427 0.000194465 15.91171 2.922995e-37

Because the p-value for sat in the output above (2.92e-37) is smaller than 0.10, the slope is statistically significant and the null hypothesis is rejected.

Question 3)

90% Confidence intervals for $\beta_0$ and $\beta_1$ are

$CI_{0.90}(\beta_0) = [\hat\beta_0 - t_{0.95;n-p}\cdot s_{\hat \beta_0},\ \hat\beta_0 + t_{0.95;n-p}\cdot s_{\hat\beta_0}]$

$CI_{0.90}(\beta_0) = [-1.1921-1.6526(0.2225),\ -1.1921+1.6526(0.2225)]$

$CI_{0.90}(\beta_0) = [-1.5597,\ -0.8244]$

and

$CI_{0.90}(\beta_1) = [\hat\beta_1 - t_{0.95;n-p}\cdot s_{\hat \beta_1},\ \hat\beta_1 + t_{0.95;n-p}\cdot s_{\hat\beta_1}]$

$CI_{0.90}(\beta_1) = [0.0031-1.6526(0.0002),\ 0.0031+1.6526(0.0002)]$

$CI_{0.90}(\beta_1) = [0.0028,\ 0.0034]$

The R code below computes the requested confidence intervals, first by extracting the pieces needed and then directly with the function confint():

b0 <- coef(summary(model.lm))[1, 1]
s.b0 <- coef(summary(model.lm))[1, 2]
b1 <- coef(summary(model.lm))[2, 1]
s.b1 <- coef(summary(model.lm))[2, 2]
ct <- qt(1 - 0.1/2, 198) # alpha = 0.10
CI.B0 <- b0 + c(-1, 1) * ct * s.b0
CI.B0

## [1] -1.5596818 -0.8244458

CI.B1 <- b1 + c(-1, 1) * ct * s.b1
CI.B1

## [1] 0.00277290 0.00341564

Or directly with confint():

confint(model.lm, level = 0.9)

##                    5 %        95 %
## (Intercept) -1.5596818 -0.82444581
## sat          0.0027729  0.00341564

Hence a 90% confidence interval for the coefficient of sat is (0.00277, 0.00342). As this interval does not contain 0, we conclude that sat is a statistically significant predictor of gpa at the $\alpha$ = 0.10 level, which agrees with the test in Question 2. We are 90% confident that the coefficient for sat lies between 0.00277 and 0.00342.