Example 1
Context
The admissions committee of a comprehensive state university selected at random the records of 200 second-semester freshmen. The results, first-semester college GPA and SAT scores, are stored in the data frame GRADES.
Read in the data. GRADES is part of the PASWR2 package:
library(PASWR2)
The admissions committee wants to study the linear relationship between first-semester college grade point average (gpa) and scholastic aptitude test (sat) scores. Assume that the requirements for the linear regression model are satisfied.
Question 1)
Find the variance-covariance matrix for \(\hat \beta\) using the formula below:
\[\hat{\sigma}^2(\mathbf{X}'\mathbf{X})^{-1} = MSE(\mathbf{X}'\mathbf{X})^{-1} = \begin{bmatrix} s^2_{\hat{\beta}_0} & s_{\hat{\beta}_0, \hat{\beta}_1} & \cdots & s_{\hat{\beta}_0, \hat{\beta}_{p-1}} \\ s_{\hat{\beta}_1, \hat{\beta}_0} & s^2_{\hat{\beta}_1} & \cdots & s_{\hat{\beta}_1, \hat{\beta}_{p-1}} \\ \vdots & \vdots & \ddots & \vdots \\ s_{\hat{\beta}_{p-1}, \hat{\beta}_0} & s_{\hat{\beta}_{p-1}, \hat{\beta}_1} & \cdots & s^2_{\hat{\beta}_{p-1}} \end{bmatrix} = \boldsymbol{s}^2_{\hat{\beta}} \]
Question 2)
Test whether there is a linear relationship at the \(\alpha\) = 0.10 significance level.
Question 3)
Construct 90% confidence intervals for \(\beta_0\) and \(\beta_1\).
Question 1)
Recall that \(\hat \epsilon_i = Y_i-\hat Y_i\), \(\hat \sigma^2 = \sum^n_{i=1} \frac{\hat \epsilon_i^2}{n-p}\), and the variance-covariance matrix is \(s^2_{\hat \beta} = \hat \sigma^2 (X'X)^{-1}\).
The first method below builds the matrix from two pieces of the model summary: the unscaled covariance matrix \((X'X)^{-1}\) and the residual standard error \(\hat \sigma\). The second method is a shortcut.
model.lm <- lm(gpa ~ sat, data = GRADES)
XTXI <- summary(model.lm)$cov.unscaled # cov.unscaled is the unscaled covariance matrix (X'X)^{-1}
MSE <- summary(model.lm)$sigma^2 # sigma is the residual standard error
var.cov.b <- MSE * XTXI
var.cov.b
## (Intercept) sat
## (Intercept) 4.948408e-02 -4.290866e-05
## sat -4.290866e-05 3.781665e-08
\[\boldsymbol{s}^2_{\hat{\beta}} = \begin{bmatrix} 0.0495 & -4.29 \times 10^{-5} \\ -4.29 \times 10^{-5} & 3.78 \times 10^{-8} \end{bmatrix} \]
To compute \(s^2_{\hat \beta}\) directly, use vcov():
vcov(model.lm)
## (Intercept) sat
## (Intercept) 4.948408e-02 -4.290866e-05
## sat -4.290866e-05 3.781665e-08
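The formula \(\hat \sigma^2 (X'X)^{-1}\) can also be evaluated from the design matrix itself. The following is a minimal sketch on simulated data (the variables x and y and their values are hypothetical, chosen only so the block runs without the GRADES data); it checks that the manual calculation agrees with vcov().

```r
# Simulated data for illustration only (hypothetical values, not GRADES)
set.seed(1)
n <- 50
x <- runif(n, 8, 14)                      # made-up predictor
y <- -1 + 0.3 * x + rnorm(n, sd = 0.4)    # made-up response
fit <- lm(y ~ x)

X <- model.matrix(fit)                    # n x p design matrix (intercept column and x)
p <- ncol(X)
res <- residuals(fit)                     # residuals Y_i - Yhat_i
MSE <- sum(res^2) / (n - p)               # estimate of sigma^2
var.cov.manual <- MSE * solve(crossprod(X))  # MSE * (X'X)^{-1}

# Compare with the built-in extractor
all.equal(var.cov.manual, vcov(fit))
```

The same steps should apply to model.lm by replacing fit with model.lm and n with the number of rows in GRADES.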
Question 2)
The five-step procedure is used to test for a linear relationship between sat and gpa.
Step 1: Hypotheses — \(H_0 : \beta_1=0\) versus \(H_1: \beta_1 \neq 0\)
Step 2: Test Statistic — \(\hat \beta_1=0.0031\) is the test statistic. Assuming the assumptions of the model are satisfied, the standardized test statistic under the assumption that \(H_0\) is true and its distribution are \(\frac{\hat \beta_1 - \beta_1}{s_{\hat \beta_1}} \sim t_{200-2} = t_{198}\).
Step 3: Rejection Region Calculations — Because the standardized test statistic is distributed \(t_{198}\) and \(H_1\) is a two-sided hypothesis, the rejection region is \(|t_{obs}| > t_{0.95;198} = 1.6526\). The value of the standardized test statistic is \(t_{obs} = \frac{0.00309427-0}{0.000194465}=15.9117\).
The estimate, its standard error, and the value of \(t_{obs}\) can all be read from the output of summary(model.lm).
Step 4: Statistical Conclusion — The p-value is \(2 \times P(t_{198} \ge 15.9117) \approx 2.92 \times 10^{-37}\), which is effectively 0.
From the rejection region, reject \(H_0\) because |15.9117| is greater than 1.6526.
From the p-value, reject \(H_0\) because the p-value \(\approx 0\) is less than 0.10.
Both approaches agree: the test statistic is in the rejection region, and the p-value (the Pr(>|t|) column in summary(model.lm)) is below 0.10.
Step 5: English Conclusion — There is evidence to suggest a linear relationship between sat and gpa.
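The calculations in Steps 3 and 4 can be reproduced with qt() and pt(). This is a small sketch that hard-codes the slope estimate and standard error reported for this model (0.00309427 and 0.000194465) rather than refitting it; the variable names are illustrative.

```r
# Slope estimate and standard error as reported for the sat coefficient
b1    <- 0.00309427
s.b1  <- 0.000194465
df    <- 200 - 2       # n - p
alpha <- 0.10

t.obs   <- (b1 - 0) / s.b1                              # standardized test statistic under H0
t.crit  <- qt(1 - alpha / 2, df)                        # two-sided critical value t_{0.95; 198}
p.value <- 2 * pt(abs(t.obs), df, lower.tail = FALSE)   # two-sided p-value

c(t.obs = t.obs, t.crit = t.crit, p.value = p.value)
abs(t.obs) > t.crit    # TRUE means t.obs falls in the rejection region
```

Using lower.tail = FALSE avoids computing 1 - pt(...), which would round to exactly 0 for a statistic this far in the tail.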
To see the test statistics and their p-values for model.lm, enter summary(model.lm)$coef:
summary(model.lm)$coef
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.19206381 0.222450180 -5.35879 2.316666e-07
## sat 0.00309427 0.000194465 15.91171 2.922995e-37
As the p-value for sat in the results above (2.92e-37) is smaller than 0.10, the slope is statistically significant and the null hypothesis is rejected.
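The Std. Error column is the square root of the diagonal of the variance-covariance matrix from Question 1, and the t value column is the estimate divided by its standard error. The sketch below illustrates this link using the printed numbers (copied by hand, so agreement is only up to rounding).

```r
# Variance-covariance matrix copied from the vcov(model.lm) output
V <- matrix(c( 4.948408e-02, -4.290866e-05,
              -4.290866e-05,  3.781665e-08),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("(Intercept)", "sat"), c("(Intercept)", "sat")))

# Estimates copied from the coefficient table
est <- c("(Intercept)" = -1.19206381, sat = 0.00309427)

se      <- sqrt(diag(V))   # standard errors of the coefficients
t.value <- est / se        # test statistics for H0: beta_j = 0

cbind(Estimate = est, "Std. Error" = se, "t value" = t.value)
```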
Question 3)
90% Confidence intervals for \(\beta_0\) and \(\beta_1\) are
\(CI_{0.90}(\beta_0)\)=[ \(\hat\beta_0\) -\(t_{0.95;n-p}\cdot s_{\hat \beta_0}\) , \(\hat\beta_0\)+\(t_{0.95;n-p}\cdot s_{\hat\beta_0}\) ]
\(CI_{0.90}(\beta_0)\)=[\(-1.1921-1.6526(0.2225),-1.1921+1.6526(0.2225)\)]
\(CI_{0.90}(\beta_0)\)=[\(-1.5597,-0.8244\)]
and
\(CI_{0.90}(\beta_1)\)=[ \(\hat\beta_1\) -\(t_{0.95;n-p}\cdot s_{\hat \beta_1}\) , \(\hat\beta_1\)+\(t_{0.95;n-p}\cdot s_{\hat\beta_1}\) ]
\(CI_{0.90}(\beta_1)\)=[\(0.003094-1.6526(0.0001945),0.003094+1.6526(0.0001945)\)]
\(CI_{0.90}(\beta_1)\)=[\(0.0028,0.0034\)]
The R code below computes the requested confidence intervals in two ways: first by extracting the pieces needed from the coefficient table, then directly with the function confint().
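b0 <- coef(summary(model.lm))[1, 1] # intercept estimate
s.b0 <- coef(summary(model.lm))[1, 2] # standard error of the intercept
b1 <- coef(summary(model.lm))[2, 1] # slope estimate
s.b1 <- coef(summary(model.lm))[2, 2] # standard error of the slope
ct <- qt(1 - 0.1/2, 198) # critical value for alpha = 0.10
CI.B0 <- b0 + c(-1, 1) * ct * s.b0
CI.B0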
## [1] -1.5596818 -0.8244458
CI.B1 <- b1 + c(-1, 1) * ct * s.b1
CI.B1
## [1] 0.00277290 0.00341564
Or directly from the fitted model with confint():
confint(model.lm, level = 0.9)
## 5 % 95 %
## (Intercept) -1.5596818 -0.82444581
## sat 0.0027729 0.00341564
Hence a 90% confidence interval for the coefficient of sat is (0.00277, 0.00342). As this interval does not contain 0, we conclude that sat is a statistically significant predictor of gpa at the 0.10 level, which agrees with the test in Question 2. We are 90% confident that the slope lies between 0.00277 and 0.00342.
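As a quick check of this reasoning, both intervals can be computed in one vectorized step and tested for whether they exclude 0. The sketch below hard-codes the estimates and standard errors reported for this model; the names level, ci, and excludes.zero are illustrative.

```r
# Estimates and standard errors as reported in the coefficient table
est <- c("(Intercept)" = -1.19206381, sat = 0.00309427)
se  <- c("(Intercept)" = 0.222450180, sat = 0.000194465)

level <- 0.90
ct <- qt(1 - (1 - level) / 2, df = 200 - 2)   # t_{0.95; 198}

# One row per coefficient: estimate -/+ critical value * standard error
ci <- cbind(lower = est - ct * se, upper = est + ct * se)
ci

# An interval that excludes 0 matches rejecting H0: beta_j = 0 at alpha = 1 - level
excludes.zero <- ci[, "lower"] > 0 | ci[, "upper"] < 0
excludes.zero
```

Changing level to 0.95 gives the corresponding 95% intervals under the same assumptions.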