8 Day 8 (June 18)

8.1 Announcements

  • Assignment 2 questions?

  • Holiday tomorrow and workday on Friday

  • Assignment 3 and the final project are posted

  • Recommended reading

    • Chapters 2 and 3 (pp. 13–46) in Linear Models with R
  • Selected questions from journals

    • “I’m having some trouble understanding the Nelder-Mead optimization method”
    • “We’ve talked a lot about how important the error term is in linear statistical models to account for randomness or variability in a specific dataset. So, in the “lm” function in R Studio, and in the example we discussed using the deer dataset, how can we include the error term inside this lm function?”
    • “When we split the data into training and testing sets, what ratio is generally used?”
    • “Is it correct to use adjusted R-squared when comparing multiple models to determine whether adding a variable improves the model, and to use R-squared when evaluating a single model to describe how well it fits? When doing basic linear regression, is it more common to use adjusted R-squared or AIC?”
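  • A quick sketch touching on the last two questions. The 70/30 split and the simulated data below are illustrative conventions, not hard rules (80/20 is also widely used); note also that lm estimates the error term automatically, so it never appears inside the formula
    # Simulate a small dataset (illustrative only)
    set.seed(1)
    dat <- data.frame(x=rnorm(100))
    dat$y <- 1 + 2*dat$x + rnorm(100)
    
    # Train/test split; 70/30 and 80/20 are both common conventions
    train_id <- sample(1:100,size=70)
    train <- dat[train_id,]
    test <- dat[-train_id,]
    
    m <- lm(y~x,data=train)
    mean((test$y - predict(m,newdata=test))^2) # out-of-sample prediction error
    summary(m)$sigma # estimated sd of the error term; lm handles it for you
    
    # Model comparison: both adjusted R-squared and AIC penalize extra predictors
    summary(m)$adj.r.squared
    AIC(m)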

8.2 Estimation

8.3 Maximum Likelihood Estimation

  • Maximum likelihood estimation for the linear model
    • Assume that \(\mathbf{y}\sim \text{N}(\mathbf{X}\boldsymbol{\beta},\sigma^2\mathbf{I})\)
    • Write out the likelihood function \[\mathcal{L}(\boldsymbol{\beta},\sigma^2)=\prod_{i=1}^n\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{1}{2\sigma^2}(y_{i} - \mathbf{x}_{i}^{\prime}\boldsymbol{\beta})^2}.\]
    • Maximize the likelihood function
      1. First take the natural log of \(\mathcal{L}(\boldsymbol{\beta},\sigma^2)\), which gives the log-likelihood \[\ell(\boldsymbol{\beta},\sigma^2)=-\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_{i} - \mathbf{x}_{i}^{\prime}\boldsymbol{\beta})^2.\]
      2. Recall that \(\sum_{i=1}^{n}(y_{i} - \mathbf{x}_{i}^{\prime}\boldsymbol{\beta})^2 = (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^{\prime}(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})\)
      3. Maximize the log-likelihood function \[\underset{\boldsymbol{\beta}, \sigma^2}{\operatorname{argmax}} -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^{\prime}(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})\]
      4. Setting the partial derivatives with respect to \(\boldsymbol{\beta}\) and \(\sigma^2\) equal to zero gives the closed form estimators \[\hat{\boldsymbol{\beta}}=(\mathbf{X}^{\prime}\mathbf{X})^{-1}\mathbf{X}^{\prime}\mathbf{y},\quad \hat{\sigma}^2=\frac{1}{n}(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})^{\prime}(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}).\]
    • Visual for data \(\mathbf{y}=(0.16,2.82,2.24)'\) and \(\mathbf{x}=(1,2,3)'\)
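    • A sketch for reproducing such a visual, assuming the figure showed the profile log-likelihood surface over the two regression coefficients (an assumption here; the original figure is not reproduced)
    # Profile log-likelihood surface for (beta_0, beta_1); sigma2 is profiled
    # out at its conditional MLE, RSS/n
    y <- c(0.16,2.82,2.24)
    X <- matrix(c(1,1,1,1,2,3),nrow=3,ncol=2,byrow=FALSE)
    n <- length(y)
    beta0 <- seq(-3,3,length.out=100)
    beta1 <- seq(-2,4,length.out=100)
    ll <- outer(beta0,beta1,Vectorize(function(b0,b1){
      rss <- sum((y - X%*%c(b0,b1))^2)
      -n/2*log(2*pi*rss/n) - n/2
    }))
    contour(beta0,beta1,ll,xlab=expression(beta[0]),ylab=expression(beta[1]))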
  • Three ways to do it in program R
    • Using matrix calculus and algebra
    # Data 
    y <- c(0.16,2.82,2.24)
    X <- matrix(c(1,1,1,1,2,3),nrow=3,ncol=2,byrow=FALSE)
    n <- length(y)
    
    # Maximum likelihood estimation using closed form estimators
    beta <- solve(t(X)%*%X)%*%t(X)%*%y 
    sigma2 <- (1/n)*t(y-X%*%beta)%*%(y-X%*%beta)
    
    beta
    ##       [,1]
    ## [1,] -0.34
    ## [2,]  1.04
    sigma2
    ##        [,1]
    ## [1,] 0.5832
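    • Note: solve(t(X)%*%X) explicitly inverts \(\mathbf{X}^{\prime}\mathbf{X}\), which can be numerically unstable when columns of \(\mathbf{X}\) are nearly collinear; a QR-based solve (what lm uses internally) is one equivalent alternative
    # Least squares via the QR decomposition; same beta-hat without forming an inverse
    qr.solve(X,y)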
    • Using modern (circa 1970s) optimization techniques
    # Maximum likelihood estimation using the Nelder-Mead algorithm
    y <- c(0.16,2.82,2.24)
    X <- matrix(c(1,1,1,1,2,3),nrow=3,ncol=2,byrow=FALSE)
    
    negloglik <- function(par){
      beta <- par[1:2]   # regression coefficients
      sigma2 <- par[3]   # error variance
      -sum(dnorm(y,X%*%beta,sqrt(sigma2),log=TRUE)) # negative log-likelihood
    }
    optim(par=c(0,0,1),fn=negloglik,method = "Nelder-Mead")
    ## $par
    ## [1] -0.3397162  1.0398552  0.5831210
    ## 
    ## $value
    ## [1] 3.447978
    ## 
    ## $counts
    ## function gradient 
    ##      150       NA 
    ## 
    ## $convergence
    ## [1] 0
    ## 
    ## $message
    ## NULL
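    • On the journal question above: Nelder-Mead is a derivative-free simplex search. With three parameters it maintains a simplex of four candidate points and repeatedly reflects, expands, or contracts the worst point, so it needs only function evaluations, never gradients (note gradient is NA in the output above). A common refinement, sketched below, is to optimize \(\log(\sigma^2)\) so the search never proposes a negative variance
    # Reparameterize the variance on the log scale so the search is unconstrained
    negloglik2 <- function(par){
      beta <- par[1:2]
      sigma2 <- exp(par[3]) # exp() guarantees sigma2 > 0
      -sum(dnorm(y,X%*%beta,sqrt(sigma2),log=TRUE))
    }
    fit <- optim(par=c(0,0,0),fn=negloglik2,method="Nelder-Mead")
    c(fit$par[1:2],exp(fit$par[3])) # beta-hat and sigma2-hat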
    • Using modern and user-friendly statistical computing software
    # Maximum likelihood estimation using lm function
    # Note: the estimate of sigma2 from summary() is not the MLE
    # (it divides the RSS by n - p, not n)!
    df <- data.frame(y = c(0.16,2.82,2.24),x = c(1,2,3))
    m1 <- lm(y~x,data=df)
    
    coef(m1)
    ## (Intercept)           x 
    ##       -0.34        1.04
    summary(m1)$sigma^2
    ## [1] 1.7496
    summary(m1)
    ## 
    ## Call:
    ## lm(formula = y ~ x, data = df)
    ## 
    ## Residuals:
    ##     1     2     3 
    ## -0.54  1.08 -0.54 
    ## 
    ## Coefficients:
    ##             Estimate Std. Error t value Pr(>|t|)
    ## (Intercept)  -0.3400     2.0205  -0.168    0.894
    ## x             1.0400     0.9353   1.112    0.466
    ## 
    ## Residual standard error: 1.323 on 1 degrees of freedom
    ## Multiple R-squared:  0.5529, Adjusted R-squared:  0.1057 
    ## F-statistic: 1.236 on 1 and 1 DF,  p-value: 0.4663
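    • As the comment above notes, summary(m1)$sigma^2 is the unbiased estimator \(\text{RSS}/(n-p)\), not the MLE \(\text{RSS}/n\); the MLE can be recovered from the fitted model
    # Recover the MLE of sigma2 from the lm fit: divide RSS by n rather than n - p
    n <- nrow(df)
    sum(resid(m1)^2)/n # = 0.5832, matching the closed form MLE above
    # equivalently: summary(m1)$sigma^2*m1$df.residual/n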