2 Day 2 (June 10)

2.1 Announcements

  • Journals are uploaded as a pdf to Canvas

  • Assignment 1 is due on Friday.

    • Upload to Canvas
    • Assignment should only take 15-30 min if everything works
    • Do not spend more than 1 hour
    • After 1 hour of trying please visit me during office hours
    • Do not wait until Thursday to do the assignment
  • Recommended reading

    • Chapters 1 and 2 (pgs 1 - 27) in Linear Models with R
  • Today’s lecture in the news

    • Please realize this is a satirical article (link)

2.2 Intro to statistical modelling

  • A difficult question
    • How much money will I have for retirement?
  • What is data?
    • Something in the real world that you can, in some way, observe and measure with or without error
  • What is a statistic?
    • A function of the data
  • What is a model?
    • Mathematical models
    • Statistical models
  • A difficult question
    • How much money will I have for retirement?
      • Point prediction vs. distributional prediction
      • What data/information do I have?
      • What data do I need?
      • How can I answer this question using a statistical model?
  • Example: my retirement
    • Personal information
      • Obviously this isn’t my actual information, but it isn’t too far off!
      • Since I am a millennial I don’t think social security will be around when I retire (i.e., assume social security contributes $0 to my retirement)
      • As of 1/1/25 I have $450,000 in a 401k style retirement account
      • All of money is invested into an S&P 500 index fund (VOO to be exact)
      • I am 38 as of 1/1/25
      • I want to know how much pre-tax money I will have at a given retirement age (e.g., 65, 70, etc)
    • Example using a mathematical model
      • Whiteboard demonstration
      • What are the model assumptions?
      • In program R
# The value of my 401k retirement account as of 1/1/25
y_2025 <- 450000 


# How much money will I add to my 401k each year
q <- 46000
    
    
# Rate of return for S&P 500 index fund
r <- 0.08


# How much $ will I have in 2026
y_2026 <- y_2025*(1+r)+q
y_2026
## [1] 532000
# How much $ will I have in 2027
y_2027 <- y_2026*(1+r)+q
y_2027
## [1] 620560
# How much $ will I have in 2028 
y_2028 <- y_2027*(1+r)+q
y_2028
## [1] 716204.8
# Using a for loop to calculate how much $ will I have 
year <- seq(2025,2025+32,by=1)
y <- matrix(,length(year),1)
rownames(y) <- year
y[1,1] <- 450000

for(t in 1:32){
  y[t+1,1] <- y[t,1]*(1+r)+q
}

plot(year,y/10^6,typ="b",pch=20,col="deepskyblue",xlab="Year",ylab="Pretax retirement amount ($ millions)")

# How much $ will I have when I am 60? 
# Note that units are millions of $
retirement.year <- 2025+22
y[which(year==retirement.year)]/10^6
## [1] 4.997454
  • Example using a Bayesian statistical model
    • S&P 500 return since inception in 1928
# Download S&P 500 returns    
url <- "https://www.dropbox.com/scl/fi/rny5260dfxkk5pkby6bdr/sp500.csv?rlkey=v81v7s91254hn8j5ab33az5t5&dl=1"
df.sp500 <- read.csv(url)

head(df.sp500)
##        DateTime  return
## 1 12/31/28 0:00  0.3788
## 2 12/31/29 0:00 -0.1191
## 3 12/31/30 0:00 -0.2848
## 4 12/31/31 0:00 -0.4707
## 5 12/31/32 0:00 -0.1515
## 6 12/31/33 0:00  0.4659
tail(df.sp500)
##         DateTime  return
## 92 12/31/19 0:00  0.2888
## 93 12/31/20 0:00  0.1626
## 94 12/31/21 0:00  0.2689
## 95 12/31/22 0:00 -0.1944
## 96 12/31/23 0:00  0.2423
## 97 12/31/24 0:00  0.2331
mean(df.sp500$return)
## [1] 0.08023196
hist(df.sp500$return,main="",col="grey",xlab=" Return rate of S&P 500")    

- Example using the prior predictive distribution

# Download S&P 500 returns
url <- "https://www.dropbox.com/scl/fi/rny5260dfxkk5pkby6bdr/sp500.csv?rlkey=v81v7s91254hn8j5ab33az5t5&dl=1"
df.sp500 <- read.csv(url)

# The value of my 401k retirement account as of 1/1/25
y_2025 <- 450000 

# How much money will I add to my 401k each year
q <- 46000


# Using a for loop to calculate how much $ will I have 
year <- seq(2025,2025+32,by=1)
Y <- matrix(,length(year),1000)
rownames(Y) <- year
Y[1,] <- y_2025

set.seed(3410)
for(m in 1:1000){
for(t in 1:32){
  r <- sample(df.sp500$return,1)
  Y[t+1,m] <- Y[t,m]*(1+r)+q
              }
                }


# Prior predictive distribution for a given year
retirement.year <- 2025+22
hist(Y[which(year==retirement.year),]/10^6,col="grey",freq=FALSE,xlab="Pretax retirement amount ($ millions)",main="")

# Summary of prior predictive distribution
mean(Y[which(year==retirement.year),]/10^6)
## [1] 4.890922
max(Y[which(year==retirement.year),]/10^6)
## [1] 29.52506
min(Y[which(year==retirement.year),]/10^6)
## [1] 0.6371549
quantile(Y[which(year==retirement.year),]/10^6,probs=c(0.025,0.975))
##      2.5%     97.5% 
##  1.105865 15.005601
# Plot some financial trajectories (i.e., random draws from the prior predictive distribution)
plot(year,Y[,1]/10^6,typ="l",lwd=1,ylim=c(0,max(Y/10^6)),col=rgb(0.1,0.1,0.1,.05),xlab="Year",ylab="Pretax retirement amount ($ millions)") 
for(i in 1:1000){
points(year,Y[,i]/10^6,typ="l",lwd=1,col=rgb(0.1,0.1,0.1,.2))
}

# Plot of prior predictive distribution for all years
E.Y <- apply(Y/10^6,1,mean)
u.CI <- apply(Y/10^6,1,quantile,prob=0.975)
l.CI <- apply(Y/10^6,1,quantile,prob=0.025)
max.Y <- apply(Y/10^6,1,max)
min.Y <- apply(Y/10^6,1,min)

plot(year,E.Y,typ="l",lwd=3,ylim=c(0,max(max.Y)),xlab="Year",ylab="Pretax retirement amount ($ millions)") 
points(year,max.Y,typ="l",lwd=3,col="red")
points(year,min.Y,typ="l",lwd=3,col="red")
polygon(c(year,rev(year)),c(u.CI,rev(l.CI)),
        col=rgb(0.5,0.5,0.5,0.5),border=rgb(0.5,0.5,0.5,0.5))

  • Further reading/learning
    • Hooten and Hefley (2019) Brining Bayesian Models to Life. CRC press (link)
    • Cook, J.D., D.M. Williams, D.P. Walsh., T.J. Hefley (2024) Bayesian forecasting of disease spread with little or no local data. Scientific Reports (link)
    • STAT 768 - Applied Bayesian Modeling and Prediction (link)