2.1 Stylized facts of returns
- Returns data have distinctive empirical properties that should be examined in the pre-estimation phase
- Volatility clustering (large returns, of either sign, tend to be followed by large returns and small returns tend to be followed by small returns – periods of high volatility group together and periods of low volatility group together)
- Heavy tails (extreme values occur more often than a normal distribution would imply, which shows up as kurtosis well above \(3\))
- Time–varying variance (implies heteroscedasticity over time)
- Leverage effect (volatility asymmetry)
- Long memory (volatility is highly persistent when observed at higher frequencies)
- Presence of noise (at higher frequencies volatility is contaminated by noise such as microstructure effects, bid-ask bounce and other trading imperfections)
For these reasons, descriptive statistics are crucial at the beginning of the analysis, including visual inspection of the data, which helps reveal patterns and other details
The most useful plots are:
- Histogram – used to visualize the distribution of returns within consecutive intervals
- Line plot – used to visualize how returns change over time
- ACF plot – used to visualize the relationship between current returns and lagged returns from the previous periods (autocorrelation)
Along with plots, descriptive statistics of log returns should be provided (mean, standard deviation, skewness, kurtosis, \(\dots\)) and normality should be checked
A normal distribution has a skewness of \(0\) and a kurtosis of \(3\)
Skewness (\(\alpha_3\)) and kurtosis (\(\alpha_4\)) can be used jointly to check for normality according to the Jarque–Bera test \[\begin{equation}JB=n\times \bigg(\frac{\alpha_3^2}{6}+\frac{(\alpha_4-3)^2}{24}\bigg)\sim\chi^2_{(df=2)} \tag{2.5} \end{equation}\]
The null hypothesis of the JB test (\(H_0:~\alpha_3=0~~and~~\alpha_4=3\)) is rejected if the \(p\)-value, computed from the \(\chi^2\) distribution with \(2\) degrees of freedom, is less than the significance level \(\alpha\) (\(1\%\), \(5\%\) or \(10\%\))
The distribution of returns often deviates from normality due to the presence of extreme values, and therefore usually exhibits high kurtosis and skewness different from zero
A histogram, in particular, may indicate the presence of extreme values above the mean (the distribution has a heavier right tail), which means that returns are positively skewed
A histogram may also indicate the presence of extreme values below the mean (the distribution has a heavier left tail), which means that returns are negatively skewed
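As a rough illustration of (2.5), the JB statistic can be computed by hand in base R. This sketch is not part of the original exercises: the helper name `jb_stat` and the simulated Student's t sample are assumptions made purely for illustration.

```r
# Minimal sketch (assumption: simulated heavy-tailed data stand in for real returns)
set.seed(123)
x <- rt(1000, df = 5) / 50            # toy "returns"

# Hypothetical helper: Jarque-Bera statistic from sample moments, as in (2.5)
jb_stat <- function(x) {
  n  <- length(x)
  z  <- x - mean(x)
  m2 <- mean(z^2)                     # second central moment
  a3 <- mean(z^3) / m2^(3 / 2)        # skewness (alpha_3)
  a4 <- mean(z^4) / m2^2              # kurtosis (alpha_4)
  jb <- n * (a3^2 / 6 + (a4 - 3)^2 / 24)
  c(skewness = a3, kurtosis = a4, JB = jb,
    p.value = pchisq(jb, df = 2, lower.tail = FALSE))
}

jb_stat(x)
```

A small \(p\)-value here would lead to rejecting normality. The moment definitions used in the helper divide by \(n\), so the skewness and kurtosis values should agree with those returned by `skewness()` and `kurtosis()` from the `moments` package.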
The leverage effect means that bad news has a greater impact on volatility than good news. A simple test for the presence of the leverage effect is based on the first–order cross–correlation coefficient (CCF) between lagged returns \(r_{t-1}\) and current squared returns \(r_t^2\)
If the CCF coefficient is negative and significantly different from zero, it can be concluded that there is volatility asymmetry
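The cross–correlation check described above can be sketched in base R as follows. The helper `leverage_ccf`, the simulated asymmetric-volatility series and the rough \(\pm 1.96/\sqrt{T}\) significance bound are all assumptions introduced for illustration, not part of the original material.

```r
# Hypothetical helper: correlation between lagged returns r_{t-1} and squared returns r_t^2
leverage_ccf <- function(r) {
  r    <- as.numeric(r)
  n    <- length(r)
  lagr <- r[-n]                       # r_{t-1}
  sq   <- r[-1]^2                     # r_t^2
  list(rho = cor(lagr, sq),           # first-order cross-correlation
       bound = 1.96 / sqrt(n))        # rough 5% bound under no correlation
}

# Toy data (assumption): a GJR-type process in which negative returns
# raise next-period variance more than positive returns of the same size
set.seed(1)
n  <- 3000
r  <- numeric(n)
s2 <- numeric(n)
s2[1] <- 1e-4
r[1]  <- sqrt(s2[1]) * rnorm(1)
for (i in 2:n) {
  s2[i] <- 1e-5 + (0.03 + 0.15 * (r[i - 1] < 0)) * r[i - 1]^2 + 0.80 * s2[i - 1]
  r[i]  <- sqrt(s2[i]) * rnorm(1)
}

out <- leverage_ccf(r)
out$rho                               # sign and size of the cross-correlation
abs(out$rho) > out$bound              # TRUE suggests significance at roughly 5%
```

For real data, `r` would simply be replaced by the return series; a negative and significant coefficient points to volatility asymmetry.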
Returns \(r_t\) are usually serially uncorrelated (no significant autocorrelation), but absolute or squared returns are significantly autocorrelated, indicating volatility clustering as well as heteroscedasticity, i.e. the variance of returns is not constant over time. Volatility clustering and heteroscedasticity can be checked with the Ljung-Box test and the ARCH test.
A formal test to determine whether a time–series exhibits significant autocorrelation is the Ljung-Box test
\[\begin{equation} Q(p)=T(T+2)\sum_{k=1}^p \frac{\hat{\rho}_k^2}{T-k}\sim \chi^2_{(df=p)}~~~~~~~~~~~k=1,2,...,p \tag{2.6} \end{equation}\]
where \(\hat{\rho}_k\) is the \(k\)-th sample autocorrelation coefficient and \(T\) is the sample size. Under the null hypothesis of no autocorrelation up to lag \(p\), the test statistic (2.6) follows a \(\chi^2\) distribution with \(p\) degrees of freedom. If the null hypothesis is rejected when the inputs are squared returns, one can conclude that there are significant ARCH effects up to and including lag \(p\).
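To make (2.6) concrete, the statistic can be computed directly from the sample autocorrelations and compared with `Box.test()` from base R. The helper name `ljung_box` and the simulated white-noise series are assumptions used only for this sketch.

```r
set.seed(42)
x <- rnorm(500)                       # toy series (assumption), no autocorrelation by construction
p <- 10

# Hypothetical helper implementing Q(p) from (2.6)
ljung_box <- function(x, p) {
  n   <- length(x)
  rho <- acf(x, lag.max = p, plot = FALSE)$acf[-1]    # sample autocorrelations, lag 0 dropped
  Q   <- n * (n + 2) * sum(rho^2 / (n - seq_len(p)))
  c(Q = Q, p.value = pchisq(Q, df = p, lower.tail = FALSE))
}

ljung_box(x, p)
Box.test(x, lag = p, type = "Ljung-Box")              # built-in version for comparison
```

Passing `x^2` instead of `x` gives the version of the test used to detect ARCH effects.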
The ARCH test uses a different approach: current squared returns are regressed on lagged squared returns, and the coefficient of determination \(R^2\) of this regression is used to compute the LM (Lagrange Multiplier) test statistic
\[\begin{equation} \begin{aligned} r^2_t&=\beta_0+\beta_1 r^2_{t-1}+\beta_2 r^2_{t-2}+ \dots + \beta_p r^2_{t-p}+ u_t \\ \\ LM&=T\times R^2\sim \chi^2_{(df=p)}\end{aligned} \tag{2.7} \end{equation}\]
If the null hypothesis \(H_0:~\beta_1=\beta_2=\dots=\beta_p=0~~~(or~~R^2=0)\) is rejected, the presence of ARCH effects up to and including lag \(p\) is confirmed.
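The regression behind (2.7) can be sketched with `embed()` and `lm()` from base R. The helper `arch_lm` and the simulated ARCH(1) series are assumptions for illustration; here \(T\) is taken to be the number of observations actually used in the regression, which is one common convention.

```r
# Toy data (assumption): ARCH(1) process with omega = 0.2 and alpha = 0.5
set.seed(7)
n <- 2000
r <- numeric(n)
r[1] <- rnorm(1)
for (i in 2:n) {
  r[i] <- sqrt(0.2 + 0.5 * r[i - 1]^2) * rnorm(1)
}

# Hypothetical helper: LM statistic from the auxiliary regression in (2.7)
arch_lm <- function(r, p) {
  E   <- embed(r^2, p + 1)            # column 1: r_t^2, columns 2..(p+1): lags 1..p
  fit <- lm(E[, 1] ~ E[, -1])         # regress current on lagged squared returns
  LM  <- nrow(E) * summary(fit)$r.squared
  c(LM = LM, p.value = pchisq(LM, df = p, lower.tail = FALSE))
}

arch_lm(r, p = 10)
```

On real data the result should be close to what `ArchTest()` from the `FinTS` package reports for the same number of lags.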
- Various R packages and commands support checking the stylized facts (empirical properties) of returns (Table 2.2)
Package | Command | Description |
---|---|---|
stats (base R) | `acf()` | computes (and by default plots) autocorrelation coefficients |
stats (base R) | `Box.test()` | performs Box-Ljung test up to \(p\) lags |
fBasics | `acfPlot()` | displays autocorrelation function plot |
fBasics | `normalTest()` | performs normality test |
moments | `skewness()` | computes coefficient \(\alpha_3\) |
moments | `kurtosis()` | computes coefficient \(\alpha_4\) |
FinTS | `ArchTest()` | checks for conditional heteroscedasticity |
Install and load the `quantmod` package. Import historical stock prices of Tesla directly into RStudio from the Yahoo Finance source, from January \(1\), \(2021\) to December \(31\), \(2024\). Display the first few rows and the last few rows of the imported data. Check the structure of the object `TSLA`. Plot daily closing prices of Tesla using the `ggplot()` command from the `ggplot2` package.
Solution
Copy the code lines to the clipboard, paste them into an R Script file and run them. The symbol (ticker) of the stock you want to fetch can be found on the Yahoo Finance web site https://finance.yahoo.com/. For instance, `MSFT` is the ticker for Microsoft, `AAPL` is the ticker for Apple, `META` is the ticker for Meta, `^GSPC` is the ticker for the S&P 500 index, etc. The command `getSymbols()` returns an `xts` (eXtensible Time Series) object; the `xts` class extends the `zoo` class and is designed specifically to handle financial time-series data. Keep in mind that the `src` argument specifies the external source of data (`yahoo` is the default) and can be set to other sources (some of them require an API key), such as `FRED`, `av` (Alpha Vantage) or `tiingo`. In this example, the xts object `TSLA` covers \(1004\) trading days (observations) over the \(4\)-year period, because there is no trading on weekends and non–working holidays.
# Installing and loading quantmod package
install.packages("quantmod")
library(quantmod)
# Fetching Tesla stock data from Yahoo Finance source
getSymbols("TSLA",                       # symbol (ticker) of the stock we want to fetch
           src = "yahoo",                # specifying the source of the data
           from = as.Date("2021-01-01"), # starting date in the format "yyyy-mm-dd"
           to = as.Date("2024-12-31"))   # ending date in the format "yyyy-mm-dd"
head(TSLA) # displaying the first 6 rows
tail(TSLA) # displaying the last 6 rows
str(TSLA)  # checking the structure of TSLA object
# Plotting daily closing prices of Tesla using ggplot() command directly from the xts object
# Requires installing and loading ggplot2 package
install.packages("ggplot2")
library(ggplot2)
ggplot(TSLA, aes(x = Index, y = TSLA.Close)) +
  geom_line(color = "orchid") +
  labs(title = "Tesla daily closing prices", x = "Days", y = "Closing price") +
  theme_minimal()
# Alternative to ggplot() is the simpler plot(), but it's not as aesthetically pleasing
plot(TSLA$TSLA.Close, main = "Daily closing prices of Tesla")
- The `quantmod` package depends on several other packages, and R will automatically install these dependencies if they are not already installed
Calculate daily log `returns` based on the closing prices. Make a line plot of daily returns using the `ggplot()` command. Install and load the `moments` package, which supports commands for skewness and kurtosis. From the `modelsummary` package use the `datasummary()` command to display summary statistics for the daily Tesla returns: minimum, maximum, mean, standard deviation, skewness, and kurtosis. Plot daily returns by a histogram with relative frequencies and add a normal curve to the same plot. Perform `normalTest()` from the `fBasics` package to check the normality of returns with the JB statistic.
Solution
Copy the code lines below to the clipboard, paste them into an R Script file, and run them. The command `dailyReturn()` automatically replaces the first missing return with zero, and is therefore a good alternative to calculating the first differences of the logs. The command `datasummary()` requires the data to be a data frame. Its first argument is a two–sided formula, which uses the `~` symbol, while the `fmt` argument controls the number of digits. The histogram clearly exhibits heavy tails and non–normality of returns, which is supported by rejection of the JB null hypothesis at the \(1\%\) significance level due to positive skewness and kurtosis greater than \(3\).
# Installing and loading additional packages
install.packages("moments")
install.packages("modelsummary")
install.packages("fBasics")
library(fBasics)
library(modelsummary)
library(moments) # loaded last so that its skewness() and kurtosis() are not masked by same-named functions
# Calculating daily returns as first differences of the logs
returns <- diff(log(Cl(TSLA))) # extracting closing prices by Cl() command
returns <- na.fill(returns, fill = 0) # replacing the first missing return with zero
# Alternative for calculating daily returns is dailyReturn() command
returns2 <- dailyReturn(Cl(TSLA), type = "log") # argument type="log" defines continuously compounded returns
# Line plot of daily returns
ggplot(returns, aes(x = Index, y = TSLA.Close)) +
  geom_line(color = "coral") +
  labs(title = "Tesla daily returns", x = "Days", y = "Log returns") +
  theme_minimal()
# Displaying the summary statistics of daily returns
datasummary(min + max + mean + sd + skewness + kurtosis ~ Returns, # two-sided formula
            data = setNames(data.frame(returns), "Returns"), # data frame object with renamed column
            fmt = 4, # number of digits
            title = "Table 1. Summary statistics of daily returns")
# Plotting a histogram and overlaying a normal curve
ggplot(returns, aes(x = TSLA.Close)) +
  geom_histogram(aes(y = after_stat(density)), bins = 11, fill = "skyblue", color = "skyblue", alpha = 0.5) +
  stat_function(fun = dnorm, args = list(mean = 0.0005378, sd = 0.0377), color = "pink", linewidth = 1) +
  labs(title = "Histogram of daily Tesla returns", x = "Log returns", y = "Density") +
  theme_minimal()
# Jarque-Bera test for normality of returns
normalTest(returns, method = "jb") # parametric "jb" test is applied
# An alternative command is jarqueberaTest() from the same package
jarqueberaTest(returns)
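The log-return calculation used above can also be illustrated on a toy price vector, without downloading any data. The prices below are hypothetical and the variable names are chosen only for this sketch.

```r
prices <- c(100, 110, 99, 104)                 # hypothetical closing prices

# Log returns as first differences of log prices; first return set to zero
log_ret <- c(0, diff(log(prices)))

# Simple (discrete) returns for comparison
simple_ret <- c(0, prices[-1] / prices[-length(prices)] - 1)

log_ret
# Log returns are additive over time: their sum equals the log of the total price relative
all.equal(sum(log_ret), log(prices[4] / prices[1]))
# Log and simple returns are linked by r_log = log(1 + r_simple)
all.equal(log_ret, log(1 + simple_ret))
```

The additivity shown in the first check is one practical reason why continuously compounded returns are commonly preferred in volatility modelling.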
Plot standardized daily `returns` by a histogram, and add a normal distribution as well as a Student’s t-distribution to the same plot (set the degrees of freedom to five, `df=5`). Which distribution fits standardized returns better?
Solution
Copy the code lines below to the clipboard, paste them into an R Script file, and run them. Student’s t–distribution fits returns better, as it captures heavy tails when `df` is low. To determine which `df` is appropriate for capturing heavy tails, the relationship between degrees of freedom and kurtosis can be used, i.e. `df` can be calculated from kurtosis (or estimated while fitting the t–distribution to standardized returns):
\[df=4+\dfrac{6}{\alpha_4-3}=4+\dfrac{6}{5.1597-3}=6.778\]
# Plotting a histogram of standardized returns and overlaying it with a normal and t-distribution
ggplot(returns, aes(x = scale(TSLA.Close))) + # scaling the returns
  geom_histogram(aes(y = after_stat(density)), bins = 11, fill = "skyblue", color = "skyblue", alpha = 0.5) + # plotting a histogram
  stat_function(fun = dnorm, args = list(mean = 0, sd = 1), color = "pink", linewidth = 1, aes(linetype = "Normal")) + # adding normal distribution
  stat_function(fun = dt, args = list(df = 5), color = "seagreen", linewidth = 1, aes(linetype = "Student's t")) + # adding Student's t-distribution
  labs(title = "Histogram of standardized daily returns", x = "Standardized returns", y = "Density") + # customizing labels
  scale_linetype_manual(name = "Distribution", values = c("Normal" = "solid", "Student's t" = "solid")) + # changing legend title and line types
  theme_minimal() +
  theme(legend.title = element_text(size = 13), legend.text = element_text(size = 12), legend.position = c(0.8, 0.6)) # customizing legend appearance
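The relationship between kurtosis and degrees of freedom can be checked numerically. In the sketch below, `df_from_kurtosis` and `dstd_t` are hypothetical helper names. The unit-variance rescaling of the t density is an optional refinement that is not used in the plot above, added here on the assumption that one wants the t curve to share the variance of the standardized returns.

```r
# Hypothetical helper: degrees of freedom implied by kurtosis (valid only for a4 > 3)
df_from_kurtosis <- function(a4) 4 + 6 / (a4 - 3)

df_hat <- df_from_kurtosis(5.1597)
round(df_hat, 3)                      # 6.778, matching the value in the text

# Going back: kurtosis of a t-distribution with df_hat degrees of freedom
3 + 6 / (df_hat - 4)

# Hypothetical helper: Student's t density rescaled to unit variance (requires df > 2)
dstd_t <- function(x, df) {
  s <- sqrt(df / (df - 2))            # standard deviation of the ordinary t-distribution
  s * dt(x * s, df = df)
}

# Numerical check that the rescaled density has variance close to one
integrate(function(x) x^2 * dstd_t(x, df_hat), -Inf, Inf)$value
```

If desired, `dstd_t` could be passed to `stat_function()` in place of `dt`, with `args = list(df = df_hat)`.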
Plot the autocorrelation function of `returns` up to \(10\) lags using the `acf()` command. Do the same for squared returns. Test the significance of autocorrelation for both time–series (returns and squared returns) by employing the Ljung–Box test at lag \(10\). Likewise, perform `ArchTest()` from the `FinTS` package using the `returns` time–series.
Solution
Copy the code lines below to the clipboard, paste them into an R Script file, and run them. The Ljung-Box test with squared returns as inputs gives a similar result to the ARCH test, both confirming the existence of ARCH effects in the returns time–series, i.e. the current volatility (variance of returns) depends on past squared returns up to and including \(10\) lags.
# Installing and loading additional package which supports ArchTest() command
install.packages("FinTS")
library(FinTS)
# Plotting ACF of returns
acf(returns, main = "Correlogram of returns",lag.max=10)
# Plotting ACF of squared returns
acf((returns^2), main = "Correlogram of squared returns",lag.max=10)
# Testing the significance of autocorrelation of returns and squared returns
Box.test(returns, lag = 10, type = "Ljung-Box")
Box.test(returns^2, lag = 10, type = "Ljung-Box")
# Testing for ARCH effects
ArchTest(returns,lags=10)