2.1 Stylized facts of returns

Returns data have some distinctive empirical properties that should be examined in the pre-estimation phase:

Volatility clustering (large returns, of either sign, tend to be followed by large returns and small returns tend to be followed by small returns – periods of high volatility group together and periods of low volatility group together)
Heavy tails (extreme values occur more often than one would expect under a normal distribution, indicating high kurtosis)
Time–varying variance (implies heteroscedasticity over time)
Leverage effect (volatility asymmetry)
Long memory (volatility is highly persistent when observed at higher frequencies)
Presence of noise (at higher frequencies volatility is contaminated with noise such as microstructure effects, like bid–ask bounces, and other trading imperfections)

For the above reasons, descriptive statistics are crucial at the beginning of the analysis, including visual inspection of the log returns, which helps to reveal patterns and other details. The most useful plots are:

Histogram – used to visualize the distribution of returns within consecutive intervals
Line plot – used to visualize how returns change over time
ACF plot – used to visualize the relationship between current returns and lagged returns from the previous periods (autocorrelation)

Along with plots, descriptive statistics of log returns should be provided (mean, standard deviation, skewness, kurtosis, \(\dots\)) and normality should be checked
A normal distribution has a skewness of \(0\) and a kurtosis of \(3\)
Skewness (\(\alpha_3\)) and kurtosis (\(\alpha_4\)) can be used jointly to check for normality with the Jarque–Bera statistic, where \(n\) is the number of observations \[\begin{equation}JB=n\times \bigg(\frac{\alpha_3^2}{6}+\frac{(\alpha_4-3)^2}{24}\bigg)\sim\chi^2_{(df=2)} \tag{2.5} \end{equation}\]
The null hypothesis of the JB test (\(H_0:~\alpha_3=0~~and~~\alpha_4=3\)) is rejected if the \(p\)-value, computed from the \(\chi^2\) distribution with \(2\) degrees of freedom, is less than the significance level \(\alpha\) (\(1\%\), \(5\%\) or \(10\%\))
The distribution of returns often deviates from normality due to the presence of extreme values, and therefore exhibits high kurtosis and skewness different from zero
Histogram, in particular, may indicate the presence of extreme values above the mean (distribution has a heavier right–tail) which means that returns are positively skewed
Histogram may also indicate the presence of extreme values below the mean (distribution has a heavier left–tail) which means that returns are negatively skewed
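To make equation (2.5) concrete, the following minimal sketch computes the JB statistic by hand in base R. The function name `jb_manual` and the simulated series `x_sim` are illustrative assumptions, not part of the original text, and the moments are computed with divisor \(n\), which is assumed here to match the convention behind \(\alpha_3\) and \(\alpha_4\).

```r
# Jarque-Bera statistic computed directly from equation (2.5), base R only.
# jb_manual() is a hypothetical helper written for illustration.
jb_manual <- function(x) {
  x  <- as.numeric(x)
  n  <- length(x)
  z  <- x - mean(x)
  m2 <- mean(z^2)               # second central moment (divisor n)
  a3 <- mean(z^3) / m2^(3 / 2)  # skewness, alpha_3
  a4 <- mean(z^4) / m2^2        # kurtosis, alpha_4 (equal to 3 for a normal distribution)
  jb <- n * (a3^2 / 6 + (a4 - 3)^2 / 24)
  c(skewness = a3, kurtosis = a4, JB = jb,
    p.value = pchisq(jb, df = 2, lower.tail = FALSE))
}

set.seed(1)
x_sim <- rt(1000, df = 5)  # simulated heavy-tailed series, NOT real return data
jb_manual(x_sim)
```

A small p-value in the last element would lead to rejecting normality at the chosen significance level.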
The leverage effect means that bad news has a greater impact on volatility than good news. A simple test for the presence of the leverage effect is based on the first–order cross–correlation coefficient (CCF) between lagged returns and current squared returns
If the CCF coefficient is negative and significantly different from zero, it can be concluded that there is volatility asymmetry
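The cross–correlation check described above can be sketched in a few lines of base R. The helper `leverage_ccf` and the simulated series are hypothetical, and the \(\pm 1.96/\sqrt{n}\) band is only an approximate \(5\%\) significance bound that assumes independent observations, so it should be read as a rough guide for heavy–tailed returns.

```r
# First-order cross-correlation between lagged returns and current squared returns.
# leverage_ccf() is a hypothetical helper; the significance band is approximate.
leverage_ccf <- function(r) {
  r <- as.numeric(r)
  n <- length(r)
  rho   <- cor(r[-n], r[-1]^2)  # pairs (r_{t-1}, r_t^2)
  bound <- 1.96 / sqrt(n)       # approximate 5% bound under independence
  list(ccf1 = rho, bound = bound, asymmetric = (rho < -bound))
}

set.seed(2)
r_sim <- rnorm(500, sd = 0.02)  # simulated symmetric noise, NOT real return data
leverage_ccf(r_sim)
```

The element `asymmetric` is `TRUE` only when the coefficient is negative and falls outside the approximate band.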

Returns \(r_t\) are usually serially uncorrelated (no significant autocorrelation), but they are not independent: absolute or squared returns are significantly autocorrelated, indicating volatility clustering as well as heteroscedasticity, i.e. the variance of returns is not constant over time. Volatility clustering and heteroscedasticity can be checked with the Ljung–Box test and the ARCH test.
A formal test to determine whether a time–series exhibits significant autocorrelation is the Ljung–Box test

\[\begin{equation} Q(p)=T(T+2)\sum_{k=1}^p \frac{\hat{\rho}_k^2}{T-k}\sim \chi^2_{(df=p)}~~~~~~~~~~~k=1,2,...,p \tag{2.6} \end{equation}\]

where \(\hat{\rho}_k\) is the \(k\)-th sample autocorrelation coefficient and \(T\) is the number of observations.
Test statistic (2.6) follows a \(\chi^2\) distribution with \(p\) degrees of freedom under the null hypothesis of no autocorrelation up to lag \(p\). If the null hypothesis of the Ljung–Box test is rejected when the inputs are squared returns, one can conclude that there are significant ARCH effects up to and including lag \(p\).
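As a check on equation (2.6), the statistic can be computed by hand and compared with `Box.test()` from base R. This is a minimal sketch: `ljung_box_manual` is a hypothetical helper and the input series is simulated.

```r
# Ljung-Box Q statistic computed directly from equation (2.6).
# ljung_box_manual() is a hypothetical helper written for illustration.
ljung_box_manual <- function(x, p) {
  x     <- as.numeric(x)
  n_obs <- length(x)                                  # T in equation (2.6)
  rho   <- acf(x, lag.max = p, plot = FALSE)$acf[-1]  # sample autocorrelations, lag 0 dropped
  q     <- n_obs * (n_obs + 2) * sum(rho^2 / (n_obs - seq_len(p)))
  c(Q = q, p.value = pchisq(q, df = p, lower.tail = FALSE))
}

set.seed(3)
x_sim <- rnorm(500)  # simulated series, NOT real return data

ljung_box_manual(x_sim^2, p = 10)                # manual computation on squared values
Box.test(x_sim^2, lag = 10, type = "Ljung-Box")  # built-in test for comparison
```

Both calls should report the same statistic, since they apply the same formula to the same sample autocorrelations.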

The ARCH test uses a different approach: current squared returns are regressed on lagged squared returns, and \(R^2\) (the coefficient of determination) of that regression is used to compute the LM (Lagrange Multiplier) test statistic

\[\begin{equation} \begin{aligned} r^2_t&=\beta_0+\beta_1 r^2_{t-1}+\beta_2 r^2_{t-2}+ \dots + \beta_p r^2_{t-p}+ u_t \\ \\ LM&=T\times R^2\sim \chi^2_{(df=p)}\end{aligned} \tag{2.7} \end{equation}\]

If the null hypothesis \(H_0:~\beta_1=\beta_2=\dots=\beta_p=0~~~(equivalent~~to~~R^2=0)\) is rejected, the presence of ARCH effects up to and including lag \(p\) is confirmed
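The auxiliary regression in (2.7) can be sketched with `embed()` and `lm()` from base R. The helper `arch_lm_manual` and the simulated ARCH(1) series are hypothetical, and the sketch assumes that \(T\) is taken as the number of observations actually used in the regression (the sample size minus \(p\)); other implementations may define it differently.

```r
# ARCH LM statistic from the auxiliary regression in equation (2.7).
# arch_lm_manual() is a hypothetical helper written for illustration.
arch_lm_manual <- function(r, p) {
  r2     <- as.numeric(r)^2
  lagged <- embed(r2, p + 1)          # column 1: r2_t, columns 2..(p+1): r2_{t-1}, ..., r2_{t-p}
  y      <- lagged[, 1]
  X      <- lagged[, -1, drop = FALSE]
  fit    <- lm(y ~ X)                 # regression of current on lagged squared returns
  n_eff  <- length(y)                 # assumption: T = observations used in the regression
  lm_stat <- n_eff * summary(fit)$r.squared
  c(LM = lm_stat, p.value = pchisq(lm_stat, df = p, lower.tail = FALSE))
}

# Simulated ARCH(1) series (illustrative parameters, NOT estimated from real data)
set.seed(4)
n     <- 2000
e     <- rnorm(n)
r_sim <- numeric(n)
for (i in 2:n) r_sim[i] <- sqrt(0.1 + 0.5 * r_sim[i - 1]^2) * e[i]

arch_lm_manual(r_sim, p = 5)
```

Because the simulated series is built with conditional variance depending on the previous squared value, the test is expected to reject the null hypothesis of no ARCH effects.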

Various R packages and commands support checking the stylized facts (empirical properties) of returns (Table 2.2)

TABLE 2.2: R packages and commands for checking stylized facts of returns
Package	Command	Description
`stats` (base R)	`acf()`	Computes (and by default plots) autocorrelation coefficients
`stats` (base R)	`Box.test()`	Performs the Box–Pierce or Ljung–Box test up to \(p\) lags
`fBasics`	`acfPlot()`	Displays autocorrelation function plot
`fBasics`	`normalTest()`	Performs normality test
`moments`	`skewness()`	Calculates coefficient \(\alpha_3\)
`moments`	`kurtosis()`	Calculates coefficient \(\alpha_4\)
`FinTS`	`ArchTest()`	Checks for conditional heteroscedasticity

Example 4. Install and load the quantmod package. Import historical stock prices of Tesla directly into RStudio from Yahoo Finance source, from January \(1\), \(2021\) to December \(31\), \(2024\). Display the first few rows and the last few rows of imported data. Check the structure of the object TSLA. Plot daily closing prices of Tesla using ggplot() command from ggplot2 package.

Solution

Copy the code lines to the clipboard, paste them into an R Script file and run them.
The symbol (ticker) of the stock you want to fetch can be found on Yahoo Finance web site https://finance.yahoo.com/. For instance, MSFT is the ticker for Microsoft, AAPL is the ticker for Apple, META is the ticker for Meta, ^GSPC is the ticker for S&P500 index, etc.
The getSymbols() command returns an xts (eXtensible Time Series) object. The xts class extends the zoo class and is designed specifically to handle financial time–series data. The src argument specifies the external source of data (yahoo is the default) and can be set to other sources, such as FRED, tiingo or av (Alpha Vantage); some of them require an API key.
In this example, the xts object TSLA covers \(1004\) trading days (observations) over a \(4\)-year period, because there is no trading on weekends and non–working holidays.

# Installing and loading quantmod package
install.packages("quantmod")
library(quantmod)
# Fetching Tesla stock data from Yahoo Finance source
getSymbols("TSLA", # symbol (ticker) of the stock we want to fetch
           src="yahoo", # specifying the source of the data
           from = as.Date("2021-01-01"), # starting date in the format "yyyy-mm-dd"
           to = as.Date("2024-12-31")) # ending date in the format "yyyy-mm-dd"

head(TSLA) # displaying the first 6 rows
tail(TSLA) # displaying the last 6 rows
str(TSLA) # checking the structure of TSLA object

# Plotting daily closing prices of Tesla using ggplot() command directly on the xts object 
# Requires installing and loading ggplot2 package
install.packages("ggplot2")
library(ggplot2)

ggplot(TSLA, aes(x = Index, y = TSLA.Close)) +
  geom_line(color = "orchid") + 
  labs(title = "Tesla daily closing prices", x = "Day", y = "Closing price") +
  theme_minimal()

# Alternative to ggplot() is the simpler plot(), but it's not as aesthetically pleasing
plot(TSLA$TSLA.Close,main="Tesla daily closing prices")


The quantmod package depends on several other packages and R will automatically install these dependencies

Example 5. Calculate the daily returns based on the Tesla closing prices. Make a line plot of daily returns using the ggplot() command. Install and load the moments package, which supports commands for skewness and kurtosis. Use the datasummary() command from the modelsummary package to display summary statistics for the daily Tesla returns: minimum, maximum, mean, standard deviation, skewness, and kurtosis. Plot daily returns as a histogram with relative frequencies and add a normal curve to the same plot. Use normalTest() from the fBasics package to check the normality of returns with the JB statistic.

Solution

Copy the code lines below to the clipboard, paste them into an R Script file, and run them.
The dailyReturn() command automatically sets the first (otherwise missing) return to zero, and is therefore a convenient alternative to calculating the first differences of the logs. The datasummary() command requires the data to be a data frame. Its first argument is a two–sided formula that uses the ~ symbol, while the fmt argument controls the number of digits.
The histogram clearly exhibits heavy tails and non–normality of returns, which is supported by the rejection of the JB null hypothesis at the \(1\%\) significance level due to positive skewness and kurtosis greater than \(3\).

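# Installing and loading additional packages
install.packages("moments")
install.packages("modelsummary")
install.packages("fBasics")
library(modelsummary)
library(fBasics)
# moments is attached last so that its skewness() and kurtosis() (which return alpha_3 and alpha_4)
# are found first if another attached package defines functions with the same names
library(moments)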

# Calculating daily returns as first differences of the logs
returns <- diff(log(Cl(TSLA))) # extracting closing prices by Cl() command 
returns <- na.fill(returns,fill=0) # replacing the first missing return with zero

# Alternative for calculating daily returns is dailyReturn() command
returns2 <- dailyReturn(Cl(TSLA),type="log") # argument type="log" defines log returns

# Line plot of daily returns
ggplot(returns, aes(x = Index, y = TSLA.Close)) +
  geom_line(color = "coral") + 
  labs(title = "Tesla daily returns", x = "Day", y = "Log return") +
  theme_minimal()

# Displaying the summary statistics of daily returns
datasummary(min+max+mean+sd+skewness+kurtosis ~ Returns, # two-sided formula
            data = setNames(data.frame(returns), "Returns"), # data frame with renamed column
            fmt = 5, # number of digits
            title = "Table 1. Summary statistics of daily returns")

# Plotting a histogram and overlay a normal curve
ggplot(returns,aes(x = TSLA.Close)) +
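  geom_histogram(aes(y = after_stat(density)), bins = 11, fill = "skyblue", color = "skyblue", alpha = 0.5) + # after_stat() replaces the deprecated ..density.. notation
  # normal curve with the sample mean and standard deviation (0.0005378 and 0.0377 in the original run)
  stat_function(fun = dnorm, args = list(mean = mean(as.numeric(returns)), sd = sd(as.numeric(returns))), color = "pink", linewidth = 1) +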
  labs(title = "Histogram of daily Tesla returns", x = "Log returns", y = "Density") +
  theme_minimal()

# Jarque-Bera test for normality of returns
normalTest(returns, method = "jb") # parametric "jb" test is applied

# An alternative command is jarqueberaTest() from the same package
jarqueberaTest(returns)


Example 6. Plot daily standardized returns as a histogram and add a normal distribution and a Student’s t–distribution to the same plot (set the degrees of freedom to five, df=5). Which distribution fits standardized returns better?

Solution

Copy the code lines below to the clipboard, paste them into an R Script file, and run them.
The Student’s t–distribution fits returns better, as it captures heavy tails when df is low. To determine which df is appropriate for capturing heavy tails, df can either be estimated by fitting the t–distribution to standardized returns or be calculated from the kurtosis, using the relationship between the degrees of freedom and the kurtosis of the t–distribution:
\[df=4+\dfrac{6}{\alpha_4-3}=4+\dfrac{6}{5.1597-3}=6.778\]
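The kurtosis-based calculation of df can be reproduced in base R. The sketch below also defines a t density rescaled to unit variance, because a standard t–distribution has variance \(df/(df-2)\) rather than \(1\); using such a rescaled density for comparison with standardized returns is an option suggested here, not something the example itself does. The names `df_from_kurtosis` and `dt_unit_var` are hypothetical.

```r
# Degrees of freedom implied by the kurtosis of a Student's t-distribution.
# Both helpers are hypothetical and written for illustration only.
df_from_kurtosis <- function(kurt) {
  stopifnot(kurt > 3)  # the relationship only holds for kurtosis above 3
  4 + 6 / (kurt - 3)
}

df_hat <- df_from_kurtosis(5.1597)  # kurtosis value reported in the text
df_hat

# Density of a t-distribution rescaled to have unit variance (requires df > 2)
dt_unit_var <- function(x, df) {
  s <- sqrt(df / (df - 2))  # standard deviation of the standard t-distribution
  s * dt(x * s, df = df)
}

# Numerical check that the rescaled density has variance close to 1
integrate(function(x) x^2 * dt_unit_var(x, df = df_hat), -Inf, Inf)$value
```

In a plot, `dt_unit_var` could be passed to `stat_function()` in place of `dt` if a unit-variance comparison is preferred.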

# Plotting a histogram of standardized returns and overlaying it with a normal and t-distribution
ggplot(returns, aes(x = scale(TSLA.Close))) +  # scaling the returns
  geom_histogram(aes(y = after_stat(density)), bins = 11, fill = "skyblue", color = "skyblue", alpha = 0.5) + # plotting a histogram (after_stat() replaces the deprecated ..density..)
  stat_function(fun = dnorm, args = list(mean = 0, sd = 1), color = "pink", linewidth = 1, aes(linetype = "Normal")) +  # adding normal distribution
  stat_function(fun = dt, args = list(df = 5), color = "seagreen", linewidth = 1, aes(linetype = "Student (df=5)")) +  # adding Student's t-distribution
  labs(title = "Histogram of standardized daily returns", x = "Standardized returns", y = "Density") + # customizing labels
  scale_linetype_manual(name = "Distribution", values = c("Normal" = "solid", "Student (df=5)" = "solid")) +  # changing legend title and line types
  theme_minimal() +
  theme(legend.title = element_text(size = 13), legend.text = element_text(size = 12),legend.position = c(0.8, 0.6))  # customizing legend appearance


Example 7. Plot the autocorrelation function of returns up to \(10\) lags using the acf() command. Do the same for squared returns. Test the significance of autocorrelation for both time–series (returns and squared returns) by employing the Ljung–Box test at lag \(10\). Likewise, perform ArchTest() from the FinTS package on the returns time–series.

Solution

Copy the code lines below to the clipboard, paste them into an R Script file, and run them.
The Ljung–Box test applied to squared returns gives a result similar to that of the ARCH test: both confirm the existence of ARCH effects in the return time–series, i.e. the current volatility (variance of returns) depends on past squared returns up to and including \(10\) lags.

# Installing and loading additional package which supports ArchTest() command
install.packages("FinTS")
library(FinTS)

# Plotting ACF of returns
acf(returns, main = "Correlogram of returns",lag.max=10)

# Plotting ACF of squared returns
acf((returns^2), main = "Correlogram of squared returns",lag.max=10)

# Testing the significance of autocorrelation of returns and squared returns
Box.test(returns, lag = 10, type = "Ljung-Box")
Box.test(returns^2, lag = 10, type = "Ljung-Box")

# Testing for ARCH effects up to and including 10 lags
ArchTest(returns,lags=10)
