1.1 Basic econometrics reminder
An econometric model can be represented as a single equation or as a system of equations. It can include two variables (bivariate model) or more than two variables (multivariate model). Not all variables need to be numerical, and variables can play different roles in the model.
Basic steps of econometric analysis:
- Model specification (the model should be correctly specified according to economic theory)
- Data collection and preparation (generate new variables or transform existing ones)
- Descriptive statistics of the sample data and examination of their properties
- Parameter estimation with the chosen estimator, e.g. OLS, WLS, GLS, ML, GMM
- Significance testing of the estimated parameters
- Diagnostic checking of whether all assumptions are met and how well the model fits the data
- Interpretation and forecasting (to explain and predict changes in financial phenomena)
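The steps above can be sketched end to end on simulated data. This is a minimal illustration using only NumPy, assuming a hypothetical data-generating process \(y = 1 + 2x + u\) with \(u \sim N(0,1)\); the sample size, seed and coefficient values are arbitrary choices, not taken from any real dataset. It covers descriptive statistics, OLS estimation \(\hat\beta=(X'X)^{-1}X'y\), t-statistics for significance testing and \(R^2\) as a basic fit measure.

```python
import numpy as np

# Hypothetical data-generating process (assumption for illustration only):
# y = 1.0 + 2.0 * x + u, with u ~ N(0, 1)
rng = np.random.default_rng(seed=42)
n = 200
x = rng.normal(loc=0.0, scale=1.0, size=n)
u = rng.normal(loc=0.0, scale=1.0, size=n)
y = 1.0 + 2.0 * x + u

# Descriptive statistics of the sample
print(f"mean(y)={y.mean():.3f}, sd(y)={y.std(ddof=1):.3f}")

# Parameter estimation by OLS: beta_hat = (X'X)^{-1} X'y
X = np.column_stack([np.ones(n), x])  # design matrix with an intercept column
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Significance testing: t = beta_hat / se(beta_hat)
resid = y - X @ beta_hat
k = X.shape[1]
sigma2 = (resid @ resid) / (n - k)            # unbiased estimate of the error variance
cov_beta = sigma2 * np.linalg.inv(X.T @ X)    # OLS covariance matrix of the estimates
se = np.sqrt(np.diag(cov_beta))
t_stats = beta_hat / se

# Goodness of fit
r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

print("beta_hat:", np.round(beta_hat, 3))
print("t-stats :", np.round(t_stats, 2))
print(f"R^2     : {r2:.3f}")
```

Diagnostic checking (e.g. tests for heteroskedasticity or autocorrelation of the residuals) would follow the same pattern, working on `resid`.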
Model specification refers to: \((1)\) selecting appropriate variables, \((2)\) assuming the direction of causality, and \((3)\) selecting an appropriate functional form
Variables on the right-hand side can have different roles: some may serve as control variables, while others can be multiplied together to form an interaction term that represents a moderated effect
When dealing with time-series data (data observed over time), it is common for the dependent variable \(y\) to also appear, in lagged form, as an independent variable, making it endogenous
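Lagged variables such as \(y_{t-1}\) and \(y_{t-2}\) are built by shifting a series back in time. The sketch below uses a hypothetical helper `lag` (not a library function) and a made-up consumption series; the first observations have no lagged value, so they are marked as missing and dropped before estimation.

```python
import numpy as np


def lag(series, periods=1):
    """Shift a series back by `periods` observations, padding the start with NaN.

    Element t of the result is the value observed at t - periods,
    so lag(y, 1) gives y_{t-1} and lag(y, 2) gives y_{t-2}.
    """
    if periods < 1:
        raise ValueError("periods must be a positive integer")
    series = np.asarray(series, dtype=float)
    out = np.full_like(series, np.nan)
    out[periods:] = series[:-periods]
    return out


# Hypothetical consumption series observed over five periods
consumption = np.array([100.0, 102.0, 105.0, 103.0, 108.0])
c_lag1 = lag(consumption, 1)  # [nan, 100, 102, 105, 103]
c_lag2 = lag(consumption, 2)  # [nan, nan, 100, 102, 105]

# Drop the first rows, where lagged values are not available, before estimation
data = np.column_stack([consumption, c_lag1, c_lag2])
sample = data[~np.isnan(data).any(axis=1)]
print(sample)
```

Each additional lag costs one observation at the start of the sample, which is worth keeping in mind for short time series.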
Solution
Variable \(x\) is exogenous because \(x\) causes \(y\), but not the other way around. Variable \(y\) is endogenous as it appears on both sides of the equation, meaning that \(y\) is both a consequence and a cause. This is common when dealing with time-series data, where the subscript \(t\) denotes the time unit (such as hour, day, week, month or year). Variable \(y_{t-1}\) is lagged because it is observed in the previous time period (subscript \(t-1\)); e.g. lagged consumption might be used as a RHS variable to account for how past consumption affects present consumption. Likewise, a variable lagged by two periods is denoted \(y_{t-2}\), a variable lagged by three periods is denoted \(y_{t-3}\), etc.
Example 2. Which variable is endogenous and which one is exogenous in the system of equations? How many parameters do we need to estimate? Write the system of two equations in matrix form.
\[y_t=\beta_{1,0}+\beta_{1,1}y_{t-1}+\beta_{1,2}x_{t-1}+u_{1,t}\] \[x_t=\beta_{2,0}+\beta_{2,1}y_{t-1}+\beta_{2,2}x_{t-1}+u_{2,t}\]
Solution
In this system both variables are endogenous: \(x\) causes \(y\) and \(y\) causes \(x\) (through their lagged values), so neither variable is strictly exogenous. Each equation contains three parameters (an intercept and two slope coefficients), so we need to estimate \(6\) \(\beta\) parameters in total: \(\beta_{1,0}\), \(\beta_{1,1}\), \(\beta_{1,2}\), \(\beta_{2,0}\), \(\beta_{2,1}\) and \(\beta_{2,2}\). The matrix form of the system is: \[\begin{bmatrix} y_t \\ x_t \end{bmatrix}=\begin{bmatrix} \beta_{1,0} \\ \beta_{2,0} \end{bmatrix}+\begin{bmatrix} \beta_{1,1} & \beta_{1,2} \\ \beta_{2,1} & \beta_{2,2} \end{bmatrix}\begin{bmatrix} y_{t-1} \\ x_{t-1} \end{bmatrix}+\begin{bmatrix} u_{1,t} \\ u_{2,t} \end{bmatrix}\]
Solution
It is a multivariate model because it has more than one observed RHS variable (\(k\ge2\)). Variables are \(y\), \(x\), \(z\) and \(u\), while \(\alpha\), \(\beta\), \(\gamma\) and \(\lambda\) are parameters. Known (observed) variables are \(y\), \(x\) and \(z\). The error term \(u\) is an unknown (unobserved) random variable.
Parameter \(\lambda\) belongs to the interaction term, i.e. the product of the two variables \(x\) and \(z\). In the given example, \(\lambda\) represents the difference between the crisis and non-crisis periods in the change of inflation associated with a \(1\) unit change in the interest rate. For instance, \(\lambda \lt 0\) indicates that the effect of the interest rate on inflation was lower in the COVID pandemic period than in the non-pandemic period.
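To see how \(\lambda\) is read, here is a minimal simulation sketch. It assumes the model has the form \(y=\alpha+\beta x+\gamma z+\lambda xz+u\), with \(y\) = inflation, \(x\) = interest rate and \(z\) a dummy equal to \(1\) in the crisis (COVID) period; the parameter values and the data are hypothetical. Because \(\partial y/\partial x = \beta + \lambda z\), the effect of the interest rate is \(\beta\) outside the crisis and \(\beta+\lambda\) during it.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 400
rate = rng.uniform(0.0, 5.0, size=n)         # x: interest rate (simulated)
covid = (np.arange(n) >= 300).astype(float)  # z: crisis dummy, 1 in the last 100 periods
u = rng.normal(0.0, 0.5, size=n)             # error term

# Hypothetical "true" parameters used to simulate inflation
alpha, beta, gamma, lam = 2.0, 0.8, 1.0, -0.5
inflation = alpha + beta * rate + gamma * covid + lam * rate * covid + u

# OLS with the interaction term x*z as an additional regressor
X = np.column_stack([np.ones(n), rate, covid, rate * covid])
coef, *_ = np.linalg.lstsq(X, inflation, rcond=None)
a_hat, b_hat, g_hat, l_hat = coef

effect_non_crisis = b_hat      # d(inflation)/d(rate) when z = 0
effect_crisis = b_hat + l_hat  # d(inflation)/d(rate) when z = 1
print(f"lambda_hat = {l_hat:.3f}")
print(f"effect of rate: non-crisis {effect_non_crisis:.3f}, crisis {effect_crisis:.3f}")
```

With a positive \(\beta\) and a negative \(\lambda\), as chosen here, the estimated effect of the interest rate is smaller during the crisis period.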
- Keep in mind that raw data are usually transformed:
  - Taking logs, squares, inverse values, square roots, …
  - Seasonal and/or calendar adjustment
  - First differences and lagged values are sometimes required
  - Deflating nominal values
- Most common data issues:
  - Missing values (NA)
  - Measurement errors (collected data may not always represent the true values)
  - Outliers (extreme values far above or below the mean)
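Several of the transformations and data issues listed above can be illustrated in a few lines. The sketch below uses made-up numbers: it deflates a nominal series with a price index (base \(=100\)), takes logs and first differences (approximate growth rates), counts and drops a missing value, and flags outliers. For outliers it uses the \(1.5\times\)IQR rule of thumb, which is one common convention chosen here for illustration, not the only possible definition.

```python
import numpy as np

# Hypothetical nominal series with one missing value (NA) and a price index (base = 100)
nominal = np.array([200.0, 210.0, np.nan, 231.0, 250.0])
price_index = np.array([100.0, 102.0, 104.0, 106.0, 110.0])

real = nominal / price_index * 100.0  # deflating nominal values
log_real = np.log(real)               # log transformation (NaN stays NaN)
dlog_real = np.diff(log_real)         # first differences of logs (approx. growth rates)

n_missing = int(np.isnan(nominal).sum())
growth = dlog_real[~np.isnan(dlog_real)]  # keep only complete observations

# Flag outliers with the 1.5*IQR rule of thumb
values = np.array([1.2, 0.9, 1.1, 1.0, 8.5, 1.3])
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1
is_outlier = (values < q1 - 1.5 * iqr) | (values > q3 + 1.5 * iqr)

print("missing:", n_missing)
print("growth rates:", np.round(growth, 4))
print("outliers:", values[is_outlier])
```

Note that one missing level value removes two first differences, since each difference needs both neighbouring observations.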
Regardless of the functional form and data type, you should always consider the parsimony principle with respect to the number of variables on the right-hand side (less is better)!
This principle balances the model's goodness of fit with its simplicity to avoid overfitting
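A quick way to see why parsimony matters: in OLS, \(R^2\) never decreases when regressors are added, even irrelevant ones, whereas adjusted \(R^2\) and the Akaike information criterion (AIC) penalize extra parameters. The sketch below compares a small model with one relevant regressor against a larger model with five additional pure-noise regressors. The data are simulated, the AIC is the Gaussian form \(n\ln(SSR/n)+2k\) up to an additive constant, and the helper `ols_fit_stats` is hypothetical.

```python
import numpy as np


def ols_fit_stats(X, y):
    """Fit OLS and return (R^2, adjusted R^2, AIC up to an additive constant)."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    ssr = float(np.sum((y - X @ beta) ** 2))
    sst = float(np.sum((y - y.mean()) ** 2))
    r2 = 1.0 - ssr / sst
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k)
    aic = n * np.log(ssr / n) + 2 * k
    return r2, adj_r2, aic


rng = np.random.default_rng(seed=1)
n = 60
x1 = rng.normal(size=n)
noise_vars = rng.normal(size=(n, 5))  # five regressors unrelated to y
y = 0.5 + 1.5 * x1 + rng.normal(size=n)

X_small = np.column_stack([np.ones(n), x1])
X_large = np.column_stack([X_small, noise_vars])

small = ols_fit_stats(X_small, y)
large = ols_fit_stats(X_large, y)
for name, (r2, adj, aic) in [("small", small), ("large", large)]:
    print(f"{name}: R2={r2:.3f}  adj.R2={adj:.3f}  AIC={aic:.1f}")
```

The larger model always shows an \(R^2\) at least as high; whether adjusted \(R^2\) and AIC favour the smaller model depends on the particular draw, although with irrelevant regressors they usually do.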