E Expectation, Variance, Covariance and Correlation

This section describes fundamental summaries of the distribution of random variables, namely their expectation and variance. For multivariate random variables, the association between the components is summarised by their covariance or correlation. Moreover we list some useful inequalities.

E.1 Expectation

For a continuous random variable \(X\) with density function \(f(x)\), the expectation or mean value of \(X\) is the real number

\[ \mathop{\mathrm{\mathsf{E}}}(X) = \int x f(x) \, dx \tag{E.1} \]

The expectation of a function \(h(X)\) of \(X\) is

\[ \mathop{\mathrm{\mathsf{E}}}\{h(X)\} = \int h(x) f(x) \, dx \tag{E.2} \]

Note that the expectation of a random variable does not necessarily exist. If the integral \(\int \lvert x \rvert f(x) \, dx\) is finite, then \(X\) has finite expectation; otherwise, the expectation is infinite or undefined. For example, the standard Cauchy distribution has no finite expectation.

For a discrete random variable \(X\) with probability mass function \(f(x)\), the integral in (E.1) and (E.2) is replaced with a sum over the support of \(X\).
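For example, if \(X\) is the outcome of a fair six-sided die, then \(f(x) = 1/6\) for \(x \in \{1, \dotsc, 6\}\) and

\[ \mathop{\mathrm{\mathsf{E}}}(X) = \sum_{x=1}^{6} x \cdot \frac{1}{6} = 3.5 \]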

For any real numbers \(a\) and \(b\):

\[ \mathop{\mathrm{\mathsf{E}}}(a \cdot X + b) = a \cdot \mathop{\mathrm{\mathsf{E}}}(X) + b \]

For any two random variables \(X\) and \(Y\):

\[ \mathop{\mathrm{\mathsf{E}}}(X + Y) = \mathop{\mathrm{\mathsf{E}}}(X) + \mathop{\mathrm{\mathsf{E}}}(Y) \]

If \(X\) and \(Y\) are independent:

\[ \mathop{\mathrm{\mathsf{E}}}(X \cdot Y) = \mathop{\mathrm{\mathsf{E}}}(X) \cdot \mathop{\mathrm{\mathsf{E}}}(Y) \]
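These three rules are easy to check numerically. The following is a minimal Monte Carlo sketch using NumPy; the particular distributions, sample size and seed are chosen only for illustration.

```python
import numpy as np

# Minimal Monte Carlo check of the expectation rules above (illustrative only).
rng = np.random.default_rng(1)
n = 1_000_000
x = rng.exponential(scale=2.0, size=n)      # E(X) = 2
y = rng.normal(loc=1.0, scale=1.0, size=n)  # E(Y) = 1, generated independently of X

print(np.mean(3 * x + 5))  # close to 3 * 2 + 5 = 11  (linearity)
print(np.mean(x + y))      # close to 2 + 1 = 3       (additivity)
print(np.mean(x * y))      # close to 2 * 1 = 2       (independence)
```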

The expectation of a \(p\)-dimensional random variable \(\boldsymbol{X} = (X_1, \dotsc, X_p)^{\top}\) is:

\[ \mathop{\mathrm{\mathsf{E}}}(\boldsymbol{X}) = (\mathop{\mathrm{\mathsf{E}}}(X_1), \dotsc, \mathop{\mathrm{\mathsf{E}}}(X_p))^{\top} \]

The expectation of a real-valued function \(h(\boldsymbol{X})\) of a \(p\)-dimensional random variable \(\boldsymbol{X} = (X_1, \dotsc, X_p)^{\top}\) is

\[ \mathop{\mathrm{\mathsf{E}}}\{h(\boldsymbol{X})\} = \int h(\boldsymbol{x}) f(\boldsymbol{x}) \, d\boldsymbol{x} \tag{E.3} \]

E.2 Variance

The variance of a random variable \(X\) is

\[ \mathop{\mathrm{Var}}(X) = \mathop{\mathrm{\mathsf{E}}}\bigl[\{X - \mathop{\mathrm{\mathsf{E}}}(X)\}^{2}\bigr] \]

It can also be expressed as

\[ \mathop{\mathrm{Var}}(X) = \mathop{\mathrm{\mathsf{E}}}(X^{2}) - \mathop{\mathrm{\mathsf{E}}}(X)^{2} \]

and

\[ \mathop{\mathrm{Var}}(X) = \frac{1}{2}\mathop{\mathrm{\mathsf{E}}}\left\{(X_1 - X_2)^2 \right\} \]

where \(X_1\) and \(X_2\) are independent copies of \(X\). The square root \(\sqrt{\mathop{\mathrm{Var}}(X)}\) is called the standard deviation.
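Both alternative forms follow from the linearity of expectation: expanding the square in the definition gives

\[ \mathop{\mathrm{\mathsf{E}}}\bigl[\{X - \mathop{\mathrm{\mathsf{E}}}(X)\}^{2}\bigr] = \mathop{\mathrm{\mathsf{E}}}(X^{2}) - 2\,\mathop{\mathrm{\mathsf{E}}}(X)^{2} + \mathop{\mathrm{\mathsf{E}}}(X)^{2} = \mathop{\mathrm{\mathsf{E}}}(X^{2}) - \mathop{\mathrm{\mathsf{E}}}(X)^{2}, \]

and expanding \((X_1 - X_2)^2\) and using the independence of \(X_1\) and \(X_2\) gives

\[ \tfrac{1}{2}\,\mathop{\mathrm{\mathsf{E}}}\left\{(X_1 - X_2)^2\right\} = \tfrac{1}{2}\left\{\mathop{\mathrm{\mathsf{E}}}(X^{2}) - 2\,\mathop{\mathrm{\mathsf{E}}}(X)^{2} + \mathop{\mathrm{\mathsf{E}}}(X^{2})\right\} = \mathop{\mathrm{\mathsf{E}}}(X^{2}) - \mathop{\mathrm{\mathsf{E}}}(X)^{2}. \]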

For real numbers \(a\) and \(b\):

\[ \mathop{\mathrm{Var}}(a \cdot X + b) = a^2 \cdot \mathop{\mathrm{Var}}(X) \]

E.3 Moments

Let \(k\) be a positive integer. The \(k\)-th moment \(m_k\) of a random variable \(X\) is

\[ m_k = \mathop{\mathrm{\mathsf{E}}}(X^k) \]

The \(k\)-th central moment is

\[ c_k = \mathop{\mathrm{\mathsf{E}}}\{(X - m_1)^k\} \]

The expectation is the first moment, and the variance is the second central moment.
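For example, for a standard normal random variable we have \(m_1 = 0\), \(m_2 = 1\), \(m_3 = 0\) and \(m_4 = 3\); because \(m_1 = 0\), the central moments coincide with the ordinary moments in this case.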

E.4 Conditional Expectation and Variance

For continuous random variables, the conditional expectation of \(Y\) given \(X = x\) is

\[ \mathop{\mathrm{\mathsf{E}}}(Y\mid X = x) = \int y f(y\mid x) \, dy \tag{E.4} \]

For discrete variables, the integral becomes a sum. The conditional variance of \(Y\) given \(X = x\) is

\[ \mathop{\mathrm{Var}}(Y\mid X = x) = \mathop{\mathrm{\mathsf{E}}}\left[ \{Y - \mathop{\mathrm{\mathsf{E}}}(Y\mid X = x)\}^{2}\mid X = x\right] \tag{E.5} \]

If we condition on the random variable \(X\) rather than on a fixed value \(x\), then \(g(X) = \mathop{\mathrm{\mathsf{E}}}(Y\mid X)\) and \(h(X) = \mathop{\mathrm{Var}}(Y\mid X)\) are themselves random variables.

Two key results:

Law of total expectation:

\[ \mathop{\mathrm{\mathsf{E}}}(Y) = \mathop{\mathrm{\mathsf{E}}}\{\mathop{\mathrm{\mathsf{E}}}(Y\mid X)\} \tag{E.6} \]

Law of total variance:

\[ \mathop{\mathrm{Var}}(Y) = \mathop{\mathrm{\mathsf{E}}}\{\mathop{\mathrm{Var}}(Y\mid X)\} + \mathop{\mathrm{Var}}\{\mathop{\mathrm{\mathsf{E}}}(Y\mid X)\} \tag{E.7} \]

These identities are useful when the conditional moments of \(Y\) given \(X\) are easier to obtain than the marginal moments of \(Y\), for example in hierarchical models.
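As a numerical illustration, consider a minimal sketch based on the hierarchical model \(X \sim \text{Gamma}(3, 1)\) and \(Y \mid X = x \sim \text{Poisson}(x)\), chosen purely for illustration. Here \(\mathop{\mathrm{\mathsf{E}}}(Y \mid X) = \mathop{\mathrm{Var}}(Y \mid X) = X\), so (E.6) and (E.7) give \(\mathop{\mathrm{\mathsf{E}}}(Y) = 3\) and \(\mathop{\mathrm{Var}}(Y) = 3 + 3 = 6\), which the simulation reproduces approximately.

```python
import numpy as np

# Minimal sketch of (E.6) and (E.7): X ~ Gamma(shape=3, scale=1), Y | X ~ Poisson(X).
# Then E(Y) = E(X) = 3 and Var(Y) = E(X) + Var(X) = 6 (illustrative choice of model).
rng = np.random.default_rng(0)
n = 1_000_000
x = rng.gamma(shape=3.0, scale=1.0, size=n)
y = rng.poisson(lam=x)

print(y.mean())  # close to 3 (law of total expectation)
print(y.var())   # close to 6 (law of total variance)
```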

E.5 Covariance

Let \((X, Y)^{\top}\) be a bivariate random variable. The covariance is

\[ \mathop{\mathrm{Cov}}(X,Y) = \mathop{\mathrm{\mathsf{E}}}\bigl[\{X-\mathop{\mathrm{\mathsf{E}}}(X)\}\{Y-\mathop{\mathrm{\mathsf{E}}}(Y)\}\bigr] = \mathop{\mathrm{\mathsf{E}}}(XY)-\mathop{\mathrm{\mathsf{E}}}(X)\mathop{\mathrm{\mathsf{E}}}(Y) \]

See (E.3). Note that \(\mathop{\mathrm{Cov}}(X, X)= \mathop{\mathrm{Var}}(X)\), and that \(\mathop{\mathrm{Cov}}(X, Y)= 0\) if \(X\) and \(Y\) are independent; the converse does not hold in general.
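A standard counterexample for the converse: if \(X\) has a standard normal distribution and \(Y = X^{2}\), then

\[ \mathop{\mathrm{Cov}}(X, Y) = \mathop{\mathrm{\mathsf{E}}}(X^{3}) - \mathop{\mathrm{\mathsf{E}}}(X)\,\mathop{\mathrm{\mathsf{E}}}(X^{2}) = 0, \]

although \(Y\) is completely determined by \(X\).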

For real \(a\), \(b\), \(c\), \(d\):

\[ \mathop{\mathrm{Cov}}(a X + b, c Y + d) = a c \mathop{\mathrm{Cov}}(X,Y) \]

For \(\boldsymbol{X} = (X_1, \dotsc, X_p)^{\top}\):

\[ \mathop{\mathrm{Cov}}(\boldsymbol{X}) = \mathop{\mathrm{\mathsf{E}}}\left[\{\boldsymbol{X}- \mathop{\mathrm{\mathsf{E}}}(\boldsymbol{X})\}\{\boldsymbol{X}- \mathop{\mathrm{\mathsf{E}}}(\boldsymbol{X})\}^{\top} \right] \]

Also:

\[ \mathop{\mathrm{Cov}}(\boldsymbol{X}) = \mathop{\mathrm{\mathsf{E}}}(\boldsymbol{X} \boldsymbol{X}^{\top}) - \mathop{\mathrm{\mathsf{E}}}(\boldsymbol{X}) \mathop{\mathrm{\mathsf{E}}}(\boldsymbol{X})^{\top} \]

For any \(q \times p\) matrix \(\boldsymbol{A}\) of constants:

\[ \mathop{\mathrm{Cov}}(\boldsymbol{A} \boldsymbol{X}) = \boldsymbol{A} \mathop{\mathrm{Cov}}(\boldsymbol{X}) \boldsymbol{A}^{\top} \]
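This identity follows from the definition and the linearity of expectation,

\[ \mathop{\mathrm{Cov}}(\boldsymbol{A} \boldsymbol{X}) = \mathop{\mathrm{\mathsf{E}}}\left[\boldsymbol{A}\{\boldsymbol{X}- \mathop{\mathrm{\mathsf{E}}}(\boldsymbol{X})\}\{\boldsymbol{X}- \mathop{\mathrm{\mathsf{E}}}(\boldsymbol{X})\}^{\top}\boldsymbol{A}^{\top} \right] = \boldsymbol{A} \mathop{\mathrm{Cov}}(\boldsymbol{X}) \boldsymbol{A}^{\top}, \]

and choosing \(\boldsymbol{A} = (1, 1)\) with \(\boldsymbol{X} = (X, Y)^{\top}\) yields the variance of a sum.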

In particular:

\[ \mathop{\mathrm{Var}}(X+Y) = \mathop{\mathrm{Var}}(X) + \mathop{\mathrm{Var}}(Y) + 2 \mathop{\mathrm{Cov}}(X,Y) \]

If \(X\) and \(Y\) are independent:

\[ \mathop{\mathrm{Var}}(X+Y) = \mathop{\mathrm{Var}}(X) + \mathop{\mathrm{Var}}(Y) \]

\[ \mathop{\mathrm{Var}}(XY) = \mathop{\mathrm{\mathsf{E}}}(X)^2 \mathop{\mathrm{Var}}(Y) + \mathop{\mathrm{\mathsf{E}}}(Y)^2 \mathop{\mathrm{Var}}(X) + \mathop{\mathrm{Var}}(X) \mathop{\mathrm{Var}}(Y) \]
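The product formula can be verified by writing \(\mathop{\mathrm{Var}}(XY) = \mathop{\mathrm{\mathsf{E}}}(X^{2}Y^{2}) - \mathop{\mathrm{\mathsf{E}}}(XY)^{2}\) and using independence:

\[ \mathop{\mathrm{\mathsf{E}}}(X^{2})\,\mathop{\mathrm{\mathsf{E}}}(Y^{2}) - \mathop{\mathrm{\mathsf{E}}}(X)^{2}\mathop{\mathrm{\mathsf{E}}}(Y)^{2} = \left\{\mathop{\mathrm{Var}}(X) + \mathop{\mathrm{\mathsf{E}}}(X)^{2}\right\}\left\{\mathop{\mathrm{Var}}(Y) + \mathop{\mathrm{\mathsf{E}}}(Y)^{2}\right\} - \mathop{\mathrm{\mathsf{E}}}(X)^{2}\mathop{\mathrm{\mathsf{E}}}(Y)^{2}, \]

which expands to the right-hand side of the formula above.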

E.6 Correlation

The correlation of \(X\) and \(Y\) is

\[ \mathop{\mathrm{Corr}}(X,Y) = \frac{\mathop{\mathrm{Cov}}(X,Y)}{\sqrt{\mathop{\mathrm{Var}}(X)\mathop{\mathrm{Var}}(Y)}} \]

provided that both variances are positive and finite.

An important inequality:

\[ \left|\mathop{\mathrm{Corr}}(X,Y)\right| \leq 1 \tag{E.8} \]

This follows from the Cauchy–Schwarz inequality:

\[ \mathop{\mathrm{\mathsf{E}}}(XY)^2 \leq \mathop{\mathrm{\mathsf{E}}}(X^2)\mathop{\mathrm{\mathsf{E}}}(Y^2) \tag{E.9} \]

Apply (E.9) to \(X-\mathop{\mathrm{\mathsf{E}}}(X)\) and \(Y-\mathop{\mathrm{\mathsf{E}}}(Y)\) to get (E.8).

If \(Y = aX + b\) with \(a > 0\), then \(\mathop{\mathrm{Corr}}(X, Y) = 1\); if \(a < 0\), then \(\mathop{\mathrm{Corr}}(X, Y) = -1\).
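This is verified directly: for \(Y = aX + b\) with \(a \neq 0\) and \(\mathop{\mathrm{Var}}(X) > 0\),

\[ \mathop{\mathrm{Corr}}(X, Y) = \frac{\mathop{\mathrm{Cov}}(X, aX + b)}{\sqrt{\mathop{\mathrm{Var}}(X)\, a^{2} \mathop{\mathrm{Var}}(X)}} = \frac{a \mathop{\mathrm{Var}}(X)}{\lvert a \rvert \mathop{\mathrm{Var}}(X)} = \frac{a}{\lvert a \rvert}. \]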

Let \(\boldsymbol{\Sigma} = \mathop{\mathrm{Cov}}(\boldsymbol{X})\) be the covariance matrix of \(\boldsymbol{X} = (X_1, \dotsc, X_p)^{\top}\). The correlation matrix \(\boldsymbol{R}\) is:

\[ \boldsymbol{R} = \boldsymbol{S}^{-1}\boldsymbol{\Sigma}\boldsymbol{S}^{-1} \]

where \(\boldsymbol{S}\) is the diagonal matrix with entries \(\sqrt{\mathop{\mathrm{Var}}(X_1)}, \dotsc, \sqrt{\mathop{\mathrm{Var}}(X_p)}\). The off-diagonal entries of \(\boldsymbol{R}\) are the pairwise correlations \(\mathop{\mathrm{Corr}}(X_i, X_j)\), and its diagonal entries all equal one.
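As a small numerical illustration, the following sketch converts an arbitrarily chosen covariance matrix into the corresponding correlation matrix using NumPy.

```python
import numpy as np

# Minimal sketch of R = S^{-1} Sigma S^{-1}; the covariance matrix is illustrative only.
sigma = np.array([[ 4.0, 1.2, -0.8],
                  [ 1.2, 1.0,  0.3],
                  [-0.8, 0.3,  2.0]])
s_inv = np.diag(1.0 / np.sqrt(np.diag(sigma)))  # S^{-1}: reciprocal standard deviations
r = s_inv @ sigma @ s_inv

print(np.diag(r))  # all ones
print(r)           # off-diagonal entries are the pairwise correlations
```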