Chapter 6 Distribution shape measures
6.1 Skewness
6.1.1 Fisher’s moment coefficient of skewness
The moment coefficient of skewness is usually referred to simply as “the skewness coefficient” or “skewness”.
It is the most popular of several measures of the asymmetry (that is, the skewness) of a variable’s distribution.
\[\begin{equation} g_{1} = \frac{1}{n}\sum_{i=1}^n\left(\frac{x_i-\bar{x}}{\widehat{\sigma}_x}\right)^3 \tag{6.1} \end{equation}\]
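Formula (6.1) is analogous to the “population” versions of the variance and standard deviation: the sum is divided by \(n\), and \(\widehat{\sigma}_x\) is the standard deviation computed with \(n\) in the denominator.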
The modified skewness coefficient can be defined as follows:
\[\begin{equation} G_{1} = \frac{\sqrt{n(n-1)}}{n-2}g_{1} \tag{6.2} \end{equation}\]
The skewness coefficient measures which end of the distribution plot (the “tail”) is more stretched out. If the left tail is stretched (values below the mean), the skewness coefficient is negative. If the right tail (values above the mean) is stretched, the coefficient is positive.
In popular spreadsheet applications, the modified skewness coefficient (\(G_1\)) can be calculated using the SKEW function (Google Sheets, Excel), while \(g_1\) can be obtained using the SKEW.P function (Google Sheets, Excel).
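Formulas (6.1) and (6.2) can also be computed directly. The following is a minimal sketch in Python using only the standard library; the function names and the sample data are made up for illustration and do not come from any particular package.

```python
import math


def skewness_g1(values):
    """Moment coefficient of skewness g1, following formula (6.1)."""
    n = len(values)
    mean = sum(values) / n
    # "Population" standard deviation: the denominator is n, not n - 1.
    sigma = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    return sum(((x - mean) / sigma) ** 3 for x in values) / n


def skewness_G1(values):
    """Modified skewness coefficient G1, following formula (6.2).

    Requires at least 3 observations because of the n - 2 denominator.
    """
    n = len(values)
    return math.sqrt(n * (n - 1)) / (n - 2) * skewness_g1(values)


# Illustrative (made-up) sample with a stretched right tail.
data = [2, 3, 3, 4, 4, 4, 5, 5, 9]
g1 = skewness_g1(data)
G1 = skewness_G1(data)
print(f"g1 = {g1:.3f}, G1 = {G1:.3f}")
```

For this sample both coefficients are positive, which matches the stretched right tail, and \(G_1\) is larger in absolute value than \(g_1\) because the correction factor in (6.2) exceeds 1.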
The interpretation of the skewness coefficient depends on the domain. For example, the following rules can be adopted:
If skewness is between -0.5 and 0.5, the data are fairly symmetrical (at most weak asymmetry is present).
If skewness is between -1 and -0.5 or between 0.5 and 1, the data are moderately skewed.
If skewness is less than -1 or greater than 1, the data are highly skewed.
6.1.2 Other measures of skewness
Other measures of skewness have been proposed, including:
- Pearson’s median skewness:
\[ \frac{3\cdot(\text{mean} - \text{median})}{\text{standard deviation}} \]
- Bowley’s measure of skewness:
\[ \frac{\text{quartile 1} + \text{quartile 3}- 2\cdot\text{median}}{\text{quartile 3} - \text{quartile 1}}\]
- Kelly’s measure of skewness:
\[ \frac{\text{decile 1} + \text{decile 9}- 2\cdot\text{median}}{\text{decile 9} - \text{decile 1}}\]
Please note that for some distributions, different measures of skewness may give conflicting information about the direction of the asymmetry.
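The three alternative measures can be sketched in Python with the standard `statistics` module. Two choices here are assumptions, since the text does not fix them: Pearson’s measure uses the sample standard deviation, and quartiles and deciles are computed with the “inclusive” interpolation method. Other quantile conventions can give somewhat different values. The function names and the sample are illustrative only.

```python
import statistics


def pearson_median_skewness(values):
    """3 * (mean - median) / standard deviation."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    # Assumption: sample standard deviation (n - 1 in the denominator).
    return 3 * (mean - median) / statistics.stdev(values)


def bowley_skewness(values):
    """(Q1 + Q3 - 2 * median) / (Q3 - Q1)."""
    # Assumption: "inclusive" quantile method; the middle cut point is the median.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (q1 + q3 - 2 * q2) / (q3 - q1)


def kelly_skewness(values):
    """(D1 + D9 - 2 * median) / (D9 - D1)."""
    deciles = statistics.quantiles(values, n=10, method="inclusive")
    d1, d5, d9 = deciles[0], deciles[4], deciles[8]  # d5 is the median
    return (d1 + d9 - 2 * d5) / (d9 - d1)


# Illustrative (made-up) sample with one large value on the right.
data = [2, 3, 3, 4, 4, 4, 5, 5, 9]
pearson = pearson_median_skewness(data)
bowley = bowley_skewness(data)
kelly = kelly_skewness(data)
print(f"Pearson = {pearson:.3f}, Bowley = {bowley:.3f}, Kelly = {kelly:.3f}")
```

On this sample the measures already disagree in strength: the quartiles are symmetric around the median, so Bowley’s measure is zero, while Pearson’s and Kelly’s measures are positive because they react to the large value in the right tail.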
6.2 Kurtosis
Excess kurtosis has the following formula:
\[\begin{equation} g_{2} = \frac{1}{n}\sum_{i=1}^n\left(\frac{x_i-\bar{x}}{\widehat{\sigma}_x}\right)^4-3 \tag{6.3} \end{equation}\]
The above formula can be treated as a “population” formula. The “sample” formula is typically as follows:
\[\begin{equation} G_{2} = \frac{n-1}{(n-2)(n-3)}\left[(n+1)g_{2}+6\right] \tag{6.4} \end{equation}\]
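In some statistical packages, the following formula also appears; it differs from (6.3) only in using the sample standard deviation \(s_x\) (with \(n-1\) in the denominator) in place of \(\widehat{\sigma}_x\):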
\[\begin{equation} b_{2} = \frac{1}{n}\sum_{i=1}^n\left(\frac{x_i-\bar{x}}{s_x}\right)^4-3 \tag{6.5} \end{equation}\]
In spreadsheet applications (Google Sheets, Excel), the KURT function calculates the coefficient \(G_2\) according to formula (6.4). The coefficients \(g_2\) and \(b_2\) can be determined using the mean, standard deviation, and array formulas.
Excess kurtosis measures how heavy the tails of the distribution are, that is, how frequent and how large extreme values are, compared with the normal distribution, for which excess kurtosis equals 0.
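Outside a spreadsheet, the three coefficients can be computed directly from formulas (6.3), (6.4) and (6.5). The sketch below uses plain Python. It assumes that \(\widehat{\sigma}_x\) is the standard deviation with \(n\) in the denominator and \(s_x\) the one with \(n-1\). The function name and data are made up for illustration.

```python
import math


def excess_kurtosis_coefficients(values):
    """Return (g2, G2, b2) following formulas (6.3), (6.4) and (6.5).

    Requires at least 4 observations because of the n - 3 denominator.
    """
    n = len(values)
    mean = sum(values) / n
    sum_sq = sum((x - mean) ** 2 for x in values)
    sigma = math.sqrt(sum_sq / n)    # "population" standard deviation
    s = math.sqrt(sum_sq / (n - 1))  # "sample" standard deviation

    g2 = sum(((x - mean) / sigma) ** 4 for x in values) / n - 3
    G2 = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)
    b2 = sum(((x - mean) / s) ** 4 for x in values) / n - 3
    return g2, G2, b2


# Illustrative (made-up) sample with one extreme value.
data = [2, 3, 3, 4, 4, 4, 5, 5, 9]
g2, G2, b2 = excess_kurtosis_coefficients(data)
print(f"g2 = {g2:.3f}, G2 = {G2:.3f}, b2 = {b2:.3f}")
```

Because \(s_x\) is always larger than \(\widehat{\sigma}_x\), \(b_2\) is always smaller than \(g_2\) for the same data; the two are linked by \(b_2 + 3 = (g_2 + 3)\left(\frac{n-1}{n}\right)^2\).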
The interpretation of kurtosis depends on the field. For example, the following scale can be proposed:
Between -0.5 and 0.5 – the distribution is approximately mesokurtic (extreme values occur with intensity similar to the intensity of extreme values in the normal distribution).
Between -1 and -0.5 – the distribution is moderately platykurtic (there are fewer extreme values, or they are smaller in magnitude than in the normal distribution).
Below -1 – the distribution is highly platykurtic.
Between 0.5 and 1.0 – the distribution is moderately leptokurtic.
Between 1.0 and 5.0 – the distribution is highly leptokurtic.
Above 5.0 – the distribution is extremely leptokurtic.
6.3 Outliers
There is no single definition of an outlier. Most commonly, outliers are identified using position measures or standardized “z” values.
6.3.1 Identifying outliers using position measures
This method is used in box plots. Outliers are those values that are either less than \(Q_1 - 1.5\cdot\text{IQR}\) or greater than \(Q_3 + 1.5\cdot\text{IQR}\), where \(\text{IQR} = Q_3 - Q_1\) is the interquartile range.
Of course, it is possible to adopt multipliers other than 1.5.
Some authors define outliers, and additionally distinguish “extreme values” that are either less than \(Q_1 - 3\cdot\text{IQR}\), or greater than \(Q_3 + 3\cdot\text{IQR}\). For others, outliers and extreme values are synonyms.
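The box-plot rule can be sketched in a few lines of Python. The multiplier is exposed as a parameter `k`, so the same function covers both the 1.5 and the 3 variants. The function names are hypothetical, and the “inclusive” quartile method is an assumption; other quartile conventions may move the fences slightly.

```python
import statistics


def iqr_fences(values, k=1.5):
    """Return the (lower, upper) fences Q1 - k * IQR and Q3 + k * IQR."""
    # Assumption: quartiles are computed with the "inclusive" method.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, k=1.5):
    """Return the values lying strictly outside the fences."""
    lower, upper = iqr_fences(values, k)
    return [x for x in values if x < lower or x > upper]


# Illustrative (made-up) sample.
data = [2, 3, 3, 4, 4, 4, 5, 5, 9]
outliers = find_outliers(data)            # multiplier 1.5
extreme_values = find_outliers(data, k=3)  # multiplier 3
print(outliers, extreme_values)
```

In this sample the quartiles are 3 and 5, so the 1.5 rule gives fences at 0 and 8 and flags the value 9, while the wider fences of the multiplier-3 rule (-3 and 11) flag nothing.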
6.4 Links
Distribution Skewness Explorer – Interactive Simulator: https://bankonomia.nazwa.pl/skewness/