7  Descriptive Statistics

Descriptive statistics is the part of data analysis concerned with summarizing and presenting data so that it is easy to understand. Its primary goal is to give an overview of the available data using techniques that describe patterns, distributions, and relationships between variables.

The following mind map presents a visual overview of descriptive statistics, covering categorical data, numerical data, distribution shape, relative position, and association metrics between variables. It is intended to support both a clearer understanding and the practical application of descriptive statistical techniques in data analysis.

Data analysis usually distinguishes between categorical and numerical data, and each calls for different descriptive methods. For categorical data, descriptive statistics involve calculating frequencies and proportions and summarizing qualitative data in frequency distributions and contingency tables. For numerical data, they involve measures of central tendency (mean, median, mode) and measures of dispersion (range, variance, standard deviation), which together give a numerical summary of the data distribution.

It is also important to understand the shape of the distribution through metrics such as skewness and kurtosis, which describe its asymmetry and tail weight and can signal the presence of outliers. Relative position measures such as percentiles, quartiles, and z-scores show where a data point sits relative to the rest of the dataset. Finally, association metrics such as covariance and correlation coefficients describe how two variables vary together.

7.1 Categorical Data

7.1.1 Frequency and Proportion

  • Definition of frequency
  • Calculating proportion in categorical data
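
As a minimal sketch of these two ideas, the snippet below counts how often each category occurs (frequency) and divides each count by the total number of observations (proportion). The `colors` sample is hypothetical and used only for illustration.

```python
from collections import Counter

# Hypothetical sample of a categorical variable (illustrative only).
colors = ["red", "blue", "red", "green", "blue", "red"]

# Frequency: how many times each category occurs.
frequency = Counter(colors)

# Proportion: frequency divided by the total number of observations.
total = len(colors)
proportion = {category: count / total for category, count in frequency.items()}

for category, count in frequency.most_common():
    print(f"{category}: n={count}, proportion={proportion[category]:.3f}")
```

The proportions always sum to 1, which is a quick sanity check on the calculation.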

7.1.2 Contingency Table

  • Definition and use of contingency tables
  • Analyzing relationships between categorical variables
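
A contingency table cross-tabulates two categorical variables by counting each combination of their values. The sketch below builds one with the standard library; the `observations` pairs (a group label and a drink preference) are hypothetical. Row proportions are one simple way to compare groups, assuming rows are the grouping variable.

```python
from collections import Counter

# Hypothetical paired observations (group, preference); illustrative only.
observations = [
    ("F", "tea"), ("F", "coffee"), ("M", "coffee"),
    ("M", "coffee"), ("F", "tea"), ("M", "tea"),
]

# Count each (row, column) combination.
cell_counts = Counter(observations)
rows = sorted({r for r, _ in observations})
cols = sorted({c for _, c in observations})

# table[row][col] -> count (Counter returns 0 for unseen combinations).
table = {r: {c: cell_counts[(r, c)] for c in cols} for r in rows}

# Marginal totals for rows and columns.
row_totals = {r: sum(table[r].values()) for r in rows}
col_totals = {c: sum(table[r][c] for r in rows) for c in cols}

# Row proportions: distribution of the column variable within each row.
row_props = {r: {c: table[r][c] / row_totals[r] for c in cols} for r in rows}

print("      " + "".join(f"{c:>8}" for c in cols) + f"{'total':>8}")
for r in rows:
    cells = "".join(f"{table[r][c]:>8}" for c in cols)
    print(f"{r:<6}{cells}{row_totals[r]:>8}")
```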

7.1.3 Frequency Distribution

  • Creating and interpreting frequency distributions
  • Visualizing frequency distributions using tables and bar charts
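
A frequency distribution lists every category with its count, and often its relative and cumulative relative frequency as well. The sketch below assumes a hypothetical ordinal variable `ratings` with a known category order, and prints the table with a rough text bar chart in place of a plotted one.

```python
from collections import Counter

# Hypothetical ordinal data and its category order (illustrative only).
ratings = ["low", "medium", "high", "medium", "medium", "low", "high", "medium"]
order = ["low", "medium", "high"]

counts = Counter(ratings)
total = len(ratings)

# Each row: (category, count, relative frequency, cumulative relative frequency).
rows = []
running = 0
for category in order:
    running += counts[category]
    rows.append((category, counts[category], counts[category] / total, running / total))

# Print the table with a text bar per category.
for category, count, rel, cum in rows:
    print(f"{category:<8}{count:>3}  {rel:>5.2f}  {cum:>5.2f}  {'#' * count}")
```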

7.2 Numerical Data

7.2.1 Mean

  • Definition and formula of mean
  • Advantages and disadvantages of using the mean

7.2.2 Median

  • Definition and how to calculate the median
  • Median in skewed data distributions

7.2.3 Mode

  • Definition and how to calculate the mode
  • Mode in data with multiple identical values
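
The three measures of central tendency outlined in 7.2.1 to 7.2.3 can be sketched with Python's `statistics` module. The data are hypothetical; the second half adds one extreme value to illustrate why the mean is sensitive to outliers while the median is not, and shows `multimode` returning every tied mode.

```python
import statistics

# Hypothetical numerical sample (illustrative only).
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)        # sum of values divided by their count
median = statistics.median(data)    # middle value (average of the two middle values here)
modes = statistics.multimode(data)  # list of the most frequent value(s)

# Adding one extreme value pulls the mean strongly but barely moves the median.
with_outlier = data + [100]
mean_outlier = statistics.mean(with_outlier)
median_outlier = statistics.median(with_outlier)

# When several values tie for most frequent, multimode returns all of them.
tied_modes = statistics.multimode([1, 1, 2, 2, 3])

print(mean, median, modes)
print(round(mean_outlier, 2), median_outlier)
print(tied_modes)
```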

7.2.4 Range

  • How to calculate the range
  • Use of range in data analysis

7.2.5 Variance

  • Definition of variance and its relationship to standard deviation
  • Variance in data distributions

7.2.6 Standard Deviation

  • Definition and formula of standard deviation
  • Interpretation of standard deviation values
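
The dispersion measures in 7.2.4 to 7.2.6 can be sketched the same way. The data are hypothetical. Note the assumption that matters most here: `pvariance` and `pstdev` divide by n (population formulas), while `variance` and `stdev` divide by n - 1 (sample formulas). In both cases the standard deviation is the square root of the variance.

```python
import math
import statistics

# Hypothetical numerical sample (illustrative only).
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: distance between the largest and smallest value.
data_range = max(data) - min(data)

# Population variance and standard deviation (divide by n).
pop_var = statistics.pvariance(data)
pop_sd = statistics.pstdev(data)

# Sample variance and standard deviation (divide by n - 1).
sample_var = statistics.variance(data)
sample_sd = statistics.stdev(data)

print(data_range, pop_var, pop_sd)
print(round(sample_var, 4), round(sample_sd, 4))
```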

7.3 Shape of Distribution

7.3.1 Skewness

  • Definition and types: positive, negative, and zero skew
  • Formula and interpretation
  • Effects of skewness on mean and median

7.3.2 Kurtosis

  • Definition and types: leptokurtic, platykurtic, mesokurtic
  • Formula and interpretation
  • Importance in risk and outlier analysis
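
Several formulas for skewness and kurtosis are in use. The sketch below assumes the simple moment-based (population) versions: skewness as the third central moment divided by the 1.5 power of the second, and excess kurtosis as the fourth central moment divided by the square of the second, minus 3. Statistical packages often apply bias corrections, so their results can differ slightly. The helper names and the data are hypothetical.

```python
def central_moment(values, k):
    """k-th central moment: mean of (x - mean) ** k."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** k for x in values) / len(values)

def skewness(values):
    """Moment-based skewness: > 0 right-skewed, < 0 left-skewed, 0 symmetric."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Excess kurtosis: > 0 leptokurtic, < 0 platykurtic, 0 mesokurtic."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

# Hypothetical sample with a longer right tail (illustrative only).
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(skewness(data))         # positive: the right tail is longer
print(excess_kurtosis(data))  # slightly negative: lighter tails than a normal distribution
```

A positive skewness typically goes together with a mean above the median, as in this sample (mean 5, median 4.5).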

7.4 Relative Position

7.4.1 Percentiles

  • Definition and how to calculate
  • Usage in performance measurement and threshold setting

7.4.2 Quartiles

  • Q1, Q2 (median), Q3 definitions
  • Application in boxplots and IQR calculation
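
Percentiles and quartiles can be sketched with `statistics.quantiles`. There are several conventions for interpolating between data points; this example assumes the `"inclusive"` method, and other methods or libraries may return slightly different cut points for the same data. The data are hypothetical.

```python
import statistics

# Hypothetical numerical sample (illustrative only).
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Quartiles: the three cut points that split the data into four parts.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

# Interquartile range: spread of the middle 50% of the data (the box in a boxplot).
iqr = q3 - q1

# Percentiles: 99 cut points; index 89 is the 90th percentile.
percentiles = statistics.quantiles(data, n=100, method="inclusive")
p90 = percentiles[89]

print(q1, q2, q3, iqr, p90)
```

Q2 coincides with the median, which is why the two are listed together in the outline above.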

7.4.3 Z-Score

  • Standardized score relative to mean and standard deviation
  • Use in comparing data points across distributions
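
A z-score expresses a value as its distance from the mean in units of standard deviation, z = (x - mean) / sd. The sketch below assumes the population standard deviation; using the sample standard deviation instead is an equally common choice. The helper `z_score`, the data, and the exam figures are hypothetical.

```python
import statistics

def z_score(x, mean, sd):
    """Distance of x from the mean, in standard deviations."""
    return (x - mean) / sd

# Hypothetical numerical sample (illustrative only).
data = [2, 4, 4, 4, 5, 5, 7, 9]
mu = statistics.mean(data)
sigma = statistics.pstdev(data)  # population standard deviation (assumption)

z_scores = [z_score(x, mu, sigma) for x in data]
print(z_scores)

# Comparing across distributions: hypothetical scores on two different exams.
z_exam_a = z_score(85, mean=70, sd=10)  # 1.5 standard deviations above the mean
z_exam_b = z_score(80, mean=60, sd=8)   # 2.5 standard deviations above the mean
print(z_exam_a, z_exam_b)
```

Although the raw score on exam A is higher, the score on exam B is further above its own mean once both are standardized.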

7.4.4 T-Score

  • Definition and formula
  • Comparison with Z-score in small samples

7.5 Association Metrics

7.5.1 Covariance

  • Meaning and direction of linear relationship
  • Limitations in interpretation without normalization

7.5.2 Pearson Correlation Coefficient

  • Strength and direction of linear relationship
  • Range [-1, 1] and interpretation
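
Covariance and the Pearson coefficient (7.5.1 and 7.5.2) are closely related: Pearson's r is the covariance divided by the product of the two standard deviations. The sketch below writes both out by hand, assuming the sample (n - 1) covariance, and rescales one variable to show why raw covariance is hard to interpret without normalization. The helper names and data are hypothetical.

```python
import math

def sample_covariance(x, y):
    """Sample covariance: sum of products of deviations, divided by n - 1."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def pearson(x, y):
    """Pearson correlation: covariance normalized by both standard deviations."""
    sx = math.sqrt(sample_covariance(x, x))
    sy = math.sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (sx * sy)

# Hypothetical paired measurements (illustrative only).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

cov_xy = sample_covariance(x, y)
r_xy = pearson(x, y)

# Changing the unit of x (for example metres to centimetres) scales the
# covariance by the same factor but leaves the correlation unchanged.
x_scaled = [100 * a for a in x]
cov_scaled = sample_covariance(x_scaled, y)
r_scaled = pearson(x_scaled, y)

print(cov_xy, round(r_xy, 4))
print(cov_scaled, round(r_scaled, 4))
```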

7.5.3 Spearman’s Rank Correlation

  • Non-parametric correlation measure
  • Suitable for ordinal data and monotonic relationships
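
One common way to compute Spearman's coefficient is to replace each value by its rank (tied values share their average rank) and then apply the Pearson formula to the ranks. The sketch below follows that approach; the helper names and the data are hypothetical. The example relationship is monotonic but far from linear, so Spearman's coefficient is 1 while Pearson's is lower.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation computed from sums of deviations."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical data: y always increases with x, but not linearly.
x = [10, 20, 30, 40, 50]
y = [1, 4, 9, 16, 100]

print(spearman(x, y))
print(round(pearson(x, y), 4))
```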

7.5.4 Point-Biserial Correlation

  • Used when one variable is continuous and one is binary
  • Application in psychological and social sciences
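
The point-biserial coefficient is commonly written as (M1 - M0) / s * sqrt(p * q), where M1 and M0 are the means of the continuous variable in the two groups, s is its population standard deviation, and p and q are the group proportions. It equals the Pearson correlation when the binary variable is coded 0/1. The sketch below computes it both ways; the helper names and the data are hypothetical.

```python
import math
import statistics

def point_biserial(binary, values):
    """Point-biserial correlation for a 0/1 variable and a continuous one."""
    ones = [v for b, v in zip(binary, values) if b == 1]
    zeros = [v for b, v in zip(binary, values) if b == 0]
    p = len(ones) / len(values)    # proportion of observations coded 1
    q = 1 - p                      # proportion of observations coded 0
    s = statistics.pstdev(values)  # population standard deviation of all values
    return (statistics.mean(ones) - statistics.mean(zeros)) / s * math.sqrt(p * q)

def pearson(x, y):
    """Pearson correlation computed from sums of deviations."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: group membership (0/1) and a continuous score.
group = [0, 0, 0, 1, 1, 1]
score = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

r_pb = point_biserial(group, score)
r_pearson = pearson(group, score)
print(round(r_pb, 4), round(r_pearson, 4))
```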