2  Regions and their Countries

Objectives

Country and Regions: Classifications for countries

I aim to compare different aspects of countries. I want, for instance, to know how well Austria is doing compared to other European countries, the other member states of the European Union, or other OECD countries. It is, therefore, vital to have a consistent categorization system with different grouping schemes.

My objectives for this chapter are:

  1. Understanding the country classification used by my WHR data sources (Section 2.1).
  2. Inspecting different approaches of international organizations to classify countries (Section 2.2).
  3. Integrating and harmonizing the official country classifications with the WHR data country names so that the dataset includes unofficial (WHR) regions but otherwise conforms to the internationally recognized and approved systems (Section 2.3 and Section 2.4).
  4. The final result is the dataset whr_final.rds (Section 2.4.3) containing 19379 rows and 34 columns.
  5. In the summary I will show which countries belong to which regions by all official grouping classifications (Section 2.5).

2.1 Country groupings in WHR

Some of the WHR years (WHR reports 2013, 2015, 2020, and 2021) have a regional grouping incorporated. But this grouping is not included in the (new) dataset 2024 from the WHR report 2025.

2.1.1 Classification of WHR 2020

I will display the regional grouping of the WHR data with the example from the year 2020 (WHR report 2021).

R Code 2.1 : WHR data 2020 classification

Listing / Output 2.1: Result of the WHR classification system for WHR data 2020 used in the WHR report 2021.
Code
(
  df_dt_whr_2020 <-  base::readRDS("data/whr/raw/whr_raw_2021.rds") |> 
      dplyr::select(`Country name`, `Regional indicator`) |> 
      dplyr::nest_by(`Regional indicator`) |> 
      dplyr::mutate(data = as.vector(data)) |>
      dplyr::mutate(data = stringr::str_c(data, collapse = "; ")) |>
      dplyr::mutate(data = paste(data, ";")) |> 
      dplyr::mutate(N = lengths(gregexpr(";", data))) |> 
      dplyr::rename(Country = data) |> 
      DT::datatable(class = 'cell-border compact stripe', 
                options = list(
                  pageLength = 25,
                  lengthMenu = c(5, 10, 15, 20, 25, 50)
                  )
            )
)

There are 10 different regional indicators. The datasets for 2013, 2015, 2020 and 2021 all use the same classification scheme with 149 countries in 10 regions.

These 149 are by far not all countries of the world. The complete number is about 195, with some uncertainty about the Holy See (Vatican), the State of Palestine, Taiwan and Kosovo. (Compare: How Many Countries Are There In The World?) The reason for the lower number is simple: Subjective well-being data for the study year 2020 are available for only those 149 countries.

2.1.2 class_scheme() function

As I am going to list several classification variants, it is worth the effort to write a function for this repetitive task.

R Code 2.2 : Function class_scheme() for showing classification schemes

Listing / Output 2.2: Function class_scheme() for showing results of a classification system
Code
class_scheme <- function(df, sel1, sel2) {
    ## df = dataframe to show
    ## sel1 = name of the first column (country names) to select
    ## sel2 = name of the column with the regional indicator
  df |> 
        dplyr::select(!!sel1, !!sel2) |> 
        dplyr::nest_by(!!sel2) |> 
        dplyr::mutate(data = as.vector(data)) |>
        dplyr::mutate(data = stringr::str_c(data, collapse = "; ")) |>
        dplyr::mutate(data = paste(data, ";")) |> 
        dplyr::mutate(N = lengths(gregexpr(";", data))) |> 
        dplyr::rename(Country = data) |> 
        dplyr::arrange(!!sel2) |> 
        DT::datatable(class = 'cell-border compact stripe', 
            options = list(
              pageLength = 25,
              lengthMenu = c(5, 10, 15, 20, 25, 50),
              columnDefs = list(
                  list(className = 'dt-body-left', targets = 2)
                  )
              )
        )
}

class_scheme2 <- function(df, sel1, sel2) {
    ## df = dataframe to show
    ## sel1 = name of the first column (country names) to select
    ## sel2 = name of the column with the regional indicator
  df |> 
        dplyr::select(!!sel1, !!sel2) |> 
        dplyr::nest_by(!!sel2) |> 
        dplyr::mutate(data = as.vector(data)) |>
        dplyr::mutate(data = stringr::str_c(data, collapse = "; ")) |>
        dplyr::mutate(data = paste(data, ";")) |> 
        dplyr::mutate(N = lengths(gregexpr(";", data))) |> 
        dplyr::rename(Country = data) |> 
        dplyr::arrange(!!sel2)
}

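The code above uses some non-obvious constructs. Programming with {dplyr} inside functions needs special consideration, because column names passed as arguments have to be quoted with rlang::quo() and unquoted with !!. I have learned the details from “Bang Bang – How to program with dplyr” (Berroth 2019).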
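For comparison, newer versions of {dplyr} and {rlang} also support the embrace operator `{{ }}`, which quotes and unquotes an argument in one step, so that callers can pass bare column names without rlang::quo(). What follows is only a minimal sketch of that style, not the function used in this chapter: the name class_scheme_embrace() and the toy data are made up for illustration, and the countries are counted with dplyr::n() instead of counting semicolons.

```r
## Hypothetical variant of class_scheme2() using the embrace operator {{ }}.
class_scheme_embrace <- function(df, sel1, sel2) {
  df |>
    dplyr::group_by({{ sel2 }}) |>
    dplyr::summarise(
      ## one string with all country names of the group
      Country = base::paste({{ sel1 }}, collapse = "; "),
      ## number of rows (countries) in the group
      N = dplyr::n(),
      .groups = "drop"
    )
}

## toy data for illustration only
toy <- tibble::tibble(
  country = c("Austria", "Belgium", "Chile"),
  region  = c("Europe", "Europe", "Americas")
)

toy_class <- class_scheme_embrace(toy, country, region)
toy_class
```

With this style the column names are passed unquoted (country, region), which may be easier to read when the function is called many times.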

2.1.3 WHR 2020 with class_scheme() function

Now that the class_scheme() function is in place, I can use it to display the different grouping schemes. First I will try it out with the WHR data from the 2021 report:

R Code 2.3 : Classification of the WHR data

Code
df_whr <-  base::readRDS(
    paste0(here::here(), "/data/whr/raw/whr_raw_2021.rds"))
(
    whr_class <- class_scheme(
            df = df_whr,
            sel1 = rlang::quo(`Country name`),
            sel2 = rlang::quo(`Regional indicator`)
            )
)

It worked! I got the same result as in Listing / Output 2.1.

2.2 Official classifications

There are already different classification systems in place: International organizations (e.g., World Bank, United Nations) have developed them with several grouping variants.

I will look into the two official classification schemes of the World Bank and the United Nations and apply the following procedure:

Procedure 2.1 : Understand structure and content of the official classification schemes

  1. Create a directory for storing the different country classification files (see Section 2.2.1).
  2. Download classification files and store them for faster access as R objects with rds format (see Section 2.2.2.1 and Section 2.2.2.2).
  3. Inspect the data classification files of World Bank (Section 2.2.3.1) and of the United Nations (Section 2.2.3.2) in detail.

2.2.1 Create data directories

R Code 2.4 : Create folders for country classification files

Code
my_create_folder(base::paste0(here::here(), "/data/"))
my_create_folder(base::paste0(here::here(), "/data/country-class"))
my_create_folder(base::paste0(here::here(), "/data/country-class/wb"))
my_create_folder(base::paste0(here::here(), "/data/country-class/unsd"))
my_create_folder(base::paste0(here::here(), "/data/country-class/wb/excel"))
my_create_folder(base::paste0(here::here(), "/data/country-class/wb/csv"))
my_create_folder(base::paste0(here::here(), "/data/country-class/wb/rds"))
my_create_folder(base::paste0(here::here(), "/data/country-class/unsd/excel"))
my_create_folder(base::paste0(here::here(), "/data/country-class/unsd/csv"))
my_create_folder(base::paste0(here::here(), "/data/country-class/unsd/rds"))
(No output is available for this R code chunk.)
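The helper my_create_folder() is not shown in this section. A minimal sketch of what such a helper might look like, assuming it only has to create a directory if it does not exist yet, could be the following. The name create_folder_sketch() is hypothetical, and the real helper may behave differently.

```r
## Hypothetical stand-in for my_create_folder(); the real helper may differ.
create_folder_sketch <- function(path) {
  ## create the folder only if it does not exist yet
  if (!base::dir.exists(path)) {
    ## recursive = TRUE also creates missing parent folders
    base::dir.create(path, recursive = TRUE)
  }
  base::invisible(path)
}

## try it out in a temporary directory
demo_dir <- base::file.path(base::tempdir(), "country-class", "wb", "rds")
create_folder_sketch(demo_dir)
```

Because of recursive = TRUE, a single call with the deepest path would be enough in this sketch; calling it once per level, as above, is harmless.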

2.2.2 Download classification files

2.2.2.1 World Bank

The World Bank Classification can be downloaded from How does the World Bank classify countries?. Near the bottom of the page you can see the line “Download an Excel file of historical classifications by income.”, providing a link with the word “Download”. The downloaded file CLASS.xlsx does not contain a historical classification by income but the general classification system of the last available year (2023).

There is another Excel file, OGHIST.xlsx, with the historical cutoffs for income and lending categories, dating from 1987 to 2023. But the download link for this file is located on another web page: World Bank Country and Lending Groups. On this page you will also find the updated cutoffs for GNI per capita, which are important for the lending eligibility of countries. World Bank country classifications by income level for 2024-2025 has the current values and the changes over the last year.

The file CLASS.xlsx that I am interested in here consists of three sheets:

    1. “List of economies”,
    2. “compositions” and
    3. “Notes”

I will download the original Excel file with all its sheets and programmatically save

  • the Excel file with all its sheets,
  • CSV snapshots of all sheets (file extension = .csv) and
  • R objects of all sheets (file extension = .rds)

The CSV snapshots support reproducibility because they store the content of the proprietary Excel file in a tool-agnostic, future-proof format. I am using code inspired by the vignette/article readxl Workflows.

R Code 2.5 : Download the World Bank CLASS Excel file

Run this code chunk manually if the file still needs to be downloaded.
Code
url_excel = "https://datacatalogfiles.worldbank.org/ddh-published/0037712/DR0090755/CLASS.xlsx"
path_wb_excel <- base::paste0(here::here(), 
            "/data/country-class/wb/excel/wb-class.xlsx")
path_wb_csv <- base::paste0(here::here(), 
            "/data/country-class/wb/csv/")
path_wb_rds <- base::paste0(here::here(), 
            "/data/country-class/wb/rds/")

## download wb-class file ##############
downloader::download(
    url = url_excel,
    destfile = path_wb_excel
)

## from readxl workflow article ##############
## includes also my_excel_as_csv_and_rds() function 
path_wb_excel |> 
  readxl::excel_sheets()  |> 
  rlang::set_names()  |>  
  purrr::map(my_excel_as_csv_and_rds, 
             path_excel = path_wb_excel, 
             path_csv = path_wb_csv,
             path_rds = path_wb_rds
             ) 
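The helper my_excel_as_csv_and_rds() is not shown in this section. The following is a hedged sketch of how such a helper could be written along the lines of the readxl workflow article. The function name excel_as_csv_and_rds_sketch() and the file naming pattern (file stem, hyphen, sheet name) are assumptions, and the example workbook shipped with {readxl} stands in for the World Bank file.

```r
## Hypothetical stand-in for my_excel_as_csv_and_rds(); the real helper may differ.
excel_as_csv_and_rds_sketch <- function(sheet, path_excel, path_csv, path_rds) {
  ## file stem, e.g. "wb-class" from "wb-class.xlsx"
  stem <- tools::file_path_sans_ext(base::basename(path_excel))
  ## read one sheet of the workbook
  df <- readxl::read_excel(path_excel, sheet = sheet)
  ## save the sheet as CSV snapshot and as R object
  readr::write_csv(
    df, base::file.path(path_csv, base::paste0(stem, "-", sheet, ".csv")))
  base::saveRDS(
    df, base::file.path(path_rds, base::paste0(stem, "-", sheet, ".rds")))
  base::invisible(df)
}

## try it out with an example workbook shipped with {readxl}
demo_xlsx <- readxl::readxl_example("datasets.xlsx")
demo_out  <- base::file.path(base::tempdir(), "excel-sketch")
base::dir.create(demo_out, showWarnings = FALSE)

demo_sheets <- demo_xlsx |>
  readxl::excel_sheets() |>
  rlang::set_names() |>
  purrr::map(excel_as_csv_and_rds_sketch,
             path_excel = demo_xlsx,
             path_csv = demo_out,
             path_rds = demo_out)
```

In this sketch every sheet ends up as one CSV and one RDS file, and the named list demo_sheets holds the data frames for immediate inspection.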

2.2.2.2 UNSD-M49

Another, more detailed classification system, developed expressly for statistical purposes, comes from the United Nations Statistics Division (UNSD) and uses the M49 methodology.

The result is called Standard country or area codes for statistical use (M49) and can be downloaded manually in different languages and formats (copy to the clipboard, Excel, or CSV) from the Overview page. The “Overview” page offers no URL that an R script could use, because the buttons copy or download the data with the help of JavaScript. So I had to either download the file manually or find another location from which I could download it programmatically.

With the OMNIKA DataStore United Nations M49 Region Codes I found an external source for the UNSD-M49 country classification. To be on the safe side I compared the manually downloaded file and the OMNIKA file with base::all.equal() to determine whether they are identical. Yes, they are!
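Such a comparison can be sketched as follows. The two toy data frames are made-up stand-ins for the manually downloaded file and the external file; base::all.equal() returns TRUE for (nearly) equal objects and otherwise a description of the differences, which is why the result is wrapped in base::isTRUE().

```r
## Toy stand-ins for the manually downloaded and the externally hosted file.
m49_manual <- base::data.frame(
  country = c("Algeria", "Egypt"),
  m49     = c("012", "818")
)
m49_external <- m49_manual

## TRUE only if all.equal() reports no differences
same_content <- base::isTRUE(base::all.equal(m49_manual, m49_external))

## a changed value is reported as a difference
m49_changed <- m49_manual
m49_changed$m49[2] <- "819"
diff_content <- base::isTRUE(base::all.equal(m49_manual, m49_changed))

same_content
diff_content
```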

The UNSD M49 standard area codes are stored as Excel and CSV files. For reproducibility reasons I download the CSV file.

R Code 2.6 : Download the UNSD-M49 CSV file and create an R object (“.rds”)

Run this code chunk manually if the file still needs to be downloaded.
Code
## download unsd-m49 file ############
url_unsd_csv <- "https://github.com/omnika-datastore/unsd-m49-standard-area-codes/raw/refs/heads/main/2022-09-24__CSV_UNSD_M49.csv"
path_unsd_csv <- base::paste0(here::here(), 
     "/data/country-class/unsd/csv/2022-09-24__CSV_UNSD_M49.csv")

downloader::download(
    url = url_unsd_csv,
    destfile = path_unsd_csv
)


## create R object ###############
unsd_class <- 
  readr::read_delim(
    file = path_unsd_csv,
    delim = ";"
  )


## save as .rds file ################
my_save_data_file(
  "country-class/unsd/rds", 
  unsd_class, 
  "unsd_class.rds")
(No output is available for this R code chunk.)

2.2.3 Inspect classification files

To get a detailed understanding of the data structures I will provide the following outputs:

  1. Summary statistics with skimr::skim(), followed by an inspection of the first values with dplyr::glimpse().
  2. Several detailed outputs of the classification categories (regions) and their elements (countries) in different code chunks (tabs).

2.2.3.1 World Bank

Code Collection 2.1 : Inspect the structure of the World Bank classification

R Code 2.7 : Inspect sheet List of Economies of the World Bank classification file

Code
wb_class_economies <- base::readRDS(
  "data/country-class/wb/rds/wb-class-List of economies.rds")
glue::glue("******************* Using skimr::skim() ***************************")
skimr::skim(wb_class_economies)
glue::glue("")
glue::glue("****************** Using dplyr::glimpse() *************************")
dplyr::glimpse(wb_class_economies)
#> ******************* Using skimr::skim() ***************************
Data summary
Name wb_class_economies
Number of rows 267
Number of columns 5
_______________________
Column type frequency:
character 5
________________________
Group variables None

Variable type: character

skim_variable     n_missing  complete_rate  min  max  empty  n_unique  whitespace
Economy                   1           1.00    4   50      0       266           0
Code                      1           1.00    3    3      0       266           0
Region                   49           0.82   10   26      0         7           0
Income group             50           0.81   10   19      0         4           0
Lending category        122           0.54    3    5      0         3           0
#> 
#> ****************** Using dplyr::glimpse() *************************
#> Rows: 267
#> Columns: 5
#> $ Economy            <chr> "Afghanistan", "Albania", "Algeria", "American Samo…
#> $ Code               <chr> "AFG", "ALB", "DZA", "ASM", "AND", "AGO", "ATG", "A…
#> $ Region             <chr> "South Asia", "Europe & Central Asia", "Middle East…
#> $ `Income group`     <chr> "Low income", "Upper middle income", "Upper middle …
#> $ `Lending category` <chr> "IDA", "IBRD", "IBRD", NA, NA, "IBRD", "IBRD", "IBR…

R Code 2.8 : Inspect sheet compositions of the World Bank classification file

Code
wb_class_compositions <- base::readRDS(
  "data/country-class/wb/rds/wb-class-compositions.rds")
glue::glue("******************* Using skimr::skim() ***************************")
skimr::skim(wb_class_compositions)
glue::glue("")
glue::glue("****************** Using dplyr::glimpse() *************************")
dplyr::glimpse(wb_class_compositions)
#> ******************* Using skimr::skim() ***************************
Data summary
Name wb_class_compositions
Number of rows 2085
Number of columns 4
_______________________
Column type frequency:
character 4
________________________
Group variables None

Variable type: character

skim_variable    n_missing  complete_rate  min  max  empty  n_unique  whitespace
WB_Group_Code            0              1    3    3      0        48           0
WB_Group_Name            0              1    5   50      0        48           0
WB_Country_Code          0              1    3    3      0       218           0
WB_Country_Name          0              1    4   30      0       218           0
#> 
#> ****************** Using dplyr::glimpse() *************************
#> Rows: 2,085
#> Columns: 4
#> $ WB_Group_Code   <chr> "AFE", "AFE", "AFE", "AFE", "AFE", "AFE", "AFE", "AFE"…
#> $ WB_Group_Name   <chr> "Africa Eastern and Southern", "Africa Eastern and Sou…
#> $ WB_Country_Code <chr> "AGO", "BWA", "BDI", "COM", "COD", "ERI", "SWZ", "ETH"…
#> $ WB_Country_Name <chr> "Angola", "Botswana", "Burundi", "Comoros", "Congo, De…

R Code 2.9 : Pre-defined standard categorization

Code
df_wb_standard <-  base::readRDS(
  "data/country-class/wb/rds/wb-class-List of economies.rds") |> 
    dplyr::slice(1:218)



(
    wb_class_standard <- class_scheme(
            df = df_wb_standard,
            sel1 = rlang::quo(`Economy`),
            sel2 = rlang::quo(`Region`)
            )
)

Region is a coarse classification scheme with only 7 regions formed by 218 countries.

R Code 2.10 : All provided groups, regional, economical and political

Code
df_wb_all <-  base::readRDS(
  "data/country-class/wb/rds/wb-class-compositions.rds")


(
    wb_class_all <- class_scheme(
            df = df_wb_all,
            sel1 = rlang::quo(`WB_Country_Name`),
            sel2 = rlang::quo(`WB_Group_Name`)
            )
)

WB_Group_Name in the “compositions” file contains all available groups. They are not restricted to regional groups because they are formed by economic and political criteria as well. There is no 1:1 match, because almost all countries belong to two or more groups. There are 48 groups with a total of 2085 elements.

R Code 2.11 : Groups formed by regional criteria (without the redundant World region)

Code
str_reg <- c("AFE", "AFW", "ARB", "CSS", "CEB",
             "EAS", "ECS", "LCN", "MEA", "NAC",
             "OSS", "PSS", "SST", "SAS", "SSF")

df_wb_reg <-  base::readRDS(
  "data/country-class/wb/rds/wb-class-compositions.rds") |> 
  dplyr::filter(WB_Group_Code %in% str_reg)

(
    wb_class_reg <- class_scheme(
            df = df_wb_reg,
            sel1 = rlang::quo(`WB_Country_Name`),
            sel2 = rlang::quo(`WB_Group_Name`)
            )
)

Browsing through the composition data I have defined 15 WB_Group_Code values as regional codes. By definition, this regional classification results in 15 regions, which together contain 379 country entries; countries that belong to several of these groups are counted more than once.

R Code 2.12 : Groups formed by regional criteria (without the World region and without the small states groups)

Code
str_reg2 <- c("AFE", "AFW", "ARB", "CEB",
             "EAS", "ECS", "LCN", "MEA", "NAC",
             "SAS", "SSF")

df_wb_reg2 <-  base::readRDS(
  "data/country-class/wb/rds/wb-class-compositions.rds") |> 
  dplyr::filter(WB_Group_Code %in% str_reg2)

(
    wb_class_reg2 <- class_scheme(
            df = df_wb_reg2,
            sel1 = rlang::quo(`WB_Country_Name`),
            sel2 = rlang::quo(`WB_Group_Name`)
            )
)

For an alternative regional grouping I have excluded all small states groups from these regional codes. This classification results in 11 regions with 299 country entries.
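The totals of 379 and 299 are larger than the 218 economies of the standard list, because a country is listed once for every group it belongs to. How entries and distinct countries can be told apart is sketched below with a few made-up rows in the layout of the compositions data; the object names are hypothetical.

```r
## Toy rows in the layout of the compositions data (illustration only).
toy_comp <- tibble::tibble(
  WB_Group_Code   = c("ECS", "ECS", "CEB", "MEA", "ARB"),
  WB_Country_Code = c("AUT", "POL", "POL", "EGY", "EGY")
)

## number of membership entries (rows)
n_entries <- base::nrow(toy_comp)

## number of distinct countries
n_countries <- dplyr::n_distinct(toy_comp$WB_Country_Code)

## countries that appear in more than one group
multi_group <- toy_comp |>
  dplyr::count(WB_Country_Code, name = "n_groups") |>
  dplyr::filter(n_groups > 1)

multi_group
```

Applied to df_wb_reg or df_wb_reg2, the same two counts would show how many of the entries are repeated memberships.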

2.2.3.1.1 Description of the World Bank tabs
  1. WB economies displays the “List of Economies” and has five columns:
    • Economy with the country names (Excel rows 2-219) and regional names (Excel rows 221-268)
    • Code with the ISO alpha-3 codes for countries (Excel rows 2-219) and for the regional names (Excel rows 221-268)
    • Region with seven different regional names:
      • East Asia & Pacific,
      • Europe & Central Asia,
      • Latin America & Caribbean,
      • Middle East & North Africa,
      • North America,
      • South Asia and
      • Sub-Saharan Africa
    • Income group with four groups: Low income, Lower middle income, Upper middle income, and High income.
    • Lending category with three groups: IBRD, Blend, and IDA.
  2. WB compositions has four columns: WB_Group_Code, WB_Group_Name, WB_Country_Code, WB_Country_Name. The 2085 rows are combinations of the regional and income groups with the ISO alpha-3 codes and names of their countries. (This is a more complex arrangement that I will analyse in more detail later.)
  3. WB Standard shows the World Bank's seven standard regional groups with their countries. The 218 countries involved in the taxonomy of the World Bank consist of all member countries of the World Bank (189) and other economies with populations of more than 30,000 (29).
  4. WB All includes the seven regions from the “WB Standard” tab and much more. It is important to note, however, that there is no alternative regional structure that systematically comprises all countries of the world (apart, obviously, from the overall category “World”).
    • Five of the seven regional groups of “WB Standard” are also available as groups that exclude high income countries.
    • There are six other regional subcategories: “Arab World”, “Caribbean small states”, “Central Europe and Baltics”, “Other small states”, “Pacific island small states”, “Small states”.
    • Additionally, there are some political groups like the European Union and the OECD,
    • several economic classifications like “Euro area”, and
    • different combinations of the four income groups and of the three lending statuses.
  5. Region 1 selects 15 WB_Group_Code values from the composition data as regional codes, which results in 15 regions containing 379 country entries.
  6. Region 2 uses the same WB_Group_Code values but excludes all small states groups to form an alternative regional grouping. This results in 11 regions containing 299 country entries.

More details

The cutoff limits for the income groups are (see World Bank Country and Lending Groups):

  • low income, $1,145 or less;
  • lower middle income, $1,146 to $4,515;
  • upper middle income, $4,516 to $14,005; and
  • high income, more than $14,005.
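The cutoffs listed above can be turned into a small classification helper. This is only a sketch: the function name income_group() and the example values are made up, and the breaks are copied from the list above.

```r
## Classify GNI per capita (US dollars) with the cutoffs listed above.
income_group <- function(gni_per_capita) {
  base::cut(
    gni_per_capita,
    breaks = c(-Inf, 1145, 4515, 14005, Inf),
    labels = c("Low income", "Lower middle income",
               "Upper middle income", "High income"),
    right = TRUE   # intervals are closed on the right, e.g. "$1,145 or less"
  )
}

## made-up values for illustration only
groups <- income_group(c(1000, 1146, 14005, 14006))
base::as.character(groups)
```

With right = TRUE a value that lies exactly on a cutoff, such as 14005, falls into the lower group, which matches the wording “$4,516 to $14,005”.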

The effective operational cutoff for IDA eligibility is $1,335 or less. The Notes sheet describes the three lending categories and their relation to each other as follows:

IDA countries are those that lack the financial ability to borrow from IBRD. IDA credits are deeply concessional—interest-free loans and grants for programs aimed at boosting economic growth and improving living conditions. IBRD loans are non-concessional. Blend countries are eligible for IDA credits because of their low per capita incomes but are also eligible for IBRD because they are financially creditworthy.

Three additional remarks relating to the Notes sheet:

  1. In the Notes I found the sentence: “Geographic classifications in this table cover all income levels.” But the Income group column has one more missing value than the Region column (50 versus 49). The reason is that Venezuela, RB lacks an income group: It was classified as an upper middle income country until FY21 and has been temporarily unclassified since July 2021, pending the release of revised national accounts statistics. In the meantime it is again classified as Upper middle income (see the World Bank page about Venezuela, RB).

  2. The term country, used interchangeably with economy, does not imply political independence but refers to any territory for which authorities report separate social or economic statistics.

  3. What follows is a quote about some details of the income classifications for the 2023 file:

Set on 1 July 2022 remain in effect until 1 July 2023. Venezuela has been temporarily unclassified since July 2021 pending release of revised national accounts statistics. Argentina, which was temporarily unclassified in July 2016 pending release of revised national accounts statistics, was classified as upper middle income for FY17 as of 29 September 2016 based on alternative conversion factors. Also effective 29 September 2016, Syrian Arab Republic is reclassified from IBRD lending category to IDA-only. On 29 March 2017, new country codes were introduced to align World Bank 3-letter codes with ISO 3-letter codes: Andorra (AND), Dem. Rep. Congo (COD), Isle of Man (IMN), Kosovo (XKX), Romania (ROU), Timor-Leste (TLS), and West Bank and Gaza (PSE). It is to be noted that Venezuela, RB classified as an upper-middle income country until FY21, has been unclassified since then due to the unavailability of data.

2.2.3.1.2 Summary

The only missing data in the columns Economy and Code corresponds to the empty line #220 that separates the country codes from the regional codes. The missing data in the other columns stem from the different structure of the second part (starting with row #221) of the data, which consists only of the two columns ‘Economy’ and ‘Code’.

Essentially this means that the wb-class.xlsx file contains two different data sets: one for economies and another one that explains the regional, economic and political grouping codes. In the Excel sheet compositions you will find an extended list of all available group names and their three-letter codes combined with the country names and their three-letter codes. These group names comprise different kinds of regional groups but also names and codes for different combinations of country incomes and lending categories.

All these groups may be of interest for analysing different trends.

The World Bank file wb-class.xlsx classifies all World Bank member countries (189), and all other economies with populations of more than 30,000 (29) in a coarse grid of only seven regions. For operational and analytical purposes, these economies are divided among income groups according to their gross national income (GNI) per capita in 2023, calculated using the World Bank Atlas method.

2.2.3.2 United Nations

Code Collection 2.2 : Inspect UNSD-M49 geoscheme classification

R Code 2.13 : Inspect UNSD M49 geoscheme classification

Code
unsd_class <- base::readRDS(
  "data/country-class/unsd/rds/unsd_class.rds")
glue::glue("******************* Using skimr::skim() ***************************")
skimr::skim(unsd_class)
glue::glue("")
glue::glue("****************** Using dplyr::glimpse() *************************")
dplyr::glimpse(unsd_class)
#> ******************* Using skimr::skim() ***************************
Data summary
Name unsd_class
Number of rows 249
Number of columns 15
_______________________
Column type frequency:
character 15
________________________
Group variables None

Variable type: character

skim_variable                            n_missing  complete_rate  min  max  empty  n_unique  whitespace
Global Code                                      0           1.00    3    3      0         1           0
Global Name                                      0           1.00    5    5      0         1           0
Region Code                                      1           1.00    3    3      0         5           0
Region Name                                      1           1.00    4    8      0         5           0
Sub-region Code                                  1           1.00    3    3      0        17           0
Sub-region Name                                  1           1.00    9   31      0        17           0
Intermediate Region Code                       141           0.43    3    3      0         8           0
Intermediate Region Name                       141           0.43    9   15      0         8           0
Country or Area                                  0           1.00    4   52      0       249           0
M49 Code                                         0           1.00    3    3      0       249           0
ISO-alpha2 Code                                  2           0.99    2    2      0       247           0
ISO-alpha3 Code                                  1           1.00    3    3      0       248           0
Least Developed Countries (LDC)                203           0.18    1    1      0         1           0
Land Locked Developing Countries (LLDC)        217           0.13    1    1      0         1           0
Small Island Developing States (SIDS)          196           0.21    1    1      0         1           0
#> 
#> ****************** Using dplyr::glimpse() *************************
#> Rows: 249
#> Columns: 15
#> $ `Global Code`                             <chr> "001", "001", "001", "001", …
#> $ `Global Name`                             <chr> "World", "World", "World", "…
#> $ `Region Code`                             <chr> "002", "002", "002", "002", …
#> $ `Region Name`                             <chr> "Africa", "Africa", "Africa"…
#> $ `Sub-region Code`                         <chr> "015", "015", "015", "015", …
#> $ `Sub-region Name`                         <chr> "Northern Africa", "Northern…
#> $ `Intermediate Region Code`                <chr> NA, NA, NA, NA, NA, NA, NA, …
#> $ `Intermediate Region Name`                <chr> NA, NA, NA, NA, NA, NA, NA, …
#> $ `Country or Area`                         <chr> "Algeria", "Egypt", "Libya",…
#> $ `M49 Code`                                <chr> "012", "818", "434", "504", …
#> $ `ISO-alpha2 Code`                         <chr> "DZ", "EG", "LY", "MA", "SD"…
#> $ `ISO-alpha3 Code`                         <chr> "DZA", "EGY", "LBY", "MAR", …
#> $ `Least Developed Countries (LDC)`         <chr> NA, NA, NA, NA, "x", NA, NA,…
#> $ `Land Locked Developing Countries (LLDC)` <chr> NA, NA, NA, NA, NA, NA, NA, …
#> $ `Small Island Developing States (SIDS)`   <chr> NA, NA, NA, NA, NA, NA, NA, …

R Code 2.14 : Clean UNSD M49 geoscheme classification

Listing / Output 2.3: Script for data cleaning of the unsd_class.rds file as explained in Procedure 2.2
Code
## column renaming vector ########
m49_cols = c(
  region_c = "Region Code", region_n = "Region Name",
  subr_c = "Sub-region Code", subr_n = "Sub-region Name", 
  midr_c = "Intermediate Region Code", midr_n = "Intermediate Region Name",
  country = "Country or Area", m49 = "M49 Code", 
  iso2 = "ISO-alpha2 Code", iso3 = "ISO-alpha3 Code",
  ldc = "Least Developed Countries (LDC)", 
  lldc = "Land Locked Developing Countries (LLDC)", 
  sids = "Small Island Developing States (SIDS)"
  )
  
## clean data ###############################
unsd_class <- base::readRDS(
  "data/country-class/unsd/rds/unsd_class.rds")
unsd_class_clean <- unsd_class |> 
  dplyr::select(-(1:2)) |> 
  dplyr::rename(tidyselect::all_of(m49_cols)) |> 
  dplyr::filter(country != "Antarctica") |> 
  dplyr::mutate(iso2 = base::ifelse(country == "Namibia", "NA", iso2)) |> 
  dplyr::relocate(country, .before = region_c) |> 
  # recode ldc, lldc and sids: "x" becomes "1", missing (NA) becomes "0"
  # .x = current column inside the lambda; "999" would flag any unexpected value
  dplyr::mutate(dplyr::across(
    ldc:sids, ~ dplyr::if_else(.x == "x", "1", "999", "0") 
    )) |> 
  dplyr::arrange(country)

## save new tibble ##########
my_save_data_file(
  "country-class/unsd/rds",
  unsd_class_clean,
  "unsd_class_clean.rds"
)


## prepare skimmers ##########
my_skim <- skimr::skim_with(
  character = skimr::sfl(
    whitespace = NULL,
    min = NULL,
    max = NULL,
    empty = NULL
    )
)


## display results ##########
unsd_class <- base::readRDS(
  "data/country-class/unsd/rds/unsd_class.rds")
glue::glue("******************* Using skimr::skim() ***************************")
my_skim(unsd_class_clean) |> dplyr::select(-complete_rate)
glue::glue("")
glue::glue("****************** Using dplyr::glimpse() *************************")
dplyr::glimpse(unsd_class_clean)
#> ******************* Using skimr::skim() ***************************
Data summary
Name unsd_class_clean
Number of rows 248
Number of columns 13
_______________________
Column type frequency:
character 13
________________________
Group variables None

Variable type: character

skim_variable  n_missing  n_unique
country                0       248
region_c               0         5
region_n               0         5
subr_c                 0        17
subr_n                 0        17
midr_c               140         8
midr_n               140         8
m49                    0       248
iso2                   1       247
iso3                   1       247
ldc                    0         2
lldc                   0         2
sids                   0         2
#> 
#> ****************** Using dplyr::glimpse() *************************
#> Rows: 248
#> Columns: 13
#> $ country  <chr> "Afghanistan", "Albania", "Algeria", "American Samoa", "Andor…
#> $ region_c <chr> "142", "150", "002", "009", "150", "002", "019", "019", "019"…
#> $ region_n <chr> "Asia", "Europe", "Africa", "Oceania", "Europe", "Africa", "A…
#> $ subr_c   <chr> "034", "039", "015", "061", "039", "202", "419", "419", "419"…
#> $ subr_n   <chr> "Southern Asia", "Southern Europe", "Northern Africa", "Polyn…
#> $ midr_c   <chr> NA, NA, NA, NA, NA, "017", "029", "029", "005", NA, "029", NA…
#> $ midr_n   <chr> NA, NA, NA, NA, NA, "Middle Africa", "Caribbean", "Caribbean"…
#> $ m49      <chr> "004", "008", "012", "016", "020", "024", "660", "028", "032"…
#> $ iso2     <chr> "AF", "AL", "DZ", "AS", "AD", "AO", "AI", "AG", "AR", "AM", "…
#> $ iso3     <chr> "AFG", "ALB", "DZA", "ASM", "AND", "AGO", "AIA", "ATG", "ARG"…
#> $ ldc      <chr> "1", "0", "0", "0", "0", "1", "0", "0", "0", "0", "0", "0", "…
#> $ lldc     <chr> "1", "0", "0", "0", "0", "0", "0", "0", "0", "1", "0", "0", "…
#> $ sids     <chr> "0", "0", "0", "1", "0", "0", "1", "1", "0", "0", "1", "0", "…

R Code 2.15 : Display regions of UNSD class scheme

Code
df_unsd <-  base::readRDS(
  "data/country-class/unsd/rds/unsd_class_clean.rds")

(
    unsd_class1 <- class_scheme(
            df = df_unsd,
            sel1 = rlang::quo(`country`),
            sel2 = rlang::quo(`region_n`)
            )
)

region_n is a classification scheme for continents with 248 countries in 5 regions.

R Code 2.16 : Display sub-regions of UNSD class scheme

Code
df_unsd <-  base::readRDS(
  "data/country-class/unsd/rds/unsd_class_clean.rds")

(
    unsd_class2 <- class_scheme(
            df = df_unsd,
            sel1 = rlang::quo(`country`),
            sel2 = rlang::quo(`subr_n`)
            )
)

subr_n is a classification scheme with 248 countries in 17 regions.

R Code 2.17 : Display intermediate regions of UNSD class scheme

Code
df_unsd <-  base::readRDS(
  "data/country-class/unsd/rds/unsd_class_clean.rds")

(
    dt_unsd_class3 <- class_scheme(
            df = df_unsd,
            sel1 = rlang::quo(`country`),
            sel2 = rlang::quo(`midr_n`)
            )
)

midr_n is an inconsistent classification scheme with 248 countries in 9 regions. It focuses on South and Central America and on Africa. All the other countries fall together into one large mixed group of 140 countries that have no intermediate region.

R Code 2.18 : Display alternative intermediate regions of UNSD class scheme

Code
unsd_class_clean <-  base::readRDS(
  "data/country-class/unsd/rds/unsd_class_clean.rds")

# create intermediate2 (more consistent than intermediate)
unsd_class4 <- unsd_class_clean |> 
  dplyr::mutate(midr_n2 = 
         base::ifelse(is.na(midr_n), subr_n, midr_n)
         ) |> 
  dplyr::mutate(midr_c2 = 
         base::ifelse(is.na(midr_c), subr_c, midr_c)
         )


## save intermediate2 data as .rds file ################
my_save_data_file(
  "country-class/unsd/rds", 
  unsd_class4, 
  "unsd_class_finale.rds")


(
    dt_unsd_class4 <- class_scheme(
            df = unsd_class4,
            sel1 = rlang::quo(`country`),
            sel2 = rlang::quo(`midr_n2`)
            )
) 

midr_n2 is a classification scheme with 248 countries in 23 regions. This grouping divides each continent into different regions and is more plausible than the midr_n classification in tab “Intermediate”.
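The fallback logic behind midr_n2 can be illustrated on a toy table. This is a minimal sketch with made-up rows (the tibble `toy` and its contents are hypothetical, not taken from the UNSD file). It also shows that `dplyr::coalesce()` gives the same result as the `ifelse(is.na(...))` construction used above.

```r
# Toy data: two rows with an intermediate region, two without (hypothetical)
toy <- tibble::tibble(
  country = c("A", "B", "C", "D"),
  subr_n  = c("Sub-Saharan Africa", "Latin America and the Caribbean",
              "Western Europe", "Eastern Asia"),
  midr_n  = c("Eastern Africa", "South America", NA, NA)
)

toy2 <- toy |>
  dplyr::mutate(
    # variant used in this chapter: fall back to the sub-region if midr_n is NA
    midr_n2 = base::ifelse(is.na(midr_n), subr_n, midr_n),
    # equivalent variant: take the first non-missing value
    midr_n2_alt = dplyr::coalesce(midr_n, subr_n)
  )

toy2
```

Both columns are identical; which variant to use is mainly a matter of taste.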

2.2.3.2.1 Descriptions of the UNSD-M49 geoscheme classification

What follows is a description of the tabs in Code Collection 2.2.

Tab “raw”: The raw file unsd_class has 15 columns, as you can also see online on the Overview page. The many missing values (NAs) for the categories LDC, LLDC and SIDS are easily explained: These three columns are coded with an ‘x’ if the country of this row belongs to this category.

One of the missing values for ISO-alpha2 codes belongs to Namibia because its abbreviation NA is interpreted by R as a missing value!
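The Namibia problem is easy to reproduce. The following sketch uses base R and assumes a CSV-like input (the two-row text is made up for illustration; how the original file was imported is not shown here). By default `read.csv()` treats the string NA as missing, while restricting `na.strings` to the empty string keeps the code intact.

```r
# Made-up two-row input, only for illustration
csv_text <- "country,iso2\nNamibia,NA\nAustria,AT\n"

# default: na.strings = "NA", so Namibia's code becomes a missing value
default_read <- utils::read.csv(text = csv_text)

# only empty cells count as missing, so "NA" survives as a string
fixed_read <- utils::read.csv(text = csv_text, na.strings = "")

default_read$iso2
fixed_read$iso2
```

Repairing the value afterwards, as done in the “clean” tab, leads to the same result.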

The other missing values for ISO-alpha2 and ISO-alpha3 relate to Sark, which is “recognized by the United Nations Statistics Division (UNSD) as a separate territory” but has not been accepted by ISO for more than 20 years (McCarthy 2020). A recent new application (see PDF) is meant to change that, but currently Sark is still waiting for ISO 3166 codes.

Tab “clean”: Recoding the columns “LDC”, “LLDC” and “SIDS” with 1 and 0 (1 = yes, belongs to this category, 0 = no, does not belong to this category) removes most of the missing values. I have also recoded “Namibia” to repair its “NA” value.

Tab “Region”, “Sub-Region” and “Intermediate Region”: One missing value in these regional categories relates to Antarctica, which the M49 scheme does not treat as a separate region. It therefore has no regional codes and names apart from the all-encompassing global region. But it does have M49 as well as ISO-alpha codes.

Procedure 2.2 : Cleaning the UNSD M49 data file

To clean the data I have taken the following recoding actions in the script for the “clean” tab:

  • Remove the global codes and names because they are redundant: All rows have global code “001” (“World”).
  • Rename the columns to get shorter names.
  • Remove Antarctica because it is not seen as a separate country.
  • Replace NA in the column “ISO-alpha2 Code” of Namibia with the string “NA”.
  • Recode the columns LDC, LLDC and SIDS with 0 and 1.
  • Relocate the column “country” (previously “Country or Area”) to the first position because then it is easier to find relevant content.
  • Sort the data alphabetically by “country”.
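The core of these cleaning steps can be sketched on a few toy rows. This is not the original cleaning script. The tibble `raw` is a hypothetical excerpt that only mimics the column layout described above, and the short column names follow the cleaned file.

```r
# Hypothetical raw rows mimicking the structure of the UNSD file
raw <- tibble::tibble(
  `Country or Area` = c("Namibia", "Austria", "Afghanistan"),
  `ISO-alpha2 Code` = c(NA, "AT", "AF"),
  LDC  = c(NA, NA, "x"),
  LLDC = c(NA, NA, "x"),
  SIDS = c(NA_character_, NA, NA)
)

clean <- raw |>
  # shorter column names
  dplyr::rename(country = `Country or Area`, iso2 = `ISO-alpha2 Code`,
                ldc = LDC, lldc = LLDC, sids = SIDS) |>
  dplyr::mutate(
    # repair the code that was misread as a missing value
    iso2 = dplyr::if_else(country == "Namibia" & is.na(iso2), "NA", iso2),
    # "x" means membership ("1"); everything else, including NA, becomes "0"
    dplyr::across(c(ldc, lldc, sids),
                  \(x) dplyr::if_else(!is.na(x) & x == "x", "1", "0"))
  ) |>
  dplyr::relocate(country) |>   # country as first column
  dplyr::arrange(country)       # sort alphabetically

clean
```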
2.2.3.2.2 Summary

The UN geoscheme classification covers 248 countries and divides the world into

  • 5 continents (Africa, Americas, Asia, Europe and Oceania): Tab Region
  • 17 sub-regions: Tab Sub-region
  • 9 intermediate regions focusing on South/Central America and Africa, otherwise not consistent: Tab Intermediate
  • 23 areas that divide the continents into different parts: Tab Intermediate2

2.3 Combining M49 and WHR

To use the different regions with the WHR data, I will combine the M49 classification scheme with the WHR data. Ideally I would join the two datasets on their unique ISO3 code, but unfortunately the WHR data lack this code. The next best approach is to link the two datasets via the country names. Here I have to take into account that the authoritative country names are those provided by M49, not those of the WHR data.

2.3.1 Combine with country names

Code Collection 2.3 : Combine WHR data with M49 geoscheme

R Code 2.19 : Combine WHR with UNSD M49 data

Code
whr_2011_2024 <-  base::readRDS(
  "data/whr-cantril/rds/whr_2011_2024_arrange.rds")

unsd_class_finale <-  base::readRDS(
  "data/country-class/unsd/rds/unsd_class_finale.rds")

## left join: WHR as priority ########
whr_m49_first_try <- dplyr::left_join(
  x = whr_2011_2024,
  y = unsd_class_finale,
  by = dplyr::join_by(`Country name` == country),
  relationship = "many-to-one") |> 
  dplyr::relocate(iso3, .after = `Country name`)

## select columns and show distinct countries
 whr_m49_first_try |> 
    dplyr::select(
      `Country name`, iso2, iso3, m49
    ) |> 
    dplyr::distinct() |> 
    DT::datatable(class = 'cell-border compact stripe', 
                  options = list(
                    pageLength = 25,
                    lengthMenu = c(25, 50, 100, 200)
                    )
              )

R Code 2.20 : Show WHR countries with missing ISO3 codes

Code
whr_m49_first_try |> 
  dplyr::select(
      `Country name`, iso2, iso3, m49
    ) |> 
    dplyr::distinct() |> 
  dplyr::filter(base::is.na(iso3)) |> 
  DT::datatable(class = 'cell-border compact stripe', 
                  options = list(
                    pageLength = 25,
                    lengthMenu = c(5, 10, 15, 25)
                    )
              )

2.3.2 Differences WHR and M49

Joining the WHR data with the M49 classification leaves 16 WHR country names without a match. There are two reasons:

  1. Many names differ between the two datasets even though both refer to the same country.
  2. Some country names of the WHR data do not correspond to the official country classification (M49).

The following table details the differences between the country names in the WHR and UNSD M49 datasets.

Table 2.1: Differences between WHR and UNSD M49

| Country name WHR | Country name M49 | Comment |
|---|---|---|
| Bolivia | Bolivia (Plurinational State of) | |
| DR Congo | Democratic Republic of the Congo | |
| Hong Kong SAR of China | China, Hong Kong Special Administrative Region | |
| Iran | Iran (Islamic Republic of) | |
| Kosovo | | Not every country recognizes Kosovo (prov. code XKX) |
| Lao PDR | Lao People’s Democratic Republic | |
| Macedonia | North Macedonia | Renamed 2019, only wrong for the year 2019 |
| North Cyprus | | Recognized only by Türkiye |
| Somaliland Region | | Not recognized, self-declared independence from Somalia |
| Swaziland | Eswatini | Renamed 2018 |
| Syria | Syrian Arab Republic | |
| Taiwan Province of China | | China, as one of the world’s largest and most influential countries, considers Taiwan as part of its territory |
| Tanzania | United Republic of Tanzania | |
| United Kingdom | United Kingdom of Great Britain and Northern Ireland | |
| United States | United States of America | |
| Venezuela | Venezuela (Bolivarian Republic of) | |
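A list of unmatched names like the one in Table 2.1 can also be produced directly with an anti join. The following is a minimal sketch on toy data. `whr_names` and `m49_names` are hypothetical stand-ins for the two real data frames.

```r
# Toy stand-ins for the WHR and M49 country columns
whr_names <- tibble::tibble(
  `Country name` = c("Austria", "Bolivia", "Kosovo", "Austria")
)
m49_names <- tibble::tibble(
  country = c("Austria", "Bolivia (Plurinational State of)")
)

# keep only WHR names that have no partner in the M49 names
unmatched <- whr_names |>
  dplyr::distinct() |>
  dplyr::anti_join(m49_names,
                   by = dplyr::join_by(`Country name` == country))

unmatched
```

An anti join only reports that a name has no match. Deciding whether the cause is a different spelling or a missing M49 entry still requires a manual check.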

2.3.3 WHR with revised country name

I change the WHR country designations to their official M49 country names. The exceptions are Kosovo, North Cyprus, Somaliland Region and Taiwan Province of China, which have no entry in the M49 data. These four names are set in square brackets “[]” and remain without ISO and M49 codes.

R Code 2.21 : WHR data with official country names

Code
whr_2011_2024_arrange <-  base::readRDS(
  "data/whr-cantril/rds/whr_2011_2024_arrange.rds")

## revise WHR data with m49 names ################
whr_2011_2024_revised <-  whr_2011_2024_arrange |> 
  dplyr::rename(Country_WHR = `Country name`) |> 
  dplyr::mutate(Country_M49 =
    dplyr::case_when(
      Country_WHR == "Bolivia" ~ "Bolivia (Plurinational State of)",
      Country_WHR == "DR Congo" ~ "Democratic Republic of the Congo",
      Country_WHR == "Hong Kong SAR of China" ~ "China, Hong Kong Special Administrative Region",
      Country_WHR == "Iran" ~ "Iran (Islamic Republic of)",
      Country_WHR == "Kosovo" ~ "[Kosovo]",
      Country_WHR == "Lao PDR" ~ "Lao People's Democratic Republic",
      Country_WHR == "Macedonia" ~ "North Macedonia",
      Country_WHR == "North Cyprus" ~ "[North Cyprus]",
      Country_WHR == "Somaliland Region" ~ "[Somaliland Region]",
      Country_WHR == "Swaziland" ~ "Eswatini",
      Country_WHR == "Syria" ~ "Syrian Arab Republic",
      Country_WHR == "Taiwan Province of China" ~ "[Taiwan Province of China]",
      Country_WHR == "Tanzania" ~ "United Republic of Tanzania",
      Country_WHR == "United Kingdom" ~ "United Kingdom of Great Britain and Northern Ireland",
      Country_WHR == "United States" ~ "United States of America",
      Country_WHR == "Venezuela" ~ "Venezuela (Bolivarian Republic of)",
      TRUE ~ Country_WHR
    )
  ) |> 
  dplyr::relocate(Country_M49, .after = Country_WHR) |> 

  ## special case: change Macedonia in Country_WHR to "North Macedonia"
  dplyr::mutate(Country_WHR =
        stringr::str_replace(Country_WHR, "^Macedonia$", "North Macedonia")) |> 
  dplyr::arrange(Country_WHR, Year)

## select unique country rows  ################
whr_2011_2024_revised |> 
  dplyr::select(Country_WHR, Country_M49) |> 
  dplyr::distinct() |> 
  DT::datatable(class = 'cell-border compact stripe', 
                  options = list(
                    pageLength = 25,
                    lengthMenu = c(25, 50, 100, 200)
                    )
              )

## save revised data as .rds file ################
my_save_data_file(
  "whr-cantril/rds", 
  whr_2011_2024_revised, 
  "whr_2011_2024_revised.rds")

I have saved the WHR data file as whr_2011_2024_revised.rds under “data/whr-cantril/rds” with the new column Country_M49 and have renamed the old WHR column Country name to Country_WHR.
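An alternative to the long `case_when()` call is a small lookup table that is joined to the data. This sketch is only an illustration of that design choice, not the method used in this chapter. `name_map` contains a three-row excerpt of Table 2.1 and `whr_toy` is a hypothetical miniature of the WHR data.

```r
# Lookup table with only the names that differ (excerpt of Table 2.1)
name_map <- tibble::tibble(
  Country_WHR = c("Bolivia", "Kosovo", "Swaziland"),
  Country_M49 = c("Bolivia (Plurinational State of)", "[Kosovo]", "Eswatini")
)

# Hypothetical miniature of the WHR data
whr_toy <- tibble::tibble(
  Country_WHR = c("Austria", "Bolivia", "Kosovo", "Swaziland"),
  Year = c(2024, 2024, 2024, 2018)
)

whr_toy_revised <- whr_toy |>
  dplyr::left_join(name_map, by = "Country_WHR",
                   relationship = "many-to-one") |>
  # names without a lookup entry keep their WHR spelling
  dplyr::mutate(Country_M49 = dplyr::coalesce(Country_M49, Country_WHR)) |>
  dplyr::relocate(Country_M49, .after = Country_WHR)

whr_toy_revised
```

The advantage of a lookup table is that the mapping can be stored and reviewed as data, for example in a CSV file.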

2.3.4 Combine WHR and M49 revised

I am now in a position to combine the revised WHR file with the UNSD M49 classification file. I will show only the interesting columns but save the full new data frame as whr_m49 in the folder “data/whr-cantril/rds”.

R Code 2.22 : Combine WHR and M49 with revised country column

Code
whr_2011_2024_revised <-  base::readRDS(
  "data/whr-cantril/rds/whr_2011_2024_revised.rds")

unsd_class_finale <-  base::readRDS(
  "data/country-class/unsd/rds/unsd_class_finale.rds")

## left join: WHR as priority ########
whr_m49 <- dplyr::left_join(
  x = whr_2011_2024_revised,
  y = unsd_class_finale,
  by = dplyr::join_by(Country_M49 == country),
      relationship = "many-to-one") 

whr_m49 |> 
  dplyr::select(
    Country_WHR, 
    Country_M49,
    m49, iso2, iso3
    ) |> 
  dplyr::distinct() |> 
  DT::datatable(class = 'cell-border compact stripe', 
                  options = list(
                    pageLength = 25,
                    lengthMenu = c(25, 50, 100, 200)
                    )
              )

## save revised data as .rds file ################
my_save_data_file(
  "whr-cantril/rds", 
  whr_m49, 
  "whr_m49.rds")

2.4 Combine whr_m49 with World Bank data

There are two different sheets to combine with the whr_m49 dataset: The wb-class-List of economies.rds and the wb-class-compositions.rds.

2.4.1 Combine whr_m49 with World Bank economy regions

R Code 2.23 : Combine whr_m49 with the World Bank economy Region

Code
whr_m49 <-  base::readRDS(
  "data/whr-cantril/rds/whr_m49.rds")

wb_economies <-  base::readRDS(
  "data/country-class/wb/rds/wb-class-List of economies.rds") |> 
  dplyr::select(-1)

## left join: WHR-M49 as priority ########
whr_m49_wb_regions <- dplyr::left_join(
  x = whr_m49,
  y = wb_economies,
  by = dplyr::join_by(iso3 == Code),
      relationship = "many-to-one") 

whr_m49_regions <- whr_m49_wb_regions |> 
  dplyr::select(Country_M49, Region) |> 
  dplyr::distinct()

## show whr_m49_wb_regions #########
(
    dt_whr_m49_wb_regions <- class_scheme(
            df = whr_m49_regions,
            sel1 = rlang::quo(`Country_M49`),
            sel2 = rlang::quo(`Region`)
            )
)

## save whr_m49_wb_regions as .rds file ################
my_save_data_file(
  "whr-cantril/rds", 
  whr_m49_wb_regions, 
  "whr_m49_wb_regions.rds")

The World Bank regional classification results in 8 groups (the 7 World Bank regions plus one group of unmatched names) containing 167 countries.

Note the last group, which contains the names not available in the M49 classification scheme. These four regions are not assigned to any World Bank region automatically but could be classified manually:

  • Kosovo and North Cyprus: Europe & Central Asia
  • Somaliland Region: Sub-Saharan Africa
  • Taiwan Province of China: East Asia & Pacific
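Such a manual classification could look like the following sketch. It is not applied in this chapter. The tibble `toy_regions` is a hypothetical excerpt in which the four bracketed names have no `Region` value after the join.

```r
# Hypothetical excerpt: one matched country and the four unmatched names
toy_regions <- tibble::tibble(
  Country_M49 = c("Austria", "[Kosovo]", "[North Cyprus]",
                  "[Somaliland Region]", "[Taiwan Province of China]"),
  Region = c("Europe & Central Asia", NA, NA, NA, NA)
)

toy_regions_manual <- toy_regions |>
  dplyr::mutate(Region = dplyr::case_when(
    # keep every region that the join has already provided
    !is.na(Region) ~ Region,
    Country_M49 %in% c("[Kosovo]", "[North Cyprus]") ~ "Europe & Central Asia",
    Country_M49 == "[Somaliland Region]" ~ "Sub-Saharan Africa",
    Country_M49 == "[Taiwan Province of China]" ~ "East Asia & Pacific",
    .default = NA_character_
  ))

toy_regions_manual
```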

2.4.2 Combine whr_m49 with World Bank Group Codes

The second sheet to combine is the data from wb-class-compositions.rds. This is more complex as it contains 2085 entries divided into 48 groups. Country codes and country names occur multiple times because each country belongs to several groups.

R Code 2.24 : Combine whr_m49 with the World Bank groups

Code
whr_m49_wb_regions <-  base::readRDS(
  "data/whr-cantril/rds/whr_m49_wb_regions.rds")

wb_class_compositions_reduced <-  base::readRDS(
  "data/country-class/wb/rds/wb-class-compositions.rds") |> 
  dplyr::select(-4)

## left join: WHR-M49 as priority ########
  whr_m49_wb <- dplyr::left_join(
    x = whr_m49_wb_regions,
    y = wb_class_compositions_reduced,
    by = dplyr::join_by(iso3 == WB_Country_Code),
        relationship = "many-to-many")

## show whr_m49_wb #########
(
    dt_whr_m49_wb <- class_scheme2(
            df = whr_m49_wb,
            sel1 = rlang::quo(`Country_M49`),
            sel2 = rlang::quo(`WB_Group_Name`)
            )
)

## save whr_m49_wb as .rds file ################
my_save_data_file(
  "whr-cantril/rds", 
  whr_m49_wb, 
  "whr_m49_wb.rds")
#> # A tibble: 48 × 3
#> # Rowwise:  WB_Group_Name
#>    WB_Group_Name                               Country                         N
#>    <chr>                                       <chr>                       <int>
#>  1 Africa Eastern and Southern                 Angola; Angola; Angola; An…   247
#>  2 Africa Western and Central                  Benin; Benin; Benin; Benin…   233
#>  3 Arab World                                  Algeria; Algeria; Algeria;…   234
#>  4 Caribbean small states                      Belize; Belize; Belize; Be…     9
#>  5 Central Europe and the Baltics              Bulgaria; Bulgaria; Bulgar…   143
#>  6 Early-demographic dividend                  Algeria; Algeria; Algeria;…   612
#>  7 East Asia & Pacific                         Australia; Australia; Aust…   207
#>  8 East Asia & Pacific (IDA & IBRD)            Cambodia; Cambodia; Cambod…   129
#>  9 East Asia & Pacific (excluding high income) Cambodia; Cambodia; Cambod…   129
#> 10 Euro area                                   Austria; Austria; Austria;…   260
#> # ℹ 38 more rows

To get a more concise data frame with a better overview of the available WB group codes, I have to limit the WHR data to a specific year. Then every country appears only once and I can use a one-to-many relationship (Country to WB_Group_Names).
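The difference between the two relationships can be made visible with a tiny example. The sketch below uses made-up year rows for one country (`x_all_years`, `y_groups` and `x_2024` are hypothetical objects) and relies on the `relationship` argument of `dplyr::left_join()`, which raises an error when the declared relationship does not hold.

```r
# x: two years for the same country; y: one country in two groups
x_all_years <- tibble::tibble(iso3 = c("AUT", "AUT"), Year = c(2023, 2024))
y_groups <- tibble::tibble(
  WB_Country_Code = c("AUT", "AUT"),
  WB_Group_Name   = c("Euro area", "High income")
)

# all years: rows on both sides have several partners
many_many <- dplyr::left_join(
  x_all_years, y_groups,
  by = dplyr::join_by(iso3 == WB_Country_Code),
  relationship = "many-to-many"
)

# one year only: each row of y now matches at most one row of x
x_2024 <- dplyr::filter(x_all_years, Year == 2024)
one_many <- dplyr::left_join(
  x_2024, y_groups,
  by = dplyr::join_by(iso3 == WB_Country_Code),
  relationship = "one-to-many"
)

# declaring one-to-many for the unfiltered data is rejected by dplyr
res <- tryCatch(
  dplyr::left_join(x_all_years, y_groups,
                   by = dplyr::join_by(iso3 == WB_Country_Code),
                   relationship = "one-to-many"),
  error = function(e) "relationship violated"
)

c(nrow(many_many), nrow(one_many))
res
```

Declaring the expected relationship acts as a cheap safeguard: an unexpected duplicate key stops the script instead of silently multiplying rows.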

R Code 2.25 : Showing the 2024 WHR data with their World Bank groups

Code
whr_m49_wb_2024 <- whr_m49_wb_regions |> 
  dplyr::filter(Year == 2024)

## join whr_m49_wb with WB group codes #########
whr_m49_wb_groups_2024 <- dplyr::left_join(
    x = whr_m49_wb_2024,
    y = wb_class_compositions_reduced,
    by = dplyr::join_by(iso3 == WB_Country_Code),
        relationship = "one-to-many") |> 
    dplyr::mutate(`WB_Group` = 
                    paste0(WB_Group_Name, " [", WB_Group_Code, "]")
                  )
      

## show whr_m49_wb_2024_groups #########
(
    dt_whr_m49_wb_groups_2024 <- class_scheme(
            df = whr_m49_wb_groups_2024,
            sel1 = rlang::quo(`Country_M49`),
            sel2 = rlang::quo(`WB_Group`)
            )
)


## save whr_m49_wb_groups_2024 as .rds file ################
my_save_data_file(
  "whr-cantril/rds", 
  whr_m49_wb_groups_2024, 
  "whr_m49_wb_groups_2024.rds")

2.4.3 Create final dataset

To facilitate later work I will create whr_final.rds as the final dataset with the following changes from whr_m49_wb.rds:

  1. rename columns:
    1. use lowercase letters,
    2. replace spaces with underscores,
    3. add number of groups after code and region markers
  2. create a new column group48 that combines World Bank group codes and names
  3. change NA values (created as “NA [NA]” in step 2) to World [WLD]
  4. relocate iso3 as first column of the dataset
  5. factorize all character columns

R Code 2.26 : Rename & reorder column to final (region / grouped) dataset

Code
whr_m49_wb <-  base::readRDS(
  "data/whr-cantril/rds/whr_m49_wb.rds")

# create final whr table #########
whr_final <- whr_m49_wb |> 
  dplyr::rename_with(~ tolower(gsub(" ", "_", .x))) |> 
  dplyr::rename(
    region5 = region_n,
    code5 = region_c,
    region7 = region,
    region17 = subr_n,
    code17 = subr_c,
    region23 = midr_n2,
    code23 = midr_c2,
    group8 = midr_n,
    code8 = midr_c
  ) |> 
  dplyr::mutate(group48 = 
                paste0(wb_group_name, " [", wb_group_code, "]")
                ) |> 
  dplyr::mutate(group48 =
                  dplyr::case_match(
                    group48, 
                    "NA [NA]" ~ "World [WLD]",
                    .default = group48)
                ) |> 
  dplyr::relocate(iso3) |> 
  dplyr::mutate(dplyr::across(
    .cols = dplyr::where(is.character), .fns = base::factor)
    )



## save whr_final as .rds file ################
my_save_data_file(
  "whr-cantril/rds", 
  whr_final, 
  "whr_final.rds")

skimr::skim(whr_final)
| Data summary | |
|---|---|
| Name | whr_final |
| Number of rows | 19379 |
| Number of columns | 34 |
| Column type frequency: | |
| factor | 22 |
| numeric | 12 |
| Group variables | None |

Variable type: factor

| skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
|---|---|---|---|---|---|
| iso3 | 39 | 1.00 | FALSE | 163 | COM: 198, MRT: 195, BEN: 182, BFA: 182 |
| country_whr | 0 | 1.00 | FALSE | 168 | Com: 198, Mau: 195, Ben: 182, Bur: 182 |
| country_m49 | 0 | 1.00 | FALSE | 167 | Com: 198, Mau: 195, Ben: 182, Bur: 182 |
| code5 | 39 | 1.00 | FALSE | 5 | 002: 7190, 142: 5140, 150: 4000, 019: 2880 |
| region5 | 39 | 1.00 | FALSE | 5 | Afr: 7190, Asi: 5140, Eur: 4000, Ame: 2880 |
| code17 | 39 | 1.00 | FALSE | 14 | 202: 6384, 419: 2750, 145: 1835, 039: 1378 |
| region17 | 39 | 1.00 | FALSE | 14 | Sub: 6384, Lat: 2750, Wes: 1835, Sou: 1378 |
| code8 | 10245 | 0.47 | FALSE | 7 | 011: 2379, 014: 2354, 005: 1272, 013: 1014 |
| group8 | 10245 | 0.47 | FALSE | 7 | Wes: 2379, Eas: 2354, Sou: 1272, Cen: 1014 |
| m49 | 39 | 1.00 | FALSE | 163 | 174: 198, 478: 195, 120: 182, 148: 182 |
| iso2 | 39 | 1.00 | FALSE | 163 | KM: 198, MR: 195, BF: 182, BJ: 182 |
| ldc | 39 | 1.00 | FALSE | 2 | 0: 13616, 1: 5724 |
| lldc | 39 | 1.00 | FALSE | 2 | 0: 14862, 1: 4478 |
| sids | 39 | 1.00 | FALSE | 2 | 0: 18318, 1: 1022 |
| region23 | 39 | 1.00 | FALSE | 19 | Wes: 2379, Eas: 2354, Wes: 1835, Sou: 1378 |
| code23 | 39 | 1.00 | FALSE | 19 | 011: 2379, 014: 2354, 145: 1835, 039: 1378 |
| region7 | 39 | 1.00 | FALSE | 7 | Sub: 6429, Eur: 5182, Lat: 2750, Mid: 2203 |
| income_group | 130 | 0.99 | FALSE | 4 | Low: 6152, Upp: 5467, Hig: 4313, Low: 3317 |
| lending_category | 3531 | 0.82 | FALSE | 3 | IBR: 7677, IDA: 6824, Ble: 1347 |
| wb_group_code | 39 | 1.00 | FALSE | 47 | WLD: 1930, IBT: 1385, LMY: 1272, MIC: 1025 |
| wb_group_name | 39 | 1.00 | FALSE | 47 | Wor: 1930, IDA: 1385, Low: 1272, Mid: 1025 |
| group48 | 0 | 1.00 | FALSE | 47 | Wor: 1969, IDA: 1385, Low: 1272, Mid: 1025 |

Variable type: numeric

| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| year | 0 | 1.00 | 2017.71 | 3.96 | 2011.00 | 2015.00 | 2018.00 | 2021.00 | 2024.00 | ▅▇▆▇▇ |
| rank | 0 | 1.00 | 86.60 | 42.59 | 1.00 | 52.00 | 91.00 | 123.00 | 158.00 | ▅▆▇▇▇ |
| ladder_score | 0 | 1.00 | 5.20 | 1.09 | 1.36 | 4.39 | 5.14 | 5.98 | 7.86 | ▁▂▇▆▂ |
| upperwhisker | 10776 | 0.44 | 5.40 | 1.08 | 1.43 | 4.63 | 5.39 | 6.21 | 7.90 | ▁▂▇▇▂ |
| lowerwhisker | 10776 | 0.44 | 5.16 | 1.12 | 1.30 | 4.34 | 5.12 | 5.98 | 7.78 | ▁▂▇▇▂ |
| explained_by:_log_gdp_per_capita | 10803 | 0.44 | 1.12 | 0.45 | 0.00 | 0.78 | 1.11 | 1.45 | 2.21 | ▂▆▇▆▂ |
| explained_by:_social_support | 10803 | 0.44 | 1.02 | 0.36 | 0.00 | 0.77 | 1.03 | 1.31 | 1.84 | ▁▅▇▇▂ |
| explained_by:_healthy_life_expectancy | 10821 | 0.44 | 0.50 | 0.22 | 0.00 | 0.33 | 0.50 | 0.65 | 1.14 | ▂▇▇▅▁ |
| explained_by:_freedom_to_make_life_choices | 10814 | 0.44 | 0.54 | 0.18 | 0.00 | 0.42 | 0.55 | 0.66 | 1.02 | ▁▃▇▆▂ |
| explained_by:_generosity | 10803 | 0.44 | 0.15 | 0.09 | 0.00 | 0.09 | 0.14 | 0.20 | 0.57 | ▆▇▂▁▁ |
| explained_by:_perceptions_of_corruption | 10808 | 0.44 | 0.13 | 0.10 | 0.00 | 0.06 | 0.10 | 0.16 | 0.59 | ▇▅▁▁▁ |
| dystopia_+_residual | 10837 | 0.44 | 1.83 | 0.65 | -0.11 | 1.44 | 1.84 | 2.25 | 3.48 | ▁▃▇▆▁ |

There are many variables with missing values:

  • 39: This corresponds to those WHR regions that are not available in the official M49 classification system: Kosovo, North Cyprus, Somaliland Region and Taiwan Province of China.
  • 130: There are several regions like “World” or “Latin America & Caribbean” where income_group is not defined.
  • 3531: Many countries are not eligible for a specific World Bank lending_category.
  • 10245: group8 and code8 cover only a part of all countries in the world.
  • 10776 - 10837: These missing values represent missing data in the WHR reports.
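The counts above can be recomputed without {skimr}. The sketch below uses a hypothetical three-row data frame `toy_na` with placeholder values. It also shows why a column with 39 missing values is still reported with a complete_rate of 1.00: the rate is rounded to two decimals.

```r
# Hypothetical miniature with placeholder values
toy_na <- data.frame(
  iso3   = c("AAA", NA, "BBB"),
  group8 = c(NA, NA, "Some region"),
  score  = c(1, 2, 3)
)

# number of missing values and share of complete values per column
n_missing     <- colSums(is.na(toy_na))
complete_rate <- round(1 - n_missing / nrow(toy_na), 2)

n_missing
complete_rate

# with the real dimensions: 39 of 19379 rows missing still rounds to 1
round(1 - 39 / 19379, 2)
# and 10245 missing rows give the 0.47 reported for group8 and code8
round(1 - 10245 / 19379, 2)
```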

2.5 Summary

The whr_final.rds file contains all WHR data and is the result of combining the WHR data with the M49 and WB classification systems.

There are two different approaches to use this dataset:

  1. To compare countries of a specific year
    • filter by year and by one value of the 48 different classification criteria of group48, or
    • filter by year, by (one value of) one of the four regional groupings (region5, region7, region17, or region23) and by group48 == "World [WLD]".
  2. To compare the development of countries
    • filter by one value of the 48 different classification criteria of group48, or
    • filter by (one value of) one of the four regional groupings (region5, region7, region17, or region23) and by group48 == "World [WLD]".
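Both approaches come down to simple filter calls. The following sketch uses a hypothetical five-row excerpt `toy_whr` that only mimics the relevant columns of whr_final. The restriction to "World [WLD]" is needed because every country occurs once per World Bank group.

```r
# Hypothetical excerpt mimicking the structure of whr_final
toy_whr <- tibble::tibble(
  country_m49 = c("Austria", "Austria", "Austria", "Chile", "Chile"),
  year        = c(2024, 2024, 2023, 2024, 2024),
  region5     = c("Europe", "Europe", "Europe", "Americas", "Americas"),
  group48     = c("World [WLD]", "Euro area [EMU]", "World [WLD]",
                  "World [WLD]", "High income [HIC]")
)

# approach 1: one year, one region, one row per country via the World group
europe_2024 <- toy_whr |>
  dplyr::filter(year == 2024, region5 == "Europe",
                group48 == "World [WLD]")

# approach 2: development over all years within one World Bank group
high_income_all_years <- toy_whr |>
  dplyr::filter(group48 == "High income [HIC]")

europe_2024
high_income_all_years
```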

There are four complete regional groupings covering all countries of the world and two groupings where either not all countries are covered (group8) or countries belong to multiple groups (group48). To get a reference for future use I will display all countries for these six classification criteria. I will not use an R code collection because I want to have a direct link to each of these different classifications.

Table 2.2: Different classification criteria

| Column name | Old name | Origin / Note |
|---|---|---|
| region5 (Section 2.5.1.1) | region_n | M49 / continents |
| region7 (Section 2.5.1.2) | Region | WB / pre-defined standard |
| region17 (Section 2.5.1.3) | subr_n | M49 / North-West-South-East |
| region23 (Section 2.5.1.4) | midr_n2 | M49 / detailed continent parts |
| group8 (Section 2.5.2.1) | midr_n | M49 / Africa, South-America, Caribbean, Channel Islands + 140 undefined |
| group48 (Section 2.5.2.2) | WB_Group_Name | WB / compositions all |

2.5.1 Regions

2.5.1.1 region5

R Code 2.27 : Show countries for region5

Code
df <- base::readRDS("data/whr-cantril/rds/whr_final.rds") |> 
  dplyr::filter(year == 2024 & group48 == "World [WLD]")

(
  dt_region5 <- class_scheme(
              df,
              sel1 = rlang::quo(`country_m49`),
              sel2 = rlang::quo(`region5`)
              ) 
)
Table 2.3: WHR countries grouped by five continents

The region5 classification results in 6 groups containing 147 countries (including one additional group with 2 regions that are not part of the official M49 classification).

2.5.1.2 region7

R Code 2.28 : Show countries for region7

Code
df <- base::readRDS("data/whr-cantril/rds/whr_final.rds") |> 
  dplyr::filter(year == 2024 & group48 == "World [WLD]")

(
  dt_region7 <- class_scheme(
              df,
              sel1 = rlang::quo(`country_m49`),
              sel2 = rlang::quo(`region7`)
              )
)

The region7 classification results in 8 groups containing 147 countries (including one additional group with 2 regions that are not part of the official M49 classification).

2.5.1.3 region17

R Code 2.29 : Show countries for region17

Code
df <- base::readRDS("data/whr-cantril/rds/whr_final.rds") |> 
  dplyr::filter(year == 2024 & group48 == "World [WLD]")

(
  dt_region17 <- class_scheme(
              df,
              sel1 = rlang::quo(`country_m49`),
              sel2 = rlang::quo(`region17`)
              )
)

The region17 classification results in 15 groups containing 147 countries (including one additional group with 2 regions that are not part of the official M49 classification).

Three sub-regions of the M49 classification do not appear in the WHR data: Melanesia, Micronesia, and Polynesia.

2.5.1.4 region23

R Code 2.30 : Show countries for region23

Code
df <- base::readRDS("data/whr-cantril/rds/whr_final.rds") |> 
  dplyr::filter(year == 2024 & group48 == "World [WLD]")

(
  dt_region23 <- class_scheme(
              df,
              sel1 = rlang::quo(`country_m49`),
              sel2 = rlang::quo(`region23`)
              )
)
Table 2.4: WHR countries grouped by 23 regions

The region23 classification results in 20 groups containing 147 countries (including one additional group with 2 regions that are not part of the official M49 classification).

Four regions of the M49 classification do not appear in the WHR data: Channel Islands, Melanesia, Micronesia, and Polynesia.
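Which regions of a classification do not occur in the WHR data can be checked with a set difference. This is a minimal base R sketch. Both vectors are short hypothetical excerpts, not the full lists of regions.

```r
# regions defined in the classification (excerpt, for illustration only)
m49_regions <- c("Western Europe", "Melanesia", "Micronesia", "Polynesia")

# regions that actually occur in the (toy) WHR data
whr_regions <- c("Western Europe")

# regions of the classification without any WHR country
missing_regions <- setdiff(m49_regions, whr_regions)
missing_regions
```

With the real data, the first vector would come from the distinct values of the classification file and the second from the corresponding column of whr_final.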

2.5.2 Groups

2.5.2.1 group8

R Code 2.31 : Show countries for group8

Code
df <- base::readRDS("data/whr-cantril/rds/whr_final.rds") |> 
  dplyr::filter(year == 2024 & group48 == "World [WLD]")

(
  dt_group8 <- class_scheme(
              df,
              sel1 = rlang::quo(`country_m49`),
              sel2 = rlang::quo(`group8`)
              )
)

The group8 classification results in 8 groups containing 147 countries. It includes an unspecified group with 90 countries, including those regions used by the WHR data that are not official countries in the M49 classification.

2.5.2.2 group48

R Code 2.32 : Show countries for group48

Code
df <- base::readRDS("data/whr-cantril/rds/whr_final.rds") |> 
  dplyr::filter(year == 2024)

(
  dt_group48 <- class_scheme(
              df,
              sel1 = rlang::quo(`country_m49`),
              sel2 = rlang::quo(`group48`)
              )
)

The group48 classification results in 47 groups containing 1447 country entries (a country is counted once for every group it belongs to). The group “NA [NA]”, which would contain Kosovo and Somaliland, is missing because I have recoded all “NA [NA]” values to “World [WLD]”.

2.6 Glossary

(Some of the abbreviations have an additional “x” at their end that is not part of the abbreviation. I chose this workaround to distinguish these abbreviations from the same text chunks in one of the glossary entries. This is a bug in the {glossary} package.)

term definition
CSV Text files where the values are separated with commas (Comma Separated Values = CSV). These files have the file extension .csv
GNIx Gross National Income (GNI) is a measure of a country's income, which includes all the income earned by a country's residents, businesses, and earnings from foreign sources. It is defined as the total amount of money earned by a nation's people and businesses, no matter where it was earned. GNI is an alternative to GDP as a way to measure and track a nation’s wealth, as it calculates income instead of output.
IBRD The International Bank for Reconstruction and Development (IBRD) is a global development cooperative owned by 189 member countries. As the largest development bank in the world, it supports the World Bank Group’s mission by providing loans, guarantees, risk management products, and advisory services to middle-income and creditworthy low-income countries, as well as by coordinating responses to regional and global challenges. (https://www.worldbank.org/en/who-we-are/ibrd)
IDAx The International Development Association (IDA) is the part of the World Bank that helps the world’s low-income countries. IDA's grants and low-interest loans help countries invest in their futures, improve lives, and create safer, more prosperous communities around the world. (https://ida.worldbank.org/en/what-is-ida)
LDCx The term “Least Developed Countries” (LDCs) refers to developing countries listed by the United Nations that exhibit the lowest indicators of socioeconomic development. As of December 2024, the classification applies to 44 countries. See https://unctad.org/topic/least-developed-countries/list
LLDC Landlocked Developing Countries (LLDCs) are developing nations that do not have direct access to the sea. These countries face significant economic and development challenges due to their geographical isolation and the need to rely on neighboring countries for access to international markets. Of the 32 LLDCs 16 are classified as LDCs (December 2024). See: https://www.un.org/ohrlls/content/about-landlocked-developing-countries
M49 The United Nations publication "Standard Country or Area Codes for Statistical Use" was originally published as Series M, No. 49 and is now commonly referred to as the M49 standard. M49 is a country/areas classification system prepared by the Statistics Division of the United Nations Secretariat primarily for use in its publications and databases.
OMNIKA OMNIKA DataStore is an open-access data science resource for researchers, authors, and technologists. OMNIKA Foundation is an American 501(c)(3) nonprofit organization that operates a digital mythological library. Almost every culture has relevant mythology that explains where we came from, why things are the way they are, and a number of other things. OMNIKA's goal is to collect, organize, index, and quantify all of those data in one place and make them available for free. (https://omnika.org/info/about)
RDS The abbreviation “RDS” in file endings `.rds` refers to “R Data Serialized”. It is a format used by the R programming language to serialize and store R objects, such as data frames, lists, and functions, in a compact and portable binary format.
SIDS Small Island Developing States (SIDS) are a group of developing countries that are small island nations and territories facing similar sustainable development challenges. These countries are particularly vulnerable to environmental and economic shocks due to their small size, limited resources, and remote locations. The aggregate population of all the SIDS is 65 million. See: https://www.un.org/ohrlls/content/about-small-island-developing-states
UNSD The United Nations Statistics Division (UNSD) is committed to the advancement of the global statistical system. It compiles and disseminates global statistical information, develops standards and norms for statistical activities, and supports countries' efforts to strengthen their national statistical systems.
WHR The World Happiness Reports are a partnership of Gallup, the Oxford Wellbeing Research Centre, the UN Sustainable Development Solutions Network, and the WHR’s Editorial Board. The report is produced under the editorial control of the WHR Editorial Board. The reports reflect a worldwide demand for more attention to happiness and well-being as criteria for government policy. They review the state of happiness in the world today and show how the science of happiness explains personal and national variations in happiness. (https://worldhappiness.report/about/)

Session Info

Code
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.4.3 (2025-02-28)
#>  os       macOS Sequoia 15.3.2
#>  system   aarch64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Europe/Vienna
#>  date     2025-04-21
#>  pandoc   3.2 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/aarch64/ (via rmarkdown)
#>  quarto   1.6.42 @ /usr/local/bin/quarto
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  base64enc     0.1-3   2015-07-28 [1] CRAN (R 4.4.1)
#>  bslib         0.9.0   2025-01-30 [1] CRAN (R 4.4.1)
#>  cachem        1.1.0   2024-05-16 [1] CRAN (R 4.4.1)
#>  cli           3.6.4   2025-02-13 [1] CRAN (R 4.4.1)
#>  colorspace    2.1-1   2024-07-26 [1] CRAN (R 4.4.1)
#>  commonmark    1.9.2   2024-10-04 [1] CRAN (R 4.4.1)
#>  crayon        1.5.3   2024-06-20 [1] CRAN (R 4.4.1)
#>  crosstalk     1.2.1   2023-11-23 [1] CRAN (R 4.4.0)
#>  curl          6.2.1   2025-02-19 [1] CRAN (R 4.4.1)
#>  digest        0.6.37  2024-08-19 [1] CRAN (R 4.4.1)
#>  dplyr         1.1.4   2023-11-17 [1] CRAN (R 4.4.0)
#>  DT            0.33    2024-04-04 [1] CRAN (R 4.4.0)
#>  evaluate      1.0.3   2025-01-10 [1] CRAN (R 4.4.1)
#>  fastmap       1.2.0   2024-05-15 [1] CRAN (R 4.4.1)
#>  generics      0.1.3   2022-07-05 [1] CRAN (R 4.4.1)
#>  ggplot2       3.5.1   2024-04-23 [1] CRAN (R 4.4.0)
#>  glossary    * 1.0.0   2023-05-30 [1] CRAN (R 4.4.0)
#>  glue          1.8.0   2024-09-30 [1] CRAN (R 4.4.1)
#>  gtable        0.3.6   2024-10-25 [1] CRAN (R 4.4.1)
#>  here          1.0.1   2020-12-13 [1] CRAN (R 4.4.1)
#>  htmltools     0.5.8.1 2024-04-04 [1] CRAN (R 4.4.1)
#>  htmlwidgets   1.6.4   2023-12-06 [1] CRAN (R 4.4.0)
#>  jquerylib     0.1.4   2021-04-26 [1] CRAN (R 4.4.0)
#>  jsonlite      1.9.1   2025-03-03 [1] CRAN (R 4.4.1)
#>  kableExtra    1.4.0   2024-01-24 [1] CRAN (R 4.4.0)
#>  knitr         1.49    2024-11-08 [1] CRAN (R 4.4.1)
#>  lifecycle     1.0.4   2023-11-07 [1] CRAN (R 4.4.1)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.4.1)
#>  markdown      1.13    2024-06-04 [1] CRAN (R 4.4.1)
#>  munsell       0.5.1   2024-04-01 [1] CRAN (R 4.4.1)
#>  pillar        1.10.1  2025-01-07 [1] CRAN (R 4.4.1)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.4.1)
#>  purrr         1.0.4   2025-02-05 [1] CRAN (R 4.4.1)
#>  R6            2.6.1   2025-02-15 [1] CRAN (R 4.4.1)
#>  repr          1.1.7   2024-03-22 [1] CRAN (R 4.4.0)
#>  rlang         1.1.5   2025-01-17 [1] CRAN (R 4.4.1)
#>  rmarkdown     2.29    2024-11-04 [1] CRAN (R 4.4.1)
#>  rprojroot     2.0.4   2023-11-05 [1] CRAN (R 4.4.1)
#>  rstudioapi    0.17.1  2024-10-22 [1] CRAN (R 4.4.1)
#>  rversions     2.1.2   2022-08-31 [1] CRAN (R 4.4.1)
#>  sass          0.4.9   2024-03-15 [1] CRAN (R 4.4.0)
#>  scales        1.3.0   2023-11-28 [1] CRAN (R 4.4.0)
#>  sessioninfo   1.2.3   2025-02-05 [1] CRAN (R 4.4.1)
#>  skimr         2.1.5   2022-12-23 [1] CRAN (R 4.4.0)
#>  stringi       1.8.4   2024-05-06 [1] CRAN (R 4.4.1)
#>  stringr       1.5.1   2023-11-14 [1] CRAN (R 4.4.0)
#>  svglite       2.1.3   2023-12-08 [1] CRAN (R 4.4.0)
#>  systemfonts   1.2.1   2025-01-20 [1] CRAN (R 4.4.1)
#>  tibble        3.2.1   2023-03-20 [1] CRAN (R 4.4.0)
#>  tidyr         1.3.1   2024-01-24 [1] CRAN (R 4.4.1)
#>  tidyselect    1.2.1   2024-03-11 [1] CRAN (R 4.4.0)
#>  utf8          1.2.4   2023-10-22 [1] CRAN (R 4.4.1)
#>  vctrs         0.6.5   2023-12-01 [1] CRAN (R 4.4.0)
#>  viridisLite   0.4.2   2023-05-02 [1] CRAN (R 4.4.1)
#>  withr         3.0.2   2024-10-28 [1] CRAN (R 4.4.1)
#>  xfun          0.51    2025-02-19 [1] CRAN (R 4.4.1)
#>  xml2          1.3.7   2025-02-28 [1] CRAN (R 4.4.1)
#>  yaml          2.3.10  2024-07-26 [1] CRAN (R 4.4.1)
#> 
#>  [1] /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library
#>  * ── Packages attached to the search path.
#> 
#> ──────────────────────────────────────────────────────────────────────────────

References

Berroth, Markus. 2019. “Bang Bang - How to Program with Dplyr.” https://www.statworx.com/en/content-hub/blog/bang-bang-how-to-program-with-dplyr/.
McCarthy, Kieren. 2020. “After 20-Year Battle, Channel Island Sark Finally Earns the Right to Exist on the Internet with Its Own Top-Level Domain.” https://www.theregister.com/2020/03/23/sark_cctld_iso/.