2 Regions and their Countries

Objectives

Country and Regions: Classifications for countries

Understanding the country classification used by my WHR data sources (Section 2.1).
Inspecting different approaches to classify countries by international organizations:
- World Bank and
- United Nations Statistics Division
Modifying or adding country classifications to the WHR data where necessary, so that they conform to the internationally recognized and approved systems.

I aim to compare different aspects of countries. I want, for instance, to know how well Austria is doing compared to other European countries, the other member states of the European Union, or other OECD countries. It is, therefore, vital to have a consistent categorization system with different grouping schemes.

2.1 Country groupings in WHR

Some of the WHR datasets (from the WHR reports 2013, 2015, 2020, and 2021) include a regional grouping. This grouping is not included in the new 2024 dataset from the WHR report 2025.

2.1.1 Classification of WHR 2020

I illustrate the regional grouping of the WHR data with the example of the year 2020 (WHR report 2021).

R Code 2.1 : WHR data 2020 classification

Listing / Output 2.1: Result of the WHR classification system for WHR data 2020 used in the WHR report 2021.

Code

(
  df_dt_whr_2020 <-  base::readRDS("data/whr/raw/whr_raw_2021.rds") |> 
      dplyr::select(`Country name`, `Regional indicator`) |> 
      ## one row per region; the countries are nested in the list-column `data`
      dplyr::nest_by(`Regional indicator`) |> 
      ## collapse the nested country names into one string per region
      dplyr::mutate(data = as.vector(data)) |>
      dplyr::mutate(data = stringr::str_c(data, collapse = "; ")) |>
      ## append a final ";" so that every country is followed by one
      dplyr::mutate(data = paste(data, ";")) |> 
      ## N = number of ";" = number of countries in the region
      dplyr::mutate(N = lengths(gregexpr(";", data))) |> 
      dplyr::rename(Country = data) |> 
      DT::datatable(class = 'cell-border compact stripe', 
                options = list(
                  pageLength = 25,
                  lengthMenu = c(5, 10, 15, 20, 25, 50)
                  )
            )
)

There are 10 different regional indicators. The datasets for 2013, 2015, 2020, and 2021 all use the same classification scheme, with 149 countries in 10 regions.

These 149 countries are far from all the countries of the world. The complete number is about 195, with some uncertainty about the Holy See (Vatican), the State of Palestine, Taiwan, and Kosovo (compare: How Many Countries Are There In The World?). The reason for the lower number is simple: subjective well-being data for the study year 2020 are available for only these 149 countries.

2.1.2 `class_scheme()` function

Because I am going to display several classification variants, it pays to develop a function for this repetitive task.

R Code 2.2 : Function class_scheme() for showing classification schemes

Listing / Output 2.2: Function class_scheme() for showing results of a classification system

Code

class_scheme <- function(df, sel1, sel2) {
    ## df   = data frame to show
    ## sel1 = quosure of the column with the country names,
    ##        e.g. rlang::quo(`Country name`)
    ## sel2 = quosure of the column with the regional indicator
    ## !! unquotes the quosures so that {dplyr} evaluates them inside df
  df |> 
        dplyr::select(!!sel1, !!sel2) |> 
        ## one row per region, countries nested in the list-column `data`
        dplyr::nest_by(!!sel2) |> 
        ## collapse the countries into one string; N counts the ";"
        dplyr::mutate(data = as.vector(data)) |>
        dplyr::mutate(data = stringr::str_c(data, collapse = "; ")) |>
        dplyr::mutate(data = paste(data, ";")) |> 
        dplyr::mutate(N = lengths(gregexpr(";", data))) |> 
        dplyr::rename(Country = data) |> 
        dplyr::arrange(!!sel2) |> 
        DT::datatable(class = 'cell-border compact stripe', 
            options = list(
              pageLength = 25,
              lengthMenu = c(5, 10, 15, 20, 25, 50)
              )
        )
}

Programming with {dplyr} inside a function needs some special consideration. The column arguments are passed as quosures created with rlang::quo() and are unquoted inside the pipeline with !! ("bang bang"). I learned the details from “Bang Bang – How to program with dplyr” (Berroth 2019).
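The following is a minimal base-R sketch of the core step of class_scheme(), without {dplyr} and {DT}, to make its logic easier to follow. The helper summarise_regions() and the toy data are hypothetical and not part of my workflow. The sketch splits the country names by region and collapses them into one string per region. It counts them with lengths() instead of counting semicolons.

```r
# Hypothetical base-R helper mirroring the core of class_scheme():
# one row per region, the region's countries collapsed into one string,
# and N = number of countries in that region.
summarise_regions <- function(df, country_col, region_col) {
  # split() returns a list of country vectors, one element per region
  groups <- split(df[[country_col]], df[[region_col]])
  data.frame(
    Region    = names(groups),
    Country   = vapply(groups, paste, character(1), collapse = "; "),
    N         = lengths(groups),
    row.names = NULL
  )
}

# Toy data, not taken from the WHR file
toy <- data.frame(
  country = c("Austria", "Germany", "Japan", "France"),
  region  = c("Western Europe", "Western Europe", "East Asia", "Western Europe")
)
res <- summarise_regions(toy, "country", "region")
print(res)
```

Counting with lengths() avoids the trailing-";" trick and does not depend on the separator used in the collapsed string.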

2.1.3 WHR 2020 with `class_scheme()` function

With the class_scheme() function in place, I can display the different grouping schemes. First, I try it out with the WHR data from the 2021 report:

R Code 2.3 : Classification of the WHR data

Code

df_whr <-  base::readRDS(
    paste0(here::here(), "/data/whr/raw/whr_raw_2021.rds"))
(
    whr_class <- class_scheme(
            df = df_whr,
            sel1 = rlang::quo(`Country name`),
            sel2 = rlang::quo(`Regional indicator`)
            )
)

It worked! I got the same result as in Listing / Output 2.1.

2.2 Official classifications

Several classification systems are already in place: international organizations such as the World Bank and the United Nations have developed them, each with several grouping variants.

I will look into the two official classification schemes of the World Bank and the United Nations and apply the following procedure:

Procedure 2.1 : Understand the structure and content of the official classification schemes

Create a directory for storing the different country classification files (see Section 2.2.1).
Download the classification files and store them as R objects in .rds format for faster access (see Section 2.2.2.1 and Section 2.2.2.2).
Inspect the classification files of the World Bank (Section 2.2.3.1) and of the United Nations (Section 2.2.3.2) in detail.

2.2.1 Create data directories

R Code 2.4 : Create folders for country classification files

Code

my_create_folder(base::paste0(here::here(), "/data/"))
my_create_folder(base::paste0(here::here(), "/data/country-class"))
my_create_folder(base::paste0(here::here(), "/data/country-class/wb"))
my_create_folder(base::paste0(here::here(), "/data/country-class/unsd"))
my_create_folder(base::paste0(here::here(), "/data/country-class/wb/excel"))
my_create_folder(base::paste0(here::here(), "/data/country-class/wb/csv"))
my_create_folder(base::paste0(here::here(), "/data/country-class/wb/rds"))
my_create_folder(base::paste0(here::here(), "/data/country-class/unsd/excel"))
my_create_folder(base::paste0(here::here(), "/data/country-class/unsd/csv"))
my_create_folder(base::paste0(here::here(), "/data/country-class/unsd/rds"))

(This R code chunk produces no output.)
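The ten my_create_folder() calls above could also be generated in one loop. This sketch is a hypothetical stand-in for my helper and assumes that base::dir.create() with recursive = TRUE is sufficient, because that call also creates any missing parent folders. The usage example writes to a temporary folder so that it does not touch the project.

```r
# Hypothetical base-R stand-in for the repeated my_create_folder() calls:
# dir.create(recursive = TRUE) also creates the missing parent folders.
create_class_folders <- function(root) {
  sub_dirs <- file.path(
    root, "data", "country-class",
    rep(c("wb", "unsd"), each = 3),   # wb, wb, wb, unsd, unsd, unsd
    c("excel", "csv", "rds")          # recycled: excel, csv, rds, excel, ...
  )
  for (d in sub_dirs) dir.create(d, recursive = TRUE, showWarnings = FALSE)
  invisible(sub_dirs)
}

# In the project this would be create_class_folders(here::here());
# here a temporary folder is used instead
dirs <- create_class_folders(tempfile("project-"))
print(dirs)
```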

2.2.2 Download classification files

2.2.2.1 World Bank

The World Bank classification can be downloaded from the page “How does the World Bank classify countries?”. Near the bottom of that page, the line “Download an Excel file of historical classifications by income.” provides a link with the word “Download”. Despite this label, the downloaded file CLASS.xlsx does not contain a historical classification by income. Instead, it contains the general classification system of the latest available year (2023).

There is indeed another Excel file, OGHIST.xlsx, with the historical cutoffs for income and lending categories from 1987 to 2023. But its download link is located on another web page: World Bank Country and Lending Groups. This page also lists the updates of the cutoffs for GNI per capita, which matter for the lending eligibility of countries. The page “World Bank country classifications by income level for 2024-2025” has the current values and the changes over the last year.

The file CLASS.xlsx that I am interested in here consists of three sheets:

1. “List of Economies”,
2. “compositions”, and
3. “Notes”.

I will download the original Excel file and programmatically save

the Excel file with all its sheets,
CSV snapshots of all sheets (file extension .csv), and
R objects of all sheets (file extension .rds).

The CSV snapshots support reproducibility because they store the content of the proprietary Excel file in a tool-agnostic, future-proof format. My code is inspired by the readxl article “readxl Workflows”.

R Code 2.5 : Download the World Bank CLASS Excel file

Run this code chunk manually if the file still needs to be downloaded.

Code

url_excel <- "https://datacatalogfiles.worldbank.org/ddh-published/0037712/DR0090755/CLASS.xlsx"
path_wb_excel <- base::paste0(here::here(), 
            "/data/country-class/wb/excel/wb-class.xlsx")
path_wb_csv <- base::paste0(here::here(), 
            "/data/country-class/wb/csv/")
path_wb_rds <- base::paste0(here::here(), 
            "/data/country-class/wb/rds/")

## download wb-class file ##############
downloader::download(
    url = url_excel,
    destfile = path_wb_excel
)

## from readxl workflow article ##############
## my helper my_excel_as_csv_and_rds() (not shown here) saves one sheet
## as .csv and .rds; purrr::map() applies it to every sheet name
path_wb_excel |> 
  readxl::excel_sheets()  |> 
  rlang::set_names()  |>  
  purrr::map(my_excel_as_csv_and_rds, 
             path_excel = path_wb_excel, 
             path_csv = path_wb_csv,
             path_rds = path_wb_rds
             )
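The helper my_excel_as_csv_and_rds() is not shown here, so the following sketch illustrates the idea under my own assumptions. The function name save_sheet_snapshots() is hypothetical. It receives a sheet that has already been read into a data frame; in the real workflow, this would happen with readxl::read_excel(path_wb_excel, sheet = sheet). It then writes a CSV snapshot and an .rds copy that follow the naming pattern wb-class-<sheet>.*, and both files can be read back as a check.

```r
# Hypothetical helper (NOT my_excel_as_csv_and_rds()): store one sheet,
# already read into a data frame, as a CSV snapshot and as an .rds object,
# following the naming pattern <stem>-<sheet>.csv / .rds
save_sheet_snapshots <- function(df, sheet, stem, path_csv, path_rds) {
  file_csv <- file.path(path_csv, paste0(stem, "-", sheet, ".csv"))
  file_rds <- file.path(path_rds, paste0(stem, "-", sheet, ".rds"))
  utils::write.csv(df, file_csv, row.names = FALSE)  # tool-agnostic copy
  saveRDS(df, file_rds)                              # fast R copy
  c(csv = file_csv, rds = file_rds)
}

# Toy data instead of readxl::read_excel(path_wb_excel, sheet = sheet)
toy_sheet <- data.frame(
  Economy = c("Afghanistan", "Albania"),
  Code    = c("AFG", "ALB"),
  Region  = c("South Asia", "Europe & Central Asia")
)
out_dir <- tempdir()
files <- save_sheet_snapshots(toy_sheet, "List of economies", "wb-class",
                              out_dir, out_dir)
print(basename(files))
```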

2.2.2.2 UNSD-M49

Another, more detailed classification system, expressly developed for statistical purposes, comes from the United Nations Statistics Division (UNSD), which uses the M49 methodology.

The result is called “Standard country or area codes for statistical use (M49)”. It can be downloaded manually from the Overview page in different languages and formats (copy to the clipboard, Excel, or CSV). The Overview page provides no URL that an R script could use, because the buttons copy or download the data with the help of JavaScript. So I had either to download the file manually or to find another location from which I could download it programmatically.

I found an external source for the UNSD-M49 country classification in the OMNIKA DataStore. As a safety check, I compared the two files (my manual download and the OMNIKA file) with base::all.equal() to determine whether they are identical. Yes, they are!
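A comparison like this can be done in several ways. The following sketch uses a hypothetical helper, files_identical(), and toy files instead of the real downloads. It checks the raw bytes via MD5 checksums from tools::md5sum(), and it checks the text content line by line with base::all.equal().

```r
# Hypothetical helper: are two downloaded files the same?
# - bytes: identical MD5 checksums (tools::md5sum)
# - lines: identical text content, checked with base::all.equal()
files_identical <- function(file_a, file_b) {
  same_bytes <- unname(tools::md5sum(file_a)) == unname(tools::md5sum(file_b))
  same_lines <- isTRUE(all.equal(readLines(file_a), readLines(file_b)))
  c(bytes = same_bytes, lines = same_lines)
}

# Toy files standing in for the manual download and the OMNIKA copy
csv_lines <- c("Country or Area;M49 Code", "Austria;040", "Namibia;516")
f_manual  <- tempfile(fileext = ".csv")
f_omnika  <- tempfile(fileext = ".csv")
f_changed <- tempfile(fileext = ".csv")
writeLines(csv_lines, f_manual)
writeLines(csv_lines, f_omnika)
writeLines(sub("040", "40", csv_lines), f_changed)  # leading zero lost

print(files_identical(f_manual, f_omnika))
print(files_identical(f_manual, f_changed))
```

The changed file illustrates why a content check matters for M49 codes: a lost leading zero ("040" vs. "40") is a real difference even though both values look like the same number.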

The UNSD-M49 standard area codes are available as Excel and CSV files. For reproducibility reasons, I download the CSV file.

R Code 2.6 : Download the UNSD-M49 CSV file and create an R object (“.rds”)

Run this code chunk manually if the file still needs to be downloaded.

Code

## download unsd-m49 file ############
url_unsd_csv <- "https://github.com/omnika-datastore/unsd-m49-standard-area-codes/raw/refs/heads/main/2022-09-24__CSV_UNSD_M49.csv"
path_unsd_csv <- base::paste0(here::here(), 
     "/data/country-class/unsd/csv/2022-09-24__CSV_UNSD_M49.csv")

downloader::download(
    url = url_unsd_csv,
    destfile = path_unsd_csv
)


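## create R object ###############
## note: readr's default na = c("", "NA") reads Namibia's ISO-alpha2
## code "NA" as a missing value; this is repaired in R Code 2.14
unsd_class <- 
  readr::read_delim(
    file = path_unsd_csv,
    delim = ";"
  )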


## save as .rds file ################
my_save_data_file(
  "country-class/unsd/rds", 
  unsd_class, 
  "unsd_class.rds")

(This R code chunk produces no output.)

2.2.3 Inspect classification files

To get a detailed understanding of the data structures, I provide the following outputs:

Summary statistics with skimr::skim(), followed by a look at the first values of each column with dplyr::glimpse().
Several detailed outputs of the classification categories (regions) and their elements (countries) in different code chunks (tabs).

2.2.3.1 World Bank

Code Collection 2.1 : Inspect the structure of the World Bank classification

R Code 2.7 : Inspect sheet List of Economies of the World Bank classification file

Code

wb_class_economies <- base::readRDS(
  "data/country-class/wb/rds/wb-class-List of economies.rds")
glue::glue("******************* Using skimr::skim() ***************************")
skimr::skim(wb_class_economies)
glue::glue("")
glue::glue("****************** Using dplyr::glimpse() *************************")
dplyr::glimpse(wb_class_economies)

#> ******************* Using skimr::skim() ***************************

Data summary
Name	wb_class_economies
Number of rows	267
Number of columns	5
_______________________
Column type frequency:
character	5
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	n_unique
Economy	1	1.00	4	50	266
Code	1	1.00	3	3	266
Region	49	0.82	10	26	7
Income group	50	0.81	10	19	4
Lending category	122	0.54	3	5	3

#> 
#> ****************** Using dplyr::glimpse() *************************
#> Rows: 267
#> Columns: 5
#> $ Economy            <chr> "Afghanistan", "Albania", "Algeria", "American Samo…
#> $ Code               <chr> "AFG", "ALB", "DZA", "ASM", "AND", "AGO", "ATG", "A…
#> $ Region             <chr> "South Asia", "Europe & Central Asia", "Middle East…
#> $ `Income group`     <chr> "Low income", "Upper middle income", "Upper middle …
#> $ `Lending category` <chr> "IDA", "IBRD", "IBRD", NA, NA, "IBRD", "IBRD", "IBR…

R Code 2.8 : Inspect sheet compositions of the World Bank classification file

Code

wb_class_compositions <- base::readRDS(
  "data/country-class/wb/rds/wb-class-compositions.rds")
glue::glue("******************* Using skimr::skim() ***************************")
skimr::skim(wb_class_compositions)
glue::glue("")
glue::glue("****************** Using dplyr::glimpse() *************************")
dplyr::glimpse(wb_class_compositions)

#> ******************* Using skimr::skim() ***************************

Data summary
Name	wb_class_compositions
Number of rows	2085
Number of columns	4
_______________________
Column type frequency:
character	4
________________________
Group variables	None

Variable type: character

skim_variable	complete_rate	min	max	n_unique
WB_Group_Code	1	3	3	48
WB_Group_Name	1	5	50	48
WB_Country_Code	1	3	3	218
WB_Country_Name	1	4	30	218

#> 
#> ****************** Using dplyr::glimpse() *************************
#> Rows: 2,085
#> Columns: 4
#> $ WB_Group_Code   <chr> "AFE", "AFE", "AFE", "AFE", "AFE", "AFE", "AFE", "AFE"…
#> $ WB_Group_Name   <chr> "Africa Eastern and Southern", "Africa Eastern and Sou…
#> $ WB_Country_Code <chr> "AGO", "BWA", "BDI", "COM", "COD", "ERI", "SWZ", "ETH"…
#> $ WB_Country_Name <chr> "Angola", "Botswana", "Burundi", "Comoros", "Congo, De…

R Code 2.9 : Pre-defined standard categorization

Code

df_wb_standard <-  base::readRDS(
  "data/country-class/wb/rds/wb-class-List of economies.rds") |> 
    dplyr::slice(1:218)



(
    wb_class_standard <- class_scheme(
            df = df_wb_standard,
            sel1 = rlang::quo(`Economy`),
            sel2 = rlang::quo(`Region`)
            )
)

Region is a coarse classification scheme with only 7 regions formed by 218 countries.

R Code 2.10 : All provided groups: regional, economic, and political

Code

df_wb_all <-  base::readRDS(
  "data/country-class/wb/rds/wb-class-compositions.rds")


(
    wb_class_all <- class_scheme(
            df = df_wb_all,
            sel1 = rlang::quo(`WB_Country_Name`),
            sel2 = rlang::quo(`WB_Group_Name`)
            )
)

WB_Group_Name in the “compositions” sheet contains all available groups. They are not restricted to regional groups but are formed by economic and political criteria as well. There is no 1:1 match, because almost all countries belong to two or more groups. There are 48 groups with a total of 2085 elements.

R Code 2.11 : Groups formed by regional criteria (without the redundant World region)

Code

str_reg <- c("AFE", "AFW", "ARB", "CSS", "CEB",
             "EAS", "ECS", "LCN", "MEA", "NAC",
             "OSS", "PSS", "SST", "SAS", "SSF")

df_wb_reg <-  base::readRDS(
  "data/country-class/wb/rds/wb-class-compositions.rds") |> 
  dplyr::filter(WB_Group_Code %in% str_reg)

(
    wb_class_reg <- class_scheme(
            df = df_wb_reg,
            sel1 = rlang::quo(`WB_Country_Name`),
            sel2 = rlang::quo(`WB_Group_Name`)
            )
)

Browsing through the compositions data, I selected 15 WB_Group_Code values as regional codes. By construction, this classification results in 15 regions with 379 entries. Because the groups overlap, many countries appear more than once.

R Code 2.12 : Groups formed by regional criteria without the small-states groups (and without the redundant World region)

Code

str_reg2 <- c("AFE", "AFW", "ARB", "CEB",
             "EAS", "ECS", "LCN", "MEA", "NAC",
             "SAS", "SSF")

df_wb_reg2 <-  base::readRDS(
  "data/country-class/wb/rds/wb-class-compositions.rds") |> 
  dplyr::filter(WB_Group_Code %in% str_reg2)

(
    wb_class_reg2 <- class_scheme(
            df = df_wb_reg2,
            sel1 = rlang::quo(`WB_Country_Name`),
            sel2 = rlang::quo(`WB_Group_Name`)
            )
)

For an alternative regional grouping, I removed the four small-states groups (CSS, OSS, PSS, and SST). This narrower selection results in 11 regions with 299 entries, again with overlapping memberships.
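Two things can be checked directly here: the difference between the two selections, and the reason why the row counts (379 and 299) exceed the 218 economies. The code vectors below are copied from R Code 2.11 and R Code 2.12. The small data frame is an illustrative toy extract, not the World Bank file.

```r
# The two code vectors from R Code 2.11 and R Code 2.12
str_reg  <- c("AFE", "AFW", "ARB", "CSS", "CEB",
              "EAS", "ECS", "LCN", "MEA", "NAC",
              "OSS", "PSS", "SST", "SAS", "SSF")
str_reg2 <- c("AFE", "AFW", "ARB", "CEB",
              "EAS", "ECS", "LCN", "MEA", "NAC",
              "SAS", "SSF")

# Codes dropped for the alternative grouping: the small-states groups
dropped <- setdiff(str_reg, str_reg2)
print(dropped)

# Why row counts exceed the number of countries: overlapping memberships.
# Toy extract of a compositions-like table (illustrative only)
toy <- data.frame(
  WB_Group_Code   = c("ECS", "EAS", "SST", "PSS", "ARB", "MEA"),
  WB_Country_Code = c("AUT", "FJI", "FJI", "FJI", "EGY", "EGY")
)
n_rows      <- nrow(toy)                            # group-country pairs
n_countries <- length(unique(toy$WB_Country_Code))  # distinct countries
memberships <- table(toy$WB_Country_Code)           # groups per country
print(memberships)
```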

2.2.3.1.1 Description of the four tabs

WB economies displays the “List of Economies” sheet and has five columns:
- Economy with the country names (Excel rows 2-219) and the group names (rows 221-268)
- Code with the ISO alpha-3 codes for the countries (rows 2-219) and the three-letter codes of the groups (rows 221-268)
- Region with seven different regional names:
  - East Asia & Pacific,
  - Europe & Central Asia,
  - Latin America & Caribbean,
  - Middle East & North Africa,
  - North America,
  - South Asia, and
  - Sub-Saharan Africa
- Income group with four groups: Low income, Lower middle income, Upper middle income, and High income.
- Lending category with three groups: IBRD, Blend, and IDA.
WB compositions has four columns: WB_Group_Code, WB_Group_Name, WB_Country_Code, and WB_Country_Name. Its 2085 rows combine each group (regional, income, lending, and other groups) with the three-letter codes and names of its member countries.
WB Standard shows the seven standard regional groups of the World Bank with their countries. The 218 countries in the World Bank taxonomy consist of all member countries of the World Bank (189) and all other economies with populations of more than 30,000 (29).
WB All includes the seven regions from the “WB Standard” tab and much more. Note, however, that apart from the overall category “World”, there is no alternative regional structure that systematically covers all countries of the world.
- Five of the seven regional groups of “WB Standard” are also available as clusters without the high-income countries.
- There are six other regional subcategories: “Arab World”, “Caribbean small states”, “Central Europe and Baltics”, “Other small states”, “Pacific island small states”, and “Small states”.
- Additionally, there are political groups such as the European Union and the OECD,
- economic groupings such as the “Euro area”, and
- different combinations of the four income groups and of the three lending categories.

More details

The cutoffs for the income groups (GNI per capita in US$) are:

low income, $1,145 or less;
lower middle income, $1,146 to $4,515;
upper middle income, $4,516 to $14,005; and
high income, more than $14,005.
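The cutoffs translate directly into a lookup with base::cut(). This sketch assumes whole-dollar GNI per capita values. With right = TRUE, the intervals are closed on the right, so $1,145 still counts as low income and $14,005 still as upper middle income.

```r
# Sketch: assign the income groups from GNI per capita (US$) using the
# cutoffs listed above. Intervals are (a, b], i.e. closed on the right.
classify_income <- function(gni_pc) {
  cut(gni_pc,
      breaks = c(-Inf, 1145, 4515, 14005, Inf),
      labels = c("Low income", "Lower middle income",
                 "Upper middle income", "High income"),
      right = TRUE)
}

# Values on both sides of each cutoff
groups <- classify_income(c(1145, 1146, 4515, 4516, 14005, 14006))
print(groups)
```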

The effective operational cutoff for IDA eligibility is $1,335 or less. The three lending categories and their relation to each other are:

IDA countries are those that lack the financial ability to borrow from IBRD. IDA credits are deeply concessional: interest-free loans and grants for programs aimed at boosting economic growth and improving living conditions. IBRD loans are non-concessional. Blend countries are eligible for IDA credits because of their low per capita incomes but are also eligible for IBRD because they are financially creditworthy.
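The relation between the three lending categories can be summarized in a deliberately simplified sketch. The function lending_category() and its creditworthy flag are my own illustration, derived from the description above and the $1,335 IDA cutoff. The actual World Bank rules include further exceptions that are not modelled here.

```r
# Deliberately simplified sketch of the lending logic described above
# (hypothetical, NOT the official World Bank rules):
# - income at or below the IDA cutoff, not creditworthy -> "IDA"
# - income at or below the IDA cutoff, creditworthy     -> "Blend"
# - income above the IDA cutoff                         -> "IBRD"
lending_category <- function(gni_pc, creditworthy, ida_cutoff = 1335) {
  ifelse(gni_pc > ida_cutoff, "IBRD",
         ifelse(creditworthy, "Blend", "IDA"))
}

cats <- lending_category(c(900, 900, 5000), c(FALSE, TRUE, TRUE))
print(cats)
```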

Three additional remarks on the Notes sheet:

The Notes contain the sentence: “Geographic classifications in this table cover all income levels.” Nevertheless, the Income group column has one more missing value than the Region column (50 vs. 49). The reason is that Venezuela, RB lacks an income group. It was classified as an upper middle income country until FY21 and has been temporarily unclassified since July 2021, pending the release of revised national accounts statistics. According to the World Bank page about Venezuela, RB, it is now again classified as upper middle income.
The term country, used interchangeably with economy, does not imply political independence but refers to any territory for which authorities report separate social or economic statistics.
What follows is a quote about some details of the income classifications for the 2023 file:

Set on 1 July 2022 remain in effect until 1 July 2023. Venezuela has been temporarily unclassified since July 2021 pending release of revised national accounts statistics. Argentina, which was temporarily unclassified in July 2016 pending release of revised national accounts statistics, was classified as upper middle income for FY17 as of 29 September 2016 based on alternative conversion factors. Also effective 29 September 2016, Syrian Arab Republic is reclassified from IBRD lending category to IDA-only. On 29 March 2017, new country codes were introduced to align World Bank 3-letter codes with ISO 3-letter codes: Andorra (AND), Dem. Rep. Congo (COD), Isle of Man (IMN), Kosovo (XKX), Romania (ROU), Timor-Leste (TLS), and West Bank and Gaza (PSE). It is to be noted that Venezuela, RB classified as an upper-middle income country until FY21, has been unclassified since then due to the unavailability of data.

2.2.3.1.2 Summary

The only missing value in the columns Economy and Code corresponds to the empty line #220, which separates the country codes from the group codes. The missing values in the other columns stem from the different structure of the second part of the data (starting with row #221), which uses only the two columns ‘Economy’ and ‘Code’.
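The two parts can also be separated at the first completely empty row, instead of hard-coding the row range as with dplyr::slice(1:218) in R Code 2.9. The following base-R sketch shows this idea with a hypothetical helper, split_at_blank_row(), on toy data. For the real sheet, it assumes that the first completely empty row is the separator.

```r
# Sketch: split a "List of economies"-like table at its first completely
# empty row instead of hard-coding the row range.
split_at_blank_row <- function(df) {
  blank <- which(rowSums(!is.na(df)) == 0)[1]   # first all-NA row
  if (is.na(blank)) stop("no empty separator row found")
  list(
    economies = df[seq_len(blank - 1), , drop = FALSE],
    # seq.int(..., length.out = 0) is empty if the blank row is the last one
    groups    = df[seq.int(blank + 1, length.out = nrow(df) - blank),
                   c("Economy", "Code"), drop = FALSE]
  )
}

# Toy data, not the World Bank file
toy <- data.frame(
  Economy = c("Afghanistan", "Albania", NA,
              "Africa Eastern and Southern", "Arab World"),
  Code    = c("AFG", "ALB", NA, "AFE", "ARB"),
  Region  = c("South Asia", "Europe & Central Asia", NA, NA, NA)
)
parts <- split_at_blank_row(toy)
print(parts)
```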

Essentially, this means that the wb-class.xlsx file contains two different data sets: one for the economies and one that explains the regional, economic, and political grouping codes. The Excel sheet compositions provides an extended list of all available group names and their three-letter codes, combined with the country names and their three-letter codes. These groups comprise different kinds of regional groups but also different combinations of income groups and lending categories.

All these groups may be of interest for analyzing different trends. However, the regional (sub)groups of the compositions sheet do not add up to the complete number of countries (218). This contrasts with the WID database, where each of the regional groupings (region1 = 5, region2 = 18, region4 = 10, and region5 = 8 groups) includes all countries (in this case 216).
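A small check can test whether a grouping covers all countries exactly once. The helper check_coverage() below is a hypothetical sketch. Its argument members holds the country codes listed in a grouping (one entry per membership), and universe holds the codes that should be covered. The helper reports missing, duplicated, and unexpected codes.

```r
# Sketch: does a grouping assign every country exactly once?
check_coverage <- function(members, universe) {
  universe <- unique(universe)
  # count memberships per country; codes outside universe become NA
  counts <- table(factor(members, levels = universe))
  list(
    missing            = universe[counts == 0],
    duplicated         = universe[counts > 1],
    extra              = setdiff(members, universe),
    complete_partition = all(counts == 1) && all(members %in% universe)
  )
}

universe <- c("AUT", "FJI", "EGY", "JPN")
res_overlap   <- check_coverage(c("AUT", "FJI", "FJI", "EGY"), universe)
res_partition <- check_coverage(c("AUT", "FJI", "EGY", "JPN"), universe)
str(res_overlap)
```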

The World Bank file wb-class.xlsx classifies all World Bank member countries (189) and all other economies with populations of more than 30,000 (29) into a coarse grid of only seven regions. For operational and analytical purposes, these economies are also divided into income groups according to their gross national income (GNI) per capita in 2023, calculated using the World Bank Atlas method.

2.2.3.2 United Nations

Code Collection 2.2 : Inspect UNSD-M49 geoscheme classification

R Code 2.13 : Inspect UNSD M49 geoscheme classification

Code

unsd_class <- base::readRDS(
  "data/country-class/unsd/rds/unsd_class.rds")
glue::glue("******************* Using skimr::skim() ***************************")
skimr::skim(unsd_class)
glue::glue("")
glue::glue("****************** Using dplyr::glimpse() *************************")
dplyr::glimpse(unsd_class)

#> ******************* Using skimr::skim() ***************************

Data summary
Name	unsd_class
Number of rows	249
Number of columns	15
_______________________
Column type frequency:
character	15
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	n_unique
Global Code	0	1.00	3	3	1
Global Name	0	1.00	5	5	1
Region Code	1	1.00	3	3	5
Region Name	1	1.00	4	8	5
Sub-region Code	1	1.00	3	3	17
Sub-region Name	1	1.00	9	31	17
Intermediate Region Code	141	0.43	3	3	8
Intermediate Region Name	141	0.43	9	15	8
Country or Area	0	1.00	4	52	249
M49 Code	0	1.00	3	3	249
ISO-alpha2 Code	2	0.99	2	2	247
ISO-alpha3 Code	1	1.00	3	3	248
Least Developed Countries (LDC)	203	0.18	1	1	1
Land Locked Developing Countries (LLDC)	217	0.13	1	1	1
Small Island Developing States (SIDS)	196	0.21	1	1	1

#> 
#> ****************** Using dplyr::glimpse() *************************
#> Rows: 249
#> Columns: 15
#> $ `Global Code`                             <chr> "001", "001", "001", "001", …
#> $ `Global Name`                             <chr> "World", "World", "World", "…
#> $ `Region Code`                             <chr> "002", "002", "002", "002", …
#> $ `Region Name`                             <chr> "Africa", "Africa", "Africa"…
#> $ `Sub-region Code`                         <chr> "015", "015", "015", "015", …
#> $ `Sub-region Name`                         <chr> "Northern Africa", "Northern…
#> $ `Intermediate Region Code`                <chr> NA, NA, NA, NA, NA, NA, NA, …
#> $ `Intermediate Region Name`                <chr> NA, NA, NA, NA, NA, NA, NA, …
#> $ `Country or Area`                         <chr> "Algeria", "Egypt", "Libya",…
#> $ `M49 Code`                                <chr> "012", "818", "434", "504", …
#> $ `ISO-alpha2 Code`                         <chr> "DZ", "EG", "LY", "MA", "SD"…
#> $ `ISO-alpha3 Code`                         <chr> "DZA", "EGY", "LBY", "MAR", …
#> $ `Least Developed Countries (LDC)`         <chr> NA, NA, NA, NA, "x", NA, NA,…
#> $ `Land Locked Developing Countries (LLDC)` <chr> NA, NA, NA, NA, NA, NA, NA, …
#> $ `Small Island Developing States (SIDS)`   <chr> NA, NA, NA, NA, NA, NA, NA, …

R Code 2.14 : Clean UNSD M49 geoscheme classification

Listing / Output 2.3: Script for data cleaning of the unsd_class.rds file as explained in Procedure 2.2

Code

## column renaming vector ########
m49_cols <- c(
  region_c = "Region Code", region_n = "Region Name",
  subr_c = "Sub-region Code", subr_n = "Sub-region Name", 
  midr_c = "Intermediate Region Code", midr_n = "Intermediate Region Name",
  country = "Country or Area", m49 = "M49 Code", 
  iso2 = "ISO-alpha2 Code", iso3 = "ISO-alpha3 Code",
  ldc = "Least Developed Countries (LDC)", 
  lldc = "Land Locked Developing Countries (LLDC)", 
  sids = "Small Island Developing States (SIDS)"
  )
  
## clean data ###############################
unsd_class <- base::readRDS(
  "data/country-class/unsd/rds/unsd_class.rds")
unsd_class_clean <- unsd_class |> 
  dplyr::select(-(1:2)) |>   # drop Global Code/Name (always "001"/"World")
  dplyr::rename(tidyselect::all_of(m49_cols)) |> 
  dplyr::filter(country != "Antarctica") |> 
  dplyr::mutate(iso2 = base::ifelse(country == "Namibia", "NA", iso2)) |> 
  dplyr::relocate(country, .before = region_c) |> 
  # .x = the current column inside the lambda (~); "x" marks membership,
  # NA (4th argument `missing` of if_else) becomes "0", "999" flags others
  dplyr::mutate(dplyr::across(
    ldc:sids, ~ dplyr::if_else(.x == "x", "1", "999", "0") 
    )) |> 
  dplyr::arrange(country)

## save new tibble ##########
my_save_data_file(
  "country-class/unsd/rds",
  unsd_class_clean,
  "unsd_class_clean.rds"
)


## prepare skimmers ##########
my_skim <- skimr::skim_with(
  character = skimr::sfl(
    whitespace = NULL,
    min = NULL,
    max = NULL,
    empty = NULL
    )
)


## display results ##########
glue::glue("******************* Using skimr::skim() ***************************")
my_skim(unsd_class_clean) |> dplyr::select(-complete_rate)
glue::glue("")
glue::glue("****************** Using dplyr::glimpse() *************************")
dplyr::glimpse(unsd_class_clean)

#> ******************* Using skimr::skim() ***************************

Data summary
Name	unsd_class_clean
Number of rows	248
Number of columns	13
_______________________
Column type frequency:
character	13
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	n_unique
country	0	248
region_c	0	5
region_n	0	5
subr_c	0	17
subr_n	0	17
midr_c	140	8
midr_n	140	8
m49	0	248
iso2	1	247
iso3	1	247
ldc	0	2
lldc	0	2
sids	0	2

#> 
#> ****************** Using dplyr::glimpse() *************************
#> Rows: 248
#> Columns: 13
#> $ country  <chr> "Afghanistan", "Albania", "Algeria", "American Samoa", "Andor…
#> $ region_c <chr> "142", "150", "002", "009", "150", "002", "019", "019", "019"…
#> $ region_n <chr> "Asia", "Europe", "Africa", "Oceania", "Europe", "Africa", "A…
#> $ subr_c   <chr> "034", "039", "015", "061", "039", "202", "419", "419", "419"…
#> $ subr_n   <chr> "Southern Asia", "Southern Europe", "Northern Africa", "Polyn…
#> $ midr_c   <chr> NA, NA, NA, NA, NA, "017", "029", "029", "005", NA, "029", NA…
#> $ midr_n   <chr> NA, NA, NA, NA, NA, "Middle Africa", "Caribbean", "Caribbean"…
#> $ m49      <chr> "004", "008", "012", "016", "020", "024", "660", "028", "032"…
#> $ iso2     <chr> "AF", "AL", "DZ", "AS", "AD", "AO", "AI", "AG", "AR", "AM", "…
#> $ iso3     <chr> "AFG", "ALB", "DZA", "ASM", "AND", "AGO", "AIA", "ATG", "ARG"…
#> $ ldc      <chr> "1", "0", "0", "0", "0", "1", "0", "0", "0", "0", "0", "0", "…
#> $ lldc     <chr> "1", "0", "0", "0", "0", "0", "0", "0", "0", "1", "0", "0", "…
#> $ sids     <chr> "0", "0", "0", "1", "0", "0", "1", "1", "0", "0", "1", "0", "…

R Code 2.15 : Display regions of UNSD class scheme

Code

df_unsd <-  base::readRDS(
  "data/country-class/unsd/rds/unsd_class_clean.rds")

(
    unsd_class1 <- class_scheme(
            df = df_unsd,
            sel1 = rlang::quo(`country`),
            sel2 = rlang::quo(`region_n`)
            )
)

R Code 2.16 : Display sub-regions of UNSD class scheme

Code

df_unsd <-  base::readRDS(
  "data/country-class/unsd/rds/unsd_class_clean.rds")

(
    unsd_class2 <- class_scheme(
            df = df_unsd,
            sel1 = rlang::quo(`country`),
            sel2 = rlang::quo(`subr_n`)
            )
)

R Code 2.17 : Display intermediate regions of UNSD class scheme

Code

df_unsd <-  base::readRDS(
  "data/country-class/unsd/rds/unsd_class_clean.rds")

(
    unsd_class3 <- class_scheme(
            df = df_unsd,
            sel1 = rlang::quo(`country`),
            sel2 = rlang::quo(`midr_n`)
            )
)

R Code 2.18 : Display alternative intermediate regions of UNSD class scheme

Code

unsd_class4 <-  base::readRDS(
  "data/country-class/unsd/rds/unsd_class_clean.rds")

unsd_class4 <- unsd_class4 |> 
  dplyr::mutate(midr_n2 = 
         base::ifelse(is.na(midr_n), subr_n, midr_n)
         )

(
    unsd_class4 <- class_scheme(
            df = unsd_class4,
            sel1 = rlang::quo(`country`),
            sel2 = rlang::quo(`midr_n2`)
            )
)

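midr_n2 takes the intermediate region where one exists and falls back to the sub-region otherwise. The result is a classification scheme with 248 countries in 23 regions. These are the 17 sub-regions, minus the two that are completely divided into intermediate regions (Sub-Saharan Africa and Latin America and the Caribbean), plus the 8 intermediate regions.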

2.2.3.2.1 Descriptions of the UNSD-M49 geoscheme classification

What follows is a description of the tabs in Code Collection 2.2.

Tab “raw”: The raw file unsd_class has 15 columns, as you can also see online on the Overview page. The many missing values (NAs) for the categories LDC, LLDC, and SIDS are easily explained: these three columns contain an ‘x’ if the country in this row belongs to the category and are empty otherwise.

One of the missing values for ISO-alpha2 codes belongs to Namibia, because its code “NA” is read as a missing value (readr's default is na = c("", "NA"))!
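The Namibia problem is easy to reproduce. The following sketch uses base R's utils::read.csv() on a toy semicolon-delimited file. With readr, the corresponding argument would be na = "" in readr::read_delim(), which likewise treats only empty fields as missing.

```r
# Toy CSV with Namibia's ISO-alpha2 code "NA" (semicolon-delimited like M49)
tmp <- tempfile(fileext = ".csv")
writeLines(c("country;iso2", "Namibia;NA", "Austria;AT"), tmp)

# Default na.strings = "NA": Namibia's code becomes a missing value
default_read <- utils::read.csv(tmp, sep = ";")
# Only empty fields count as missing: "NA" survives as a string
safe_read <- utils::read.csv(tmp, sep = ";", na.strings = "")

print(default_read$iso2)
print(safe_read$iso2)
```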

The other missing values for ISO-alpha2 and ISO-alpha3 relate to Sark, which is “recognized by the United Nations Statistics Division (UNSD) as a separate territory” but has been waiting for ISO codes for more than 20 years (McCarthy 2020). A recent application (see PDF) is expected to change that, but for now Sark is still waiting for ISO 3166 codes.

Tab “clean”: Recoding the columns “LDC”, “LLDC”, and “SIDS” with 1 and 0 (1 = belongs to this category, 0 = does not belong to this category) removes all their missing values. I have also repaired Namibia's “NA” code.

Tab “Region”, “Sub-Region” and “Intermediate Region”: One missing value in these regional categories relates to Antarctica, which the M49 scheme does not assign to any region. It therefore has no regional codes and names apart from the overall global region, but it does have an M49 code as well as ISO-alpha codes.

Procedure 2.2 : Cleaning the UNSD M49 data file

To clean the data, I have taken the following recoding actions in the script for the “clean” tab:

Remove the global codes and names because they are redundant: all rows have the global code “001” (“World”).
Rename the columns to get shorter names.
Remove Antarctica because it is not treated as a separate country.
Replace the NA in the column “ISO-alpha2 Code” for Namibia with the string “NA”.
Recode the columns LDC, LLDC and SIDS with 0 and 1.
Relocate the column “country” (previously “Country or Area”) to the first position, because then it is easier to find relevant content.
Sort the data alphabetically by “country”.
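The steps above can be sketched in base R as follows. This is a minimal sketch and not the author's script. The function name `clean_unsd()` is hypothetical, and the four toy rows are made up for illustration. The raw column names such as “Country or Area” and “ISO-alpha2 Code” are assumed to follow the UNSD download, and only a subset of them is shown. The flag columns are already given short names here.

```r
# Hypothetical cleaning function following the steps of Procedure 2.2
clean_unsd <- function(raw) {
  # 1. Drop the redundant global columns (every row is "001" / "World")
  df <- raw[setdiff(names(raw), c("Global Code", "Global Name"))]

  # 2. Shorten column names via a lookup vector (old name -> new name)
  new_names <- c("Country or Area" = "country",
                 "Region Name"     = "region",
                 "ISO-alpha2 Code" = "iso2")
  hit <- names(df) %in% names(new_names)
  names(df)[hit] <- unname(new_names[names(df)[hit]])

  # 3. Remove Antarctica
  df <- df[df$country != "Antarctica", , drop = FALSE]

  # 4. Restore Namibia's ISO-alpha2 code, read in as NA
  df$iso2[df$country == "Namibia" & is.na(df$iso2)] <- "NA"

  # 5. Recode flag columns: "x" -> 1, empty/NA -> 0
  for (col in c("LDC", "LLDC", "SIDS")) {
    df[[col]] <- as.integer(!is.na(df[[col]]) & df[[col]] == "x")
  }

  # 6. Move "country" to the first column
  df <- df[c("country", setdiff(names(df), "country"))]

  # 7. Sort alphabetically by country
  df <- df[order(df$country), , drop = FALSE]
  rownames(df) <- NULL
  df
}

# Toy raw data (illustrative rows only)
raw <- data.frame(
  "Global Code"     = "001",
  "Global Name"     = "World",
  "Region Name"     = c("Africa", NA, "Europe", "Asia"),
  "Country or Area" = c("Namibia", "Antarctica", "Austria", "Nepal"),
  "ISO-alpha2 Code" = c(NA, "AQ", "AT", "NP"),
  LDC  = c(NA, NA, NA, "x"),
  LLDC = c(NA, NA, NA, "x"),
  SIDS = NA_character_,
  check.names = FALSE
)

clean_unsd(raw)
```

The renaming uses a named lookup vector so that the mapping from old to new names stays in one place. With {dplyr}, the same steps would map onto `select()`, `rename()`, `filter()`, `mutate()`, `relocate()` and `arrange()`.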

2.2.3.2.2 Summary

2.3 Glossary

(Some of the abbreviations have an additional “x” at their end that is not part of the abbreviation. I chose this workaround to distinguish these abbreviations from the same text chunks in one of the glossary entries. This is a bug in the {glossary} package.)

term	definition
CSV	Text files where the values are separated with commas (Comma Separated Values = CSV). These files have the file extension .csv
GNIx	Gross National Income (GNI) is a measure of a country's income, which includes all the income earned by a country's residents, businesses, and earnings from foreign sources. It is defined as the total amount of money earned by a nation's people and businesses, no matter where it was earned. GNI is an alternative to GDP as a way to measure and track a nation’s wealth, as it calculates income instead of output.
IBRD	The International Bank for Reconstruction and Development (IBRD) is a global development cooperative owned by 189 member countries. As the largest development bank in the world, it supports the World Bank Group’s mission by providing loans, guarantees, risk management products, and advisory services to middle-income and creditworthy low-income countries, as well as by coordinating responses to regional and global challenges. (https://www.worldbank.org/en/who-we-are/ibrd)
IDAx	The International Development Association (IDA) is the part of the World Bank that helps the world’s low-income countries. IDA's grants and low-interest loans help countries invest in their futures, improve lives, and create safer, more prosperous communities around the world. (https://ida.worldbank.org/en/what-is-ida)
LDCx	The term “Least Developed Countries” (LDCs) refers to developing countries listed by the United Nations that exhibit the lowest indicators of socioeconomic development. As of December 2024, the classification applies to 44 countries. See https://unctad.org/topic/least-developed-countries/list
LLDC	Landlocked Developing Countries (LLDCs) are developing nations that do not have direct access to the sea. These countries face significant economic and development challenges due to their geographical isolation and the need to rely on neighboring countries for access to international markets. Of the 32 LLDCs, 16 are classified as LDCs (December 2024). See: https://www.un.org/ohrlls/content/about-landlocked-developing-countries
M49	The United Nations publication "Standard Country or Area Codes for Statistical Use" was originally published as Series M, No. 49 and is now commonly referred to as the M49 standard. M49 is a country/areas classification system prepared by the Statistics Division of the United Nations Secretariat primarily for use in its publications and databases.
OMNIKA	OMNIKA DataStore is an open-access data science resource for researchers, authors, and technologists. OMNIKA Foundation is an American 501(c)(3) nonprofit organization that operates a digital mythological library. Almost every culture has relevant mythology that explains where we came from, why things are the way they are, and a number of other things. OMNIKA's goal is to collect, organize, index, and quantify all of those data in one place and make them available for free. (https://omnika.org/info/about)
RDS	The abbreviation “RDS” in file endings `.rds` refers to “R Data Serialized”. It is a format used by the R programming language to serialize and store R objects, such as data frames, lists, and functions, in a compact and portable binary format.
SIDS	Small Island Developing States (SIDS) are a group of developing countries that are small island nations and territories facing similar sustainable development challenges. These countries are particularly vulnerable to environmental and economic shocks due to their small size, limited resources, and remote locations. The aggregate population of all the SIDS is 65 million. See: https://www.un.org/ohrlls/content/about-small-island-developing-states
UNSD	The United Nations Statistics Division (UNSD) is committed to the advancement of the global statistical system. It compiles and disseminates global statistical information, develops standards and norms for statistical activities, and supports countries' efforts to strengthen their national statistical systems.
WHR	The World Happiness Report is a partnership of Gallup, the Oxford Wellbeing Research Centre, the UN Sustainable Development Solutions Network, and the WHR’s Editorial Board. The report is produced under the editorial control of the WHR Editorial Board. The Report reflects a worldwide demand for more attention to happiness and well-being as criteria for government policy. It reviews the state of happiness in the world today and shows how the science of happiness explains personal and national variations in happiness. (https://worldhappiness.report/about/)
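The RDS entry can be illustrated with a minimal sketch using a temporary file and a toy data frame. `saveRDS()` serializes a single R object, and `readRDS()` restores it unchanged. For example, the string "NA" for Namibia survives the round trip, which a CSV import with default settings does not guarantee.

```r
# Toy data frame; "NA" is a character string, not a missing value
df <- data.frame(country = c("Austria", "Namibia"), iso2 = c("AT", "NA"))

# Serialize to a temporary .rds file and read it back
path <- tempfile(fileext = ".rds")
saveRDS(df, path)
restored <- readRDS(path)

identical(df, restored)
# [1] TRUE
```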

Session Info


sessioninfo::session_info()

#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.4.3 (2025-02-28)
#>  os       macOS Sequoia 15.3.2
#>  system   aarch64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Europe/Vienna
#>  date     2025-04-06
#>  pandoc   3.2 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/aarch64/ (via rmarkdown)
#>  quarto   1.6.42 @ /usr/local/bin/quarto
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  base64enc     0.1-3   2015-07-28 [1] CRAN (R 4.4.1)
#>  bslib         0.9.0   2025-01-30 [1] CRAN (R 4.4.1)
#>  cachem        1.1.0   2024-05-16 [1] CRAN (R 4.4.1)
#>  cli           3.6.4   2025-02-13 [1] CRAN (R 4.4.1)
#>  colorspace    2.1-1   2024-07-26 [1] CRAN (R 4.4.1)
#>  commonmark    1.9.2   2024-10-04 [1] CRAN (R 4.4.1)
#>  crayon        1.5.3   2024-06-20 [1] CRAN (R 4.4.1)
#>  crosstalk     1.2.1   2023-11-23 [1] CRAN (R 4.4.0)
#>  curl          6.2.1   2025-02-19 [1] CRAN (R 4.4.1)
#>  digest        0.6.37  2024-08-19 [1] CRAN (R 4.4.1)
#>  dplyr         1.1.4   2023-11-17 [1] CRAN (R 4.4.0)
#>  DT            0.33    2024-04-04 [1] CRAN (R 4.4.0)
#>  evaluate      1.0.3   2025-01-10 [1] CRAN (R 4.4.1)
#>  fastmap       1.2.0   2024-05-15 [1] CRAN (R 4.4.1)
#>  generics      0.1.3   2022-07-05 [1] CRAN (R 4.4.1)
#>  ggplot2       3.5.1   2024-04-23 [1] CRAN (R 4.4.0)
#>  glossary    * 1.0.0   2023-05-30 [1] CRAN (R 4.4.0)
#>  glue          1.8.0   2024-09-30 [1] CRAN (R 4.4.1)
#>  gtable        0.3.6   2024-10-25 [1] CRAN (R 4.4.1)
#>  here          1.0.1   2020-12-13 [1] CRAN (R 4.4.1)
#>  htmltools     0.5.8.1 2024-04-04 [1] CRAN (R 4.4.1)
#>  htmlwidgets   1.6.4   2023-12-06 [1] CRAN (R 4.4.0)
#>  jquerylib     0.1.4   2021-04-26 [1] CRAN (R 4.4.0)
#>  jsonlite      1.9.1   2025-03-03 [1] CRAN (R 4.4.1)
#>  kableExtra    1.4.0   2024-01-24 [1] CRAN (R 4.4.0)
#>  knitr         1.49    2024-11-08 [1] CRAN (R 4.4.1)
#>  lifecycle     1.0.4   2023-11-07 [1] CRAN (R 4.4.1)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.4.1)
#>  markdown      1.13    2024-06-04 [1] CRAN (R 4.4.1)
#>  munsell       0.5.1   2024-04-01 [1] CRAN (R 4.4.1)
#>  pillar        1.10.1  2025-01-07 [1] CRAN (R 4.4.1)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.4.1)
#>  purrr         1.0.4   2025-02-05 [1] CRAN (R 4.4.1)
#>  R6            2.6.1   2025-02-15 [1] CRAN (R 4.4.1)
#>  repr          1.1.7   2024-03-22 [1] CRAN (R 4.4.0)
#>  rlang         1.1.5   2025-01-17 [1] CRAN (R 4.4.1)
#>  rmarkdown     2.29    2024-11-04 [1] CRAN (R 4.4.1)
#>  rprojroot     2.0.4   2023-11-05 [1] CRAN (R 4.4.1)
#>  rstudioapi    0.17.1  2024-10-22 [1] CRAN (R 4.4.1)
#>  rversions     2.1.2   2022-08-31 [1] CRAN (R 4.4.1)
#>  sass          0.4.9   2024-03-15 [1] CRAN (R 4.4.0)
#>  scales        1.3.0   2023-11-28 [1] CRAN (R 4.4.0)
#>  sessioninfo   1.2.3   2025-02-05 [1] CRAN (R 4.4.1)
#>  skimr         2.1.5   2022-12-23 [1] CRAN (R 4.4.0)
#>  stringi       1.8.4   2024-05-06 [1] CRAN (R 4.4.1)
#>  stringr       1.5.1   2023-11-14 [1] CRAN (R 4.4.0)
#>  svglite       2.1.3   2023-12-08 [1] CRAN (R 4.4.0)
#>  systemfonts   1.2.1   2025-01-20 [1] CRAN (R 4.4.1)
#>  tibble        3.2.1   2023-03-20 [1] CRAN (R 4.4.0)
#>  tidyr         1.3.1   2024-01-24 [1] CRAN (R 4.4.1)
#>  tidyselect    1.2.1   2024-03-11 [1] CRAN (R 4.4.0)
#>  vctrs         0.6.5   2023-12-01 [1] CRAN (R 4.4.0)
#>  viridisLite   0.4.2   2023-05-02 [1] CRAN (R 4.4.1)
#>  withr         3.0.2   2024-10-28 [1] CRAN (R 4.4.1)
#>  xfun          0.51    2025-02-19 [1] CRAN (R 4.4.1)
#>  xml2          1.3.7   2025-02-28 [1] CRAN (R 4.4.1)
#>  yaml          2.3.10  2024-07-26 [1] CRAN (R 4.4.1)
#> 
#>  [1] /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library
#>  * ── Packages attached to the search path.
#> 
#> ──────────────────────────────────────────────────────────────────────────────

References

Berroth, Markus. 2019. “Bang Bang - How to Program with Dplyr.” https://www.statworx.com/en/content-hub/blog/bang-bang-how-to-program-with-dplyr/.

McCarthy, Kieren. 2020. “After 20-Year Battle, Channel Island Sark Finally Earns the Right to Exist on the Internet with Its Own Top-Level Domain.” https://www.theregister.com/2020/03/23/sark_cctld_iso/.
