Country and Regions: Classifications for countries
Understanding the country classification used by my WHR data sources (Section 2.1).
Inspecting different approaches to classify countries by international organizations:
World Bank and
United Nations Statistics Division
Modifying or adding country classifications in the WHR data where necessary so that they conform to the internationally recognized and approved systems.
I aim to compare different aspects of countries. I want, for instance, to know how well Austria is doing compared to other European countries, the other member states of the European Union, or other OECD countries. It is, therefore, vital to have a consistent categorization system with different grouping schemes.
2.1 Country groupings in WHR
Some of the WHR years (WHR reports 2013, 2015, 2020, and 2021) have a regional grouping incorporated. However, the (new) 2024 dataset from the WHR report 2025 does not include this grouping.
2.1.1 Classification of WHR 2020
I will display the regional grouping of the WHR data with the example from the year 2020 (WHR report 2021).
R Code 2.1 : WHR data 2020 classification
Listing / Output 2.1: Result of the WHR classification system for WHR data 2020 used in the WHR report 2021.
There are 10 different regional indicators. The datasets for 2013, 2015, 2020 and 2021 all use the same classification scheme with 149 countries in 10 regions.
These 149 are by far not all countries of the world. The complete number is about 195, with some uncertainty about the Holy See (Vatican), the State of Palestine, Taiwan and Kosovo. (Compare: How Many Countries Are There In The World?) The reason for the lower number is simple: subjective well-being data for the study year 2020 are available for only those 149 countries.
2.1.2 class_scheme() function
As I am going to list several classification variants, it is worth the effort to write a function for this repetitive task.
R Code 2.2 : Function class_scheme() for showing classification schemes
Listing / Output 2.2: Function class_scheme() for showing results of a classification system
Code
class_scheme <- function(df, sel1, sel2) {
  ## df = dataframe to show
  ## sel1 = name of the first column (country names) to select
  ## sel2 = name of the column with the regional indicator
  df |>
    dplyr::select(!!sel1, !!sel2) |>
    dplyr::nest_by(!!sel2) |>
    dplyr::mutate(data = as.vector(data)) |>
    dplyr::mutate(data = stringr::str_c(data, collapse = "; ")) |>
    dplyr::mutate(data = paste(data, ";")) |>
    dplyr::mutate(N = lengths(gregexpr(";", data))) |>
    dplyr::rename(Country = data) |>
    dplyr::arrange(!!sel2) |>
    DT::datatable(
      class = 'cell-border compact stripe',
      options = list(
        pageLength = 25,
        lengthMenu = c(5, 10, 15, 20, 25, 50)
      )
    )
}
Here I am using complex code lines. Using {dplyr} programming code in functions needs some special consideration. I have learned the details from “Bang Bang – How to program with dplyr” (Berroth 2019).
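As a point of comparison, the same kind of column passing can also be written with the embrace operator `{{ }}` instead of `rlang::quo()` plus `!!`. The following is only a minimal sketch under my own assumptions: the function name `class_scheme_sketch()` and the toy data are hypothetical, and it replaces the nesting and string steps of `class_scheme()` with a plain `summarise()`, so it is not a drop-in replacement.

```r
## Hypothetical simplified variant of class_scheme(): columns are passed
## unquoted and forwarded with the embrace operator {{ }}.
class_scheme_sketch <- function(df, country_col, region_col) {
  df |>
    dplyr::group_by({{ region_col }}) |>
    dplyr::summarise(
      ## one string with all countries of the region, separated by "; "
      Country = base::paste({{ country_col }}, collapse = "; "),
      ## number of countries in the region
      N = dplyr::n(),
      .groups = "drop"
    ) |>
    dplyr::arrange({{ region_col }})
}

## Toy data (illustrative only, not the WHR file)
toy <- tibble::tibble(
  `Country name` = c("Austria", "Germany", "Japan"),
  `Regional indicator` = c("Western Europe", "Western Europe", "East Asia")
)

res <- class_scheme_sketch(toy, `Country name`, `Regional indicator`)
res
```

With this variant the caller does not have to wrap the column names in `rlang::quo()`; the trade-off is that the function then only accepts bare column names.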
2.1.3 WHR 2020 with class_scheme() function
As the class_scheme() function is now in place, I can use it to display the different grouping schemes. First I will try it out with the WHR data from the 2021 report:
R Code 2.3 : Classification of the WHR data
It worked! I got the same result as in Listing / Output 2.1.
2.2 Official classifications
There are already different classification systems in place: International organizations (e.g., World Bank, United Nations) have developed them with several grouping variants.
I will look into these two official classification schemes of the World Bank and the United Nations and apply the following procedure:
Procedure 2.1 : Understand structure and content of the official classification schemes
Create a directory for storing the different country classification files (see Section 2.2.1).
Download classification files and store them for faster access as R objects with rds format (see Section 2.2.2.1 and Section 2.2.2.2).
Inspect the data classification files of World Bank (Section 2.2.3.1) and of the United Nations (Section 2.2.3.2) in detail.
2.2.1 Create data directories
R Code 2.4 : Create folders for country classification files
2.2.2 Download classification files
2.2.2.1 World Bank
The World Bank Classification can be downloaded from How does the World Bank classify countries?. Near the bottom of the page you can see the line “Download an Excel file of historical classifications by income.”, providing a link with the word “Download”. The downloaded file CLASS.xlsx does not contain a historical classification by income but the general classification system of the last available year (2023).
There is indeed another Excel file, OGHIST.xlsx, with the historical cutoffs for incomes and lending categories, dating from 1987 to 2023. But the download link for this file is located on another web page: World Bank Country and Lending Groups. On this page you will also find the updated cutoffs for GNI per capita, which are important for the lending eligibility of countries. World Bank country classifications by income level for 2024-2025 has the current updated values and the changes over the last year.
The file CLASS.xlsx I am interested in here consists of three sheets:
“List of Economies”
“compositions” and
“Notes”
I will download the original Excel file with all its sheets and programmatically save
the Excel file with all its sheets,
CSV snapshots of all sheets (file extension = .csv) and
R objects of all sheets (file extension = .rds).
The CSV snapshots support reproducibility because they store the content of the proprietary Excel file in a tool-agnostic, future-proof format. I am using code inspired by the vignette/article readxl Workflows.
R Code 2.5 : Download the World Bank CLASS Excel file
Run this code chunk manually if the file still needs to be downloaded.
Code
url_excel <- "https://datacatalogfiles.worldbank.org/ddh-published/0037712/DR0090755/CLASS.xlsx"
path_wb_excel <- base::paste0(here::here(), "/data/country-class/wb/excel/wb-class.xlsx")
path_wb_csv <- base::paste0(here::here(), "/data/country-class/wb/csv/")
path_wb_rds <- base::paste0(here::here(), "/data/country-class/wb/rds/")

## download wb-class file ##############
downloader::download(
  url = url_excel,
  destfile = path_wb_excel
)

## from readxl workflow article ##############
## includes also my_excel_as_csv_and_rds() function
path_wb_excel |>
  readxl::excel_sheets() |>
  rlang::set_names() |>
  purrr::map(
    my_excel_as_csv_and_rds,
    path_excel = path_wb_excel,
    path_csv = path_wb_csv,
    path_rds = path_wb_rds
  )
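The helper `my_excel_as_csv_and_rds()` is defined elsewhere and not shown here. To make the workflow easier to follow, here is a hedged sketch of what such a helper could look like. The function name `excel_sheet_as_csv_and_rds()` and its body are my assumptions, modelled on the readxl Workflows article and on the file naming visible in this chapter (for instance `wb-class-List of economies.rds`); the actual helper may differ. To keep the sketch runnable without a download, it uses the example workbook shipped with {readxl} and writes into a temporary directory.

```r
## Hypothetical stand-in for my_excel_as_csv_and_rds(): read one sheet and
## store it as <workbook>-<sheet>.csv and <workbook>-<sheet>.rds.
excel_sheet_as_csv_and_rds <- function(sheet, path_excel, path_csv, path_rds) {
  path_base <- tools::file_path_sans_ext(base::basename(path_excel))
  df <- readxl::read_excel(path_excel, sheet = sheet)
  utils::write.csv(
    df,
    base::paste0(path_csv, path_base, "-", sheet, ".csv"),
    row.names = FALSE
  )
  base::saveRDS(df, base::paste0(path_rds, path_base, "-", sheet, ".rds"))
  base::invisible(df)
}

## Example workbook shipped with readxl (not the World Bank file)
path_excel <- readxl::readxl_example("datasets.xlsx")

## Temporary output folders; paths end with "/" as in the chunk above
base::dir.create(base::file.path(base::tempdir(), "csv"), showWarnings = FALSE)
base::dir.create(base::file.path(base::tempdir(), "rds"), showWarnings = FALSE)
path_csv <- base::paste0(base::tempdir(), "/csv/")
path_rds <- base::paste0(base::tempdir(), "/rds/")

## One CSV and one RDS file per sheet
sheets <- path_excel |>
  readxl::excel_sheets() |>
  rlang::set_names() |>
  purrr::map(
    excel_sheet_as_csv_and_rds,
    path_excel = path_excel,
    path_csv = path_csv,
    path_rds = path_rds
  )

base::list.files(path_rds)
```

Naming the list with `rlang::set_names()` before mapping keeps the sheet names attached to the returned data frames, which is convenient for a quick inspection.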
2.2.2.2 UNSD-M49
Another, more detailed classification system, expressly designed for statistical purposes, was developed by the United Nations Statistics Division (UNSD) using the M49 methodology.
The result is called Standard country or area codes for statistical use (M49) and can be downloaded manually in different languages and formats (copy to the clipboard, Excel or CSV) from the Overview page. The “Overview” page offers no URL that an R script could use, because the buttons copy or download the data with the help of JavaScript. So I had to either download the file manually or find another location from which I could download it programmatically.
R Code 2.7 : Inspect sheet List of Economies of the World Bank classification file
Code
wb_class_economies <- base::readRDS("data/country-class/wb/rds/wb-class-List of economies.rds")

glue::glue("******************* Using skimr::skim() ***************************")
skimr::skim(wb_class_economies)
glue::glue("")
glue::glue("****************** Using dplyr::glimpse() *************************")
dplyr::glimpse(wb_class_economies)
#> ******************* Using skimr::skim() ***************************
R Code 2.8 : Inspect sheet compositions of the World Bank classification file
Code
wb_class_compositions <- base::readRDS("data/country-class/wb/rds/wb-class-compositions.rds")

glue::glue("******************* Using skimr::skim() ***************************")
skimr::skim(wb_class_compositions)
glue::glue("")
glue::glue("****************** Using dplyr::glimpse() *************************")
dplyr::glimpse(wb_class_compositions)
#> ******************* Using skimr::skim() ***************************
WB_Group_Name in the “compositions” file contains all available groups. They are not restricted to regional groups because they are formed by economic and political criteria as well. There is no 1:1 match between countries and groups, because almost all countries belong to two or more groups. There are 48 groups with a total of 2085 elements.
R Code 2.11 : Groups formed by regional criteria (without the redundant World region)
Browsing through the composition data, I have defined 15 WB_GROUP_CODEs as regional codes. By definition, these regional criteria result in 15 regions containing 379 countries (a country is counted once for every regional group it belongs to).
R Code 2.12 : Groups formed by regional criteria (without the redundant World region)
Browsing through the composition data, I have excluded all small-states groups to get an alternative regional grouping. This regional classification is smaller and results in 11 regions containing 299 countries.
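The counting logic behind these numbers can be sketched on toy data. This is only an illustration under my own assumptions: the rows and the choice of "regional" group codes below are made up for the example and should be checked against the real compositions sheet; only the four column names are taken from the file described in this chapter. The sketch shows why the number of group memberships is larger than the number of distinct countries.

```r
## Toy excerpt in the shape of the "compositions" sheet (illustrative rows)
compositions_toy <- tibble::tribble(
  ~WB_Group_Code, ~WB_Group_Name,                 ~WB_Country_Code, ~WB_Country_Name,
  "ECS",          "Europe & Central Asia",        "AUT",            "Austria",
  "EUU",          "European Union",               "AUT",            "Austria",
  "ECS",          "Europe & Central Asia",        "POL",            "Poland",
  "CEB",          "Central Europe and Baltics",   "POL",            "Poland",
  "EUU",          "European Union",               "POL",            "Poland",
  "MEA",          "Middle East & North Africa",   "EGY",            "Egypt",
  "ARB",          "Arab World",                   "EGY",            "Egypt"
)

## Assumption: these codes are treated as regional groups in this toy example
regional_codes <- c("ECS", "CEB", "MEA", "ARB")

regional <- compositions_toy |>
  dplyr::filter(WB_Group_Code %in% regional_codes)

n_groups      <- dplyr::n_distinct(regional$WB_Group_Code)   # regional groups
n_memberships <- base::nrow(regional)                         # country-group pairs
n_countries   <- dplyr::n_distinct(regional$WB_Country_Code)  # distinct countries

## Countries per regional group
regional |>
  dplyr::count(WB_Group_Code, WB_Group_Name, name = "N")
```

In the toy data, Poland and Egypt each appear in two regional groups, so five memberships correspond to only three distinct countries.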
2.2.3.1.1 Description of the four tabs
WB economies displays the “List of Economies” and has five columns:
Economy with the country names (2-219) and regional names (221-268)
Code with the ISO alpha3 codes for countries (2-219) and for the regional names (221-268)
Region with seven different regional names:
East Asia and Pacific,
Europe and Central Asia,
Latin America & the Caribbean,
Middle East and North Africa,
North America,
South Asia and
Sub-Saharan Africa
Income group with four groups: Low income, Lower middle income, Upper middle income, and High income.
Lending category with three groups: IBRD, Blend, and IDA.
WB compositions has four columns: WB_Group_Code, WB_Group_Name, WB_Country_Code, WB_Country_Name. The 2084 rows are combinations of the regional and income groups with their ISO alpha 3 codes and country names.
WB Standard shows the World Bank's seven standard regional groups with their countries. The 218 countries covered by the World Bank taxonomy consist of all member countries of the World Bank (189) and other economies with populations of more than 30,000 (29).
WB All includes the seven regions from the “WB Standard” tab and many more groups. But it is important to note that there is no alternative regional structure that systematically comprises all countries of the world (the overall category “World” obviously excluded).
Five of the seven regional groups of “WB Standard” are also clustered without high income countries.
There are six other regional subcategories: “Arab World”, “Caribbean small states”, “Central Europe and Baltics”, “Other small states”, “Pacific island small states”, “Small states”.
Additionally, there are some political groups such as the European Union and the OECD,
several economic classifications such as “Euro area”, and
different combinations of the four income groups and of the three lending statuses.
More details
The cutoff limits for the income groups (GNI per capita) are:
low income, $1,145 or less;
lower middle income, $1,146 to $4,515;
upper middle income, $4,516 to $14,005; and
high income, more than $14,005.
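The cutoffs listed above can be turned into a small classification function. This is a minimal sketch: the function name `income_group()` is hypothetical and the GNI per capita values in the usage example are made up; only the thresholds and group names come from the list above.

```r
## Assign an income group to GNI per capita values (in US dollars),
## using the cutoffs listed above. Intervals are closed on the right,
## so 1145 is still "Low income" and 14005 is still "Upper middle income".
income_group <- function(gni_per_capita) {
  base::cut(
    gni_per_capita,
    breaks = c(-Inf, 1145, 4515, 14005, Inf),
    labels = c(
      "Low income",
      "Lower middle income",
      "Upper middle income",
      "High income"
    ),
    right = TRUE
  )
}

## Made-up values, chosen to sit on or next to the boundaries
example_groups <- income_group(c(1145, 1146, 14005, 14006))
example_groups
```

Because the cutoffs change every year, the `breaks` vector would have to be updated for any other classification year.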
The effective operational cutoff for IDA eligibility is $1,335 or less. The three lending categories and their relation to each other are:
IDA countries are those that lack the financial ability to borrow from IBRD. IDA credits are deeply concessional—interest-free loans and grants for programs aimed at boosting economic growth and improving living conditions. IBRD loans are non-concessional. Blend countries are eligible for IDA credits because of their low per capita incomes but are also eligible for IBRD because they are financially creditworthy.
Three additional remarks relating to the Notes sheet:
In the Notes I found the sentence: “Geographic classifications in this table cover all income levels.” But the Income group column has one more missing value than the Region column (50 vs. 49). The reason is that Venezuela, RB lacks an income group: it was classified as an upper-middle income country until FY21 and has been temporarily unclassified since July 2021, pending the release of revised national accounts statistics. It is now classified as Upper middle income again (see the World Bank page about Venezuela, RB).
The term country, used interchangeably with economy, does not imply political independence but refers to any territory for which authorities report separate social or economic statistics.
What follows is a quote about some details of the income classifications for the 2023 file:
Set on 1 July 2022 remain in effect until 1 July 2023. Venezuela has been temporarily unclassified since July 2021 pending release of revised national accounts statistics. Argentina, which was temporarily unclassified in July 2016 pending release of revised national accounts statistics, was classified as upper middle income for FY17 as of 29 September 2016 based on alternative conversion factors. Also effective 29 September 2016, Syrian Arab Republic is reclassified from IBRD lending category to IDA-only. On 29 March 2017, new country codes were introduced to align World Bank 3-letter codes with ISO 3-letter codes: Andorra (AND), Dem. Rep. Congo (COD), Isle of Man (IMN), Kosovo (XKX), Romania (ROU), Timor-Leste (TLS), and West Bank and Gaza (PSE). It is to be noted that Venezuela, RB classified as an upper-middle income country until FY21, has been unclassified since then due to the unavailability of data.
2.2.3.1.2 Summary
The only missing data in the columns Economy and Code corresponds to the empty line #220 that separates the country codes from the regional codes. The missing data in the other columns stem from the different structure of the second part (starting with row #221) of the data, which consists only of the two columns ‘Economy’ and ‘Code’.
Essentially this means that the wb-class.xlsx file contains two different data sets: one for economies and another one that explains the regional, economic and political grouping codes. In the Excel sheet compositions you will find an extended list of all available group names and their three-letter codes, combined with the country names and their three-letter codes. These group names comprise different kinds of regional groups but also names and codes for different combinations of country incomes and lending categories.
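Splitting the sheet into these two data sets can be sketched as follows. The toy tibble only mimics the structure described above (country rows, one empty separator row, then group rows with just Economy and Code); its values are illustrative, and the object names are my own assumptions.

```r
## Toy data in the shape of the "List of economies" sheet (illustrative values)
economies_toy <- tibble::tribble(
  ~Economy,     ~Code, ~Region,                 ~`Income group`,       ~`Lending category`,
  "Austria",    "AUT", "Europe & Central Asia", "High income",         NA,
  "Kenya",      "KEN", "Sub-Saharan Africa",    "Lower middle income", "Blend",
  NA,           NA,    NA,                      NA,                    NA,
  "Arab World", "ARB", NA,                      NA,                    NA,
  "World",      "WLD", NA,                      NA,                    NA
)

## Position of the empty separator row (first row without an Economy name)
sep_row <- base::which(base::is.na(economies_toy$Economy))[1]

## Part 1: the economies with all five columns
wb_countries <- economies_toy |>
  dplyr::slice(1:(sep_row - 1))

## Part 2: the group codes, which only use the first two columns
wb_groups <- economies_toy |>
  dplyr::slice((sep_row + 1):dplyr::n()) |>
  dplyr::select(Economy, Code)

wb_countries
wb_groups
```

Locating the separator by its missing Economy value avoids hard-coding the row number, which may shift when the World Bank adds or removes economies.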
All these groups may be of interest for the analysis of different trends. But the regional (sub)groups of the compositions sheet do not add up to the complete number of countries (218). This is in contrast to the different regional groups of the WID database, because all their regional groups (region1 = 5, region2 = 18, region4 = 10, and region5 = 8 groups) include all countries (in this case: 216).
The World Bank file wb-class.xlsx classifies all World Bank member countries (189), and all other economies with populations of more than 30,000 (29) in a coarse grid of only seven regions. For operational and analytical purposes, these economies are divided among income groups according to their gross national income (GNI) per capita in 2023, calculated using the World Bank Atlas method.
R Code 2.13 : Inspect UNSD M49 geoscheme classification
Code
unsd_class <- base::readRDS("data/country-class/unsd/rds/unsd_class.rds")

glue::glue("******************* Using skimr::skim() ***************************")
skimr::skim(unsd_class)
glue::glue("")
glue::glue("****************** Using dplyr::glimpse() *************************")
dplyr::glimpse(unsd_class)
#> ******************* Using skimr::skim() ***************************
R Code 2.14 : Clean UNSD M49 geoscheme classification
Listing / Output 2.3: Script for data cleaning of the unsd_class.rds file as explained in Procedure 2.2
Code
## column renaming vector ########
m49_cols <- c(
  region_c = "Region Code",
  region_n = "Region Name",
  subr_c = "Sub-region Code",
  subr_n = "Sub-region Name",
  midr_c = "Intermediate Region Code",
  midr_n = "Intermediate Region Name",
  country = "Country or Area",
  m49 = "M49 Code",
  iso2 = "ISO-alpha2 Code",
  iso3 = "ISO-alpha3 Code",
  ldc = "Least Developed Countries (LDC)",
  lldc = "Land Locked Developing Countries (LLDC)",
  sids = "Small Island Developing States (SIDS)"
)

## clean data ###############################
unsd_class <- base::readRDS("data/country-class/unsd/rds/unsd_class.rds")

unsd_class_clean <- unsd_class |>
  ## drop the two global columns (code and name of "World")
  dplyr::select(-(1:2)) |>
  dplyr::rename(tidyselect::all_of(m49_cols)) |>
  dplyr::filter(country != "Antarctica") |>
  dplyr::mutate(iso2 = base::ifelse(country == "Namibia", "NA", iso2)) |>
  dplyr::relocate(country, .before = region_c) |>
  ## .x = current column inside the lambda; "x" = marker value in the raw data
  ## "x" -> "1", missing -> "0"; "999" would flag any unexpected value
  dplyr::mutate(dplyr::across(ldc:sids, ~ dplyr::if_else(.x == "x", "1", "999", "0"))) |>
  dplyr::arrange(country)

## save new tibble ##########
my_save_data_file("country-class/unsd/rds", unsd_class_clean, "unsd_class_clean.rds")

## prepare skimmers ##########
my_skim <- skimr::skim_with(
  character = skimr::sfl(
    whitespace = NULL,
    min = NULL,
    max = NULL,
    empty = NULL
  )
)

## display results ##########
unsd_class <- base::readRDS("data/country-class/unsd/rds/unsd_class.rds")
glue::glue("******************* Using skimr::skim() ***************************")
my_skim(unsd_class_clean) |> dplyr::select(-complete_rate)
glue::glue("")
glue::glue("****************** Using dplyr::glimpse() *************************")
dplyr::glimpse(unsd_class_clean)
#> ******************* Using skimr::skim() ***************************
Tab “raw”: The raw file unsd_class has 15 columns, as you can also see online on the Overview page. The many missing values (NAs) for the categories LDC, LLDC and SIDS are easily explained: These three columns are coded with an ‘x’ if the country in this row belongs to the category.
One of the missing values for ISO-alpha2 codes belongs to Namibia because its abbreviation NA is interpreted by R as a missing value!
The other missing values for ISO-alpha2 and ISO-alpha3 are related to Sark, which is “recognized by the United Nations Statistics Division (UNSD) as a separate territory” but has not been accepted by ISO for more than 20 years (McCarthy 2020). A recent new application (see PDF) may change that, but currently Sark is still waiting for ISO 3166 codes.
Tab “clean”: Recoding the columns “LDC”, “LLDC” and “SIDS” with 1 and 0 (1 = yes, belongs to this category, 0 = no, does not belong to this category) removes most of their missing values. I have also recoded “Namibia” to repair its “NA” value.
Tab “Region”, “Sub-Region” and “Intermediate Region”: One missing value in these regional categories is related to Antarctica, which the M49 scheme does not treat as a separate region. It therefore has no regional codes and names, with the exception of the all-encompassing global region. But it has an M49 code as well as ISO-alpha codes.
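The Namibia problem can be reproduced with a few lines of base R. This is a sketch under my own assumptions: the two-row CSV text is made up, and I do not know which import function produced the raw file used here. It shows the default behaviour, the repair after import that corresponds to the recoding described above, and the alternative of switching off the "NA" conversion at import time.

```r
## Made-up two-row CSV with Namibia's ISO-alpha2 code "NA"
csv_text <- "country,iso2\nNamibia,NA\nAustria,AT\n"

## Default import: the string "NA" is converted into a missing value
default_read <- utils::read.csv(text = csv_text)
base::is.na(default_read$iso2)

## Repair after import, analogous to the recoding used in the cleaning script
repaired <- default_read |>
  dplyr::mutate(iso2 = base::ifelse(country == "Namibia", "NA", iso2))

## Alternative: tell the import function that only empty cells are missing
safe_read <- utils::read.csv(text = csv_text, na.strings = "")
safe_read$iso2
```

The second approach only works if no other column relies on the literal text "NA" to mark missing values.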
Procedure 2.2 : Cleaning the UNSD M49 data file
To clean the data I have taken the following recoding actions in the script for the “clean” tab:
Remove the global codes and names because they are redundant: All rows have the global code “001” (“World”).
Rename the columns to get shorter names.
Remove Antarctica because it is not seen as separate country.
Replace NA in the column “ISO-alpha2 Code” of Namibia with the string “NA”.
Recode the columns LDC, LLDC and SIDS with 0 and 1.
Relocate the column “country” (previously “Country or Area”) to the first position because then it is easier to find relevant content.
Sort the data alphabetically by “country”.
2.2.3.2.2 Summary
2.3 Glossary
(Some of the abbreviations have an additional “x” at their end that is not part of the abbreviation. I chose this workaround to distinguish these abbreviations from the same text chunks in one of the glossary entries. This is a bug in the {glossary} package.)
term
definition
CSV
Text files where the values are separated with commas (Comma Separated Values = CSV). These files have the file extension .csv
GNIx
Gross National Income (GNI) is a measure of a country's income, which includes all the income earned by a country's residents, businesses, and earnings from foreign sources. It is defined as the total amount of money earned by a nation's people and businesses, no matter where it was earned. GNI is an alternative to GDP as a way to measure and track a nation’s wealth, as it calculates income instead of output.
IBRD
The International Bank for Reconstruction and Development (IBRD) is a global development cooperative owned by 189 member countries. As the largest development bank in the world, it supports the World Bank Group’s mission by providing loans, guarantees, risk management products, and advisory services to middle-income and creditworthy low-income countries, as well as by coordinating responses to regional and global challenges. (https://www.worldbank.org/en/who-we-are/ibrd)
IDAx
The International Development Association (IDA) is the part of the World Bank that helps the world’s low-income countries. IDA's grants and low-interest loans help countries invest in their futures, improve lives, and create safer, more prosperous communities around the world. (https://ida.worldbank.org/en/what-is-ida)
LDCx
The term “Least Developed Countries” (LDCs) refers to developing countries listed by the United Nations that exhibit the lowest indicators of socioeconomic development. As of December 2024, the classification applies to 44 countries. See https://unctad.org/topic/least-developed-countries/list
LLDC
Landlocked Developing Countries (LLDCs) are developing nations that do not have direct access to the sea. These countries face significant economic and development challenges due to their geographical isolation and the need to rely on neighboring countries for access to international markets. Of the 32 LLDCs 16 are classified as LDCs (December 2024). See: https://www.un.org/ohrlls/content/about-landlocked-developing-countries
M49
The United Nations publication "Standard Country or Area Codes for Statistical Use" was originally published as Series M, No. 49 and is now commonly referred to as the M49 standard. M49 is a country/areas classification system prepared by the Statistics Division of the United Nations Secretariat primarily for use in its publications and databases.
OMNIKA
OMNIKA DataStore is an open-access data science resource for researchers, authors, and technologists. OMNIKA Foundation is an American 501(c)(3) nonprofit organization that operates a digital mythological library. Almost every culture has relevant mythology that explains where we came from, why things are the way they are, and a number of other things. OMNIKA's goal is to collect, organize, index, and quantify all of those data in one place and make them available for free. (https://omnika.org/info/about)
RDS
The abbreviation “RDS” in file endings `.rds` refers to “R Data Serialized”. It is a format used by the R programming language to serialize and store R objects, such as data frames, lists, and functions, in a compact and portable binary format.
SIDS
Small Island Developing States (SIDS) are a group of developing countries that are small island nations and territories facing similar sustainable development challenges. These countries are particularly vulnerable to environmental and economic shocks due to their small size, limited resources, and remote locations. The aggregate population of all the SIDS is 65 million. See: https://www.un.org/ohrlls/content/about-small-island-developing-states
UNSD
The United Nations Statistics Division (UNSD) is committed to the advancement of the global statistical system. It compiles and disseminates global statistical information, develops standards and norms for statistical activities, and supports countries' efforts to strengthen their national statistical systems.
WHR
The World Happiness Report is a partnership of Gallup, the Oxford Wellbeing Research Centre, the UN Sustainable Development Solutions Network, and the WHR’s Editorial Board. The report is produced under the editorial control of the WHR Editorial Board. The Report reflects a worldwide demand for more attention to happiness and well-being as criteria for government policy. It reviews the state of happiness in the world today and shows how the science of happiness explains personal and national variations in happiness. (https://worldhappiness.report/about/)
---execute: cache: false---# Regions and their Countries {#sec-02-countries-in-regions}```{r}#| label: setup#| results: hold#| include: falsebase::source(file =paste0(here::here(), "/R/helper.R"))ggplot2::theme_set(ggplot2::theme_bw())```## Objectives {.unnumbered}::::: my-objectives::: my-objectives-headerCountry and Regions: Classifications for countries:::::: my-objectives-container1. Understanding the country classification used by my `r glossary("WHR")` data sources (@sec-02-whr-classification).2. Inspecting different approaches to classify countries by international organizations: - World Bank and - United Nations Statistics Division3. Modifying resp. adding country classifications to the WHR data where necessary so that it conforms to the internationally recognized and approved systems.I aim to compare different aspects of countries. I want, for instance, to know how well Austria is doingcompared to other European countries, the other member states of theEuropean Union, or other OECD countries. It is, therefore, vital to havea consistent categorization system with different grouping schemes.::::::::## Country groupings in WHR {#sec-02-whr-classification}Some of the WHR years (WHR reports 2013, 2015, 2020, and 2021) have a regionalgrouping incorporated. 
But this grouping is not included in the (new) dataset 2024 from the WHR report 2025.### Classification of WHR 2020I will display the regional grouping of the WHR data with the example from the year 2020 (WHR report 2021).:::::{.my-r-code}:::{.my-r-code-header}:::::: {#cnj-02-whr-classification-2020}: WHR data 2020 classification:::::::::::::{.my-r-code-container}::: {#lst-whr-classification-2020}```{r}#| label: whr-classification-2020( df_dt_whr_2020 <- base::readRDS("data/whr/raw/whr_raw_2021.rds") |> dplyr::select(`Country name`, `Regional indicator`) |> dplyr::nest_by(`Regional indicator`) |> dplyr::mutate(data =as.vector(data)) |> dplyr::mutate(data = stringr::str_c(data, collapse ="; ")) |> dplyr::mutate(data =paste(data, ";")) |> dplyr::mutate(N =lengths(gregexpr(";", data))) |> dplyr::rename(Country = data) |> DT::datatable(class ='cell-border compact stripe', options =list(pageLength =25,lengthMenu =c(5, 10, 15, 20, 25, 50) ) ))```Result of the WHR classification system for WHR data 2020 used in the WHR report 2021.:::------------------------------------------------------------------------There are `r base::length(df_dt_whr_2020$x$data$N)` different regional indicators. The datasets for 2013, 2015, 2020 and 2021 use all the sameclassification scheme with **`r base::sum(df_dt_whr_2020$x$data$N)` countries in`r base::length(df_dt_whr_2020$x$data$N)` regions**.`r sum(df_dt_whr_2020$x$data$N)` are by far not all countries of the world. Their complete number is about 195 with some insecurities about Holy See (Vatican), the State of Palestine, Taiwan and Kosovo. 
(Compare: [How Many Countries Are There In The World?](https://www.worldatlas.com/geography/how-many-countries-are-there-in-the-world.html)) The reason for this lower number is simple: For only those `r sum(df_dt_whr_2020$x$data$N)` countries are subjective well-being data in the study year 2020 available.:::::::::### `class_scheme()` functionAs I am going to list several classification variants it pays the effort to develop a function for the repetitive task.::::::: my-r-code:::: my-r-code-header::: {#cnj-show-class-scheme}: Function `class_scheme()` for showing classification schemes::::::::::: my-r-code-container::: {#lst-show-class-scheme}```{r}#| label: function-class-scheme#| code-fold: showclass_scheme <-function(df, sel1, sel2) {## df = dataframe to show## sel1 = name of the first column (country names) to select## sel2 = name of the column with the regional indicator df |> dplyr::select(!!sel1, !!sel2) |> dplyr::nest_by(!!sel2) |> dplyr::mutate(data =as.vector(data)) |> dplyr::mutate(data = stringr::str_c(data, collapse ="; ")) |> dplyr::mutate(data =paste(data, ";")) |> dplyr::mutate(N =lengths(gregexpr(";", data))) |> dplyr::rename(Country = data) |> dplyr::arrange(!!sel2) |> DT::datatable(class ='cell-border compact stripe', options =list(pageLength =25,lengthMenu =c(5, 10, 15, 20, 25, 50) ) )}```Function `class_scheme()` for showing results of a classification system:::Here I am using complex code lines. Using {**dplyr**} programming code in functions needs some specialconsideration. I have learned the details from "Bang Bang – How toprogram with dplyr" [@berroth-2019].:::::::::::### WHR 2020 with `class_scheme()` functionAs the `class_scheme()` function is now in place, I can display with this function the different groupingschemes. 
At first I will try it out with the WHR data from the 2021 report::::::{.my-r-code}:::{.my-r-code-header}:::::: {#cnj-02-whr-grouping-schema}: Classification of the WHR data:::::::::::::{.my-r-code-container}```{r}#| label: whr-grouping-schemadf_whr <- base::readRDS(paste0(here::here(), "/data/whr/raw/whr_raw_2021.rds"))( whr_class <-class_scheme(df = df_whr,sel1 = rlang::quo(`Country name`),sel2 = rlang::quo(`Regional indicator`) ))```:::::::::It worked! I got the same result as in @lst-whr-classification-2020.## Official classificationsThere are already different classification systems in place:International organizations (e.g., [WorldBank](https://datahelpdesk.worldbank.org/knowledgebase/articles/906519),[United Nations](https://unstats.un.org/unsd/methodology/m49/)) havedeveloped them with several grouping variants.I will look into these two official classifications schemes of World Bank and United Nations and apply the following procedure::::::{.my-procedure}:::{.my-procedure-header}:::::: {#prp-country-class}: Understand structure and content of the official classifications schemata:::::::::::::{.my-procedure-container}1. Create a directory for storing the different country classification files (see @sec-02-create-data-dirs).2. Download classification files and store them for faster access as R objects with `r glossary("rds")` format (see @sec-02-wb-download and @sec-02-unsd-download).3. 
Inspect the data classification files of World Bank (@sec-02-inspect-wb) and of the United Nations (@sec-02-inspect-unsd) in detail.:::::::::### Create data directories {#sec-02-create-data-dirs}:::::{.my-r-code}:::{.my-r-code-header}:::::: {#cnj-create-class-dirs}: Create folders for country classification files:::::::::::::{.my-r-code-container}```{r}#| label: create-class-dirs#| code-fold: showmy_create_folder(base::paste0(here::here(), "/data/"))my_create_folder(base::paste0(here::here(), "/data/country-class"))my_create_folder(base::paste0(here::here(), "/data/country-class/wb"))my_create_folder(base::paste0(here::here(), "/data/country-class/unsd"))my_create_folder(base::paste0(here::here(), "/data/country-class/wb/excel"))my_create_folder(base::paste0(here::here(), "/data/country-class/wb/csv"))my_create_folder(base::paste0(here::here(), "/data/country-class/wb/rds"))my_create_folder(base::paste0(here::here(), "/data/country-class/unsd/excel"))my_create_folder(base::paste0(here::here(), "/data/country-class/unsd/csv"))my_create_folder(base::paste0(here::here(), "/data/country-class/unsd/rds"))```<center>(*For this R code chunk is no output available*)</center>:::::::::### Download classification files {#sec-02-download-class-files}#### World Bank {#sec-02-wb-download}The World Bank Classification can be downloaded from [How does the World Bank classify countries?](https://datahelpdesk.worldbank.org/knowledgebase/articles/378834-how-does-the-world-bank-classify-countries). Near the bottom of the page you can see the line "Download an Excel file of historical classifications by income.", providing a link with the word "Download". The downloaded file `CLASS.xlsx` does *not* contain a historical classification by income but the general classification system of the last available year (2023). Yes, there is another Excel file `OGHIST.xslx` with the historical cutoffsfor incomes and lending categories, dating from 1987 to 2023. 
But the download link for this file is located on another web page: [World Bank Country and Lending Groups](https://datahelpdesk.worldbank.org/knowledgebase/articles/906519). On this page you will also find the updated cutoffs for the `r glossary("GNIx", "GNI")` per capita of countries, which are important for their lending eligibility. [World Bank country classifications by income level for 2024-2025](https://blogs.worldbank.org/en/opendata/world-bank-country-classifications-by-income-level-for-2024-2025) has the current values and the changes over the last year.

The file `CLASS.xlsx` I am interested in here consists of three sheets:

1. "List of Economies",
2. "compositions", and
3. "Notes".

I will download the original Excel file and programmatically save

- the Excel file with all its sheets,
- CSV snapshots of all sheets (file extension = `.csv`), and
- R objects of all sheets (file extension = `.rds`).

The CSV snapshots support reproducibility because they store the content of the proprietary Excel file in a tool-agnostic, future-proof format.
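The three storage steps listed above can be sketched as a small per-sheet helper. This is only an illustration under assumptions: the function name `sketch_excel_as_csv_and_rds()`, its arguments, and the file naming pattern `<workbook>-<sheet>` are hypothetical and modeled on the `read_then_csv()` idea of the readxl workflow article. It is not the project's own `my_excel_as_csv_and_rds()` function, which is defined elsewhere.

```r
# Hypothetical sketch (not the project's helper): read one sheet of an
# Excel workbook and store it as a CSV snapshot and as an .rds object.
sketch_excel_as_csv_and_rds <- function(sheet, path_excel, path_csv, path_rds) {
  # file stem of the workbook, e.g. "wb-class" for "wb-class.xlsx"
  stem <- tools::file_path_sans_ext(base::basename(path_excel))

  # read the requested sheet into a tibble
  df <- readxl::read_excel(path_excel, sheet = sheet)

  # assumed naming pattern: "<workbook>-<sheet>.csv" and "<workbook>-<sheet>.rds"
  readr::write_csv(df, base::file.path(path_csv, base::paste0(stem, "-", sheet, ".csv")))
  base::saveRDS(df, base::file.path(path_rds, base::paste0(stem, "-", sheet, ".rds")))

  # return the data invisibly so the helper can be used with purrr::map()
  base::invisible(df)
}
```

Mapped over `readxl::excel_sheets()`, a helper of this kind writes one CSV and one `.rds` file per sheet.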
I am using code inspired by the vignette/article [readxl Workflows](https://readxl.tidyverse.org/articles/readxl-workflows.html):

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-02-wb-class}
: Download the World Bank CLASS Excel file
::::::
:::
::::{.my-r-code-container}

<center>**Run this code chunk manually if the file still needs to be downloaded.**</center>

```{r}
#| label: wb-class
#| code-fold: show
#| eval: false

url_excel <- "https://datacatalogfiles.worldbank.org/ddh-published/0037712/DR0090755/CLASS.xlsx"
path_wb_excel <- base::paste0(here::here(), "/data/country-class/wb/excel/wb-class.xlsx")
path_wb_csv <- base::paste0(here::here(), "/data/country-class/wb/csv/")
path_wb_rds <- base::paste0(here::here(), "/data/country-class/wb/rds/")

## download wb-class file ##############
## mode = "wb" is passed on to utils::download.file() and keeps
## the binary .xlsx file intact on Windows
downloader::download(
  url = url_excel,
  destfile = path_wb_excel,
  mode = "wb"
)

## from readxl workflow article ##############
## includes also my_excel_as_csv_and_rds() function
path_wb_excel |>
  readxl::excel_sheets() |>
  rlang::set_names() |>
  purrr::map(
    my_excel_as_csv_and_rds,
    path_excel = path_wb_excel,
    path_csv = path_wb_csv,
    path_rds = path_wb_rds
  )
```

::::
:::::

#### UNSD-M49 {#sec-02-unsd-download}

Another, more detailed classification system, expressly designed for statistical purposes, comes from the United Nations Statistics Division (`r glossary("UNSD")`) and uses the `r glossary("M49")` methodology. The result is called [Standard country or area codes for statistical use (M49)](https://unstats.un.org/unsd/methodology/m49/) and can be downloaded manually in different languages and formats (copy to the clipboard, Excel, or `r glossary("CSV")`) from the [Overview page](https://unstats.un.org/unsd/methodology/m49/overview/). The "Overview" page offers no URL that an R script could use, because the buttons copy or download the data with the help of JavaScript. So I had to download the file manually or to find another location where I could download it programmatically.
I found with the `r glossary("OMNIKA")` DataStore an [external source for the UNSD-M49 country classification](https://github.com/omnika-datastore/unsd-m49-standard-area-codes). For security reasons I compared the manually downloaded file and the OMNIKA file with `base::all.equal()` to determine whether they are identical. Yes, they are!

The UNSD M49 standard area codes are stored as Excel and CSV files. For reproducibility reasons I download the CSV file.

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-02-UNSD-M49}
: Download the UNSD-M49 CSV file and create an R object (".rds")
::::::
:::
::::{.my-r-code-container}

<center>**Run this code chunk manually if the file still needs to be downloaded.**</center>

```{r}
#| label: unsd-class
#| code-fold: show
#| eval: false

## download unsd-m49 file ############
url_unsd_csv <- "https://github.com/omnika-datastore/unsd-m49-standard-area-codes/raw/refs/heads/main/2022-09-24__CSV_UNSD_M49.csv"
path_unsd_csv <- base::paste0(here::here(), "/data/country-class/unsd/csv/2022-09-24__CSV_UNSD_M49.csv")
downloader::download(
  url = url_unsd_csv,
  destfile = path_unsd_csv
)

## create R object ###############
unsd_class <- readr::read_delim(
  file = path_unsd_csv,
  delim = ";"
)

## save as .rds file ################
my_save_data_file("country-class/unsd/rds", unsd_class, "unsd_class.rds")
```

<center>(*No output is available for this R code chunk*)</center>

::::
:::::

### Inspect classification files {#sec-02-inspect-class-files}

To get a detailed understanding of the data structures I will provide the following outputs:

1. Summary statistics with `skimr::skim()`, followed by an inspection of the first rows with `dplyr::glimpse()`.
2. Several detailed outputs of the classification categories (regions) and their elements (countries) in different code chunks (tabs).

#### World Bank {#sec-02-inspect-wb}

::: {.my-code-collection}
:::: {.my-code-collection-header}
::::: {.my-code-collection-icon}
:::::
:::::: {#exm-02-inspect-wb-class-files}
: Inspect the structure of the World Bank classification
::::::
::::
::::{.my-code-collection-container}

::: {.panel-tabset}

###### WB `economies`

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-02-inspect-wb-sheet1}
: Inspect sheet `List of Economies` of the World Bank classification file
::::::
:::
::::{.my-r-code-container}

```{r}
#| label: inspect-wb-sheet1
#| results: hold

wb_class_economies <- base::readRDS("data/country-class/wb/rds/wb-class-List of economies.rds")

glue::glue("******************* Using skimr::skim() ***************************")
skimr::skim(wb_class_economies)
glue::glue("")
glue::glue("****************** Using dplyr::glimpse() *************************")
dplyr::glimpse(wb_class_economies)
```

::::
:::::

###### WB `compositions`

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-02-inspect-wb-sheet2}
: Inspect sheet `compositions` of the World Bank classification file
::::::
:::
::::{.my-r-code-container}

```{r}
#| label: inspect-wb-sheet2
#| results: hold

wb_class_compositions <- base::readRDS("data/country-class/wb/rds/wb-class-compositions.rds")

glue::glue("******************* Using skimr::skim() ***************************")
skimr::skim(wb_class_compositions)
glue::glue("")
glue::glue("****************** Using dplyr::glimpse() *************************")
dplyr::glimpse(wb_class_compositions)
```

::::
:::::

###### WB standard

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-02-wb-class-standard}
: Pre-defined standard categorization
::::::
:::
::::{.my-r-code-container}

```{r}
#| label: wb-class-standard

## rows 1-218 hold the economies; the rows after them list group names
df_wb_standard <- base::readRDS("data/country-class/wb/rds/wb-class-List of economies.rds") |>
  dplyr::slice(1:218)

(
  wb_class_standard <- class_scheme(
    df = df_wb_standard,
    sel1 = rlang::quo(`Economy`),
    sel2 = rlang::quo(`Region`)
  )
)
```

***

`Region` is a coarse classification scheme with only **`r length(wb_class_standard$x$data$N)` regions formed by `r sum(wb_class_standard$x$data$N)` countries**.

::::
:::::

###### WB All

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-02-wb-class-all}
: All provided groups, regional, economical and political
::::::
:::
::::{.my-r-code-container}

```{r}
#| label: wb-class-all

df_wb_all <- base::readRDS("data/country-class/wb/rds/wb-class-compositions.rds")

(
  wb_class_all <- class_scheme(
    df = df_wb_all,
    sel1 = rlang::quo(`WB_Country_Name`),
    sel2 = rlang::quo(`WB_Group_Name`)
  )
)
```

***

`WB_Group_Name` in the "compositions" file contains all available groups. They are not restricted to regional groups because they are formed by economical and political criteria as well. There is no 1:1 match, because almost all countries belong to two or more groups. There are **`r length(wb_class_all$x$data$N)` groups with a total of `r sum(wb_class_all$x$data$N)` elements**.

::::
:::::

###### Region1

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-02-wb-class-regional}
: Groups formed by regional criteria (without the redundant `World` region)
::::::
:::
::::{.my-r-code-container}

```{r}
#| label: wb-class-regional

## group codes that I treat as regional
str_reg <- c(
  "AFE", "AFW", "ARB", "CSS", "CEB",
  "EAS", "ECS", "LCN", "MEA", "NAC",
  "OSS", "PSS", "SST", "SAS", "SSF"
)

df_wb_reg <- base::readRDS("data/country-class/wb/rds/wb-class-compositions.rds") |>
  dplyr::filter(WB_Group_Code %in% str_reg)

(
  wb_class_reg <- class_scheme(
    df = df_wb_reg,
    sel1 = rlang::quo(`WB_Country_Name`),
    sel2 = rlang::quo(`WB_Group_Name`)
  )
)
```

***

Browsing through the `compositions` data I have defined 15 `WB_Group_Code`s as regional codes.
These regional classification criteria result by definition in **`r length(wb_class_reg$x$data$N)` regions containing `r sum(wb_class_reg$x$data$N)` countries**.

::::
:::::

###### Region2

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-02-wb-class-regional}
: Groups formed by regional criteria (without the redundant `World` region and without the small-states groups)
::::::
:::
::::{.my-r-code-container}

```{r}
#| label: wb-class-regional2

## same as str_reg but without the small-states groups
## ("CSS", "OSS", "PSS", "SST")
str_reg2 <- c(
  "AFE", "AFW", "ARB", "CEB",
  "EAS", "ECS", "LCN", "MEA", "NAC",
  "SAS", "SSF"
)

df_wb_reg2 <- base::readRDS("data/country-class/wb/rds/wb-class-compositions.rds") |>
  dplyr::filter(WB_Group_Code %in% str_reg2)

(
  wb_class_reg2 <- class_scheme(
    df = df_wb_reg2,
    sel1 = rlang::quo(`WB_Country_Name`),
    sel2 = rlang::quo(`WB_Group_Name`)
  )
)
```

***

Browsing through the `compositions` data I have removed all small-states groups to form an alternative regional grouping. This grouping is smaller and results in **`r length(wb_class_reg2$x$data$N)` regions containing `r sum(wb_class_reg2$x$data$N)` countries**.

::::
:::::

:::

::::
:::::

##### Description of the first four tabs

1. **WB economies** displays the "List of Economies" and has five columns:
    - `Economy` with the country names (rows 2-219) and regional names (rows 221-268)
    - `Code` with the ISO alpha-3 codes for countries (rows 2-219) and the codes for the regional names (rows 221-268)
    - `Region` with seven different regional names:
        - East Asia and Pacific,
        - Europe and Central Asia,
        - Latin America & the Caribbean,
        - Middle East and North Africa,
        - North America,
        - South Asia and
        - Sub-Saharan Africa
    - `Income group` with four groups: Low income, Lower middle income, Upper middle income, and High income.
    - `Lending category` with three groups: `IBRD`, `Blend`, and `IDA`.
2. **WB compositions** has four columns: `WB_Group_Code`, `WB_Group_Name`, `WB_Country_Code`, `WB_Country_Name`. The 2084 rows are combinations of the regional and income groups with their ISO alpha-3 codes and country names.
3. **WB Standard** shows the seven standard regional groups of the World Bank with their countries. The 218 countries in the taxonomy of the World Bank consist of all member countries of the World Bank (189) and other economies with populations of more than 30,000 (29).
4. **WB All** includes the seven regions from the "WB Standard" tab and many more groups. But it is important to note that there is no alternative *regional* structure that systematically comprises all countries of the world (apart from the overall category "World").
    - Five of the seven regional groups of "WB Standard" are also clustered without high income countries.
    - There are six other regional subcategories: "Arab World", "Caribbean small states", "Central Europe and Baltics", "Other small states", "Pacific island small states", "Small states".
    - Additionally there are some political groups like European Union and OECD,
    - several economical classifications like "Euro area", and
    - different combinations of the four income groups and different combinations of the three lending statuses.

**More details**

The cutoff limits for the income groups are:

- low income, $1,145 or less;
- lower middle income, $1,146 to $4,515;
- upper middle income, $4,516 to $14,005; and
- high income, more than $14,005.

The effective operational cutoff for `r glossary("IDAx", "IDA")` eligibility is $1,335 or less.

The three lending categories and their relation to each other are:

> `r glossary("IDAx", "IDA")` countries are those that lack the financial ability to borrow from `r glossary("IBRD")`. IDA credits are deeply concessional—interest-free loans and grants for programs aimed at boosting economic growth and improving living conditions. IBRD loans are non-concessional. `Blend` countries are eligible for IDA credits because of their low per capita incomes but are also eligible for IBRD because they are financially creditworthy.

Three additional remarks relating to the `Notes` sheet:

1. In the `Notes` I found the sentence: "Geographic classifications in this table cover all income levels." But the `Income group` column has one more missing value than the `Region` column (50:49). The reason is that `Venezuela, RB` lacks an income group: It was classified as an upper-middle income country until FY21 and has been temporarily unclassified since July 2021, pending the release of revised national accounts statistics. But it is now again classified as `Upper middle income` (see the World Bank [page about Venezuela, RB](https://archive.doingbusiness.org/en/data/exploreeconomies/venezuela)).
2. The term country, used interchangeably with economy, does not imply political independence but refers to any territory for which authorities report separate social or economic statistics.
3. What follows is a quote about some details of the income classifications for the 2023 file:

    > Set on 1 July 2022 remain in effect until 1 July 2023. Venezuela has been temporarily unclassified since July 2021 pending release of revised national accounts statistics. Argentina, which was temporarily unclassified in July 2016 pending release of revised national accounts statistics, was classified as upper middle income for FY17 as of 29 September 2016 based on alternative conversion factors. Also effective 29 September 2016, Syrian Arab Republic is reclassified from IBRD lending category to IDA-only. On 29 March 2017, new country codes were introduced to align World Bank 3-letter codes with ISO 3-letter codes: Andorra (AND), Dem. Rep. Congo (COD), Isle of Man (IMN), Kosovo (XKX), Romania (ROU), Timor-Leste (TLS), and West Bank and Gaza (PSE).
It is to be noted that Venezuela, RB classified as an upper-middle income country until FY21, has been unclassified since then due to the unavailability of data.

##### Summary {#sec-02-wb-summary}

The only missing data in the columns `Economy` and `Code` correspond to the empty line #220 that separates the country codes from the regional codes. The missing data in the other columns stem from the different structure of the second part of the data (starting with row #221), which consists only of the two columns `Economy` and `Code`.

Essentially this means that the `wb-class.xlsx` file contains two different data sets: one for economies and another one that explains the regional, economical and political grouping codes.

In the Excel sheet `compositions` you will find an extended list of all available group names and their three-letter codes, combined with the country names and their three-letter codes. These group names comprise different kinds of regional groups but also names and codes for different combinations of country incomes and lending categories.

All these groups may be of interest for the analysis of different trends. But the regional (sub)groups of the `compositions` sheet do not add up to the complete number of countries (218). This is in contrast to the regional groupings of the WID database, where every grouping (region1 = 5, region2 = 18, region4 = 10, and region5 = 8 groups) includes all countries (in this case: 216).

The World Bank file `wb-class.xlsx` classifies all World Bank member countries (189) and all other economies with populations of more than 30,000 (29) in a coarse grid of only seven regions.
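The income cutoffs listed under "More details" translate directly into a lookup. The following is a minimal sketch, assuming GNI per capita values in US dollars and treating each upper limit as inclusive. The function `wb_income_group()` is a hypothetical helper for illustration and not part of any World Bank tool or package.

```r
# Hypothetical helper: map GNI per capita (US$) to the four income groups,
# using the cutoffs quoted in this section (1,145 / 4,515 / 14,005).
wb_income_group <- function(gni_pc) {
  base::cut(
    gni_pc,
    breaks = c(-Inf, 1145, 4515, 14005, Inf),
    labels = c(
      "Low income", "Lower middle income",
      "Upper middle income", "High income"
    ),
    right = TRUE # upper limits belong to the lower group (assumption)
  )
}

# invented example values, not real country data
wb_income_group(c(900, 1146, 4516, 14005, 50000))
```

With `right = TRUE` a value of exactly 14,005 still falls into the upper middle income group, which matches the wording "more than $14,005" for high income.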
For operational and analytical purposes, these economies are divided among income groups according to their [gross national income (GNI) per capita](https://datahelpdesk.worldbank.org/knowledgebase/articles/378831-why-use-gni-per-capita-to-classify-economies-into) in 2023, calculated using the [World Bank Atlas method](https://datahelpdesk.worldbank.org/knowledgebase/articles/378832-what-is-the-world-bank-atlas-method).

#### United Nations {#sec-02-inspect-unsd}

::: {.my-code-collection}
:::: {.my-code-collection-header}
::::: {.my-code-collection-icon}
:::::
:::::: {#exm-02-inspect-unsd}
: Inspect UNSD-M49 geoscheme classification
::::::
::::
::::{.my-code-collection-container}

::: {.panel-tabset}

###### raw

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-02-inspect-unsd-m49}
: Inspect UNSD M49 geoscheme classification
::::::
:::
::::{.my-r-code-container}

```{r}
#| label: inspect-unsd-m49
#| results: hold

unsd_class <- base::readRDS("data/country-class/unsd/rds/unsd_class.rds")

glue::glue("******************* Using skimr::skim() ***************************")
skimr::skim(unsd_class)
glue::glue("")
glue::glue("****************** Using dplyr::glimpse() *************************")
dplyr::glimpse(unsd_class)
```

::::
:::::

###### clean

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-02-clean-unsd-m49}
: Clean UNSD M49 geoscheme classification
::::::
:::
::::{.my-r-code-container}
:::::{#lst-02-clean-unsd-m49}

```{r}
#| label: clean-unsd-49
#| results: hold

## column renaming vector ########
m49_cols <- c(
  region_c = "Region Code",
  region_n = "Region Name",
  subr_c = "Sub-region Code",
  subr_n = "Sub-region Name",
  midr_c = "Intermediate Region Code",
  midr_n = "Intermediate Region Name",
  country = "Country or Area",
  m49 = "M49 Code",
  iso2 = "ISO-alpha2 Code",
  iso3 = "ISO-alpha3 Code",
  ldc = "Least Developed Countries (LDC)",
  lldc = "Land Locked Developing Countries (LLDC)",
  sids = "Small Island Developing States (SIDS)"
)

## clean data ###############################
unsd_class <- base::readRDS("data/country-class/unsd/rds/unsd_class.rds")

unsd_class_clean <- unsd_class |>
  # drop the first two columns (global code and name)
  dplyr::select(-(1:2)) |>
  dplyr::rename(tidyselect::all_of(m49_cols)) |>
  dplyr::filter(country != "Antarctica") |>
  # "NA" (Namibia) was read as a missing value
  dplyr::mutate(iso2 = base::ifelse(country == "Namibia", "NA", iso2)) |>
  dplyr::relocate(country, .before = region_c) |>
  # .x = current column inside across(); "x" marks membership in the raw data:
  # "x" -> "1", missing -> "0", any other value -> "999"
  dplyr::mutate(dplyr::across(
    ldc:sids,
    ~ dplyr::if_else(.x == "x", "1", "999", "0")
  )) |>
  dplyr::arrange(country)

## save new tibble ##########
my_save_data_file(
  "country-class/unsd/rds",
  unsd_class_clean,
  "unsd_class_clean.rds"
)

## prepare skimmers ##########
my_skim <- skimr::skim_with(
  character = skimr::sfl(
    whitespace = NULL,
    min = NULL,
    max = NULL,
    empty = NULL
  )
)

## display results ##########
unsd_class <- base::readRDS("data/country-class/unsd/rds/unsd_class.rds")

glue::glue("******************* Using skimr::skim() ***************************")
my_skim(unsd_class_clean) |>
  dplyr::select(-complete_rate)
glue::glue("")
glue::glue("****************** Using dplyr::glimpse() *************************")
dplyr::glimpse(unsd_class_clean)
```

Script for data cleaning of the `unsd_class.rds` file as explained in @prp-02-clean-unsd-m49-data

:::::
::::
:::::

###### Region

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-02-ID-text}
: Display regions of UNSD class scheme
::::::
:::
::::{.my-r-code-container}

```{r}
#| label: unsd-class1

df_unsd <- base::readRDS("data/country-class/unsd/rds/unsd_class_clean.rds")

(
  unsd_class1 <- class_scheme(
    df = df_unsd,
    sel1 = rlang::quo(`country`),
    sel2 = rlang::quo(`region_n`)
  )
)
```

::::
:::::

###### Sub-region

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-02-ID-text}
: Display sub-regions of UNSD class scheme
::::::
:::
::::{.my-r-code-container}

```{r}
#| label: unsd-class2

df_unsd <- base::readRDS("data/country-class/unsd/rds/unsd_class_clean.rds")

(
  unsd_class2 <- class_scheme(
    df = df_unsd,
    sel1 = rlang::quo(`country`),
    sel2 = rlang::quo(`subr_n`)
  )
)
```

::::
:::::

###### Intermediate

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-ID-text}
: Display intermediate regions of UNSD class scheme
::::::
:::
::::{.my-r-code-container}

```{r}
#| label: unsd-class3

df_unsd <- base::readRDS("data/country-class/unsd/rds/unsd_class_clean.rds")

(
  unsd_class3 <- class_scheme(
    df = df_unsd,
    sel1 = rlang::quo(`country`),
    sel2 = rlang::quo(`midr_n`)
  )
)
```

::::
:::::

###### Intermediate2

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-02-ID-text}
: Display alternative intermediate regions of UNSD class scheme
::::::
:::
::::{.my-r-code-container}

```{r}
#| label: unsd-class4

unsd_class4 <- base::readRDS("data/country-class/unsd/rds/unsd_class_clean.rds")

## fall back to the sub-region name where no intermediate region exists
unsd_class4 <- unsd_class4 |>
  dplyr::mutate(
    midr_n2 = base::ifelse(is.na(midr_n), subr_n, midr_n)
  )

(
  unsd_class4 <- class_scheme(
    df = unsd_class4,
    sel1 = rlang::quo(`country`),
    sel2 = rlang::quo(`midr_n2`)
  )
)
```

`midr_n2` is a classification scheme with **`r sum(unsd_class4$x$data$N)` countries in `r length(unsd_class4$x$data$N)` regions**.

::::
:::::

:::

::::
:::::

##### Descriptions of the UNSD-M49 geoscheme classification

What follows is a description of the tabs in @exm-02-inspect-unsd.

**Tab "raw"**: The raw file `unsd_class` has 15 columns, as you can also see online on the [Overview page](https://unstats.un.org/unsd/methodology/m49/overview/). The many missing values (`NA`s) for the categories `r glossary("LDCx", "LDC")`, `r glossary("LLDC")` and `r glossary("SIDS")` are easily explained: These three columns are coded with an 'x' if the country of the row belongs to the category. One of the missing values for the ISO-alpha2 codes belongs to Namibia because its abbreviation `NA` is interpreted by R as a missing value!

The other missing values for ISO-alpha2 and ISO-alpha3 are related to [Sark](https://www.sark.co.uk/), which is "recognized by the United Nations Statistics Division (UNSD) as a separate territory" but has not been accepted by ISO for more than 20 years [@mccarthy-2020].
A recent new application (see [PDF](https://www.sarkid.org/assets/pdf/SarkID%20Identity%20info%20v1_2.pdf)) may change that, but currently Sark is still waiting for [ISO 3166 codes](https://www.iso.org/iso-3166-country-codes.html).

**Tab "clean"**: Recoding the columns "LDC", "LLDC" and "SIDS" with 1 and 0 (1 = yes, belongs to this category; 0 = no, does not belong to this category) removes their missing values. I have also recoded the ISO-alpha2 code of "Namibia" to repair its "NA" value.

**Tab "Region", "Sub-Region" and "Intermediate Region"**: One missing value in these regional categories is related to Antarctica, which the M49 scheme does not treat as a separate region. It therefore has no regional codes and names, with the exception of the all-encompassing global region. But it has an M49 code as well as ISO-alpha codes.

:::::{.my-procedure}
:::{.my-procedure-header}
:::::: {#prp-02-clean-unsd-m49-data}
: Cleaning the UNSD M49 data file
::::::
:::
::::{.my-procedure-container}

To clean the data I have taken the following recoding actions in the script for the "clean" tab:

- Remove the global codes and names because they are redundant: All rows have the global code "001" ("World").
- Rename the columns to get shorter names.
- Remove Antarctica because it is not seen as a separate country.
- Replace `NA` in the column "ISO-alpha2 Code" of Namibia with the string "NA".
- Recode the columns LDC, LLDC and SIDS with 0 and 1.
- Relocate the column "country" (previously "Country or Area") to the first position because then it is easier to find relevant content.
- Sort the data alphabetically by "country".

::::
:::::

##### Summary

------------------------------------------------------------------------

## Glossary

(Some of the abbreviations have an additional "x" at their end that is not part of the abbreviation. I chose this workaround to distinguish these abbreviations from the same text chunks in one of the glossary entries.
This is a bug in the {**glossary**} package.)

```{r}
#| label: glossary-table
#| echo: false

glossary_table()
```

------------------------------------------------------------------------

## Session Info {.unnumbered}

::::: my-r-code
::: my-r-code-header
Session Info
:::

::: my-r-code-container

```{r}
#| label: session-info

sessioninfo::session_info()
```

:::
:::::