My objective for this chapter

In this chapter, I will download all well-being data provided by the different World Happiness Reports 2012-2025.

1.1 Introduction

Each report from 2012 to 2024 provided data for only one year. For example, the World Happiness Report (WHR) from 2012 included data from Gallup research conducted in 2011. However, starting with the 2025 report, all data are combined into a single dataset covering the years 2011 to 2024, with the exception of 2013 (as there was no report in 2014).

For the years 2011 to 2018, only the Cantril ladder data along with country rankings are available. Beginning with 2019 (in the report published in 2020), the complete dataset is included. This full dataset contains the upper whisker/lower whisker data and the six presumed contributing factors for subjective well-being: log GDP per capita, social support, healthy life expectancy, freedom to make life choices, generosity, and perceptions of corruption. Additionally, it includes the “Dystopia + residual” value, where Dystopia is an imaginary country with the lowest observed scores for each of the six variables.

1.2 Folder and file organization

The first step is to find the life-evaluation data for each year and to store the files for further R processing. I will store the files in the “whr-cantril” folder inside my “data” folder.

The my_create_folder() function in the following code chunk checks whether the folder already exists. If it does, the function leaves it untouched; otherwise it creates the folder at the desired path. The code for this private function is in the file helper.R inside the R folder at the root level of this working directory.
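The code of my_create_folder() is not reproduced in this chapter. The following is only a minimal sketch of how such a helper could behave, assuming nothing beyond the description above; the name create_folder_sketch() is hypothetical and is not the function stored in helper.R.

```r
# Hypothetical stand-in for my_create_folder(): create a folder only if
# it does not exist yet, and leave an existing folder untouched.
create_folder_sketch <- function(path) {
  if (!base::dir.exists(path)) {
    # recursive = TRUE also creates missing parent folders
    base::dir.create(path, recursive = TRUE)
  }
  base::invisible(path)
}

# Usage with a temporary folder, so the project folders are not affected
demo_path <- base::file.path(base::tempdir(), "whr-demo", "excel")
create_folder_sketch(demo_path)
create_folder_sketch(demo_path) # second call finds the folder and does nothing
```

Because the function checks for the folder first, calling it repeatedly (for example every time the chapter is rendered) should be harmless.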

Resource 1.1 : File Organization

I will organize files into several subfolders:

  • /data/: main folder for all data files related to this project
  • /data/whr-cantril/excel/: untouched original Excel file(s) (to prevent possible link rot at the original source)
  • /data/whr-cantril/csv/: CSV snapshot of the Excel data
  • /data/whr-cantril/rds/: R objects of the original data of the Excel file(s)

R Code 1.1 : Create folders for data files of the World Happiness Reports

Code
my_create_folder(base::paste0(here::here(), "/data/"))
my_create_folder(base::paste0(here::here(), "/data/whr-cantril/"))
my_create_folder(base::paste0(here::here(), "/data/whr-cantril/excel"))
my_create_folder(base::paste0(here::here(), "/data/whr-cantril/csv"))
my_create_folder(base::paste0(here::here(), "/data/whr-cantril/rds"))
(No output is available for this R code chunk)

WATCH OUT! Excel file contains many hidden rows

A manual inspection of the Excel data file revealed that only the rows for the most recent year, 2024, are shown. All the other years are hidden. But the download.file() function saves all data, visible and hidden rows alike.

Here I can’t use the downloader::download() function because it does not save the hidden data.

In the same code chunk in which I download the Excel file, I will also cache a CSV snapshot for reproducibility.
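Once the file has been read into R, a quick way to confirm that the hidden rows were included is to compare the years found in the data with the years I expect. The following is a minimal sketch: the helper missing_years() is hypothetical, the toy values are made up for illustration, and the only assumption about the real file is that it has a column named Year.

```r
# Return the years from `expected` that do not occur in the `Year` column.
missing_years <- function(df, expected) {
  base::setdiff(expected, base::unique(df[["Year"]]))
}

# Toy data (made-up values, not taken from the report)
toy_years <- base::data.frame(Year = c(2011, 2012, 2014, 2014))
missing_years(toy_years, expected = 2011:2014)

# With the real file, the call could look like this:
# missing_years(readxl::read_excel(path_excel), expected = 2011:2024)
```

If only the visible rows had been saved, nearly all expected years would be reported as missing.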

1.3 Save data as Excel and “.csv” files and as “.rds” objects

R Code 1.2 : Save data for the World Happiness Reports 2011-2024 in different formats

Run this code chunk manually if the file still needs to be converted to an “.rds” object.
Code
## WHR 2025 #########################################
url_excel <- base::paste0("https://happiness-report.s3.us-east-1.amazonaws.com/2025/",
    "Data+for+Figure+2.1+(2011%E2%80%932024).xlsx")
path_excel <- base::paste0(here::here(), 
            "/data/whr-cantril/excel/Data+for+Figure+2.1+(2011–2024).xlsx")
path_csv <- base::paste0(here::here(), 
            "/data/whr-cantril/csv/Data+for+Figure+2.1+(2011–2024).csv")


# download Excel file (mode = "wb" keeps the binary file intact on Windows)
utils::download.file(
    url_excel, 
    destfile = path_excel,
    mode = "wb"
    )

# cache a CSV snapshot
whr_2011_2024 <- path_excel  |>  
  readxl::read_excel() |>  
  readr::write_csv(path_csv)

# save a RDS R object
my_save_data_file("whr-cantril/rds", whr_2011_2024, "whr_2011_2024_orig.rds")


# save as .rds object, sorted by country name and year
whr_2011_2024_arrange <- dplyr::arrange(whr_2011_2024, `Country name`, Year)
my_save_data_file("whr-cantril/rds", whr_2011_2024_arrange, "whr_2011_2024_arrange.rds")
(No output is available for this R code chunk)
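Instead of running the chunk manually, the download could be guarded so that it only happens when the local copy is missing. This is a sketch under that assumption and not part of my helper file; the function name download_if_missing() and its overwrite argument are hypothetical.

```r
# Download `url` to `destfile` only if no local copy exists yet.
# Returns TRUE (invisibly) if a download took place, FALSE otherwise.
download_if_missing <- function(url, destfile, overwrite = FALSE) {
  if (base::file.exists(destfile) && !overwrite) {
    return(base::invisible(FALSE)) # keep the archived original untouched
  }
  # mode = "wb" writes the Excel file as binary on every platform
  utils::download.file(url, destfile = destfile, mode = "wb")
  base::invisible(TRUE)
}

# Possible usage with the objects defined in the chunk above:
# download_if_missing(url_excel, path_excel)
```

This keeps the untouched original in the excel folder stable across renders, which fits the goal of protecting against link rot.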

1.4 Inspect data

Code Collection 1.1 : Inspect Cantril Ladder data 2011-2024

R Code 1.3 : Show random Cantril Ladder data 2011-2024

Code
whr_2011_2024_arrange <- base::readRDS("data/whr-cantril/rds/whr_2011_2024_arrange.rds")
my_glance_data(whr_2011_2024_arrange)
#> # A tibble: 10 × 14
#>      obs  Year  Rank `Country name` `Ladder score` upperwhisker lowerwhisker
#>    <int> <dbl> <dbl> <chr>                   <dbl>        <dbl>        <dbl>
#>  1     1  2011   131 Afghanistan              4.26        NA           NA   
#>  2    49  2015    26 Argentina                6.65        NA           NA   
#>  3   321  2018   132 Chad                     4.35        NA           NA   
#>  4   561  2022   124 Ethiopia                 4.09         4.27         3.91
#>  5   634  2024    22 Germany                  6.75         6.85         6.65
#>  6  1098  2012    48 Malta                    5.96        NA           NA   
#>  7  1170  2020    73 Montenegro               5.58         5.69         5.48
#>  8  1177  2014    92 Morocco                  5.01        NA           NA   
#>  9  1252  2014     9 New Zealand              7.29        NA           NA   
#> 10  1969  2024   143 Zimbabwe                 3.40         3.51         3.28
#> # ℹ 7 more variables: `Explained by: Log GDP per capita` <dbl>,
#> #   `Explained by: Social support` <dbl>,
#> #   `Explained by: Healthy life expectancy` <dbl>,
#> #   `Explained by: Freedom to make life choices` <dbl>,
#> #   `Explained by: Generosity` <dbl>,
#> #   `Explained by: Perceptions of corruption` <dbl>,
#> #   `Dystopia + residual` <dbl>
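The helper my_glance_data() is not listed in this chapter. Judging only from the output above, it appears to add an obs column and to show the first row, the last row, and a random selection of rows in between. The following sketch reproduces that behaviour as a guess; the name glance_sketch() and the arguments n and seed are hypothetical.

```r
# Hypothetical re-creation of a "glance" helper: first row, last row and
# `n` random rows in between, ordered by their original row number.
glance_sketch <- function(df, n = 8, seed = 42) {
  base::stopifnot(base::nrow(df) >= n + 2)
  df <- base::cbind(obs = base::seq_len(base::nrow(df)), df)
  base::set.seed(seed) # fixed seed, so the same rows appear on every render
  # sample.int() draws from 1..(nrow - 2); adding 1 shifts to 2..(nrow - 1)
  middle <- 1 + base::sample.int(base::nrow(df) - 2, n)
  df[base::sort(c(1, middle, base::nrow(df))), ]
}

# Usage with a small made-up data frame
glance_demo <- glance_sketch(base::data.frame(x = 1:100), n = 8)
```

Note that set.seed() changes the global random number state, which may or may not be wanted in a longer analysis.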

R Code 1.4 : Glimpse Cantril Ladder data 2011-2024

Code
whr_2011_2024_arrange <- base::readRDS("data/whr-cantril/rds/whr_2011_2024_arrange.rds") |> 
  dplyr::glimpse()
#> Rows: 1,969
#> Columns: 13
#> $ Year                                         <dbl> 2011, 2012, 2014, 2015, 2…
#> $ Rank                                         <dbl> 131, 143, 153, 154, 141, …
#> $ `Country name`                               <chr> "Afghanistan", "Afghanist…
#> $ `Ladder score`                               <dbl> 4.2580, 4.0400, 3.5750, 3…
#> $ upperwhisker                                 <dbl> NA, NA, NA, NA, NA, NA, N…
#> $ lowerwhisker                                 <dbl> NA, NA, NA, NA, NA, NA, N…
#> $ `Explained by: Log GDP per capita`           <dbl> NA, NA, NA, NA, NA, NA, N…
#> $ `Explained by: Social support`               <dbl> NA, NA, NA, NA, NA, NA, N…
#> $ `Explained by: Healthy life expectancy`      <dbl> NA, NA, NA, NA, NA, NA, N…
#> $ `Explained by: Freedom to make life choices` <dbl> NA, NA, NA, NA, NA, NA, N…
#> $ `Explained by: Generosity`                   <dbl> NA, NA, NA, NA, NA, NA, N…
#> $ `Explained by: Perceptions of corruption`    <dbl> NA, NA, NA, NA, NA, NA, N…
#> $ `Dystopia + residual`                        <dbl> NA, NA, NA, NA, NA, NA, N…

R Code 1.5 : Skim Cantril Ladder data 2011-2024

Code
whr_2011_2024_arrange <- base::readRDS("data/whr-cantril/rds/whr_2011_2024_arrange.rds")
# skim without assigning, so the data frame is not overwritten by the summary
skimr::skim(whr_2011_2024_arrange)

After inspection of the data we can summarize:

  • The data cover 169 countries. This is fewer than the countries and territories of the United Nations geoscheme, which comprises 193 UN member states, two UN observer states (the Holy See and the State of Palestine), two states in free association with New Zealand (the Cook Islands and Niue), 49 non-sovereign dependencies or territories, Western Sahara (a disputed territory whose sovereignty is contested), and Antarctica.
  • Furthermore, the country names do not always conform to the official names of the M49 geoscheme. For instance, the data list Cyprus and North Cyprus, as well as Somalia and Somaliland Region, as separate entries.
  • Not all of these countries are covered in every year (2011-2024). For instance, Cuba is covered in only one year (2011).
  • For the years 2011 to 2018, only the Cantril ladder data along with country rankings are available. Beginning with 2019 (in the report published in 2020), the complete dataset is included.
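The points above can be checked with a few lines of base R. The sketch below uses a small toy data frame with made-up values so that it runs on its own; it assumes only the column names shown in the glimpse output (Year, Country name, upperwhisker). With the real data, whr_2011_2024_arrange would take the place of toy_whr.

```r
# Toy data with the WHR column names (values are invented for illustration)
toy_whr <- base::data.frame(
  Year = c(2011, 2012, 2011, 2019, 2020),
  `Country name` = c("A", "A", "B", "B", "B"),
  upperwhisker = c(NA, NA, NA, 6.4, 6.3),
  check.names = FALSE # keep the space in `Country name`
)

# Number of distinct countries
n_countries <- base::length(base::unique(toy_whr[["Country name"]]))

# Number of distinct years covered by each country
years_per_country <- base::tapply(
  toy_whr$Year,
  toy_whr[["Country name"]],
  function(y) base::length(base::unique(y))
)

# Years in which the whisker data are filled in
years_with_whiskers <- base::sort(
  base::unique(toy_whr$Year[!base::is.na(toy_whr$upperwhisker)])
)
```

Countries with few covered years can then be found by sorting years_per_country in ascending order.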

1.5 Glossary

Term: Definition
Cantril Ladder: The Cantril Ladder is a visual scale used to assess general life satisfaction. It asks respondents to evaluate their lives on a ladder from worst (bottom) to best (top) possible life, making it a simple tool to measure subjective well-being. This measure has been widely used in various studies and surveys, including the Gallup World Poll, which collects data from over 140 countries annually.
Dystopia: In the context of the World Happiness Report (WHR), Dystopia is a hypothetical country used as a benchmark in which each of the six factors (levels of Gross Domestic Product (GDP), life expectancy, generosity, social support, freedom, and corruption) scores at the bottom, representing the world’s least-happy people. This imaginary country serves to compare real countries against a baseline of the lowest observed scores for these factors, allowing researchers to understand how much better or worse each country performs in terms of happiness. The concept of Dystopia contrasts with the idea of Utopia, an imagined society where life is perfect. Instead, Dystopia illustrates a society where conditions are as unfavorable as possible according to the six factors measured in the World Happiness Report.
WHR: The World Happiness Report is a partnership of Gallup, the Oxford Wellbeing Research Centre, the UN Sustainable Development Solutions Network, and the WHR’s Editorial Board. The report is produced under the editorial control of the WHR Editorial Board. The report reflects a worldwide demand for more attention to happiness and well-being as criteria for government policy. It reviews the state of happiness in the world today and shows how the science of happiness explains personal and national variations in happiness. (https://worldhappiness.report/about/)

1.6 Session Info

Session Info

Code
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.4.3 (2025-02-28)
#>  os       macOS Sequoia 15.3.2
#>  system   aarch64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Europe/Vienna
#>  date     2025-04-06
#>  pandoc   3.2 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/aarch64/ (via rmarkdown)
#>  quarto   1.6.42 @ /usr/local/bin/quarto
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  base64enc     0.1-3   2015-07-28 [1] CRAN (R 4.4.1)
#>  cli           3.6.4   2025-02-13 [1] CRAN (R 4.4.1)
#>  colorspace    2.1-1   2024-07-26 [1] CRAN (R 4.4.1)
#>  commonmark    1.9.2   2024-10-04 [1] CRAN (R 4.4.1)
#>  curl          6.2.1   2025-02-19 [1] CRAN (R 4.4.1)
#>  digest        0.6.37  2024-08-19 [1] CRAN (R 4.4.1)
#>  dplyr         1.1.4   2023-11-17 [1] CRAN (R 4.4.0)
#>  evaluate      1.0.3   2025-01-10 [1] CRAN (R 4.4.1)
#>  fastmap       1.2.0   2024-05-15 [1] CRAN (R 4.4.1)
#>  generics      0.1.3   2022-07-05 [1] CRAN (R 4.4.1)
#>  glossary    * 1.0.0   2023-05-30 [1] CRAN (R 4.4.0)
#>  glue          1.8.0   2024-09-30 [1] CRAN (R 4.4.1)
#>  here          1.0.1   2020-12-13 [1] CRAN (R 4.4.1)
#>  htmltools     0.5.8.1 2024-04-04 [1] CRAN (R 4.4.1)
#>  htmlwidgets   1.6.4   2023-12-06 [1] CRAN (R 4.4.0)
#>  jsonlite      1.9.1   2025-03-03 [1] CRAN (R 4.4.1)
#>  kableExtra    1.4.0   2024-01-24 [1] CRAN (R 4.4.0)
#>  knitr         1.49    2024-11-08 [1] CRAN (R 4.4.1)
#>  lifecycle     1.0.4   2023-11-07 [1] CRAN (R 4.4.1)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.4.1)
#>  markdown      1.13    2024-06-04 [1] CRAN (R 4.4.1)
#>  munsell       0.5.1   2024-04-01 [1] CRAN (R 4.4.1)
#>  pillar        1.10.1  2025-01-07 [1] CRAN (R 4.4.1)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.4.1)
#>  purrr         1.0.4   2025-02-05 [1] CRAN (R 4.4.1)
#>  R6            2.6.1   2025-02-15 [1] CRAN (R 4.4.1)
#>  repr          1.1.7   2024-03-22 [1] CRAN (R 4.4.0)
#>  rlang         1.1.5   2025-01-17 [1] CRAN (R 4.4.1)
#>  rmarkdown     2.29    2024-11-04 [1] CRAN (R 4.4.1)
#>  rprojroot     2.0.4   2023-11-05 [1] CRAN (R 4.4.1)
#>  rstudioapi    0.17.1  2024-10-22 [1] CRAN (R 4.4.1)
#>  rversions     2.1.2   2022-08-31 [1] CRAN (R 4.4.1)
#>  scales        1.3.0   2023-11-28 [1] CRAN (R 4.4.0)
#>  sessioninfo   1.2.3   2025-02-05 [1] CRAN (R 4.4.1)
#>  skimr         2.1.5   2022-12-23 [1] CRAN (R 4.4.0)
#>  stringi       1.8.4   2024-05-06 [1] CRAN (R 4.4.1)
#>  stringr       1.5.1   2023-11-14 [1] CRAN (R 4.4.0)
#>  svglite       2.1.3   2023-12-08 [1] CRAN (R 4.4.0)
#>  systemfonts   1.2.1   2025-01-20 [1] CRAN (R 4.4.1)
#>  tibble        3.2.1   2023-03-20 [1] CRAN (R 4.4.0)
#>  tidyr         1.3.1   2024-01-24 [1] CRAN (R 4.4.1)
#>  tidyselect    1.2.1   2024-03-11 [1] CRAN (R 4.4.0)
#>  utf8          1.2.4   2023-10-22 [1] CRAN (R 4.4.1)
#>  vctrs         0.6.5   2023-12-01 [1] CRAN (R 4.4.0)
#>  viridisLite   0.4.2   2023-05-02 [1] CRAN (R 4.4.1)
#>  withr         3.0.2   2024-10-28 [1] CRAN (R 4.4.1)
#>  xfun          0.51    2025-02-19 [1] CRAN (R 4.4.1)
#>  xml2          1.3.7   2025-02-28 [1] CRAN (R 4.4.1)
#>  yaml          2.3.10  2024-07-26 [1] CRAN (R 4.4.1)
#> 
#>  [1] /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library
#>  * ── Packages attached to the search path.
#> 
#> ──────────────────────────────────────────────────────────────────────────────