A review of the most important concepts used in demography, with reference to data available in public repositories: GUS, Eurostat, WHO or FAO.
Census (national census), registers (USC, PESEL), survey
Census/Register: everyone is counted. A survey counts only part of the population. A census is held every 10 (or 5) years. Counting method: de facto (current place of stay) or de jure (domicile).
First censuses: 3000 BC (China) or the Roman Empire (5th century BC). A census is also mentioned in the Bible: the Holy Family was to be counted just before the birth of Jesus (Gospel of St. Luke; cf. https://pl.wikipedia.org/wiki/Ewangelia_%C5%81ukasza).
The first census in Poland: https://pl.wikipedia.org/wiki/Pierwszy_Pszechny_Spis_Ludno%C5%9Bci
History of censuses in Poland: https://spis.gov.pl/o-spisie/historia-spisow
In the US, the first census was organized in 1790 (https://en.wikipedia.org/wiki/United_States_census or https://www.census.gov/history/www/through_the_decades/overview/1790.html). The population was divided into the following categories: Free White males of 16 years and upward; Free White males under 16 years; Free White females; All other free persons; Slaves [BP2010].
Registers: USC/PESEL. Birth, death and marriage certificates. Theoretically precise (or reliable).
Rate vs ratio: both are the quotient of one quantity by another. If the numerator and the denominator are measured in the same units, the quotient is a ratio; if they are measured in different units, it is a rate. In PL there is no such distinction: a single term covers both.
TERYT (National Official Register of the Territorial Division of the Country) is an official register kept by the Central Statistical Office (GUS). TERYT includes, among other things, the system of identifiers and names of territorial division units (TERC).
A TERC identifier consists of seven digits, wwppggr (voivodship, powiat, community, unit symbol). Unit symbol: 1 – municipal community, 2 – rural community, 3 – urban-rural community, 4 – city in an urban-rural community, 5 – rural area in an urban-rural community, 8 – districts of the capital city of Warsaw, 9 – parts of Kraków, Łódź, Poznań and Wrocław.
TERC identifiers can be written in different forms. GUS follows the convention of 7-digit TERC numbers padded with trailing zeros: 2200000 – Pomeranian Voivodeship; 2201000 – Bytów powiat; 2201023 – Bytów (urban-rural community); 2201024 – Bytów (town); 2201025 – Bytów (rural area).
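The wwppggr layout described above can be unpacked with plain string operations. The following is a minimal sketch in base R; `parse_terc()` is a hypothetical helper written for illustration (it is not part of any GUS tool), and the classification into levels assumes the trailing-zero convention mentioned above.

```r
# Split 7-digit TERC codes (wwppggr) into their components.
# parse_terc() is a hypothetical helper, not an official GUS function.
parse_terc <- function(code) {
  code <- as.character(code)
  stopifnot(all(nchar(code) == 7))
  data.frame(
    terc   = code,
    woj    = substr(code, 1, 2),   # voivodship
    powiat = substr(code, 3, 4),   # powiat within the voivodship
    gmina  = substr(code, 5, 6),   # community within the powiat
    typ    = substr(code, 7, 7),   # unit symbol (0 when padded with zeros)
    stringsAsFactors = FALSE
  )
}

terc <- parse_terc(c("2200000", "2201000", "2201023", "2201024", "2201025"))
# trailing zeros mark higher-level units
terc$level <- ifelse(terc$powiat == "00", "voivodship",
              ifelse(terc$gmina == "00", "powiat", "community"))
print(terc)
```

Keeping the codes as character strings matters: read as numbers, codes of voivodships 02 to 08 would lose their leading zero.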
Example: the file miasta_wojewodzkie.csv contains, inter alia, the TERC codes of voivodship cities (in the column teryt).
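# teryt is read as character so that leading zeros are preserved
wojmiasta <- read.csv("miasta_wojewodzkie.csv", sep = ';',
  colClasses=c('factor', 'character', rep('numeric', 2)),
  header=T, na.strings="NA" )
#wojmiasta
# append 000 to obtain the 7-digit form used by GUS
wojmiasta$teryt <- sprintf ("%s000", wojmiasta$teryt)
wmt <- wojmiasta$teryt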
See also: https://stat.gov.pl/statystyka-regionalna/jednostki-terytorialne/podzial-administracyjny-polski/ and https://stat.gov.pl/statystyka-regionalna/jednostki-terytorialne/podzial-administracyjny-polski/rodzaje-gmin-oraz-obszary-miejskie-i-wiejskie/
Births – only live births are counted. Birth order, the age of the mother and other characteristics are also recorded.
Birth rate (crude birth rate): \(U/L\), where \(U\) is the number of live births and \(L\) is the average (or mid-year) population. The result is usually multiplied by a thousand: for every thousand people, X babies are born.
General fertility rate (GFR): the number of births divided by the number of women aged 15–49 (the fertile period). It can also be computed for age groups (age-specific fertility rate or ASFR).
Total fertility rate (TFR): the average number of children a woman would give birth to over her lifetime, estimated from the age-specific fertility rates. It carries basically the same information as the GFR. There are even simplified formulas: \(T_{Fr} = G_{Fr} \cdot 30\) (30 approximates the length of the childbearing period; the full 15–49 interval spans 35 years) or \(T_{Fr} = C_{Br} \cdot 30 \cdot 4.5\), where \(G_{Fr}\) is the general fertility rate and \(C_{Br}\) is the crude birth rate, both expressed per person rather than per thousand. The first formula is the closer approximation.
The total fertility rate that guarantees simple replacement of generations is 2.1–2.15.
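To make the relation between ASFR, GFR and TFR concrete, here is a small sketch in base R. All numbers are hypothetical (they are not taken from GUS, Eurostat or WHO), and five-year age groups are an assumption of the example.

```r
# Hypothetical age-specific fertility rates (births per woman per year)
age_group <- c("15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49")
asfr  <- c(0.010, 0.045, 0.090, 0.075, 0.035, 0.008, 0.001)
# Hypothetical number of women in each age group (thousands)
women <- c(900, 1000, 1200, 1400, 1500, 1400, 1200)

# TFR: a woman spends 5 years in each group, hence the factor 5
tfr <- 5 * sum(asfr)

# GFR: all births divided by all women aged 15-49
births <- asfr * women
gfr <- sum(births) / sum(women)

# rule-of-thumb estimate of TFR from GFR
tfr_approx <- gfr * 30

round(c(TFR = tfr, GFR = gfr, TFR_from_GFR = tfr_approx), 3)
```

How close the shortcut comes to the TFR computed from the ASFR depends on the age structure of women, which is why the direct sum is preferred whenever age-specific data are available.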
# TFR is in the Eurostat table demo_find
library("tidyverse")   # dplyr (%>%, filter) and ggplot2
spanV <- 0.5           # loess span; this value is an assumption (not defined in the original)
f <- read.csv("demo_find.csv", sep = ';', header=T, na.strings="NA" )
countries <- c ('PL', 'DE', 'FR', 'UK', 'IT')
f <- f %>% filter ( geo %in% countries & item == 'TOTFERRT') %>% as.data.frame()
p1 <- ggplot(f, aes(x=year, y=value, color=geo)) +
  geom_smooth(method="loess", se=F, span=spanV, size=.4) +
  geom_point(size=.8, alpha=.5)
p1
TFR is also in the WHO database (more countries but data for one year only):
# Download the data via the API from the WHO website
curl "https://apps.who.int/gho/athena/api/GHO/WHS9_95?format=csv" > fertility_who.csv
wc -l fertility_who.csv
184
The original fertility_who.csv file has been slightly modified to remove unnecessary columns. Ultimately, this file contains the following variables: year (always 2016), region (world region), geo (country / ISO code), aprox_value (approximate value) and value.
fw <- read.csv("fertility_who.csv", sep = ';', header=T, na.string="NA" )
summary(fw$value)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.241 1.752 2.307 2.795 3.709 7.239
## by region
## AFR=Africa; AMR=Americas; EMR=Eastern Mediterranean;
## EUR=Europe; SEAR=South-East Asia; WPR=Western Pacific
ggplot(fw, aes(x=region, y=value, fill=region)) + geom_boxplot() +
  ylab("TFR") +
  xlab("")
Now let’s download the data on the population in individual countries of the world
# curl "https://apps.who.int/gho/athena/api/GHO/WHS9_86?format=csv" > pop_who.csv
pw <- read.csv("pop_who.csv", sep = ';', header=T, na.strings="NA" )
# geo != '' removes the totals for regions and for the world; only countries remain
pw <- pw %>% select(geo, value) %>% filter (geo != '') %>% as.data.frame()
# world total
ludnosc.swiata <- sum(pw$value)
# join both tables on the geo column
fw <- left_join(fw, pw, by='geo')
# only countries with a population above 10 million (values are in thousands)
fw <- fw %>% filter(value.y > 10000)
duze.kraje <- nrow(fw)
duze.kraje.ludnosc <- sum(fw$value.y)
According to WHO, the world’s population is 7430261 thousand (the data is a bit old, from 2016). The 87 large countries (population over 10 million) are home to 7085366 thousand people (i.e. 95.36%).
Let’s show these major countries in a scatter plot:
fr2016p <- ggplot(fw, aes(x = reorder(geo, value.x), color=region )) +
geom_point(aes(y = value.x), size=1) +
xlab(label="country") +
ylab(label="fertility ratio") +
ggtitle("Fertility ratio 2016") +
theme(axis.text = element_text(size = 4)) +
theme(plot.title = element_text(hjust = 0.5)) +
coord_flip(ylim = c(0, 8))
fr2016p
Death (in case you did not know): the permanent, irreversible cessation of all biological functions that sustain a living organism. For the statistician, a death has to be officially confirmed (death certificate). BTW, to die you have to be born first (deaths apply to live births only).
http://isap.sejm.gov.pl/isap.nsf/download.xsp/WDU20200000698/O/D20200698.pdf
BTW, death register statistics (aggregated weekly) are published monthly in Poland https://dane.gov.pl/pl/dataset/1953,liczba-zgonow-zarejestrowanych-w-rejestrze-stanu-cywilnego
Crude death rate: \(Z/L\) (number of deaths in a year divided by the average population). It can also be computed for selected groups (age, occupation, sex), and is then called an X-specific death rate (for example age-specific death rate or ASDR).
Infant deaths are treated differently: \(Z_0/U\), where \(Z_0\) is the number of deaths of infants under one year of age and \(U\) is the number of live births. This ratio is frequently used as a measure of the level of socio-economic development.
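The three death measures above differ only in what is put in the numerator and the denominator. A minimal sketch in base R, using hypothetical figures (not real data) and three broad age groups chosen only for illustration:

```r
# Hypothetical one-year data for a country (not real figures)
pop    <- c("0-14" = 5800000, "15-64" = 25000000, "65+" = 7200000)  # average population
deaths <- c("0-14" = 2000,    "15-64" = 90000,    "65+" = 318000)
live_births   <- 375000
infant_deaths <- 1400   # deaths under one year of age

cdr  <- sum(deaths) / sum(pop) * 1000        # crude death rate, per 1000 population
asdr <- deaths / pop * 1000                  # age-specific death rates, per 1000
imr  <- infant_deaths / live_births * 1000   # infant deaths per 1000 live births

round(c(CDR = cdr, IMR = imr), 2)
round(asdr, 2)
```

Note that the infant measure is divided by live births, not by the population, which is why it is treated separately from the other death rates.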
z <- read.csv("demo_mmonth.csv", sep = ';', header=T, na.string="NA" )
countries <- c ('PL', 'DE', 'FR', 'UK', 'IT')
z <- z %>% filter ( geo %in% countries & year > 2000
& (month != 'TOTAL' & month != 'UNK') ) %>% as.data.frame()
z$date <- sprintf ("%i-%02i-01", z$year, as.numeric(substr(z$month,2,3)))
p1 <- ggplot(z, aes(x=as.Date(date), y=value, color=geo)) +
geom_line(size=.4, alpha=.3) +
geom_point(size=.8, alpha=.5)
p1
Last five years for DE/PL/UK:
##
z15 <- z %>% filter ( geo %in% c('DE', 'PL', 'UK') & year > 2014 )
z15$date <- sprintf ("%i-%02i-01", z15$year, as.numeric(substr(z15$month,2,3)))
p1 <- ggplot(z15, aes(x=as.Date(date), y=value, color=geo)) +
geom_line(size=.4, alpha=.3) +
geom_point(size=.8, alpha=.5)
p1
Life tables are a set of coefficients describing how a population dies out:
age-specific death rate (Mx, or DEATHRATE according to Eurostat)
probability of dying between exact ages x and x+1 (qx or PROBDEATH)
probability of surviving between exact ages x and x+1 (px or PROBSURV)
number of survivors at exact age x (lx or SURVIVORS)
number of people dying between exact ages x and x+1 (dx or NUMBERDYING according to Eurostat, although Eurostat does not publish this coefficient)
person-years lived between exact ages x and x+1 (Lx or PYLIVED)
total person-years lived above exact age x (Tx or TOTPYLIVED)
life expectancy at exact age x (ex or LIFEXP)
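The columns of a life table follow from one another, so the whole table can be rebuilt from the death probabilities qx. The sketch below is a toy example in base R: the qx values are hypothetical, the table is closed artificially at age 6, and deaths are assumed to be spread evenly over each year (a common simplification, not necessarily the method used by Eurostat).

```r
# Toy life table: hypothetical death probabilities for ages 0..5,
# with everyone assumed to die before exact age 6 (last qx = 1)
x  <- 0:5
qx <- c(0.10, 0.05, 0.05, 0.10, 0.50, 1.00)

px <- 1 - qx                                   # probability of surviving from x to x+1
lx <- 100000 * cumprod(c(1, px[-length(px)]))  # survivors at exact age x (radix 100000)
dx <- lx * qx                                  # deaths between x and x+1
Lx <- lx - dx / 2                              # person-years lived between x and x+1,
                                               # assuming deaths spread evenly over the year
Tx <- rev(cumsum(rev(Lx)))                     # person-years lived above exact age x
ex <- Tx / lx                                  # life expectancy at exact age x
mx <- dx / Lx                                  # age-specific death rate

lt <- data.frame(x, qx, px, lx, dx, Lx, Tx, ex, mx)
print(round(lt, 3))
```

The deaths dx add up to the starting cohort of 100000, which is a quick consistency check for any table built this way.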
Natural change: the difference between the number of births and the number of deaths in a year. For comparisons it is better to use the natural change rate (natural change divided by the average population).
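As a worked example of the natural change rate, the sketch below uses hypothetical yearly figures; estimating the average population as the mean of the start-of-year and end-of-year values is an assumption of the example.

```r
# Hypothetical yearly figures (not real data)
births    <- 375000
deaths    <- 410000
pop_start <- 38020000                      # population on 1 January
pop_end   <- 37980000                      # population on 31 December
pop_avg   <- (pop_start + pop_end) / 2     # simple estimate of the average population

natural_change      <- births - deaths
natural_change_rate <- natural_change / pop_avg * 1000   # per 1000 population

natural_change
round(natural_change_rate, 2)
```

A negative value means that deaths outnumber births; dividing by the average population makes countries or regions of different size comparable.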
Cause of death is a highly discretionary measure. Garbage codes (https://stat.gov.pl/obszary-tematyczne/ludnosc/statystyka-przyczyn-zgonow/zgony-wedlug-przyczyn-okreslanych-jako-garbage-codes,3,1.html): codes corresponding to imprecise descriptions of diseases that make it impossible to determine the cause of death (e.g. died because he/she stopped breathing).
International Statistical Classification of Diseases and Related Health Problems (ICD/WHO) https://www.who.int/standards/classifications/classification-of-diseases https://en.wikipedia.org/wiki/International_Classification_of_Diseases
How does the Central Statistical Office determine the cause of death? https://stat.gov.pl/obszary-tematyczne/ludnosc/statystyka-przyczyn-zgonow/jak-gus-kieta-statystyke-zgonow,8,1.html
Deaths by causes in PL (BDL)
Definition and measurement. Who has died is in general known and certain, as death records are reliable and complete. Who is a migrant is not necessarily clear, nor is it clear how to count migrants.
Migration (definition): a long-distance change of permanent residence. If it crosses the border of an administrative unit, it is internal; if it crosses a national border, it is external.
According to Eurostat, emigration is the action by which a person, having previously been a resident in the territory of a Member State, ceases to have his or her usual residence [or establishes his or her usual residence in case of immigration] in that Member State for a period that is, or is expected to be, of at least 12 months. Eurostat does not keep internal migration statistics.
Member States generally base their migration flow data on administrative sources, sample surveys, census data, mirror data, mathematical methods or a combination of data sources (that is, everything and nothing is known)
In Poland migration data is based on: administrative data [and] estimated data based on administrative data, mirror statistics, national statistical surveys. More here: https://ec.europa.eu/eurostat/cache/metadata/en/migr_immi_esms.htm
According to the Central Statistical Office, internal migration is a change of place of residence within the country that involves crossing the administrative border of a community. (Permanent/temporary stay is a legal term and a legal procedure recorded in the PESEL register; in PL one has to register one's stay, and you do, of course?) The border is meant in the 7-digit sense, so to speak: if someone lived in the community 2201024 (Bytów town), built a house and moved to the suburb 2201025 (Bytów rural area), he is already a migrant for the Central Statistical Office.
According to GUS, the only source of data on internal and external migration are registers, namely the PESEL register. The content of the PESEL register is updated in particular on the basis of stay/departure/return reports (obowiązek meldunkowy). To what extent the register corresponds to the actual state is another matter; there are opinions that the correspondence is low. https://stat.gov.pl/obszary-tematyczne/ludnosc/migracji-zagraniczne-ludnosci/zeszyt-metodologiczny-migracji-ludnosci,15,1.html
Total migration balance for poviats of the Pomeranian Voivodeship (data taken from the BDL)
#Local Data Bank (Bank Danych Lokalnych, BDL)
#Category K3 POPULATION (LUDNOŚĆ)
#Group G8 INTERNAL AND INTERNATIONAL MIGRATION (MIGRACJE WEWNĘTRZNE I ZAGRANICZNE)
#Subgroup P1355 Migration for permanent residence by community, sex of migrants and direction (urban, rural)
fw <- read.csv("LUDN_1355_CTAB_20210108152342.csv",
colClasses=c('factor', 'character', rep('numeric', 50), 'character'),
sep = ';', dec = ',', header=T, na.string="NA" )
#colnames(fw)
#str(fw)
The data must be transformed:
## TERC of the voivodship/powiat
fw$woj <- substr(fw$Kod,1,2)
fw$powiat <- substr(fw$Kod,3,4)
## only powiats of the Pomeranian Voivodeship (TERC=22)
pom <- fw %>% filter(woj == "22" & powiat != "00" & substr(Kod,5,6) == '00') %>% as.data.frame()
## convert variables 3--28 to long format
## https://tidyselect.r-lib.org/reference/starts_with.html
library("tidyverse")
pom1 <- pom %>% select (-contains("1000")) %>%
pivot_longer( cols = starts_with("saldo.migracji"), names_to = "rok",
names_prefix = "saldo.migracji_ogółem_", values_to = "value") %>%
mutate(rok = as.numeric(str_sub(rok, 1, 4)),
Nazwa = str_sub(Nazwa,7),
) %>% as.data.frame()
p5 <- ggplot(pom1, aes(x=as.Date(as.character(rok), format = "%Y"), y=value)) +
facet_wrap(~Nazwa, scales = "free_y", ncol = 4) +
geom_line(size=.4, alpha=.3) +
geom_point(size=.8, alpha=.5)
p5
Number of people departing and arriving permanently (emigrants/immigrants) for poviats of the Pomeranian Voivodeship (data from BDL; after downloading, we modify the header and remove the " characters)
##Category K3 POPULATION (LUDNOŚĆ)
##Group G8 INTERNAL AND INTERNATIONAL MIGRATION (MIGRACJE WEWNĘTRZNE I ZAGRANICZNE)
##Subgroup P3000 Internal and international migration for permanent residence (half-yearly data)
fx <- read.csv("LUDN_3000_CTAB_20210108204154.csv",
colClasses=c('factor', 'character', rep('numeric', 32), 'character'),
sep = ';', dec = ',', header=T, na.string="NA" )
fx$woj <- substr(fx$Kod,1,2)
fx$powiat <- substr(fx$Kod,3,4)
pom <- fx %>% filter(woj == "22" & powiat != "00" & substr(fx$Kod,5,6) == '00') %>% as.data.frame()
## trick: drop the unused levels of Kod
pom$Kod <- factor(pom$Kod)
We convert emigration and immigration to long format separately:
pom1i <- pom %>% select (-contains("emigracja_")) %>%
pivot_longer( cols = starts_with("imigracja_"), names_to = "rok",
names_prefix = "imigracja_", values_to = "value") %>%
mutate(rok = as.numeric(str_sub(rok, 1, 4)),
Nazwa = str_sub(Nazwa,7),
) %>% as.data.frame()
#levels(pom1i$Kod)
pom1e <- pom %>% select (-contains("imigracja_")) %>%
pivot_longer( cols = starts_with("emigracja_"), names_to = "rok",
names_prefix = "emigracja_", values_to = "value") %>%
mutate(rok = as.numeric(str_sub(rok, 1, 4)),
Nazwa = str_sub(Nazwa,7),
) %>% as.data.frame()
#levels(pom1e$Kod)
We combine the emigrant and immigrant frames into one (left_join based on the Kod/rok keys):
pom1x <- left_join(pom1e, pom1i, by=c("Kod", "rok"))
Scatter plot with a trend (loess method). In red, emigration, and in blue, immigration
library("scales")   # provides date_format()
p6 <- ggplot(pom1x, aes(x=as.Date(as.character(rok), format = "%Y"))) +
geom_smooth(aes(y=value.x), method="loess", se=F, span=spanV, size=.4, color='red') +
geom_point( aes(y=value.x), size=.8, alpha=.5, color='red') +
geom_smooth(aes(y=value.y), method="loess", se=F, span=spanV, size=.4, color='blue') +
geom_point( aes(y=value.y), size=.8, alpha=.5, color='blue') +
scale_x_date( labels = date_format("%y"), breaks = "2 years") +
xlab("rok")+
ylab("emigrants (red)/immigrants (blue)") +
# different scales on the OY axes
#facet_wrap(~Nazwa.x, scales = "fixed", ncol = 4)
facet_wrap(~Nazwa.x, scales = "fixed", ncol = 4)
p6
Because the values for Gdańsk (and, to a lesser extent, Gdynia) are an order of magnitude larger than for the other counties, the chart is difficult to read.
The first solution is to use scales = "free_y", which has the downside that every county gets different values on the OY axis. It is better to convert the OY axis from an arithmetic to a logarithmic scale. Just add scale_y_log10() to the plot:
p7 <- ggplot(pom1x, aes(x=as.Date(as.character(rok), format = "%Y"))) +
geom_smooth(aes(y=value.x), method="loess", se=F, span=spanV, size=.4, color='red') +
geom_point( aes(y=value.x), size=.8, alpha=.5, color='red') +
geom_smooth(aes(y=value.y), method="loess", se=F, span=spanV, size=.4, color='blue') +
geom_point( aes(y=value.y), size=.8, alpha=.5, color='blue') +
scale_y_log10()+
scale_x_date( labels = date_format("%y"), breaks = "2 years") +
xlab("rok")+
ylab("emigrants (red)/immigrants (blue)") +
#facet_wrap(~Nazwa.x, scales = "fixed", ncol = 4)
facet_wrap(~Nazwa.x, scales = "fixed", ncol = 4)
p7
The structure of the population by sex and age determines the division of labor in a society: into those who work, those who learn (children/youth) and those who require care (old people).
Share of men and women in the total population. Sex ratio (M/F). The value of this coefficient can be calculated for age groups; for newborns it is called sex ratio at birth or SRB.
**Age pyramid**: just a histogram, but drawn in an unusual way (age groups on the vertical axis, men on one side and women on the other).
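An age pyramid can be drawn with two horizontal bar charts sharing one axis. The following is a minimal sketch in base R graphics; the age groups and population figures are hypothetical and the colors are arbitrary.

```r
# Hypothetical population by age group and sex (thousands)
age   <- c("0-14", "15-29", "30-44", "45-59", "60-74", "75+")
men   <- c(3000, 3300, 4400, 3700, 3100, 1000)
women <- c(2800, 3200, 4300, 3800, 3700, 1900)

# men are drawn to the left (as negative values), women to the right
lim <- max(men, women)
barplot(-men, horiz = TRUE, names.arg = age, las = 1,
        xlim = c(-lim, lim), col = "steelblue", axes = FALSE,
        xlab = "population (thousands): men | women")
barplot(women, horiz = TRUE, add = TRUE, col = "tomato", axes = FALSE)
# label the horizontal axis with absolute values on both sides
ticks <- pretty(c(-lim, lim))
axis(1, at = ticks, labels = abs(ticks))
```

Plotting men as negative numbers and relabelling the axis with absolute values is the whole trick; the same idea carries over to ggplot2 with coord_flip().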
The working age population is defined as those aged 15–64, or 18–64 for men and 18–59 for women. Mobile age: the working age group covering the population aged 18–44; immobile age: the working age group covering men aged 45–64 and women aged 45–59. The population in non-working age is understood as the population in the pre-working age, i.e. up to 17 years of age, and in the post-working age, i.e. men aged 65 and more, women aged 60 and more.
Age dependency ratio: The ratio of the number of people in non-working age to the number of people in working age (https://stat.gov.pl/metainformacje/slownik-pojec/pojecia-stosowane-w-statystyce-publiczna/1958, archive.html). You can also distinguish: youth-dependency ratio (YDR) and aged-dependency ratio (ADR)
Aging of the population (*population aging*): an increasing value of the ADR coefficient.
UN Population Division World Population Prospects 2019 (huge files) https://population.un.org/wpp/Download/Standard/CSV/ https://cran.r-project.org/web/packages/wpp2019/index.html
Forecast 2014-2050 for Poland https://stat.gov.pl/obszary-tematyczne/ludnosc/ https://stat.gov.pl/obszary-tematyczne/ludnosc/prognoza-ludnosci/prognoza-ludnosci-na-lata-2014-2050-opracowana-2014-r-,1,5.html
Old-age-dependency ratio (or aged-dependency ratio) number of elderly people (65 and over) per 100 people aged 15–64. (https://ec.europa.eu/eurostat/web/products-datasets/-/tps00198)
**Total dependency ratio**: number of elderly people (65 and over) and young people (0–14) per 100 people aged 15–64.
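The three dependency ratios share the same denominator, so the total ratio is simply the sum of the other two. A short sketch in base R with hypothetical population figures, using the 0–14 / 15–64 / 65+ age bounds quoted above:

```r
# Hypothetical population by broad age group (thousands)
pop <- c(young = 5800, working = 25000, old = 7200)   # 0-14, 15-64, 65 and over

ydr <- pop[["young"]] / pop[["working"]] * 100                   # youth-dependency ratio
adr <- pop[["old"]]   / pop[["working"]] * 100                   # old-age-dependency ratio
tdr <- (pop[["young"]] + pop[["old"]]) / pop[["working"]] * 100  # total dependency ratio

round(c(YDR = ydr, ADR = adr, TDR = tdr), 1)
```

With other age bounds (for example the Polish pre-working and post-working age definitions) only the grouping of the input data changes, not the formulas.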
Population by sex and functional age groups (BDL; the data by five-year age groups seem incomplete). The data is in *long* format (BDL calls it a *relational table*):
pw.list <- read.csv("BDL_powiaty.csv", sep = ';',
  colClasses=c('factor', 'character'), header=T, na.strings="NA" )
## by sex and functional age groups
lm <- read.csv("LUDN_3447.csv", sep = ';',
  colClasses=c('factor', 'factor', 'factor', 'numeric', 'numeric'),
  header=T, na.strings="NA" )
## add the names of the powiats from a separate file
lm <- left_join(lm, pw.list, by="Kod")
We change the format to *wide* and keep only the rows for which the value in the Plec (sex) column equals O (total):
lmw <- lm %>% pivot_wider(names_from = wiek, values_from = Wartosc)
## totals only
lmw65 <- lmw %>% filter (Plec == "O") %>% as.data.frame()
We calculate the age-group totals, their shares (in %) and the dependency ratios:
lmw65$`0-15` <- lmw65$`0-2` + lmw65$`3-6` + lmw65$`7-12` + lmw65$`13-15`
lmw65$`16-64` <- lmw65$O - lmw65$`0-15` - lmw65$`65-99`
lmw65$`0-15p` <- lmw65$`0-15` / lmw65$O * 100
lmw65$`16-64p` <- lmw65$`16-64` / lmw65$O * 100
lmw65$`65-99p` <- lmw65$`65-99` / lmw65$O * 100
lmw65$ADR <- lmw65$`65-99` / ( lmw65$O - lmw65$`0-15` - lmw65$`65-99`) * 100
lmw65$YDR <- lmw65$`0-15` / ( lmw65$O - lmw65$`0-15` - lmw65$`65-99`) * 100
## copy for later
lmw65.r <- lmw65
Voivodeship cities:
## wmt is the vector of TERC codes of the voivodship cities
lmw65w <- lmw65 %>% filter (Kod %in% wmt) %>% as.data.frame()
##str(lmw65w)
timeBreaks <- "2 years"
p66 <- ggplot(lmw65w, aes(x=as.Date(as.character(Rok), format = "%Y"), y=ADR)) +
geom_smooth(method="loess", se=F, span=spanV, size=.4) +
geom_point(size=.8, alpha=.5) +
scale_x_date( labels = date_format("%y"), breaks = timeBreaks) +
xlab("rok")+
ylab("ADR") +
#scale_y_log10() +
facet_wrap(~Nazwa, scales = "fixed", ncol = 4)
p66
p67 <- ggplot(lmw65w, aes(x=as.Date(as.character(Rok), format = "%Y"), y=YDR)) +
geom_smooth(method="loess", se=F, span=spanV, size=.4) +
geom_point(size=.8, alpha=.5) +
scale_x_date( labels = date_format("%y"), breaks = timeBreaks) +
xlab("rok")+
ylab("YDR") +
#scale_y_log10() +
facet_wrap(~Nazwa, scales = "fixed", ncol = 4)
p67
Shares of the population aged 0-15 (red), 16-64 (green), 65 and above (blue).
p68 <- ggplot(lmw65w, aes(x=as.Date(as.character(Rok), format = "%Y"))) +
geom_smooth(aes(y=`0-15p`), method="loess", se=F, span=spanV, size=.4, color='red') +
geom_point(aes(y=`0-15p`), size=.8, alpha=.5, color='red') +
geom_smooth(aes(y=`65-99p`), method="loess", se=F, span=spanV, size=.4, color='blue') +
geom_point(aes(y=`65-99p`), size=.8, alpha=.5, color='blue') +
geom_smooth(aes(y=`16-64p`), method="loess", se=F, span=spanV, size=.4, color='green') +
geom_point(aes(y=`16-64p`), size=.8, alpha=.5, color='green') +
scale_x_date( labels = date_format("%y"), breaks = timeBreaks) +
xlab("year")+
ylab("-15/16-64/65-") +
scale_y_log10() +
facet_wrap(~Nazwa, scales = "fixed", ncol = 4)
p68
Poviats of the Pomeranian Voivodeship: shares of the population aged 0-15 (red), 16-64 (green), 65 and above (blue).
lmw65$woj <- substr(lmw65$Kod,1,2)
lmw65$powiat <- substr(lmw65$Kod,3,4)
pom65 <- lmw65 %>% filter(woj == "22" & powiat != "00" ) %>% as.data.frame()
p69 <- ggplot(pom65, aes(x=as.Date(as.character(Rok), format = "%Y"))) +
geom_smooth(aes(y=`0-15p`), method="loess", se=F, span=spanV, size=.4, color='red') +
geom_point(aes(y=`0-15p`), size=.8, alpha=.5, color='red') +
geom_smooth(aes(y=`65-99p`), method="loess", se=F, span=spanV, size=.4, color='blue') +
geom_point(aes(y=`65-99p`), size=.8, alpha=.5, color='blue') +
geom_smooth(aes(y=`16-64p`), method="loess", se=F, span=spanV, size=.4, color='green') +
geom_point(aes(y=`16-64p`), size=.8, alpha=.5, color='green') +
scale_x_date( labels = date_format("%y"), breaks = timeBreaks) +
scale_y_continuous(breaks=c(0,10,20,30,40,50,60,70,80))+
xlab("rok")+
ylab("-15/16-64/65-") +
##scale_y_log10() +
facet_wrap(~Nazwa, scales = "fixed", ncol = 4)
p69
The oldest counties
old65 <- lmw65 %>% group_by(Kod) %>% arrange(Rok) %>%
filter(row_number()==n()) %>% arrange(`65-99p`) %>% as.data.frame()
##str(old65)
old65top <- old65 %>% slice_tail( n= 20) %>% as.data.frame()
## only the name and the share of people aged 65 and over
old65top %>% select(Nazwa, `65-99p`)
## Nazwa 65-99p
## 1 Powiat m.Słupsk 21.45212
## 2 Powiat m.Tarnów 21.46308
## 3 Powiat bielski 21.52043
## 4 Powiat krasnostawski 21.54036
## 5 Powiat m.Kalisz 21.60784
## 6 Powiat m.Świnoujście 21.73498
## 7 Powiat m.Bydgoszcz 21.74531
## 8 Powiat skarżyski 21.99322
## 9 Powiat m.Konin 22.01382
## 10 Powiat m.Kielce 22.04648
## 11 Powiat m.Wałbrzych od 2013 22.06437
## 12 Powiat m.Gdynia 22.09679
## 13 Powiat m.Koszalin 22.11998
## 14 Powiat m.Katowice 22.18879
## 15 Powiat m.Częstochowa 22.36916
## 16 Powiat m.Sosnowiec 22.85397
## 17 Powiat m.Jelenia Góra 23.81958
## 18 Powiat m.Łódź 23.83795
## 19 Powiat hajnowski 24.26660
## 20 Powiat m.Sopot 27.67995
Forecast and reality. GUS forecast prepared in 2014 (based on data from the 2011 census):
## lmw65 - forecast; lmw65.r - actual values up to 2019
## lmw65.r is the frame saved in the previous example
lm <- read.csv("LUDN_3561.csv", sep = ';',
colClasses=c('factor', 'factor', 'factor', 'numeric', 'numeric'),
header=T, na.string="NA" )
lm <- left_join(lm, pw.list, by="Kod")
lmw <- lm %>% pivot_wider(names_from = wiek, values_from = Wartosc)
## totals only
lmw65 <- lmw %>% filter (Plec == "O") %>% as.data.frame()
## analogously to the historical values:
lmw65$`0-15` <- lmw65$`0-2` + lmw65$`3-6` + lmw65$`7-12` + lmw65$`13-15`
lmw65$`16-64` <- lmw65$O - lmw65$`0-15` - lmw65$`65-99`
lmw65$`0-15p` <- lmw65$`0-15` / lmw65$O * 100
lmw65$`16-64p` <- lmw65$`16-64` / lmw65$O * 100
lmw65$`65-99p` <- lmw65$`65-99` / lmw65$O * 100
lmw65$ADR <- lmw65$`65-99` / ( lmw65$O - lmw65$`0-15` - lmw65$`65-99`) * 100
lmw65$YDR <- lmw65$`0-15` / ( lmw65$O - lmw65$`0-15` - lmw65$`65-99`) * 100
We combine the frame of forecasted values with the realized (historical) values; we limit the time horizon to 2025 (because we only have historical values until 2019) and the spatial horizon to the Pomeranian Voivodeship:
lmw65.rp <- left_join(lmw65, lmw65.r, by=c("Kod", "Rok"))
lmw65.rp$woj <- substr(lmw65.rp$Kod,1,2)
lmw65.rp$powiat <- substr(lmw65.rp$Kod,3,4)
pom65.rp <- lmw65.rp %>% filter(Rok <= 2025) %>% filter(woj == "22" & powiat != "00" ) %>% as.data.frame()
# as a bonus, a separate frame for Sopot
pom65.rp.sopot <- pom65.rp %>% filter (Kod == "2264000") %>%
select(Kod, Rok, `65-99.x`, `65-99p.x`, `65-99.y`, `65-99p.y`) %>% as.data.frame()
Shares of the population aged 0-15 (red), 16-64 (green), 65 and above (blue). The lighter color is the forecast and the darker color is the actual data.
p690 <- ggplot(pom65.rp, aes(x=as.Date(as.character(Rok), format = "%Y"))) +
geom_smooth(aes(y=`0-15p.x`), method="loess", se=F, span=spanV, size=.4, color='tomato1') +
geom_smooth(aes(y=`0-15p.y`), method="loess", se=F, span=spanV, size=.4, color='tomato4') +
geom_point(aes(y=`0-15p.x`), size=.8, alpha=.5, color='tomato1') +
geom_point(aes(y=`0-15p.y`), size=.8, alpha=.5, color='tomato4') +
##
geom_smooth(aes(y=`65-99p.x`), method="loess", se=F, span=spanV, size=.4, color='steelblue1') +
geom_smooth(aes(y=`65-99p.y`), method="loess", se=F, span=spanV, size=.4, color='steelblue4') +
geom_point(aes(y=`65-99p.x`), size=.8, alpha=.5, color='steelblue1') +
geom_point(aes(y=`65-99p.y`), size=.8, alpha=.5, color='steelblue4') +
##
geom_smooth(aes(y=`16-64p.x`), method="loess", se=F, span=spanV, size=.4, color='green1') +
geom_smooth(aes(y=`16-64p.y`), method="loess", se=F, span=spanV, size=.4, color='green4') +
geom_point(aes(y=`16-64p.x`), size=.8, alpha=.5, color='green1') +
geom_point(aes(y=`16-64p.y`), size=.8, alpha=.5, color='green4') +
scale_x_date( labels = date_format("%y"), breaks = '1 year') +
scale_y_continuous(breaks=c(0,10,20,30,40,50,60,70,80))+
xlab("year")+
ylab("-15/16-64/65-") +
##scale_y_log10() +
facet_wrap(~Nazwa.x, scales = "fixed", ncol = 4)
p690
As you can see, the accuracy of these forecasts could be envied by specialists in epidemic forecasting :-)
Quirky term nuptiality – intensity/tendency to get married.
Marriage: a union between persons of opposite sexes which involves rights and obligations fixed by law or custom. The definition comes from the Demopædia Encyclopedia on Population (http://en-ii.demopaedia.org/wiki/Nuptiality), which was created from the *Multilingual Demographic Dictionary* https://www.un.org/development/desa/capacity-development/tools/tool/demopaedia-multilingual-demographic-dictionary-2nd-ed/ (note that the link is from the UN website)
According to Eurostat: a marriage is the act, ceremony or process by which the legal relationship between two persons is formed. The legality of the union may be established by civil, religious or other means as recognized by the laws of each country.
A divorce is the final legal dissolution of a marriage.
What exactly constitutes a legal relationship between two persons varies from country to country. In PL, same-sex relationships are not registered, while many countries have civil partnerships (https://en.wikipedia.org/wiki/Civil_union), in some cases since replaced by same-sex marriage (https://en.wikipedia.org/wiki/Same-sex_marriage).
The family unit is a changing concept (in your country too?): what it means to be a member of a family and the expectations people have of family relationships vary with time and space, making it difficult to find a universally agreed and applied definition. Due to differences in the timing and formal recognition of changing patterns of family formation and dissolution, these concepts have become more difficult to measure in practice. Analysts of demographic statistics therefore have access to relatively few complete and reliable data sets with which to make comparisons over time and between or within countries. https://ec.europa.eu/eurostat/statistics-explained/index.php/Marriage_and_divorce_statistics
The subject has therefore become potentially dangerous. It is easy to be called a homophobe, so it may be better not to bother with it. But if we really want to deal with it, the general approach is similar to that used for births/deaths.
Crude marriage rate: \(M/L\) (number of marriages in a year divided by the average population).
In the more popular variant (**General marriage rate**), the coefficient is calculated as the quotient of the number of marriages and the average population eligible to marry (instead of the total population).
Eurostat tables: demo_nind (marriages) and demo_ndivind (divorces). Indicators: MARRIAGE – number of marriages; GNUPRT – crude marriage rate; DIV – number of divorces; GDIVRT – crude divorce rate; DIVMARPCT – divorces per 100 marriages.
d <- read.csv("demo_ndivind.csv", sep = ';', header=T, na.string="NA" )
countries <- c ('PL', 'DE', 'FR', 'UK', 'IT')
d <- d %>% filter ( geo %in% countries & item == 'GDIVRT') %>% as.data.frame()
p1 <- ggplot(d, aes(x=year, y=value, color=geo)) +
geom_smooth(method="loess", se=F, span=spanV, size=.4) +
geom_point(size=.8, alpha=.5)
p1
Population size: we should have started with this, but it is better to treat the topic as a summary. The size of the population depends on births, deaths and migration. Deaths and births, in turn, have a lot to do with age, for example.
Population in voivodship cities (BDL):
## population of the voivodship cities
fx <- read.csv("LUDN_2137_CTAB_20210108163150.csv",
colClasses=c('factor', 'character', rep('numeric', 45), 'character'),
sep = ';', dec = ',', header=T, na.string="NA" )
wmt <- wojmiasta$teryt
fxmw <- fx %>% filter (Kod %in% wmt) %>% as.data.frame()
fxmw1 <- fxmw %>% select (c(Kod, Nazwa, contains("T"))) %>%
pivot_longer( cols = starts_with("T"), names_to = "rok",
names_prefix = "T", values_to = "value") %>% as.data.frame()
p6 <- ggplot(fxmw1, aes(x=as.Date(as.character(rok), format = "%Y"), y=value, color=Nazwa)) +
#p6 <- ggplot(fxmw1, aes(x=as.Date(as.character(rok), format = "%Y"), y=value)) +
geom_smooth(method="loess", se=F, span=spanV, size=.4) +
geom_point(size=.8, alpha=.5) +
#scale_y_log10()+
scale_x_date( labels = date_format("%y"), breaks = "2 years") +
xlab("year")+
ylab("lpop") +
scale_y_log10()
#facet_wrap(~Nazwa, scales = "fixed", ncol = 4)
p6
or the same in an alternate version:
Feminization rate (women per 100 men) for selected communities (gmina):
fx$woj <- substr (fx$Kod, 1, 2)
fx$gmina <- substr (fx$Kod, 5, 6)
fx$typgminy <- substr (fx$Kod, 7, 7)
# typgminy: 8 = delegatura (Warsaw), 9 = dzielnica (district); both are skipped
fxg <- fx %>% filter(gmina != "00" & as.integer(typgminy) > 0 & as.integer(typgminy) < 8) %>%
  select (Kod, Nazwa, F2019, M2019) %>% as.data.frame()
#nrow(fxg)
#str(fxg)
#
fxg$fr2019 <- fxg$F2019/fxg$M2019 * 100
fxg.min <- fxg %>% arrange(fr2019) %>% slice_head(n=20)
fxg.max <- fxg %>% arrange(fr2019) %>% slice_tail(n=20)
fxg.min
## Kod Nazwa F2019 M2019 fr2019
## 1 2211025 Jastarnia - obszar wiejski (5) 433 549 78.87067
## 2 2802055 Pieniężno - obszar wiejski (5) 1580 1891 83.55368
## 3 2213015 Czarna Woda - obszar wiejski (5) 172 200 86.00000
## 4 2816025 Orzysz - obszar wiejski (5) 1545 1783 86.65171
## 5 1426062 Paprotnia (2) 1189 1360 87.42647
## 6 3218025 Łobez - obszar wiejski (5) 1781 2016 88.34325
## 7 1424012 Gzy (2) 1762 1992 88.45382
## 8 2817085 Wielbark - obszar wiejski (5) 1635 1839 88.90701
## 9 1411052 Młynarze (2) 820 922 88.93709
## 10 2013025 Ciechanowiec - obszar wiejski (5) 1874 2090 89.66507
## 11 2006062 Turośl (2) 2421 2689 90.03347
## 12 1433092 Wierzbno (2) 1304 1446 90.17981
## 13 2215052 Gniewino (2) 3527 3909 90.22768
## 14 3212035 Lipiany - obszar wiejski (5) 913 1011 90.30663
## 15 1406042 Goszczyn (2) 1410 1561 90.32671
## 16 1422025 Chorzele - obszar wiejski (5) 3355 3707 90.50445
## 17 1422032 Czernice Borowe (2) 1806 1995 90.52632
## 18 0615072 Ulan-Majorat (2) 2853 3147 90.65777
## 19 1805062 Krempna (2) 886 975 90.87179
## 20 2809055 Orneta - obszar wiejski (5) 1517 1669 90.89275
fxg.max
## Kod Nazwa F2019 M2019 fr2019
## 1 0614011 Puławy (1) 25451 21966 115.8654
## 2 1406084 Nowe Miasto nad Pilicą - miasto (4) 1996 1721 115.9791
## 3 3207044 Międzyzdroje - miasto (4) 2870 2473 116.0534
## 4 1020031 Zgierz (1) 30196 25994 116.1653
## 5 1004011 Łęczyca (1) 7513 6458 116.3363
## 6 1214024 Koszyce - miasto (4) 419 359 116.7131
## 7 0663011 Lublin (1) 183075 156709 116.8248
## 8 1008021 Pabianice (1) 34896 29861 116.8615
## 9 1609042 Komprachcice (2) 4920 4200 117.1429
## 10 1465011 M.st.Warszawa od 2002 (1) 966319 824339 117.2235
## 11 2601044 Pacanów - miasto (4) 605 516 117.2481
## 12 2211024 Jastarnia - miasto (4) 1465 1243 117.8600
## 13 0208051 Polanica-Zdrój (1) 3419 2888 118.3864
## 14 2814064 Jeziorany - miasto (4) 1712 1441 118.8064
## 15 1061011 Łódź (1) 369987 309954 119.3684
## 16 0401021 Ciechocinek (1) 5786 4832 119.7434
## 17 3206064 Moryń - miasto (4) 888 741 119.8381
## 18 0614084 Nałęczów - miasto (4) 2055 1698 121.0247
## 19 2008014 Goniądz - miasto (4) 1012 836 121.0526
## 20 2405064 Sośnicowice - miasto (4) 1065 841 126.6350
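The way the TERC code is split above can be checked on two rows copied from the printed tables. This is only a minimal sketch in base R; split_terc() is a hypothetical helper introduced for illustration, not a function from any package.

```r
# Split a 7-character TERC code (wwppggr) into its components.
# split_terc() is a hypothetical helper, not part of any package.
split_terc <- function(kod) {
  kod <- as.character(kod)
  stopifnot(all(nchar(kod) == 7))
  data.frame(
    kod    = kod,
    woj    = substr(kod, 1, 2),               # voivodship
    powiat = substr(kod, 3, 4),               # powiat within the voivodship
    gmina  = substr(kod, 5, 6),               # commune within the powiat
    typ    = as.integer(substr(kod, 7, 7)),   # unit symbol (1-5, 8, 9)
    stringsAsFactors = FALSE
  )
}

# Jastarnia (rural area) and Lublin, values taken from the tables above
ex <- split_terc(c("2211025", "0663011"))
ex$F2019 <- c(433, 183075)
ex$M2019 <- c(549, 156709)
ex$fr2019 <- ex$F2019 / ex$M2019 * 100   # women per 100 men
ex
```

Keeping the code as character (not numeric) matters here: a numeric Kod would lose the leading zero of codes such as 0663011 and shift every substring by one position.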
Histogram:
p6h <- ggplot(fxg, aes(x = fr2019)) + geom_histogram(binwidth = 0.5) +
ggtitle ("Feminization rate (fr)")
p6h
Line chart for the 20 municipalities with the lowest feminization rate:
mfzone <- fxg.min$Kod
##mfzone
##length(mfzone)
fxmw <- fx %>% filter (Kod %in% mfzone) %>% as.data.frame()
##nrow(fxmw)
fxmw1f <- fxmw %>% select (c(Kod, Nazwa, contains("F"))) %>%
pivot_longer( cols = starts_with("F"), names_to = "rok",
names_prefix = "F", values_to = "value") %>% as.data.frame()
fxmw1m <- fxmw %>% select (c(Kod, Nazwa, contains("M"))) %>%
pivot_longer( cols = starts_with("M"), names_to = "rok",
names_prefix = "M", values_to = "value") %>% as.data.frame()
fxmw1fm <- left_join(fxmw1f, fxmw1m, by=c("Kod", "rok"))
fxmw1fm$fr <- fxmw1fm$value.x / fxmw1fm$value.y * 100
##fxmw1fm$fr
fxmw1fm <- fxmw1fm %>% group_by(Nazwa.x) %>%
arrange(desc(fr), .by_group = TRUE) %>% as.data.frame()
p6f <- ggplot(fxmw1fm, aes(x=as.Date(as.character(rok), format = "%Y"), y=fr)) +
geom_smooth(method="loess", se=F, span=spanV, size=.4) +
geom_point(size=.8, alpha=.5) +
#scale_y_log10()+
scale_x_date( labels = date_format("%y"), breaks = "2 years") +
xlab("year")+
ylab("fr") +
## scale_y_log10()
facet_wrap(~Nazwa.x, scales = "fixed", ncol = 4)
p6f
**World population according to the US Census Bureau**. The section Population estimates and projections provides historical data and forecasts up to 2100 for 228 countries. In particular, two files can be downloaded in CSV format. The first one uses one-year age groups and is therefore huge (300 MB). The second one uses 5-year age groups but is still large (30 MB), so from this file we extracted the data for 16 selected countries only. This extract is in the file idb5yr_16.csv (less than 2 MB).
## 13 countries, without NL, BE, UZ (not enough colors for the chart)
countries <- c('CN', 'DE', 'ET', 'FR', 'IN', 'IR', 'NR', 'PK', 'PL',
'RU', 'SE', 'TR', 'US')
z0 <- read.csv("idb5yr_16.csv", sep = ';', header=T, na.strings="NA" )
## US last forecast is 2060
z <- z0 %>% filter (GENC %in% countries)
The file contains almost 100 variables, so to keep things tidy we select only those needed for further calculations (YR - year, GENC - ISO code, NAME - country name and POP - total population):
z <- z %>% select (YR, GENC, NAME, POP) %>% as.data.frame()
Let us present population dynamics in relative terms, with the population of each year expressed as a percentage of the 2015 population:
z2015 <- z %>% filter (YR == 2015 ) %>% as.data.frame()
zp <- left_join(z, z2015, by=c("GENC"))
## Population as % of 2015
zp$pop <- zp$POP.x / zp$POP.y * 100
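The base-year index computed above can be verified on a toy table. The sketch below is an alternative formulation, not the method used in these notes: instead of joining the 2015 rows back to the data, it divides within each country group. The countries AA and BB and their population figures are made up for illustration.

```r
library(dplyr)

# Made-up data: two hypothetical countries, three years each
toy <- data.frame(
  GENC = c("AA", "AA", "AA", "BB", "BB", "BB"),
  YR   = c(2010, 2015, 2020, 2010, 2015, 2020),
  POP  = c(90, 100, 110, 400, 500, 450)
)

# Population as % of the 2015 value, computed per country
toy_idx <- toy %>%
  group_by(GENC) %>%
  mutate(pop = POP / POP[YR == 2015] * 100) %>%
  ungroup() %>%
  as.data.frame()

toy_idx
```

The grouped version keeps the original column names (no .x/.y suffixes), but it assumes that every country has exactly one row for the base year; the join used above fails more visibly when that row is missing.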
Chart (for some reason the forecast for the US is only until 2060):
colors16 <- c('CN' = "red", 'DE' = "red4", 'ET' = "orchid",
'FR' = "yellow", 'IN' = "orchid4", 'IR' = "palegreen", 'NR' = "powderblue",
'PK' = "purple1", 'PL' = "purple4", 'RU' = "pink",
'SE' = "yellow4", 'TR' = "black", 'US' = "palegreen4")
p3 <- ggplot(zp, aes(x=as.Date(as.character(YR.x), format = "%Y"), y=pop, color=GENC)) +
geom_line(size=.4, alpha=.6) +
geom_point(size=.8, alpha=.25) +
## label the last value of pop for each country
geom_text(data=. %>%
arrange(desc( as.Date(as.character(YR.x), format = "%Y") )) %>%
group_by(GENC) %>%
slice(1),
aes(label= sprintf("%.1f", pop)),
position=position_nudge(0.1, 0.5), hjust=.9, vjust=-0.6,
size=2, color='black', alpha=.4, show.legend=FALSE) +
scale_color_manual(values = colors16)
p3
Poland is in the lead (if you read the ranking from the bottom).
Using the same dataset (i.e. *Population estimates and projections* of the US Census Bureau), let’s compare the age structures of the 16 selected countries. To do this, let’s modify the z0 frame:
## only the M/F age-group variables, without the totals
z <- z0 %>% select("YR", "GENC", starts_with(c("MPOP", "FPOP"))) %>%
select(-MPOP, -FPOP) %>% as.data.frame()
We change to the *long* data format, *filter* (leaving only the year 2020), and modify the values of the variable grp by removing POP (to make the printout look better):
zz <- z %>% pivot_longer(starts_with(c("MPOP", "FPOP")),
names_to = "grp", values_to = "value") %>%
filter (YR == 2020) %>%
mutate(grp=str_replace(grp, "POP", "")) %>% as.data.frame()
We add the variable sex, whose values are derived from the values of the variable grp. We divide the values of the variable value by 1000 (population in thousands):
zz$sex <- as.factor(substr(zz$grp,1,1))
zz$value <- zz$value/1000
##levels(as.factor(zz_sw$grp))
We set the order of the values of the variable grp (the levels argument of factor is used for this):
zz <- zz %>% mutate ( grp = factor(grp,
levels = c(
"M100_", "M95_99", "M90_94", "M85_89", "M80_84", "M75_79",
"M70_74", "M65_69", "M60_64", "M55_59", "M50_54", "M45_49",
"M40_44", "M35_39", "M30_34", "M25_29", "M20_24", "M15_19", "M10_14",
"M5_9", "M0_4",
"F0_4", "F5_9",
"F10_14", "F15_19", "F20_24", "F25_29", "F30_34", "F35_39",
"F40_44", "F45_49", "F50_54", "F55_59", "F60_64", "F65_69", "F70_74",
"F75_79", "F80_84", "F85_89", "F90_94", "F95_99", "F100_"
)))
##levels(as.factor(zz_sw$grp))
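The long vector of levels above does not have to be typed by hand. A minimal sketch in base R that generates the same ordering (males from the oldest to the youngest group, then females from the youngest to the oldest); the names ages and lvls are introduced here for illustration only:

```r
# 5-year age groups "0_4", "5_9", ..., "95_99" plus the open group "100_"
ages <- c(paste(seq(0, 95, by = 5), seq(4, 99, by = 5), sep = "_"), "100_")

# Males in reverse order, then females in natural order
lvls <- c(paste0("M", rev(ages)), paste0("F", ages))

length(lvls)
head(lvls, 3)
tail(lvls, 3)
```

The resulting vector can be passed as factor(grp, levels = lvls) in place of the hand-written list.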
Finally the chart. Poor man’s population pyramids:
pf1 <- ggplot(zz, aes(x=grp)) +
geom_bar(aes(y=value, fill=sex),
stat="identity", position=position_dodge(width=.4), width=.8, alpha=.4) +
xlab(label="age") +
ylab(label="thousands") +
theme(legend.position="top") +
coord_flip() +
facet_wrap(~GENC, scales = "free", ncol = 4) +
ggtitle("Population pyramid", subtitle='')
pf1
Starting point: https://ec.europa.eu/eurostat/data/database. The main database can be found in the section Database by themes (Tables by themes and the sections below it contain selected, more important data from the main database).
To access the data, click through the thematic areas, sections and subsections, for example Population and Social conditions → Demography and Migration → Mortality. Three icons at the beginning of a line mean that we have reached a data table. Clicking the yellow ZIP icon downloads the complete table (it can be large); clicking the Explorer icon (magnifying glass) displays the data and lets us choose what to view. The icons are followed by the table title and, after the title (in brackets), the table identifier.
After clicking the Explorer icon, a window similar to this one opens:
The window is divided into four panels: menu (large icons above the horizontal blue line), content selection (table customization), content, and explanations. The content selection panel contains check boxes that define what should be displayed (click the blue plus icon to display a form for selecting possible values). There are as many check boxes as there are data dimensions; in the example above there are four: when (time), where (geo), what (month) and measure (unit). For each combination of dimensions the corresponding set of values is displayed (please try it), and it can then be downloaded (download icon in the menu panel).
The complete table can also be downloaded without any clicking, using the table identifier (shown in square brackets at the top right of the content selection panel). For example:
## the table ID is demo_mmonth
curl 'https://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?file=data/demo_mmonth.tsv.gz'
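A downloaded table still has to be reshaped before use. The sketch below rests on an assumption about the layout of the bulk TSV files: the first column packs all dimensions except time into one comma-separated key, and cells may carry a flag letter after the number or a colon for a missing value. The three sample lines and their values are made up; they are not real Eurostat data.

```r
# Made-up sample in the assumed bulk TSV layout (not real Eurostat figures)
sample_tsv <- paste(
  "unit,month,geo\\time\t2019 \t2018 ",
  "NR,M01,PL\t39000 p\t37000 ",
  "NR,M02,PL\t: \t35000 ",
  sep = "\n"
)

# Read everything as text; flags and ':' prevent direct numeric parsing
raw <- read.delim(text = sample_tsv, sep = "\t", header = TRUE,
                  check.names = FALSE, colClasses = "character")
names(raw) <- trimws(names(raw))

# Split the packed first column into one column per dimension
keys <- as.data.frame(do.call(rbind, strsplit(raw[[1]], ",", fixed = TRUE)),
                      stringsAsFactors = FALSE)
names(keys) <- c("unit", "month", "geo")

# Drop flags and missing-value markers, then convert to numbers
vals <- raw[-1]
vals[] <- lapply(vals, function(v) as.numeric(gsub("[^0-9.-]", "", v)))

eu <- cbind(keys, vals)
eu
```

For a real file, the text argument would be replaced by the path to the unpacked .tsv file, and the names of the key columns should be taken from its header rather than hard-coded.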
Start point: World Health Data Platform / GHO / Themes / Topics / Indicator Groups (https://www.who.int/data/gho/data/themes/topics/indicator-groups/); List of indicators: https://www.who.int/data/gho/data/indicators/
For example: https://www.who.int/data/gho/data/indicators/indicator-details/GHO/gho-ghe-life-tables-lx-number-of-people-left-alive-at-age-x. The description of each indicator is provided in the Metadata section. Summary of all descriptions: https://www.who.int/data/gho/indicator-metadata-registry
There is also an API, but it is poorly described and probably not fully functional. In particular, the list of indicators can be downloaded (without descriptions, but with the labels of the tables):
## OData API (JSON)
curl https://ghoapi.azureedge.net/api/Indicator
## or (Athena API)
curl https://apps.who.int/gho/athena/api/GHO
## Indicator Id + title (to find out what it is):
curl 'https://apps.who.int/gho/athena/api/GHO?format=csv&profile=text'
From this list it follows that lx-number-of-people-left-alive-at-age-x is the table identified as LIFE_0000000031:
curl 'https://apps.who.int/gho/athena/api/GHO/LIFE_0000000031?format=csv' > WHO_LT_lx.csv
## returns an empty result because (presumably) too much is requested at once
## adding a condition helps:
curl 'https://apps.who.int/gho/athena/api/GHO/LIFE_0000000031?format=csv&filter=COUNTRY:POL' > WHO_LT_lx_PL.csv
## but then a query that generates a lot of data results in an error:
curl 'https://apps.who.int/gho/athena/api/GHO/LIFE_0000000031?format=csv&filter=YEAR:2019' > WHO_LT_lx_2019.csv
The API is described at https://www.who.int/data/gho/info/athena-api and https://www.who.int/data/gho/info/athena-api-examples. The database, however, looks seriously dysfunctional, since downloading larger amounts of data appears to be impossible.
Starting point: http://demografia.stat.gov.pl/bazademografia/ and also the Local Data Bank https://bdl.stat.gov.pl/BDL/dane/podgrup/temat
To access the data, you need to click on the areas / departments / subdivisions until the big blue button with the word Next becomes clickable:
Then click Next to go to the content selection form (what is to be displayed):
and choose what we want to view (dimensions) by declaring the relevant values in the displayed selection lists (in the example Years / Gender / Age). Once every dimension has been declared, the Next button becomes *clickable* again. After pressing it, we go to the form for selecting the spatial dimension, i.e. the data aggregation level (Poland, voivodships, powiats and even communes). We go further:
The data is displayed, and pressing the Export button downloads it in one of several available formats.
The CSO also provides detailed and up-to-date data on deaths on the page *Deaths by weeks* (https://stat.gov.pl/obszary-tematyczne/ludnosc/ludnosc/zgony-wedlug-tygodni,39,2.html). The data are shared as a huge Excel spreadsheet (XLSX); the direct link to the file is https://stat.gov.pl/download/gfx/portalinformacyjny/pl/defaultaktualnosci/5468/39/2/1/zgony_wedlug_tygodni_v2.zip
Our World in Data (OWiD) is an educational project aimed at showing *research and data to make progress against the world’s largest problems* (hunger, disease, social inequality and more; https://en.wikipedia.org/wiki/Our_World_in_Data or https://pl.wikipedia.org/wiki/Our_World_in_Data). According to some, this approach makes the project less educational than indoctrinating: it is supposed to show that the world is moving quickly in the right direction. Strong criticism of the project from this point of view can be found in *Pinker’s Pollyannish Philosophy and Its Perfidious Politics* (https://www.lareviewofbooks.org/article/pinkers-pollyannish-philosophy-and-its-perfidious-politics/) and in *It’s not thanks to capitalism that we’re living longer, but progressive politics* (https://www.theguardian.com/commentisfree/2019/nov/22/progressive-politics-capitalism-unions-healthcare-education).
OWiD provides access to, among other things, data sets notable for their scope. By design they cover all countries of the world over a time horizon reaching back hundreds of years, which in many cases is obviously dubious (for example, GDP calculated for Poland under the partitions; https://ourworldindata.org/economic-growth).
Bearing this in mind, and without entering into ideological disputes about the purpose of the project, let us simply use the data (especially the relatively recent data, which are more likely to be measured rather than estimated, that is, guessed).
Note: OWiD is not a database but a collection of documents, i.e. web pages. These documents contain eye-catching interactive charts together with comments and descriptions of what the charts show. Below each chart there is a set of buttons for selecting what to display, or a download button for downloading the data.
Start point: https://ourworldindata.org/
This is the US Census Bureau, which also provides various other interesting data, in particular the *International Data Base (IDB): Population estimates and projections for 228 countries and areas* https://www.census.gov/data-tools/demo/idb/#/country?YR_ANIM=2021
Interestingly, because some US residents are of Polish origin, the site has pages in Polish (as well as in Chinese, Korean, Vietnamese, Russian, French, Portuguese, Spanish, Arabic, Filipino and Creole; there are none in Italian or German :-)), for example: https://www.census.gov/newsroom/press-releases/2020/2020-census-data-collection-ending/2020-census-data-collection-ending-polish.html
Glossary of epidemiological terms http://www.przeglepidemiol.pzh.gov.pl/slowniczek-terminow-epidemiologiczne
Dictionary of terms (GUS) https://stat.gov.pl/metainformacje/slownik-pojec/
A lot of information is available on the CSO website. There are basically two textbooks in Poland: Holzer’s classic and the newer book by Okólski and Fihel, but I do not recommend the latter.
[H2003] Holzer J., Demografia, PWE Warsaw 2003
[PB2010] Poston DL and Bouvier LF, An Introduction to Demography, Cambridge University Press, Cambridge 2010