Learning Objectives

By the end of this project you will:

  • Explore a complicated, real-life data set
  • Use fitModel to obtain the best-fit model of a given data set and estimate parameters
  • Interpret parameters from fitModel and apply models to real-life scenarios

Admin

All questions for this project will be answered in the Project 1 Gradescope assignment. There are two parts to this project, parts a and b. Read through this guidance to find the relevant material and R commands, as well as the required tasks, but note that the only deliverables for this project are to answer the questions in the Gradescope assignments and to upload an R Script at the end of each Gradescope assignment.

Introduction

In this project, we will explore the Cobb-Douglas production function, a function used to model the relationship between manufacturing output and two inputs: labor and capital. In its basic form, the Cobb-Douglas function is the multi-variable function:

\[ GDP(L,K) = AL^\alpha K^\beta \]

where the output \(GDP\) is the gross domestic product of a country, \(L\) represents labor, and \(K\) represents capital. For this project we will use the \(GDP\) of a country in millions of US dollars (USD) for the \(GDP\) variable. The labor variable will use the population of a country in millions of people and the capital variable will use the capital stock of a country in millions of USD. The values \(A\), \(\alpha\) and \(\beta\) are parameters in this model that describe the underlying process. We will use what we know about power functions and modeling to fit this model to some observed data, and to approximate and interpret the resulting coefficients.

Answer questions 2.1-2.6 in Gradescope.

Data

The Cobb-Douglas function is often used to model manufacturing output for a particular sector or industry, but in this project, we will use it to model the whole country’s GDP as a function of a country’s population (labor) and a country’s capital stock per capita (capital). The Penn World Table is a database containing various economic indicators for 183 countries between 1959 and 2019. The pwt_euro data set contains a subset of this database: only countries in Europe. The other two data sets you will use are pwt_africa and pwt_asia which contain a subset of countries in Africa and Asia, respectively. These data sets have already been loaded into R and will be explored in the rest of the project.

## Warning: package 'tidyverse' was built under R version 4.4.1
## Warning: package 'purrr' was built under R version 4.4.2
## Warning: package 'lubridate' was built under R version 4.4.2

Part a

Read through and complete all questions in the Project 1-part a assignment in Gradescope before moving to part b. You need to ensure you have correct models from part a before moving onto part b. There will be no CFD in part b for incorrect models.

Previewing and Introducing the Data

Before beginning any project with data, it is important to preview your data and ensure it has been imported correctly. The data sets you will be working with have already been loaded into Posit Cloud. Use the following commands to view the help file for each data set.

?pwt_euro
?pwt_africa
?pwt_asia

Use the commands below to preview your data and answer questions 3.1-3.6 in Gradescope.

head(pwt_euro)
View(pwt_euro)
tail(pwt_euro)

When you view your data, you may notice NAs in your first several rows. This is not an error. Why do you think Albania is missing data prior to 1970?

Solutions:

head(pwt_euro) #Albania
## # A tibble: 6 × 7
##   isocode country  year rgdpo   pop  rnna region
##   <chr>   <chr>   <dbl> <dbl> <dbl> <dbl> <chr> 
## 1 ALB     Albania  1950    NA    NA    NA Europe
## 2 ALB     Albania  1951    NA    NA    NA Europe
## 3 ALB     Albania  1952    NA    NA    NA Europe
## 4 ALB     Albania  1953    NA    NA    NA Europe
## 5 ALB     Albania  1954    NA    NA    NA Europe
## 6 ALB     Albania  1955    NA    NA    NA Europe
tail(pwt_euro) #Ukraine
## # A tibble: 6 × 7
##   isocode country  year   rgdpo   pop     rnna region
##   <chr>   <chr>   <dbl>   <dbl> <dbl>    <dbl> <chr> 
## 1 UKR     Ukraine  2014 522991.  45.1 6730916. Europe
## 2 UKR     Ukraine  2015 466740.  44.9 6632538  Europe
## 3 UKR     Ukraine  2016 488486.  44.7 6554077  Europe
## 4 UKR     Ukraine  2017 537900.  44.5 6488015  Europe
## 5 UKR     Ukraine  2018 561325.  44.2 6434596. Europe
## 6 UKR     Ukraine  2019 578290.  44.0 6392188  Europe
#number of observations=2730
#variable for labor: 'pop'
#variable for capital: 'rnna'
#variable for GDP: 'rgdpo'

Visualizing the Data

We would like to visualize the relationship between the output GDP and two input variables, population and capital stock per capita (our measure of capital). In each of the data sets, these are labeled as rgdpo, pop, and rnna, respectively.

Thus far, we have explored various ways to plot data. In this project, we will use plot_ly to plot the data. plotly is a widely used library for making professional graphics. The syntax for plotting this data is different than we are used to, but not impenetrable. The first argument is the data set, the next argument tells us which axis of the data set to use for the x-coordinates of points, the next the y-coordinates, and so on. After the plot_ly command we are just setting options to make our graph look nice.

Copy and paste the command below to produce a 3-D plot of the data.

plotly::plot_ly(pwt_euro, x = ~pop, y = ~rnna, z = ~rgdpo) %>% plotly::add_markers(size=12) %>%
  plotly::layout(scene = list(xaxis = list(title = 'Population'),
                      yaxis = list(title = 'Capital Investment'),
                      zaxis = list(title = 'GDP')))

Take a moment to view this plot. Note that you can use your mouse to rotate the plot and zoom in and out.

Determine Appropriate Model

Now that we’ve visualized our data, let’s determine whether the Cobb-Douglas model is appropriate for this relationship. First, let’s take a closer look at the Cobb-Douglas Production Function.

The Function

Recall that when applied to GDP, the Cobb-Douglas Production Function looks like this:

\[ GDP(L,K) = AL^\alpha K^\beta \]

As a reminder, answer questions 2.1-2.6 to specify whether each of the terms in this equation is an input variable, output variable, or a parameter.

Solution: \(L\) and \(K\) are input variables. \(GDP\) is the output. \(A\), \(\alpha\) and \(\beta\) are parameters.

Type of Function

This function looks like a multivariable version of which of the functions we’ve explored so far?

Answer question 4 in Gradescope.

Solution: Power function.

Log of the Function

Take the logarithm of both sides of this equation. Is the log of GDP a linear function of the log of the inputs?

Answer question 5 in Gradescope.

Solution:

(MULTIPLE CHOICE Q WITH TWO YES, TWO NO, WITH CORRECT/INCORRECT FORM OF LOG OF FUNCTION)

Yes; \(\log(GDP) = \log(A) + \alpha\log(L) + \beta\log(K)\).

Plot Transformed Data

Just like in the single variable case, we can plot the transformed data to determine whether our multivariable power function is appropriate. Since we have two inputs, we’ll build a log-log-log plot. Use the plot_ly function to do this. Based on your generated plot, is it reasonable to use a multivariate power function to model the data? Note that in the multivariable case, we’re looking for a plane, not a straight line. You’ll likely have to rotate the plot to see if such a plane exists.

Copy and paste the command below to produce a 3-D plot of the transformed data.

plotly::plot_ly(pwt_euro, x = ~log(pop), y = ~log(rnna), z = ~log(rgdpo)) %>%   plotly::add_markers(size=12) %>%
  plotly::layout(scene = list(xaxis = list(title = 'log(Population)'),
                      yaxis = list(title = 'log(Capital Investment)'),
                      zaxis = list(title = 'log(GDP)')))

Answer questions 5.1 and 5.2 in Gradescope.

Solution: There appears to be a rough linear relationship between the log of GDP and the log of the two inputs, population and capital. It’s not perfect, and there is some spread, but it appears that a linear relationship is reasonable for the plot of transformed data. For this reason, a power model seems resonable for the data.

Fitting Models

Regardless of how you answered the questions above, use fitModel to fit the log of GDP versus the log of population and the log of capital for the pwt_euro data set. Report the best-fit results for \(\alpha\) and \(\beta\).

Answer question 6 in Gradescope. Round to three decimal places.

Solution:

bestFitModel_euro = fitModel(log(rgdpo)~log_A+alpha*log(pop)+beta*log(rnna), data=pwt_euro)
coef(bestFitModel_euro)
##     log_A     alpha      beta 
## 1.3724421 0.2401262 0.7402487

The estimate of \(\alpha\) is 0.240 and the estimate of \(\beta\) is 0.740.

Repeat the model fitting process for the Africa and Asia datasets and record the best fit \(\alpha\) and \(\beta\) for each of the models in Gradescope.

Solution:

bestFitModel_africa = fitModel(log(rgdpo)~log_A+alpha*log(pop)+beta*log(rnna), data=pwt_africa)
coef(bestFitModel_africa)
##     log_A     alpha      beta 
## 3.4524072 0.4365498 0.5126865

The estimate of \(\alpha\) is 0.437 and the estimate of \(\beta\) is 0.513.

bestFitModel_asia = fitModel(log(rgdpo)~log_A+alpha*log(pop)+beta*log(rnna), data=pwt_asia)
coef(bestFitModel_asia)
##     log_A     alpha      beta 
## 1.4661659 0.2674299 0.7353223

The estimate of \(\alpha\) is 0.267 and the estimate of \(\beta\) is 0.735.

Part b: Interpretation and Application of your Models

Inference and Prediction

Now that you have created your models, lets use them for insights and prediction. All questions should be answered in gradescope.

Using your models from part b, answer question 2 in Gradescope.

Solution: What is the estimated GDP for an African country with a population (Labor) of 9.614 million and capital stock (Capital) of 456745.875 million?

bestFitModel_africa(9.614,456745.875)
## [1] 67621.81

This estimation for GDP is an example of using our model for prediction (interpolation).

You want to estimate the GDP for an African country with values that are outside what we have observed so far in the data set. What is the estimated GDP for an African country with a population (Labor) of 300 million and capital stock (Capital) of 200 million?

bestFitModel_africa(300,200)
## [1] 5760.435

This estimation for GDP is an example of using our model for prediction (extrapolation).

In which region does Labor have the greatest effect on GDP compared to the other regions? Africa

In which region does Capital have the greatest effect on GDP compared to the other regions? Europe

Using your model to compare parameters to investigate the influence of labor on a regions GDP is an example of using your model for INFERENCE.

Interpretation of Coefficients

So how do we interpret our parameter estimates? The values \(\alpha\) and \(\beta\) measure the increase in GDP that would result from an increase in population or capital, respectively, holding the other variable constant. The coefficient \(A\) also has an interpretation, but we will not focus on that in this project.

Follow up Questions

For a 10 percent increase in population in a African country, according to this model, by what percent do we expect GDP to increase (leaving capital constant)? For a 10 percent increase in capital, by what percent do we expect GDP to increase (leaving population constant)?

Answer in Gradescope. Round to one decimal place.

Solution:

There are a couple ways to solve this. You can test using two different points that have the 10% increase in either population or capital and compare the resulting GDPs. Or you can calculate 1.1^(alpha)-1 to calculate how a 10% increase in labor would affect GDP.

For a 10 percent increase in population, we would expect a 4.2 percent increase in GDP.

(bestFitModel_africa(110,100)-bestFitModel_africa(100,100))/bestFitModel_africa(100,100)
## [1] 0.04248537
1.1^(0.4365498)-1
## [1] 0.04248537

For a 10 percent increase in capital, we would expect a 5.0 percent increase in GDP.

(bestFitModel_africa(100,110)-bestFitModel_africa(100,100))/bestFitModel_africa(100,100)
## [1] 0.05007779
1.1^(0.5126865)-1
## [1] 0.05007779