Learning Objectives

By the end of this project you will:

Explore a complex, real-life data set
Use fitModel to obtain the best-fit model of a given data set and estimate parameters
Interpret parameters from fitModel and apply models to real-life scenarios

Admin

All questions for this project will be answered in the Project 1 Gradescope assignments. There are two parts to this project, parts a and b. Read through this guidance to find the relevant material and R commands, as well as the required tasks, but note that the only deliverables for this project are to answer the questions in the Gradescope assignments and to upload an R Script at the end of each Gradescope assignment.

Introduction

In this project, we will explore the Cobb-Douglas production function, a function used to model the relationship between manufacturing output and two inputs: labor and capital. In its basic form, the Cobb-Douglas function is the multivariable function:

\[ GDP(L,K) = AL^\alpha K^\beta \]

where the output \(GDP\) is the gross domestic product of a country, \(L\) represents labor, and \(K\) represents capital. For this project we will use the \(GDP\) of a country in millions of US dollars (USD) for the \(GDP\) variable. The labor variable will use the population of a country in millions of people and the capital variable will use the capital stock, or value of assets, of a country in millions of USD. The values \(A\), \(\alpha\) and \(\beta\) are parameters in this model that describe the underlying process. We will use what we know about power functions and modeling to fit this model to some observed data, and to approximate and interpret the resulting coefficients.
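To get a feel for how the three parameters shape the output, the function can be written directly in R. The sketch below is only an illustration: the name cobb_douglas and the parameter values are hypothetical and are not estimates from any of the project data sets.

```r
# Cobb-Douglas production function: GDP = A * L^alpha * K^beta
cobb_douglas <- function(L, K, A, alpha, beta) {
  A * L^alpha * K^beta
}

# Hypothetical parameter values, chosen only for illustration
gdp_base <- cobb_douglas(L = 10, K = 1000, A = 2, alpha = 0.3, beta = 0.7)

# Doubling both inputs scales the output by 2^(alpha + beta)
gdp_doubled <- cobb_douglas(L = 20, K = 2000, A = 2, alpha = 0.3, beta = 0.7)

# Here alpha + beta = 1, so the ratio is 2 (up to floating-point rounding)
gdp_doubled / gdp_base
```

With these made-up exponents summing to 1, doubling both inputs doubles the output. Exponents fitted from real data need not sum to 1.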

Answer questions 2.1-2.6 in Gradescope.

Data

The Cobb-Douglas function is often used to model manufacturing output for a particular sector or industry, but in this project, we will use it to model a whole country’s GDP as a function of the country’s population (labor) and its capital stock (capital). The Penn World Table is a database containing various economic indicators for 183 countries between 1950 and 2019. The pwt_euro data set contains a subset of this database: only countries in Europe. The other two data sets you will use are pwt_africa and pwt_asia, which contain subsets of countries in Africa and Asia, respectively. These data sets have already been loaded into R and will be explored in the rest of the project.

Part a

Read through and complete all questions in the Project 1-part a assignment in Gradescope before moving on to part b. You need to ensure you have correct models from part a before moving on to part b. There will be no partial credit in part b for using incorrect models.

Previewing and Introducing the Data

Before beginning any project with data, it is important to preview your data and ensure it has been imported correctly. The data sets you will be working with have already been loaded into Posit Cloud. Use the following commands to view the help file for each data set.

?pwt_euro
?pwt_africa
?pwt_asia

Use the commands below to preview your data and answer questions 3.1-3.6 in Gradescope.

head(pwt_euro)
View(pwt_euro)
tail(pwt_euro)

When you view your data, you may notice NAs in your first several rows. This is not an error. Why may some countries have NAs in the first part of their data?
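Scrolling through View() is one way to spot NAs; counting them is another. The sketch below uses a small made-up data frame (toy, with a hypothetical year column and invented values) so that it runs on its own. It is not the course data.

```r
# Toy data frame standing in for pwt_euro (all values are made up)
toy <- data.frame(
  year  = c(1950, 1951, 1952, 1953),
  rgdpo = c(NA, NA, 120.5, 131.2),
  pop   = c(NA, 4.1, 4.2, 4.3),
  rnna  = c(NA, NA, 800.0, 845.0)
)

# Number of missing values in each column
na_counts <- colSums(is.na(toy))
na_counts

# Keep only the rows where all three model variables are observed
complete_rows <- toy[complete.cases(toy[, c("rgdpo", "pop", "rnna")]), ]
nrow(complete_rows)
```

With the course data loaded, the same idea would be colSums(is.na(pwt_euro)).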

Repeat the above commands for the pwt_asia and pwt_africa data sets.

How do these data sets compare to the European countries’ data?

Visualizing the Data

We would like to visualize the relationship between the output, GDP, and the two input variables, population and capital stock (our measure of capital). In each of the data sets, these are labeled rgdpo, pop, and rnna, respectively.

Thus far, we have explored various ways to plot data. In this project, we will use plot_ly, a function from plotly, a widely used library for making professional graphics. The syntax is different from what we are used to, but we can apply what we have learned in the course to make sense of this new command. The first argument is the data set, the next argument tells plot_ly which variable of the data set to use for the x-coordinates of the points, the next gives the y-coordinates, and so on. The commands after plot_ly just set options to make the graph look nice.

Copy and paste the command below to produce a 3-D plot of the data.

plotly::plot_ly(pwt_africa, x = ~pop, y = ~rnna, z = ~rgdpo) %>%
  plotly::add_markers(size = 12) %>%
  plotly::layout(scene = list(xaxis = list(title = 'Population'),
                              yaxis = list(title = 'Capital Investment'),
                              zaxis = list(title = 'GDP')))

Take a moment to view this plot. Note that you can use your mouse to rotate the plot and zoom in and out.

Determine Appropriate Model

Now that we’ve visualized our data, let’s determine whether the Cobb-Douglas model is appropriate for this relationship. First, let’s take a closer look at the Cobb-Douglas Production Function.

The Function

Recall that when applied to GDP, the Cobb-Douglas Production Function looks like this:

\[ GDP(L,K) = AL^\alpha K^\beta \]

As a reminder, answer questions 2.1-2.6 to specify whether each of the terms in this equation is an input variable, output variable, or a parameter.

Type of Function

This function looks like a multivariable version of which of the functions we’ve explored so far?

Answer question 4.1 in Gradescope.

Log of the Function

Take the logarithm of both sides of this equation. Is the log of GDP a linear function of the log of the inputs?

Answer question 4.2 in Gradescope.

Plot Transformed Data

Just like in the single-variable case, we can plot the transformed data to determine whether our multivariable power function is appropriate. Since we have two inputs, we’ll build a log-log-log plot. Use the plot_ly function to do this. Based on your generated plot, is it reasonable to use a multivariable power function to model the data? Note that in the multivariable case, we’re looking for a plane, not a straight line. You’ll likely have to rotate the plot to see if such a plane exists.

Copy and paste the command below to produce a 3-D plot of the transformed data.

plotly::plot_ly(pwt_africa, x = ~log(pop), y = ~log(rnna), z = ~log(rgdpo)) %>%
  plotly::add_markers(size = 12) %>%
  plotly::layout(scene = list(xaxis = list(title = 'log(Population)'),
                              yaxis = list(title = 'log(Capital Investment)'),
                              zaxis = list(title = 'log(GDP)')))

Answer questions 4.3 and 4.4 in Gradescope.

Fitting Models

Regardless of how you answered the questions above, use fitModel to fit the log of GDP versus the log of population and the log of capital for the pwt_euro data set. Report the best-fit results for \(\alpha\) and \(\beta\).
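Taking the logarithm of both sides of the Cobb-Douglas function gives the form that is fitted here:

\[ \log(GDP) = \log(A) + \alpha \log(L) + \beta \log(K) \]

The sketch below shows what such a fit might look like. It assumes fitModel comes from the mosaic package and uses synthetic data generated from known parameters (sim, with true exponents 0.4 and 0.6), not the course data. The parameter names logA, alpha, and beta and the starting values are choices made for this illustration.

```r
library(mosaic)  # assumed source of fitModel()

# Synthetic data generated from known parameters (not the course data)
set.seed(1)
n <- 200
sim <- data.frame(pop  = exp(runif(n, 0, 5)),
                  rnna = exp(runif(n, 8, 14)))
sim$rgdpo <- 3 * sim$pop^0.4 * sim$rnna^0.6 * exp(rnorm(n, sd = 0.1))

# Fit log(GDP) = logA + alpha * log(L) + beta * log(K)
fit <- fitModel(log(rgdpo) ~ logA + alpha * log(pop) + beta * log(rnna),
                data = sim,
                start = list(logA = 0, alpha = 0.5, beta = 0.5))

# Estimated parameters; alpha and beta should land near 0.4 and 0.6
coef(fit)
```

For the project, the same formula would be used with data = pwt_euro (and later pwt_africa and pwt_asia) in place of the synthetic data frame.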

Answer question 5 in Gradescope. Round to three decimal places.

Repeat the model-fitting process for the Africa and Asia data sets and record the best-fit \(\alpha\) and \(\beta\) for each of the models in Gradescope (questions 6 and 7).

This concludes part a. Be sure to check your models from part a before moving on to part b. There will be no partial credit for using incorrect models in part b.

Part b: Interpretation and Application of Models

Inference and Prediction

Now that you have created your models, let’s use them for insights and prediction. All questions should be answered in Gradescope.

Using your models from part a, answer question 2 in Gradescope.

What is the estimated GDP for an African country with a population (Labor) of 9.614 million people and capital stock (Capital) of 456745.875 million USD?
You want to estimate the GDP for an African country with values that are outside what we have observed so far in the data set. What is the estimated GDP for an African country with a population (Labor) of 300 million people and capital stock (Capital) of 200 million USD?
In which region does Labor have the greatest effect on GDP compared to the other regions?
In which region does Capital have the greatest effect on GDP compared to the other regions?
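Because the models were fitted on the log scale, a prediction on the log scale has to be converted back to GDP with exp(). The sketch below shows one way to do this. The coefficient values and the helper predict_gdp are hypothetical placeholders, not results from any of the data sets, and it assumes the natural logarithm (R's log) was used in the fit.

```r
# Hypothetical fitted coefficients from a log-log fit (replace with your own)
logA  <- 1.2
alpha <- 0.35
beta  <- 0.65

# Predict on the log scale, then undo the log with exp()
predict_gdp <- function(L, K) {
  exp(logA + alpha * log(L) + beta * log(K))
}

from_log_form <- predict_gdp(L = 5, K = 10000)

# Equivalent power-function form: A * L^alpha * K^beta with A = exp(logA)
from_power_form <- exp(logA) * 5^alpha * 10000^beta

c(from_log_form, from_power_form)
```

Both forms give the same number. When predicting outside the observed data, comparing the inputs against range(pwt_africa$pop, na.rm = TRUE) and range(pwt_africa$rnna, na.rm = TRUE) shows how far the model is being extrapolated.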

Interpretation of Coefficients

So how do we interpret our parameter estimates? The exponents \(\alpha\) and \(\beta\) measure how responsive GDP is to each input: \(\alpha\) is the approximate percent increase in GDP that results from a 1 percent increase in population, holding capital constant, and \(\beta\) plays the same role for capital, holding population constant. The coefficient \(A\) also has an interpretation, but we will not focus on that in this project.
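One way to see where this interpretation comes from is to scale labor by a factor \(c\) while leaving capital unchanged (the symbol \(c\) is introduced here only for this illustration):

\[ \frac{GDP(cL, K)}{GDP(L, K)} = \frac{A (cL)^\alpha K^\beta}{A L^\alpha K^\beta} = c^\alpha \]

So multiplying labor by \(c\) multiplies GDP by \(c^\alpha\), regardless of the starting values of \(L\) and \(K\). For a 1 percent increase, \(c = 1.01\) and \(1.01^\alpha \approx 1 + 0.01\alpha\), which is an increase of about \(\alpha\) percent. The same argument applied to capital gives a factor of \(c^\beta\).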

Follow up Questions

For a 10 percent increase in population in an African country, according to this model, by what percent do we expect GDP to increase (leaving capital constant)? For a 10 percent increase in capital, by what percent do we expect GDP to increase (leaving population constant)?
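In a power function, scaling one input by a factor scales the output by that factor raised to the input's exponent. The sketch below turns this into a percent change and compares it with the simpler exponent-times-percent approximation. The helper pct_change_gdp and the exponent 0.5 are hypothetical and are not fitted values. Which of the two calculations the questions intend is not stated in this guidance, so treat this as an illustration of the difference.

```r
# Percent change in output when one input rises by pct_input percent,
# for a power function with the given exponent on that input
pct_change_gdp <- function(pct_input, exponent) {
  ((1 + pct_input / 100)^exponent - 1) * 100
}

# Hypothetical exponent of 0.5 and a 10 percent increase in the input
exact <- pct_change_gdp(pct_input = 10, exponent = 0.5)

# Linear approximation: exponent times the percent change in the input
linear_approx <- 0.5 * 10

# The exact value (about 4.88) is slightly below the approximation (5)
c(exact = exact, linear_approx = linear_approx)
```

The two values are close for small changes and drift apart as the percent change grows.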

Answer 3.1 and 3.2 in Gradescope. Round to one decimal place.

Math 141 Project 1 Guidance File