Math 141 Project 1 Solutions
Fall 2025
Learning Objectives
By the end of this project you will:
- Explore a complicated, real-life data set
- Use fitModel to obtain the best-fit model of a given data set and estimate parameters
- Interpret parameters from fitModel and apply models to real-life scenarios
Admin
All questions for this project will be answered in the Project 1 Gradescope assignment. There are two parts to this project, parts a and b. Read through this guidance to find the relevant material and R commands, as well as the required tasks, but note that the only deliverables for this project are to answer the questions in the Gradescope assignments and to upload an R Script at the end of each Gradescope assignment.
Introduction
In this project, we will explore the Cobb-Douglas production function, a function used to model the relationship between manufacturing output and two inputs: labor and capital. In its basic form, the Cobb-Douglas function is the multi-variable function:
\[ GDP(L,K) = AL^\alpha K^\beta \]
where the output \(GDP\) is the gross domestic product of a country, \(L\) represents labor, and \(K\) represents capital. For this project we will use the \(GDP\) of a country in millions of US dollars (USD) for the \(GDP\) variable. The labor variable will use the population of a country in millions of people and the capital variable will use the capital stock of a country in millions of USD. The values \(A\), \(\alpha\) and \(\beta\) are parameters in this model that describe the underlying process. We will use what we know about power functions and modeling to fit this model to some observed data, and to approximate and interpret the resulting coefficients.
Answer questions 2.1-2.6 in Gradescope.
Data
The Cobb-Douglas function is often used to model manufacturing output for a particular sector or industry, but in this project, we will use it to model a whole country’s GDP as a function of the country’s population (labor) and capital stock (capital). The Penn World Table is a database containing various economic indicators for 183 countries between 1950 and 2019. The pwt_euro data set contains a subset of this database: only countries in Europe. The other two data sets you will use are pwt_africa and pwt_asia, which contain subsets of countries in Africa and Asia, respectively. These data sets have already been loaded into R and will be explored in the rest of the project.
Part a
Read through and complete all questions in the Project 1-part a assignment in Gradescope before moving on to part b. You need to ensure you have correct models from part a before moving on to part b. There will be no CFD in part b for incorrect models.
Previewing and Introducing the Data
Before beginning any project with data, it is important to preview your data and ensure it has been imported correctly. The data sets you will be working with have already been loaded into Posit Cloud. Use the following commands to view the help file for each data set.
Use the commands below to preview your data and answer questions 3.1-3.6 in Gradescope.
When you view your data, you may notice NAs in your first several rows. This is not an error. Why do you think Albania is missing data prior to 1970?
Solutions:
## # A tibble: 6 × 7
##   isocode country  year rgdpo   pop  rnna region
##   <chr>   <chr>   <dbl> <dbl> <dbl> <dbl> <chr>
## 1 ALB     Albania  1950    NA    NA    NA Europe
## 2 ALB     Albania  1951    NA    NA    NA Europe
## 3 ALB     Albania  1952    NA    NA    NA Europe
## 4 ALB     Albania  1953    NA    NA    NA Europe
## 5 ALB     Albania  1954    NA    NA    NA Europe
## 6 ALB     Albania  1955    NA    NA    NA Europe
## # A tibble: 6 × 7
##   isocode country  year   rgdpo   pop     rnna region
##   <chr>   <chr>   <dbl>   <dbl> <dbl>    <dbl> <chr>
## 1 UKR     Ukraine  2014 522991.  45.1 6730916. Europe
## 2 UKR     Ukraine  2015 466740.  44.9 6632538  Europe
## 3 UKR     Ukraine  2016 488486.  44.7 6554077  Europe
## 4 UKR     Ukraine  2017 537900.  44.5 6488015  Europe
## 5 UKR     Ukraine  2018 561325.  44.2 6434596. Europe
## 6 UKR     Ukraine  2019 578290.  44.0 6392188  Europe
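The preview commands themselves are not shown in this output. The two six-row tibbles are consistent with head() and tail() having been called on pwt_euro, though that is an inference from the output rather than something the project states. A minimal sketch follows, using a small stand-in data frame built from the Ukraine rows printed above so that it runs without the course data. The name pwt_demo is hypothetical, and the values are the rounded ones as displayed.

```r
# Stand-in for pwt_euro: only the six Ukraine rows printed above (rounded as displayed).
# pwt_demo is a hypothetical name; in the project you would pass pwt_euro instead.
pwt_demo <- data.frame(
  isocode = "UKR",
  country = "Ukraine",
  year    = 2014:2019,
  rgdpo   = c(522991, 466740, 488486, 537900, 561325, 578290),
  pop     = c(45.1, 44.9, 44.7, 44.5, 44.2, 44.0),
  rnna    = c(6730916, 6632538, 6554077, 6488015, 6434596, 6392188),
  region  = "Europe"
)

first_rows <- head(pwt_demo, 3)           # first three rows
last_rows  <- tail(pwt_demo, 3)           # last three rows
na_counts  <- colSums(is.na(pwt_demo))    # number of missing values in each column

print(first_rows)
print(last_rows)
print(na_counts)
```

Counting NAs per column, as in the last line, is one quick way to see how much of rgdpo, pop, and rnna is missing before fitting anything.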
Visualizing the Data
We would like to visualize the relationship between the output GDP and two input variables, population and capital stock (our measure of capital). In each of the data sets, these are labeled as rgdpo, pop, and rnna, respectively.
Thus far, we have explored various ways to plot data. In this project, we will use plot_ly to plot the data. plotly is a widely used library for making professional graphics. The syntax for plotting this data is different from what we are used to, but not impenetrable. The first argument is the data set, the next argument tells us which variable of the data set to use for the x-coordinates of points, the next the y-coordinates, and so on. After the plot_ly command we are just setting options to make our graph look nice.
Copy and paste the command below to produce a 3-D plot of the data.
plotly::plot_ly(pwt_euro, x = ~pop, y = ~rnna, z = ~rgdpo) %>%
  plotly::add_markers(size = 12) %>%
  plotly::layout(scene = list(xaxis = list(title = 'Population'),
                              yaxis = list(title = 'Capital Investment'),
                              zaxis = list(title = 'GDP')))
Take a moment to view this plot. Note that you can use your mouse to rotate the plot and zoom in and out.
Determine Appropriate Model
Now that we’ve visualized our data, let’s determine whether the Cobb-Douglas model is appropriate for this relationship. First, let’s take a closer look at the Cobb-Douglas Production Function.
The Function
Recall that when applied to GDP, the Cobb-Douglas Production Function looks like this:
\[ GDP(L,K) = AL^\alpha K^\beta \]
As a reminder, answer questions 2.1-2.6 to specify whether each of the terms in this equation is an input variable, output variable, or a parameter.
Solution: \(L\) and \(K\) are input variables. \(GDP\) is the output. \(A\), \(\alpha\) and \(\beta\) are parameters.
Type of Function
This function looks like a multivariable version of which of the functions we’ve explored so far?
Answer question 4 in Gradescope.
Solution: Power function.
Log of the Function
Take the logarithm of both sides of this equation. Is the log of GDP a linear function of the log of the inputs?
Answer question 5 in Gradescope.
Solution:
(This is a multiple-choice question with two “Yes” options and two “No” options, each paired with a correct or incorrect form of the log of the function.)
Yes; \(\log(GDP) = \log(A) + \alpha\log(L) + \beta\log(K)\).
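The step behind this answer uses only the product and power rules for logarithms, applied to the function as defined above:

\[ \log(GDP) = \log\left(A L^\alpha K^\beta\right) = \log(A) + \log\left(L^\alpha\right) + \log\left(K^\beta\right) = \log(A) + \alpha\log(L) + \beta\log(K) \]

So \(\log(GDP)\) is a linear function of \(\log(L)\) and \(\log(K)\), with intercept \(\log(A)\) and slopes \(\alpha\) and \(\beta\).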
Plot Transformed Data
Just like in the single-variable case, we can plot the transformed data to determine whether our multivariable power function is appropriate. Since we have two inputs, we’ll build a log-log-log plot. Use the plot_ly function to do this. Based on your generated plot, is it reasonable to use a multivariable power function to model the data? Note that in the multivariable case, we’re looking for a plane, not a straight line. You’ll likely have to rotate the plot to see if such a plane exists.
Copy and paste the command below to produce a 3-D plot of the transformed data.
plotly::plot_ly(pwt_euro, x = ~log(pop), y = ~log(rnna), z = ~log(rgdpo)) %>%
  plotly::add_markers(size = 12) %>%
  plotly::layout(scene = list(xaxis = list(title = 'log(Population)'),
                              yaxis = list(title = 'log(Capital Investment)'),
                              zaxis = list(title = 'log(GDP)')))
Answer questions 5.1 and 5.2 in Gradescope.
Solution: There appears to be a roughly linear relationship between the log of GDP and the logs of the two inputs, population and capital: the points lie close to a plane. It’s not perfect, and there is some spread, but a linear relationship is reasonable for the transformed data. For this reason, a power model seems reasonable for the data.
Fitting Models
Regardless of how you answered the questions above, use fitModel to fit the log of GDP versus the log of population and the log of capital for the pwt_euro data set. Report the best-fit results for \(\alpha\) and \(\beta\).
Answer question 6 in Gradescope. Round to three decimal places.
Solution:
bestFitModel_euro = fitModel(log(rgdpo) ~ log_A + alpha*log(pop) + beta*log(rnna), data = pwt_euro)
coef(bestFitModel_euro)
##     log_A     alpha      beta
## 1.3724421 0.2401262 0.7402487
The estimate of \(\alpha\) is 0.240 and the estimate of \(\beta\) is 0.740.
Repeat the model-fitting process for the Africa and Asia data sets and record the best-fit \(\alpha\) and \(\beta\) for each of the models in Gradescope.
Solution:
bestFitModel_africa = fitModel(log(rgdpo) ~ log_A + alpha*log(pop) + beta*log(rnna), data = pwt_africa)
coef(bestFitModel_africa)
##     log_A     alpha      beta
## 3.4524072 0.4365498 0.5126865
The estimate of \(\alpha\) is 0.437 and the estimate of \(\beta\) is 0.513.
bestFitModel_asia = fitModel(log(rgdpo) ~ log_A + alpha*log(pop) + beta*log(rnna), data = pwt_asia)
coef(bestFitModel_asia)
##     log_A     alpha      beta
## 1.4661659 0.2674299 0.7353223
The estimate of \(\alpha\) is 0.267 and the estimate of \(\beta\) is 0.735.
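Because the logged model is linear in its parameters, an ordinary least-squares fit on the log scale recovers the exponents of a power function. The sketch below is not the project's method: it swaps in base R's lm for fitModel and runs on synthetic data generated from made-up parameter values (true_A, true_alpha, and true_beta are hypothetical), so that it can be run without the course data sets. It only illustrates why fitting log(GDP) against log(L) and log(K) returns estimates of \(\log(A)\), \(\alpha\), and \(\beta\).

```r
set.seed(141)  # arbitrary seed so the synthetic data are reproducible

# Hypothetical "true" parameters, chosen only for illustration
true_A     <- 4
true_alpha <- 0.3
true_beta  <- 0.7

# Synthetic inputs spread over several orders of magnitude
n   <- 200
sim <- data.frame(
  pop  = exp(runif(n, log(1), log(100))),     # labor, millions of people
  rnna = exp(runif(n, log(1e4), log(1e7)))    # capital, millions of USD
)

# Cobb-Douglas output with small multiplicative noise
sim$rgdpo <- true_A * sim$pop^true_alpha * sim$rnna^true_beta *
  exp(rnorm(n, mean = 0, sd = 0.05))

# Linear fit on the log scale: the intercept estimates log(A),
# and the two slopes estimate alpha and beta
fit <- lm(log(rgdpo) ~ log(pop) + log(rnna), data = sim)
est <- coef(fit)

alpha_hat <- unname(est["log(pop)"])
beta_hat  <- unname(est["log(rnna)"])
A_hat     <- exp(unname(est["(Intercept)"]))

print(round(c(A = A_hat, alpha = alpha_hat, beta = beta_hat), 3))
```

The printed estimates should land close to the hypothetical values used to generate the data, with small differences caused by the added noise.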
Part b: Interpretation and Application of your Models
Inference and Prediction
Now that you have created your models, let’s use them for insights and prediction. All questions should be answered in Gradescope.
Using your models from part a, answer question 2 in Gradescope.
Solution: What is the estimated GDP (in millions of USD) for an African country with a population (Labor) of 9.614 million and capital stock (Capital) of 456745.875 million USD?
## [1] 67621.81
This estimation for GDP is an example of using our model for prediction (interpolation).
You want to estimate the GDP for an African country with values that are outside what we have observed so far in the data set. What is the estimated GDP (in millions of USD) for an African country with a population (Labor) of 300 million and capital stock (Capital) of 200 million USD?
## [1] 5760.435
This estimation for GDP is an example of using our model for prediction (extrapolation).
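The code that produced the two predictions is not shown. One way to reproduce them, assuming the prediction is made by exponentiating log_A and substituting into the Cobb-Douglas form with the Africa coefficients reported in part a, is sketched below. The function name predict_gdp is hypothetical.

```r
# Africa coefficients as printed by coef(bestFitModel_africa) in part a
log_A <- 3.4524072
alpha <- 0.4365498
beta  <- 0.5126865

# Hypothetical helper: Cobb-Douglas prediction, GDP = A * L^alpha * K^beta,
# where A = exp(log_A) because the model was fit on the log scale
predict_gdp <- function(L, K) {
  exp(log_A) * L^alpha * K^beta
}

gdp_interp <- predict_gdp(L = 9.614, K = 456745.875)  # first question (interpolation)
gdp_extrap <- predict_gdp(L = 300,   K = 200)         # second question (extrapolation)

print(gdp_interp)
print(gdp_extrap)
```

Both results should agree with the outputs above up to the rounding of the printed coefficients. Forgetting to exponentiate log_A is an easy mistake here, since the fit returns \(\log(A)\) rather than \(A\).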
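In which region does Labor have the greatest effect on GDP compared to the other regions? Africa, which has the largest estimated \(\alpha\) (0.437, versus 0.240 for Europe and 0.267 for Asia).
In which region does Capital have the greatest effect on GDP compared to the other regions? Europe, which has the largest estimated \(\beta\) (0.740, versus 0.735 for Asia and 0.513 for Africa).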
Using your model to compare parameters and investigate the influence of labor on a region’s GDP is an example of using your model for inference.
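The regional comparison can be read directly off the fitted exponents from part a. A small sketch that collects the reported estimates and picks out the largest of each (the vector names are hypothetical):

```r
# Exponent estimates reported by coef() for each region in part a
alpha_by_region <- c(Europe = 0.2401262, Africa = 0.4365498, Asia = 0.2674299)
beta_by_region  <- c(Europe = 0.7402487, Africa = 0.5126865, Asia = 0.7353223)

# Region with the largest labor exponent and region with the largest capital exponent
largest_alpha <- names(which.max(alpha_by_region))
largest_beta  <- names(which.max(beta_by_region))

print(largest_alpha)
print(largest_beta)
```

Note that the capital exponents for Europe and Asia differ only in the third decimal place, so that ranking rests on point estimates alone.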
Interpretation of Coefficients
So how do we interpret our parameter estimates? The values \(\alpha\) and \(\beta\) measure how strongly GDP responds to an increase in population or capital, respectively, holding the other variable constant: a 1 percent increase in population raises GDP by approximately \(\alpha\) percent, and a 1 percent increase in capital raises GDP by approximately \(\beta\) percent. The coefficient \(A\) also has an interpretation, but we will not focus on that in this project.
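To see where this interpretation comes from under the model as written, scale labor by a factor \(1+p\) while holding capital fixed:

\[ \frac{GDP((1+p)L,K)}{GDP(L,K)} = \frac{A\,((1+p)L)^\alpha K^\beta}{A L^\alpha K^\beta} = (1+p)^\alpha \]

GDP therefore changes by the fraction \((1+p)^\alpha - 1\), whatever the starting values of \(L\) and \(K\). The same argument applied to \(K\) gives \((1+p)^\beta - 1\). For small \(p\), \((1+p)^\alpha - 1 \approx \alpha p\), so a small percent change in an input produces roughly \(\alpha\) (or \(\beta\)) times that percent change in GDP.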
Follow up Questions
For a 10 percent increase in population in an African country, according to this model, by what percent do we expect GDP to increase (leaving capital constant)? For a 10 percent increase in capital, by what percent do we expect GDP to increase (leaving population constant)?
Answer in Gradescope. Round to one decimal place.
Solution:
There are a couple of ways to solve this. You can evaluate the model at two points that differ by a 10% increase in either population or capital and compare the resulting GDPs. Or you can compute 1.1^(alpha)-1 for a 10% increase in labor and 1.1^(beta)-1 for a 10% increase in capital.
For a 10 percent increase in population, we would expect a 4.2 percent increase in GDP.
## [1] 0.04248537
## [1] 0.04248537
For a 10 percent increase in capital, we would expect a 5.0 percent increase in GDP.
## [1] 0.05007779
## [1] 0.05007779
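The code behind these four outputs is not shown. Below is a sketch of the two approaches described in the solution, using the Africa coefficients from part a. The baseline values L0 and K0 are arbitrary choices (here the population and capital stock from the earlier prediction question), since the percent change does not depend on the starting point.

```r
# Africa coefficients as printed by coef(bestFitModel_africa) in part a
log_A <- 3.4524072
alpha <- 0.4365498
beta  <- 0.5126865

# Cobb-Douglas model with A = exp(log_A)
gdp <- function(L, K) exp(log_A) * L^alpha * K^beta

# Arbitrary baseline point
L0 <- 9.614
K0 <- 456745.875

# Approach 1: compare the model at two points that differ by 10% in one input
pct_labor_points   <- gdp(1.1 * L0, K0) / gdp(L0, K0) - 1
pct_capital_points <- gdp(L0, 1.1 * K0) / gdp(L0, K0) - 1

# Approach 2: closed form, (1.1)^exponent - 1
pct_labor_formula   <- 1.1^alpha - 1
pct_capital_formula <- 1.1^beta - 1

print(c(pct_labor_points, pct_labor_formula))
print(c(pct_capital_points, pct_capital_formula))
```

Each pair of values should match, which is why every output above appears twice. Multiplying by 100 and rounding to one decimal place gives the 4.2 percent and 5.0 percent reported in the solution.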