Chapter 2 Block 1

2.1 Lesson 1: Introduction to Math 141 and RStudio

2.1.1 Objectives

  1. Familiarize yourself with the RStudio project workspace (RScript, console, environment, history, plots, help, etc.).

  2. Compute quantities in R using basic operations (addition, subtraction, multiplication, division, exponents).

  3. Understand naming rules in R, assign names to quantities, and identify where those quantities are stored in the environment.

  4. Call the name of a stored quantity to show the quantity in the console and perform calculations using previously stored quantities.

2.1.2 Reading

None for today. Students should start reading all of Chapter 1, which should be review.

2.1.3 Note

Lesson 1 can be a whirlwind. Ideally you have already covered the syllabus on Day 0. If you have not, review that. Then make sure everyone has access to the necessary stuff, (Gradescope, posit.cloud, Teams). Explain how we are using posit.cloud this year, and that installing R/RStudio locally is optional. Then launch into R stuff. Ideally this will be a little demonstration by you, and a lot of practice by them. Don’t let them copy and paste your code. Especially for these basic commands, they need to be able to this without constantly referring to notes.

Depending on how fast you go through class material, and how quickly you get up and going on PositCloud, you might run out of material. If that happens, of course go on to Day 2. If things go poorly, you might run out of time. Not a big deal, just merge in with Day 2 stuff.

2.1.4 In class

  1. Go over the layout of the course and the expectations for homework. Have the cadets ensure they have access to posit.cloud, the Teams page and the Gradescope course.

  2. Explain that a local installation of R/RStudio is not required. Point them to the instructions for installing locally if they want a backup. Don’t devote class time to it, but be ready for EI requests. You can send some to me if you want.

  3. Make sure that everyone has access to posit.cloud and has copied the Math 141 Base Image assignment to their own private copy.

  4. Open posit.cloud and start going.

    1. Show how to calculate simple things in R using the command line. 8+3, 2*4, etc. Show trig functions, sin, cos, tan. All arguments are in radians. Show powers, 3^6, and calculate exponentials. They have very likely all seen Euler’s constant, \(e\), at some point. In order to calculate \(e^{p}\) you type exp(p). So to compute \(e^{3}\) you type exp(3). It is very common to type e^3 which will not do what is intended.

    2. Show the history. On the command line, hit the up arrow, it will let you scroll through your last commands. You can edit them and then run. Much faster than typing every time.

    3. Introduce saving calculations to variables. Use = for assignment in R, to match the book. Do stuff like a=sin(8), b=exp(4), c=a+b, etc.

    4. Point out the environment window, you can see all variables you have defined and their values in this window. Note what happens if you try to calculate f+7 if f is not defined. Note how you can see that there is no f in the environment window.

2.1.5 R Commands

No commands per-se, but basic familiarity.

2.1.6 Problems & Activities

  1. Use R to compute several things. Do some by both typing in the terminal and by creating an R script.

    1. (-4)*3-2
    -4*3-2
    ## [1] -14
    1. \(\left| 8*5^{2} - 5*8^{2} \right|\) (use abs)
    abs(8*5^2-5*8^2)
    ## [1] 120
    1. \(5^{- 3} + 6^{3*( - 3)}\ \)
    5^(-3)+6^(3*(-3))
    ## [1] 0.008000099
    1. \({(\pi}^{2} - \ \sqrt[3]{7})/1.41\ \) use the 1/3 power for the cube root, use pi for \(\pi\).
    (pi^2-7^(1/3))/1.41
    ## [1] 5.643031
    1. \(\frac{1}{e}\) either use exp(-1) or 1/(exp(1)).
    1/exp(1)
    ## [1] 0.3678794
    exp(-1)
    ## [1] 0.3678794
  2. Do each of the above, but store the results in variables. So a = ( - 4)*3 - 2, etc. Compute

    1. \(ab - c\)
    a*b-c
    ## [1] -1680.008
    1. \(b + c + 2\)
    b+c+2
    ## [1] 122.008
    1. \(f + 11\) this should fail. \(f\) will not be defined for them. The goal here is to demonstrate that you should look at the environment to see what is what if you are having problems.
    f+11
    1. Define \(f = a^{b}\). Now go back and compute f+11.
    f=a^b
    f+11
    ## [1] 3.430554e+137

2.2 Lesson 2: Computing and Graphing in R

2.2.1 Objectives

  1. Create an Rscript, add and run lines of code to the RScript, and save the RScript.

  2. Understand the difference between running code from the console versus an RScript and when you should use each method.

  3. Be able to run code by:

    1. using ctrl+enter,

    2. highlighting multiple lines and running all

    3. using the "run" button.

  4. In R, use the makeFun command to define a function and the plotFun command to plot a function.

  5. In R, evaluate a previously defined function at a given input.

  6. In R, use the plotPoints command to graph data from a dataset.

2.2.2 Reading

1.1

2.2.3 Note

The first HW is assigned this lesson. Please remind students where they can find the HW and the deadline for submission.

2.2.4 In class

Ideally you can get through these things pretty quickly, and then let them do a bunch of practice on their own.

  1. Continue familiarizing with R.

    1. Show how to use R scripts to type/edit commands. Show

      1. Type command, put cursor on the command you want to run. Hit ctrl+enter or ctrl+r.

      2. Type command, put cursor on the command you want to run. Hit the run button.

      3. Highlight multiple lines, hit ctrl+r or ctrl+enter or hit the run button.

    2. Show them how to define a function.

    f=makeFun(x^2~x)
    1. Evaluate a function that has been defined.
    f(9)
    ## [1] 81
    1. Plot a function
    plotFun(f(x)~x)
    mathaxis.on()

    or

        plotFun(sin(x)~x)

    1. If they want help on any command, they can type ?command. So
      ?plotFun
      or
      ?makeFun

    2. To plot data points, use the plotPoints command. To plot the points (1,4), (5,8), and (4,11) we first make a list of all the x-coordinates.

      x=c(1,5,4)  

      then a list of all y-coordinates,

      y=c(4,8,11)  

      then use plotPoints.

      plotPoints(y~x)

    3. Some commands have optional arguments. Those arguments require a name that goes with them. If the arguments aren’t specified, then a default value is given. plotPoints and plotFun both have an optional argument, add. The default value of add is FALSE. Demonstrate adding to the existing plot.

    x=c(1,5,4) 
    y=c(4,8,11) 
    plotPoints(y~x)
    plotFun(x^2+1~x,add=TRUE)

    1. There are other optional arguments as well. Both will let you set the limits on the x- and y- axes with xlim= and ylim=. Both will let you set the color with col=. Lots and lots of other options.
    plotFun(sin(x)~x,xlim=c(-10,10), ylim=c(-3,3), col="red")

  2. Common student R mistakes:

    1. Mismatched parentheses might lead to R waiting for more input. For example

      plotFun(x^2~x,xlim=c(pi,6*pi)

      won’t work, you are missing a right paren. R will keep waiting for more input with a + sign on the prompt. Hit esc to quit and fix.

    2. Typos in function names. Not capitalizing correctly.

    plotfun(x^2~x) 

    need plotFun instead.

    1. Forgetting the ~
    plotFun(x^2) 

    need x^2~x.

    1. Typing the definition of something, but not executing that definition.
    f=makeFun(x^2~x)

    plotFun(f(x)~x) won’t work if you didn’t actually run the line above. You can see what is going on by either looking in the environment window, or just trying to evaluate f.

    1. Trying to add a plot to an empty plot. If there is no plot in the plotting window, then
    plotFun(x^2~x, add=TRUE)

    will fail, because there is no plot to add to.

    1. Common troubleshooting technique is to restart R. Session –> Restart R.

    2. If you are having a hard time getting plots to show up, sometimes you have to erase all current figures using the broom in the plots window.

2.2.5 R Commands

makeFun, plotFun, plotPoints

2.2.6 Problems & Activities

  1. Define these functions and evaluate at the given point.

    1. \(f(x) = x^{2}\), \(x = 6\)

    2. \(g(x) = e^{\sin(x)}\), \(x = 0,\ x = 2,\ x = 5\)

    3. \(h(x) = 4\cos(x - 4) + f(x)\) , \(x = 3\) actually define \(h\ \)in terms of \(f\).

  2. Plotting.

    1. Plot \(f\) using plotFun

    2. Plot \(g\) on a new figure on interval \[-6,3\] in cyan.

    3. Add a plot of \(h\) to the plot of \(g\)

  3. Also do the following, just type the expression directly in the plotFun command.

    1. \(x^{2} - 3x + 3\) so
    plotFun(x^2-3*x+3~x)

    1. \(\cos\left( \frac{1}{x} \right)\), on the figure above, in red.

    2. \(\sin^{2}{(x) + \cos^{2}(x)}\) on the figure above.

  4. Plot the data points (0,3), (4,10), (-3,5), (6,1), (1/2,1/2) on a new figure.

  5. To the plot above add the graph of \(y = 3x + 2\) in magenta.

  6. To the plot above add the data points (0,4), (1,2), (2,0), (3,-2).

  7. The following is supposed to graph \(x^{2} + 2\). Why doesn’t it work?

    plofun(x^2+2~x)
  8. The following is supposed to define the function \(sin(x)\), and evaluate it at \(x = 9\). What is wrong with it?

    g=makeFun(sin(x))  
    g(9)
  9. Why doesn’t the following work?

    g=makeFun(sin(x)~x  
    g(9)
  10. What is wrong with this?

    g=makeFun(sin(x~x))  
    g(9)
  11. This is supposed to produce a plot of \(\sin(x)\) and \(\cos(x)\) on the same axes. What is wrong with it?

    plotFun(sin(x)~x, xlim=c(-3,3), "magenta")
    plotFun(cos(x)~x, TRUE, "cyan")
  12. This is supposed to produce a plot of \(j(x)\), but when run it produces the error “Error in j(x) : could not find function "j"" What must have happened?

    j=makeFun(x^6-x^3~x)
    plotFun(j(x)~x)

2.3 Lesson 3: Introduction to Functions

2.3.1 Objectives

  1. Determine if a mapping between inputs and outputs is a function.

  2. Given a function, identify the output variable, the input variable(s), and the function’s domain.

  3. Evaluate a function at a given input, by hand and using R.

  4. Given a function, determine whether that function’s inverse exists, and if so, find that inverse.

2.3.2 Reading

Chapter 1. Note: students are expected to know the material in Chapter 1, whether from previous Math courses or from reading Chapter 1.

2.3.3 In class

  1. Introduce functions and inverses. This should be a review since they will have seen functions at some point in their academic careers. Discuss the terms input variable, output variable, and domain. Discuss the notion of an inverse function. Discuss the requirements for a function and for an inverse to exist. Discuss how to find the inverse of a function, if it exists.

  2. Emphasize that one of the main points of this course is to use functions to model data. To do that, we need a basic understanding of functions.

  3. Review how to plot a function in R using plotFun. Also, note the default plotted “domain” and demonstrate how to change it. Emphasize the difference between the domain of a function and the graphical domain when plotting.

2.3.4 R Commands

makeFun, plotFun

2.3.5 Problems & Activities

  1. \(f(x) = 2x + 3\); most will recognize this as a line. Take advantage of their familiarity with the “mx+b” format of a line that they’re familiar with. They should be able to identify “x” as the input and “f(x)” as the output. Have them identify \(f( - 1)\), \(f(0)\), and \(f(2)\) “by hand”. Now demonstrate how to define, evaluate and plot in R:

    f = makeFun(2*x+3 ~ x)
    f(-1)
    ## [1] 1
    f(0)
    ## [1] 3
    f(2)
    ## [1] 7
    plotFun(f(x) ~ x)

    OR

    plotFun(2*x+3 ~ x)

    Set custom limits.

    plotFun(f(x) ~ x, xlim=range(-2,4))

  2. Identify the input variable in the following functions.

\[g(r) = \sin(2r) + cos(3r)\]

\[position(time) = a\cdot{time}^{2} + b\cdottime + c\]

\[h(z) = 40e^{- 3z}\] Note that other non-input quantities in the function are sometimes called parameters.

  1. For each of the following functions, identify the domain, define the function in R, evaluate the function at your choice of input, and plot the function, changing the input limits if needed.

\[f(x) = 2x^{2} - 3x + 1\]

\[height(t) = 10\cos{(\frac{3}{15}t)}\]

\[q(a) = \frac{1}{(a - 5)^{2}}\]

  1. Consider the following functions. Create and plot them in R. Determine whether they have inverses, and if so, find the inverse functions.

\[f(x) = 3x + 7\]

\[g(x) = \sqrt[5]{2x - 3}\]

\[h(t) = t^{2} - 2t + 4\]

\[r(z) = \frac{5z - 3}{7z + 4}\]

2.4 Lesson 4: Modeling with Linear Functions

2.4.1 Objectives

  1. Given a set of observed data, identify whether a linear model is appropriate.

  2. Given a set of observed data that exhibit a linear relationship \(y = mx + b\), approximate the slope \(m\) and intercept \(b\) by hand.

  3. In R, given a set of observed data, use fitModel to obtain the best-fit linear model.

  4. In R, use plotPoints and plotFun to plot data and functions on the same graph.

2.4.2 AD Note:

Throughout the next lessons on modeling, try to do something with the models we build. The objectives below show up on Lesson 10, but it is a good idea to scatter the ideas throughout Lessons 4, 5, 6, 7, 8, 9, 10.

  1. Describe the two primary reasons for modeling data: inference and prediction.

  2. Describe the definition of extrapolation and which input values we can expect a model to provide reasonable outputs for.

    When we are modeling for inference, we are primarily interested in finding the value of the parameters in our model. Something like: here is a set of data that gives the weight, \(w\), at several different elevations (distance from center of the earth), \(r\). We think that the force exerted by gravity (weight) on an object varies something like \(F = c/r^{p}\). What value of \(c\) and \(p\) will fit the data well?

    On the other hand, if we are interested in prediction, then we are interested using the model to predict other things. Questions like: how much will this object weigh if it is at elevation 4000 miles? Prediction comes in two flavors, interpolation and extrapolation, and extrapolation is … let’s just say it is not preferred.

2.4.3 Reading

Section 2.1

2.4.4 In class

  1. Introduction to Modeling. Block 1 involves applying different functions to observed data. While it’s not an objective, you should discuss the modeling cycle and the start of this chapter.

  2. Visualizing Data. To introduce the idea of modeling, start by demonstrating how to visualize real data. We’ll use the data=… construct when plotting from data sets. We will visualize various datasets to determine whether a linear relationship is appropriate.

  3. For linear functions, feel free to draw on their understanding of the \(y=mx+b\) equation of a line. However, they should keep in mind that we are exploring linear functions, so the expression should be \(f(x) = mx + b\). The slope, m, and the intercept, b, can be considered parameters. This term will come up again and again. Parameters are quantities that help describe the behavior of your function. In this case, the slope tells us how our linear function increases or decreases with respect to the input variable.

  4. Approximating a Linear Relationship. If a linear relationship appears appropriate, the next step is to approximate the function that accurately models that relationship. The text shows a methodology of averaging slopes and intercepts, but don’t focus on this. Regarding objective 3, they should be able to approximate by picking points along the “line” and estimating rise over run and using that estimate to find the intercept.

  5. Using fitModel. Once we have an estimate of a model done by-hands, use fitModel to create a “best” model. Feel free to point out that what is “best” is ambiguous at best.

2.4.5 R Commands

makeFun, plotFun, plotPoints, fitModel

2.4.6 Problems & Activities

  1. First, let’s look at the WorldPopulation dataset. If you’d like to see it in the environment in the top right, type and run

    data("WorldPopulation")

    Note that you can click on the dataset in your environment and it will show you the dataset in spreadsheet format. Point out that data stored in table format like this gets a bunch of names. For the most part in data science, and for this class, these are called data frames.

  2. Next, use

    ?WorldPopulation

    to access the documentation for this dataset. Often datasets that exist in packages come with documentation. This dataset contains the world population from 1950 to 2015. You can use this to see the column names or you can use

    names(WorldPopulation) 
    ## [1] "Year"   "People"

    or

    head(WorldPopulation)
    ##   Year People
    ## 1 1950   2.55
    ## 2 1955   2.78
    ## 3 1960   3.04
    ## 4 1965   3.35
    ## 5 1970   3.71
    ## 6 1975   4.09

    This dataset has two columns: Year and People.

  3. Next, plot this using plotPoints. So far we’ve only seen using plotPoints to plot coordinates stored as lists. There are two ways to proceed. WorldPopulation$Year will give you a list that contains the Year column of the WorldPopulation dataframe. So you can plot the data in the dataframe like this:

    plotPoints(WorldPopulation$People ~ WorldPopulation$Year)

    Alternatively, you can just give the column names, and then give an argument that tells us which data frame the columns are talking about.

    plotPoints(People ~ Year, data=WorldPopulation)

  4. Note that this is different from plotFun. Emphasize to the cadets that plotFun is used to plot user-defined functions, while plotPoints is used to plot data.

  5. Clearly, there is a linear relationship between population and year, and we could very easily estimate this.

  6. Have the students repeate on the MonthlyUnemployment and Mortgage30YrAnnual data sets.

  7. For the WorldPopulation dataset, demonstrate how to estimate the linear model. I recommend doing this by projecting the data on the screen, drawing a “guess” and estimating the slope of that guess by picking two points along that line. Use that slope to determine intercept.

  8. Visually, it looks like the population goes up by 1.5 per 20 years, giving us a slope of 0.075. Assuming the population at 1980 is about 4.5, this leads to an intercept of -144. Thus our approximate model is \(People(Year) = - 144 + 0.075Year\).

  9. Demonstrate how to use fitModel to obtain a best fit linear model. This function requires a parameterized tilde expression and returns a function object. You can call this object to see what values of slope and intercept it found.

plotPoints(People ~ Year, data=WorldPopulation)
bestfitmodel=`fitModel`(People~b+m*Year,data=WorldPopulation)
bestfitmodel
## function (Year, ..., transformation = function (x) 
## x) 
## return(transformation(predict(model, newdata = data.frame(Year = Year), 
##     ...)))
## <environment: 0x000001fd0e487c30>
## attr(,"coefficients")
##             b             m 
## -143.69428571    0.07487912 
## attr(,"class")
## [1] "nlsfunction" "function"
plotFun(bestfitmodel(Year)~Year,add=TRUE,col="red")

  1. If you just type

    bestfitmodel  
    ## function (Year, ..., transformation = function (x) 
    ## x) 
    ## return(transformation(predict(model, newdata = data.frame(Year = Year), 
    ##     ...)))
    ## <environment: 0x000001fd0e487c30>
    ## attr(,"coefficients")
    ##             b             m 
    ## -143.69428571    0.07487912 
    ## attr(,"class")
    ## [1] "nlsfunction" "function"

    you can see that the coefficients it is using are: \(b=-143.74\), \(m=0.075\). Pretty close to our by-hands estimate. Alternatively, things look nicer if you run

    coef(bestfitmodel)
    ##             b             m 
    ## -143.69428571    0.07487912
  2. Prediction questions: What do we think the world population was in March of 1982? Is this interpolation or extrapolation? What might it be in February of 2024? Is this interpolation or extrapolation?

    bestfitmodel(1982.25)
    ## [1] 4.734852

    Inference question: What does the slope parameter that we found tell us? (On average, the model predicts that the world gains 0.075 billion people per year)

    1. Repeat with MonthlyUnemployment, and Mortgage15YrAnnual.

2.5 Lesson 5: Modeling with Exponential Functions I

2.5.1 Objectives

  1. Interpret the parameters \(C\) and \(k\) of an exponential function, and describe how the function changes as each parameter is adjusted.

  2. Given a value of \(k\), calculate the doubling/halving rate of an exponential process (and vice-versa).

  3. In R, use makeFun and plotFun to define and plot logarithmic functions.

  4. Use log rules to show that an exponential function becomes \(\ln(y) = mx + b\) in semi-log form, where \(k = m\) and \(C = e^{b}\).

  5. Use log and exponent rules to solve equations or simplify expressions.

  6. Given a function or set of data, use the plotPoints command to obtain a semi-log plot of the data.

2.5.2 Reading

Section 2.2.

2.5.3 AD Note

Today we are just getting used to what the parameters \(C\), \(k\), for an exponential function do, and similar for a logarithmic function. When we get to modeling, Lesson 6, then we’ll introduce the parameter \(h\). However, \(h\) isn’t a parameter we are going to fit, per-se, we are just going to immediately identify it from the data. Moral of story – don’t worry about horizontal shifts today.

2.5.4 In class

  1. In this section, we remind them of exponential growth and decay. They should have seen this in a prior math course, so ideally, this will be partially a review.

  2. Start by defining and plotting an exponential function in R and noting how the values of \(C\) and \(k\) correspond to the plot. Show them how to determine the doubling or halving time given a value of \(k\).

  3. Demonstrate how changing the values of \(C\) and \(k\) change the behavior of the function. You can use Desmos (https://www.desmos.com/calculator) to do this, but use R a couple of times first to demonstrate how to add plots to existing axes.

  4. Briefly review logarithms. In this course, when we refer to logarithm, we are referring to natural logarithm (base e). Note that in R, \(\ln(x)\) is denoted by log(x). Briefly review the log laws, but note that these are prerequisite concepts; if cadets feel like they need a refresher, I recommend problems 27-34 & 41-48 in section 1.6 and problems 1-56 in section 1.7 are very helpful.

  5. Semi-Log Plots. Define a semi-log plot as the plot of the natural logarithm of the \(y\) or output value versus the original input. Demonstrate that the semi-log plot of an exponential function is linear. We can use this to determine whether a set of data exhibits exponential behavior.

2.5.5 R Commands

makeFun, plotFun, plotPoints, Optional: data, names, head

2.5.6 Problems & Activities

  1. Create and plot the function \(f(x) = 2e^{-3x}\) on the domain \([-0.5,4]\). What is the value of C and how do I interpret this value? Same question for k.

    f = makeFun(2*exp(-3*x)~x)
    plotFun(f(x)~x, xlim=c(-0.5,4))
    mathaxis.on()

    C = 2 and represents the “initial value”, or value at time 0, of this process. The value k = -3 represents the decay rate of this process (although the book doesn’t use that term). More generically, it represents how quickly the function shrinks to 0 as x increases.

  2. On the same set of axes, plot the function \(g(x) = 2e^{-5x}\). What changed?

    plotFun(2*exp(-5*x)~x, add=TRUE, col="red")

    This function decreases faster than the original. Note the add=TRUE and col="red". This adds to an existing plot and changes the color of the line to distinguish it.

  3. Exercises 11-14 in the text section 1.4 (finding C and k).

  4. For illustration of logarithmic functions, start with \(f(x) = \log(x)\). Plot this in R. Next, create another function shifted and then plot on the same axes:

    f = makeFun(log(x)~x)
    plotFun(f(x)~x, xlim=range(0,10), ylim=range(-8,8))
    mathaxis.on()
    g = makeFun(log(x-2)~x)
    plotFun(g(x)~x, add=TRUE,col="blueviolet")
    
    h = makeFun(3*log(x)~x)
    plotFun(h(x)~x,  add=TRUE,col="brown4")
    
    m = makeFun(log(x)+2~x)
    plotFun(m(x)~x,  add=TRUE,col="darkgoldenrod4")
    
    y = makeFun(3*log(x-2)+2~x)
    plotFun(y(x)~x,  add=TRUE,col="darkseagreen3")

  5. (Exercise 67, Section 1.4) Consider the function \(f(x) = 31.486e^{-0.048x}\), which models the price in dollars per short ton of coal as a function of year since 1989.

    1. Inference question: Report the vertical intercept (C) of this function and explain it in context of the problem. Report the halving time of this function and explain in the context of the problem.

      INSTRUCTORS \(C = 31.486\) and represents the cost of a short ton of coal when \(x = 0\) (the year 1989). The halving time is given by \(\frac{\ln(2)}{0.048} = 14.44\), which means that under this relationship, we expect the cost of a short ton of coal to decrease by half every 14.44 years. Note that \(k = -0.048\) and can be thought of as the decay rate.

    2. Prediction question: What do we expect the price of coal to be in 2021?

      INSTRUCTORS

      coal = makeFun(31.486*exp(-0.048*x)~x)
      coal(2021)
      ## [1] 2.333879e-41
  1. If the data we collected was based on the years 1989 – 2011, should we trust this prediction?

    INSTRUCTORS This is extrapolation. We should be wary.

  2. Plot the function; also, plot the log of the function versus \(x\). This is known as a semi-log plot. Describe this plot.

    plotFun(coal(x)~x, xlim=range(0,50))
    plotFun(log(coal(x))~x, xlim=range(0,50))

    To see why this is happening, take the log of \(coal(x)\) and simplify. \[\log(coal(x)) = \log(31.486e^{-0.048x}) = \log(31.486) - 0.048x\] Note that the log of an exponential function is a linear function of x.

2.6 Lesson 6: Modeling with Exponential Functions II

2.6.1 Objectives

  1. Use a semi-log plot to determine whether an exponential model is appropriate for a given set of data.

  2. Given a set of observed data that exhibit an exponential relationship, approximate the parameters by hand.

  3. Given a set of data, determine an appropriate horizontal shift, \(h\), to use.

  4. In R, given a set of observed data, use fitModel to obtain the best-fit exponential model.

  5. Given parameters for a semi-log fit of data, \(\ln(y)=m(x-h)+b\) use algebra to write an equivalent model for \(y\), \(y=Ce^k(x-h)\) or \(y=Ce^{kx}\).

  6. Evaluate an exponential model at a given input and interpret the meaning of the answer.

2.6.2 Reading

Section 2.2

2.6.3 In class

  1. Semi-Log Plots. Review semi-log plots and why the semi-log plot of an exponential function is linear. We can use this to determine whether a set of data exhibits exponential behavior.

  2. Determining “Exponential-ness” of Data. Open the Mortgage30YrAnnual dataset. Build a semi-log plot of the data. Note that we use the plotPoints function instead of plotFun, but the concept is the same: we are plotting the log of one variable versus the other variable. Since this looks reasonably linear, an exponential model is appropriate. Practice with other datasets.

  3. Approximate Fit. Discuss how we could estimate the parameters of the exponential model “by hand”. Using the linearity of the semi-log transferred data, we can approximate the values of \(k\) and \(C\) by estimating the slope and intercept and deriving the parameters from those. Once again the book estimates slopes by averaging consecutive slopes. Once again, we can skip that. Just estimate slopes by picking a couple of points on the semi-log plot and calculating a slope. We can verify our estimate by plotting our exponential model on top of our observed data.

  4. Shifting Discuss that for practical reasons, instead of modeling with plain exponential functions, \(Ce^{kx}\), we’ll model with functions of the form \(Ce^{k(x-h)}\). This is strictly for practical reasons when using computers for modeling. \(h\), for us, is always the furthest left \(x\)-coordinate of the data. Thus, it isn’t a parameter for us to estimate, we just have it.

  5. Best Fit Exponential Model. We will use fitModel to obtain the best fit. We can plot the data and both fitted models on the same set of axes.

2.6.4 AD Note

It is tempting to force students to do algebra to convert a model of the semi-log data into an exponential model of the original data. That’s how I present things in class, and in these notes. However, it is completely fine for the students to straight memorize \(C=e^k\) and \(k=m\).

2.6.5 R Commands

plotFun, plotPoints, Optional: data, names, head

2.6.6 Problems & Activities

  1. Start with the Mortgage30YrAnnual dataset:

    1. Use plotPoints to visualize the relationship between the two variables.
    plotPoints(Rate~Year,Mortgage30YrAnnual)

    1. Describe whether a linear model is appropriate to describe this relationship.

    2. Use fitModel to obtain a best fit linear model and plot on top of the data.

      A linear model does not look horrible, let’s see how well we can do.

      myfit = fitModel(Rate~b+Year*m, data=Mortgage30YrAnnual)
      plotFun(myfit(Year)~Year, add=TRUE, col="red")

      While this isn’t horrible, there is a definite “bend” in the data that we can’t recreate with a linear model. Also, our linear model will predict negative interest rates in the future. While that would be great for me, it probably isn’t realistic.

  2. To determine if an exponential model is appropriate, we can plot the log of the Rate variable versus Year. If the data appear linear, then an exponential model is appropriate. Note that these are certainly not always clear-cut, opinions could vary as to how linear data looks. When in doubt, you can always just try fitting the model and see how it does.

    plotPoints(Rate~Year, data=Mortgage30YrAnnual)
    plotPoints(log(Rate)~Year, data=Mortgage30YrAnnual)

    The semi-log plot looks fairly linear, so an exponential model looks reasonable.

For the following datasets, use a semi-log plot to determine whether an exponential model is appropriate.

  1. AAPLStockMonthly

    plotPoints(AdjClose~Month, data=AAPLStockMonthly)
    plotPoints(log(AdjClose)~Month, data=AAPLStockMonthly)

    The semi-log plot isn’t horrible, but neither is it great. It is certainly more linear than the original plot. Thus, we expect an exponential model to be reasonable.

  2. APCalculus

    plotPoints(Exams~Year, data=APCalculus)
    plotPoints(log(Exams)~Year, data=APCalculus)

    The semi-log plot looks about as linear as the original data. An exponential model is probably not going to be good.

  3. NetherlandsPopulation

    plotPoints(Population~Year, data=NetherlandsPopulation)
    plotPoints(log(Population)~Year, data=NetherlandsPopulation)

    The semi-log plot is more linear than the original data, so we expect that an exponential model will be better than a linear model. We still don’t expect an exponential model to be great.

  4. LifeExpectancyPhysicians Note, you can run ?LifeExpectancyPhysicians to get a description of the data.

    plotPoints(LifeExpectancy~Physicians, data=LifeExpectancyPhysicians)
    plotPoints(log(LifeExpectancy)~Physicians, data=LifeExpectancyPhysicians)

    The semi-log plot doesn’t look any more linear than the original data. We don’t expect an exponential model to be appropriate.

  1. This data looks linear on a semi-log plot. So, we can model the semi-log data with a line. Our model will be \[\log(Rate)=m\cdot Year + b.\] Plot the data, estimate \(m\) and \(b\).

    plotPoints(log(Rate)~Year, data=Mortgage30YrAnnual)

    Pick a couple of points. It doesn’t matter too much which ones you pick, we are just trying to get an estimate here. I think our line should pass through about (1981,2.8) and (2010,1.5).

    plotPoints(c(2.8,1.5)~c(1981,2010),pch=19,col="red",add=TRUE)

    That gives us a slope of \(m=\frac{2.8-1.5}{1981-2010}\approx -0.045\). Our line passes through (1981,2.8), so our model is \[\log(Rate) - 2.8 = -0.045(Year-1981)\] which is the same as \[\log(Rate) = -0.045Year+(2.8+1981\cdot0.045) = -0.045Year+91.95\] This is a linear model of \(\log(Rate)\), so we can fit this model on the semi-log plot of the data.

    plotFun(-0.045*Year+91.95~Year,add=TRUE,col="plum4")

    Now we can turn our model into a model of \(Rate\) rather than \(\log(Rate)\). Our model says that \[\log(Rate) = -0.045Year+91.95.\] Exponentiate to get a model for \(Rate\) that is exponential. \[Rate=e^{-0.045Year+91.95} = e^{91.95}e^{-0.045Year}.\] This is an exponential model with \(C=e^{91.95}\approx 8.58\times 10^{39}\) and \(k=-0.045\).

    Visualize this fit.

    plotPoints(Rate~Year, data=Mortgage30YrAnnual)
    
    expfit = makeFun(exp(91.95)*exp(-0.045*Year)~Year)
    plotFun(expfit(Year)~Year, add=TRUE,col="springgreen3")

    One more thing. Our \(C\) value was absurdely large in the exponential model. This isn’t a problem if you are performing calculations by hand, but it turns out that for the computer, it is better to avoid such huge numbers. We can back up one step when computing our exponential model. Our model of the semi-log data is \[\log(Rate) - 2.8 = -0.045(Year-1981).\] Re-write as \[\log(Rate) = -0.045(Year-1981)+2.8.\] Exponentiate to get \[Rate=e^{2.8}e^{-0.045(Year-1981)}\] This is a model in the form \[Ce^{k(Year-h)}.\] where \(C=e^{2.8}\approx 16.44\)$, \(k=-0.045\), \(h=1981\).

  2. Summarize the different forms of models from the example above, because there will be confusion for sure.

    If you model the semi-log data with a line, you can write the model as either \[\log(Rate) = -0.045Year+91.95\] or \[\log(Rate) = -0.045(Year-1981)+2.8.\] Either is completely correct. They are the same model. However, the first one gives you this model for \(Rate\). \[Rate=e^{91.95}e^{-0.045Year}\] If you start from the second model, you get \[Rate=e^{2.8}e^{-0.045(Year-1981)}.\] The two exponential models are also equivalent. However, when we use fitModel in a minute, we want to use the second form, because our results will turn out better for computational reasons.

  3. We can obtain the “best fit” exponential model using fitModel. Because fitting linear models is easy, we’ll tell it to make a linear model of our semi-log data. For computational reasons, we’ll tell it to fit a model in the form \[\log(Rate) = m (Year-1981)+b.\] When fitting with an exponential model, we’ll always use this shifted form. We will always use the left-most time coordinate from the data as the shift.

    bfit = fitModel(log(Rate)~m*(Year-1981)+b, data=Mortgage30YrAnnual)
    coef(bfit)
    ##           m           b 
    ## -0.03739385  2.64752113

    Our linear model of the semi-log data is \[\log(Rate) = -0.037(Year-1981)+2.65.\] We can convert it to an exponential model. \[Rate=e^{2.65}e^{-0.037(Year-1981)}=14.15e^{-0.037(Year-1981)}.\] If we want our model in un-shifted form, we can do that also. \[Rate=14.15e^{-0.037(Year-1981)}= 14.15e^{-0.037*-1981}e^{-0.037Year} = 9.62\times 10^{32} e^{-0.037Year}.\] Students should be able to report the model in either form.

    Finally, see how well fitModel did. Compare the semi-log fit from fitModel to our by-hands fit.

    plotPoints(log(Rate)~Year,data=Mortgage30YrAnnual)
    plotFun(-0.045*(Year-1981)+2.8~Year,add=TRUE,col="steelblue3")
    plotFun(-0.037*(Year-1981)+2.65~Year,add=TRUE,col="thistle3")

    Compare the exponential models.

    plotPoints(Rate~Year,data=Mortgage30YrAnnual)
    plotFun(exp(2.8)*exp(-0.045*(Year-1981))~Year,add=TRUE,col="steelblue3")
    plotFun(exp(2.65)*exp(-0.037*(Year-1981))~Year,add=TRUE,col="thistle3")

    INSTRUCTORS NOTE: Watch out! The function bfit that fitModel produces is not evaluating \(-0.037(Year-1981)+2.65\), as you might expect. Rather, it evaluates \(e^{2.76}e^{-0.037(Year-1981)}\). I find this unintuitive. So, even if you ask fitModel to model semi-log data, the function produced is modeling the original data.

    If I want to visualize my semilog data along with my model, I’d have to do this:

    plotPoints(log(Rate)~Year, data=Mortgage30YrAnnual)
    plotFun(log(bfit(Year))~Year, add=TRUE, col="red")

    If I want to visualize my original data and my model, I should do:

    plotPoints(Rate~Year, data=Mortgage30YrAnnual)
    plotFun(bfit(year)~year, add=T, col="red")

  4. Prediction: Using this model, what do we predict the mortgage rate will be in 2015? What was it in June 1993? Is it reasonable to estimate it in February of 1884?

  5. Inference: What does the parameter, \(k\), that we found, mean in this context? Interpret it in terms of the amount of time required for interest rates to double/halve.

  6. Repeat with `AAPLStockMonthly.

    1. Visualize the data.

      plotPoints(AdjClose~Month,data=AAPLStockMonthly)

      A linear model is clearly not appropriate. What about an exponential model?

      plotPoints(log(AdjClose)~Month,data=AAPLStockMonthly)

      Ok, this semi-log plot looks somewhat linear. Let’s try to go for an exponential model.

    2. Create a model of the semi-log data. We’ll make a linear model of the semi-log data. The left-most time coordinate is at month 1. So we’ll use \(h=1\).

      AAPLmodel=fitModel(log(AdjClose)~m*(Month-1)+b,data=AAPLStockMonthly)
      coef(AAPLmodel)
      ##           m           b 
      ##  0.01274598 -1.63848926

      Great, now our model reads: \[\log(AdjClose) = 0.013(Month-1)-1.64.\]

    3. Create a model for the original data.

      The corresponding model for \(AdjClose\) directly is \[AdjClose=e^{-1.64}e^{0.013(Month-1)}=0.19e^{0.013(Month-1)}.\] The relevant plot for the linear fit of the semi-log data is:

      plotPoints(log(AdjClose)~Month,data=AAPLStockMonthly)
      plotFun(log(AAPLmodel(Month))~Month,add=TRUE,col="tomato3")

      The plot for the model of actual \(AdjClose\) price is:

        plotPoints(AdjClose~Month,data=AAPLStockMonthly)
        plotFun(AAPLmodel(Month)~Month,add=TRUE,col="tomato3")

      This model is clearly not wonderful, but the best we can do with the techniques we have!

    4. Make an inference.

      The parameter \(k=0.013\) means that we think the price of Apple stock will double roughly every \(k=\frac{\ln(0.013)}{\ln(2)}=6.27\) months.

  7. As time permits,for the following datasets, use a semi-log plot to determine whether an exponential model is appropriate, and if so, approximate a good fit and use fitModel to find the “best fit”. Create a prediction and inference for each model: APCalculus, NetherlandsPopulation, LifeExpectancyPhysicians

2.7 Lesson 7: Modeling with Power Functions

2.7.1 Objectives

  1. Given a set of data, use plotPoints to obtain a log-log plot of the data.

  2. Use a log-log plot to determine whether a power model is appropriate for a given set of data.

  3. Given a set of observed data that exhibit a power relationship, approximate the parameters by hand.

  4. In R, given a set of observed data, use fitModel to obtain the best-fit power model.

  5. Evaluate a power model at a given input and interpret the meaning of the answer.

2.7.2 Reading

Section 2.3.

2.7.3 In class

  1. Power Models and Log-Log Plots. Start with a motivating example (the EngineRPM dataset). When plotting the data, this looks potentially exponential, but the semi-log plot is not linear. However, if I take the log of BOTH variables and plot (a log-log plot), this looks quite linear. This is because a power model (\(f(x) = Cx^{k}\)) is more appropriate. Show why a power model has a linear log-log relationship.

  2. Approximating the Fit of a Power Model. Similar to last lesson, we will use the linearity of the log-log plot to estimate the parameters of the power model. The intercept the line gives us and estimate of \(\log C\) and the slope gives us an estimate of \(k\).

  3. Best Fit Power Model. Again, similar to last lesson, we will use fitModel to obtain a best fit power model. This will require us to log transform both variables and then find the best linear fit.

2.7.4 AD Note

It is tempting to want to force students to do algebra to transform a model of log-log data into a model of the original data. That is how I present things in class, and what I’m doing in these notes. However, it is completely fine for the students to straight memorize \(C=e^k\) and \(k=m\).

2.7.5 R Commands

plotFun, fitModel, plotPoints

2.7.6 Problems & Activities

  1. Plot the RPM vs Mass of the EngineRPM dataset.

    plotPoints(RPM~Mass, data=EngineRPM)

  2. Note that this looks potentially exponential. However, when we build the semi-log plot, the result is not linear.

    plotPoints(log(RPM)~Mass,data=EngineRPM)

  3. So we try something else, a log-log plot.

        plotPoints(log(RPM)~log(Mass),data=EngineRPM)

  4. Build a linear model of the log-log data. Just eyeballing it, I think the linear model should pass through \((0,9.5)\) and \((10,5.5)\).

    plotPoints(c(9.5,5.5)~c(0,10),pch=19,col="violetred2",add=TRUE)

    This gives a slope of \(m=\frac{9.5-5.5}{0-10}=-0.4.\) Our line passes through \((0,9.5)\), so our linear model of the log-log data is \[\log(RPM)-9.5=-0.4(\log(Mass)-0)\] \[\log(RPM)=0.4\log(Mass)+9.5\]

    plotFun(-0.4*x+9.5~x,add=TRUE,col="thistle4")

  5. Convert to a model of the data. \[\log(RPM)=0.4\log(Mass)+9.5\] Exponentiate. \[RPM=e^{-0.4\log(Mass)+9.5} = (e^{\log(Mass)})^{-0.4}e^{9.5}=e^{9.5}Mass^{-0.4}\] Now we have a model for \(RPM\) directly. When we plot this it will be hard to visualize because we have many small engines with masses near 0. Our model goes to infinity as mass goes to 0. On the other hand, there are some huge engines. The figure below is reasonable, at least.

    plotPoints(RPM~Mass,data=EngineRPM,xlim=c(0.1,7000))
    plotFun(exp(9.5)*Mass^(-0.4)~Mass,add=TRUE,col="slategrey",npts=1000)

  6. Generalize. Power functions have the form \(f(x) = Cx^{k}\) where \(C\) and \(k\) are nonzero constants. We model with power functions when the log-log plot of data is roughly linear. Page 162 has a good visual of how the human eye can confuse an exponential and power relationship. Our model above is a power model with \(C=e^{9.5}\approx 13359\) and \(k=-0.4\).

  7. Use fitModel. Again, we could ask fitModel to build a model in the form \(RPM=C\cdot Mass^k\) directly. However, because it is easy to fit linear models, it is better to ask it to fit a linear model to the log-log data, and then work back to a model of the original data.

    RPMModel=fitModel(log(RPM)~m*log(Mass)+b,data=EngineRPM)
    coef(RPMModel)
    ##          m          b 
    ## -0.3396215  9.5138096

    The model coming from fitModel is \[\log(RPM)=-0.34\log(Mass)+9.5.\] The model for \(RPM\) directly is \[RPM=e^{9.5}Mass^{-0.34}.\]

    It is completely fine for the students to straight memorize \(C=e^k\) and \(k=m\).

    INSTRUCTORS: Once again, fitModel is producing a model of the original data, not the log-log data. It is also expecting to take in an RPM, not a log of RPM. Therefore, you can do this:

    plotPoints(log(RPM)~log(Mass), data=EngineRPM)
    plotFun(log(RPMModel(exp(x)))~x, add=T, col="red")

    Or this makes sense to do:

    plotPoints(RPM~Mass, data=EngineRPM,xlim=c(0,7000))
    plotFun(RPMModel(mass)~mass, add=T, col="violetred2")

  8. Prediction: At what RPM do we expect an engine with mass 100 kg to spin?

  9. Exercises 67-70 is a set of problems using other datasets from MMAC. These serve as excellent practice and should be used for pod and/or board work. Be sure to ask students to make a prediction based on the results of their models. Please do not use Exercises 71-74, it may be used for a GR question.

2.8 Lesson 8: Modeling with Sine Functions

2.8.1 Objectives

  1. Given a parameterized sine function, describe how the function changes as each parameter is adjusted.

  2. Given a plot of data, determine whether a sine function is appropriate to model the dataset.

  3. Given a set of observed data that exhibit a sinusoidal relationship, approximate the parameters by hand.

  4. In R, given a set of observed data, use fitModel to obtain the best-fit sine model.

  5. Evaluate a sine model at a given input and interpret the meaning of the answer.

2.8.2 Reading

Section 2.4

2.8.3 Course Director’s Note

This is probably a good point to remind students of the upcoming quiz, which will be administered in class. The quiz will be 30 minutes and it will assess lesson material through Lesson 9.

2.8.4 In class

  1. Cadets should recognize that the default measure of angles in this course is radians. The sin, cos and other trig functions in R, by default, accept radian inputs. NOTE: cadets’ issued calculators will be set to degrees. It might be worth pointing this out to them.

  2. While all six of the trig functions are covered in the text, our focus should be on sine and cosine. This is because when we get to modeling, we will be exclusively using sine to model.

  3. Introduce the parametrization of the sine function. \[f(x) = A\sin\left(\frac{2\pi}{P}(x - h)\right) + v\] This will likely be familiar to them. \(A\) is amplitude, \(P\) is period, \(h\) is horizontal or phase shift, \(v\) is vertical shift.

  4. In R, show some examples of trigonometric functions and what happens when you change the parameters. You are welcome to use Desmos if you like.

  5. Approximating the Fit of a Sinusoidal Model. Using the TemperaturesDanville dataset, demonstrate how to approximate a sinusoidal fit. I recommend going in the order of \(v\), \(A\), \(P\) then \(h\), but any order will do fine.

  6. Best Fit Sinusoidal Model. We will use fitModel to obtain a best fit sinusoidal model. Unlike exponential/power models, we do this directly without transforming the data. However, we should include initial values into fitModel. We can use our visual approximations.

2.8.5 R Commands

plotFun, fitModel, plotPoints, Optional: data, names, head

2.8.6 Problems & Activities

  1. Start with a basic function, \(f(x) = \sin(x)\), and explore what happens when you change the parameters. The cadets should be following along on their own computers:

    f = makeFun(sin(x)~x)
    plotFun(f(x)~x, xlim=range(-10,10), ylim=c(-8,8))
    mathaxis.on()
    
    g = makeFun(sin(x-2)~x)
    plotFun(g(x)~x, add=TRUE,col="tomato3")
    
    h = makeFun(4*sin(x)~x)
    plotFun(h(x)~x, add=TRUE,col="turquoise3")
    
    m = makeFun(sin(x)+2~x)
    plotFun(m(x)~x, add=TRUE, col="purple3")
    
    y = makeFun(sin(2*x)~x)
    plotFun(y(x)~x, add=TRUE, col="orange2")

  2. From the TemperaturesDanville dataset, plot Temperature versus Month. This dataset shows the average maximum temperature by month, in Danville, KY. The dataframe contains 60 observations, so we expect periodic behavior.

    plotPoints(Temperature~Month, data=TemperaturesDanville)
    mathaxis.on()
    grid.on(col="lightgrey")

    What is a reasonable value of \(v\)? This parameter represents vertical shift. Where is this plot centered? One method is to take the halfway point between the maximum and the minimum. A fair guess seems to be about 64 degrees.

    What is a reasonable value of \(A\)? This parameters represents the amplitude, or how far up and down this function deviates from \(v\). One method is to find the difference between the maximum and the minimum and divide by two. A fair guess seems to be around 29 degrees.

    What about \(P\)? This represents the period. How often does this oscillating process repeat one full cycle? One strategy is to measure from peak to peak. It looks like a good guess for \(P\) is 12 months. This makes sense because we are talking about average temperatures.

    Finally, what about \(h\). This represents horizontal (or phase) shift and is harder to capture, especially when input variable does not start at 0. In our case it does, so we simply have to find where is the first place our model crosses \(v\), or 64. A reasonable guess is 3 months.

    Putting it all together:

    plotFun(29*sin(2*pi/12*(Month-3))+64~Month, add=TRUE,col="magenta2",npts=10000)

    Good guess.

  3. Let’s use fitModel to find the best fit. Note that we don’t have to transform the data in this case. However, we need to give fitModel some initial values. We can use our guesses from above.

    bestfit = fitModel(Temperature~A*sin(2*pi/P*(Month-h))+v,
                   data=TemperaturesDanville,
                   start=list(A=29,v=64,P=12,h=3))
    
    plotFun(bestfit(Month)~Month, col="red", add=TRUE,npts=10000)

  4. Prediction: What do we expect the average maximum temperature to be in Danville, Kentucky, during February, 2024? Is this extrapolation or interpolation?

  5. Inference: The value of \(A\) that we found was about 23.28. What does this parameter value mean? If we modeled another locale and found that \(A\) for that locale was 19.11, how would the climate of that location compare to the climate of Danville?

  6. Exercises 41-71. These are good practice for examining sinusoidal relationships and using R to find best fits. Good for pod and board work.

2.9 Lesson 9: Modeling with Sigmoid Functions

2.9.1 Objectives

  1. In R, use the pnorm command to define sigmoid functions in the form \(y = A \cdot pnorm(x,m,s) + v\).

  2. Given a parametrized sigmoid function \(y = A \cdot pnorm(x,m,s) + v\), describe how the function changes as each parameter is adjusted.

  3. Given a plot of data, determine whether a sigmoid model is appropriate.

  4. Given a set of observed data that exhibit a sigmoidal relationship, approximate the parameters by hand.

  5. In R, given a set of observed data, use fitModel to obtain the best-fit sigmoid model in the form \(y = A \cdot pnorm(x,m,s) + v\).

  6. Evaluate a sigmoid model at a given input and interpret the meaning of the answer.

2.9.2 Reading

Section 2.5 and Supplemental Reading.

2.9.3 In class

  1. Intro to Sigmoid Functions. Introduce the concept of a sigmoidal function as a function that experiences exponential growth or decay initially, and then “flattens out”. Use the TwitterUsers3 data as an example. As with other technology examples, we would expect the number of active Twitter uses to increase exponentially and then flatten out over time.

  2. Parameterization of Sigmoid Functions. Discuss the parameterization of a sigmoid function using pnorm. Go over the interpretation of each parameter. Note that input shifting/scaling is performed by changing parameters m and s inside of pnorm while output shifting/scaling is performed outside of pnorm with \(v\) and \(A\). (While it’s interesting, the parameterization in chapter 2 of the book is not required.)

  3. Approximating the Fit of a Sigmoid Model. Using the TwitterUsers3 dataset, demonstrate how to approximate a sigmoid fit. I recommend going in the order of \(v\), \(A\), \(m\) then \(s\), but any order will do fine.

  4. Best Fit Sigmoid Model. We will use fitModel to obtain a best fit sigmoid model. This can be done directly without transforming the data.

2.9.4 R Commands

plotFun, pnorm, plotPoints, fitModel

2.9.5 Problems & Activities

  1. The sigmoid function is often used to model phenomena that start at some baseline, increase exponentially for a period and then flatten out over time. Examples include active twitter users, first-time iPhone purchases, disease outbreaks, etc. A simple sigmoid function has the form: \[f(x) = \frac{1}{1 + e^{- x}}\]

    The text parameterizes this form, but we will use pnorm instead. The reason is to avoid unnecessary albegra. pnorm is the CDF for the normal distribution – that makes it nice when modeling some things. We mostly model population growth over time, which makes the book’s sigmoid function more appropriate, but either one would be totally suitable for this course.

  2. Using pnorm(x,m,s) helps our input scaling. The value \(x\) is our input variable. The value \(m\) represents the center of our function. In other words, when did we see the biggest increase in active Twitter users, or biggest increase in new disease cases? The value \(s\) is a measure of spread or rate. High values of \(s\) imply a slower incline while small values indicate a faster incline. Plot this function with different values of \(m\) and \(s\). Note that at the input \(m+s\), the output is about two-thirds of the way up from the center output to the max output. This is important for model approximation.

    sig1=makeFun(pnorm(x)~x)
    plotFun(sig1(x)~x,xlim=range(-5,5),ylim=c(-0.2,1.3),col="mediumpurple")
    mathaxis.on()
    grid.on(col="lightgrey")
    
    sig2=makeFun(pnorm(x,1,1)~x)
    plotFun(sig2(x)~x,add=TRUE,lty=2,col="lightsalmon1")
    
    sig3=makeFun(pnorm(x,0,2)~x)
    plotFun(sig3(x)~x,add=TRUE,lty=2,col="darkseagreen3")

    For output scaling, note that changing \(v\) shifts the function up and down while changing \(A\) scales the function vertically. This is similar to how these parameters impact a sinusoidal function:

    sig4=makeFun(0.8*pnorm(x,0,1)~x)
    plotFun(sig4(x)~x,add=TRUE,lty=2,col="deepskyblue3")
    
    sig5=makeFun(pnorm(x,0,1)+0.2~x)
    plotFun(sig5(x)~x,add=TRUE,lty=2,col="hotpink")

  3. Plot the TwitterUsers3 dataset. Visually approximate parameter values:

    plotPoints(Users~Year,data=TwitterUsers3)
    grid.on(col="lightgrey")

    \(v\): The best guess at vertical shift is zero. Not only does the plot look like it would flatten off to the left at 0 users, but 0 is the logical value considering Twitter started with no initial users.

    \(A\): The best guess for vertical scale is the range of function output (maximum minus minimum). In this case a good guess is 300.

    \(m\): This one’s tougher in this case. We’re looking for the point where the curve changes concavity. Because this plot seems to be truncated on the left, this doesn’t occur in the visual center of the plot. Instead, this appears to happen at about year 12.5.

    \(s\): The toughest one. Start at your “center” (about Year = 12.5 and Users = 150). Find where the curve reaches about two-thirds of the way between this center (150) and the max (300). Our estimate of \(s\) is the input value where this occurs (about 14.2) minus the center input value (12.5). Thus, our guess is about 1.7.

    mest=12.5
    sest=1.7
    vest=0
    aest=300
    sigest=makeFun(aest*pnorm(Year,mest,sest)+vest~Year)
    plotFun(sigest(Year)~Year,add=TRUE,col="hotpink")

    Not bad.

  4. Here is a figure I made to help demonstrate how to estimate the parameters of a sigmoid.

  5. Now use fitModel to find the best sigmoid fit. Use our guesses above as initial values:

    plotPoints(Users~Year,data=TwitterUsers3)
    sigfit=fitModel(Users~A*pnorm(Year,m,s)+v,
                   data=TwitterUsers3,
                   start=list(A=aest,m=mest,s=sest,v=vest))
    
    plotFun(sigfit(Year)~Year,add=TRUE,col="red")

  6. Exercises 47-66, and 72-76. Excellent opportunities to practice modeling sigmoid relationships. Do not use Exercises 67-71, as it will most likely be a GR question.

2.9.6 For Instructors Information about estimating spread

You might ask, where does this crazy procedure for estimating the spread come from? Let’s let the spread be \(\sigma\) and the mean be \(\mu\). Our pnorm function is actually the cdf of the normal distribution with mean \(\mu\) and standard deviation \(\sigma\). That is,

\[pnorm(x;\mu,\sigma) = \int_{-\infty}^{x}{\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}(\frac{s - \mu}{\sigma})^2}ds}\]

Go ahead, check for yourself using R.

pnorm(2,5,1)
## [1] 0.001349898
integrate(1/(1*sqrt(2*pi))*exp(-1/2*((s-5)/1)^2)~s,xlim=c(-1000,2))
## 0.001349898 with absolute error < 3.5e-08
## [1] 0.001349898

Let’s assume that \(v = 0\), \(A = 1\). Let \(f(x)\) be the pdf, so

\[f(x) = \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}(\frac{s - \mu}{\sigma})^2}\]

and

\[pnorm(x) = \int_{-\infty}^{x}{f(s)ds}\]

First, our procedure says “go 2/3 of the way up from \(pnorm(\mu)\) to \(pnorm(\infty)\), find the corresponding \(x\) coordinate”. I’ll call the corresponding \(x\) coordinate \(x_{crazy}\).

We know from probability that \(pnorm(\mu) = 1/2\), and \(pnorm(\infty) = 1\). So 2/3 of the distance between them is \(\frac{2}{6} = \frac{1}{3}.\) Going \(1/3\) up from \(1/2\) gets you to output value \(\frac{1}{2} + \frac{1}{3} = \frac{5}{6}\). So we are really determining \(x_{crazy}\) to be the number such that \(pnorm(x_{crazy}) = 5/6\).

Ok, keep that in your pocket for a minute. We are talking about the normal distribution, and a standard rule of thumb is that about 2/3 of the probability lies within \(\sigma\) of \(\mu\). Check yourself. Pick any values of \(\mu\) and \(\sigma\) you like:

mu=1; sigma=0.5;
f=makeFun(1/(sigma*sqrt(2*pi))*exp(-1/2*((s - mu)/sigma)^2)~s);
integrate(f(x)~x,xlim=c(mu-sigma,mu+sigma))
## 0.6826895 with absolute error < 7.6e-15
## [1] 0.6826895

So, \[\frac{2}{3} \approx \int_{\mu - \sigma}^{\mu + \sigma}{f(s)ds = pnorm(\mu + \sigma) - pnorm(\mu - \sigma)}\]

We also know that the total probability should be one, so

\[1 = \int_{-\infty}^{\infty}{f(s)ds}\]

Therefore, the area not within one standard deviation of the mean is about 1/3.

\[\frac{1}{3} \approx \int_{-\infty}^{\infty}{f(s)ds} - \int_{\mu - \sigma}^{\mu + \sigma}{f(s)}ds = \int_{-\infty}^{\mu - \sigma}{f(s)ds + \int_{\mu + \sigma}^{\infty}{f(s)ds}}\]

Because the normal distribution is symmetric about the mean, we have

\[\int_{-\infty}^{\mu - \sigma}{f(s)ds = \int_{\mu + \sigma}^{\infty}{f(s)ds}}\]

But these two add up to give about 1/3, so each must be about 1/6. Thus,

\[\frac{1}{6} \approx \int_{-\infty}^{\mu - \sigma}{f(s)ds = pnorm(\mu - \sigma)}\]

Since \(pnorm(\mu + \sigma) - pnorm(\mu - \sigma) \approx 2/3\), that means that \(pnorm(\mu + \sigma) \approx 5/6\). But we calculated \(x_{crazy}\) so that \(pnorm(x_{crazy}) = 5/6\). Therefore, we must have \(x_{crazy} \approx \mu + \sigma\).

2.10 Lesson 10: Single Variable Modeling

2.10.1 Objectives

  1. Given a set of observed data, determine the most appropriate function type to model the dataset.

  2. Given a set of observed data and proposed model type, approximate the parameters of that model by hand.

  3. In R, given a set of observed data, use the fitModel command to obtain the best-fit model of the most appropriate type.

  4. Describe how collecting additional data can change how the model fits the data or which type of model is the most appropriate.

  5. Describe the two primary reasons for modeling data: inference and prediction.

  6. Describe the definition of extrapolation and which input values we can expect a model to provide reasonable outputs for.

2.10.2 Reading

Section 2.6.

2.10.3 In class

  1. Putting it Together. In this lesson, we will put together the last seven lessons; given a dataset, we will identify a suitable model form and approximate a model fit. Feel free to cover the modeling cycle at the start of this section, but note that it is not an objective.

  2. Practice with MMAC Datasets. Cadets should use a large part of this lesson to practice plotting data, identifying appropriate model forms (using semi-log and log-log plots if necessary), visually approximating parameters, and using fitModel to obtain best fits. Please do not use the BlastData dataset, it may be used as a GR question.

  3. Additional Data. At the end of this lesson, discuss what happens when data gets added to a set. Use the twitter example (page 225-7) as a good demonstration. If you can’t get to this today, you can bring it up next lesson.

  4. Modeling Purposes & Limitations. Describe the two primary reasons for modeling data: inference and prediction. Inference: we build models to describe the underlying relationship between the variables. Prediction: we build models to predict the output for certain values of the input(s). When we try to predict an output for an input that is outside the range of our data, that is extrapolation, and should be avoided.

2.10.4 R Commands

plotFun, pnorm, fitModel, plotPoints

2.10.5 Problems & Activities

  1. First, let’s look at the NaturalGasConsumption dataset. Have them follow along as you explore the dataset and then plot \(CubicFeet\) versus \(Year\). What kind of model does this look like? Either exponential or power. To determine that, we’ll have to look at semi-log and log-log plots:

    plotPoints(CubicFeet~Year,data=NaturalGasConsumption)
    plotPoints(log(CubicFeet)~Year,data=NaturalGasConsumption)
    plotPoints(log(CubicFeet)~log(Year),data=NaturalGasConsumption)

    Both the semi-log and log-log plots look linear, so which is appropriate? Exponential or power? In this case, either one will give a reasonable model.

    If we model with an exponential function, we first identify that \(h=1950\). Build a linear model of the semi-log data.

    CubicFeetModel1=fitModel(log(CubicFeet)~m*(Year-1950)+b,data=NaturalGasConsumption)
    coef(CubicFeetModel1)
    ##           m           b 
    ##  0.06052149 15.66248660

    This gives us the model \[CubicFeet=e^{15.7}e^{0.61(Year-1950)}.\]

    plotPoints(CubicFeet~Year,data=NaturalGasConsumption)
    grid.on(col="lightgrey")
    
    plotFun(CubicFeetModel1(Year)~Year,add=TRUE,col="darkseagreen")

    It is going to be hard to beat that. We can try, of course. To built a power model, ask fitModel to build a linear model of the log-log data.

      CubicFeetModel2=fitModel(log(CubicFeet)~m*log(Year)+b,data=NaturalGasConsumption)
    coef(CubicFeetModel2)
    ##         m         b 
    ##  118.6276 -883.0119

    This model is \[CubicFeet=e^{-883}Year^{118.7}\] These huge and tiny powers make us think that this model might not be good in the long run, but this model does indeed fit reasonably well.

    plotPoints(CubicFeet~Year,data=NaturalGasConsumption)
    grid.on(col="lightgrey")
    
    plotFun(CubicFeetModel2(Year)~Year,add=TRUE,col="coral3")

  2. They should now practice this for various datasets. Perhaps you can set each pod to work on a different set. Some sets I suggest:

ElectronicMailOrderSales, Hawaii, BodyMassMetabolicRate

Please do not use the BlastData dataset, it may be used as a GR question.

  1. ExamineTwitterUsers1, TwitterUsers2, and TwitterUsers3. The point is that as you collect more data, the type of model you decide to use may change. On a small enough timescale almost anything looks linear!

  2. Exercises 31-130 from Section 2.6.

2.11 Lesson 11: Multivariable Functions

2.11.1 Objectives

  1. Given a multivariable function, evaluate the function at a given point (set of inputs) by hand or using R.

  2. Given a multivariable function in tabular or graphical form, evaluate the function at given points, and identify whether the function is increasing or decreasing with respect to each input variable.

  3. Given a surface plot or contour plot, estimate where extreme values (min/max) of the function occur.

  4. In R, use makeFun to define a multivariable function.

  5. In R, use the plotFun command to generate a surface plot or a contour plot for a function with two inputs on a specified domain.

2.11.2 Reading

Section 1.2.

2.11.3 In class

  1. This lesson will serve as an introduction to functions with more than one input. The rest of this block will focus on univariate functions, but in this lesson, we will note that functions can have more than one input, and many of the concepts are the same.

  2. The bulk of this lesson should be focused on interpreting visualizations and constructing them in R. If they have done the reading, they will be able to input values into a function by hand and produce an output. Where they will need practice is with building and interpreting visualizations, particularly contour plots.

  3. Start by showing them a contour plot. You can generate your own in RStudio or use one from the text. (I like to relate contour plots to hiking maps. Some of them will have seen these before.) On this contour plot, describe how to find the value of the function at particular inputs. Also, use the contour plot to find extrema and to describe how/whether the function changes as one of the inputs change. We will revisit this when we get to partial derivatives, so it will be helpful for them to be able to intuitively think about how functions change with respect to single variables.

  4. Next, demonstrate how to build a multivariable function in R and how to obtain surface plots and contour plots. I recommend doing this via interactive example. You work through an example on RStudio while they do the same at their desks.

  5. Note that the text asks them to sketch a contour plot by hand. I am not requiring them to do this.

2.11.4 R Commands

makeFun, plotFun with surface and filled options

2.11.5 Problems & Activities

  1. Use the function \(f(x,y) = x^{2} - xy + y^{2}\); have them define this in R, evaluate it at particular coordinates, and then plot. Use both surface and contour plots. The text focuses mostly on contour plots, which are fine, but Calc 3 will likely focus mostly on surface plots. This will be a good opportunity to have them explore x and y limits as well as the levels argument.

    f = makeFun(x^2 - x*y + y^2 ~ x & y)
    
    f(-1,1)
    ## [1] 3
    f(0,2)
    ## [1] 4
    f(1,2)
    ## [1] 3
    plotFun(f(x,y) ~ x & y, surface = TRUE,xlim=c(-5,5),ylim=c(-5,5))
    plotFun(f(x,y)~x&y)

    plotFun(f(x,y)~x&y, filled=FALSE, xlim=range(-5,5), ylim=range(-5,5), levels=c(0,5,20,50))

  2. For the contour plot above, ask them questions about how the function changes as x or y increases, starting at particular coordinates.

  3. Exercises 33-40 in the text section 1.2.

  4. Exercises 83-88.

2.12 Lesson 13: Dimension Analysis

2.12.1 Objectives

  1. Describe the difference between quantities, dimensions and units.

  2. Distinguish between fundamental and derived dimensions.

  3. Apply rules of dimension arithmetic to find the dimension of an operation with quantities, or describe why the operation is invalid.

  4. Given the equation of a model and the dimensions of certain quantities in that model, use dimension algebra to solve for the dimension of another specified quantity in the model.

2.12.2 Reading

Section 2.7.

2.12.3 In class

  1. Introduce Quantities vs Dimensions vs Units. The whole idea behind this lesson is to pay attention to units and dimensions. This is a valuable skill that they can carry with them to downstream courses. Numbers themselves sometimes don’t have meaning until you provide some kind of context (dimensionality).

  2. Common Dimensions and Basic Rules. The primary dimensions we will focus on are length (L), mass (M), time (T), temperature (theta), and money (B). Feel free to convey to the cadets that the others are optional and not tested. Also note that all of the common derived dimensions on page 243 are based on just mass, length and time. Go over rules 1-3 in the text (addition, multiplication, and exponentiation). We are not concerned with rule 4, or with the topic of generalized products.

  3. Obtaining Derived Dimensions. Go through examples of obtaining the dimension of a quantity that is a function of other quantities. Provide them time to practice this on their own at their tables or on the board.

2.12.4 R Commands

2.12.5 Problems & Activities

  1. Go over the basic dimensions. Cadets struggle with the notation here, so it’s good to introduce this early and stick with it throughout the lesson. Some examples:

    1. [3 minutes] = T

    2. [14.2 kg] = M

    3. [$0.78] = B

    4. [44 mm] = L

    5. [1.2 ft³] = L³

  2. Now go over the basic rules and some examples.

    1. Rule 1: Addition. We can only add quantities that have the same dimension. While 13+4 = 17, it does not make sense to add 13 km to $4. Can the following be computed?

      1. 17 minutes + 2 hours

      2. 17 minutes + 2 meters

      3. 0.5 lbs + 432 grams

    2. Rule 2: Multiplication. You can multiply/divide quantities of different dimensions. There are no restrictions, but sometimes the resulting dimension can be tough to interpret.

    3. Rule 3: Exponentiation. \(a^{r}\) is only a valid operation if \(r\) is dimensionless.

  3. In the text, have them work through exercises 17-30 on page 258-9. They should identify whether the operation is possible and if so, what is the resulting dimension.

  4. Exercises 51, 52, 61, and if you’re adventurous, exercises 56-58 (using the SIR model).

  5. Exercises 62-71.