13 Appendix B - Using R Markdown to create reports
This appendix explains how to take an R script that carries out data analysis and generates tables, visualizations, and other output, and place it into an R Markdown script that knits the output together with text to generate a report. Basic options for controlling the look of reports are reviewed.
13.1 R Markdown
One of the powerful features of R and RStudio is the ability to use a markdown language that can “knit” or “weave” together the results of analysis done in R with text, thus creating a report. R Markdown is a language (a set of commands, options, and syntax) for authoring HTML, PDF, and Word documents. The coding capabilities needed to create Word and PDF files are a bit beyond this introduction (many things can go wrong!), so we limit the discussion here to creating reports in HTML. HTML is the standard markup language for creating documents to be displayed by web browsers such as Chrome, Firefox, Safari, or Microsoft Edge. For more details on using R Markdown see http://rmarkdown.rstudio.com. An excellent tutorial is available at: https://ourcodingclub.github.io/tutorials/rmarkdown/.
13.2 Getting started
Open RStudio. There are two ways to get started on an R Markdown script.
First, you can open a default script by using the File > New File menu in RStudio. This will offer you a menu of different formats. Choose R Markdown. Do not choose the ordinary Markdown option that is lower down. When you choose R Markdown, you will be prompted to give the document a title and an author name. Usually you will want a short title, and if there is more than one author, separate the names with commas. When you click OK, a script will open. Notice that in the tab above the script the document is named “Untitled.” Save this script right away, in the folder where you keep your R work, and give the file a short name appropriate to the work you are doing. Notice that when you save, the file extension is .Rmd. If you close the script and go to the folder where you saved it, you will see the file with the extension .Rmd.
Second, you could open a script that has been provided to you and is already an R Markdown script. It should have the file extension .Rmd. It will already have a file name, of course. It is good practice to save it under an amended file name right after opening it, perhaps by adding your initials to the file name.
If you have opened an R Markdown script using the file menu command, or you have a different template script provided to you, you can usually immediately “knit” the script to get an idea how it works. When you click the Knit button or icon (with the ball of yarn and the knitting needle) an HTML document will be generated that includes both content as well as the output of any embedded R code chunks within the document. It takes about 30 seconds to knit, but it can feel like an eternity. Once finished, the knitted report will pop open in a new window. Read through the report. Close the report window. Use Finder or Windows Explorer to go to your folder and see that the .html file is there. Open the file in your web browser, just to see how you can open the file in a browser.
There is one hitch when you go to knit a new R Markdown file for the first time. You have to save it in a folder and give the file a name. If the tab of the script still says “Untitled” then R cannot knit the script and generate a report. Save your .Rmd using a new name for it, such as MK_first_rmarkdown.Rmd.
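Knitting can also be started from the Console instead of the Knit button. The sketch below assumes the rmarkdown package is installed; the contents of the small .Rmd file it writes are made up for illustration, and the file is placed in a temporary folder so nothing in your own folders is touched.

```r
library(rmarkdown)

# Build a very small R Markdown file in a temporary folder.
fence <- strrep("`", 3)   # three backticks, which open and close a code chunk
rmd_file <- file.path(tempdir(), "MK_first_rmarkdown.Rmd")
writeLines(c(
  "---",
  'title: "My first report"',
  "output: html_document",
  "---",
  "",
  "Some introductory text.",
  "",
  paste0(fence, "{r cars}"),
  "summary(cars)",
  fence
), rmd_file)

# Knit only if pandoc (the converter bundled with RStudio) can be found.
if (pandoc_available()) {
  html_file <- render(rmd_file, quiet = TRUE)  # returns the path of the report
  print(html_file)
}
```

One difference to keep in mind: render() called from the Console runs in your current R session, whereas the Knit button starts a new session.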
When you need to generate an R Markdown script, you can use this initial script that you have saved, or a script provided to you, or use the file menu commands. We suggest that whenever you need to create a new R Markdown script, you take a regular R script that you are working on and have verified to work, copy and paste it into this initial R Markdown script, add new code chunks and edit the text, and save under a new name. This first R Markdown script should become your “R Markdown template” script (unless you have been provided with another one). There is never any need to create a new R Markdown script from scratch.
An important thing to know about R Markdown is that when you knit, R opens a completely new and blank “environment” and works through the commands in your R Markdown script in order, code chunk by code chunk. It does not matter that you have a dataset or dataframe already in your environment. The knitting process will not recognize that. So your script commands must work in the order they are written, one line after another. It is always good practice when doing R Markdown to make sure that you start your R script “fresh,” highlight the whole script, and run it. If you get an error, it is best to debug the error in R script mode, and not in R Markdown mode. Then, once your script works perfectly, copy and paste it into the appropriate code chunks of an R Markdown script and knit.
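One way to check that a script works “fresh” is to run it in a brand-new R session, which starts with an empty environment much as knitting does. The following is only a sketch: the file name my_analysis.R and its two lines are hypothetical stand-ins for your own script.

```r
# A tiny stand-in for your own analysis script (hypothetical file name).
script_file <- file.path(tempdir(), "my_analysis.R")
writeLines(c(
  "x <- c(2, 4, 6)",
  "print(mean(x))"
), script_file)

# Locate the Rscript program that ships with R and run the script in a
# separate R session. Objects in your current environment are not visible there.
rscript <- file.path(R.home("bin"), "Rscript")
out <- system2(rscript, shQuote(script_file), stdout = TRUE)
out
```

If the script depends on an object that exists only in your current environment, the separate session stops with an “object not found” error, which is the same kind of failure you would see when knitting.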
13.3 Editing an R Markdown script
R Markdown scripts have three components: the YAML at the top, gray-colored code chunks set off by three backticks, and white space in between the code chunks where you may type text and control the formatting of text.
At the top of the script is a section set off by three dashes (---) above and below. This is called the YAML. Never adjust the format here until you are sure you know what you are doing. R Markdown is very sensitive to bad formatting there (extra spaces, etc.). The author name, date, and title can be changed, but do not change the format.
---
title: "Assignment 1"
author: "Natalia Lafourcade"
date: "2024-04-13"
output: html_document
---
After the YAML, R Markdown scripts contain gray-colored code chunks that are set off by three backticks flush with the left margin. For example, you might see:
```{r readdata, echo=TRUE, warning=FALSE}
# Read Kenya DHS 2022 data from a website
url <- "https://github.com/mkevane/econ42/raw/main/kenya_earnings.csv"
kenya <- read.csv(url)
```
The top of each code chunk has braces { } after the backticks, and inside the braces are various options that follow the lowercase letter r. This code chunk is called readdata. As one might expect from the name, the chunk has two commands: one stores the web address of a .csv file, and the other reads that file into a dataframe. More generally, a code chunk called readdata might contain R commands that read in and wrangle the data, creating new variables, renaming variables, and carrying out other pre-analysis operations. Code chunks may be named or left unnamed, but two code chunks cannot have the same name.
Code chunks contain R commands. The commands must be in the same order as they were in the R script.
Using several code chunks is better than using a single code chunk. There are two reasons to use many code chunks.
The first is that the whole point of “knitting” a report is that you want text woven into the output produced by R. The text will discuss a table or a figure. The text might introduce a table of descriptive statistics. After that, there might be more text describing the regression specification, and then the regression results in a table produced in R, and then more text after the table interpreting the estimated coefficients.
The second reason is for troubleshooting. If you have a mistake in the code, knitting will sometimes return an error code that identifies the first line of the code chunk where your error is. If you only have one long code chunk, this is useless. It is good practice to break up code into several code chunks so that when troubleshooting you can more easily find the place where the mistake is.
Here are several code chunks that you can copy and paste into an R Markdown script and knit. The first code chunk loads packages, which should already be installed on your computer.
```{r setup, message=FALSE, comment=NA, echo=TRUE}
# Load the packages
# (must have been installed)
library(tidyverse)
library(modelsummary)
library(sandwich)
library(estimatr)
# turn off scientific notation except for big numbers, and round decimals
options(scipen = 9)
options(digits = 3)
```
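Before knitting, it can help to check in the Console that every package loaded in the setup chunk is actually installed. The sketch below uses only base R; the package names are the four loaded above, and the object names needed and missing_pkgs are chosen here for illustration.

```r
# Packages loaded in the setup chunk
needed <- c("tidyverse", "modelsummary", "sandwich", "estimatr")

# Compare against the names of all packages installed on this computer
installed <- rownames(installed.packages())
missing_pkgs <- needed[!(needed %in% installed)]

if (length(missing_pkgs) > 0) {
  # Show the command to run once in the Console (not inside the .Rmd script)
  message("Run in the Console: install.packages(c(",
          paste0('"', missing_pkgs, '"', collapse = ", "), "))")
} else {
  message("All packages are installed.")
}
```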
Then the second code chunk reads in the Kenya DHS 2022 data.
```{r readdata, echo=TRUE, warning=FALSE}
# Read Kenya DHS 2022 data from a website
url <- "https://github.com/mkevane/econ42/raw/main/kenya_earnings.csv"
kenya <- read.csv(url)
```
Then the third code chunk creates a table of descriptive statistics.
```{r descriptives, echo=TRUE, warning=FALSE, message=FALSE, comment=NA}
# Descriptive statistics for observations with earnings_usd of at most 1000
datasummary(All(subset(kenya, earnings_usd<=1000)) ~ Mean + SD + Median,
            data=subset(kenya, earnings_usd<=1000),
            title="Kenya earnings dataset")
```
The fourth code chunk generates a histogram of earnings.
```{r plot1, echo=TRUE, warning=FALSE, message=FALSE, comment=NA}
ggplot(data=subset(kenya, earnings_usd<=1000), aes(earnings_usd)) +
geom_histogram(breaks=seq(0, 1000, by = 50),
col="black", fill="gray", alpha = .5) +
labs(x="Earnings", y="Count") + theme_bw()
```
As can be seen, code chunks can have a variety of options. Usually, your knitted report should not have the code appearing. Use the echo=FALSE option in the code chunk. Sometimes, especially for the setup code chunk where you load packages, or for data wrangling code chunks, you want to suppress both the code and all R output. Use the include=FALSE option in the code chunk (but be careful, because that also excludes output such as tables or plots).
The first line of a code chunk might look like this:
```{r name, echo=FALSE, comment=NA, warning=FALSE, message=FALSE}
```
This code chunk has a name and four options. The options keep the code from echoing in the report (echo=FALSE), remove the ## characters that knitr otherwise places at the start of each line of printed output (comment=NA), and suppress the warnings and messages that R produces (warning=FALSE and message=FALSE, both spelled in the singular). Warnings and messages would ordinarily show up in the Console when running a plain R script.
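Rather than repeating the same options in every chunk header, the options can be set once as defaults for all later chunks. The sketch below uses the opts_chunk$set() function from the knitr package (assumed to be installed, as it is whenever R Markdown works); the line would normally go inside the first code chunk of the script.

```r
# Defaults applied to every later code chunk in the document.
knitr::opts_chunk$set(
  echo = FALSE,      # do not show the code in the report
  comment = NA,      # no "##" prefix in front of printed output
  warning = FALSE,   # hide warnings
  message = FALSE    # hide messages, such as those printed when packages load
)
```

An individual chunk can still override a default by stating the option in its own header, for example echo=TRUE for a chunk whose code you do want to show.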
In between code chunks is white space. This is where you put your text. Normally, your report would start with an introduction. That introductory text should be placed in the white space after the initial code chunks that load the packages, read in the data, and do some preliminary data wrangling. In most reports, you would want section headers. Use the hashtags to control the size of the font. For example, three hashtags will produce a nicely sized header. There must be a space between the hashtags and the text.
### 1. Introduction
This text chunk will render some basic text that will constitute the introduction of the report. The text will automatically wrap around the line. To make a new paragraph in the report you should hit Enter twice after the period of the last sentence in the earlier paragraph. Then you will have a blank line between paragraphs.
See how there is a blank line between the two paragraphs? This second paragraph might contain a new idea, right? The three hashtags ### before Introduction will make it appear in a different font when you knit.
When writing text and inserting section headings, be sparing in your use of hashtags to change the font size. Uniform font size with bold and italics usually looks better than large headers or text three times bigger than the font size of the tables.
One cool thing you can do in R Markdown is embed R commands in text so that you do not have to cut and paste specific numbers in the text. For example, suppose you wanted to write in your text part of the report the sentence, “The mean of income is 32,000.” You want the number to be rounded, perhaps, and not have 8 decimals. You could then have in your text chunk:
The mean of income is `r round(mean(df$income, na.rm=TRUE), 2)`
When it knits, the rounded mean will appear with two decimal places. Backticks are used to set off the embedded command, and the letter r has to be first, followed by a space, and then the R commands. Usually you only want to do this when calculating a single value that you want to insert in the text.
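It is worth trying the embedded command in the Console first to see what value it produces. The sketch below uses a small made-up dataframe named df with an income column (both hypothetical, matching the names in the sentence above); the format() step is an optional extra that adds the thousands separator.

```r
# Made-up data for illustration; NA marks a missing income
df <- data.frame(income = c(31000.10, 33000.20, NA))

# The same expression that would sit between the backticks in the text
mean_income <- round(mean(df$income, na.rm = TRUE), 2)
mean_income

# Optional: turn the number into text with a comma and two decimals
mean_income_text <- format(mean_income, big.mark = ",", nsmall = 2)
mean_income_text
```

Because format() returns text rather than a number, the value is inserted in the report exactly as formatted.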
13.4 Common reasons an R Markdown script does not knit
There are many reasons why your script may not knit. We list below some common ones.
- Your file may be saved as a .R file; it needs to be saved as a .Rmd file in order to knit.
- Perhaps you have included an install.packages() command in your script, which will cause the knit to fail. Comment out the install.packages() command with a hashtag (#) after the package is installed, or delete that line. You cannot install packages in the R Markdown environment that opens when you knit the script.
- It may also be the case that the R Markdown script includes the command to have R load a package, that is, the library() command, but the package is not actually installed on your computer. In that case, the knitting will fail. You will need to install the relevant package and, after installing, try to knit again.
- Relatedly, your script asks R to run a command, but the package for that command has not been loaded with a library() command in the script.
- You may have a space before the beginning of a code chunk’s three backticks. The backticks must be flush with the left margin. When they are flush, and when in R Markdown mode, the code chunk will be gray.
- Possibly your YAML at the top is not correct. Go copy a YAML from a template and replace what you have. The first code chunk should start right after the YAML.
- The most important problem in R Markdown is setting the working directory in the R Markdown environment in order to read in files from your computer's hard drive. If you are reading in files from your hard drive, your working directory needs to be set correctly. It is usually good to use the setwd() command right before the read.csv() command (or any other command that reads in data from the computer, rather than from an Internet address). Have both in the same code chunk, first setting the directory and then reading the data file.
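The working directory point in the last item can be sketched as follows. To keep the example self-contained, a temporary folder stands in for your own data folder, and the file name my_data.csv and its contents are made up; in a real script you would replace data_folder with the path to the folder that holds your file.

```r
# A temporary folder stands in for your own data folder (hypothetical)
data_folder <- tempdir()

# Create a small made-up .csv file there so the example can run
write.csv(data.frame(id = 1:3, earnings_usd = c(120, 85, 240)),
          file.path(data_folder, "my_data.csv"), row.names = FALSE)

# Set the working directory, then read the file, in the same code chunk
old_wd <- setwd(data_folder)        # setwd() returns the previous directory
my_data <- read.csv("my_data.csv")
setwd(old_wd)                       # restore the previous working directory

nrow(my_data)
```

An alternative that avoids changing the working directory is to give read.csv() the full path, as in read.csv(file.path(data_folder, "my_data.csv")).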
One final thing to notice. When you open an R Markdown file, in the script window the lower right hand corner tells you the mode is R Markdown, and not R Script. You can toggle between the different modes, but usually you would stay in R Markdown mode when working on an R Markdown script.
13.5 Summary
It may be useful to close with some key points, concepts, commands, and options:
- Knit: weaves together results of code and text to create a report, for an appropriate script written in the R Markdown syntax.
- In the text chunk, backticks can be used to embed R commands so the results appear in the text.
- An R Markdown cheat sheet is available from RStudio and is very helpful. Search for it in Google or use this address: https://rmarkdown.rstudio.com/lesson-15.html
- Options in each code chunk can be used to control appearance. The echo=FALSE option is the most important, so that your report does not also include the underlying code.
- Other good options to include in the code chunk first line are message=FALSE, comment=NA, and warning=FALSE.