Preface
In the 21st century, nobody does statistics with pen and paper. The statistical tables found at the back of statistics textbooks belong in a museum. The statistics you hear in the news and in research are produced by computers. This book uses R. You may ask: why not use a spreadsheet, such as Microsoft Excel or Google Sheets? While spreadsheets are excellent, easy-to-use tools, they have significant shortcomings. In particular, they lack reproducibility: they operate through pointing and clicking, dragging and dropping, and so on, which leaves no record of the steps taken.
This book aims to give readers the bare minimum of R needed to start analyzing data. In particular, it is written for readers who have little experience in computer programming or scripting. I want them to focus on deploying R’s rich library of statistical tools to get the desired results, rather than being bogged down by the technical details and cryptic syntax of the R language. Although most people would say a book like this is about R programming, I prefer to call it scripting. The nuance is that the goal of programming is to implement algorithms, that is, step-by-step procedures for tackling a problem, much like a cooking recipe. Here we focus on the what (the expected results) instead of the how (the algorithms): we simply tell the R functions, "here are the data and the parameters; please process them and return the results."
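As a small taste of what "scripting" means here, the sketch below hands some data and a parameter to built-in R functions and collects the results. The scores vector and the variable names are made up purely for illustration; only `mean()` and `median()` come from R itself.

```r
# Hypothetical data: exam scores for six students (invented for illustration)
scores <- c(72, 85, 90, 66, 78, 95)

# The "what": ask R for the results and let its functions handle the "how"
avg     <- mean(scores)              # arithmetic mean
mid     <- median(scores)            # median
trimmed <- mean(scores, trim = 0.2)  # same function, with an extra parameter:
                                     # drop the lowest and highest 20% first

print(c(mean = avg, median = mid, trimmed_mean = trimmed))
```

Notice that the script never spells out how to add up the scores or how to sort them to find the middle value; it only states the data, the parameters, and the results wanted.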
Usage
The book consists of two parts. Part I focuses on the basics. Chapter 1 discusses R objects and the operations that act on them, descriptive statistical functions, and data wrangling using base R and tidyverse. Chapter 2 covers data visualization, using both the base R and ggplot2 graphical systems. A picture is worth a thousand words, but only if it is the right picture: the chapter therefore discusses the purpose, pros, and cons of each type of graph, and offers guidelines for choosing the most suitable graph according to the properties of the data. Part II presents use cases that draw on the topics discussed in Part I.
Scripting is best learned through hands-on practice; it is like learning to swim. You don’t just listen to lectures. You’ve got to jump into the water and get wet! To try out the examples and exercises, you need access to RStudio. There are two options: sign in to the Posit Cloud service (formerly RStudio Cloud) here: (https://posit.cloud/plans?utm_source=Website&utm_medium=IDE_Download), or install R and RStudio on your own computer. Let me use a car analogy to illustrate the relationship between R and RStudio: R is the engine and RStudio is the dashboard. RStudio is an Integrated Development Environment (IDE) that gives you a user-friendly way to unleash the power of R. Both R and RStudio run on ordinary Windows and macOS computers. Installation is straightforward; it takes only a few double-clicks. Here are the links to download R (https://cran.r-project.org/) and RStudio (https://posit.co/download/rstudio-desktop/).
Lastly, this is a live book that I will continue to update with topics to help readers like you. I invite you to check back periodically to discover new material.