B Developing R packages

This appendix collects various notes and resources on developing R packages.

B.1 Basics

Having created various R projects, an R user may be tempted to become an R developer by wrapping up an R project into an R package. The main idea behind R packages is to make collections of R code available to other users.

As with all things R, the process of writing R packages is mostly learned by doing and will always remain a continuous journey, rather than a destination. Embarking on this journey requires acquiring some new terminology and skills, but is supported by helpful instructions and tools.

B.1.1 What are R packages?

Using R implies relying on vast amounts of R code that is structured and provided in the form of R packages. Whenever we run R, a set of core R packages (including base, datasets, graphics, stats, and utils) is loaded and provides a common collection of data objects and functions. When we use existing functions to create new data objects or functions, developing a new R package provides a great way of bundling and sharing this code with other R users.
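To see which core packages are available in a running session, we can inspect the search path. The following is a minimal sketch that uses only base R functions. The exact set of packages depends on the local R configuration, so the printed result may differ between systems.

```r
# Packages and environments attached in the current session (the search path)
search()

# Names of the attached base packages only
base_pkgs <- sessionInfo()$basePkgs
print(base_pkgs)

# Check whether the core packages named in the text are among them
core <- c("base", "datasets", "graphics", "stats", "utils")
core %in% base_pkgs
```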

B.1.2 Contents and structure

R packages contain collections of R code (data and functions) and auxiliary files (e.g., documentation and images) in one directory. Standard files include

  • DESCRIPTION to define metadata (e.g., title, version, authors, and dependencies)
  • NAMESPACE to list imported and exported objects

and the following subdirectories:

  • R/ for storing R scripts
  • data/ for storing data files
  • man/ for storing documentation files and images
  • vignettes/ for storing articles on package use
  • tests/ for storing code verification scripts
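The layout described above can be sketched with base R alone. The example below is only an illustration: the package name mypkg, the function hello(), and all field values are hypothetical, and the result is a bare skeleton in a temporary directory rather than a complete package that would pass all checks.

```r
# Hypothetical package name, created in a temporary directory for illustration
pkg_dir <- file.path(tempdir(), "mypkg")

# Create the package directory and its standard subdirectories
for (sub in c("R", "data", "man", "vignettes", "tests")) {
  dir.create(file.path(pkg_dir, sub), recursive = TRUE, showWarnings = FALSE)
}

# A minimal DESCRIPTION file with basic metadata (placeholder values)
writeLines(c(
  "Package: mypkg",
  "Title: What the Package Does",
  "Version: 0.0.0.9000",
  "Description: A longer description of what the package does.",
  "License: MIT + file LICENSE"
), file.path(pkg_dir, "DESCRIPTION"))

# A NAMESPACE file that exports one (hypothetical) function
writeLines("export(hello)", file.path(pkg_dir, "NAMESPACE"))

# An R script defining that function
writeLines(
  'hello <- function(name = "world") paste0("Hello, ", name, "!")',
  file.path(pkg_dir, "R", "hello.R")
)

# Inspect the resulting top-level files and directories
list.files(pkg_dir)
```

In practice, tools such as usethis typically create this structure for us, but building it by hand once shows that a package is essentially a directory with a few conventions.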

B.1.3 Rules and tools

R packages are governed by many rules and conventions (see CRAN’s guidelines on Writing R Extensions). For instance, every R object that a package exports (including all arguments of all exported functions) must be documented and the package must work without errors on various R systems.

Fortunately, other R developers have created many useful tools, mostly in the form of other R packages (e.g., devtools, roxygen2, and usethis), to facilitate R package development (see Resources below).
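As an illustration of how these tools support the documentation rule, the sketch below documents a function with roxygen2-style comments. The function add_numbers() is a hypothetical example. The commented calls at the end name functions from usethis and devtools that are commonly used in this workflow. They are left as comments because they require those packages to be installed and must be run from within a package directory.

```r
#' Add two numbers
#'
#' @param x A numeric vector.
#' @param y A numeric vector.
#' @return The element-wise sum of `x` and `y`.
#' @export
add_numbers <- function(x, y) {
  x + y
}

# The function itself is ordinary R code and works as usual
add_numbers(1, 2)

# Typical workflow steps (not run here; the path is a placeholder):
# usethis::create_package("path/to/mypkg")  # set up a new package skeleton
# devtools::document()  # turn roxygen comments into man/*.Rd and NAMESPACE
# devtools::check()     # run the standard package checks
```

Because the roxygen comments sit directly above the function, the documentation of each argument is more likely to stay in sync with the code.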

B.1.4 Private vs. published R packages

R packages come in many forms and dwell in many different locations.

Package habitats

R packages can be entirely private: Many users maintain collections of useful functions in the form of an R package that is only stored on their personal system. On the other end of the spectrum, the Comprehensive R Archive Network (CRAN) currently hosts over 22,000 published R packages on a network of servers around the world. The collection of packages hosted on CRAN (and the related Bioconductor site) comprises the official canon of the currently available R packages.

An intermediate option is to create an R package that is hosted on another server, so that it can be downloaded and installed by those R users who have access to this server. Popular web platforms for hosting such semi-public packages are R-Forge and GitHub.com.
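From a user's perspective, these habitats mainly differ in how a package is installed. The sketch below contrasts the three cases. The package name somepkg and the repository someuser/somepkg are hypothetical placeholders, and the calls are wrapped in if (FALSE) so that nothing is installed when the code is run as written. The call to remotes::install_github() assumes that the remotes package is installed.

```r
# Hypothetical names: replace with a real package and repository
pkg_name  <- "somepkg"
repo_spec <- "someuser/somepkg"
rforge    <- "https://R-Forge.R-project.org"

if (FALSE) {  # guard: remove after replacing the placeholder names above
  # Published package: install from CRAN
  install.packages(pkg_name)

  # Semi-public package hosted on R-Forge
  install.packages(pkg_name, repos = rforge)

  # Semi-public package hosted on GitHub (requires the remotes package)
  remotes::install_github(repo_spec)
}
```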

CRAN

Publishing a package on CRAN involves considerable effort, regular monitoring, and occasional updates by a person identifying as the package’s maintainer. For instance, the package must run without errors or warnings on multiple R versions and software platforms (including macOS, Microsoft Windows, and various Unix systems). The main benefit of hosting a package on CRAN is that the package is easily made available to R users worldwide.

B.1.5 Authors and other roles

The author(s) of an R package are usually denoted as aut in the package DESCRIPTION, and referred to as its designer(s) or developer(s). Other common roles include the package creator/maintainer cre, contributor ctb, thesis advisor ths, or translator trl (see Hornik et al., 2012, for details).
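In the DESCRIPTION file, such roles are commonly declared in the Authors@R field, whose value is R code built with the person() function from the utils package. The following sketch uses hypothetical names and a placeholder e-mail address.

```r
# Two hypothetical people: an author who is also the maintainer,
# and a contributor
authors <- c(
  person("Jane", "Doe",
         email = "jane.doe@example.org",
         role = c("aut", "cre")),
  person("John", "Smith", role = "ctb")
)

# Print the full entries
print(authors)

# Show names and roles only
format(authors, include = c("given", "family", "role"))
```

The same c(person(...), ...) expression can be pasted as the value of Authors@R in DESCRIPTION.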

B.2 Resources

This section collects links to various resources for R package development.

B.2.1 Instructions

Many manuals and books provide comprehensive instructions on developing R packages:

Figure B.1: Cheatsheet on R package development from Posit cheatsheets.

B.2.2 Tools

Developing R packages typically relies on a vast ecosystem of sophisticated tools. These include not only R packages, but also general-purpose software tools (like IDEs and version control systems). Most of them are under active development and freely available online.

R packages

Tools for package development include the following R packages:

Version control

Version control is an important but somewhat daunting topic when developing larger projects. Here are some links to explain this concept, explain why it is useful, and get you started:

B.2.3 Miscellaneous resources

Other useful resources on R package development include:

  • All R packages available on CRAN

  • R-Forge is another platform for developing R packages and R-related software projects

  • Posit is an open source data science company that provides and maintains many popular data science tools:
    • the Positron code editor
    • the Quarto publishing system
    • the RStudio IDE
    • the Shiny web app system
    • cheatsheets that provide visual summaries of key R tasks and packages