4  Case study: ER injuries

Code
```{r}
#| label: setup

knitr::opts_chunk$set(
  results = "hold"
)
library(glossary)
glossary::glossary_path("../glossary-pb/glossary.yml")


library(shiny)
library(vroom)
library(tidyverse)
```

4.1 Introduction

This time I have loaded the packages {shiny}, {vroom} (for fast file reading) and {tidyverse} (for general data analyzing) only once in the initializing setup code chunk.

Chapter section list

4.2 The data

We’re going to explore data from the National Electronic Injury Surveillance System (NEISS), collected by the Consumer Product Safety Commission. This is a long-term study that records all accidents seen in a representative sample of hospitals in the United States. It’s an interesting dataset to explore because every one is already familiar with the domain, and each observation is accompanied by a short narrative that explains how the accident occurred. You can find out more about this dataset at https://github.com/hadley/neiss.

In this chapter, I’m going to focus on just the data from 2017. This keeps the data small enough (~10 MB) that it’s easy to store in git (along with the rest of the book), which means we don’t need to think about sophisticated strategies for importing the data quickly (we’ll come back to those later in the book). You can see the code I used to create the extract for this chapter at https://github.com/hadley/mastering-shiny/blob/main/neiss/data.R.

R Code 4.1 : Download the NEISS dataset

Run this code chunk manually if the file(s) still needs to be downloaded.
Code
dir.create("neiss")

download <- function(name) {
  url <- "https://raw.github.com/hadley/mastering-shiny/main/neiss/"
  download.file(paste0(url, name), paste0("neiss/", name), quiet = TRUE)
}
download("injuries.tsv.gz")
download("population.tsv")
download("products.tsv")

The main dataset we’ll use is injuries, which contains around 250,000 observations:

R Code 4.2 : Show first rows of the NEISS dataset

Code
injuries <- vroom::vroom("neiss/injuries.tsv.gz")
#> Rows: 255064 Columns: 10
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: "\t"
#> chr  (6): sex, race, body_part, diag, location, narrative
#> dbl  (3): age, prod_code, weight
#> date (1): trmt_date
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Code
injuries
#> # A tibble: 255,064 × 10
#>    trmt_date    age sex    race       body_part  diag  location prod_code weight
#>    <date>     <dbl> <chr>  <chr>      <chr>      <chr> <chr>        <dbl>  <dbl>
#>  1 2017-01-01    71 male   white      Upper Tru… Cont… Other P…      1807   77.7
#>  2 2017-01-01    16 male   white      Lower Arm  Burn… Home           676   77.7
#>  3 2017-01-01    58 male   white      Upper Tru… Cont… Home           649   77.7
#>  4 2017-01-01    21 male   white      Lower Tru… Stra… Home          4076   77.7
#>  5 2017-01-01    54 male   white      Head       Inte… Other P…      1807   77.7
#>  6 2017-01-01    21 male   white      Hand       Frac… Home          1884   77.7
#>  7 2017-01-01    35 female not stated Lower Tru… Stra… Home          1807   87.1
#>  8 2017-01-01    62 female not stated Lower Arm  Lace… Home          4074   87.1
#>  9 2017-01-01    22 male   not stated Knee       Disl… Home          4076   87.1
#> 10 2017-01-01    58 female not stated Lower Leg  Frac… Home          1842   87.1
#> # ℹ 255,054 more rows
#> # ℹ 1 more variable: narrative <chr>

Each row represents a single accident with 10 variables:

  • trmt_date is date the person was seen in the hospital (not when the accident occurred).
  • age, sex, and race give demographic information about the person who experienced the accident.
  • body_part is the location of the injury on the body (like ankle or ear); location is the place where the accident occurred (like home or school).
  • diag gives the basic diagnosis of the injury (like fracture or laceration).
  • prod_code is the primary product associated with the injury. weight is statistical weight giving the estimated number of people who would suffer this injury if this dataset was scaled to the entire population of the US.
  • narrative is a brief story about how the accident occurred.

We’ll pair it with two other data frames for additional context: products lets us look up the product name from the product code, and population tells us the total US population in 2017 for each combination of age and sex.

Code Collection 4.1 : Load and show additional datasets

R Code 4.3 : Load and show the products dataset

Code
products <- vroom::vroom("neiss/products.tsv")
products
#> # A tibble: 38 × 2
#>    prod_code title                                   
#>        <dbl> <chr>                                   
#>  1       464 knives, not elsewhere classified        
#>  2       474 tableware and accessories               
#>  3       604 desks, chests, bureaus or buffets       
#>  4       611 bathtubs or showers                     
#>  5       649 toilets                                 
#>  6       676 rugs or carpets, not specified          
#>  7       679 sofas, couches, davenports, divans or st
#>  8      1141 containers, not specified               
#>  9      1200 sports or recreational activity, n.e.c. 
#> 10      1205 basketball (activity, apparel or equip.)
#> # ℹ 28 more rows

R Code 4.4 : Load ans show the population dataset

Code
population <- vroom::vroom("neiss/population.tsv")
population
#> # A tibble: 170 × 3
#>      age sex    population
#>    <dbl> <chr>       <dbl>
#>  1     0 female    1924145
#>  2     0 male      2015150
#>  3     1 female    1943534
#>  4     1 male      2031718
#>  5     2 female    1965150
#>  6     2 male      2056625
#>  7     3 female    1956281
#>  8     3 male      2050474
#>  9     4 female    1953782
#> 10     4 male      2042001
#> # ℹ 160 more rows

4.3 Exploration

Code Collection 4.2 : Explore the dataset(s)

R Code 4.5 : Pull out the injuries associated with toilets

Code
selected <- injuries |> filter(prod_code == 649)
nrow(selected)
#> [1] 2993

From 255064 accidents in the year 2017 2993 (1.17%) involved a toilet.

R Code 4.6 : Location where the accident with the toilet happened

Code
selected |> count(location, wt = weight, sort = TRUE)
#> # A tibble: 6 × 2
#>   location                         n
#>   <chr>                        <dbl>
#> 1 Home                       99603. 
#> 2 Other Public Property      18663. 
#> 3 Unknown                    16267. 
#> 4 School                       659. 
#> 5 Street Or Highway             16.2
#> 6 Sports Or Recreation Place    14.8

As you might expect, injuries involving toilets most often occur at home.

R Code 4.7 : Body part affected of the injury connected with a toilet

Code
selected |> count(body_part, wt = weight, sort = TRUE)
#> # A tibble: 24 × 2
#>    body_part        n
#>    <chr>        <dbl>
#>  1 Head        31370.
#>  2 Lower Trunk 26855.
#>  3 Face        13016.
#>  4 Upper Trunk 12508.
#>  5 Knee         6968.
#>  6 N.S./Unk     6741.
#>  7 Lower Leg    5087.
#>  8 Shoulder     3590.
#>  9 All Of Body  3438.
#> 10 Ankle        3315.
#> # ℹ 14 more rows

The most common body parts involved possibly suggest that these are falls (since the head and face are not usually involved in routine toilet usage).

R Code 4.8 : Diagnosis resulted from the accident that involved a toilet

Code
selected |> count(diag, wt = weight, sort = TRUE)
#> # A tibble: 20 × 2
#>    diag                        n
#>    <chr>                   <dbl>
#>  1 Other Or Not Stated   32897. 
#>  2 Contusion Or Abrasion 22493. 
#>  3 Inter Organ Injury    21525. 
#>  4 Fracture              21497. 
#>  5 Laceration            18734. 
#>  6 Strain, Sprain         7609. 
#>  7 Dislocation            2713. 
#>  8 Hematoma               2386. 
#>  9 Avulsion               1778. 
#> 10 Nerve Damage           1091. 
#> 11 Poisoning               928. 
#> 12 Concussion              822. 
#> 13 Dental Injury           199. 
#> 14 Hemorrhage              167. 
#> 15 Crushing                114. 
#> 16 Dermat Or Conj           84.2
#> 17 Burns, Not Spec          67.2
#> 18 Puncture                 67.2
#> 19 Burns, Thermal           34.0
#> 20 Burns, Scald             17.0

The diagnoses seem rather varied. There is no pattern recognizable.

We can also explore the pattern across age and sex. We have enough data here that a table is not that useful, and so I make a plot, Screenshot 4.1, that makes the patterns more obvious.

Code Collection 4.3 : Pattern across age and sex

R Code 4.9 : Calculate and show the number of accidents broken down by age and sex

Code
summary <- selected %>% 
  count(age, sex, wt = weight)
summary

summary %>% 
  ggplot(aes(age, n, colour = sex)) + 
  geom_line() + 
  labs(y = "Estimated number of injuries")
#> # A tibble: 208 × 3
#>      age sex          n
#>    <dbl> <chr>    <dbl>
#>  1     0 female    4.76
#>  2     0 male     14.3 
#>  3     1 female  253.  
#>  4     1 male    231.  
#>  5     2 female  438.  
#>  6     2 male    632.  
#>  7     3 female  381.  
#>  8     3 male   1004.  
#>  9     4 female  261.  
#> 10     4 male    843.  
#> # ℹ 198 more rows
Screenshot 4.1: Estimated number of injuries caused by toilets, broken down by age and sex

We see a spike for young boys peaking at age 3, and then an increase (particularly for women) starting around middle age, and a gradual decline after age 80. I suspect the peak is because boys usually use the toilet standing up, and the increase for women is due to osteoporosis (i.e. I suspect women and men have injuries at the same rate, but more women end up in the ER because they are at higher risk of fractures).

One problem with interpreting this pattern is that we know that there are fewer older people than younger people, so the population available to be injured is smaller. We can control for this by comparing the number of people injured with the total population and calculating an injury rate. See the next tab, where a rate per 10,000 is used.

R Code 4.10 : Calculate and show the number of accidents per 10,000 people broken down by age and sex

Code
summary <- selected %>% 
  count(age, sex, wt = weight) %>% 
  left_join(population, by = c("age", "sex")) %>% 
  mutate(rate = n / population * 1e4)

summary

summary %>% 
  ggplot(aes(age, rate, colour = sex)) + 
  geom_line(na.rm = TRUE) + 
  labs(y = "Injuries per 10,000 people")
#> # A tibble: 208 × 5
#>      age sex          n population   rate
#>    <dbl> <chr>    <dbl>      <dbl>  <dbl>
#>  1     0 female    4.76    1924145 0.0247
#>  2     0 male     14.3     2015150 0.0708
#>  3     1 female  253.      1943534 1.30  
#>  4     1 male    231.      2031718 1.14  
#>  5     2 female  438.      1965150 2.23  
#>  6     2 male    632.      2056625 3.07  
#>  7     3 female  381.      1956281 1.95  
#>  8     3 male   1004.      2050474 4.90  
#>  9     4 female  261.      1953782 1.33  
#> 10     4 male    843.      2042001 4.13  
#> # ℹ 198 more rows
Screenshot 4.2: Estimated rate of injuries per 10,000 people, broken down by age and sex

(Note that the rates only go up to age 80 because I couldn’t find population data for ages over 80.)

Plotting the rate yields a strikingly different trend after age 50: the difference between men and women is much smaller, and we no longer see a decrease. This is because women tend to live longer than men, so at older ages there are simply more women alive to be injured by toilets.

Finally, we can look at some of the narratives. Browsing through these is an informal way to check our hypotheses, and generate new ideas for further exploration. Here I pull out a random sample of 20:

R Code 4.11 : Pull out 20 random examples of the narrative

Code
selected %>% 
  sample_n(20) %>% 
  pull(narrative)
#>  [1] "81YOM H'TMA F'HD- FELL TOILET TO FLOOR AT NH"                                                                                     
#>  [2] "58YOF WHEELCHAIR BOUND WAS TRANSFERING FROM WHEELCHAIR TO TOILET WHEN SHE FELL DX LT PATELLA FX"                                  
#>  [3] "83YOM PASSED OUT WHILE SITTING ON THE TOILET AT THE NURSING HOME AND FELL ONTO FACE FRACTURED NOSE"                               
#>  [4] "96YOF WITH FLANK PAIN AFTER FALLING FROM TOILET DX PAIN*"                                                                         
#>  [5] "49 YOM SLIPPED AND FELL IN BATHROOM & BROKE TOILET.DX:  SPRAIN R KNEE, R SHOULDER."                                               
#>  [6] "64 YOF GROUND LEVEL FALL, STANDING UP FROM TOILET & HIT HEAD ON FLOOR C/O HEADACHE DX HEADACHE UNSPECIFIED"                       
#>  [7] "85YOF GOT UP TO USE COMMODE AND \"TRIPPED ON MY FEET\" TAILBONE/HEAD PAIN DX-ACCIDENTAL FALL, ACUTE HEAD INJ, COCCYGEAL CONTUSION"
#>  [8] "52YOM HIP PAIN- FELT POP WHILE USING TOILET"                                                                                      
#>  [9] "93YOF FX SHLDR- STOOD FROM TOILET & FELL ON FLOOR"                                                                                
#> [10] "79YOM FELL TRANSFERING FROM TOILET TO WHEELCHAIR HIT HEAD DX BLUNT HEADTRAUMA"                                                    
#> [11] "62 YOM TRANSFERRING FROM POTTY CHAIR TO WHEELCHAIR AND FELL. C/O KNEE PAIN, DX KNEE PAIN"                                         
#> [12] "75 Y ROLD MALE TRANSFERRING FROM WHEELCHAIR TO TOILET AND FELL AND FX C7"                                                         
#> [13] "45YOM CONT NOSE, FELL OFF TOILET"                                                                                                 
#> [14] "83YOF H'TMA MID-BACK- FELL TOILET AT NH"                                                                                          
#> [15] "19-YOF DRINKING HEAVILY, PASSED OUT, HIT HEAD ON TOILET.  DX:  HANGOVER, CLOSED HEAD INJURY."                                     
#> [16] "79YOM FELL BETWEEN THE TOILET AND THE WALL AND FRACTURED NECK"                                                                    
#> [17] "85 YF TRIPPED AND FELL STRIKING HEAD ON TOILET. DX SDH"                                                                           
#> [18] "61YOM HAD A SYNCOPAL EPISODE GETTING OFF TOILET, FELL FORWARD STRIKINGFACE, . DX - BLUNT HEAD TRAUMA"                             
#> [19] "72YOF WAS INTOXICATED BAC OF 219 AND FELL OFF A TOILET AND SUSTAINED ATHUMB LACERATION"                                           
#> [20] "88YF WAS HAVING COUGING FIT, FELL OFF TOILET HITTING HEAD INTO THE SHOWER,-LOC>>CHI/LAC"

Having done this exploration for one product, it would be very nice if we could easily do it for other products, without having to retype the code. So let’s make a Shiny app!

4.4 Prototype

Note

When building a complex app, I strongly recommend starting as simple as possible, so that you can confirm the basic mechanics work before you start doing something more complicated.

Here I’ll start with one input (the product code), three tables, and one plot. When designing a first prototype, the challenge is in making it “as simple as possible”. There’s a tension between getting the basics working quickly and planning for the future of the app. Either extreme can be bad:

  • if you design too narrowly, you’ll spend a lot of time later on reworking your app;
  • if you design too rigorously, you’ll spend a bunch of time writing code that later ends up on the cutting floor.

To help get the balance right, I often do a few pencil-and-paper sketches to rapidly explore the UI and reactive graph before committing to code.

4.5 Polish tables

Now that we have the basic components in place and working, we can progressively improve our app. The first problem with this app is that it shows a lot of information in the tables, where we probably just want the highlights. To fix this we need to first figure out how to truncate the tables. I’ve chosen to do that with a combination of {forcats} functions: I convert the variable to a factor, order by the frequency of the levels, and then lump together all levels after the top 5.

R Code 4.13 : Polish injuries table

Code
(
  injuries |> 
    dplyr::mutate(
      diag = forcats::fct_lump(
        forcats::fct_infreq(diag), 
        n = 5))  |> 
    dplyr::group_by(diag)  |> 
    dplyr::summarise(n = base::as.integer(base::sum(weight)))
)
#> # A tibble: 6 × 2
#>   diag                        n
#>   <fct>                   <int>
#> 1 Other Or Not Stated   1806436
#> 2 Fracture              1558961
#> 3 Laceration            1432407
#> 4 Strain, Sprain        1432556
#> 5 Contusion Or Abrasion 1451987
#> 6 Other                 1929147

I wrote a little function to automate this for any variable. The details aren’t really important here, but we’ll come back to them in (XXX_12?).

R Code 4.14 : Function for polish table

Code
count_top <- function(df, var, n = 5) {
  df |> 
    dplyr::mutate(
      {{ var }} := forcats::fct_lump(
        forcats::fct_infreq({{ var }}), 
        n = n)
      )  |> 
    dplyr::group_by({{ var }})  |> 
    dplyr::summarise(n = base::as.integer(base::sum(weight)))
}

I made one other change to improve the aesthetics of the app: I forced all tables to take up the maximum width (i.e. fill the column that they appear in). This makes the output more aesthetically pleasing because it reduces the amount of incidental variation.

R Code 4.15 : Polished ER shiny app

4.6 Rate vs count

So far, we’re displaying only a single plot, but we’d like to give the user the choice between visualizing the number of injuries or the population-standardized rate. First I add a control to the UI. Here I’ve chosen to use a shiny::selectInput() because it makes both states explicit, and it would be easy to add new states in the future:

R Code 4.16 : ER shiny app with rate and count

#| '!! shinylive warning !!': |
#|   shinylive does not work in self-contained HTML documents.
#|   Please set `embed-resources: false` in your metadata.
#| standalone: true
#| viewerHeight: 720
#| components: [editor, viewer]


library(shiny)
library(tidyverse)
library(munsell)

##>> get data and functions
basic_url = "https://raw.githubusercontent.com/petzi53/learning-shiny/refs/heads/master/"
base::source(base::paste0(basic_url, "R/neiss.R"))
##>>


library(shiny)
library(tidyverse)
library(munsell)

ui <- fluidPage(
  fluidRow(
    column(8,
      selectInput("code", "Product",
        choices = setNames(products$prod_code, products$title),
        width = "100%"
      )
    ),
    column(2, selectInput("y", "Y axis", c("rate", "count")))
  ),
  fluidRow(
    column(4, tableOutput("diag")), 
    column(4, tableOutput("location")), 
    column(4, tableOutput("body_part"))
  ),
  fluidRow(
    column(12, plotOutput("age_sex"))
  )
)

server <- function(input, output, session) {
  selected <- reactive(injuries %>% filter(prod_code == input$code))
  
  ##<< tables
  output$diag <- renderTable(count_top(selected(), diag), width = "100%")
  output$body_part <- renderTable(count_top(selected(), body_part), width = "100%")
  output$location <- renderTable(count_top(selected(), location), width = "100%")
  ##>>
  
  summary <- reactive({
    selected() %>%
      count(age, sex, wt = weight) %>%
      left_join(population, by = c("age", "sex")) %>%
      mutate(rate = n / population * 1e4)
  })

    output$age_sex <- renderPlot({
    if (input$y == "count") {
      summary() %>%
        ggplot(aes(age, n, colour = sex)) +
        geom_line() +
        labs(y = "Estimated number of injuries")
    } else {
      summary() %>%
        ggplot(aes(age, rate, colour = sex)) +
        geom_line(na.rm = TRUE) +
        labs(y = "Injuries per 10,000 people")
    }
  }, res = 96)
}


shinyApp(ui, server)

4.7 Narrative

Finally, I want to provide some way to access the narratives because they are so interesting, and they give an informal way to cross-check the hypotheses you come up with when looking at the plots. In the R code, I sample multiple narratives at once, but there’s no reason to do that in an app where you can explore interactively.

There are two parts to the solution.

  • First we add a new row to the bottom of the UI. I use an action button to trigger a new story, and put the narrative in a shiny::textOutput():
  • I then use shiny::eventReactive() to create a reactive that only updates when the button is clicked or the underlying data changes.

Code Collection 4.4 : Code snippets for narratives of ER injuries

R Code 4.17 : Code Snippet: Create button in the UI

Code
  fluidRow(
    column(2, actionButton("story", "Tell me a story")),
    column(10, textOutput("narrative"))
  )

R Code 4.18 : Code snippet: Update whenever button pressed or data changed

Code
  narrative_sample <- eventReactive(
    list(input$story, selected()),
    selected() %>% pull(narrative) %>% sample(1)
  )
  output$narrative <- renderText(narrative_sample())

R Code 4.19 : ER injuries shiny app finished version

#| '!! shinylive warning !!': |
#|   shinylive does not work in self-contained HTML documents.
#|   Please set `embed-resources: false` in your metadata.
#| standalone: true
#| viewerHeight: 780
#| components: [editor, viewer]


library(shiny)
library(tidyverse)
library(munsell)

##>> get data and functions
basic_url = "https://raw.githubusercontent.com/petzi53/learning-shiny/refs/heads/master/"
base::source(base::paste0(basic_url, "R/neiss.R"))
##>>


library(shiny)
library(tidyverse)
library(munsell)

ui <- fluidPage(
  fluidRow(
    column(8,
      selectInput("code", "Product",
        choices = setNames(products$prod_code, products$title),
        width = "100%"
      )
    ),
    column(2, selectInput("y", "Y axis", c("rate", "count")))
  ),
  fluidRow(
    column(4, tableOutput("diag")), 
    column(4, tableOutput("location")), 
    column(4, tableOutput("body_part"))
  ),
  fluidRow(
    column(12, plotOutput("age_sex"))
  ),
  fluidRow(
    column(2, actionButton("story", "Tell me a story")),
    column(10, textOutput("narrative"))
  )
)

server <- function(input, output, session) {
  selected <- reactive(injuries %>% filter(prod_code == input$code))
  
    ##<< tables
    output$diag <- renderTable(count_top(selected(), diag), width = "100%")
    output$body_part <- renderTable(count_top(selected(), body_part), width = "100%")
    output$location <- renderTable(count_top(selected(), location), width = "100%")
    ##>>
  
    summary <- reactive({
      selected() %>%
        count(age, sex, wt = weight) %>%
        left_join(population, by = c("age", "sex")) %>%
        mutate(rate = n / population * 1e4)
    })

    output$age_sex <- renderPlot({
    if (input$y == "count") {
      summary() %>%
        ggplot(aes(age, n, colour = sex)) +
        geom_line() +
        labs(y = "Estimated number of injuries")
    } else {
      summary() %>%
        ggplot(aes(age, rate, colour = sex)) +
        geom_line(na.rm = TRUE) +
        labs(y = "Injuries per 10,000 people")
    }
  }, res = 96)
    
    narrative_sample <- eventReactive(
      list(input$story, selected()),
      selected() %>% pull(narrative) %>% sample(1)
  )
    output$narrative <- renderText(narrative_sample())
}


shinyApp(ui, server)

4.8 Exercises