Each cell represents a count of individuals for a given combination of plot
and Sex
.
The variable sex
is implicit. It’s values are spread along two columns F
and M
.
plot | F | M |
---|---|---|
1 | 11 | 12 |
2 | 21 | 22 |
3 | 31 | 32 |
long <- wide %>% gather(key = "sex", value = "n", F:M) long ## # A tibble: 6 x 3 ## plot sex n ## <int> <chr> <int> ## 1 1 F 11 ## 2 2 F 21 ## 3 3 F 31 ## 4 1 M 12 ## 5 2 M 22 ## 6 3 M 32
Take long
, then group by sex
, then sum the n
of individuals.
long %>% group_by(sex) %>% summarize(n = sum(n)) ## # A tibble: 2 x 2 ## sex n ## <chr> <int> ## 1 F 63 ## 2 M 66
Automatically with googlesheets: File > Version history
Manually: Write cleaning steps in plain text (e.g. README.txt)
(Different kinds of data may be in different, related tables.)
Time-series for a small mammal community in southern Arizona to study the effects of rodents and ants on the plant community
Rodents sampled on 24 plots, with different treatments.
Real dataset (used in over 100 publications).
Slightly simplified for the workshop.
Demo google forms
See lesson.
Broman KW, Woo KH. (2018) Data organization in spreadsheets. PeerJ Preprints 6:e3183v2 https://doi.org/10.7287/peerj.preprints.3183v2
Dates stored as integer: Number of days from 1899-12-31.
WARNING: Can’t parse dates before 1899-12-31
Date:
YYYY-MM-DD
YEAR
, MONTH
, DAY
YYYYMMDD
.Generalizing to date-time: YYYYMMDDhhmmss
Example in R:
library(lubridate) as_datetime("20180923145108") ## [1] "2018-09-23 14:51:08 UTC"
Format dates for any program to interpret them correctly.
From be_pragmatic()
Cell range and Criteria may come from different sheets.
You can control what type of data gets entered.
You can explore invalid data with sorting.
You can flag invalid data with conditional formatting.
In googlesheets:
File > Download as > Comma-separated values (.csv, current sheet)
When whould you export Tab-separated values (.tsv)?
Export data to plain text formats (.csv, .tsv).