3  Attribute data operations

3.1 Introduction

Attribute data is non-spatial information associated with geographic (geometry) data. This chapter teaches how to manipulate geographic objects based on attributes such as the names of bus stops in a vector dataset and elevations of pixels in a raster dataset.

3.2 Vector attribute manipulation

Geographic vector datasets are well supported in R thanks to the {sf} class, which extends base R’s data.frame. Like data frames, sf objects have one column per attribute variable (such as ‘name’) and one row per observation or feature (e.g., per bus station). {sf} provides generics that allow sf objects to behave like regular data frames.

R Code 3.1 : Methods for sf objects

Code
base::library(sf) |> base::suppressPackageStartupMessages()
utils::methods(class = "sf")
#>  [1] [                            [[<-                        
#>  [3] [<-                          $<-                         
#>  [5] aggregate                    as.data.frame               
#>  [7] cbind                        coerce                      
#>  [9] dbDataType                   dbWriteTable                
#> [11] duplicated                   identify                    
#> [13] initialize                   merge                       
#> [15] plot                         points                      
#> [17] print                        rbind                       
#> [19] show                         slotsFromS3                 
#> [21] st_agr                       st_agr<-                    
#> [23] st_area                      st_as_s2                    
#> [25] st_as_sf                     st_as_sfc                   
#> [27] st_bbox                      st_boundary                 
#> [29] st_break_antimeridian        st_buffer                   
#> [31] st_cast                      st_centroid                 
#> [33] st_collection_extract        st_concave_hull             
#> [35] st_convex_hull               st_coordinates              
#> [37] st_crop                      st_crs                      
#> [39] st_crs<-                     st_difference               
#> [41] st_drop_geometry             st_exterior_ring            
#> [43] st_filter                    st_geometry                 
#> [45] st_geometry<-                st_inscribed_circle         
#> [47] st_interpolate_aw            st_intersection             
#> [49] st_intersects                st_is_full                  
#> [51] st_is_valid                  st_is                       
#> [53] st_join                      st_line_merge               
#> [55] st_m_range                   st_make_valid               
#> [57] st_minimum_rotated_rectangle st_nearest_points           
#> [59] st_node                      st_normalize                
#> [61] st_point_on_surface          st_polygonize               
#> [63] st_precision                 st_reverse                  
#> [65] st_sample                    st_segmentize               
#> [67] st_set_precision             st_shift_longitude          
#> [69] st_simplify                  st_snap                     
#> [71] st_sym_difference            st_transform                
#> [73] st_triangulate_constrained   st_triangulate              
#> [75] st_union                     st_voronoi                  
#> [77] st_wrap_dateline             st_write                    
#> [79] st_z_range                   st_zm                       
#> [81] text                         transform                   
#> see '?methods' for accessing help and source code
Code
base::detach("package:sf", unload = TRUE)

Many of the generic methods (those not starting with st_) like aggregate(), cbind(), merge(), rbind() and ] are for manipulating data frames.

  • rbind(), for example, binds rows of data frames together, one ‘on top’ of the other.
  • $<- creates new columns.

A key feature of sf objects is that they store spatial and non-spatial data in the same way, as columns in a data.frame.

sf objects can also extend the {tidyverse} classes for data frames, tbl_df and tbl. Thus {sf} enables the full power of R’s data analysis capabilities to be unleashed on geographic data, whether you use base R or tidyverse functions for data analysis.

Code Collection 3.1 : Basic properties of vector data objects

R Code 3.2 : Basic properties of a sf object with geometry column

Code
utils::data("world", package = "spData")
base::class(world)
#> [1] "sf"         "tbl_df"     "tbl"        "data.frame"
Code
base::dim(world)
#> [1] 177  11
Code
skimr::skim(world)
#> Warning: Couldn't find skimmers for class: sfc_MULTIPOLYGON, sfc; No
#> user-defined `sfl` provided. Falling back to `character`.
Data summary
Name world
Number of rows 177
Number of columns 11
_______________________
Column type frequency:
character 7
numeric 4
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
iso_a2 2 0.99 2 2 0 175 0
name_long 0 1.00 4 35 0 177 0
continent 0 1.00 4 23 0 8 0
region_un 0 1.00 4 23 0 7 0
subregion 0 1.00 9 25 0 22 0
type 0 1.00 7 17 0 5 0
geom 0 1.00 135 24760 0 177 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
area_km2 0 1.00 832558.33 2163425.48 2416.87 46185.25 185004.13 621860.35 1.701851e+07 ▇▁▁▁▁
pop 10 0.94 42815798.06 149413217.06 56295.00 3754725.00 10401062.00 30748039.00 1.364270e+09 ▇▁▁▁▁
lifeExp 10 0.94 70.85 8.21 50.62 64.96 72.87 76.78 8.359000e+01 ▂▃▅▇▅
gdpPercap 17 0.90 17105.99 18668.07 597.14 3752.37 10734.07 24232.74 1.208601e+05 ▇▂▁▁▁

R Code 3.3 : Properties of a sf object withput geometry column

Code
world_df = sf::st_drop_geometry(world)
base::class(world_df)
#> [1] "tbl_df"     "tbl"        "data.frame"
Code
base::dim(world_df)
#> [1] 177  10
Code
dplyr::glimpse(world_df)
#> Rows: 177
#> Columns: 10
#> $ iso_a2    <chr> "FJ", "TZ", "EH", "CA", "US", "KZ", "UZ", "PG", "ID", "AR", …
#> $ name_long <chr> "Fiji", "Tanzania", "Western Sahara", "Canada", "United Stat…
#> $ continent <chr> "Oceania", "Africa", "Africa", "North America", "North Ameri…
#> $ region_un <chr> "Oceania", "Africa", "Africa", "Americas", "Americas", "Asia…
#> $ subregion <chr> "Melanesia", "Eastern Africa", "Northern Africa", "Northern …
#> $ type      <chr> "Sovereign country", "Sovereign country", "Indeterminate", "…
#> $ area_km2  <dbl> 19289.97, 932745.79, 96270.60, 10036042.98, 9510743.74, 2729…
#> $ pop       <dbl> 885806, 52234869, NA, 35535348, 318622525, 17288285, 3075770…
#> $ lifeExp   <dbl> 69.96000, 64.16300, NA, 81.95305, 78.84146, 71.62000, 71.039…
#> $ gdpPercap <dbl> 8222.2538, 2402.0994, NA, 43079.1425, 51921.9846, 23587.3375…
Code
skimr::skim(world_df)
Data summary
Name world_df
Number of rows 177
Number of columns 10
_______________________
Column type frequency:
character 6
numeric 4
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
iso_a2 2 0.99 2 2 0 175 0
name_long 0 1.00 4 35 0 177 0
continent 0 1.00 4 23 0 8 0
region_un 0 1.00 4 23 0 7 0
subregion 0 1.00 9 25 0 22 0
type 0 1.00 7 17 0 5 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
area_km2 0 1.00 832558.33 2163425.48 2416.87 46185.25 185004.13 621860.35 1.701851e+07 ▇▁▁▁▁
pop 10 0.94 42815798.06 149413217.06 56295.00 3754725.00 10401062.00 30748039.00 1.364270e+09 ▇▁▁▁▁
lifeExp 10 0.94 70.85 8.21 50.62 64.96 72.87 76.78 8.359000e+01 ▂▃▅▇▅
gdpPercap 17 0.90 17105.99 18668.07 597.14 3752.37 10734.07 24232.74 1.208601e+05 ▇▂▁▁▁

world contains ten non-geographic columns (and one geometry list column) with 177 rows representing the world’s countries. The function sf::st_drop_geometry() keeps only the attributes data of an sf object, in other words removing its geometry and convert it to a “normal” data.frame.

You can see that skimr::skim() reproduces a warning just in the first case. dplyr::glimpse() would produce an error, but both listings work in the second case without problems.

Note 3.1: Different names of the geometry column

The geometry column of sf objects is typically called “geometry” or “geom”, but any name can be used. The following command, for example, creates a geometry column named g:

sf::st_sf(base::data.frame(n = world$name_long), g = world$geom)

This enables geometries imported from spatial databases to have a variety of names such as “wkb_geometry” and “the_geom”.

Dropping the geometry column before working with attribute data can be useful; data manipulation processes can run faster when they work only on the attribute data and geometry columns are not always needed. For most cases, however, it makes sense to keep the geometry column, explaining why the column is ‘sticky’ (it remains after most attribute operations unless specifically dropped). Non-spatial data operations on sf objects only change an object’s geometry when appropriate (e.g., by dissolving borders between adjacent polygons following aggregation).

Becoming skilled at geographic attribute data manipulation means becoming skilled at manipulating data frames.

Remark 3.1. : {sf} has tidyverse compatibility with some pitfalls

For many applications, the tidyverse package {dplyr} (see Package Profile Section A.1) offers an effective approach for working with data frames. Tidyverse compatibility is an advantage of {sf} over its predecessor {sp}, but there are some pitfalls to avoid.

I have already experience with {dplyr} data wrangling. If there is no difference using sf objects or "tbl_df resp. tbl objects, I may not include these passages into this notebook.

3.2.1 Vector attribute subsetting (empty)

skipped

3.2.2 Chaining commands with pipes (empty)

skipped

3.2.3 Vector attribute aggregation (empty)

skipped

3.2.4 Vector attribute joining (empty)

skipped

3.2.5 Creating attributes and removing spatial information (empty)

skipped

3.3 Manipulating raster objects

This section shows how raster objects work by creating them from scratch, building on Section Section 2.3.2. Because of their unique structure, subsetting and other operations on raster datasets work in a different way.

Code Collection 3.2 : Creating raster objects from scratch

R Code 3.4 : Creating raster object with numerical data

Code
elev = terra::rast(nrows = 6, ncols = 6,
            xmin = -1.5, xmax = 1.5, ymin = -1.5, ymax = 1.5,
            vals = 1:36)
elev
#> class       : SpatRaster 
#> dimensions  : 6, 6, 1  (nrow, ncol, nlyr)
#> resolution  : 0.5, 0.5  (x, y)
#> extent      : -1.5, 1.5, -1.5, 1.5  (xmin, xmax, ymin, ymax)
#> coord. ref. : lon/lat WGS 84 
#> source(s)   : memory
#> name        : lyr.1 
#> min value   :     1 
#> max value   :    36

The result is a raster object with 6 rows and 6 columns (specified by the nrow and ncol arguments), and a minimum and maximum spatial extent in x and y direction (xmin, xmax, ymin, ymax). The vals argument sets the values that each cell contains: numeric data ranging from 1 to 36 in this case.

R Code 3.5 : Creating raster object with categorical data

Code
grain_order = base::c("clay", "silt", "sand")
grain_char = base::sample(grain_order, 36, replace = TRUE)
grain_fact = base::factor(grain_char, levels = grain_order)
grain = terra::rast(nrows = 6, ncols = 6, 
             xmin = -1.5, xmax = 1.5, ymin = -1.5, ymax = 1.5,
             vals = grain_fact)

grain
#> class       : SpatRaster 
#> dimensions  : 6, 6, 1  (nrow, ncol, nlyr)
#> resolution  : 0.5, 0.5  (x, y)
#> extent      : -1.5, 1.5, -1.5, 1.5  (xmin, xmax, ymin, ymax)
#> coord. ref. : lon/lat WGS 84 
#> source(s)   : memory
#> categories  : label 
#> name        : lyr.1 
#> min value   :  clay 
#> max value   :  sand
Note 3.2

elev and grain are the same raster objects as stored in {spData} as elev.tif respectively as grain.tif.

This is helpful as I do not need to save or wrap/unwrap these objects for use in later chunk. Just loading via <name> = terra::rast(base::system.file("raster/<name>.tif", package = "spData")) is enough.

The raster object stores the corresponding look-up table or “Raster Attribute Table” (RAT) as a list of data frames, which can be viewed with terra::cats(grain). Each element of this list is a layer of the raster. It is also possible to use the function terra::levels() for retrieving and adding new or replacing existing factor levels.

R Code 3.6 : Raster Attribute Table (RAT)

Code
grain2 = terra::rast(base::system.file("raster/grain.tif", package = "spData"))
levels(grain2) = data.frame(value = c(0, 1, 2), wetness = c("wet", "moist", "dry"))
terra::cats(grain2)
terra::levels(grain2)
terra::identical(terra::cats(grain2), terra::levels(grain2))
#> [[1]]
#>   value wetness
#> 1     0     wet
#> 2     1   moist
#> 3     2     dry
#> 
#> [[1]]
#>   value wetness
#> 1     0     wet
#> 2     1   moist
#> 3     2     dry
#> 
#> [1] TRUE

Categorical raster objects can also store information about the colors associated with each value using a color table. The color table is a data frame with three (red, green, blue) or four (alpha) columns, where each row relates to one value. Color tables in {terra} can be viewed or set with the terra::coltab() function. Importantly, saving a raster object with a color table to a file (e.g., GeoTIFF) will also save the color information.

The following figure simulates book’s Figure 3.2.. There are some differences such as that the legend is outside the graphic and in a different format.

R Code 3.7 : Plot numerical and categorical raster object

Code
elev = terra::rast(base::system.file("raster/elev.tif", package = "spData"))

grain <- terra::rast(base::system.file("raster/grain.tif", package = "spData"))
grain2 <-  grain 
levels(grain2) <- data.frame(value = c(0, 1, 2), wetness = c("clay", "silt", "sand"))
col_df <- data.frame(value = 0:2, col = c(clay = "#a52a2a", silt = "#f4a460", sand = "#bc8f8f"))
terra::coltab(grain2) <- col_df

par(mfrow = c(1, 2))
terra::plot(elev, col = terra::map.pal("blues"))
terra::plot(grain2)
Figure 3.1: Raster datasets with numeric (left) and categorical values (right).

3.3.1 Raster subsetting

Remark 3.2. : Numbered Remark Title

As learning about raster data (und therefore {terra}) is not my main focuse I will not go into details in this section. See also Remark 2.2.

Raster subsetting in the book is done with the base R operator [, which accepts a variety of inputs.

Resource 3.1 : New package {tidyterra}

There is the new package {tidyterra} which provide common methods of the {tidyverse} packages for objects created with the {terra} package: SpatRaster and SpatVector. It also provides geoms for plotting these objects with {ggplot2}.

From the four possible inputs for raster data, that is

  • row-column indexing,
  • cell IDs,
  • coordinates, and
  • another spatial object

only the first two options can be considered non-spatial operations. (For subestting spatial operations see (XXXSection4-3-1?).)

Code Collection 3.3 : Raster single layer subsetting with base R and {tidyterra}

R Code 3.8 : Raster subsetting with base R operator [

Code
elev = terra::rast(base::system.file("raster/elev.tif", package = "spData"))

elev[1, 1] # row 1, column 1
elev[1] # cell ID 1
elev[2, 1] # row 2, column 1
#>   elev
#> 1    1
#>   elev
#> 1    1
#>   elev
#> 1    7

R Code 3.9 : Raster subsetting with {tidyterra}

Code
elev = terra::rast(base::system.file("raster/elev.tif", package = "spData"))

elev |> 
    tidyterra::slice_rows(2) |> 
    tidyterra::slice_cols(1)
#> class       : SpatRaster 
#> dimensions  : 1, 1, 1  (nrow, ncol, nlyr)
#> resolution  : 0.5, 0.5  (x, y)
#> extent      : -1.5, -1, 0.5, 1  (xmin, xmax, ymin, ymax)
#> coord. ref. : lon/lat WGS 84 (EPSG:4326) 
#> source(s)   : memory
#> varname     : elev 
#> name        : elev 
#> min value   :    7 
#> max value   :    7
Code
elev |> 
    tidyterra::slice_colrows(
        rows = 2,
        cols = 1
    )
#> class       : SpatRaster 
#> dimensions  : 1, 1, 1  (nrow, ncol, nlyr)
#> resolution  : 0.5, 0.5  (x, y)
#> extent      : -1.5, -1, 0.5, 1  (xmin, xmax, ymin, ymax)
#> coord. ref. : lon/lat WGS 84 (EPSG:4326) 
#> source(s)   : memory
#> varname     : elev 
#> name        : elev 
#> min value   :    7 
#> max value   :    7

{tidyterra} applies many {dplyr} commands to {terra}. The above two subsettng commands are specialized functios adapted to the raster format that are provided with {tidyterra} in addition to the standard {dplyr} functions: slice(), slice_head(), slice_tail(), slice_min(), slice_max(), and slice_sample().

Subsetting of multi-layered raster objects will return the cell value(s) for each layer. For example, two_layers = c(grain, elev); two_layers[1] returns a data frame with one row and two columns — one for each layer. To extract all values, you can also use values().

Code Collection 3.4 : Subsetting of multi-layered raster objects

R Code 3.10 : Subsetting of multi-layered raster objects

Code
elev = terra::rast(base::system.file("raster/elev.tif", package = "spData"))
grain = terra::rast(base::system.file("raster/grain.tif", package = "spData"))

two_layers <- c(grain, elev) 
two_layers[1]
#>   grain elev
#> 1  silt    1
Code
head(terra::values(two_layers)) # show first six values of all layers
#>      grain elev
#> [1,]     1    1
#> [2,]     0    2
#> [3,]     1    3
#> [4,]     2    4
#> [5,]     2    5
#> [6,]     2    6
Code
two_layers |> 
    tidyterra::slice_colrows(
        rows = 1,
        cols = 1
    )
#> class       : SpatRaster 
#> dimensions  : 1, 1, 2  (nrow, ncol, nlyr)
#> resolution  : 0.5, 0.5  (x, y)
#> extent      : -1.5, -1, 1, 1.5  (xmin, xmax, ymin, ymax)
#> coord. ref. : lon/lat WGS 84 (EPSG:4326) 
#> source(s)   : memory
#> varname     : grain 
#> names       : grain, elev 
#> min values  :  silt,    1 
#> max values  :  silt,    1
Code
two_layers |> 
    tidyterra::slice_head()
#> class       : SpatRaster 
#> dimensions  : 1, 1, 2  (nrow, ncol, nlyr)
#> resolution  : 0.5, 0.5  (x, y)
#> extent      : -1.5, -1, 1, 1.5  (xmin, xmax, ymin, ymax)
#> coord. ref. : lon/lat WGS 84 (EPSG:4326) 
#> source(s)   : memory
#> varname     : grain 
#> names       : grain, elev 
#> min values  :  silt,    1 
#> max values  :  silt,    1

R Code 3.11 : Numbered R Code Title (Tidyverse)

Code
elev = terra::rast(base::system.file("raster/elev.tif", package = "spData"))
grain = terra::rast(base::system.file("raster/grain.tif", package = "spData"))

two_layers <- c(grain, elev) |>