3 Attribute data operations

3.1 Introduction

Attribute data is non-spatial information associated with geographic (geometry) data. This chapter teaches how to manipulate geographic objects based on attributes such as the names of bus stops in a vector dataset and elevations of pixels in a raster dataset.

3.2 Vector attribute manipulation

Geographic vector datasets are well supported in R thanks to the {sf} class, which extends base R’s data.frame. Like data frames, sf objects have one column per attribute variable (such as ‘name’) and one row per observation or feature (e.g., per bus station). {sf} provides generics that allow sf objects to behave like regular data frames.

R Code 3.1 : Methods for sf objects

Code

base::library(sf) |> base::suppressPackageStartupMessages()
utils::methods(class = "sf")

#>  [1] [                            [[<-                        
#>  [3] [<-                          $<-                         
#>  [5] aggregate                    as.data.frame               
#>  [7] cbind                        coerce                      
#>  [9] dbDataType                   dbWriteTable                
#> [11] duplicated                   identify                    
#> [13] initialize                   merge                       
#> [15] plot                         points                      
#> [17] print                        rbind                       
#> [19] show                         slotsFromS3                 
#> [21] st_agr                       st_agr<-                    
#> [23] st_area                      st_as_s2                    
#> [25] st_as_sf                     st_as_sfc                   
#> [27] st_bbox                      st_boundary                 
#> [29] st_break_antimeridian        st_buffer                   
#> [31] st_cast                      st_centroid                 
#> [33] st_collection_extract        st_concave_hull             
#> [35] st_convex_hull               st_coordinates              
#> [37] st_crop                      st_crs                      
#> [39] st_crs<-                     st_difference               
#> [41] st_drop_geometry             st_exterior_ring            
#> [43] st_filter                    st_geometry                 
#> [45] st_geometry<-                st_inscribed_circle         
#> [47] st_interpolate_aw            st_intersection             
#> [49] st_intersects                st_is_full                  
#> [51] st_is_valid                  st_is                       
#> [53] st_join                      st_line_merge               
#> [55] st_m_range                   st_make_valid               
#> [57] st_minimum_rotated_rectangle st_nearest_points           
#> [59] st_node                      st_normalize                
#> [61] st_point_on_surface          st_polygonize               
#> [63] st_precision                 st_reverse                  
#> [65] st_sample                    st_segmentize               
#> [67] st_set_precision             st_shift_longitude          
#> [69] st_simplify                  st_snap                     
#> [71] st_sym_difference            st_transform                
#> [73] st_triangulate_constrained   st_triangulate              
#> [75] st_union                     st_voronoi                  
#> [77] st_wrap_dateline             st_write                    
#> [79] st_z_range                   st_zm                       
#> [81] text                         transform                   
#> see '?methods' for accessing help and source code

Code

base::detach("package:sf", unload = TRUE)

Many of the generic methods (those not starting with st_) like aggregate(), cbind(), merge(), rbind() and ] are for manipulating data frames.

rbind(), for example, binds rows of data frames together, one ‘on top’ of the other.
$<- creates new columns.

A key feature of sf objects is that they store spatial and non-spatial data in the same way, as columns in a data.frame.

sf objects can also extend the {tidyverse} classes for data frames, tbl_df and tbl. Thus {sf} enables the full power of R’s data analysis capabilities to be unleashed on geographic data, whether you use base R or tidyverse functions for data analysis.

Code Collection 3.1 : Basic properties of vector data objects

With geometry
Without geometry

R Code 3.2 : Basic properties of a sf object with geometry column

Code

utils::data("world", package = "spData")
base::class(world)

#> [1] "sf"         "tbl_df"     "tbl"        "data.frame"

Code

base::dim(world)

#> [1] 177  11

Code

skimr::skim(world)

#> Warning: Couldn't find skimmers for class: sfc_MULTIPOLYGON, sfc; No
#> user-defined `sfl` provided. Falling back to `character`.

Data summary
Name	world
Number of rows	177
Number of columns	11
_______________________
Column type frequency:
character	7
numeric	4
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	n_unique
iso_a2	2	0.99	2	2	175
name_long	0	1.00	4	35	177
continent	0	1.00	4	23	8
region_un	0	1.00	4	23	7
subregion	0	1.00	9	25	22
type	0	1.00	7	17	5
geom	0	1.00	135	24760	177

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
area_km2	0	1.00	832558.33	2163425.48	2416.87	46185.25	185004.13	621860.35	1.701851e+07	▇▁▁▁▁
pop	10	0.94	42815798.06	149413217.06	56295.00	3754725.00	10401062.00	30748039.00	1.364270e+09	▇▁▁▁▁
lifeExp	10	0.94	70.85	8.21	50.62	64.96	72.87	76.78	8.359000e+01	▂▃▅▇▅
gdpPercap	17	0.90	17105.99	18668.07	597.14	3752.37	10734.07	24232.74	1.208601e+05	▇▂▁▁▁

R Code 3.3 : Properties of a sf object withput geometry column

Code

world_df = sf::st_drop_geometry(world)
base::class(world_df)

#> [1] "tbl_df"     "tbl"        "data.frame"

Code

base::dim(world_df)

#> [1] 177  10

Code

dplyr::glimpse(world_df)

#> Rows: 177
#> Columns: 10
#> $ iso_a2    <chr> "FJ", "TZ", "EH", "CA", "US", "KZ", "UZ", "PG", "ID", "AR", …
#> $ name_long <chr> "Fiji", "Tanzania", "Western Sahara", "Canada", "United Stat…
#> $ continent <chr> "Oceania", "Africa", "Africa", "North America", "North Ameri…
#> $ region_un <chr> "Oceania", "Africa", "Africa", "Americas", "Americas", "Asia…
#> $ subregion <chr> "Melanesia", "Eastern Africa", "Northern Africa", "Northern …
#> $ type      <chr> "Sovereign country", "Sovereign country", "Indeterminate", "…
#> $ area_km2  <dbl> 19289.97, 932745.79, 96270.60, 10036042.98, 9510743.74, 2729…
#> $ pop       <dbl> 885806, 52234869, NA, 35535348, 318622525, 17288285, 3075770…
#> $ lifeExp   <dbl> 69.96000, 64.16300, NA, 81.95305, 78.84146, 71.62000, 71.039…
#> $ gdpPercap <dbl> 8222.2538, 2402.0994, NA, 43079.1425, 51921.9846, 23587.3375…

Code

skimr::skim(world_df)

Data summary
Name	world_df
Number of rows	177
Number of columns	10
_______________________
Column type frequency:
character	6
numeric	4
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	n_unique
iso_a2	2	0.99	2	2	175
name_long	0	1.00	4	35	177
continent	0	1.00	4	23	8
region_un	0	1.00	4	23	7
subregion	0	1.00	9	25	22
type	0	1.00	7	17	5

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
area_km2	0	1.00	832558.33	2163425.48	2416.87	46185.25	185004.13	621860.35	1.701851e+07	▇▁▁▁▁
pop	10	0.94	42815798.06	149413217.06	56295.00	3754725.00	10401062.00	30748039.00	1.364270e+09	▇▁▁▁▁
lifeExp	10	0.94	70.85	8.21	50.62	64.96	72.87	76.78	8.359000e+01	▂▃▅▇▅
gdpPercap	17	0.90	17105.99	18668.07	597.14	3752.37	10734.07	24232.74	1.208601e+05	▇▂▁▁▁

world contains ten non-geographic columns (and one geometry list column) with 177 rows representing the world’s countries. The function sf::st_drop_geometry() keeps only the attributes data of an sf object, in other words removing its geometry and convert it to a “normal” data.frame.

You can see that skimr::skim() reproduces a warning just in the first case. dplyr::glimpse() would produce an error, but both listings work in the second case without problems.

Note 3.1: Different names of the geometry column

The geometry column of sf objects is typically called “geometry” or “geom”, but any name can be used. The following command, for example, creates a geometry column named g:

sf::st_sf(base::data.frame(n = world$name_long), g = world$geom)

This enables geometries imported from spatial databases to have a variety of names such as “wkb_geometry” and “the_geom”.

Dropping the geometry column before working with attribute data can be useful; data manipulation processes can run faster when they work only on the attribute data and geometry columns are not always needed. For most cases, however, it makes sense to keep the geometry column, explaining why the column is ‘sticky’ (it remains after most attribute operations unless specifically dropped). Non-spatial data operations on sf objects only change an object’s geometry when appropriate (e.g., by dissolving borders between adjacent polygons following aggregation).

Becoming skilled at geographic attribute data manipulation means becoming skilled at manipulating data frames.

Remark 3.1. : {sf} has tidyverse compatibility with some pitfalls

For many applications, the tidyverse package {dplyr} (see Package Profile Section A.1) offers an effective approach for working with data frames. Tidyverse compatibility is an advantage of {sf} over its predecessor {sp}, but there are some pitfalls to avoid.

I have already experience with {dplyr} data wrangling. If there is no difference using sf objects or "tbl_df resp. tbl objects, I may not include these passages into this notebook.

3.3 Manipulating raster objects

This section shows how raster objects work by creating them from scratch, building on Section Section 2.3.2. Because of their unique structure, subsetting and other operations on raster datasets work in a different way.

Code Collection 3.2 : Creating raster objects from scratch

numerical
categorical

R Code 3.4 : Creating raster object with numerical data

Code

elev = terra::rast(nrows = 6, ncols = 6,
            xmin = -1.5, xmax = 1.5, ymin = -1.5, ymax = 1.5,
            vals = 1:36)
elev

#> class       : SpatRaster 
#> dimensions  : 6, 6, 1  (nrow, ncol, nlyr)
#> resolution  : 0.5, 0.5  (x, y)
#> extent      : -1.5, 1.5, -1.5, 1.5  (xmin, xmax, ymin, ymax)
#> coord. ref. : lon/lat WGS 84 
#> source(s)   : memory
#> name        : lyr.1 
#> min value   :     1 
#> max value   :    36

The result is a raster object with 6 rows and 6 columns (specified by the nrow and ncol arguments), and a minimum and maximum spatial extent in x and y direction (xmin, xmax, ymin, ymax). The vals argument sets the values that each cell contains: numeric data ranging from 1 to 36 in this case.

R Code 3.5 : Creating raster object with categorical data

Code

grain_order = base::c("clay", "silt", "sand")
grain_char = base::sample(grain_order, 36, replace = TRUE)
grain_fact = base::factor(grain_char, levels = grain_order)
grain = terra::rast(nrows = 6, ncols = 6, 
             xmin = -1.5, xmax = 1.5, ymin = -1.5, ymax = 1.5,
             vals = grain_fact)

grain

#> class       : SpatRaster 
#> dimensions  : 6, 6, 1  (nrow, ncol, nlyr)
#> resolution  : 0.5, 0.5  (x, y)
#> extent      : -1.5, 1.5, -1.5, 1.5  (xmin, xmax, ymin, ymax)
#> coord. ref. : lon/lat WGS 84 
#> source(s)   : memory
#> categories  : label 
#> name        : lyr.1 
#> min value   :  clay 
#> max value   :  sand

Note 3.2

elev and grain are the same raster objects as stored in {spData} as elev.tif respectively as grain.tif.

This is helpful as I do not need to save or wrap/unwrap these objects for use in later chunk. Just loading via <name> = terra::rast(base::system.file("raster/<name>.tif", package = "spData")) is enough.

The raster object stores the corresponding look-up table or “Raster Attribute Table” (RAT) as a list of data frames, which can be viewed with terra::cats(grain). Each element of this list is a layer of the raster. It is also possible to use the function terra::levels() for retrieving and adding new or replacing existing factor levels.

R Code 3.6 : Raster Attribute Table (RAT)

Code

grain2 = terra::rast(base::system.file("raster/grain.tif", package = "spData"))
levels(grain2) = data.frame(value = c(0, 1, 2), wetness = c("wet", "moist", "dry"))
terra::cats(grain2)
terra::levels(grain2)
terra::identical(terra::cats(grain2), terra::levels(grain2))

#> [[1]]
#>   value wetness
#> 1     0     wet
#> 2     1   moist
#> 3     2     dry
#> 
#> [[1]]
#>   value wetness
#> 1     0     wet
#> 2     1   moist
#> 3     2     dry
#> 
#> [1] TRUE

Categorical raster objects can also store information about the colors associated with each value using a color table. The color table is a data frame with three (red, green, blue) or four (alpha) columns, where each row relates to one value. Color tables in {terra} can be viewed or set with the terra::coltab() function. Importantly, saving a raster object with a color table to a file (e.g., GeoTIFF) will also save the color information.

The following figure simulates book’s Figure 3.2.. There are some differences such as that the legend is outside the graphic and in a different format.

R Code 3.7 : Plot numerical and categorical raster object

Code

elev = terra::rast(base::system.file("raster/elev.tif", package = "spData"))

grain <- terra::rast(base::system.file("raster/grain.tif", package = "spData"))
grain2 <-  grain 
levels(grain2) <- data.frame(value = c(0, 1, 2), wetness = c("clay", "silt", "sand"))
col_df <- data.frame(value = 0:2, col = c(clay = "#a52a2a", silt = "#f4a460", sand = "#bc8f8f"))
terra::coltab(grain2) <- col_df

par(mfrow = c(1, 2))
terra::plot(elev, col = terra::map.pal("blues"))
terra::plot(grain2)

Figure 3.1: Raster datasets with numeric (left) and categorical values (right).

3.3.1 Raster subsetting

Remark 3.2. : Numbered Remark Title

As learning about raster data (und therefore {terra}) is not my main focuse I will not go into details in this section. See also Remark 2.2.

Raster subsetting in the book is done with the base R operator [, which accepts a variety of inputs.

Resource 3.1 : New package {tidyterra}

There is the new package {tidyterra} which provide common methods of the {tidyverse} packages for objects created with the {terra} package: SpatRaster and SpatVector. It also provides geoms for plotting these objects with {ggplot2}.

From the four possible inputs for raster data, that is

row-column indexing,
cell IDs,
coordinates, and
another spatial object

only the first two options can be considered non-spatial operations. (For subestting spatial operations see (XXXSection4-3-1?).)

Code Collection 3.3 : Raster single layer subsetting with base R and {tidyterra}

base R
{tidyterra}

R Code 3.8 : Raster subsetting with base R operator [

Code

elev = terra::rast(base::system.file("raster/elev.tif", package = "spData"))

elev[1, 1] # row 1, column 1
elev[1] # cell ID 1
elev[2, 1] # row 2, column 1

#>   elev
#> 1    1
#>   elev
#> 1    1
#>   elev
#> 1    7

R Code 3.9 : Raster subsetting with {tidyterra}

Code

elev = terra::rast(base::system.file("raster/elev.tif", package = "spData"))

elev |> 
    tidyterra::slice_rows(2) |> 
    tidyterra::slice_cols(1)

#> class       : SpatRaster 
#> dimensions  : 1, 1, 1  (nrow, ncol, nlyr)
#> resolution  : 0.5, 0.5  (x, y)
#> extent      : -1.5, -1, 0.5, 1  (xmin, xmax, ymin, ymax)
#> coord. ref. : lon/lat WGS 84 (EPSG:4326) 
#> source(s)   : memory
#> varname     : elev 
#> name        : elev 
#> min value   :    7 
#> max value   :    7

Code

elev |> 
    tidyterra::slice_colrows(
        rows = 2,
        cols = 1
    )

#> class       : SpatRaster 
#> dimensions  : 1, 1, 1  (nrow, ncol, nlyr)
#> resolution  : 0.5, 0.5  (x, y)
#> extent      : -1.5, -1, 0.5, 1  (xmin, xmax, ymin, ymax)
#> coord. ref. : lon/lat WGS 84 (EPSG:4326) 
#> source(s)   : memory
#> varname     : elev 
#> name        : elev 
#> min value   :    7 
#> max value   :    7

{tidyterra} applies many {dplyr} commands to {terra}. The above two subsettng commands are specialized functios adapted to the raster format that are provided with {tidyterra} in addition to the standard {dplyr} functions: slice(), slice_head(), slice_tail(), slice_min(), slice_max(), and slice_sample().

Subsetting of multi-layered raster objects will return the cell value(s) for each layer. For example, two_layers = c(grain, elev); two_layers[1] returns a data frame with one row and two columns — one for each layer. To extract all values, you can also use values().

Code Collection 3.4 : Subsetting of multi-layered raster objects

base R
header2

R Code 3.10 : Subsetting of multi-layered raster objects

Code

elev = terra::rast(base::system.file("raster/elev.tif", package = "spData"))
grain = terra::rast(base::system.file("raster/grain.tif", package = "spData"))

two_layers <- c(grain, elev) 
two_layers[1]

#>   grain elev
#> 1  silt    1

Code

head(terra::values(two_layers)) # show first six values of all layers

#>      grain elev
#> [1,]     1    1
#> [2,]     0    2
#> [3,]     1    3
#> [4,]     2    4
#> [5,]     2    5
#> [6,]     2    6

Code

two_layers |> 
    tidyterra::slice_colrows(
        rows = 1,
        cols = 1
    )

#> class       : SpatRaster 
#> dimensions  : 1, 1, 2  (nrow, ncol, nlyr)
#> resolution  : 0.5, 0.5  (x, y)
#> extent      : -1.5, -1, 1, 1.5  (xmin, xmax, ymin, ymax)
#> coord. ref. : lon/lat WGS 84 (EPSG:4326) 
#> source(s)   : memory
#> varname     : grain 
#> names       : grain, elev 
#> min values  :  silt,    1 
#> max values  :  silt,    1

Code

two_layers |> 
    tidyterra::slice_head()

#> class       : SpatRaster 
#> dimensions  : 1, 1, 2  (nrow, ncol, nlyr)
#> resolution  : 0.5, 0.5  (x, y)
#> extent      : -1.5, -1, 1, 1.5  (xmin, xmax, ymin, ymax)
#> coord. ref. : lon/lat WGS 84 (EPSG:4326) 
#> source(s)   : memory
#> varname     : grain 
#> names       : grain, elev 
#> min values  :  silt,    1 
#> max values  :  silt,    1

R Code 3.11 : Numbered R Code Title (Tidyverse)

Code

elev = terra::rast(base::system.file("raster/elev.tif", package = "spData"))
grain = terra::rast(base::system.file("raster/grain.tif", package = "spData"))

two_layers <- c(grain, elev) |>

3.1 Introduction

3.2 Vector attribute manipulation

3.2.1 Vector attribute subsetting (empty)

3.2.2 Chaining commands with pipes (empty)

3.2.3 Vector attribute aggregation (empty)

3.2.4 Vector attribute joining (empty)

3.2.5 Creating attributes and removing spatial information (empty)

3.3 Manipulating raster objects

3.3.1 Raster subsetting