The is the supporting information for: Atkinson et al. Monitoring for fisheries or for fish? Declines in monitoring of salmon spawners continue despite a conservation crisis.
The document contains (i) the supplementary figures and tables referred to in the main text, (ii) the details of the generalized linear regression analysis conducted to quantify the effect of commercial catch, price on monitoring effort for each species, and (iii) a summary of the NuSEDS cleaning procedure.
Figure S1. Annual number of salmon stream
populations monitored over time in NuSEDS after filtering zeros.
Figure S2. Annual proportion of salmon
stream populations assessed over time in NuSEDS after filtering
zeros.
Figure S3. Annual number of pink salmon
populations in odd and even years monitored in NuSEDS and broken down by
region (panels a through f). Zeros were filtered.
Figure S4. Annual number of salmon
populations monitored in NuSEDS broken down by region (panels a through
i) and by species (Chinook, chum, coho, pink, and sockeye), including
observations recorded as zero counts in the NuSEDS dataset.
Figure S5. The average annual change in
populations monitored since 1986 as estimated via linear regression for
each species and region, including observations recorded as zero counts
in the NuSEDS dataset. Dots and horizontal bars represent the slopes and
their 95% confidence intervals, respectively.
Figure S6. Pattern in zero counts
(MAX_ESTIMATE) for Fraser sockeye populations (n = 351) in the cleaned
NuSEDS data before filtering out zeros.
Figure S7. Frequency of the different
survey data quality values (ESTIMATE_CLASSIFICATION) in the cleaned
NuSEDS data after filtering zeros.
Figure S8. Frequency of the different
survey data quality values (ESTIMATE_CLASSIFICATION) in the cleaned
NuSEDS data before filtering zeros.
Figure S9. Frequency of the different
survey data quality values per species since 1998 in the cleaned NuSEDS
data after filtering zeros. The stream survey quality grouping is based
on the field ESTIMATE_CLASSIFICATION: “High” for “TRUE ABUNDANCE
(TYPE-1)”; “Medium-High” for “TRUE ABUNDANCE (TYPE-2)”; “Medium” for
“RELATIVE ABUNDANCE (TYPE-3)”; “Medium-Low” for “RELATIVE ABUNDANCE
(TYPE-4)”; “Low” for “RELATIVE ABUNDANCE (TYPE-5)”, “PRESENCE-ABSENCE
(TYPE-6)”, “PRESENCE/ABSENCE (TYPE-6)”, “RELATIVE: VARYING MULTI-YEAR
METHODS” and “RELATIVE: CONSTANT MULTI-YEAR METHODS”.
Figure S10. Frequency of the different
survey data quality values per species since 1998 in the cleaned NuSEDS
data before filtering zeros. The stream survey quality grouping is based
on the field ESTIMATE_CLASSIFICATION: “High” for “TRUE ABUNDANCE
(TYPE-1)”; “Medium-High” for “TRUE ABUNDANCE (TYPE-2)”; “Medium” for
“RELATIVE ABUNDANCE (TYPE-3)”; “Medium-Low” for “RELATIVE ABUNDANCE
(TYPE-4)”; “Low” for “RELATIVE ABUNDANCE (TYPE-5)”, “PRESENCE-ABSENCE
(TYPE-6)”, “PRESENCE/ABSENCE (TYPE-6)”, “RELATIVE: VARYING MULTI-YEAR
METHODS” and “RELATIVE: CONSTANT MULTI-YEAR METHODS”.
Table S1. Number of Conservation Units (CUs), populations, and sites present in the NuSEDS data in each region after cleaning (NAs counts were removed, zeros were kept). Sites are unique locations where spawners from one or more CUs may be counted. Populations are unique combinations of CUs and sites.
Region | Number of CUs | Number of populations | Number of sites |
---|---|---|---|
Yukon | 15 | 60 | 54 |
Transboundary | 22 | 65 | 37 |
Haida Gwaii | 28 | 756 | 233 |
Nass | 20 | 307 | 82 |
Skeena | 54 | 777 | 265 |
Central Coast | 111 | 1681 | 404 |
Vancouver Island & Mainland Inlets | 83 | 2328 | 668 |
Fraser | 57 | 1054 | 617 |
Columbia | 2 | 2 | 1 |
Overall | 392 | 7030 | 2361 |
The goal of the analysis is to quantify the effect of commercial catch on monitoring effort accounting for the effect of price over time and for each species.
We fit two models where the number of populations monitored
(count_pop
) is expressed as (1) a function of landing
value, i.e., the product of commercial catch in kg
(catch_kg
) and commercial price per kg
(price_kg
), and (2) as a function of commercial catch
alone. For both models we include species
as a predictor
and its interaction with the other predictor. We compare the two models
using the Akaike information criterion (AIC).
We selected the negative binomial distribution to correct the overdispersion observed when using the Poisson distribution. The overdispersion is estimated by dividing the residual deviance by the residual degrees of freedom and it is < 1.1 for all four models (two models fitted on two datasets, each reasulting from from filtering or keeping the zeros in NuSEDS, respectively). We assessed collinearity among explanatory variables using pairwise plots and correlation coefficients. We used deviance residuals to verify the models’ assumptions with diagnostic plots (i.e., homoscedasticity of the residuals). We calculated a pseudo-R2 as the ratio between the null deviance minus the residual deviance divided by the null deviance.
Dataset containing the number of populations (count_pop
)
extracted from NuSEDS after filtering out the zeros:
## species year count_pop
## 1 Pink 1926 0
## 2 Pink 1927 0
## 3 Pink 1928 1
## 4 Pink 1929 0
## 5 Pink 1930 0
## 6 Pink 1933 0
Commercial Canadian catch data – number of fish (count
)
and weight in kg (wt_kg
) – downloaded from the North Pacific Anadromous Fish
Commission website in July 2024:
## species year count wt_kg
## 1 Coho 1925 3079200 11739000
## 2 Coho 1926 2843400 10840000
## 3 Coho 1927 2874500 10780000
## 4 Coho 1928 3231200 12151000
## 5 Coho 1929 3236900 12111000
## 6 Coho 1930 3216900 12169000
Price per kg data downloaded from DFO’s 2023 Economic analysis:
## Year Sockeye Coho Pink Chum Chinook
## 1 1952 6.25 3.27 2.00 1.84 4.19
## 2 1953 5.34 3.08 1.77 1.68 4.29
## 3 1954 5.20 3.56 1.82 1.72 4.39
## 4 1955 5.49 4.02 2.03 2.26 5.25
## 5 1956 6.11 5.04 2.00 2.68 6.03
## 6 1957 6.04 3.42 2.01 1.91 5.10
These three datasets are combined into one for the analysis:
## species year count_pop count_catch catch_kg price_kg
## 1 Pink 1952 322 11217000 25281000 2.00
## 2 Pink 1953 369 11106000 30476000 1.77
## 3 Pink 1954 438 5439000 12722000 1.82
## 4 Pink 1955 371 11240000 31270000 2.03
## 5 Pink 1956 452 7352000 14311000 2.00
## 6 Pink 1957 367 11310000 28297000 2.01
The data range from 1952 to 2020.
Figure S11. Number of populations monitored, catch and price over time for each species. Each variables was normalised across all species to be displayed on the same Y axis. Zeros counts were filtered out in NuSEDS.
We check collinearity using pairwise plots comparing covariates and correlation coefficients:
Figure S12. Pairwise plot showing the distribution, relationship and correlation among the variables.
The correlations among explanatory variables are acceptable.
We notice that the explanatory variables catch
(catch_kg
) and price (price_kg
) are right
skewed, so we transform them with the square root function to improve
the model fit:
Figure S13. Pairwise plot showing the distribution,
relationship and correlation among the variables with catch
and price
transformed with the square root function.
We first fit a GLM with a Poisson distribution and the log-link function:
glm_cps <- glm(count_pop ~ catch_kg_sqrt:price_kg_sqrt * species,
data = data_here, family = poisson(link = "log"))
glm_cs <- glm(count_pop ~ catch_kg_sqrt * species,
data = data_here, family = poisson(link = "log"))
We estimate if there is overdispersion by dividing the residual deviance by the residual degrees of freedom:
s_glm_cps <- summary(glm_cps)
round(s_glm_cps$deviance / s_glm_cps$df.residual,2)
## [1] 17.08
s_glm_cs <- summary(glm_cs)
round(s_glm_cs$deviance / s_glm_cs$df.residual,2)
## [1] 18.72
The values are much larger than 1, indicating overdispersion.
To solve this issue, we fit the same models but with a negative binomial distribution:
glm_cps <- glm.nb(count_pop ~ catch_kg_sqrt:price_kg_sqrt * species,
data = data_here, link = "log")
glm_cs <- glm.nb(count_pop ~ catch_kg_sqrt * species,
data = data_here, link = "log")
s_glm_cps <- summary(glm_cps)
round(s_glm_cps$deviance / s_glm_cps$df.residual,2)
## [1] 1.04
s_glm_cs <- summary(glm_cs)
round(s_glm_cs$deviance / s_glm_cs$df.residual,2)
## [1] 1.04
There is no more overdispersion.
We check the deviance residuals for glm_cps
:
Figure S14. Residual diagnostic plots for the generalised linear model with a negative binomial distribution and log-link function to predict the number of populations monitored as a function of commercial catches, price and species (n = 345). Zero counts in NuSEDS were removed.
And for glm_cs
:
Figure S15. Residual diagnostic plots for the generalised linear model with a negative binomial distribution and log-link function to predict the number of populations monitored as a function of commercial catches and species (n = 345). Zero counts in NuSEDS were removed.
The residuals plots do not invalidate both models.
We compare the two models using AIC and pseudo-R2. The pseudo-R2 is expressed as the ratio between the null deviance minus the residual deviance divided by the null deviance.
## model AIC R2_pseudo
## 1 glm_cps 3977.5 0.804
## 2 glm_cs 4007.7 0.786
The model expressing the effect of landed value (i.e,, the product of
catch catch_kg
and price price_kg
) on the
number of population monitored (glm_cps
) is the best model
according to AIC (delta AIC = 30.2).
The parameter estimates for glm_cps
are:
## parameters estimate SE P_value
## 1 (Intercept) 5.255 0.0709 < 0.001
## 2 speciesChum 0.722 0.0919 < 0.001
## 3 speciesCoho 0.540 0.0878 < 0.001
## 4 speciesPink 0.223 0.0885 < 0.05
## 5 speciesSockeye 0.192 0.0924 < 0.05
## 6 catch_kg_sqrt:price_kg_sqrt 0.000 0.0000 0.1
## 7 catch_kg_sqrt:price_kg_sqrt:speciesChum 0.000 0.0000 < 0.001
## 8 catch_kg_sqrt:price_kg_sqrt:speciesCoho 0.000 0.0000 < 0.001
## 9 catch_kg_sqrt:price_kg_sqrt:speciesPink 0.000 0.0000 < 0.001
## 10 catch_kg_sqrt:price_kg_sqrt:speciesSockeye 0.000 0.0000 0.61
The parameter estimates for glm_cs
are:
## parameters estimate SE P_value
## 1 (Intercept) 5.317 0.0782 < 0.001
## 2 catch_kg_sqrt 0.000 0.0000 0.51
## 3 speciesChum 0.620 0.1110 < 0.001
## 4 speciesCoho 0.447 0.0968 < 0.001
## 5 speciesPink 0.082 0.1056 0.44
## 6 speciesSockeye 0.120 0.1026 0.24
## 7 catch_kg_sqrt:speciesChum 0.000 0.0000 < 0.05
## 8 catch_kg_sqrt:speciesCoho 0.000 0.0000 < 0.001
## 9 catch_kg_sqrt:speciesPink 0.000 0.0000 < 0.05
## 10 catch_kg_sqrt:speciesSockeye 0.000 0.0000 0.78
We predict number of populations monitored
(count_pop
) as a function of Landed value =
catch (catch_kg
) \(\times\) price
(price_kg
) for each species using the glm_cps
model:
Figure S16. Prediction of the number of populations monitored as a function of landed value (in million of CAD), which is the product between commercial catch (in kg) and price (in CAD per kg). A generalized linear model with a negative binomial distribution and log-link function was fitted with catch, price and species as predictive variables (n = 345). Lines and polygons represent model fits and 95% confidence intervals, respectively; circles represent the data, which span from 1952 and 2020. Zero counts in NuSEDS were filtered out. This figure is the one appearing in the main text.
We use here the equivalent dataset but where zeros were considered as positive observation to assess the number of population monitored in a given year:
## species year count_pop count_catch catch_kg price_kg
## 1 Pink 1952 322 11217000 25281000 2.00
## 2 Pink 1953 369 11106000 30476000 1.77
## 3 Pink 1954 438 5439000 12722000 1.82
## 4 Pink 1955 371 11240000 31270000 2.03
## 5 Pink 1956 452 7352000 14311000 2.00
## 6 Pink 1957 373 11310000 28297000 2.01
Figure S17: Number of populations monitored, catch and price over time for each species. Each variables was normalised across all species to be displayed on the same Y axis. Zeros counts were not filtered out in NuSEDS (also the figure shows the number).
We check collinearity using pairwise plots comparing covariates and
correlation coefficients (after transforming catch
catch_kg
and price price_kg
with the
square root function):
Figure S18: Pairwise plot showing the distribution,
relationship and correlation among the variables with catch
and price
transformed with the square root function.
The correlations among explanatory variables are acceptable.
We first fit a GLM with a Poisson distribution and the log-link function:
glm_cps <- glm(count_pop ~ catch_kg_sqrt:price_kg_sqrt * species,
data = data_here, family = poisson(link = "log"))
glm_cs <- glm(count_pop ~ catch_kg_sqrt * species,
data = data_here, family = poisson(link = "log"))
We estimate if there is overdispersion by dividing the residual deviance by the residual degrees of freedom:
s_glm_cps <- summary(glm_cps)
round(s_glm_cps$deviance / s_glm_cps$df.residual,2)
## [1] 17.55
s_glm_cs <- summary(glm_cs)
round(s_glm_cs$deviance / s_glm_cs$df.residual,2)
## [1] 19.24
The values are much larger than 1, indicating overdispersion.
To solve this issue, we fit the same models but with a negative binomial distribution:
glm_cps <- glm.nb(count_pop ~ catch_kg_sqrt:price_kg_sqrt * species,
data = data_here, link = "log")
glm_cs <- glm.nb(count_pop ~ catch_kg_sqrt * species,
data = data_here, link = "log")
s_glm_cps <- summary(glm_cps)
round(s_glm_cps$deviance / s_glm_cps$df.residual,2)
## [1] 1.04
s_glm_cs <- summary(glm_cs)
round(s_glm_cs$deviance / s_glm_cs$df.residual,2)
## [1] 1.04
There is no more overdispersion.
We check the deviance residuals for glm_cps
:
Figure S20. Residual diagnostic plots for the generalised linear model with a negative binomial distribution and log-link function to predict the number of populations monitored as a function of commercial catches, price and species (n = 345). Zero counts in NuSEDS were kept.
And for glm_cs
:
Figure S21. Residual diagnostic plots for the generalised linear model with a negative binomial distribution and log-link function to predict the number of populations monitored as a function of commercial catches and species (n = 345). Zero counts in NuSEDS were kept.
The residuals plots do not invalidate both models.
We compare the two models using AIC and pseudo-R2. The pseudo-R2 is expressed as the ratio between the null deviance minus the residual deviance divided by the null deviance.
## model AIC R2_pseudo
## 1 glm_cps 4006.7 0.785
## 2 glm_cs 4036.2 0.766
The model expressing the effect of landed value (i.e,, the product of
catch catch_kg
and price price_kg
) on the
number of population monitored (glm_cps
) is the best model
according to AIC (delta AIC = 29.5).
The parameter estimates for glm_cps
are:
## parameters estimate SE P_value
## 1 (Intercept) 5.300 0.0718 < 0.001
## 2 speciesChum 0.689 0.0932 < 0.001
## 3 speciesCoho 0.537 0.0890 < 0.001
## 4 speciesPink 0.186 0.0897 < 0.05
## 5 speciesSockeye 0.441 0.0935 < 0.001
## 6 catch_kg_sqrt:price_kg_sqrt 0.000 0.0000 0.22
## 7 catch_kg_sqrt:price_kg_sqrt:speciesChum 0.000 0.0000 < 0.001
## 8 catch_kg_sqrt:price_kg_sqrt:speciesCoho 0.000 0.0000 < 0.001
## 9 catch_kg_sqrt:price_kg_sqrt:speciesPink 0.000 0.0000 < 0.001
## 10 catch_kg_sqrt:price_kg_sqrt:speciesSockeye 0.000 0.0000 0.07
The parameter estimates for glm_cs
are:
## parameters estimate SE P_value
## 1 (Intercept) 5.376 0.0792 < 0.001
## 2 catch_kg_sqrt 0.000 0.0000 0.92
## 3 speciesChum 0.577 0.1124 < 0.001
## 4 speciesCoho 0.434 0.0980 < 0.001
## 5 speciesPink 0.034 0.1070 0.75
## 6 speciesSockeye 0.370 0.1036 < 0.001
## 7 catch_kg_sqrt:speciesChum 0.000 0.0000 < 0.01
## 8 catch_kg_sqrt:speciesCoho 0.000 0.0000 < 0.001
## 9 catch_kg_sqrt:speciesPink 0.000 0.0000 < 0.01
## 10 catch_kg_sqrt:speciesSockeye 0.000 0.0000 0.47
We predict number of populations monitored
(count_pop
) as a function of Landed value =
catch (catch_kg
) \(\times\) price
(price_kg
) for each species using the glm_cps
model:
Figure S22. Prediction of the number of populations monitored as a function of landed value (in million of CAD), which is the product between commercial catch (in kg) and price (in CAD per kg). A generalized linear model with a negative binomial distribution and log-link function was fitted with catch, price and species as predictive variables (n = 345). Lines and polygons represent model fits and 95% confidence intervals, respectively; circles represent the data, which span from 1952 and 2020. Zero counts in NuSEDS were kept.
The R script containing the code for the NuSEDS data processing procedure described below is available in a Zenodo repository (https://zenodo.org/records/14194638). Complete details of the cleaning procedure can be found at: https://bookdown.org/salmonwatersheds/nuseds_cleaning_procedure_atkinson/1_nuseds_collation_1.html.
The objective of the procedure is to obtain the yearly counts of each
salmon population in their respective stream and associate these
populations to their corresponding conservation unit (CU). The NuSEDS
data is separated into two datasets. The all_areas_nuseds dataset
contains the observed yearly counts (related fields:
NATURAL_ADULT_SPAWNERS
,
NATURAL_SPAWNERS_TOTAL
, etc.) for each population (related
fields: SPECIES
, POP_ID
,
POPULATION
) in their respective site (related fields:
AREA
, GFE_ID
, WATERBODY
,
GAZETTED_NAME
, etc.), along with the associated methods
used (related fields: ESTIMATE_METHOD
,
ESTIMATE_CLASSIFICATION
,
ENUMERATION_METHODS
).
We first define the unique yearly count field MAX_ESTIMATE for each
population as the maximum value of the count-related fields in
all_areas_nuseds, i.e., NATURAL_ADULT_SPAWNERS
,
NATURAL_JACK_SPAWNERS
, NATURAL_SPAWNERS_TOTAL
,
ADULT_BROODSTOCK_REMOVALS
,
JACK_BROODSTOCK_REMOVALS
,
TOTAL_BROODSTOCK_REMOVALS
, OTHER_REMOVALS
and
TOTAL_RETURN_TO_RIVER
. MAX_ESTIMATE
is the
only count-related field we use in the rest of the procedure. A
population’s MAX_ESTIMATE
data points is referred to as its
“time series”.
The second dataset, conservation_unit_system_sites, links each population (related fields: POP_ID) to their respective CU (related fields: CU_NAME, FULL_CU_IN, CU_LONGT, CU_LAT, etc.) and site (related fields: GFE_ID, SYSTEM_SITE, X_LONGT, Y_LAT). Ideally, attributing each time series in all_areas_nuseds its corresponding CU and location’s coordinates in conservation_unit_system_sites would simply consist in merging the two datasets using the population and location identification number POP_ID and GFE_ID, respectively. Unfortunately, numerous time series are problematic, which occurs when:
A time series is present in all_areas_nuseds but its POP_ID and GFE_ID association is absent in conservation_unit_system_sites (there are 4428 populations in that case).
Multiple time series of thea same population (i.e. same POP_ID) are observed in multiple locations (i.e. different GFE_IDs), which should not occur because a POP_ID should be defined for a unique location.
Multiple populations (i.e. different POP_ID) of a same CU are observed in the same location (i.e. same GFE_ID), suggesting that these populations should form one unique population.
The observation of problematic time series revealed inconsistencies such as missing, duplicated, and conflicting data points (i.e. different counts in the same year). In order to rescue as much data as possible, we attempted to solve these issues by either (i) deleting, (ii) combining or (iii) summing a problematic series to an alternative series (i.e., a time series present in all_areas_nuseds that shares either the same POP_ID or GFE_ID and species), or (iv) creating a new reference to the time series in conservation_unit_system_sites. In each case, we used all the information available in the different fields and made assumptions based on our professional judgment. Below are more details about the different interventions applied to problematic time series:
We deleted a problematic time series (or a portion of it) from all_areas_nuseds when (1) it was a duplicate of another time series, (2) it had less than four data points and was not spotted during the cleaning process, or (3) the lack of information prevented to combine or merge it to an alternative series or to create a new series in conservation_unit_system_sites (e.g. if run timing was unknown).
We combined a problematic time series with an alternative series when data points were not in conflict (i.e. different value for a same year) and the information available in the other fields was not prescriptive (e.g. the population belong to the same CU, the locations have similar names or spatial coordinates, the runtimings are not different).
We summed a problematic time series to an alternative series having different POP_ID but same GFE_ID and CU_NAME and with conflicting data points.
We created a new reference of the problematic
time series in conservation_unit_system_sites (i.e. a new row with a
unique POP_ID
- GFE_ID
combination) when it
could not be deleted, merged or combined to an alternative
series.
The diverse and idiosyncratic nature of these problematic cases prevented the establishment of a systematic and simple cleaning process. Many of these cases were resolved individually by observing and comparing time series and accounting for the information available. We proceeded in consecutive steps in which we applied the different types of intervention mentioned above, either after visually inspecting individual time series or by applying a systematic procedure. These steps are detailed below:
Remove the time series (i.e. unique POP_ID
-
GFE_ID
associations) with only NAs values for MAX_ESTIMATE
in all_areas_nuseds (corresponding to 4424 time series, 101,944 rows or
25% of all_areas_nuseds)
Assess all the references to time series present in conservation_unit_system_sites but absent in all_areas_nuseds (247 out of 714 time series in all_areas_nuseds are not referenced in conservation_unit_system_sites)
2.2. Cases with presence of alternative series in NUSEDS (n = 3)
2.3. Cases with no alternative time series in all_areas_nuseds (n = 243)
2.4. Case with multiple GFE_ID
for a single
POP_ID
(n = 1)
POP_ID
but different
GFE_ID
s in conservation_unit_system_sites and present in
all_areas_nuseds (n = 1).GFE_ID
. Both time series were kept because they are not
compatible and have many and recent data points.4.1. Cases of series with alternative POP_ID
and no
alternative GFE_ID
(n = 126)
4.2. Cases of series with alternative GFE_ID
with or
without alternative POP_ID
(n = 64)
POP_ID
s (e.g. series
with POP_ID
= 47925 and GFE_ID
= 15, 31516 and
31740, respectively) having the same alternative series (e.g. series
with POP_ID
= 47925 and GFE_ID
= 14) are
assessed together, resulting in 37 groups of time series to assess
separately.4.3. Cases with no alternative series (n = 166)
We attempted to associate these time series to a CU using the
fields X_LONGT
and Y_LAT
to intersect them
with the CUs’ shapefiles used in the PSE.
Time series without coordinates (or for which coordinates could not be found) were removed from all_areas_nuseds (n = 5).
The reference of the rest of the time series was added to conservation_unit_system_sites
Merge all_areas_nuseds and conservation_unit_system_sites using
the fields POP_ID
and GFE_ID
.
Assess time series with different POP_ID
but a same
CU_NAME
and GFE_ID
(n = 126).
Case 1: there is only one data point in one of the series; if it is compatible, then merge to the alternative series, otherwise remove it from all_areas_nuseds.
Case 2: One of the series is 100% duplicated with the other one: to remove from all_areas_nuseds.
Case 3: the rest of the series are merged; adding the values when data points are in conflict.
We matched the CUs in the cleaned NuSEDS data with the CUs defined in
the Pacific Salmon Explorer, which are associated with a given region
(PSE; https://salmonexplorer.ca). We used the fields
POP_ID
, FULL_CU_IN
and CU_NAME
.
There are 23 CU_NAME not matching to any CUs in the PSE, corresponding
to 87 populations and 2256 data points. These time series are kept.
For the analysis presented in the main text, we excluded data points where MAX_ESTIMATE = NA (n = 156,507) or 0 (n = 3,449) and we present here in the appendix the results of the analyses done on the cleaned NuSEDS dataset where zeros are kept.