The is the supporting information for: Atkinson et al. Monitoring for fisheries or for fish? Declines in monitoring of salmon spawners continue despite a conservation crisis.

The document contains (i) the supplementary figures and tables referred to in the main text, (ii) the details of the generalized linear regression analysis conducted to quantify the effect of commercial catch, price on monitoring effort for each species, and (iii) a summary of the NuSEDS cleaning procedure.

1 Supplementary figures and tables

fig Figure S1. Annual number of salmon stream populations monitored over time in NuSEDS after filtering zeros.

fig Figure S2. Annual proportion of salmon stream populations assessed over time in NuSEDS after filtering zeros.

fig Figure S3. Annual number of pink salmon populations in odd and even years monitored in NuSEDS and broken down by region (panels a through f). Zeros were filtered.

fig Figure S4. Annual number of salmon populations monitored in NuSEDS broken down by region (panels a through i) and by species (Chinook, chum, coho, pink, and sockeye), including observations recorded as zero counts in the NuSEDS dataset.

fig Figure S5. The average annual change in populations monitored since 1986 as estimated via linear regression for each species and region, including observations recorded as zero counts in the NuSEDS dataset. Dots and horizontal bars represent the slopes and their 95% confidence intervals, respectively.

fig Figure S6. Pattern in zero counts (MAX_ESTIMATE) for Fraser sockeye populations (n = 351) in the cleaned NuSEDS data before filtering out zeros.

fig Figure S7. Frequency of the different survey data quality values (ESTIMATE_CLASSIFICATION) in the cleaned NuSEDS data after filtering zeros.

fig Figure S8. Frequency of the different survey data quality values (ESTIMATE_CLASSIFICATION) in the cleaned NuSEDS data before filtering zeros.

fig Figure S9. Frequency of the different survey data quality values per species since 1998 in the cleaned NuSEDS data after filtering zeros. The stream survey quality grouping is based on the field ESTIMATE_CLASSIFICATION: “High” for “TRUE ABUNDANCE (TYPE-1)”; “Medium-High” for “TRUE ABUNDANCE (TYPE-2)”; “Medium” for “RELATIVE ABUNDANCE (TYPE-3)”; “Medium-Low” for “RELATIVE ABUNDANCE (TYPE-4)”; “Low” for “RELATIVE ABUNDANCE (TYPE-5)”, “PRESENCE-ABSENCE (TYPE-6)”, “PRESENCE/ABSENCE (TYPE-6)”, “RELATIVE: VARYING MULTI-YEAR METHODS” and “RELATIVE: CONSTANT MULTI-YEAR METHODS”.

fig Figure S10. Frequency of the different survey data quality values per species since 1998 in the cleaned NuSEDS data before filtering zeros. The stream survey quality grouping is based on the field ESTIMATE_CLASSIFICATION: “High” for “TRUE ABUNDANCE (TYPE-1)”; “Medium-High” for “TRUE ABUNDANCE (TYPE-2)”; “Medium” for “RELATIVE ABUNDANCE (TYPE-3)”; “Medium-Low” for “RELATIVE ABUNDANCE (TYPE-4)”; “Low” for “RELATIVE ABUNDANCE (TYPE-5)”, “PRESENCE-ABSENCE (TYPE-6)”, “PRESENCE/ABSENCE (TYPE-6)”, “RELATIVE: VARYING MULTI-YEAR METHODS” and “RELATIVE: CONSTANT MULTI-YEAR METHODS”.

Table S1. Number of Conservation Units (CUs), populations, and sites present in the NuSEDS data in each region after cleaning (NAs counts were removed, zeros were kept). Sites are unique locations where spawners from one or more CUs may be counted. Populations are unique combinations of CUs and sites.

Region Number of CUs Number of populations Number of sites
Yukon 15 60 54
Transboundary 22 65 37
Haida Gwaii 28 756 233
Nass 20 307 82
Skeena 54 777 265
Central Coast 111 1681 404
Vancouver Island & Mainland Inlets 83 2328 668
Fraser 57 1054 617
Columbia 2 2 1
Overall 392 7030 2361

2 Generealized linear regression

2.1 Summary

The goal of the analysis is to quantify the effect of commercial catch on monitoring effort accounting for the effect of price over time and for each species.

We fit two models where the number of populations monitored (count_pop) is expressed as (1) a function of landing value, i.e., the product of commercial catch in kg (catch_kg) and commercial price per kg (price_kg), and (2) as a function of commercial catch alone. For both models we include species as a predictor and its interaction with the other predictor. We compare the two models using the Akaike information criterion (AIC).

We selected the negative binomial distribution to correct the overdispersion observed when using the Poisson distribution. The overdispersion is estimated by dividing the residual deviance by the residual degrees of freedom and it is < 1.1 for all four models (two models fitted on two datasets, each reasulting from from filtering or keeping the zeros in NuSEDS, respectively). We assessed collinearity among explanatory variables using pairwise plots and correlation coefficients. We used deviance residuals to verify the models’ assumptions with diagnostic plots (i.e., homoscedasticity of the residuals). We calculated a pseudo-R2 as the ratio between the null deviance minus the residual deviance divided by the null deviance.

2.2 The datasets

Dataset containing the number of populations (count_pop) extracted from NuSEDS after filtering out the zeros:

##   species year count_pop
## 1    Pink 1926         0
## 2    Pink 1927         0
## 3    Pink 1928         1
## 4    Pink 1929         0
## 5    Pink 1930         0
## 6    Pink 1933         0

Commercial Canadian catch data – number of fish (count) and weight in kg (wt_kg) – downloaded from the North Pacific Anadromous Fish Commission website in July 2024:

##   species year   count    wt_kg
## 1    Coho 1925 3079200 11739000
## 2    Coho 1926 2843400 10840000
## 3    Coho 1927 2874500 10780000
## 4    Coho 1928 3231200 12151000
## 5    Coho 1929 3236900 12111000
## 6    Coho 1930 3216900 12169000

Price per kg data downloaded from DFO’s 2023 Economic analysis:

##   Year Sockeye Coho Pink Chum Chinook
## 1 1952    6.25 3.27 2.00 1.84    4.19
## 2 1953    5.34 3.08 1.77 1.68    4.29
## 3 1954    5.20 3.56 1.82 1.72    4.39
## 4 1955    5.49 4.02 2.03 2.26    5.25
## 5 1956    6.11 5.04 2.00 2.68    6.03
## 6 1957    6.04 3.42 2.01 1.91    5.10

These three datasets are combined into one for the analysis:

##   species year count_pop count_catch catch_kg price_kg
## 1    Pink 1952       322    11217000 25281000     2.00
## 2    Pink 1953       369    11106000 30476000     1.77
## 3    Pink 1954       438     5439000 12722000     1.82
## 4    Pink 1955       371    11240000 31270000     2.03
## 5    Pink 1956       452     7352000 14311000     2.00
## 6    Pink 1957       367    11310000 28297000     2.01

The data range from 1952 to 2020.

2.3 Analysis excluding the zeros in NuSEDS

Figure S11. Number of populations monitored, catch and price over time for each species. Each variables was normalised across all species to be displayed on the same Y axis. Zeros counts were filtered out in NuSEDS.

2.3.1 Check for collinearity

We check collinearity using pairwise plots comparing covariates and correlation coefficients:

Figure S12. Pairwise plot showing the distribution, relationship and correlation among the variables.

The correlations among explanatory variables are acceptable.

We notice that the explanatory variables catch (catch_kg) and price (price_kg) are right skewed, so we transform them with the square root function to improve the model fit:

Figure S13. Pairwise plot showing the distribution, relationship and correlation among the variables with catch and price transformed with the square root function.

2.3.2 Model definition

We first fit a GLM with a Poisson distribution and the log-link function:

glm_cps <- glm(count_pop ~ catch_kg_sqrt:price_kg_sqrt * species, 
               data = data_here, family = poisson(link = "log"))

glm_cs <- glm(count_pop ~ catch_kg_sqrt * species, 
              data = data_here, family = poisson(link = "log"))

We estimate if there is overdispersion by dividing the residual deviance by the residual degrees of freedom:

s_glm_cps <- summary(glm_cps)
round(s_glm_cps$deviance / s_glm_cps$df.residual,2)
## [1] 17.08
s_glm_cs <- summary(glm_cs)
round(s_glm_cs$deviance / s_glm_cs$df.residual,2)
## [1] 18.72

The values are much larger than 1, indicating overdispersion.

To solve this issue, we fit the same models but with a negative binomial distribution:

glm_cps <- glm.nb(count_pop ~ catch_kg_sqrt:price_kg_sqrt * species, 
               data = data_here, link = "log")

glm_cs <- glm.nb(count_pop ~ catch_kg_sqrt * species, 
              data = data_here, link = "log")
s_glm_cps <- summary(glm_cps)
round(s_glm_cps$deviance / s_glm_cps$df.residual,2)
## [1] 1.04
s_glm_cs <- summary(glm_cs)
round(s_glm_cs$deviance / s_glm_cs$df.residual,2)
## [1] 1.04

There is no more overdispersion.

2.3.3 Deviance residuals plots

We check the deviance residuals for glm_cps:

Figure S14. Residual diagnostic plots for the generalised linear model with a negative binomial distribution and log-link function to predict the number of populations monitored as a function of commercial catches, price and species (n = 345). Zero counts in NuSEDS were removed.

And for glm_cs:

Figure S15. Residual diagnostic plots for the generalised linear model with a negative binomial distribution and log-link function to predict the number of populations monitored as a function of commercial catches and species (n = 345). Zero counts in NuSEDS were removed.

The residuals plots do not invalidate both models.

2.3.4 Model comparison

We compare the two models using AIC and pseudo-R2. The pseudo-R2 is expressed as the ratio between the null deviance minus the residual deviance divided by the null deviance.

##     model    AIC R2_pseudo
## 1 glm_cps 3977.5     0.804
## 2  glm_cs 4007.7     0.786

The model expressing the effect of landed value (i.e,, the product of catch catch_kg and price price_kg) on the number of population monitored (glm_cps) is the best model according to AIC (delta AIC = 30.2).

2.3.5 Parameter estimates

The parameter estimates for glm_cps are:

##                                    parameters estimate     SE P_value
## 1                                 (Intercept)    5.255 0.0709 < 0.001
## 2                                 speciesChum    0.722 0.0919 < 0.001
## 3                                 speciesCoho    0.540 0.0878 < 0.001
## 4                                 speciesPink    0.223 0.0885  < 0.05
## 5                              speciesSockeye    0.192 0.0924  < 0.05
## 6                 catch_kg_sqrt:price_kg_sqrt    0.000 0.0000     0.1
## 7     catch_kg_sqrt:price_kg_sqrt:speciesChum    0.000 0.0000 < 0.001
## 8     catch_kg_sqrt:price_kg_sqrt:speciesCoho    0.000 0.0000 < 0.001
## 9     catch_kg_sqrt:price_kg_sqrt:speciesPink    0.000 0.0000 < 0.001
## 10 catch_kg_sqrt:price_kg_sqrt:speciesSockeye    0.000 0.0000    0.61

The parameter estimates for glm_cs are:

##                      parameters estimate     SE P_value
## 1                   (Intercept)    5.317 0.0782 < 0.001
## 2                 catch_kg_sqrt    0.000 0.0000    0.51
## 3                   speciesChum    0.620 0.1110 < 0.001
## 4                   speciesCoho    0.447 0.0968 < 0.001
## 5                   speciesPink    0.082 0.1056    0.44
## 6                speciesSockeye    0.120 0.1026    0.24
## 7     catch_kg_sqrt:speciesChum    0.000 0.0000  < 0.05
## 8     catch_kg_sqrt:speciesCoho    0.000 0.0000 < 0.001
## 9     catch_kg_sqrt:speciesPink    0.000 0.0000  < 0.05
## 10 catch_kg_sqrt:speciesSockeye    0.000 0.0000    0.78

2.3.6 Predictions

We predict number of populations monitored (count_pop) as a function of Landed value = catch (catch_kg) \(\times\) price (price_kg) for each species using the glm_cps model:

Figure S16. Prediction of the number of populations monitored as a function of landed value (in million of CAD), which is the product between commercial catch (in kg) and price (in CAD per kg). A generalized linear model with a negative binomial distribution and log-link function was fitted with catch, price and species as predictive variables (n = 345). Lines and polygons represent model fits and 95% confidence intervals, respectively; circles represent the data, which span from 1952 and 2020. Zero counts in NuSEDS were filtered out. This figure is the one appearing in the main text.

2.4 Analysis including the zeros in NuSEDS

We use here the equivalent dataset but where zeros were considered as positive observation to assess the number of population monitored in a given year:

##   species year count_pop count_catch catch_kg price_kg
## 1    Pink 1952       322    11217000 25281000     2.00
## 2    Pink 1953       369    11106000 30476000     1.77
## 3    Pink 1954       438     5439000 12722000     1.82
## 4    Pink 1955       371    11240000 31270000     2.03
## 5    Pink 1956       452     7352000 14311000     2.00
## 6    Pink 1957       373    11310000 28297000     2.01

Figure S17: Number of populations monitored, catch and price over time for each species. Each variables was normalised across all species to be displayed on the same Y axis. Zeros counts were not filtered out in NuSEDS (also the figure shows the number).

2.4.1 Check for collinearity

We check collinearity using pairwise plots comparing covariates and correlation coefficients (after transforming catch catch_kg and price price_kg with the square root function):

Figure S18: Pairwise plot showing the distribution, relationship and correlation among the variables with catch and price transformed with the square root function.

The correlations among explanatory variables are acceptable.

2.4.2 Model definition

We first fit a GLM with a Poisson distribution and the log-link function:

glm_cps <- glm(count_pop ~ catch_kg_sqrt:price_kg_sqrt * species, 
               data = data_here, family = poisson(link = "log"))

glm_cs <- glm(count_pop ~ catch_kg_sqrt * species, 
              data = data_here, family = poisson(link = "log"))

We estimate if there is overdispersion by dividing the residual deviance by the residual degrees of freedom:

s_glm_cps <- summary(glm_cps)
round(s_glm_cps$deviance / s_glm_cps$df.residual,2)
## [1] 17.55
s_glm_cs <- summary(glm_cs)
round(s_glm_cs$deviance / s_glm_cs$df.residual,2)
## [1] 19.24

The values are much larger than 1, indicating overdispersion.

To solve this issue, we fit the same models but with a negative binomial distribution:

glm_cps <- glm.nb(count_pop ~ catch_kg_sqrt:price_kg_sqrt * species, 
               data = data_here, link = "log")

glm_cs <- glm.nb(count_pop ~ catch_kg_sqrt * species, 
              data = data_here, link = "log")
s_glm_cps <- summary(glm_cps)
round(s_glm_cps$deviance / s_glm_cps$df.residual,2)
## [1] 1.04
s_glm_cs <- summary(glm_cs)
round(s_glm_cs$deviance / s_glm_cs$df.residual,2)
## [1] 1.04

There is no more overdispersion.

2.4.3 Deviance residuals plots

We check the deviance residuals for glm_cps:

Figure S20. Residual diagnostic plots for the generalised linear model with a negative binomial distribution and log-link function to predict the number of populations monitored as a function of commercial catches, price and species (n = 345). Zero counts in NuSEDS were kept.

And for glm_cs:

Figure S21. Residual diagnostic plots for the generalised linear model with a negative binomial distribution and log-link function to predict the number of populations monitored as a function of commercial catches and species (n = 345). Zero counts in NuSEDS were kept.

The residuals plots do not invalidate both models.

2.4.4 Model comparison

We compare the two models using AIC and pseudo-R2. The pseudo-R2 is expressed as the ratio between the null deviance minus the residual deviance divided by the null deviance.

##     model    AIC R2_pseudo
## 1 glm_cps 4006.7     0.785
## 2  glm_cs 4036.2     0.766

The model expressing the effect of landed value (i.e,, the product of catch catch_kg and price price_kg) on the number of population monitored (glm_cps) is the best model according to AIC (delta AIC = 29.5).

2.4.5 Parameter estimates

The parameter estimates for glm_cps are:

##                                    parameters estimate     SE P_value
## 1                                 (Intercept)    5.300 0.0718 < 0.001
## 2                                 speciesChum    0.689 0.0932 < 0.001
## 3                                 speciesCoho    0.537 0.0890 < 0.001
## 4                                 speciesPink    0.186 0.0897  < 0.05
## 5                              speciesSockeye    0.441 0.0935 < 0.001
## 6                 catch_kg_sqrt:price_kg_sqrt    0.000 0.0000    0.22
## 7     catch_kg_sqrt:price_kg_sqrt:speciesChum    0.000 0.0000 < 0.001
## 8     catch_kg_sqrt:price_kg_sqrt:speciesCoho    0.000 0.0000 < 0.001
## 9     catch_kg_sqrt:price_kg_sqrt:speciesPink    0.000 0.0000 < 0.001
## 10 catch_kg_sqrt:price_kg_sqrt:speciesSockeye    0.000 0.0000    0.07

The parameter estimates for glm_cs are:

##                      parameters estimate     SE P_value
## 1                   (Intercept)    5.376 0.0792 < 0.001
## 2                 catch_kg_sqrt    0.000 0.0000    0.92
## 3                   speciesChum    0.577 0.1124 < 0.001
## 4                   speciesCoho    0.434 0.0980 < 0.001
## 5                   speciesPink    0.034 0.1070    0.75
## 6                speciesSockeye    0.370 0.1036 < 0.001
## 7     catch_kg_sqrt:speciesChum    0.000 0.0000  < 0.01
## 8     catch_kg_sqrt:speciesCoho    0.000 0.0000 < 0.001
## 9     catch_kg_sqrt:speciesPink    0.000 0.0000  < 0.01
## 10 catch_kg_sqrt:speciesSockeye    0.000 0.0000    0.47

2.4.6 Predictions

We predict number of populations monitored (count_pop) as a function of Landed value = catch (catch_kg) \(\times\) price (price_kg) for each species using the glm_cps model:

Figure S22. Prediction of the number of populations monitored as a function of landed value (in million of CAD), which is the product between commercial catch (in kg) and price (in CAD per kg). A generalized linear model with a negative binomial distribution and log-link function was fitted with catch, price and species as predictive variables (n = 345). Lines and polygons represent model fits and 95% confidence intervals, respectively; circles represent the data, which span from 1952 and 2020. Zero counts in NuSEDS were kept.

3 NuSEDS Data Processing

The R script containing the code for the NuSEDS data processing procedure described below is available in a Zenodo repository (https://zenodo.org/records/14194638). Complete details of the cleaning procedure can be found at: https://bookdown.org/salmonwatersheds/nuseds_cleaning_procedure_atkinson/1_nuseds_collation_1.html.

3.1 Cleaning Procedure

The objective of the procedure is to obtain the yearly counts of each salmon population in their respective stream and associate these populations to their corresponding conservation unit (CU). The NuSEDS data is separated into two datasets. The all_areas_nuseds dataset contains the observed yearly counts (related fields: NATURAL_ADULT_SPAWNERS, NATURAL_SPAWNERS_TOTAL, etc.) for each population (related fields: SPECIES, POP_ID, POPULATION) in their respective site (related fields: AREA, GFE_ID, WATERBODY, GAZETTED_NAME, etc.), along with the associated methods used (related fields: ESTIMATE_METHOD, ESTIMATE_CLASSIFICATION, ENUMERATION_METHODS).

We first define the unique yearly count field MAX_ESTIMATE for each population as the maximum value of the count-related fields in all_areas_nuseds, i.e., NATURAL_ADULT_SPAWNERS, NATURAL_JACK_SPAWNERS, NATURAL_SPAWNERS_TOTAL, ADULT_BROODSTOCK_REMOVALS, JACK_BROODSTOCK_REMOVALS, TOTAL_BROODSTOCK_REMOVALS, OTHER_REMOVALS and TOTAL_RETURN_TO_RIVER. MAX_ESTIMATE is the only count-related field we use in the rest of the procedure. A population’s MAX_ESTIMATE data points is referred to as its “time series”.

The second dataset, conservation_unit_system_sites, links each population (related fields: POP_ID) to their respective CU (related fields: CU_NAME, FULL_CU_IN, CU_LONGT, CU_LAT, etc.) and site (related fields: GFE_ID, SYSTEM_SITE, X_LONGT, Y_LAT). Ideally, attributing each time series in all_areas_nuseds its corresponding CU and location’s coordinates in conservation_unit_system_sites would simply consist in merging the two datasets using the population and location identification number POP_ID and GFE_ID, respectively. Unfortunately, numerous time series are problematic, which occurs when:

  • A time series is present in all_areas_nuseds but its POP_ID and GFE_ID association is absent in conservation_unit_system_sites (there are 4428 populations in that case).

  • Multiple time series of thea same population (i.e. same POP_ID) are observed in multiple locations (i.e. different GFE_IDs), which should not occur because a POP_ID should be defined for a unique location.

  • Multiple populations (i.e. different POP_ID) of a same CU are observed in the same location (i.e. same GFE_ID), suggesting that these populations should form one unique population.

The observation of problematic time series revealed inconsistencies such as missing, duplicated, and conflicting data points (i.e. different counts in the same year). In order to rescue as much data as possible, we attempted to solve these issues by either (i) deleting, (ii) combining or (iii) summing a problematic series to an alternative series (i.e., a time series present in all_areas_nuseds that shares either the same POP_ID or GFE_ID and species), or (iv) creating a new reference to the time series in conservation_unit_system_sites. In each case, we used all the information available in the different fields and made assumptions based on our professional judgment. Below are more details about the different interventions applied to problematic time series:

  • We deleted a problematic time series (or a portion of it) from all_areas_nuseds when (1) it was a duplicate of another time series, (2) it had less than four data points and was not spotted during the cleaning process, or (3) the lack of information prevented to combine or merge it to an alternative series or to create a new series in conservation_unit_system_sites (e.g. if run timing was unknown).

  • We combined a problematic time series with an alternative series when data points were not in conflict (i.e. different value for a same year) and the information available in the other fields was not prescriptive (e.g. the population belong to the same CU, the locations have similar names or spatial coordinates, the runtimings are not different).

  • We summed a problematic time series to an alternative series having different POP_ID but same GFE_ID and CU_NAME and with conflicting data points.

  • We created a new reference of the problematic time series in conservation_unit_system_sites (i.e. a new row with a unique POP_ID - GFE_ID combination) when it could not be deleted, merged or combined to an alternative series.

The diverse and idiosyncratic nature of these problematic cases prevented the establishment of a systematic and simple cleaning process. Many of these cases were resolved individually by observing and comparing time series and accounting for the information available. We proceeded in consecutive steps in which we applied the different types of intervention mentioned above, either after visually inspecting individual time series or by applying a systematic procedure. These steps are detailed below:

  1. Remove the time series (i.e. unique POP_ID - GFE_ID associations) with only NAs values for MAX_ESTIMATE in all_areas_nuseds (corresponding to 4424 time series, 101,944 rows or 25% of all_areas_nuseds)

  2. Assess all the references to time series present in conservation_unit_system_sites but absent in all_areas_nuseds (247 out of 714 time series in all_areas_nuseds are not referenced in conservation_unit_system_sites)

2.2. Cases with presence of alternative series in NUSEDS (n = 3)

  • Each case was assessed visually.

2.3. Cases with no alternative time series in all_areas_nuseds (n = 243)

  • The rows corresponding to these time series are removed from conservation_unit_system_sites (242 of the 243 references correspond to time series in all_areas_nuseds removed in step 1. above because they contained only NAs).

2.4. Case with multiple GFE_ID for a single POP_ID (n = 1)

  • The problematic time series was simply removed from conservation_unit_system_sites.
  1. Case of time series with the same POP_ID but different GFE_IDs in conservation_unit_system_sites and present in all_areas_nuseds (n = 1).
  • Two time series had the same POP_ID but different GFE_ID. Both time series were kept because they are not compatible and have many and recent data points.
  1. Assess all the time series present in all_areas_nuseds but absent in conservation_unit_system_sites (n = 242)

4.1. Cases of series with alternative POP_ID and no alternative GFE_ID (n = 126)

  • Each case was assessed visually.

4.2. Cases of series with alternative GFE_ID with or without alternative POP_ID (n = 64)

  • Problematic series with the same POP_IDs (e.g. series with POP_ID = 47925 and GFE_ID = 15, 31516 and 31740, respectively) having the same alternative series (e.g. series with POP_ID = 47925 and GFE_ID = 14) are assessed together, resulting in 37 groups of time series to assess separately.

4.3. Cases with no alternative series (n = 166)

  • We attempted to associate these time series to a CU using the fields X_LONGT and Y_LAT to intersect them with the CUs’ shapefiles used in the PSE.

  • Time series without coordinates (or for which coordinates could not be found) were removed from all_areas_nuseds (n = 5).

  • The reference of the rest of the time series was added to conservation_unit_system_sites

  1. Merge all_areas_nuseds and conservation_unit_system_sites using the fields POP_ID and GFE_ID.

  2. Assess time series with different POP_ID but a same CU_NAME and GFE_ID (n = 126).

  • Case 1: there is only one data point in one of the series; if it is compatible, then merge to the alternative series, otherwise remove it from all_areas_nuseds.

  • Case 2: One of the series is 100% duplicated with the other one: to remove from all_areas_nuseds.

  • Case 3: the rest of the series are merged; adding the values when data points are in conflict.

3.2 Attribution of the geographic regions to CUs

We matched the CUs in the cleaned NuSEDS data with the CUs defined in the Pacific Salmon Explorer, which are associated with a given region (PSE; https://salmonexplorer.ca). We used the fields POP_ID, FULL_CU_IN and CU_NAME. There are 23 CU_NAME not matching to any CUs in the PSE, corresponding to 87 populations and 2256 data points. These time series are kept.

3.3 Managing NA and zero counts

For the analysis presented in the main text, we excluded data points where MAX_ESTIMATE = NA (n = 156,507) or 0 (n = 3,449) and we present here in the appendix the results of the analyses done on the cleaned NuSEDS dataset where zeros are kept.