6 Results in context and beyond: What do the data signify?
This section addresses interpretation of results within the context of public health. In the previous section, we focused on analysis and formal results, primarily in terms of mathematical assumptions and interpretations. In this section, we examine how authors place their data, methods, and results into the context of public health, including discussion and recommendations.
6.1 Interpret effects and their significance
Calculated values, including values derived from models, should be interpreted in their public health context. First, what is the clinical or public health significance of a measured or derived magnitude of an effect? What does the effect size or contrast mean in public health practice? Second, while taking into account the variability or uncertainty that comes with data, how compatible is the effect size with a reference hypothesis about the world? Because the effect measure is estimated from data, the analyst can only say that the data are more or less compatible with the hypothesis of interest, not whether the hypothesis itself is true or false.
Principles: Measured effects of interest should foremost be described in terms of their public health (or clinical) significance and the variability or uncertainty that comes from using observed data. There is an inherent tension between stating findings and justifying interpretations on one hand and avoiding underinterpretation and overinterpretation on the other.
- Interpretation of an effect must match the method(s) used to measure or derive the effect.
- Always describe sources of uncertainty, including random variation and nonrandom influences on observation and measurement.
- Statistical significance, if presented at all, should be used only to assess the compatibility of results with a background hypothesis, mindful of additional assumptions, potential biases, and variability.
Observations:
- Report mm6903a1 (Peterson et al. 2020) relies exclusively on statistically significant differences above a population threshold to identify industry segments with higher than average proportions of suicides among civilian, noninstitutionalized working persons aged 16–64 years. The report states but does not further interpret the magnitude of the highest values.
- Report mm6906a3 (Divers et al. 2020) limited the interpretation of results, which rely heavily on statistical significance. The following sentence in the discussion overinterprets changes that are not statistically significantly different from 0: “Since 2012, the rate of increase in type 2 diabetes has not changed, and has also remained constant for type 1 diabetes, except among Asians and Pacific Islanders.” It is not clear whether this statement pertains to changes from 2012 to 2015 or contrasts between changes during 2002–2010 and 2011–2015. In either case, failing to reject a slope of 0 does not generally imply that the slope is 0.
- Reports mm7006e2 (Joo et al. 2021) and mm7010e3 (Guy et al. 2021) present associations between mask-mandate policies and aggregated incidence values. Although the regression model that is used in both reports is attested in the literature, the analytic method does not admit a straightforward interpretation from which public health implications can be inferred.
- Report mm7010e4 (Kompaniyets et al. 2021) fits logistic models with fractional polynomial terms to pick up nonlinear associations between body mass index and the risk of 4 possible outcomes. The report interprets the estimated probability of each outcome (conditional on specific BMI values) as risk, and it uses marginal-effect methods to estimate predictive margins, which are interpreted as risk ratios. The methods and interpretation appear to be sound, with 3 considerations: (1) The finding of nonlinear associations through fractional polynomials is notable. The authors perform limited comparison to other functional forms for BMI, thus overinterpreting the specificity of the functional form. (2) In particular, they overinterpret the estimates for BMI values with the least risk and present those estimates without a sense of their variability (unlike the paper on which the methods are based [Wong, 2011]). (3) The use of predictive margins is uncommon in the MMWR, especially risk ratios (rather than odds ratios) derived from logistic regression; readers would therefore benefit from more guidance or references on the method and interpretation.
- Report mm7043e2 (Xu et al. 2021) analyzes mortality risk associated with Covid-19 vaccines by comparing 6.4 million vaccine recipients to 4.6 million unvaccinated persons. The results indicate lower mortality among all vaccinated age groups ≥18 years, where all 95% pointwise confidence interval upper limits for adjusted relative risks are 0.82 or less. Among 12–17-year-old mRNA vaccine recipients, however, the relative risks are 0.85 and 0.73, and the upper limits of the 95% CIs are 1.90 and 1.64. Therefore, when the report asserts that “there is no increased risk for mortality” and that “[t]his finding reinforces the safety profile”, these claims should be limited to adults. The results in adolescents are subject to too much variability to support the strong claims.
- Report mm7121e1 (Bull-Otterson et al. 2022) analyzes data in which persons under analysis have follow-up that varies in duration between 30 and 365 days. Thus the method yields cumulative incidence and does not admit a straightforward interpretation in terms of “absolute risk difference”, although the report frames it in that way. With additional assumptions, the analysis could estimate the risk difference, but this report does not state or apply such assumptions.
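To make the variability in the adolescent estimates from report mm7043e2 concrete, the following sketch recovers approximate lower confidence limits from the point estimates and upper limits quoted above. It assumes that each 95% CI is symmetric around the estimate on the log scale (a Wald-type interval), which the report does not state; the function names are illustrative only.

```python
def implied_lower_limit(rr, upper):
    """Lower limit implied by a CI that is symmetric around log(rr).

    Assumption (not stated in the report): the interval was built as
    exp(log(rr) +/- z * se), so that lower * upper == rr ** 2.
    """
    return rr ** 2 / upper


def upper_to_lower_ratio(rr, upper):
    """Width of the interval expressed as upper limit / lower limit."""
    return upper / implied_lower_limit(rr, upper)


# Adjusted relative risks and upper 95% limits for recipients aged
# 12-17 years, as quoted in the observation on report mm7043e2.
adolescent_estimates = [(0.85, 1.90), (0.73, 1.64)]

for rr, upper in adolescent_estimates:
    lower = implied_lower_limit(rr, upper)
    fold = upper_to_lower_ratio(rr, upper)
    print(f"RR {rr:.2f}: approx. 95% CI ({lower:.2f}, {upper:.2f}), "
          f"upper/lower = {fold:.1f}")
```

Under that assumption, the implied lower limits are roughly 0.38 and 0.32, so each interval spans about a fivefold range that includes both substantial protection and substantial harm.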
Recommendations:
- Reports should, by default, interpret the public health significance of the main reported values. Exceptions should be justified, with an explicit note that the public health significance remains to be determined.
- P-values and CIs should be interpreted in the context of public health significance. It is insufficient to leave all inferential procedures to be interpreted on their own.
- Avoid misleading implications by qualifying whether a statement comprises a comparison of observed magnitude or a formal inferential procedure.
- “Avoid nontechnical uses of technical terms in statistics, such as ‘random’, ‘normal’, ‘significant’, and ‘expected’.” (Bailar and Mosteller 1988) The word “significant” and its variants and negations should always be qualified, at least on the first occurrence within a given context, as denoting statistical, clinical, public health, or other significance.
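As a minimal sketch of why a CI should be read against public health significance rather than only against 0, the following example checks which candidate values fall inside a Wald-type interval. All numbers are hypothetical: the slope, its standard error, and the "meaningful" value of 1.5 are invented for illustration and come from no report reviewed here.

```python
def wald_interval(estimate, se, z=1.96):
    """Approximate 95% interval: estimate +/- z * standard error."""
    return estimate - z * se, estimate + z * se


def is_compatible(value, interval):
    """True if the value lies inside the interval."""
    lower, upper = interval
    return lower <= value <= upper


# Hypothetical change in slope (percentage points per year) and its SE.
slope_change, se = 0.8, 0.6
interval = wald_interval(slope_change, se)

# The interval contains 0, so a test of "no change" is not rejected ...
no_change_ok = is_compatible(0.0, interval)
# ... but it also contains a hypothetical change large enough to matter.
meaningful_ok = is_compatible(1.5, interval)

print(f"interval = ({interval[0]:.3f}, {interval[1]:.3f})")
print(f"compatible with 0:   {no_change_ok}")
print(f"compatible with 1.5: {meaningful_ok}")
```

In this hypothetical case the data are compatible both with no change and with a change of 1.5, so a statement that the rate "has not changed" would claim more than the interval supports.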
6.2 Describe potential impact to the extent possible
By design, MMWR full reports are often explicitly connected to assessing public health impact or recommending specific interventions through policy, education, vaccination, and other actions. Yet the nature of observational or incidental data especially complicates analyses for both descriptive and causal inference without additional, explicit assumptions. Causal concepts can be supported by, but almost never flow directly from, statistical analysis.
Principles: Reports should approach causal claims carefully and explicitly, with particular attention to assumptions and subject-matter context that go beyond statistics. It is especially important to relate clearly the inferential support from the report’s analyses to all implicit or explicit claims or recommendations regarding public health actions and impact, and to avoid both understating what would need to be known and overstating the value of evidence presented.
A multinational, systematic assessment of causal and associational language in observational health research (Haber et al. 2022) screened 1,170 articles from 18 high-profile journals. In their discussion, the authors wrote the following (pp 2094–2095):
The practice of avoiding causal language linking exposures and outcomes appears to add little if any clarity. … Misalignment between the research question being asked and action implications is on its own a source of confusion, which could be avoided if the causal nature of the research question were made explicit. … Authors, reviewers, and editors should focus on being clear about what questions are being asked, what decisions are being informed, and the degree to which we are and are not able to achieve those goals.
Observations: We evaluate 4 (slightly overlapping) groups of reports on the ways that they handle explicit or implicit causal concepts: vaccine effectiveness, changes over time, expressions of causal intent, and linking reports to public health recommendations. Reports commonly describe the current state of a public health phenomenon or its state over time for one of a few purposes: to determine whether, when, and how to apply existing guidance; to set a baseline for further monitoring, especially with emerging conditions like Covid-19; or to monitor phenomena relative to existing knowledge. In some cases, reports help to set, affirm, or revise guidance.
- Vaccine effectiveness: Among the reports reviewed here, 7 assess vaccine effectiveness, or the difference in untoward health outcomes between those who have received the prescribed vaccine regimen and those who have not, usually interpreted in the context of phase 3 randomized controlled trials. These 7 reports cover influenza (mm6806a2 (Doyle et al. 2019)) and Covid-19 (mm7011e3 (Britton et al. 2021), mm7013e3 (Thompson et al. 2021), mm7018e1 (Tenforde et al. 2021), mm7032e3 (Moline et al. 2021), mm7037e1 (Scobie et al. 2021), mm705152a2 (Lutrick et al. 2021)) vaccines, using 5 different study designs: prospective cohort (mm7013e3 (Thompson et al. 2021) and mm705152a2 (Lutrick et al. 2021) apply hazard regression methods), retrospective cohort (mm7011e3 (Britton et al. 2021) applies hazard regression), case-control or “test-negative” (mm6806a2 (Doyle et al. 2019) and mm7018e1 (Tenforde et al. 2021) apply logistic regression), hybrid cohort and ecologic or surveillance or the “screening method” (mm7032e3 (Moline et al. 2021) applies Poisson regression), and fully ecologic or surveillance (mm7037e1 (Scobie et al. 2021) uses empirical estimation methods). These reports show consistent care in addressing possible uncontrolled confounding, limited sample sizes (where applicable), consistency with signals from phase 3 trials, and acknowledgment of the relative strength of evidence conferred by each observational study design.
- Changes over time: Here we focus on 8 reports that explicitly characterize results from before and after a reference event.
  - The 5 reports that apply a version of piecewise linear spline regression can be distinguished by whether the location of a change point was preselected (e.g., to capture a change in policy or practice at a specified time) or data-driven (e.g., to help describe temporal changes) and whether the modeled relationship at change points was continuous or discontinuous. The 3 reports that assessed policy changes (mm6802a1 (García et al. 2019), mm6947e2 [“segmented regression”], and mm7110e1 [“comparative interrupted time series”]) used preselected change points; the latter 2 reports allowed for and found discontinuities, while mm6802a1 (García et al. 2019) required continuous relationships. The other 2 piecewise spline reports (mm6906a3 (Divers et al. 2020) and mm6927a4 (Waltzman et al. 2020)) used the joinpoint method to find data-driven change points, producing descriptive models that could be straightforwardly interpreted.
Report mm6927a4 (Waltzman et al. 2020): “… it is difficult to tell whether decreases in injuries result from interventions, decline in participation, or a combination of both. … it cannot be determined whether the observed changes in the number of ED visits resulted from an actual change in incidence, care-seeking behaviors, or other reasons.”
Report mm6947e2 (Van Dyke et al. 2020): “After implementation of mask mandates …, the increasing trend in COVID-19 incidence reversed. … [Mandated] counties … appear to have mitigated the transmission of COVID-19, whereas [nonmandated] counties continued to experience increases in cases. … The decreased COVID-19 incidence among mask-mandated counties in Kansas occurred during a time when the only other state mandates issued were focused on mitigation strategies for schools as they reopened in mid-August. … the ecologic design of this study and limited information on [behaviors and enforcement] limit the ability to determine the extent to which the countywide mask mandates accounted for the observed declines in COVID-19 incidence in mandated counties. … countywide mask mandates appear to have contributed to the mitigation of COVID-19 spread in Kansas counties that had them in place.”
Report mm7110e1 (Donovan et al. 2022): “… this was an ecologic study … the pre- and postimplementation of mask policy analysis in a subset of 26 school districts could not separately investigate the impact of full and partial mask policies because of small sample sizes. … This investigation indicates that school mask policies were associated with lower COVID-19 incidence in areas with moderate to substantial community transmission.”
  - The other 3 reports focused on empirical or model-based contrasts with percentages (mm7010e3) or incidence rates (mm7037e1 (Scobie et al. 2021) and mm7039e3) at a reference time point. Report mm7010e3 (Guy et al. 2021) applies a weighted least squares model to calculate differences, with pointwise confidence intervals, before and after specified county-level policy changes. Report mm7037e1 (Scobie et al. 2021) presents age-standardized incidence values and patterns before and after predominance of the delta variant, stratified by vaccination status. Report mm7039e3 (Budzyn et al. 2021) contrasts empirical incidence values before and after a change in policy.
Report mm7010e3 (Guy et al. 2021): “In this study, mask mandates were associated with reductions in COVID-19 case and death growth rates within 20 days, whereas allowing on-premises dining at restaurants was associated with increases in COVID-19 case and death growth rates after 40 days.”
Report mm7039e3 (Budzyn et al. 2021): “this was an ecologic study, and causation cannot be inferred …. The results of this analysis indicate that increases in pediatric COVID-19 case rates during the start of the 2021–22 school year were smaller in U.S. counties with school mask requirements than in those without school mask requirements.”
- Causal intent: Several reports directly address an intention to corroborate causation or to explicitly disavow a causal interpretation. Others address causal ideas indirectly through discussion of effect modification, evidence, or propensities. These reports typically distinguish results that are consistent with causal inferences from evidence of causation.
  - Causation: Report mm7018e1 (Tenforde et al. 2021): “Postmarketing observational studies are important … to strengthen evidence from clinical trials of vaccine efficacy. … the case-control design infers protection based on associations between disease outcome and previous vaccination but cannot establish causation.”
Report mm7039e3 (Budzyn et al. 2021): “… this was an ecologic study, and causation cannot be inferred.”
Report mm7024e1 (Yard et al. 2021): “… this analysis was not designed to determine whether a causal link existed between these trends and the COVID-19 pandemic.”
  - Effect modification: Report mm7047e1 (DeSisto et al. 2021) states, “Effect modification by period was assessed using adjusted models with interaction terms.” This statement amounts to a causal claim; whereas “interaction” is a statistical concept, “effect modification” is a causal interpretation of the modeled interactions. The report otherwise takes care to describe associations rather than causal links. The interaction of interest contrasts adjusted relative risks (rate ratios from a loglinear model): 1.47 during the pre-delta period and 4.04 during the delta period, with a P-value < 0.001 for the contrast; the report does not, however, quantify or interpret this contrast (a ratio of rate ratios of about 2.75).
  - Corroborate, reinforce: Report mm705152a2 (Lutrick et al. 2021): “The VE estimates described in this report for the Pfizer-BioNTech vaccine in real-world conditions during the period of Delta variant predominance corroborate and expand upon the VE estimates from other recent studies in adolescents and reinforce previous findings that current vaccination efforts are resulting in substantial preventive benefits among adolescents aged 12–17 years.”
- Linking to recommendations: In their discussion sections, reports typically place results in the context of previous understanding, practice, and policy. Among the reports reviewed here, 27 reports use words like highlight, reinforce, and underscore (to say nothing of similar words used in these and other reports, like demonstrate or show). These linking terms connect each report’s subject to related recommendations. It is often unclear, however, whether these linking words are meant to convey that a report provides evidence for the cited guidance or only that the guidance relates to the subject of the report. We highlight 3 examples:
- Report mm6943e3 (Kambhampati et al. 2020): “These results are consistent with previously reported data suggesting that underlying conditions, including obesity, diabetes, and cardiovascular disease, are risk factors for COVID-19–associated hospitalization and ICU admission. … The findings in this report highlight the need for prevention and management of obesity … to reduce risk [among health care personnel] for poor COVID-19–related outcomes. … It is unknown whether HCP were exposed to SARS-CoV-2 in the workplace or community, highlighting the need for community prevention efforts as well as infection prevention and control measures in health care settings.”
- Report mm7010e4 (Kompaniyets et al. 2021): “These findings highlight the clinical and public health implications of higher BMIs, including the need for intensive COVID-19 illness management as obesity severity increases … and policies to ensure community access to nutrition and physical activities that promote and support a healthy BMI. … These results highlight the need to promote and support a healthy BMI, which might be especially important for populations disproportionately affected by obesity, particularly … populations who have a higher prevalence of obesity and are more likely to have worse outcomes from COVID-19 compared with other populations. The findings in this report highlight a dose-response [sic] relationship between higher BMI and severe COVID-19–associated illness and underscore the need for progressively intensive illness management as obesity severity increases.”
- Report mm705152a3 (Wanga et al. 2021): “Among pediatric patients with COVID-19–related hospitalizations, … few vaccine-eligible patients hospitalized for COVID-19 were vaccinated, highlighting the importance of vaccination for those aged ≥5 years and other prevention strategies to protect children and adolescents from COVID-19, particularly those with underlying medical conditions. … Approximately two thirds of patients hospitalized for COVID-19 aged 12–17 years had obesity. Compared with patients without obesity, those with obesity required higher levels and longer duration of care. These findings … highlight the importance of obesity and other medical conditions as risk factors for severe COVID-19 in children and adolescents. … this study demonstrates that unvaccinated children hospitalized for COVID-19 could experience severe disease and reinforces the importance of vaccination of all eligible children … These data highlight the importance of COVID-19 vaccination for those aged ≥5 years and other prevention strategies to protect children and adolescents from COVID-19, particularly those with obesity and other underlying health conditions.”
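The interaction contrast noted above for report mm7047e1 can be quantified directly from the two adjusted rate ratios quoted in the text. The sketch below is illustrative arithmetic only; the function names are hypothetical, and it does not reproduce the report's model. A confidence interval for the contrast cannot be recovered from the two point estimates alone, because it depends on the covariance of the estimates.

```python
import math


def ratio_of_rate_ratios(rr_reference, rr_comparison):
    """Multiplicative interaction contrast between two periods."""
    return rr_comparison / rr_reference


def log_scale_interaction(rr_reference, rr_comparison):
    """The same contrast on the log scale, as a loglinear model
    would express it through an interaction coefficient."""
    return math.log(rr_comparison) - math.log(rr_reference)


# Adjusted rate ratios quoted in the text for report mm7047e1.
rr_pre_delta, rr_delta = 1.47, 4.04

ratio = ratio_of_rate_ratios(rr_pre_delta, rr_delta)
coefficient = log_scale_interaction(rr_pre_delta, rr_delta)

print(f"ratio of rate ratios: {ratio:.2f}")
print(f"log-scale interaction term: {coefficient:.3f}")
```

Reporting the ratio alongside its P-value would let readers judge the size of the difference between periods, not only whether a difference was detected.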
Recommendations:
Take care with implicit and explicit causal claims, including recommendations for public health interventions or other public health action.
- State and address assumptions that are intended to support causal claims, especially from nonexperimental data. These assumptions, such as assumptions regarding confounders and mediators, often require careful justification because they cannot be directly tested.
- Time-structured analyses, such as joinpoint, interrupted time series, and other time series methods, are especially prone to potentially specious causal claims. Pay special attention to efforts to capture, describe, and draw inferences from changes over time.
- Be especially mindful of words that do or do not imply causal relationships:
  - Association and trend assert quantifiable relationships without causal interpretation.
  - The following words or phrases typically convey a causal interpretation: protect, harm, effective, increase/decrease risk, impact.
  - Take care with interaction (a statistical concept) and the related but distinct effect modification (a causal concept).
- When linking to related but distinct information, be clear about whether or not the report substantively supports or refutes the outside information. Otherwise, it can appear that the link implies substantive support, whether that is warranted or not.
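The distinction drawn earlier between continuous and discontinuous relationships at a preselected change point can be sketched with an ordinary least squares fit. This is a minimal illustration on synthetic, noise-free data, not the method of any report reviewed here; the function names, the change point, and the coefficients are all assumptions made for the example.

```python
import numpy as np


def segmented_design(t, change_point, allow_jump):
    """Design matrix for a piecewise linear model with one change point
    fixed in advance.

    Columns: intercept, time, change in slope after the change point,
    and (only if allow_jump) a level shift at the change point.
    """
    t = np.asarray(t, dtype=float)
    after = (t >= change_point).astype(float)
    columns = [np.ones_like(t), t, after * (t - change_point)]
    if allow_jump:
        columns.append(after)
    return np.column_stack(columns)


def fit_segmented(t, y, change_point, allow_jump):
    """Least squares coefficients for the segmented model."""
    X = segmented_design(t, change_point, allow_jump)
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return coef


# Synthetic series: baseline 5, slope 1, then a level drop of 3 and a
# slope change of -1.5 starting at the (hypothetical) change point 10.
t = np.arange(20)
change_point = 10
y = 5.0 + 1.0 * t + np.where(t >= change_point,
                             -3.0 - 1.5 * (t - change_point), 0.0)

coef_jump = fit_segmented(t, y, change_point, allow_jump=True)
coef_cont = fit_segmented(t, y, change_point, allow_jump=False)

print("discontinuous model:", np.round(coef_jump, 3))
print("continuous model:   ", np.round(coef_cont, 3))
```

The discontinuous model recovers the level shift as its own coefficient, whereas the continuous model must absorb it into the slopes. Neither fit says anything about why the series changed; that attribution rests on assumptions outside the regression.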
6.3 Place interpretation in context, including limitations
MMWR full reports typically include a paragraph that enumerates limitations, in part to establish transparency, and in part to help the reader avoid overinterpreting results and potential implications.
Principles: Enumerated limitations should allow a reader to contextualize the interpretations of findings, including a sense of how the interpretations could differ if the setting or assumptions varied from those in the report.
Observations: The overwhelming majority (about 95%) of weekly reports contain a sentence worded as follows: “The findings in this report are subject to at least [number] limitations.” Over the period under review in this report, 731 weekly reports (not just full reports) enumerated at least 1 limitation, with 104 (14%) listing 6–9 limitations. Among the 56 full reports under review, 55 report at least 1 limitation (all except report mm6841e3 (Siegel et al. 2019), a guidance document), and 11 (20%) list 6–9 limitations.
We summarize limitations from selected reports to demonstrate the range and variety.
- Report mm7024e1 (Yard et al. 2021) lists 9 limitations related to representativeness, variation across sites and time, underreporting and incompleteness, and “this analysis was not designed to determine whether a causal link existed between these trends and the COVID-19 pandemic.”
- Reports mm7004e3 (Falk et al. 2021), mm7011e3 (Britton et al. 2021), mm7047e1 (DeSisto et al. 2021), and mm7104e1 (León et al. 2022) list 7 limitations each.
- Report mm7001a4 (Leidner et al. 2021) is 1 of 6 reports listing 6 limitations, including misclassification, scope, and generalizability across geography, student populations, and time.
Recommendations:
- Ideally, a limitation would be accompanied by the expected implication of that limitation for interpreting the results under discussion. This is especially true for limitations related to bias, unmeasured confounders, and causal claims. The examples above often, but not always, include such possible implications.
- Ensure that limitations pertain to the reported data, methods, results, and interpretation and do not attempt to imply conclusions that are otherwise not supported by the particular study design or sample selection procedures.
- Rather than enumerating each distinct limitation statement, consider grouping limitation statements conceptually, such as those pertaining to potential bias or that might limit generalizability.
- If the number or weight of limitations is substantial, narrow the scope of the report’s claims.