3 Protocol variables

The second main step in our data pre-processing is to correct for protocol variables. Protocol variables are parameters that may influence the compound level between sample collection and the final concentration measurement. The most common example is the measurement batch. The correction method we use is described in (Mortamais et al. 2012): for each compound, a multiple linear regression⁴ of the compound level on the protocol variables is fitted, adjusted for confounders. The effects estimated by the relevant model \(\beta\)s are then subtracted from the measured concentration.

The steps are therefore:

Identify the protocol variables that might have an influence on the compound level
Identify confounders
Decide which protocol variables to actually correct for
Apply correction

3.1 Identification of protocol variables

Variables that we identified in our example data:

Time of transport from place of collection to storage freezer (transport_time)
Thawing time before sample pooling (defreeze_time)
Measurement batch when measured at NIPH (batch)

3.2 Confounders

Using a DAG (not shown here) and the results shown in [phenols paper], we identified the following confounders of the association between phenol level and the protocol variables listed above:

Specific gravity
Sample date (linear time trend)
Sample season (categorical)
Sampling pregnancy trimester (T1 vs T3)
Maternal education level (categorical)
Maternal age (continuous)
Maternal pre-pregnancy BMI (continuous)
Parity (continuous)

3.3 Selection of influential variables

Now for each compound we apply the following regression:

\[phenol = protocol\_variables + confounders\]

From this regression we extract the Wald test p-value to quantify the overall effect of each protocol variable. We correct for every protocol variable with p < 0.2.
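The regression step can be sketched as follows. This is a minimal illustration in plain Python, assuming ordinary least squares with one continuous protocol variable and one confounder. The function `ols` and all data values are hypothetical and are not taken from the study. The sketch returns only the coefficients; it does not compute the Wald p-values, for which a statistics package would normally be used.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    n, p = len(X), len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)]
        + [sum(X[k][i] * y[k] for k in range(n))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Hypothetical design matrix: intercept, transport_time, specific gravity
# (made-up, rescaled values for illustration only)
X = [
    [1.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
    [1.0, 2.0, 3.0],
    [1.0, 3.0, 1.0],
    [1.0, 4.0, 2.0],
]
# Hypothetical compound levels, generated as 1 + 2 * transport_time - 0.5 * sg
y = [0.5, 3.0, 3.5, 6.5, 8.0]

beta = ols(X, y)
beta_transport_time = beta[1]  # coefficient later used for the correction
```

Only the coefficients of the protocol variables are carried forward to the correction step; the confounder coefficients serve to adjust the estimates.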

Table 3.1: p-values in the adjusted regressions
	batch	defreeze_time	transport_time
BPA	0	0.36	0.68
ETPA	0.06	0.93	0.29
MEPA	0.1	0.95	0.01
MMCHP	0	0.54	0.48
OXBE	0.06	0.12	0.73
PRPA	0.02	0.73	0.08
TRCS	0.42	0.41	0.9
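The selection rule can be applied to the table mechanically. The following Python sketch uses the p-values of Table 3.1 and the 0.2 threshold stated in the text; the names `P_VALUES` and `select_protocol_variables` are hypothetical helpers introduced for illustration.

```python
# p-values copied from Table 3.1 (values shown as 0 are rounded)
P_VALUES = {
    "BPA":   {"batch": 0.00, "defreeze_time": 0.36, "transport_time": 0.68},
    "ETPA":  {"batch": 0.06, "defreeze_time": 0.93, "transport_time": 0.29},
    "MEPA":  {"batch": 0.10, "defreeze_time": 0.95, "transport_time": 0.01},
    "MMCHP": {"batch": 0.00, "defreeze_time": 0.54, "transport_time": 0.48},
    "OXBE":  {"batch": 0.06, "defreeze_time": 0.12, "transport_time": 0.73},
    "PRPA":  {"batch": 0.02, "defreeze_time": 0.73, "transport_time": 0.08},
    "TRCS":  {"batch": 0.42, "defreeze_time": 0.41, "transport_time": 0.90},
}


def select_protocol_variables(p_values, threshold=0.2):
    """For each compound, keep the protocol variables with p < threshold."""
    return {
        compound: [var for var, p in row.items() if p < threshold]
        for compound, row in p_values.items()
    }


selected = select_protocol_variables(P_VALUES)
for compound, variables in selected.items():
    print(compound, variables)
```

With these values, batch is selected for every compound except TRCS, for which no protocol variable passes the threshold.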

3.4 Variable correction

Once the protocol variables to correct for are identified for each phenol (p < 0.2, shown in red in Table 3.1), the correction is applied to the final values using the (Mortamais et al. 2012) method, where:

\[val\_cor_i = val\_crude_i - \sum_j \beta_j \, (X_j^i - X_j^{ref})\] Where:

\(i\) indexes the individuals
\(j\) indexes the protocol variables selected for correction
\(\beta_j\) the regression coefficient of the j^th protocol variable
\(X_j^i\) the value of the j^th protocol variable for the i^th individual
\(X_j^{ref}\) the chosen reference for protocol variable j (the median for continuous variables and the category with the most observations for categorical variables)
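The formula above can be sketched in Python as follows. All values, coefficients, and the helper names `reference` and `correct` are hypothetical. For categorical variables the sketch assumes dummy coding, so that the term \(\beta_j (X_j^i - X_j^{ref})\) becomes the difference between the coefficient of the individual's category and that of the reference category.

```python
from collections import Counter
from statistics import median


def reference(values):
    """Median for numeric values, most frequent category otherwise."""
    if all(isinstance(v, (int, float)) for v in values):
        return median(values)
    return Counter(values).most_common(1)[0][0]


def correct(crude, protocol, betas):
    """Subtract the estimated protocol effects from the crude values.

    crude:    list of measured values, one per individual
    protocol: dict mapping variable name -> list of values per individual
    betas:    dict mapping variable name -> slope (continuous variable)
              or dict of per-category coefficients (categorical variable)
    """
    corrected = list(crude)
    for var, values in protocol.items():
        ref = reference(values)
        for i, x in enumerate(values):
            if isinstance(betas[var], dict):
                # Categorical: coefficient relative to the reference category
                shift = betas[var][x] - betas[var][ref]
            else:
                # Continuous: slope times distance from the reference value
                shift = betas[var] * (x - ref)
            corrected[i] -= shift
    return corrected


# Hypothetical example with four individuals
crude = [2.0, 2.5, 3.0, 3.5]
protocol = {
    "transport_time": [1.0, 2.0, 3.0, 4.0],  # reference: median = 2.5
    "batch": ["A", "A", "B", "A"],           # reference: "A" (most frequent)
}
betas = {"transport_time": 0.5, "batch": {"A": 0.0, "B": 1.0}}

corrected = correct(crude, protocol, betas)
print(corrected)
```

An individual whose protocol variables all equal the reference values is left unchanged, which is what makes the corrected values comparable across batches and handling times.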

Example:

Figure 3.1: Benzophenone-3 levels per batch before and after correction

Strong residual variability between batches remains because phenols are known to vary a lot with season [phenols paper] and season varies strongly across batches. Here is another example with MMCHP, a phthalate for which we used the same method:

Figure 3.2: MMCHP levels per batch before and after correction

⁴ Given that phenol is the outcome here, an interval-censored regression such as R’s survival::survreg could be used, where values below the limit of detection (LOD) are considered censored between 0 and LOD.