A data-driven approach to identifying PFAS water sampling priorities in Colorado, United States

The purpose of this prediction map was to inform a first-round prioritization approach that targets areas of Colorado with higher risk of PFOS and PFOA contamination and determine information that could be collected to improve the predictive power of future mapping work. From 2023 to 2025, small (between 3000 and 10,000 people) and large (>10,000 people) public water systems are required to test for PFAS during implementation of UCMR 5 [47]. This leaves over 1800 public water systems (including TNCs and NTNCs) without a testing requirement in Colorado and does not include testing of private wells [49]. Further, in 2024 the USEPA released finalized maximum contaminant levels for six PFAS, including PFOS and PFOA [50, 51]. This rule requires sampling for PFAS by public water systems but will not impact private wells or TNCs. The lack of information for private well users means that PFAS risk may continue unchecked.

Recommendations

To address these data gaps, we recommend a particular focus on smaller systems serving rural areas, mobile home parks, and schools that provide water to vulnerable or disproportionately impacted populations. We also recommend a focus on census blocks that have a high proportion of private well users and are considered DI communities.

To maintain focus on vulnerable populations, we have prioritized schools and mobile home parks for sampling and impact assessment efforts. While water systems that serve schools are considered to be NTNC systems, students and employees may spend a majority of their waking time at school, where they may consume significant amounts of drinking water. The lack of information on the temporal relationship between PFAS exposure and the onset of observable symptoms obscures the causal links. However, children are more likely to experience higher exposure per body weight than adults, and studies have found they often carry a higher PFAS body burden than adults [11, 14]. Further, many health effects associated with PFAS exposure have been shown to manifest during adolescence [11, 14, 52].

Mobile home parks are commonly located in DI communities, providing housing for lower-income and socially vulnerable individuals. Mobile home parks often do not have the resources to provide adequate environmental services, including drinking water, storm water and wastewater drainage [53]. A nationwide study found that living in a mobile home park was negatively associated with water service reliability [54]. Further studies in California observed that mobile home parks were more likely to incur violations for health-based standards than their non-mobile home counterparts [55]. To address some of these concerns, Colorado passed a Mobile Home Park Water Quality bill in 2023 that created a water testing program for mobile home parks, which may include testing for PFAS [56].

This work suggests that TNCs and NTNCs should be prioritized as sampling sites. While TNCs and NTNCs are often overlooked in sampling programs due to the limited exposure duration for many who use them, people may rely on these systems as their primary or secondary water source throughout the year. Key examples include employees and students who drink from water systems at churches and schools. We therefore include these system types, alongside very small public water systems (<3000 people), in our sampling list if they are in an area predicted to have elevated contamination risk. Care should be taken to prioritize sampling of these systems not only by predicted risk levels and DI community status, but also by assessing impacts to potentially vulnerable populations, including infants and young children and people who are pregnant, planning to become pregnant, or currently breastfeeding (e.g., evaluate age structure of community served or metrics such as proportion of people participating in the Special Supplemental Nutrition Program for Women, Infants, and Children [i.e., WIC]).

Because there are limited resources to sample private wells, resources should be focused on DI communities with a high proportion of private well users whose groundwater may be at risk of PFAS contamination. Private well owners in DI communities may not have the resources to regularly test and treat their water. CDPHE may also have resources to connect well owners to free or reduced-cost filtration options through its emergency assistance program [19]. Therefore, one goal of this effort was to highlight areas with high densities of private wells for targeted outreach to increase enrollment in the CDPHE’s PFAS sampling programs. Because private wells are spatially dispersed throughout Colorado [57], increased well testing will also help to address important data gaps.

Finally, this report suggests attention be given to specific source types where limited information is available on PFAS releases. Records of PFAS use or release are absent because, over decades of use, these substances were not subject to the environmental regulations that monitor and control releases. The lack of clear patterns in variable importance for most potential PFAS source types points to a need for additional source investigation. CDPHE may explore its authority to investigate various PFAS source types. In 2018, CDPHE added PFOS and PFOA to its state hazardous constituent list. This gives CDPHE the authority to monitor for and address PFOS and PFOA at facilities subject to corrective action under the Resource Conservation and Recovery Act (RCRA). Other sources could be investigated indirectly by sampling groundwater from private wells and/or surface water likely influenced by those sources [19, 58]. At a national level, changes to Toxic Release Inventory reporting to remove the de minimis exemption, as well as designation of PFOS and PFOA as hazardous substances under the Comprehensive Environmental Response, Compensation, and Liability Act (CERCLA), will improve accuracy of information available for future releases of PFAS; however, gaps will persist without investigation of historic or ongoing releases at potential sources [50, 59]. Improved understanding of release volumes of PFAS-containing substances at each source location may also be used in the future as a way to rank and prioritize potential sources.

In future model iterations where the variable importance ranking provides better understanding of the predictive power of various PFAS sources, it will be important to evaluate the spatial structure of the data to consider the effects of spatial autocorrelation. Additionally, as we move towards more consistency in analytical detection capabilities and the number of PFAS analyzed, future work could incorporate time-trend analysis to assess changes in PFAS contamination patterns over time.
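One common way to check for spatial autocorrelation is global Moran's I, computed for example on model residuals or on the risk categories coded as numbers. The sketch below is illustrative only and was not part of this analysis. The function name `morans_i`, the binary distance-band weights, and the toy 4 x 4 grid are all assumptions made for the example; a production analysis would more likely rely on an established GIS or spatial statistics tool and would also test significance.

```python
import math


def morans_i(points, values, distance_band):
    """Global Moran's I with binary weights (w_ij = 1 if 0 < dist <= distance_band).

    I = (n / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2
    Values near +1 suggest clustering, near -1 suggest dispersion.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]

    cross_sum = 0.0   # sum of w_ij * dev_i * dev_j
    weight_sum = 0.0  # W, the sum of all weights
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if math.dist(points[i], points[j]) <= distance_band:
                cross_sum += dev[i] * dev[j]
                weight_sum += 1.0

    variance_sum = sum(d * d for d in dev)
    if weight_sum == 0 or variance_sum == 0:
        raise ValueError("Moran's I is undefined: no neighbours or no variation")
    return (n / weight_sum) * (cross_sum / variance_sum)


# Toy data (hypothetical): a 4 x 4 unit grid of sampling locations.
grid = [(x, y) for x in range(4) for y in range(4)]

# Clustered pattern: high values on the left half, low values on the right half.
clustered = [1.0 if x < 2 else 0.0 for x, y in grid]

# Dispersed pattern: checkerboard of alternating high and low values.
checkerboard = [float((x + y) % 2) for x, y in grid]

print(morans_i(grid, clustered, distance_band=1.0))     # positive
print(morans_i(grid, checkerboard, distance_band=1.0))  # negative
```

A strongly positive value on residuals would indicate that nearby samples carry redundant information, which is one reason to consider spatially blocked validation in later iterations.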

Model strengths

Some strengths of this model include the ability to incorporate and auto-calculate spatial factors (e.g., distance to source) into the prediction model, employ numerous and varied explanatory variables, assess which factors are most predictive of PFAS contamination through the model’s variable importance function, validate many different model iterations with varied input specifications before running the final prediction model, and visually present results in a manner that facilitates understanding and decision-making. This analytical approach has been demonstrated to be effective in projects with similar objectives [22, 24, 60, 61, 62], including a recent study that evaluated the effectiveness of random forest classification in comparison to logistic regression for predicting PFAS contamination in private wells in New Hampshire [23]. The authors of that study found that random forest classification (performed in R rather than ArcGIS Pro) performed better than logistic regression across all five PFAS evaluated [23].

Random forest classification is a particularly effective analytical tool for this project due to its ability to develop predictive maps using many different covariates without an assumed linear relationship with the outcome variable, as is the case with the data used here. Further, random forest classification makes no assumptions about normality, yielding a model that can effectively handle non-parametric, ordinal, and categorical data [41].

Model limitations

Limitations largely stem from data sources themselves, including high LoDs in some portion of the samples, preferential sampling of PFAS at contaminated sites and across Colorado’s Front Range, inconsistency in the number of PFAS analyzed per site, and limited knowledge of occurrence, magnitude, and duration of PFAS releases at most point sources.

With respect to the range of LoDs, there are 51 samples (4% of training data) included in the training dataset which are non-detect but classified in the moderate category rather than the low category due to high LoDs and the substitution method we employed in this work. It is possible that this biases the model predictions high near some sources. For example, one dataset collected in Frisco, Colorado has LoDs of 20 ng/L and 10 ng/L for PFOA and PFOS, respectively. Eleven samples were collected in this area because of nearby sampling results indicating PFAS contamination. All eleven samples in this area came back below detection, but in the training dataset are classified as moderate risk. We do not have extensive data in this area, which is near a ski resort. This could bias the model towards the ski resort source. More information needs to be collected in areas like this to verify results for subsequent model iterations.
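The effect described above can be illustrated with a short sketch. The substitution rule (LoD/2) and the category cutoffs used here are placeholders chosen only to make the example concrete; they are not the substitution method or the risk thresholds used in this work. Only the Frisco LoDs (20 ng/L for PFOA and 10 ng/L for PFOS) come from the text.

```python
def substituted_value(result_ng_l, lod_ng_l, factor=0.5):
    """Return the measured value, or factor * LoD when the result is a non-detect.

    A non-detect is passed as None. factor=0.5 (LoD/2) is an assumed
    convention, not necessarily the rule used in this work.
    """
    if result_ng_l is None:
        return factor * lod_ng_l
    return result_ng_l


def classify_sum(pfoa_ng_l, pfos_ng_l, moderate_cutoff, high_cutoff):
    """Assign a category from summed PFOA + PFOS using caller-supplied cutoffs."""
    total = pfoa_ng_l + pfos_ng_l
    if total >= high_cutoff:
        return "high"
    if total >= moderate_cutoff:
        return "moderate"
    return "low"


# Hypothetical cutoffs in ng/L, for illustration only.
MODERATE_CUTOFF = 10.0
HIGH_CUTOFF = 70.0


def classify_nondetect(pfoa_lod_ng_l, pfos_lod_ng_l):
    """Classify a sample in which both PFOA and PFOS were below detection."""
    pfoa = substituted_value(None, pfoa_lod_ng_l)
    pfos = substituted_value(None, pfos_lod_ng_l)
    return classify_sum(pfoa, pfos, MODERATE_CUTOFF, HIGH_CUTOFF)


# High LoDs as reported for the Frisco dataset: substituted sum is 10 + 5 = 15 ng/L.
print(classify_nondetect(20.0, 10.0))  # moderate

# Lower (hypothetical) LoDs of 2 ng/L each: substituted sum is 1 + 1 = 2 ng/L.
print(classify_nondetect(2.0, 2.0))    # low
```

Under these assumed values, the same non-detect result lands in a different category depending only on the laboratory LoD, which is the mechanism by which high-LoD samples can bias predictions upward.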

On the other hand, 27 of the 51 high LoD samples were collected in El Paso County in relation to investigations of AFFF release from a military base. We have hundreds of samples from this area indicating widespread contamination. The LoDs for these samples are higher compared to other datasets because they were collected in the earlier years of PFAS investigation (2016 and 2018). Given known contamination in this area, it is not as likely that the classification for these 27 samples biases model predictions. Finally, changing LoDs and differences between regulatory quantitation limits and laboratory methods have complicated our ability to assess risk in the past. As we continue to collect data via standardized methods for PFAS with increasingly lower LoDs, we will improve risk classification models and decision-making.

Random forest classification is more effective at handling preferential sampling than alternative methods [61]. It is important to ensure the training dataset (75% of the PFAS sampling results) represents adequate geographic spread and has similar proportions in each category as the full dataset. To account for differences in PFAS results data (number of PFAS analyzed and differences in detection limits), we decided to move forward with a simple aggregation of PFOS and PFOA. While this limits our ability to predict the full spectrum of PFAS contributing to contamination, it enables us to tease out point sources or geographic features that contribute to commonly detected PFAS. While there are some differences in the sources of PFOS and PFOA, there are also many similarities, and these two PFAS are often found together. In 92% of the training data sampling points, the detection status of PFOS and PFOA is the same. In other words, in only 8% of the samples was PFOA detected and PFOS not detected, or vice versa. As Colorado continues to collect data with lower LoDs and an expanded suite of PFAS, we will work to rerun this model for separate PFAS types to better understand source signatures, fate, and transport in Colorado. Finally, while a continuous-scale analysis may be more informative for identifying water systems at highest relative risk, the current data do not suggest benefits of performing regression analysis with continuous values over the type of classification we employed. The merits of classification are further supported in a review of machine learning models to predict potential groundwater contamination, which determined that models generally performed better with classification than regression [63]. While the data used in this work were better suited for classification analysis, whether to run classification or regression depends on the type and extent of available data.
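Two of the checks described above, a 75% training split that preserves category proportions and the share of samples in which PFOS and PFOA have the same detection status, can be sketched in a few lines. This is a minimal illustration on synthetic records, not the workflow used in this work (which was run in ArcGIS Pro). The function names, the record fields (`category`, `pfos_detected`, `pfoa_detected`), and the synthetic data are all assumptions.

```python
import random
from collections import Counter, defaultdict


def stratified_split(records, label_key, train_fraction=0.75, seed=0):
    """Split records so each category keeps roughly the same share in both parts."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for rec in records:
        by_label[rec[label_key]].append(rec)

    train, holdout = [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        cut = round(len(group) * train_fraction)
        train.extend(group[:cut])
        holdout.extend(group[cut:])
    return train, holdout


def class_proportions(records, label_key):
    """Return the fraction of records in each category."""
    counts = Counter(rec[label_key] for rec in records)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def detection_concordance(records):
    """Fraction of samples where PFOS and PFOA are both detected or both non-detect."""
    same = sum(1 for rec in records if rec["pfos_detected"] == rec["pfoa_detected"])
    return same / len(records)


# Synthetic records (hypothetical): 40 low, 40 moderate, 20 high.
records = []
for i in range(100):
    category = "low" if i < 40 else "moderate" if i < 80 else "high"
    pfos_detected = category != "low"
    # Make every tenth record discordant so concordance is below 100%.
    pfoa_detected = (not pfos_detected) if i % 10 == 0 else pfos_detected
    records.append({
        "category": category,
        "pfos_detected": pfos_detected,
        "pfoa_detected": pfoa_detected,
    })

train, holdout = stratified_split(records, "category")
print(len(train), len(holdout))
print(class_proportions(records, "category"))
print(class_proportions(train, "category"))
print(detection_concordance(records))
```

Comparing the two proportion dictionaries gives a quick check that the training subset mirrors the full dataset; geographic spread would need a separate, spatial check.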

Additional information on source types could be useful for better refining the model in the future. For example, landfills likely have a different risk of PFAS contamination based on factors such as age of waste and how the landfill is constructed [64, 65, 66]. Further, the depth to water table data set could be improved to better approximate well depth across Colorado and potentially improve its predictive capability. We do not have or know of a reliable method for evaluating well depths at the statewide scale. The high variability of aquifer types and characteristics across Colorado (fracture flow, alluvial, confined, unconfined, etc.), fluctuating and depleted aquifers without water-level monitoring data, and significant variability in transmissivity introduce additional complexity that becomes a challenge at the statewide scale. Moving forward, we have interest in undertaking a similar spatial/modeling exercise at more localized or regional scales that incorporates well depth or “uppermost groundwater aquifer” depth in heavily studied and well-understood aquifer systems with time-series monitoring data of water table depth. Finally, information on some PFAS source types and some potentially useful environmental predictors (e.g., groundwater age [67]) was not available for inclusion in the model.

Conclusions and future work

This map represents the first iteration of this work, which we will develop further as new data become available. The primary goal of this modeling effort was to identify data gaps and drive prioritization for subsequent rounds of PFAS sampling. With its framework in place, the CDPHE will re-run this model with larger and more comprehensive training data on an annual basis. Additional data with more consistent and lower LoDs will allow for a more refined assessment and improve model predictions. Further, a better understanding of releases from potential PFAS sources, along with targeted sampling in higher risk areas, will assist resource allocation efforts and improve our big-picture understanding of who is at greatest risk for PFAS exposure and health effects in Colorado.
