Integrating Public Health Surveillance and Environmental Data to Model Presence of Histoplasma in the United States

Histoplasmosis is an infection caused by the inhalation of the environmental fungus Histoplasma capsulatum that ranges from asymptomatic infection to life-threatening disseminated disease.1 In the United States, histoplasmosis has traditionally been associated with the areas around the Ohio and Mississippi River Valleys, a finding established by nationwide skin test reactivity surveys conducted during the 1940s and 1950s.2 However, cases acquired far outside these areas suggest that histoplasmosis is more widespread than originally thought.3–5

Furthermore, public health surveillance data, although subject to underdetection of cases and limited to only a dozen states that require reporting of histoplasmosis, indicate that cases routinely occur in North Central states not previously considered to be endemic.6,7

Our current population-level understanding of histoplasmosis is primarily based on passive public health disease surveillance data. Before 2016, there was no standardized national case definition for histoplasmosis, leading to variation in definitions across states (see eAppendix Table 1; https://links.lww.com/EDE/B927). Since then, the case definition has been standardized.7 Despite these advances, surveillance remains limited in geographic scope and in the level of detail of the data collected.8 Additionally, understanding the spatial risk of histoplasmosis is complicated by challenges with detection.9 For example, histoplasmosis is likely underdetected and frequently misdiagnosed because the signs and symptoms can be similar to those of other common respiratory illnesses.7,10 A prior study developed a suitability score for H. capsulatum based on environmental characteristics but did not account for underdetection of reported histoplasmosis cases.11 Ignoring underdetection can result in biased statistical inference about the presence of H. capsulatum.12

We developed a spatio-temporal occupancy model to estimate the endemic region for histoplasmosis.13 The model relates reported histoplasmosis cases to latent, or unobserved, presence of H. capsulatum and accounts for imperfect detection in the reported cases. This represents a novel application of a common ecological model to a problem in environmental and infectious disease epidemiology. Our results provide an increased understanding of the areas with the highest probability of the presence of H. capsulatum. This information is essential for guiding both healthcare providers’ testing decisions and public health prevention strategies.

METHODS

Data Sources

We used county-level data on histoplasmosis cases reported to health departments in 12 states (Alabama, Arkansas, Delaware, Illinois, Indiana, Kentucky, Michigan, Minnesota, Mississippi, Nebraska, Pennsylvania, and Wisconsin) during 2011–2014, the most recent period available (Figure 1). This period precedes the 2016 standardized case definition. Monthly data were available from all states except Delaware and Kentucky, for which data were reported yearly.7

FIGURE 1.

Map of the 12 states included in our analysis. Counties shaded blue had at least one case of histoplasmosis reported to public health authorities during the 4-year study period.

Variables considered to help explain the environmental presence of the fungus included land cover characteristics, nitrogen levels, latitude and longitude, and elevation. To explain the presence or absence of histoplasmosis diagnoses, we considered cultivated crops, temperature, soil moisture, total population, and socioeconomic variables. The land cover characteristics were from the 2010 National Land Cover Database (NLCD); for this analysis, we considered the proportion of each county with cultivated crops, the proportion covered with water, and the proportion that is undeveloped.14 We obtained soil nitrogen levels from the United States Geological Survey,15 based on estimated county-level farm and nonfarm nitrogen fertilizer use, in kilograms, from commercial fertilizer sales. Elevation was from the 2010 Global Multi-resolution Terrain Elevation Data (GMTED2010).16 We obtained temperature data from the National Oceanic and Atmospheric Administration (NOAA) National Centers for Environmental Information (NCEI); these data were provided at the locations of the weather stations.17 We computed daily county-level temperature by averaging the stations within each county. For counties with no weather stations, we averaged the temperature of the adjacent counties. We generated monthly data by averaging the daily values. Surface soil moisture was from the National Integrated Drought Information System (NIDIS), supported by the Climate Prediction Center (CPC).18 The values estimated by the CPC soil moisture tool, part of the National Weather Service Global Forecast System, are calculated using a one-layer hydrologic model19,20 that computes soil moisture, evaporation, and runoff with observed precipitation and temperature as forcing. The data provide daily calculated soil moisture values on a 10 × 10 km pixel grid, which we averaged to the county level by attributing the pixels to the respective county based on the bounding box and scaled to the monthly level by averaging across days. We used the American Community Survey 2014 5-year county-level estimates of total population, proportion with private health insurance, and proportion employed in agriculture to characterize county demographics.21
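The temperature aggregation described above (station readings to daily county means, adjacent-county averages for counties without stations, then monthly means) can be sketched as follows. This is a minimal illustration and not the authors' processing code: the function names, the tuple layout of the observations, the adjacency dictionary, and the toy values are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def county_daily_means(station_obs):
    """Average station readings within each (county, day) pair."""
    grouped = defaultdict(list)
    for county, day, temp in station_obs:
        grouped[(county, day)].append(temp)
    return {key: mean(values) for key, values in grouped.items()}


def fill_from_neighbors(daily, neighbors, days):
    """Give station-less counties the mean of their adjacent counties."""
    filled = dict(daily)
    for county, adjacent in neighbors.items():
        for day in days:
            if (county, day) in daily:
                continue  # county has its own station data for this day
            temps = [daily[(n, day)] for n in adjacent if (n, day) in daily]
            if temps:
                filled[(county, day)] = mean(temps)
    return filled


def monthly_means(daily, month_of):
    """Average the daily county values within each (county, month) pair."""
    grouped = defaultdict(list)
    for (county, day), temp in daily.items():
        grouped[(county, month_of(day))].append(temp)
    return {key: mean(values) for key, values in grouped.items()}


# Hypothetical toy data: counties A and B have stations, county C has none.
obs = [
    ("A", "2011-01-01", 10.0), ("A", "2011-01-01", 12.0),
    ("B", "2011-01-01", 20.0),
    ("A", "2011-01-02", 14.0), ("B", "2011-01-02", 22.0),
]
neighbors = {"C": ["A", "B"]}  # assumed adjacency list for county C
days = ["2011-01-01", "2011-01-02"]

daily = fill_from_neighbors(county_daily_means(obs), neighbors, days)
monthly = monthly_means(daily, month_of=lambda day: day[:7])
```

In this toy example, county C inherits the mean of A and B on each day before the monthly average is taken. Other orderings, such as averaging monthly values across neighbors, would be a different design choice.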

Statistical Model

We used a Bayesian occupancy model (full model specification in eAppendix; https://links.lww.com/EDE/B927) to relate the binary indicator of reported histoplasmosis cases to latent presence of H. capsulatum.13 By converting reported histoplasmosis case counts to binary indicators, we mitigate some of the reporting differences across states while retaining the ability to learn about the latent presence of H. capsulatum in the environment. This model accounts for the fact that presence of H. capsulatum does not always result in histoplasmosis infection, and even if histoplasmosis is contracted, it may go undetected (i.e., undiagnosed or unreported to public health authorities). Failure to account for the fact that zero cases may be reported even when there is exposure to H. capsulatum can result in biased statistical inference.12 By focusing on presence of H. capsulatum, we address a question of interest better suited to the available data that avoids bias due to imperfect detection that would be present in a more traditional count model for rates of disease.
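To see why zero reported cases is weak evidence of absence, consider a simplified numerical sketch. It assumes a single county with a constant monthly detection probability and a fixed prior probability of presence. Both values are hypothetical and chosen only for illustration, and the sketch ignores covariates and spatial dependence. The function names are ours.

```python
def prob_no_reports_given_presence(p_monthly):
    """P(no reported case in any month | fungus present)."""
    prob = 1.0
    for p in p_monthly:
        prob *= 1.0 - p
    return prob


def prob_presence_given_no_reports(psi, p_monthly):
    """Bayes' rule: P(present | zero reports in every month).

    If the fungus is absent, zero reports occur with probability 1.
    """
    miss = prob_no_reports_given_presence(p_monthly)
    return psi * miss / (psi * miss + (1.0 - psi))


# Hypothetical values: 2% monthly detection probability over 48 months,
# and a prior presence probability of 0.5.
p_monthly = [0.02] * 48
miss = prob_no_reports_given_presence(p_monthly)
posterior = prob_presence_given_no_reports(0.5, p_monthly)
```

Under these assumed values, a county where the fungus is present would still report no cases over all 48 months with probability of about 0.38, and a county with no reports retains a probability of presence of about 0.27. Treating such counties as confirmed absences is the source of the bias discussed above.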

For county $i$ ($i=1,\ldots,943$) and month $t$ ($t=1,\ldots,48$), define $Y_{it}$ to be the binary indicator that at least one case of histoplasmosis was reported. Let $Z_i$ be an indicator of presence of H. capsulatum in county $i$. While histoplasmosis cases are reported monthly, we assume presence of H. capsulatum is constant over this 4-year period. This assumption is reasonable based on the short time frame under which data from this study were collected relative to known survival rates of H. capsulatum.22 The proposed occupancy model assumes that if H. capsulatum is absent ($Z_i=0$), then there are no reported cases of histoplasmosis in that county throughout the study period ($Y_{it}=0$ for all $t=1,\ldots,T$). However, if H. capsulatum is present ($Z_i=1$), then there may or may not be any reported cases for any month within the study period ($Y_{it}=0$ or $Y_{it}=1$ for any given month $t$, where $t=1,\ldots,48$). To incorporate all data available, we let $Y_i^{(k)}$ be the indicator that there was at least one reported case of histoplasmosis in county $i$ during year $k=1,2,3,4$, and we set $Y_i^{(k)}$ equal to 1 if $Y_{it}=1$ for any month within year $k$ and 0 otherwise. This hierarchical model allows us to simultaneously model all counties regardless of reporting frequency, and we borrow strength from the counties with monthly observations to infer monthly outcomes for the remaining counties.

Define $\psi_i \equiv P(Z_i=1)$ as the probability of H. capsulatum presence in county $i$ and $p_{it}=P(Y_{it}=1 \mid Z_i=1)$ as the detection probability, or the probability of there being at least one reported case of histoplasmosis in county $i$ during month $t$ given that H. capsulatum is present. We assume the following Bayesian occupancy model:

$$Y_i^{(k)} = 1 - \prod_{t=12k-11}^{12k} (1 - Y_{it})$$

$$Y_{it} \mid Z_i, p_{it} \sim \text{Bernoulli}(Z_i \, p_{it})$$

$$Z_i \mid \psi_i \sim \text{Bernoulli}(\psi_i).$$
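As an illustration of the three model statements above, the following sketch simulates one county from the generative model and collapses the monthly indicators to yearly ones. It is not the authors' code: the function names are ours, and any presence and detection probabilities passed in are assumed inputs rather than estimates.

```python
import random


def simulate_county(psi, p, rng):
    """Draw Z_i ~ Bernoulli(psi) and Y_it ~ Bernoulli(Z_i * p_it).

    psi: presence probability; p: list of monthly detection probabilities.
    """
    z = 1 if rng.random() < psi else 0
    # When z == 0 the success probability Z_i * p_it is 0, so Y_it = 0.
    y = [1 if (z == 1 and rng.random() < p_t) else 0 for p_t in p]
    return z, y


def yearly_indicators(y_monthly):
    """Y_i^(k) = 1 - prod_{t=12k-11}^{12k} (1 - Y_it), months 1-indexed."""
    n_years = len(y_monthly) // 12
    yearly = []
    for k in range(1, n_years + 1):
        prod = 1
        for t in range(12 * k - 11, 12 * k + 1):
            prod *= 1 - y_monthly[t - 1]
        yearly.append(1 - prod)
    return yearly


rng = random.Random(0)  # fixed seed for a reproducible illustration
z, y = simulate_county(psi=0.6, p=[0.05] * 48, rng=rng)
y_year = yearly_indicators(y)
```

For a state that reports yearly, only `y_year` would be observed and the monthly `y` would be treated as latent.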

For the counties where monthly data are available, $Y_{it}$ is observed. However, for the counties with yearly data, $Y_{it}$ is a latent binary random variable for each $t$. We assume a probit link for each probability, $\psi_i$ and $p_{it}$, and relate these quantities to environmental variables and spatial or temporal random effects.23,24 The probit link was chosen for computational considerations (see eAppendix Section 1.2; https://links.lww.com/EDE/B927 for more details). More specifically, we assume the probability of the presence of H. capsulatum relates to environmental covariates and a spatial random effect that accounts for our belief that H. capsulatum is more likely to be present in a county if it is present in neighboring counties. That is,

$$\psi_i = \Phi(X_i \alpha + \eta_i),$$

where $\Phi(\cdot)$ is the cumulative distribution function of the standard normal distribution, $X_i$ is a vector of standardized covariates related to exposure risk, $\alpha$ is a vector of regression coefficients, and $\eta_i$ is a spatial random effect (see eAppendix; https://links.lww.com/EDE/B927 for detailed specification). We assume the probability of observing at least one diagnosed case of histoplasmosis, given that H. capsulatum is present, relates to land-use and socio-environmental covariates, as well as population size, since more populous counties are more likely to have a detected case. The model for the detection probability includes a random effect to account for temporal autocorrelation within a county. That is,

$$p_{it} = \Phi(W_{it} \beta + \nu_{it}),$$

where $W_{it}$ is a vector that contains standardized covariates related to detectability, and $\beta$ is a vector of regression coefficients. Note that $W_{it}$ is specified to include state-specific intercepts so that the intercepts represent a statewide average detection rate. We include the temporal random effect $\nu_{it}$ to account for the notion that detectability within county $i$ during month $t$ is likely related to detectability for county $i$ during month $t-1$. Thus, we assume $\nu_{i1} \sim N(0, \tau^2)$ for $t=1$ and $\nu_{it} \sim N(\rho \nu_{i,t-1}, \tau^2)$ for $t=2,\ldots,T$. The temporal random effects are assumed to be independent across space. For identifiability of the temporal random effect, we enforce a mean-zero centering constraint when $t=1$.
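The two probit regressions and the autoregressive random effect can be sketched as follows. This is a simplified illustration under stated assumptions: the function names are ours, the covariate vectors and parameter values are hypothetical, the spatial effect is passed in as a fixed number rather than drawn from its prior, and the mean-zero centering constraint at $t=1$ is omitted.

```python
import random
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF, the inverse probit link


def presence_probability(x, alpha, eta):
    """psi_i = Phi(X_i alpha + eta_i)."""
    return PHI(sum(xj * aj for xj, aj in zip(x, alpha)) + eta)


def simulate_temporal_effect(n_months, rho, tau, rng):
    """nu_1 ~ N(0, tau^2); nu_t ~ N(rho * nu_{t-1}, tau^2) for t >= 2.

    tau is a standard deviation, matching random.gauss(mu, sigma).
    """
    nu = [rng.gauss(0.0, tau)]
    for _ in range(1, n_months):
        nu.append(rng.gauss(rho * nu[-1], tau))
    return nu


def detection_probabilities(w, beta, nu):
    """p_it = Phi(W_it beta + nu_it) for each month t."""
    return [
        PHI(sum(wj * bj for wj, bj in zip(w_t, beta)) + nu_t)
        for w_t, nu_t in zip(w, nu)
    ]


rng = random.Random(1)
# Hypothetical inputs: an intercept plus one standardized covariate.
psi = presence_probability(x=[1.0, 0.5], alpha=[0.0, 0.4], eta=0.1)
nu = simulate_temporal_effect(n_months=48, rho=0.8, tau=0.3, rng=rng)
w = [[1.0, 0.2]] * 48  # covariates held constant over time for simplicity
p = detection_probabilities(w, beta=[-1.5, 0.7], nu=nu)
```

A value of `rho` near 1 makes consecutive months' detection probabilities move together, while `rho = 0` reduces the effect to independent monthly noise.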

Our model is fit within the Bayesian framework. Details on the prior distributions and the Markov chain Monte Carlo (MCMC) algorithm used to simulate from the posterior distribution are in eAppendix Section 1.2; https://links.lww.com/EDE/B927. We initially included a larger set of explanatory variables and performed reversible jump MCMC to determine which variables had a relatively high posterior probability of inclusion in the model.25 See eAppendix Section 2.1; https://links.lww.com/EDE/B927 for details. The final model was then fit using the variables listed in Tables 1 and 2.

TABLE 1. The Posterior Mean and 95% Credible Intervals for the Covariate Effects of the Probability of Presence of Histoplasma capsulatum

Variable               Estimate (95% CI)              P(>0 | Data)
Intercept              –0.0375 (–0.1373, 0.0667)      0.2480
Cultivated crops       0.1566 (–0.0460, 0.3453)       0.9520
Log farm nitrogen      0.2272 (0.0442, 0.4217)        0.9880
Log nonfarm nitrogen   0.6436 (0.5228, 0.7785)        1.0000
Log elevation          –0.1922 (–0.3991, –0.0135)     0.0200
Latitude               0.1865 (0.0520, 0.3404)        0.9940
Longitude              0.1898 (0.0692, 0.3300)        1.0000
Land cover: water      0.1050 (0.0072, 0.2051)        0.9840

The last column is the posterior probability of the regression coefficient being positive.
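Because the covariates are standardized and the link is probit, the coefficients in Table 1 can be translated into presence probabilities. The sketch below plugs the posterior means into the probit link with the spatial random effect set to zero. This is a simplification for illustration only, since a plug-in value is not the posterior mean of the presence probability and real counties have nonzero spatial effects. The dictionary keys and function name are ours.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF

# Posterior means from Table 1 (covariates are standardized).
ALPHA = {
    "intercept": -0.0375,
    "cultivated_crops": 0.1566,
    "log_farm_nitrogen": 0.2272,
    "log_nonfarm_nitrogen": 0.6436,
    "log_elevation": -0.1922,
    "latitude": 0.1865,
    "longitude": 0.1898,
    "land_cover_water": 0.1050,
}


def plug_in_presence(covariates, eta=0.0):
    """Phi(intercept + sum of coefficient * standardized value + eta).

    Covariates not listed are held at their mean (standardized value 0).
    eta = 0.0 is an assumption; the fitted model uses a spatial effect.
    """
    linear = ALPHA["intercept"] + eta
    for name, value in covariates.items():
        linear += ALPHA[name] * value
    return PHI(linear)


baseline = plug_in_presence({})
high_nonfarm_n = plug_in_presence({"log_nonfarm_nitrogen": 1.0})
high_elevation = plug_in_presence({"log_elevation": 1.0})
```

Under these assumptions, a county at the mean of every covariate has a plug-in presence probability of roughly 0.49, and raising log nonfarm nitrogen by one standard deviation moves it to roughly 0.73.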


TABLE 2. The Posterior Mean and 95% Credible Intervals of the Covariate Effects for Diagnosed Cases of Histoplasmosis Given Presence of Histoplasma capsulatum

Variable                 Estimate (95% CI)              P(>0 | Data)
Cultivated crops         0.0829 (–0.0620, 0.2321)       0.8660
Soil moisture            0.0055 (–0.0880, 0.1159)       0.5440
Log population           0.7292 (0.5525, 0.9288)        1.0000
Log percent non-White    0.0040 (–0.1552, 0.1659)       0.5040
Agriculture              0.0290 (–0.1422, 0.1926)       0.6520
Private insurance        –0.1247 (–0.2771, 0.0126)      0.0380

The last column is the posterior probability of the regression coefficient being positive.


RESULTS

We see strong and positive associations between the presence of H. capsulatum and log farm nitrogen soil content, log nonfarm nitrogen soil content, and the percent of the county covered by water (Table 1). We see a negative association with log elevation, indicating a lower probability of presence in areas of higher elevation. We estimated the presence of H. capsulatum for the 12 states for which we had histoplasmosis case data and for the states in the immediate surrounding area. We observe the highest estimated probability of the presence of H. capsulatum in the northern part of the study region consisting of the East North Central United States. We also observe high estimated probabilities along the Atlantic coastal plain. The probability increases moving northward and eastward through the study region (Figure 2A). As expected, we see higher standard errors in counties that are further from those in the observed data (Figure 2B).

FIGURE 2.

Histoplasmosis results. A, Map of the estimated posterior probability of presence of Histoplasma capsulatum. B, Standard errors of the estimated posterior probability. C, Estimates and credible intervals of the state-specific intercepts for the probability of detecting a case of histoplasmosis, given H. capsulatum is present. D, County-level estimated detection posterior probability averaged over the 48 months. Note that the low estimate and wide variability for the average rate of histoplasmosis detection in Delaware shown in (C) is likely due to the fact that Delaware only has three counties with just two reported diagnosed cases throughout the study period.

The state-specific estimated intercepts and credible intervals for the detection probability are shown in Figure 2C. This figure shows that detecting cases of histoplasmosis varies from state to state, and the highest average estimated probability of detection was in Minnesota. Figure 2D shows the estimated detection probability for each county averaged over the 48 months, showing a large amount of spatial heterogeneity in histoplasmosis detection. Maps of the county-level detection probabilities for each of the 48 months can be found in the eAppendix; https://links.lww.com/EDE/B927.

Table 2 summarizes the posterior distributions for the detection probability regression coefficients. We estimate that more populated areas are more likely to have reported cases of histoplasmosis given the presence of H. capsulatum. We observed a negative relationship with the percentage of residents with private health insurance, and we see a moderately positive relationship with having a larger land area covered by cultivated crops.

DISCUSSION

We used a Bayesian occupancy model to relate monthly reported cases of histoplasmosis to the latent presence of H. capsulatum, while accounting for differences in case reporting frequency. Histoplasmosis is only reportable in a small number of states and often goes undetected. Thus, analyzing the case data without accounting for imperfect detection of histoplasmosis will likely underestimate the true presence and yield biased statistical inference. In addition, we quantified the relationship between geographical variation in the presence of H. capsulatum and environmental characteristics. By incorporating information about environmental covariates and accounting for spatial dependence, we were able to predict the probability of presence for states neighboring the 12 states for which we had histoplasmosis case data. We found that the northern Midwestern United States and the eastern Atlantic coastal region had the highest estimated probability of the presence of H. capsulatum, suggesting that the spatial risk of histoplasmosis has changed from what was originally described in the 1950s, which focused on counties inside a triangle with points in central Ohio, southern Iowa, and northern Louisiana.2 This finding is consistent with previous work suggesting an expansion of the endemic areas.7,11 Specifically, the high probability of histoplasmosis detection in Minnesota is consistent with previous work and may reflect a strong surveillance system and a broad case definition at the time the data were collected.7

Although occupancy models are commonly used in ecological applications, they are less common in public health. However, imperfect detection is likely present in many public health outcomes due to gaps in reporting systems and undiagnosed or misdiagnosed cases. Failure to explicitly account for imperfect detection can result in biased inference since the geographic variability in disease presence and detectability are otherwise confounded. We have illustrated how this modeling framework can be applied to public health data to quantify spatial variability and identify geographic regions of increased risk.

Our finding of a positive association of histoplasmosis with soil nitrogen content is not surprising. H. capsulatum is well known to thrive in soil enriched with high levels of nitrogen and phosphate, such as soil containing bird or bat droppings.9,26 Future work could therefore incorporate data that quantify the geographical distribution of birds and bats (e.g., eBird27 and the North American Bat Monitoring Program28). The association we observed between expected presence of H. capsulatum and cultivated crops also supports previous work showing that a high proportion of people with histoplasmosis live in rural areas and that people with jobs in the agriculture industry may be at increased risk.8 Similarly, historical evidence from skin test studies shows higher rates of histoplasmosis among people who lived on farms.29

One limitation of our approach is that we transformed the reported case data to a binary outcome and thus cannot describe rates of infection. Future work could model raw case counts to inform potential exposure risk. The reporting differences between states coupled with the large proportion of infections that go undetected make this challenging as the reported counts do not accurately reflect the true underlying incidence of histoplasmosis in a given county. Another limitation is the model’s inability to predict results for the entire United States, as we did not have evidence to determine whether or not the relationships with the included environmental variables hold outside of our study region. Last, future studies could also investigate host-related covariates. For example, persons 65 years old and older and those with immunosuppression are more likely to have severe histoplasmosis and are thus more likely to be diagnosed. By incorporating demographic information or information on underlying health status, the model’s detection process could be improved.

In conclusion, we have generated a map that quantifies the geographical distribution of endemic regions for histoplasmosis in a relatively large region of the United States, providing a better understanding of the areas where the presence of H. capsulatum is likely. A more comprehensive characterization of its spatial distribution is important for guiding clinical decision-making and public health surveillance and prevention activities.

ACKNOWLEDGMENTS

We thank state and county public health departments who collected the histoplasmosis case data.

REFERENCES

1. Kauffman CA. Histoplasmosis: a clinical and laboratory update. Clin Microbiol Rev. 2007;20:115–132.
2. Manos NE, Ferebee SH, Kerschbaum WF. Geographic variation in the prevalence of histoplasmin sensitivity. Dis Chest. 1956;29:649–668.
3. Benedict K, Thompson GR III, Deresinski S, Chiller T. Mycotic infections acquired outside areas of known endemicity, United States. Emerg Infect Dis. 2015;21:1935–1941.
4. Ashraf N, Kubat RC, Poplin V, et al. Re-drawing the maps for endemic mycoses. Mycopathologia. 2020;185:843–865.
5. Lockhart SR, Toda M, Benedict K, Caceres DH, Litvintseva AP. Endemic and other dimorphic mycoses in the Americas. J Fungi (Basel). 2021;7:151.
6. Reportable Fungal Diseases by State. 2021. Available at: https://www.cdc.gov/fungal/fungal-disease-reporting-table.html. Accessed 4 January 2022.
7. Armstrong PA, Jackson BR, Haselow D, et al. Multistate epidemiology of histoplasmosis, United States, 2011-2014. Emerg Infect Dis. 2018;24:425–431.
8. Benedict K, McCracken S, Signs K, et al. Enhanced surveillance for histoplasmosis—9 states, 2018–2019. Open Forum Infect Dis. 2020;7:ofaa343.
9. Benedict K, Mody RK. Epidemiology of histoplasmosis outbreaks, United States, 1938-2013. Emerg Infect Dis. 2016;22:370–378.
10. Benedict K, Kobayashi M, Garg S, Chiller T, Jackson BR. Symptoms in blastomycosis, coccidioidomycosis, and histoplasmosis versus other respiratory illnesses in commercially insured adult outpatients—United States, 2016–2017. Clin Infect Dis. 2021;73:e4336–e4344.
11. Maiga AW, Deppen S, Scaffidi BK, et al. Mapping Histoplasma capsulatum exposure, United States. Emerg Infect Dis. 2018;24:1835–1839.
12. Gu W, Swihart RK. Absent or undetected? Effects of non-detection of species occurrence on wildlife-habitat models. Biol Conserv. 2004;116:195–203.
13. Royle JA, Dorazio RM. Hierarchical Modeling and Inference in Ecology: The Analysis of Data From Populations, Metapopulations and Communities. Elsevier; 2008.
14. Multi-Resolution Land Characteristics (MRLC) Consortium. National Land Cover Database (NLCD). 2021. Available at: http://mrlc.gov/data. Accessed January 2021.
15. Brakebill JW, Gronberg JM. County-Level Estimates of Nitrogen and Phosphorus From Commercial Fertilizer for the Conterminous United States, 1987-2012. U.S. Geological Survey Data Release; 2017.
16. U.S. Geological Survey Global Multi-Resolution Terrain Elevation Data 2010 (GMTED2010). Available at: http://usgs.gov/centers/eros/data-tools. Accessed January 2021.
17. National Centers for Environmental Information (NCEI), National Oceanic and Atmospheric Administration (NOAA). Available at: http://ncdc.noaa.gov/data-access. Accessed January 2021.
18. National Integrated Drought Information System (NIDIS), National Oceanic and Atmospheric Administration (NOAA). 2021. Available at: http://drought.gov/topics/soil-moisture. Accessed July 2021.
19. Huang J, van den Dool HM, Georgarakos KP. Analysis of model-calculated soil moisture over the United States (1931-1993) and applications to long-range temperature forecasts. J Clim. 1996;9:1350–1362.
20. Van den Dool H, Huang J, Fan Y. Performance and analysis of the constructed analogue method applied to US soil moisture over 1981-2001. J Geophys Res Atmos. 2003;108.
21. U.S. Census Bureau. American Community Survey 5-Year Estimates. American Community Survey; 2014. Available at: http://data.census.gov. Accessed January 2021.
22. Mahvi TA. Factors governing the epidemiology of Histoplasma capsulatum in soil. Mycopathol Mycol Appl. 1970;41:167–176.
23. Johnson DS, Conn PB, Hooten MB, Ray JC, Pond BA. Spatial occupancy models for large data sets. Ecology. 2013;94:801–808.
24. Hepler SA, Erhardt RJ. A spatiotemporal model for multivariate occupancy data. Environmetrics. 2021;32:e2657.
25. Robert C, Casella G. Monte Carlo Statistical Methods. Springer Science & Business Media; 2013.
26. Deepe GS Jr. Outbreaks of histoplasmosis: the spores set sail. PLoS Pathog. 2018;14:e1007213.
27. eBird. The Cornell Lab of Ornithology. Available at: http://ebird.org.
28. The North American Bat Monitoring Program. U.S. Geological Survey. Available at: https://sciencebase.usgs.gov/nabat/. NABat Status and Trends. 2021. Accessed May 2021.
29. Edwards LB, Acquaviva FA, Livesay VT, Cross FW, Palmer CE. An atlas of sensitivity to tuberculin, PPD-B, and histoplasmin in the United States. Am Rev Respir Dis. 1969;99(suppl):1–132.
