A common question in epidemiology is whether disease cases occur together in space and time more often than would be expected under the assumption of equal risk across space. Such a spatial cluster could arise from proximity to an environmental exposure source, shared genetic susceptibility (familial clustering), or some shared but unmeasured risk factor (Waller and Gotway, 2004). Analyses of disease clusters often use case-control designs, in which all reported cases from a disease registry, together with a set of population-based controls representing the at-risk population, are identified for inclusion in the study. In this type of study, a significant excess of cases relative to controls in some geographic area constitutes a spatial cluster.
Several statistical methods have been developed to detect spatial clusters and assess their significance in case-control studies. For example, generalized additive models (GAMs) that include spatial random effects can estimate risk as a smooth surface over the study region and permit inference through exceedance probabilities (Hastie and Tibshirani, 2017, Richardson et al., 2004). GAMs have been fit in both the frequentist (Wheeler et al., 2011) and Bayesian (Boyle et al., 2022) paradigms. Q-statistics can identify spatial clusters when participants have recorded residential histories (Jacquez et al., 2005, Sloan et al., 2015); this approach requires an a priori choice of the number of nearest neighbors that defines the spatial extent of a cluster. A popular method is the local spatial scan statistic (Kulldorff, 1997, Kulldorff, 1999), which moves a series of circular windows of varying radius across the study region and, under a Bernoulli model, compares the numbers of cases and controls inside versus outside each circle. The test identifies the most likely cluster among all candidate circles using a likelihood ratio statistic and performs inference through Monte Carlo simulation, comparing the observed likelihood ratio of the most likely cluster with the largest likelihood ratios obtained from many random permutations of the case and control labels. A recent review of cluster analysis methods (Fritz et al., 2013) found that the local spatial scan statistic is used frequently in the epidemiological literature and has been applied to a wide variety of data types (Andrade et al., 2004, Ernst et al., 2006, Meliker et al., 2009, Tanser et al., 2009, Wheeler, 2007).
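The Bernoulli scan procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not Kulldorff's SaTScan implementation: candidate windows are circles centred on each point and grown one nearest neighbour at a time up to a cap set by the hypothetical parameter `max_frac`, ties in distance are ignored, only high-risk (not low-risk) clusters are sought, and the helper names (`bernoulli_llr`, `scan_test`) and the simulated data are our own.

```python
import numpy as np


def _xlogy(x, y):
    """Elementwise x * log(y), using the convention 0 * log(0) = 0."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    safe_y = np.where(x > 0, y, 1.0)  # avoid log(0) where the term is zero anyway
    return np.where(x > 0, x * np.log(safe_y), 0.0)


def bernoulli_llr(c, n, C, N):
    """Bernoulli log-likelihood ratio for windows holding c cases among n people,
    out of C cases among N people in total. Zero unless the case proportion
    inside the window exceeds the proportion outside (high-risk clusters only)."""
    c = np.asarray(c, dtype=float)
    n = np.asarray(n, dtype=float)
    p_in = c / n
    p_out = (C - c) / (N - n)
    ll_alt = (_xlogy(c, p_in) + _xlogy(n - c, 1 - p_in)
              + _xlogy(C - c, p_out) + _xlogy(N - n - (C - c), 1 - p_out))
    ll_null = _xlogy(C, C / N) + _xlogy(N - C, 1 - C / N)
    return np.where(p_in > p_out, ll_alt - ll_null, 0.0)


def _max_llr(order, is_case, max_n):
    """Largest LLR over circles centred on each point, grown to hold 1..max_n points."""
    N, C = is_case.size, int(is_case.sum())
    cum_cases = np.cumsum(is_case[order], axis=1)[:, :max_n]  # cases in each circle
    sizes = np.arange(1, max_n + 1)                            # people in each circle
    llr = bernoulli_llr(cum_cases, sizes, C, N)
    i, k = np.unravel_index(np.argmax(llr), llr.shape)
    return float(llr[i, k]), (int(i), int(k) + 1)


def scan_test(coords, is_case, max_frac=0.5, n_sim=999, seed=0):
    """Most likely cluster (centre index, points inside) and a Monte Carlo p-value."""
    rng = np.random.default_rng(seed)
    max_n = int(max_frac * is_case.size)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    order = np.argsort(dist, axis=1)  # neighbours of each centre, nearest first
    obs, window = _max_llr(order, is_case, max_n)
    # Locations stay fixed; only the case/control labels are shuffled.
    null = np.array([_max_llr(order, rng.permutation(is_case), max_n)[0]
                     for _ in range(n_sim)])
    p_value = (1 + np.sum(null >= obs)) / (n_sim + 1)
    return obs, window, p_value


# Hypothetical data: 200 uniform locations with a planted high-risk disc.
rng = np.random.default_rng(1)
coords = rng.uniform(size=(200, 2))
in_disc = np.linalg.norm(coords - [0.3, 0.3], axis=1) < 0.2
is_case = (rng.uniform(size=200) < np.where(in_disc, 0.9, 0.1)).astype(int)
llr, (centre, n_in), p = scan_test(coords, is_case, n_sim=199)
print(f"LLR={llr:.1f}, centre={centre}, points in window={n_in}, p={p:.3f}")
```

Because the point locations never change between permutations, the nearest-neighbour ordering is computed once and reused, so each Monte Carlo replicate only re-counts cases.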
While the methods described above are useful for disease cluster analyses, case-control studies in general can suffer from several types of bias, including numerator bias due to problems with case ascertainment or diagnostic accuracy (Elliott, 2000, Wilcox et al., 1988); exposure misclassification due to measurement error, population migration (Elliott, 2000, Beale et al., 2008), or geocoding error (Oliver et al., 2005); and selection bias. Selection bias, perhaps one of the most important biases for disease cluster analysis, can occur when certain factors affect the likelihood that eligible individuals ultimately participate in the study (Tripepi et al., 2010). One important source of selection bias is participant non-response, when factors such as socio-economic status, language barriers, or lack of financial investment in the study make eligible individuals less likely to participate (Chiu et al., 2017, Slusky et al., 2014). Non-participation in case-control studies can affect both cases and controls, though a recent review suggested that the problem is greater for controls than for cases in studies of occupational risk factors (Sritharan et al., 2020). In a study of childhood leukemia, participation rates were lower for controls than for cases and lower among controls of lower socio-economic status (Slusky et al., 2014), and a case-control study of non-Hodgkin lymphoma (NHL) had a participation rate of 52% among eligible controls versus 76% among cases (Wheeler et al., 2011). Because the residential locations of non-participants at the time of selection were known in the NHL study, a separate cluster analysis that also included non-participant locations was performed, allowing an evaluation of the effects of non-participation on detection of an NHL cluster in that specific setting (Wheeler et al., 2011, Shen et al., 2008).
However, there are very few examples of using non-participant locations in spatial cluster analysis. While the effect of selection bias on odds ratio estimates for exposures has been studied and methods have been developed to adjust for it (Geneletti et al., 2009), and non-response weights have been developed for stratified survey data (Chen et al., 2014, Watjou et al., 2019), the problem of non-participation and its impact on spatial cluster detection has not yet been assessed comprehensively.
Although understudied, it is plausible that study participation is not uniform over space, given the heterogeneous distribution of demographics and resources that results from factors such as social inequality, segregation, immigration, and housing policy (Pillas et al., 2014, Wright et al., 2014, Power, 2012). Thus, if non-participation is concentrated in some geographic area because of a factor such as socio-economic status, a spatial cluster analysis could reflect this bias, possibly leading to erroneous conclusions about a spatial cluster of disease. In particular, if eligible controls in an area of low socio-economic status (but with no excess disease risk) are less likely to participate than controls elsewhere, this geographically concentrated shortfall of controls raises the observed ratio of cases to controls in that area and may produce an artificial spatial cluster. As a hypothetical example, an individual in a lower-income household where English is a second language may see little reason to participate as a healthy control in a case-control study, owing to the time and effort required, an understandable mistrust of medical power dynamics (Gamble, 1993, Jaiswal et al., 2018), or other reasons. We hypothesize that study non-participation, through the selection bias it induces, can materially affect the conclusions of a spatial cluster analysis of case-control data, with effects that vary depending on the geographic nature of the bias and how it affects case and control participation. To our knowledge, selection bias resulting from study non-participation in spatial analyses of case-control data has not been studied. Therefore, in this paper we design a simulation study to assess and quantify the effects of study non-participation on spatial cluster analyses of case-control data using the popular local spatial scan statistic.
We then propose an algorithm to correct for study non-participation and demonstrate its benefits for spatial cluster studies using the same simulated data sets. Finally, we apply our method to a case-control study of NHL to assess how it changes the conclusions that would be made from a spatial cluster analysis of these data.
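The mechanism hypothesized above, in which a shortfall of participating controls in one area inflates that area's case-to-control ratio, can be made concrete with a few lines of arithmetic. This sketch assumes two areas with identical true risk (100 eligible cases and 1,000 eligible controls each), a common case participation rate of 0.76, and control participation rates of 0.52 outside and 0.30 inside the low-participation area; all counts, rates, and the helper name `participating_odds` are hypothetical.

```python
def participating_odds(n_cases, n_controls, p_case, p_control):
    """Expected number of participating cases per participating control."""
    return (n_cases * p_case) / (n_controls * p_control)


# Hypothetical areas with the same eligible case:control ratio (no excess risk).
true_or = (100 / 1000) / (100 / 1000)

# Controls participate less inside the (hypothetical) low-participation area.
inside = participating_odds(100, 1000, p_case=0.76, p_control=0.30)
outside = participating_odds(100, 1000, p_case=0.76, p_control=0.52)
apparent_or = inside / outside

print(f"true odds ratio = {true_or:.2f}")          # prints 1.00
print(f"apparent odds ratio = {apparent_or:.2f}")  # prints 1.73
```

Because case participation is the same in both areas, the apparent odds ratio reduces to the ratio of control participation rates (0.52/0.30), so a cluster detection method such as the spatial scan statistic could flag the low-participation area even though true risk is uniform.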