Model-based disease mapping using primary care registry data

Primary care registries (PCRs) are important tools for public health surveillance. They hold individual patient information routinely collected by general practitioners (GPs) during daily practice, ranging from disease diagnoses to demographic details and medications. This wealth of data makes PCRs valuable for passive sentinel surveillance (Centers for Disease Control and Prevention (CDC), 2014), with examples worldwide including the Clinical Practice Research Datalink (United Kingdom) (Herrett et al., 2015, Wolf et al., 2019), the Nivel Primary Care Database (The Netherlands) (Hasselaar, 2022), the Canadian Primary Care Sentinel Surveillance Network (Canada) (Garies et al., 2017, Garies et al., 2019), and others (Smeets et al., 2018, Busingye et al., 2019, Bakken et al., 2020, de Ridder et al., 2022, Recalde et al., 2022).

In Flanders, Belgium, the INTEGO PCR, operational since 1994, collects weekly data from 537 GPs affiliated with 122 general practices active in INTEGO at the time of writing (Truyers et al., 2014, Delvaux et al., 2018). The registry currently holds records for over 600,000 patients, and its mission is to monitor public health in Flanders. Recent studies based on the INTEGO database include modelling of dementia prevalence (Beerten et al., 2022), evaluation of trends in medication prescription (Boon et al., 2022, Couteur et al., 2022), assessment of the impact of the COVID-19 pandemic on primary care provision (Van den Bulck et al., 2022), and quantification of the effects of pneumococcal vaccination on the severity of lower respiratory tract infections (LRTIs) (Mamouris et al., 2022).

Spatial analysis of registry data is gaining prominence for public health agencies (Lawson et al., 2000). Because PCRs add public health information daily, they offer timely insights into spatial and spatiotemporal trends in the prevalence and incidence of endemic and emerging diseases (Lawson et al., 2000). Geostatistical disease mapping is employed to analyse geographical variation in disease risk, typically at a specific spatial aggregation level such as municipalities (Lawson et al., 2000). In this approach, disease mapping models, often generalized linear mixed models, use random effects to estimate latent spatially structured and unstructured processes (Lawson et al., 2000, Lawson, 2021a, Lawson, 2021b, Neyens et al., 2012). The structured process captures explicit spatial epidemiological phenomena, whereas the unstructured process captures non-spatial extra-variability, representing noise from non-spatial processes or from spatial mechanisms acting at a finer scale than the aggregation level. Model-based estimates of disease risk, or of the spatial random effect surfaces, help to identify regions with elevated mean disease risk.
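To make the structured/unstructured decomposition concrete, the following minimal Python sketch simulates data from a Besag-York-Mollié (BYM)-type model, a common disease mapping specification that we assume here for illustration; it is not necessarily the exact model used in this work. Counts follow y_i ~ Poisson(E_i θ_i) with log θ_i = β0 + u_i + v_i, where u is an intrinsic conditional autoregressive (ICAR) spatially structured effect and v an iid Gaussian unstructured effect. A regular grid stands in for municipalities, E_i is held constant for simplicity, the ICAR field is only approximately drawn by Gibbs sampling from its full conditionals, and all function names and parameter values are hypothetical. Model fitting (e.g. with INLA) is not shown.

```python
import math
import random


def grid_neighbours(nrow, ncol):
    """Rook (shared-edge) adjacency on an nrow x ncol lattice of areas."""
    nb = {}
    for r in range(nrow):
        for c in range(ncol):
            nb[r * ncol + c] = [
                rr * ncol + cc
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= rr < nrow and 0 <= cc < ncol
            ]
    return nb


def sample_icar(nb, tau, sweeps, rng):
    """Approximate ICAR draw by Gibbs sampling from its full conditionals:
    u_i | u_-i ~ N(mean of neighbours, 1 / (tau * n_i)); centred to sum to zero."""
    u = [0.0] * len(nb)
    for _ in range(sweeps):
        for i, js in nb.items():
            mean = sum(u[j] for j in js) / len(js)
            u[i] = rng.gauss(mean, 1.0 / math.sqrt(tau * len(js)))
    centre = sum(u) / len(u)
    return [x - centre for x in u]


def rpois(lam, rng):
    """Poisson draw (Knuth's method); large means are split using
    Poisson(a) + Poisson(b) ~ Poisson(a + b) to avoid exp() underflow."""
    if lam > 500.0:
        return rpois(lam / 2.0, rng) + rpois(lam / 2.0, rng)
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_bym(nrow=10, ncol=10, beta0=0.0, tau_u=1.0, sigma_v=0.2,
                 expected=20.0, sweeps=200, seed=1):
    """Simulate y_i ~ Poisson(E_i * theta_i), log theta_i = beta0 + u_i + v_i."""
    rng = random.Random(seed)
    nb = grid_neighbours(nrow, ncol)
    u = sample_icar(nb, tau_u, sweeps, rng)                # spatially structured
    v = [rng.gauss(0.0, sigma_v) for _ in range(len(nb))]  # unstructured (iid)
    theta = [math.exp(beta0 + ui + vi) for ui, vi in zip(u, v)]  # relative risk
    y = [rpois(expected * t, rng) for t in theta]          # observed counts
    smr = [yi / expected for yi in y]                      # raw SMR = observed / expected
    return u, v, theta, y, smr
```

Raw SMRs are noisy where expected counts are small; disease mapping models stabilise them by borrowing strength across neighbouring areas through the structured effect u, while v absorbs area-specific extra-variability.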

Disease mapping methods rely on assumptions, such as the sample being representative of the population at risk, unless the factors causing unrepresentativeness can be observed and corrected for in the model. PCRs typically fail to meet this condition because GPs and patients participate voluntarily, resulting in opportunistic samples with regional differences in the size and diversity of the covered population. The underlying mechanisms are poorly understood and therefore difficult to correct for. Furthermore, GPs can adopt distinct reporting protocols, leading to an additional source of variation (Tulloch et al., 2020). Disease mapping models are often fitted through Bayesian estimation, and over the last decade Integrated Nested Laplace Approximation (INLA) has become a popular approximate Bayesian inference method and an alternative to Markov chain Monte Carlo (MCMC) (Rue et al., 2009). Besides correct model specification, a Bayesian analysis requires careful selection of prior distributions for specific model parameters. For example, it has been argued that INLA’s default priors for precision parameters are not always suited to disease mapping (Carroll et al., 2015, Simpson et al., 2017, Khan et al., 2021).
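The prior-choice issue can be made concrete by asking how much prior probability each prior puts on a random-effect standard deviation σ exceeding some threshold, where the precision is τ = σ^-2. The sketch below compares two priors on τ: a Gamma(1, 5e-5) prior, which we assume corresponds to R-INLA's long-standing default log-gamma hyperprior for such effects (this should be checked against the documentation of the installed version), and a penalised complexity (PC) prior (Simpson et al., 2017), under which σ is exponential with rate λ = -ln(α)/U so that P(σ > U) = α. The thresholds U and α and the helper names are illustrative assumptions.

```python
import math


def gamma1_tail_sd(s, rate):
    """P(sigma > s) when tau = sigma**-2 ~ Gamma(shape=1, rate).

    sigma > s  <=>  tau < s**-2, and Gamma(1, rate) is Exponential(rate).
    """
    return 1.0 - math.exp(-rate * s ** -2)


def pc_prec_lambda(u, alpha):
    """Rate of the PC prior on a precision, chosen so that P(sigma > u) = alpha."""
    return -math.log(alpha) / u


def pc_prec_tail_sd(s, u, alpha):
    """Under the PC prior, sigma ~ Exponential(lambda); returns P(sigma > s)."""
    return math.exp(-pc_prec_lambda(u, alpha) * s)


def pc_prec_density(tau, u, alpha):
    """PC prior density on the precision tau:
    (lambda / 2) * tau**(-3/2) * exp(-lambda * tau**(-1/2))."""
    lam = pc_prec_lambda(u, alpha)
    return 0.5 * lam * tau ** -1.5 * math.exp(-lam * tau ** -0.5)


if __name__ == "__main__":
    # Prior probability of a "non-negligible" random-effect standard deviation.
    for s in (0.1, 0.5, 1.0):
        print(f"s={s}: Gamma(1, 5e-5) P(sigma>s)={gamma1_tail_sd(s, 5e-5):.2e}, "
              f"PC(U=1, alpha=0.01) P(sigma>s)={pc_prec_tail_sd(s, 1.0, 0.01):.3f}")
```

The contrast is in interpretability: the PC prior is specified directly on the standard-deviation scale through (U, α), whereas the implied behaviour of a gamma prior on τ has to be worked out as above before its suitability for a given disease mapping application can be judged.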

Despite the potential of PCR data for spatially monitoring diseases, these challenges persist: voluntary participation yields samples with regional differences in the size of the covered population, and distinct reporting behaviours among participating GPs introduce additional variation (Tulloch et al., 2020). From this, we define five commonly encountered factors in registry data analysis that we expect to affect how well a statistical analysis estimates spatial trends in disease risk within a study region: (i) spatial representativeness of the sample, related to the spatial (im)balance of the practices included in the database; (ii) differences in practices’ reporting efforts; (iii) replication, related to variation in the number of practices per municipality; (iv) sample size, related to the total number of practices included in the database; and (v) strength of the spatial trend, i.e. the magnitude of the underlying spatial epidemiological variation in disease risk. By means of case and simulation studies, we investigate the impact of these five registry-related challenges on the quality of spatial trend estimation in disease risk. In doing so, we also show the importance of prior specification in these analyses.
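One way to see how the five factors enter the data is a small data-generating sketch. The design below is hypothetical and is not the simulation protocol of this study: each practice is placed in a municipality with probability proportional to `inclusion_weights` (spatial imbalance, factor i, and uneven replication, factor iii), the total `n_practices` sets the sample size (factor iv), a practice-level reporting probability drawn from `effort_range` thins the true cases (factor ii), and the spread of `true_prev` encodes the strength of the spatial trend (factor v). It returns crude prevalences per municipality, with `None` where no practice is present.

```python
import random


def simulate_registry(true_prev, n_practices, inclusion_weights, effort_range,
                      list_size=(500, 2000), seed=0):
    """Hypothetical registry sampler returning crude prevalence per municipality.

    true_prev         -- true prevalence per municipality; its spread sets the
                         strength of the spatial trend (factor v)
    n_practices       -- total number of practices in the registry (factor iv)
    inclusion_weights -- relative chance that a practice lies in each municipality;
                         uneven weights give spatial imbalance (factor i) and
                         uneven replication (factor iii)
    effort_range      -- bounds of a practice-level reporting probability (factor ii)
    """
    rng = random.Random(seed)
    n_muni = len(true_prev)
    cases, patients, replicates = [0] * n_muni, [0] * n_muni, [0] * n_muni
    for _ in range(n_practices):
        m = rng.choices(range(n_muni), weights=inclusion_weights)[0]
        size = rng.randint(*list_size)          # patients on the practice list
        effort = rng.uniform(*effort_range)     # share of true cases recorded
        p_recorded = true_prev[m] * effort      # under-reporting thins cases
        cases[m] += sum(rng.random() < p_recorded for _ in range(size))
        patients[m] += size
        replicates[m] += 1
    # None marks municipalities without any registry practice (no data).
    crude = [c / n if n else None for c, n in zip(cases, patients)]
    return crude, replicates
```

Rerunning with, for example, `inclusion_weights` concentrated on a few municipalities or a wider `effort_range` shows how coverage gaps and reporting differences distort crude estimates before any disease mapping model is fitted.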
