Spatiotemporal trends and ecological determinants of cardiovascular mortality among 2844 counties in mainland China, 2006–2020: a Bayesian modeling study of national mortality registries

Data sourceMortality, under-reporting, and population data

Death certificate is regarded as an essential data source to measure and inform trends and patterns of disease mortality in public health practice [20]. This study used deidentified death cases collected by National Mortality Registration Information Management System (NMRIMS) housed in Chinese Center for Disease Control and Prevention (China CDC). Currently, National Mortality Surveillance System (NMSS) is the most populous mortality surveillance system worldwide and the only one with provincial representativeness in China, covers over 300 million individuals from 605 surveillance points (among which, “districts” was defined as urban areas and “counties” was defined as rural areas) in 31 provincial-level administrative divisions (PLADs, excluding Hong Kong Special Administrative Region, Macau Special Administrative Region and Taiwan province) in mainland China that accounts for 24% of total population and routinely collects individual details of death information in real-time through an internet-based approach [21]. Other than surveillance points included in NMSS, those rest of districts/counties also reported death cases directly through NMRIMS to China CDC; thus, NMRIMS have collected death cases from nearly all districts/counties across the country. Apart from national mortality registries, data for under-reporting adjustment were obtained from NMSS under-reporting field surveys conducted in 2009, 2012, 2015, and 2018, separately, which collected under-reporting data from 2006–2017 [22]. Data for county-level population was obtained from National Bureau of Statistics of China [23]. Data for CVD mortality rate standardization was acquired from 2010 population census [23]. Detailed descriptions of data source for NMSS, NMRIMS, under-reporting surveys for national mortality registries, population data, and 2010 Chinese census accompanied with their data curation process were reported in the additional file (see Additional file 1: Part 1. Table S1, Table S2).

Nonmedical ecological determinants

On the basis of the ecological model of health determinants developed by Dahlgren and Whitehead, the health determinants of focused outcomes could be attributed to direct causes (or proximal factors, downstream determinants) like disease-specific causes or individual risk factors and indirect causes (or distant factors, upstream determinants) like social determinants or environment factors [24]. In this study, we do not analyze the medical causes but rather focus on nonmedical ecological determinants at county-level [17, 18]. For which, being the upstream capstone of social development and preconditions of other factors, nonmedical ecological determinants play much important role in tailoring public health practice in real world, especially at population level, among either CVD patients, population at high risk, or even normal people [1, 25,26,27].

One of the most important nonmedical ecological determinants, also called indirect determinants, was gross domestic product per capita (GDP, 10,000 yuan per person), and it reflected to each resident’s economic contribution or value creation of his country or region [1, 26,27,28]. Second is nighttime light (NTL) data, which is often used to involve human social activities and urban expansion, socioeconomic factors estimation, and other fields such as environment, disaster, fishery, and energy [29]. Third, we included number of beds in health care institutions (NB, units per 10,000 persons) to present local healthcare resources and capacity [23]. Fourth, we included population density (PD, persons per 1 square kilometer) to present the population distribution in a specific area [23]. Apartment from that, we summarized that both the environmental and meteorological factors like particulate matters emission, temperature, and humidity, might have impacts on CVD mortality [30,31,32]. We thus included annual average temperature (TEMP, °C), temperature variability (TV, °C), annual average relative humidity (HUMID, %), longitude (LT), altitude (AT), and concentration of PM2.5 emission (PM2.5, μg/m3), as confounders in main analysis. We utilized multiple imputation to deal with the missing data of each determinant. Detailed descriptions of those indicators and confounders were reported in the additional file (see Additional file 1: Part 1. Table S3).

ParticipantsUnder-reporting adjustment and all-cause mortality calculation

In order to ensure the reliability and validity of data from NMRIMS, we calculated under-reporting rate (URR) annually for each stratum (urban/rural areas in eastern/central/western regions) during 2006–2017 as the proportion of missed deaths among the total number of deaths identified in under-reporting surveys. We then used spline regressions to predict URR in each stratum during 2018–2020. Afterwards, we derived under-reporting-adjusted all-cause mortality rate by sex and age group for all districts/counties by dividing reported number of deaths by (1-URR) [16].

Data quality control

In order to acquire consistent information of mortality at district/county level, we first unified administrative codes throughout 2006-2020 to an identical one to deal with the issues caused by newly-added, cancelation, alternation of the codes in each of the district/county. Afterwards, we have reviewed, compared, and evaluated the data quality of each county to exclude certain ones that were considered to be seriously under-reported and might affect overall results. We made the under-reporting-adjusted all-cause mortality rate of lower than 4.5‰ as exclusion criteria for those counties belongs to VRs and DSPs earlier than 2013 [33]. Since 2013, we made the criteria of 5‰. For counties that neither belongs to VRs nor DSPs, we made the criteria of 3‰ during 2006–2020. Afterwards, we performed a covariate-based modeling approach to further evaluate the data quality of remained counties [33]. We also excluded death cases with missing values of key indicators like COD, sex, and age. Detailed descriptions of data quality control were reported in the additional file (see Additional file 1: Part 2).

CVD mortality calculation

Underlying COD in NMRIMS was recorded by using International Classification of Disease 10th Edition (ICD-10). All death cases between 2006 and 2020 where CVD was identified as underlying COD were extracted (ICD10: I00-I99) [33]. CVD mortality rate by location-year-sex-age group was calculated by multiplying by all-cause mortality rate generated previously and proportion of COD, for which the latter was calculated by cases of CVD deaths divided by all-cause deaths at each stratum [16].

Statistical methodsHBSTM framework

We applied hierarchical Bayesian spatiotemporal model (HBSTM) to reveal the spatiotemporal patterns and measure the associations between nonmedical ecological determinants of CVD mortality at county-level in mainland China during 2006–2020. Comparing with classical regression models, by taking a spatially explicit approach to modeling, the spatial configuration of the data may contribute information about the outcome not captured by other area attributes. Moreover, HBSTM is proposed in the form of prior knowledge incorporated with spatial and temporal correlations and uncertainties existed in spatiotemporal process, which could fully depict spatiotemporal evolution of disease distribution. HBSTM framework consists of three models: data model, process model, and (hyper) parameter model, and it can be described as follows [15, 34,35,36,37,38,39,40,41,42,43,44]:

$$\textrm\ \textrm:\kern0.5em }_\sim P\left(}_|_,\Theta \right)$$

(1)

$$\textrm\ \textrm:& _=\textrm(i)+\omega (t)+_\left(i,t\right)+\end}_$$

(2)

$$\left(\textrm\right)\ \textrm\ \textrm:& \Theta \sim \textrm\end}\left(\Theta \right)$$

(3)

In Eqs. (1), (2), and (3), i denotes a certain county in China; t denotes a specific year during 2006 and 2020; Yit denotes the observed sample data with spatiotemporal attributes; θit denotes the spatiotemporal process metric; S(i) denotes the overall spatial pattern; ω(t) denotes the overall temporal trend; Ωit(i, t) denotes the local spatiotemporal effects; εit denotes the spatiotemporal stochastic noise; and Θ denotes the (hyper) parameter data.

HBSTM specification

For modeling count data of rare events like CVD mortality occurred in a small-scale area with large population, Poisson regression is often applied, and it can be specified as a log-linear model for the CVD mortality as follows [15, 34,35,36, 38,39,40,41,42,43,44,45,46]:

$$_\sim \textrm\left(__\right)$$

(4)

In Eq. (4), Yit denotes the observed CVD death counts in county i and year t, and it follows a Poisson distribution. Eit denotes the expected CVD death counts in county i and year t. θit denotes a county specific CVD mortality risk, which is equivalent in this setting to the standardized mortality ratio (SMR), as the SMR is often seen as a highly unstable outcome because of the number of events and the size of population at risk, and this instability needs to be accounted for in the model. By using methods of hierarchical Bayesian smoothing, this can be addressed directly and specified as follows [15, 34, 36,37,38,39,40,41,42,43,44,45,46]:

$$\mathit\left(_\right)=\alpha +_i+_t+_\\ \mathit\left(_\right)=\alpha +_i+_i+_0t+_t+_\textrm+_\end}$$

(5)

$$\mathit\left(_\right)=\alpha +_i+_i+_0t+_t+_\textrm+_+\beta X$$

(6)

In Eq. (5), the observed spatiotemporal variability in CVD mortality risk is decomposed into three components: overall spatial pattern Si(si + ui), overall temporal trend Vt(b0t + vt), and local spatiotemporal variations φit (b1it + εit) within each spatial unit (district/county). Among which, α denotes the intercept quantifying the average CVD mortality rate in all counties in China over 2006–2020. For spatial terms, si denotes the spatially structured random effects and is used to describe the distribution of CVD mortality risk across China during the study period, and ui denotes the spatially unstructured random effects. The overall time trend consists of a linear temporal trend b0, which represents the overall rate of change in CVD mortality and is also called overall slope, with an additional Gaussian noise vt used to detect nonlinear trend. The combination of the common spatial pattern and temporal trend represents the stable component of CVD mortality risk. Apart from this, local spatiotemporal variations represent the spatial and temporal interaction effects describing local attributes, among which, b1i denotes the departure from global trend for each county and also called local slope, εit denotes the spatiotemporal stochastic noise for county i in year t, and captures additional variability that not explained by other terms in the model. In this study, we used various random effects to partition the observed space-time variations, and those random effects serve as surrogate measures of unobserved risk factors that vary over space, time or both. Further, we extended null spatiotemporal model, Eq. (6), to incorporate covariates that could help to explain the spatiotemporal patterns, where X represents a series of covariates and β denotes their corresponding coefficients. In this study, we included GDP, NTL, NB, and PD as potential influential factors, and TEMP, TV, HUMID, LT, AT, and PM2.5 as confounders.

Prior and hyperprior were assigned for all parameters in the model [16, 34, 36,37,38,39,40,41,42,43,44,45,46]. For prior distribution specifications, we assumed that the overall CVD mortality risk α and overall rate of CVD mortality risk change b0 followed a uniform distribution. We assumed that the overall spatial random effect si and local departure from global trend b1i followed a conditional autoregressive (CAR) prior with a spatial weight matrix W to impose spatial structure. Specifically, W is a matrix of size N × N, where its diagonal entries wii = 0 and the off-diagonal entries wij = 1 if county i and j share a common boundary and wij = 1 otherwise. Herein, CAR prior on the spatial random effect implies that the adjacent counties tended to have changes in CVD mortality risk that were more alike than is the case for counties that were far apart. Both of the nonlinear temporal trends vt and overdispersion parameter εit followed a normal distribution. For hyperprior distribution specifications, we assigned a strictly positive half Gaussian prior to all random effect standard deviations, and precision parameter followed gamma noninformative distribution. Detailed descriptions of prior and hyperprior parameter specifications and model selection were reported in the additional file (see Additional file 1: Part 3).

Afterwards, we visualized the spatial variations of CVD mortality risk and its change at county-level through using the results of SMR of CVD estimated by HBSTM. In the consideration of soaring death cases reported by the national mortality registries especially after its expansion in 2013, as well as the population structure change, we calculated absolute value of age-standardized mortality rate (ASMR) of CVD to provide a comparable estimation across time and space. Thereby we used average annual percent change (AAPC) to present ASMR of CVD at county-level that change at a constant percentage every year change linearly on a log scale. AAPC is a summary measure of the trend over a pre-specified fixed interval, and it allows us to use a single number to describe the average percent change over a period of multiple years [47].

CVD mortality risk classification

We also reported the posterior probability which estimated the relative change over time and space in each county to represent an increase versus a decrease of CVD mortality risk. The posterior probability represented the inherent uncertainty of changing CVD mortality trends, specifically, if the estimated CVD mortality was the same in 2006 and 2020, and an increase was statistically indistinguishable from a decrease, there was a 50% posterior probability of an increase and a 50% posterior probability of a decrease. For example, in a county in which the posterior distribution of CVD mortality in 2020 was entirely greater than in 2006, there was around a 100% posterior probability of an increase, and hence around a 0% probability of a decrease, and vice versa. Posterior probabilities more distant from 50% indicate more certainty, and it can be described as follows [12, 44]:

$$p\left(\mathit\left(_i+_i\right)>1| data\right)$$

(7)

$$p\left(_>0|_i, data\right)$$

(8)

In Eqs. (7) and (8), (exp(si + ui)) denotes the logarithm of CVD mortality rate over time in county i relative to national average, data denotes the observed data, and p(exp(si + ui) > 1| data) denotes the posterior probability of relative risk of CVD mortality larger than 1. p(b1i > 0| hi, data) denotes the posterior probability of local trend of CVD mortality risk in county i relative to national average larger than 0.

On the basis of this, we performed a two-stage classification method to identify hot/cold/warm spots of CVD mortality risk at county-level by using the p(exp(si + ui) > 1| data) and p(b1i > 0| hi, data )[12, 15, 44]. For details, at the first stage, we defined a county as a hot spot (h1) if p(exp(si + ui) > 1| data) was greater than 0.8 and as a cold spot (h2) if p(exp(si + ui) > 1| data) was less than 0.2; besides, we defined the other counties as warm spots (h3). At the second stage, we further classified a county under each risk category in the first stage into one of the three trend patterns by using estimates of local slopes b1i, namely, a county with a faster increasing local trend than global trend if p(b1i > 0| hi, data) > 0.8, a decreasing trend relative to the global trend if p(b1i > 0| hi, data) < 0.2, and a local trend not differing from the global trend if 0.2 < p(b1i > 0| hi, data) < 0. 8 [15, 34, 41, 44]. Detailed descriptions of CVD mortality risk classifications were reported in the additional file (see Additional file 1: Part 3. Table S4).

In this study, 95% confidence intervals (CI) were used to represent the 2.5th to 97.5th percentiles of the posterior distribution of estimated CVD mortality, and a P value < 0.05 was considered statistically significant; all tests were two sided. Data cleaning and preparation was performed in SAS version 9.4 (SAS Institute Inc., Cary, North Carolina, USA), while spatiotemporal modeling was performed in R version 4.1.2 (The R foundation for Statistical Computing, Lucent Technologies, Auckland, New Zealand) by using “INLA” package.

留言 (0)

沒有登入
gif