Spatially varying effects of measured confounding variables on disease risk

In this section, we first present the analysis of detecting geographical disease clusters of peak incidence and incidence paucity performed by the generalized map-based pattern recognition procedure and the spatial scan statistic, respectively, based on data on the spatial occurrence of SIDS incidence in North Carolina counties. Secondly, we present the analysis of investigating geographical variability in the association between SIDS incidence and race and gender, using the proposed interaction regression model and the Freeman-Tukey square-root transformation.

Geographical SIDS clusters by the generalized map-based pattern recognition procedure

The analysis of detecting geographical disease clusters of peak incidence and incidence paucity performed by the generalized map-based pattern recognition procedure was presented in our previous report [8]. We determined 3 groups of counties to use in constructing hierarchical (in intensity) disease clusters of mutually neighboring high-risk counties with 3 different levels of intensity. Level-H1 counties are the 8 top-ranking counties; Level-H2, the 10 counties ranking from 9 to 18; Level-H3, the 6 counties ranking from 19 to 24. The overall incidences of the 8 Level-H1, 10 Level-H2, and 6 Level-H3 counties combined are 5.57, 3.95, and 2.79 per 1000 live births, respectively. Correspondingly, we constructed 3 hierarchical intensity clusters of peak SIDS incidence, located in the northeast (6 counties: 5 Level-H1 and 1 Level-H2) with combined incidence of 4.98, the south (6 counties: 1 Level-H1 and 5 Level-H2) with combined incidence of 4.06, and the mid-east (6 counties: 1 Level-H1 and 5 Level-H3) with combined incidence of 3.09 per 1000 live births.

Next, we constructed 3 hierarchical low-intensity clusters, located in the northwest (6 counties: 4 Level-L1 and 2 Level-L2) with combined incidence of 0.28, the mid-west (9 counties: 1 Level-L1 and 8 Level-L2) with combined incidence of 0.70, and the eastern coast (3 counties: 3 Level-L1) with combined incidence of 0.00 per 1000 live births. Level-L1 counties are the 13 top ranking counties with 0 SIDS; Level-L2, the 11 counties ranking from 87 to 77. The overall incidences of the 13 Level-L1 and 11 Level-L2 counties combined are 0 and 0.81 per 1000 live births, respectively. Figure 3A presents the county-specific SIDS incidence intensity-level map.

Fig. 3

A SIDS Incidence Intensity-Level Map in North Carolina by Generalized Map-based Pattern Recognition Procedure. The 52 medium-risk counties shown in white are not used in constructing hierarchical intensity clusters of peak and low SIDS incidence. B Most Likely and Secondary SIDS Clusters Map in North Carolina by Spatial Scan Statistic. The 76 counties shown in white do not lie in the most likely and secondary disease clusters of peak SIDS incidence or the most likely disease cluster of low SIDS incidence

Geographical SIDS clusters by the spatial scan statistic

We applied the Poisson model of the spatial scan statistic to data on SIDS patients in North Carolina to detect geographical disease clusters of peak incidence and incidence paucity, using the SaTScan™ software package. The most likely disease cluster, denoted by M, and the secondary disease cluster, denoted by S, of peak incidence were located in the northeast (4 counties in red) with a p-value of 1.11 × 10⁻⁴ and combined incidence of 5.12 and in the south (6 counties in yellow) with a p-value of 4.89 × 10⁻⁴ and combined incidence of 3.76 per 1000 live births, respectively, as shown in Fig. 3B. Anson County appeared as a highly significant sub-cluster inside the secondary cluster, denoted by SO, with a p-value of 6.06 × 10⁻⁴ and incidence of 9.55 per 1000 live births.
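For readers unfamiliar with the Poisson scan statistic, the quantity maximized over candidate zones is a log-likelihood ratio. The sketch below shows its standard form for a single zone. It is not the SaTScan™ implementation, the function name `poisson_scan_llr` is hypothetical, and the counts in the usage note are illustrative numbers, not the North Carolina data. Significance (the p-values above) is obtained by Monte Carlo replication, which is not shown.

```python
import math


def poisson_scan_llr(c, e, total, high=True):
    """Log-likelihood ratio for one candidate zone under the Poisson model.

    c     : observed cases inside the zone
    e     : expected cases inside the zone under the null hypothesis
            (expected counts over the whole map sum to `total`)
    total : observed cases over the whole map
    high  : True scans for excess risk, False scans for low risk
    """
    if not (0 < e < total) or not (0 <= c <= total):
        raise ValueError("need 0 < e < total and 0 <= c <= total")
    # A zone only scores if its rate deviates in the scanned direction.
    if (high and c <= e) or (not high and c >= e):
        return 0.0

    def xlogy(x, y):
        # Convention 0 * log(0) = 0, needed for zones with zero cases.
        return 0.0 if x == 0 else x * math.log(y)

    inside = xlogy(c, c / e)
    outside = xlogy(total - c, (total - c) / (total - e))
    return inside + outside
```

For example, with illustrative counts `poisson_scan_llr(20, 10, 100)` scores a zone with twice the expected cases, and `poisson_scan_llr(0, 5, 100, high=False)` scores a zone with no cases when scanning for incidence paucity.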

Next, we searched for geographical disease clusters of incidence paucity. The most likely disease cluster of low incidence, denoted by M, was located in the mid-west (14 counties in navy) with a p-value of 1.00 × 10⁻⁶ and combined incidence of 1.10 per 1000 live births. The secondary disease cluster of low incidence, denoted by S, in the mid-east (7 counties) was not statistically significant, with a p-value of 6.12 × 10⁻¹.

A summary of the spatial SIDS cluster detection analysis based on the generalized map-based pattern recognition procedure and the spatial scan statistic is presented in Table 1. Note that the geographical SIDS clusters of high incidence detected in the article by Kulldorff differ from those identified and presented here because his analysis was based on a larger data set of SIDS incidence in North Carolina, covering the 9-year period in 1974–1984 [7]. In addition, his report did not search for spatial SIDS clusters of low incidence.

Differential spatial effects of race

The expected incidence of SIDS patients, adjusted for race, in Anson was 4.35 per 1000 live births through indirect standardization, which was unacceptably low in comparison with its raw incidence of 9.55. We therefore removed Anson from the regression analysis so that one unusual value would not unduly affect the fit to the other 99 North Carolina counties. Here, we applied the proposed interaction regression model, expressed in Eq. (2), to a total of 99 North Carolina counties for spatial risk analysis.
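The race-adjusted expected incidence comes from indirect standardization: reference (for example, state-wide) race-specific rates are applied to a county's own live births by race. A minimal sketch follows, assuming two race groups. The function name, the group labels, and all numbers are hypothetical and are not the Anson or state-wide figures.

```python
def expected_rate_per_1000(births_by_group, reference_rate_by_group):
    """Indirectly standardized expected incidence per 1000 live births.

    births_by_group         : county live births in each group
    reference_rate_by_group : reference incidence (cases per live birth)
                              in each group
    """
    total_births = sum(births_by_group.values())
    if total_births <= 0:
        raise ValueError("county must have live births")
    # Expected cases: reference rate applied to the county's own births.
    expected_cases = sum(
        births * reference_rate_by_group[group]
        for group, births in births_by_group.items()
    )
    return 1000.0 * expected_cases / total_births


# Hypothetical inputs, NOT the paper's data.
county_births = {"white": 600, "non_white": 400}
state_rates = {"white": 0.001, "non_white": 0.003}
expected = expected_rate_per_1000(county_births, state_rates)
```

Comparing such an expected value with the raw county incidence is what flags a county like Anson, whose raw incidence was more than twice its race-adjusted expectation.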

We started with a non-spatial analysis of SIDS incidence related to race by using the proposed model with no spatial covariates; that is, the linear regression model with the single covariate Race, denoted by XFT1, for Freeman-Tukey transformed non-white live-birth rate and β2 = β3 = 0. The covariate XFT1 was a highly significant predictor variable at a nominal significance level of 10⁻³, with estimated coefficient b1 = 3.87 × 10⁻² and se(b1) = 5.53 × 10⁻³. The adjusted R2 for the XFT1-YFT regression line was 32.86% (R2 = 33.55%). The estimates of the model parameters are presented in the second column of Table 2.
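The Freeman-Tukey square-root transformation used for YFT and XFT1 is defined earlier in the paper, not in this section. For orientation only, the sketch below shows one common form for rates per 1000. Whether this exact form and scale are the ones applied here is an assumption, and the function name is hypothetical.

```python
import math


def freeman_tukey(count, population, scale=1000.0):
    """Freeman-Tukey square-root transform of a rate (assumed form).

    Returns sqrt(scale * count / population)
          + sqrt(scale * (count + 1) / population).
    """
    if population <= 0 or count < 0:
        raise ValueError("need population > 0 and count >= 0")
    return (math.sqrt(scale * count / population)
            + math.sqrt(scale * (count + 1) / population))
```

A useful property of this form is that counties with zero cases still receive distinct values that depend on their number of live births, rather than all collapsing to 0.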

Table 2 Summary of spatial risk analysis by different models with the generalized map-based pattern recognition procedure

Because different geographical SIDS clusters of peak incidence and incidence paucity were detected by the generalized map-based pattern recognition procedure and the spatial scan statistic, separate spatial risk analyses were performed and presented. In addition, measured spatial covariates to adjust for the counties in previously detected geographical SIDS clusters identified by these 2 models were coded accordingly.

Spatial risk analysis with the generalized map-based pattern recognition

We tested whether the effect of the measured covariate race on disease risk differs geographically by letting the covariate XFT1 depend on the measured spatial covariate X2. That is, the interaction covariate XFT1X2, the product of XFT1 and X2, was used to estimate the excess of SIDS risk related to measured Freeman-Tukey transformed non-white live-birth rate in previously detected geographical SIDS clusters of peak incidence over counties outside these geographical SIDS clusters. Note that X2 is coded as 1 for the 18 counties in the 3 hierarchical intensity clusters of peak incidence and 0 otherwise. Based on the proposed interaction regression model with β3 = 0, F(Regression | b0) = 47.23 (> F(2, 96, 0.999) = 7.43) was significant at a nominal significance level of 10⁻³ by the F-test for overall regression. The contribution of XFT1 and the additional contribution of XFT1X2, given that XFT1 was already in the model, were both highly significant with F(due to b1 | b0) = 63.90 and F(due to b2 | b1, b0) = 30.57 (> F(1, 96, 0.999) = 11.52) by the sequential F-test. The estimates of the model parameters are presented in the third column of Table 2, including the adjusted R2 = 48.55% (R2 = 49.60%).
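The reported adjusted R2 and overall F statistic can both be recovered from R2, the number of counties n = 99, and the number of covariates p, using the standard identities for least-squares regression with an intercept. The helper names below are hypothetical, and small discrepancies in the last digit come from rounding of the published R2.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for a model with intercept and p covariates."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def overall_f(r2, n, p):
    """Overall regression F statistic on (p, n - p - 1) degrees of freedom."""
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))


# Model with XFT1 and XFT1X2: R2 = 49.60%, n = 99 counties, p = 2.
adj = adjusted_r2(0.4960, 99, 2)   # about 0.4855, reported as 48.55%
f_stat = overall_f(0.4960, 99, 2)  # about 47.24, reported as 47.23
```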

Next, we applied the proposed interaction regression model with β2 = 0 and used the interaction covariate XFT1X3 to estimate the excess of SIDS risk related to race in previously detected geographical SIDS clusters of incidence paucity over counties outside these geographical SIDS clusters. X3 is coded as 1 for 18 counties in the 3 hierarchical intensity clusters of incidence paucity and 0 otherwise. In this analysis, XFT1 and XFT1X3 after XFT1 was already in the equation were both highly significant with F(due to b1 | b0) = 62.21 and F(due to b3 | b1, b0) = 27.21 (> F(1, 96, 0.999) = 11.52). The F-test for overall regression was highly significant with F(Regression | b0) = 44.71 (> F(2, 96, 0.999) = 7.43). The result of the model with covariates XFT1 and XFT1X3 is presented in the fourth column of Table 2 with the adjusted R2 = 47.15% (R2 = 48.23%).

We further included both interaction covariates XFT1X2 and XFT1X3 in the model in the presence of the main effect of XFT1. Importantly, we found that the additional contributions of XFT1X2, given that XFT1 was already in the equation, and of XFT1X3, given that XFT1 and XFT1X2 were both in the equation, each remained highly significant with F(due to b2 | b1, b0) = 38.70 and F(due to b3 | b2, b1, b0) = 26.53 (> F(1, 95, 0.999) = 11.53) by the sequential F-test. XFT1 remained very important with F(due to b1 | b0) = 80.89 (> F(1, 95, 0.999) = 11.53). By the F-test for overall regression, F(Regression | b0) = 48.71 (> F(3, 95, 0.999) = 5.88); the estimates were b1 = 2.02 × 10⁻², se(b1) = 5.16 × 10⁻³; b2 = 2.09 × 10⁻², se(b2) = 3.82 × 10⁻³; b3 = −3.26 × 10⁻², se(b3) = 6.32 × 10⁻³; and the adjusted R2 = 59.36% (R2 = 60.60%). Each of the 3 predictor variables, XFT1, XFT1X2, and XFT1X3, was significant at a nominal significance level of 10⁻³ by the t test or partial F-test. The result of the model with covariates XFT1, XFT1X2, and XFT1X3 is shown in the fifth column of Table 2.
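The sequential and overall F statistics are linked. If every sequential F uses the residual mean square of the full fitted model (an assumption, but one the reported numbers are consistent with), the sequential sums of squares add up to the regression sum of squares, so the overall F is the average of the sequential F values. A short check with a hypothetical helper name:

```python
def overall_f_from_sequential(sequential_fs):
    """Overall F as the mean of 1-df sequential F statistics.

    Valid when every sequential F shares the full model's residual
    mean square as its denominator.
    """
    if not sequential_fs:
        raise ValueError("need at least one sequential F")
    return sum(sequential_fs) / len(sequential_fs)


# XFT1, then XFT1X2, then XFT1X3 (reported overall F = 48.71).
f_three = overall_f_from_sequential([80.89, 38.70, 26.53])
# XFT1, then XFT1X2 only (reported overall F = 47.23).
f_two = overall_f_from_sequential([63.90, 30.57])
```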

The inclusion of both the interaction covariates XFT1X2 and XFT1X3 to the proposed interaction regression model in the presence of the main effect of XFT1 was supported by the test statistics, although there existed a substantial correlation coefficient of 0.55 between XFT1 and XFT1X2, and a small correlation coefficient of − 0.13 between XFT1 and XFT1X3 in the model. It was further evidenced by the fact that the model with covariates XFT1, XFT1X2, and XFT1X3 had a substantially higher value of the adjusted R2 than that with XFT1 and XFT1X2 or that with XFT1 and XFT1X3 in comparison with the model with covariate XFT1 alone. Thus, our parsimonious fitted least-squares regression equation was

$$\widehat{Y^{\text{FT}}} = 2.1528 + 0.0202\,X^{\text{FT}}_{1} + 0.0209\,X^{\text{FT}}_{1} X_{2} - 0.0326\,X^{\text{FT}}_{1} X_{3}.$$

(3)

We classified as Region 1 the 63 counties outside the 6 geographical SIDS clusters of peak incidence and incidence paucity, the majority of which were medium-risk counties; as Region 2 the 18 counties in the 3 hierarchical intensity clusters of peak incidence (in the northeast, south, and mid-east); and as Region 3 the 18 counties in the 3 hierarchical intensity clusters of incidence paucity (in the northwest, mid-west, and eastern coast).

The coefficient b2 of XFT1X2 measures the differential effect of Freeman-Tukey transformed non-white live-birth rate XFT1 on the slope of the regression line between Region 1 and Region 2. The b2 = 0.0209 indicates that the slope of the regression line for Region 2 is higher by 0.0209 than that for Region 1. According to Eq. (3), the regression line has YFT slope 0.0202 for Region 1; YFT slope 0.0411 (= 0.0202 + 0.0209) for Region 2. Next, the coefficient b3 = − 0.0326 of XFT1X3 indicates that the slope of the regression line for Region 3 is lower by 0.0326 than that for Region 1; that is, the regression line has YFT slope − 0.0124 (= 0.0202 – 0.0326) for Region 3.

Viewing the response as a function of XFT1 conditional on X2 and X3, we found that the spatial effect of race was highest in Region 2, with response function 2.1528 + 0.0411 XFT1 for X2 = 1 and X3 = 0, and lowest in Region 3, with response function 2.1528 − 0.0124 XFT1 for X2 = 0 and X3 = 1. The response function was 2.1528 + 0.0202 XFT1 for Region 1 with X2 = 0 and X3 = 0. Figure 4A shows a plot of XFT1 versus YFT for the 99 North Carolina counties as well as the 3 fitted regression lines based on the generalized map-based pattern recognition procedure.
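The three regional lines share one intercept and differ only in slope, which is what the interaction coding produces. The sketch below evaluates Eq. (3) with the coefficients reported in the text. The function names are hypothetical, and the region codes follow the X2 and X3 definitions given above.

```python
# Coefficients of Eq. (3) as reported in the text.
B0, B1, B2, B3 = 2.1528, 0.0202, 0.0209, -0.0326


def region_slope(x2, x3):
    """Slope of fitted YFT on XFT1 for indicator values (x2, x3)."""
    return B1 + B2 * x2 + B3 * x3


def fitted_yft(x_ft1, x2, x3):
    """Fitted Freeman-Tukey transformed SIDS incidence from Eq. (3)."""
    return B0 + region_slope(x2, x3) * x_ft1


slopes = {
    "Region 1": region_slope(0, 0),  # outside the clusters
    "Region 2": region_slope(1, 0),  # clusters of peak incidence
    "Region 3": region_slope(0, 1),  # clusters of incidence paucity
}
```

Because no main effects of X2 or X3 enter the model, all three lines pass through the same point at XFT1 = 0.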

Fig. 4

A Plot of Freeman-Tukey Transformed Non-White Live-Birth Proportion XFT1 versus Freeman-Tukey Transformed SIDS Incidence YFT for 99 North Carolina Counties and Fitted Regression Lines based on Generalized Map-based Pattern Recognition Procedure. Red Symbol and Blue Symbol Indicate Counties in Hierarchical Intensity Clusters of Peak Incidence and Incidence Paucity, Respectively. B Plot of Freeman-Tukey Transformed Non-White Live-Birth Proportion XFT1 versus Freeman-Tukey Transformed SIDS Incidence YFT for 99 North Carolina Counties and Fitted Regression Lines based on Spatial Scan Statistic. Red Symbol and Blue Symbol Indicate Counties in Likely SIDS Clusters of Peak Incidence and Incidence Paucity, Respectively

In conclusion, we determined the presence of spatial variability in the association between SIDS incidence and race and estimated the differential spatial effects of race on SIDS incidence among the 3 distinct regions defined by the generalized map-based pattern recognition procedure.

Spatial risk analysis with the spatial scan statistic

We applied the proposed model in Eq. (2) to the geographical SIDS clusters of peak incidence and incidence paucity detected by the spatial scan statistic, as shown in Fig. 3B. In this application, X2 is coded as 1 for 9 counties in previously detected most likely disease cluster in the northeast with 4 counties and secondary disease cluster in the south with 5 counties of peak incidence and 0 otherwise. Note that the secondary disease cluster S comprises only 5 counties rather than 6 here because Anson is removed from this analysis.

Based on the proposed interaction regression model with β3 = 0, the contribution of the covariate Race, denoted by XFT1, was significant at a nominal significance level of 10⁻³ with F(due to b1 | b0) = 52.66 (> F(1, 96, 0.999) = 11.52), and the additional contribution of XFT1X2, given that XFT1 was in the equation, was significant at a nominal significance level of 10⁻² with F(due to b2 | b1, b0) = 8.30 (> F(1, 96, 0.99) = 6.91) by the sequential F-test. The F-test for overall regression was highly significant with F(Regression | b0) = 30.48 (> F(2, 96, 0.999) = 7.43). The result of the model with covariates XFT1 and XFT1X2 is shown in the third column of Table 3, including the adjusted R2 = 37.56% (R2 = 38.84%).

Table 3 Summary of spatial risk analysis by different models with the spatial scan statistic

With X3 coded as 1 for 14 counties in previously detected most likely disease cluster of incidence paucity and 0 otherwise, we next applied the proposed interaction regression model with β2 = 0. Note that the 7 counties in the secondary disease cluster of incidence paucity located in the mid-east are all coded as 0 as this cluster is not statistically significant at a nominal significance level of 0.05.

We found that the contribution of XFT1 was significant at a nominal significance level of 10⁻³ with F(due to b1 | b0) = 53.33 (> F(1, 96, 0.999) = 11.52) and the additional contribution of XFT1X3, given that XFT1 was in the equation, was significant at a nominal significance level of 10⁻² with F(due to b3 | b1, b0) = 9.63 (> F(1, 96, 0.99) = 6.91) by the sequential F-test. The F-test for overall regression remained highly significant with F(Regression | b0) = 31.48 (> F(2, 96, 0.999) = 7.43). The result of the model with XFT1 and XFT1X3 is presented in the fourth column of Table 3 with the adjusted R2 = 38.35% (R2 = 39.61%).

Incorporating the covariates XFT1, XFT1X2, and XFT1X3 all into the proposed interaction regression model, we found that XFT1, XFT1X2 given that XFT1 was in the equation, and XFT1X3 given that both XFT1 and XFT1X2 were in the equation were all important and significant contributors to the observed spatial variation in SIDS risk, with F(due to b1 | b0) = 57.17 (> F(1, 95, 0.999) = 11.53), F(due to b2 | b1, b0) = 9.01, and F(due to b3 | b2, b1, b0) = 9.22 (> F(1, 95, 0.99) = 6.91), respectively. By the F-test for overall regression, F(Regression | b0) = 25.13 (> F(3, 95, 0.999) = 5.88) was highly significant. The estimates of the model parameters are presented in the fifth column of Table 3 with b1 = 3.00 × 10⁻², se(b1) = 5.61 × 10⁻³; b2 = 1.52 × 10⁻², se(b2) = 5.41 × 10⁻³; b3 = −2.44 × 10⁻², se(b3) = 8.03 × 10⁻³; and the adjusted R2 = 42.49% (R2 = 44.25%). The covariates XFT1, XFT1X2, and XFT1X3 were each significant at a nominal significance level of 10⁻² by the t test or partial F-test.
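The t test and the partial F-test for a single coefficient are equivalent: the partial F statistic is the square of the t ratio, estimate divided by standard error. The sketch below applies this to the full-model estimates quoted above and compares each against the critical value F(1, 95, 0.99) = 6.91 taken from the text. The helper name is hypothetical, and results differ from the published statistics only through rounding of the estimates.

```python
def partial_f(estimate, std_error):
    """Partial F statistic (1 numerator df) as the squared t ratio."""
    t_ratio = estimate / std_error
    return t_ratio * t_ratio


# Full-model estimates and standard errors reported in the text.
full_model = {
    "b1": (3.00e-2, 5.61e-3),
    "b2": (1.52e-2, 5.41e-3),
    "b3": (-2.44e-2, 8.03e-3),
}
CRITICAL_F_1_95_099 = 6.91  # F(1, 95, 0.99), as quoted in the text

partial_fs = {name: partial_f(b, se) for name, (b, se) in full_model.items()}
all_significant = all(f > CRITICAL_F_1_95_099 for f in partial_fs.values())
```

For the last covariate entered, the partial F coincides with the sequential F, which is why the value for b3 here agrees with F(due to b3 | b2, b1, b0) = 9.22 up to rounding.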

Although the statistical evidence for including both XFT1X2 and XFT1X3 in the proposed interaction regression model in the presence of the main effect of XFT1 was not as strong as in the previous application, we concluded the presence of a spatially varying association between SIDS incidence and race. We found that the correlation coefficient between XFT1 and XFT1X2 (= 0.39) remained substantial but was smaller than the one (= 0.55) in the previous application. The correlation coefficient between XFT1 and XFT1X3 (= −0.17) was similar to the one (= −0.13) in the previous application. The parsimonious fitted least-squares regression equation in this application was

$$\widehat{Y^{\text{FT}}} = 1.9133 + 0.0300\,X^{\text{FT}}_{1} + 0.0152\,X^{\text{FT}}_{1} X_{2} - 0.0244\,X^{\text{FT}}_{1} X_{3}.$$

(4)

We estimated the differential spatial effects of race on SIDS among the geographical SIDS clusters of incidence anomalies and outside the geographical SIDS clusters, detected by the spatial scan statistic. According to Eq. (4), the spatial effect of race was highest in the most likely and secondary disease clusters of peak incidence with the response function equal to 1.9133 + 0.0452 XFT1 for X2 = 1 and X3 = 0 and lowest in the most likely disease cluster of incidence paucity with the response function = 1.9133 + 0.0056 XFT1 for X2 = 0 and X3 = 1. The response function was 1.9133 + 0.0300 XFT1 for 76 counties outside the detected geographical SIDS clusters by the spatial scan statistic with X2 = 0 and X3 = 0. Figure 4B shows a plot of XFT1 versus YFT for the 99 North Carolina counties as well as the 3 fitted regression lines based on the spatial scan statistic.
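The three response functions quoted above follow from Eq. (4) by substituting the indicator values and collecting the terms in the race covariate, using the coefficients b1 = 0.0300, b2 = 0.0152, and b3 = −0.0244 reported in the text:

$$\begin{aligned}
X_{2} = 0,\ X_{3} = 0:&\quad \widehat{Y^{\text{FT}}} = 1.9133 + 0.0300\,X^{\text{FT}}_{1},\\
X_{2} = 1,\ X_{3} = 0:&\quad \widehat{Y^{\text{FT}}} = 1.9133 + (0.0300 + 0.0152)\,X^{\text{FT}}_{1} = 1.9133 + 0.0452\,X^{\text{FT}}_{1},\\
X_{2} = 0,\ X_{3} = 1:&\quad \widehat{Y^{\text{FT}}} = 1.9133 + (0.0300 - 0.0244)\,X^{\text{FT}}_{1} = 1.9133 + 0.0056\,X^{\text{FT}}_{1}.
\end{aligned}$$

Unlike the fit based on the generalized map-based pattern recognition procedure, the slope in the low-incidence cluster stays positive here, although it is close to zero.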

Table 4 gives a sample of counties with the observations used for the estimation of the parameters of the models expressed in Eqs. (3) and (4), respectively presented in the fifth column of Tables 2 and 3, as well as the fitted values and residuals.

Table 4 A sample of North Carolina counties with observations, fitted values, and residuals with full models

Spatial effects of gender

Gender was another important risk factor for SIDS incidence in these data. We found a significant difference between state-wide SIDS incidence rates for male children and female children, 2.284 versus 1.765 per 1000 live births, with a p-value of 1.07 × 10⁻³.

Letting the covariate Gender, denoted by XFT1, be the Freeman-Tukey transformed male live-birth rate and setting β2 = β3 = 0 in Eq. (2), the linear regression model for YFT indicated no significant geographical sex difference in SIDS risk, with estimated coefficient b1 = 2.35 × 10⁻¹ and se(b1) = 1.89 × 10⁻¹, which gives a p-value of 0.22 by the t test or partial F-test. This was further evidenced by the fact that the values of the adjusted R2 (< 0.6%), R2 (= 1.6%), and the correlation coefficient between XFT1 and YFT (= − 0.13) were all very low.
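Two quick arithmetic checks tie these summary figures together. The t ratio behind the reported p-value is the estimate divided by its standard error, and in a regression with a single covariate the magnitude of the correlation coefficient is the square root of R2:

$$t = \frac{b_{1}}{\operatorname{se}(b_{1})} = \frac{0.235}{0.189} \approx 1.24, \qquad |r| = \sqrt{R^{2}} = \sqrt{0.016} \approx 0.13.$$

A t ratio of about 1.24 is well below the conventional two-sided 5% threshold of roughly 2, in line with the reported p-value of 0.22.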

A plot of SIDS incidence × 1000 versus non-white and male live-birth rates for the 100 North Carolina counties, presented in Fig. 5, shows that non-white live-birth rate varies widely across counties, whereas male live-birth rate stays close to 0.5. The result related to gender was very different from the previous one related to race because of this discrepancy between the spatial distributions of race and gender. We concluded the absence of spatial association between SIDS incidence and gender. The spatial risk analysis presented in this section shows that our proposed model can characterize and assess spatially varying associations between SIDS incidence and race and gender in studies of geographical disease clusters of peak incidence and paucity of incidence.

Fig. 5

Plot of SIDS Incidence × 1000 versus Non-White (Blue Symbol) and Male (Red Symbol) Live-Birth Proportion
