This section details the methodology and results of our study aimed at developing Z-Score models for normalizing measurements of three coronary arteries in children aged newborn to 10 years to enhance the diagnosis of Kawasaki disease (KD). The study included 1180 participants free from coronary or heart diseases. Measurements of the LCA, LAD, and RCA were collected, along with patient demographics such as age, height, weight, and body surface area (BSA). Several Z-Score models were proposed for different coronary arteries and age groups, with separate designs leading to multiple models. The models were rigorously tested for normality, and various data transformation techniques, including the Box-Cox method, were employed to optimize the RCA model. This section provides a comprehensive overview of the participant data, data transformation processes, basic and modified models, and the selection of our modeling efforts. Additionally, we further compared the predictive performance of each model. First, we calculated the Z-Scores for each age-specific model separately and then used these Z-Scores to compare the sensitivity and specificity across the entire dataset.
ParticipantsData collection was conducted at Kaohsiung Chang Gung Memorial Hospital’s Children’s Medical Center (n = 831) and through a free clinic project in Kaohsiung’s elementary schools (n = 349), encompassing a total of 1180 children. Participants were selected based on criteria that included being under 10 years of age, having undergone echocardiographic examinations, and having no history of coronary issues or congenital heart diseases. Collected data comprised the patients’ age, height, weight, and BSA, along with measurements of the internal lumen diameters of their LCA, LAD, and RCA. Table 1 presents the age distribution of the full training dataset. The mean age of the participants was 4.7 years. Due to variances in echocardiographic parameters, some LAD data were incompatible, resulting in 938 valid entries, while the LCA and RCA data both included 1180 entries. The LCA and LAD data passed the Anderson–Darling normality test, but the RCA data did not. On the other hand, the testing cohort comprised of 112 healthy subjects from a training dataset and 112 KD patients with coronary artery dilation, designated as the disease group. The mean age of the normal group and disease group was 4.80 and 4.07 years, separately. In the randomly selected testing dataset, the age distribution of the normal group closely resembled that of the full training dataset. Due to the nature of Kawasaki Disease, there are fewer individuals older than six years in the KD group (Table 2).
Table 1 Age distribution of full training datasetTable 2 Age distribution of testing datasetData transformationTo establish predictive models, methods including linear, logarithmic, exponential, and square-root regression are employed. The key differences among these models lie in the BSA transformation and the incorporation of age variables. We evaluated various BSA calculation methods, including those proposed by Mosteller, DuBois, Haycock, and Gehan. Additionally, we reanalyzed several regression models from existing literature, recalculating their parameters and Z-Scores. In these models, M represents the measured coronary vessel diameter, while α and β are constants reflecting the impact of different parameters. The Anderson–Darling test revealed that none of the regression models from the literature could simultaneously normalize the Z-Score distributions for our LCA, LAD, and RCA data. Examining the distributions of the LCA, LAD, and RCA data in relation to BSA, we noted inhomogeneous variation in data distributions, irrespective of conditional distributions of LCA, LAD, or RCA data against varying BSA values.
The Box-Cox transformation (Eq. 1) [15], a widely used technique in data transformation, effectively minimizes the impact of inhomogeneous variation in regression models. The Box-Cox transformation of the variable M is defined as M (λ), where λ is the transformation parameter (Table 3).
Table 3 Box-cox TransformationWe presented a simplified regression model (Eq. 2) using M (λ) as the dependent variable and BSA as the independent variable:
$$M^\lambda=\begin\frac\lambda&\mathrm\;\lambda\neq\;0\\\ln\left(M\right)&\mathrm\;\lambda=\;0\end$$
(1)
$$M_i^\lambda\;=\;\alpha\;+\;\beta\;\times\left(BSA_i\right)\;+\;\varepsilon_i$$
(2)
In this model, each value in the dataset (Mi, BSAi)|i = 1, …n is transformed into a corresponding value in the set (Mi(λ), BSAi)|i = 1, …n, fitting an optimal regression line. Assuming that the set εi, i = 1, …n is normally distributed with a mean of 0 and a variance of σ2, the optimal λ value is estimated using the maximum likelihood approach. The best regression model is then identified.
Adhering to Occam’s razor, we prioritized models achieving normalization with minimal complexity. In evaluating models, both the fit of the Z-Score distribution to a normal distribution and a higher R2 value are essential. For models with both normally distributed Z-Scores and R2 values of ≥ 0.5, we also considered the calculation complexity and the number of independent variables.
Basic modelIn our foundational model, Mi(λ) represents the ith data point for LCA, LAD, or RCA; BSAi is the BSA value calculated from the patient’s height and weight; and εi is an independently and normally distributed residual term with a mean of 0. The parameters λ, α, β, and γ in our optimal model are derived from our dataset. We calculated the Z-Score for each data point using the equation (Eqs. 3 and 4), where Zi represents the Z-Score of the ith data point in the optimal regression model, and the standard error is the square root of the mean squared error. Equations 3 and 4 illustrate the final version of our selected model.
$$M_i^\;=\;\alpha\;+\;\beta\;\times\;ln\;\left(BSA_i\right)\;+\;\varepsilon_i$$
(3)
$$_=\frac_^-\alpha -\beta \times \text\left(BS_\right)}}$$
(4)
In our research, we considered the impact of age and other variations. However, they did not improve the model performance. For the equation and results about the candidate models, we show them in Appendix A and Appendix B.
Modified model for RCA dataRegarding RCA data, initial modeling attempts without the Box-Cox transformation yielded nonnormal Z-Scores. Although our application of the Box-Cox transformation decreased data variation inconsistency, the RCA Z-Scores persisted in being nonnormally distributed when not including LCA and LAD values as independent variables. Consequently, RCA characteristics may not be fully explained solely by variables like BSA, age, and sex. Due to the nonnormal distribution of the RCA data, we restructured the model to incorporate LCA and LAD parameters (Eqs. 5 and 6). This adjustment aimed to produce more accurate RCA results aligning with the data. The refined model is detailed below:
$$M_i^\;=\;\alpha\;+\;\beta_1\;\times\;LCA\;+\;\beta_2\;\times\;LAD\;+\;\beta_3\;\times ln\;\left(BSA_I\right)\;+\;\varepsilon_I$$
(5)
$$}_i=\frac\left(_i\right)}}$$
(6)
Age group selectionTo manage the extensive combinations of age groups ranging from newborns to 10 years old, we implemented a systematic approach using a for-loop in R language to pair and test different age groupings. This allowed us to conduct a series of combi- natorial tests, assessing each combination against the Anderson–Darling normality test. For those models that successfully passed the normality test, we further evaluated them based on their R-square values, prioritizing models with higher R-square values for final selection. This approach ensured that the selected models were both statistically robust and provided the best fit for the data.
留言 (0)