Machine learning-based clustering to identify the combined effect of the DNA fragmentation index and conventional semen parameters on in vitro fertilization outcomes

Baseline characteristics of all patients

After all exclusions, a total of 1258 couples undergoing fresh transfer in vitro fertilization cycles were included in the analysis (Fig. 1). In this cohort, 664 (52.8%), 646 (51.4%), and 549 (43.6%) couples had positive β-hCG, clinical pregnancy, and live birth outcomes, respectively. The baseline characteristics of all couples in this study are presented in Supplementary Table 1.

Fig. 1

Flow chart for the selection of participants in the cohort study

The correlation coefficients between the sperm DNA fragmentation index and the studied routine semen parameters ranged from -0.5 to 0. The correlation coefficients between any two of the studied routine semen parameters ranged from -0.2 to 0.4. A heatmap showing pairwise correlations among the studied parameters is presented in Supplementary Fig. 1.
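
As a minimal sketch of how such pairwise coefficients can be obtained, the snippet below computes a Pearson correlation matrix in plain Python. The source does not state which correlation coefficient was used, so Pearson is an assumption here, and the column names and values are hypothetical, not study data.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Map every ordered pair of column names to its Pearson coefficient."""
    return {(a, b): pearson(columns[a], columns[b])
            for a in columns for b in columns}

# Hypothetical values for illustration only, NOT data from the study.
toy = {
    "dfi_pct": [8.0, 12.0, 16.0, 25.0, 36.0],
    "rapid_pr_pct": [52.0, 40.0, 30.0, 19.0, 13.0],
    "concentration": [62.0, 70.0, 41.0, 45.0, 30.0],
}

for (a, b), r in correlation_matrix(toy).items():
    if a < b:  # print each unordered pair once
        print(f"{a} vs {b}: r = {r:+.2f}")
```

A heatmap such as the one in Supplementary Fig. 1 is simply a colour-coded rendering of this kind of matrix.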

We selected four as the optimal number of clusters on the basis of the cumulative distribution function (CDF) plot (Supplementary Fig. 2(A)), the elbow method (Supplementary Fig. 2(B)), the consensus matrix heatmap (Supplementary Fig. 2(C)), the mean cluster consensus score (Supplementary Fig. 2(D)), and clinical interpretability. The K-means clustering method was then used to partition all 1258 infertile couples who underwent fresh transfer in vitro fertilization treatment cycles into four clusters. Supplementary Table 2 presents the distributions of the routine semen parameters and sperm DNA fragmentation index (DFI) values after Min–Max scaling. The clustering results are shown in Supplementary Fig. 3, and violin plots illustrating the features of the four clusters are shown in Fig. 2.

Compared with those in the other three clusters, male patients in Cluster 1 had lower median sperm DFI values (8.6% [6.4%, 12.5%]), higher median sperm concentration levels (62.0 [46.9, 87.3] × 10⁶/ml), higher median rapidly progressive motility levels (51.5% [47.0%, 56.0%]), and higher median slow or sluggish progressive motility levels (19.0% [17.0%, 22.0%]). Male patients in Cluster 2 had relatively low median sperm DFI values (12.4% [8.8%, 17.1%]) and intermediate median semen parameter levels. The median sperm DFI value was also relatively low in Cluster 3 (15.9% [11.4%, 20.2%]), but the median semen parameter levels were low as well (for example, the median rapidly progressive motility level was 19.0% [14.0%, 25.0%]). Male patients in Cluster 4 had higher median sperm DFI values (36.4% [30.1%, 43.3%]) and lower median semen parameter levels (for example, the median rapidly progressive motility level was 13.0% [6.5%, 20.0%]).

Thus, we designated Cluster 1 as the 'low-level DFI/high-level sperm motility and semen concentration group', Cluster 2 as the 'low-level DFI/median-level sperm motility and semen concentration group', Cluster 3 as the 'low-level DFI/low-level sperm motility and semen concentration group', and Cluster 4 as the 'high-level DFI/low-level sperm motility and semen concentration group'.
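
The two preprocessing and clustering steps named above, Min–Max scaling followed by K-means with four clusters, can be sketched in plain Python as follows. This is an illustrative sketch only: the toy rows are hypothetical, the deterministic farthest-first initialisation is a simplification chosen for reproducibility rather than the authors' setting, and the consensus-clustering procedure used to choose the number of clusters is not reproduced.

```python
from math import dist

def min_max_scale(rows):
    """Scale every column to [0, 1]; constant columns map to 0.0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [max(c) - min(c) for c in cols]
    return [
        [(v - lo) / sp if sp else 0.0 for v, lo, sp in zip(row, lows, spans)]
        for row in rows
    ]

def farthest_first(points, k):
    """Deterministic initial centroids: start at the first point, then keep
    adding the point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids))
        )
    return centroids

def k_means(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update
    until the labels stop changing."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids

# Hypothetical (DFI %, rapidly progressive motility %) rows, NOT study data.
toy_rows = [
    (8.0, 52.0), (9.0, 50.0),    # low DFI, high motility
    (12.0, 35.0), (13.0, 36.0),  # low DFI, intermediate motility
    (16.0, 19.0), (15.0, 20.0),  # low DFI, low motility
    (36.0, 13.0), (37.0, 12.0),  # high DFI, low motility
]
scaled = min_max_scale(toy_rows)
labels, centroids = k_means(scaled, k=4)
print(labels)
```

Scaling first matters because K-means relies on Euclidean distance: without it, a parameter measured on a wide numeric range would dominate the cluster assignment.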

Fig. 2

Violin plots of sperm DFI and the studied routine semen parameters among all participants, stratified by the four clusters derived from these variables. Green dots refer to cluster 1 (low-level DFI/high-level semen parameter group); red dots refer to cluster 2 (low-level DFI/median-level semen parameter group); blue dots refer to cluster 3 (low-level DFI/low-level semen parameter group); purple dots refer to cluster 4 (high-level DFI/low-level semen parameter group)

The characteristics of the study participants across the four clusters are shown in Table 1. Compared with those in the other three clusters, the median female age (32.0 years [30.0, 35.0]) and the median male age (36.0 years [31.5, 39.5]) were both higher in Cluster 4 (P < 0.05). The duration of the attempt to conceive, male BMI, female BMI, anti-Mullerian hormone level, oestradiol level, follicle-stimulating hormone level, and endometrial thickness on the hCG trigger day were not significantly different among the four clusters. The proportion of participants undergoing controlled ovarian stimulation using the long downregulation protocol was similar in all four clusters (67.7%, 69.7%, 68.1%, and 67.8%, respectively) (P = 0.927). In terms of embryo laboratory outcomes, although the number of oocytes retrieved did not differ significantly among the four clusters, the numbers of fertilized oocytes, cleaved oocytes, and embryos available on Day 3 were lower in Cluster 4 than in the other three clusters. Cluster 4 had the lowest median fertilization rate (P < 0.001), but the four clusters had similar median cleavage and D3-available embryo rates.

Table 1 Baseline characteristics of 1258 infertile couples clustered in 4 clusters according to sperm DFI and the studied routine semen parameters

Sperm DFI values and IVF outcomes

After controlling for covariates such as the duration of the attempt to conceive, female age, male age, female BMI, male BMI, controlled ovulation stimulation protocols, AMH level, E2 level, FSH level, endometrial thickness, and the number of oocytes retrieved, linear exposure–response relationships were observed between the sperm DFI value and live birth, clinical pregnancy, and positive β-hCG outcomes (P for overall < 0.05, P for nonlinear > 0.05) (Supplementary Fig. 4 and Supplementary Table 3). The live birth, clinical pregnancy, and β-hCG positivity rates showed a decreasing trend with increasing sperm DFI values. The results shown in Supplementary Fig. 5 suggest a U-shaped relationship between the DFI and the miscarriage rate. According to ROC curve analysis (Supplementary Fig. 6), the areas under the ROC curve for the sperm DFI in relation to live birth, clinical pregnancy, and positive β-hCG outcomes were 0.56 (95% CI, 0.53–0.59), 0.56 (95% CI, 0.53–0.59), and 0.55 (95% CI, 0.52–0.58), respectively, with cut-off values of 8.70%, 11.14%, and 11.14%, respectively. Individuals in the third and fourth quartiles of DFI values were less likely to have favourable IVF outcomes (including live birth, clinical pregnancy, and positive β-hCG outcomes) than those in the lowest quartile after controlling for demographic characteristics and ovulation stimulation-related factors, although the significance was attenuated after adjusting for additional covariates in Models 1 and 2 (Fig. 3 and Supplementary Table 4).
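
To illustrate what an area under the ROC curve and an associated cut-off represent for a marker such as the DFI, the sketch below computes the AUC as a pairwise ranking probability and picks a threshold by the Youden index. The source does not state how its cut-off values were derived, so the Youden criterion is an assumption, and the DFI values are hypothetical.

```python
def auc_low_score_predicts_event(event_scores, non_event_scores):
    """AUC as the probability that a random event case has a LOWER score
    than a random non-event case (ties count one half)."""
    wins = 0.0
    for e in event_scores:
        for n in non_event_scores:
            if e < n:
                wins += 1.0
            elif e == n:
                wins += 0.5
    return wins / (len(event_scores) * len(non_event_scores))

def youden_cutoff(event_scores, non_event_scores):
    """Threshold t maximising sensitivity + specificity - 1, where a case
    is predicted to have the event when its score is <= t."""
    def youden(t):
        sens = sum(s <= t for s in event_scores) / len(event_scores)
        spec = sum(s > t for s in non_event_scores) / len(non_event_scores)
        return sens + spec - 1.0
    candidates = sorted(set(event_scores) | set(non_event_scores))
    best = max(candidates, key=youden)
    return best, youden(best)

# Hypothetical DFI values (%), NOT data from the study.
dfi_live_birth = [6.0, 9.0, 12.0, 20.0]
dfi_no_live_birth = [8.0, 15.0, 18.0, 30.0]

auc = auc_low_score_predicts_event(dfi_live_birth, dfi_no_live_birth)
cutoff, j = youden_cutoff(dfi_live_birth, dfi_no_live_birth)
print(f"AUC = {auc:.4f}, cut-off = {cutoff}%, Youden J = {j:.2f}")
```

An AUC of 0.5 corresponds to chance-level ranking, which puts the reported values of 0.55 to 0.56 in context: the DFI alone discriminates only weakly between couples with and without a favourable outcome.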

Fig. 3

The forest plots of IVF outcomes in relation to the levels of sperm DFI and the studied routine semen parameters. Abbreviations: DFI, DNA fragmentation index; OR, odds ratio; CI, confidence interval; BMI, body mass index; AMH, Anti-Mullerian hormone; E2, Estradiol; FSH, Follicle-stimulating hormone. Notes: Model 1 was adjusted for duration of attempt to conceive, female age, male age, female BMI, and male BMI. Model 2 was further adjusted for controlled ovulation stimulation protocols, AMH, E2, FSH, endometrial thickness, and numbers of oocytes retrieved

Studied routine semen parameter levels and IVF outcomes

After controlling for covariates such as the duration of the attempt to conceive, female age, male age, female BMI, male BMI, controlled ovulation stimulation protocols, AMH level, E2 level, FSH level, endometrial thickness, and the number of oocytes retrieved, a linear exposure–response relationship was observed between the rapidly progressive motility level and clinical pregnancy and positive β-hCG outcomes (P = 0.025 and P = 0.040, respectively), whereas no such relationship was observed with the live birth outcome (P = 0.106) (Supplementary Fig. 4 and Supplementary Table 3). Although the association was not statistically significant, the curves in Supplementary Fig. 4 suggest that a higher sperm concentration tended to be associated with better IVF outcomes. The odds of a good IVF outcome increased when the slow or sluggish progressive motility level was low, whereas an excessively high level was associated with a poor IVF outcome (see Supplementary Fig. 4). For rapidly progressive motility levels, individuals in the third quartile had better IVF outcomes than those in the lowest quartile, although the significance was diminished after adjusting for covariates in Model 1 and Model 2 (Fig. 3 and Supplementary Table 4). For semen concentration, individuals in the second quartile had better live birth outcomes than those in the lowest quartile (OR = 1.38; 95% CI, 1.01–1.91) (Fig. 3 and Supplementary Table 4).
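
The quartile comparisons above can be illustrated with a short sketch that assigns each value to a quartile and computes a crude (unadjusted) odds ratio against the lowest quartile with a Wald 95% confidence interval. The odds ratios reported in the text come from the authors' adjusted models, which this sketch does not reproduce; the function names, values, and counts are hypothetical.

```python
from bisect import bisect_left
from math import exp, log, sqrt
from statistics import quantiles

def quartile_index(values):
    """Return 0..3 for each value according to the sample quartile cut points."""
    cuts = quantiles(values, n=4)  # three cut points
    return [bisect_left(cuts, v) for v in values]

def crude_odds_ratio(events_exp, n_exp, events_ref, n_ref, z=1.96):
    """Unadjusted odds ratio (exposed vs reference) with a Wald CI.
    All four cells of the 2x2 table must be non-zero."""
    a, b = events_exp, n_exp - events_exp
    c, d = events_ref, n_ref - events_ref
    odds_ratio = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = exp(log(odds_ratio) - z * se)
    upper = exp(log(odds_ratio) + z * se)
    return odds_ratio, lower, upper

# Hypothetical sperm concentrations (x 10^6/ml), NOT study data.
toy_concentration = [12.0, 25.0, 31.0, 44.0, 52.0, 63.0, 78.0, 95.0]
print(quartile_index(toy_concentration))

# Hypothetical counts: 60/100 live births in quartile 2 vs 50/100 in quartile 1.
or_, lo, hi = crude_odds_ratio(60, 100, 50, 100)
print(f"OR = {or_:.2f} (95% CI, {lo:.2f}-{hi:.2f})")
```

A confidence interval that spans 1, as in this toy example, indicates that the difference between the two quartiles is not statistically significant at the 5% level.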

Multivariable clusters and IVF outcomes

The live birth rate for the first fresh transfer IVF cycle, the primary outcome, was 47.7%, 45.9%, 39.2%, and 34.8% in Clusters 1 to 4, respectively (Table 2). No statistically significant differences in IVF outcomes were observed between Cluster 1 (low-level DFI/high-level semen parameter group) and Cluster 2 (low-level DFI/median-level semen parameter group). In Model 2, the odds of live birth, clinical pregnancy, and positive β-hCG outcomes were lower in Cluster 3 (low-level DFI/low-level semen parameter group) than in Cluster 1, with ORs (95% CI) of 0.733 (0.537, 0.998), 0.720 (0.530, 0.977), and 0.733 (0.539, 0.995), respectively. Compared with Cluster 1, Cluster 4 (high-level DFI/low-level semen parameter group) had even lower odds of live birth, clinical pregnancy, and positive β-hCG outcomes, with ORs (95% CI) of 0.620 (0.394, 0.967), 0.592 (0.381, 0.914), and 0.587 (0.379, 0.906), respectively, in Model 2. The results are provided in Table 2 and Fig. 4.
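
As a rough worked check of these figures, the sketch below derives crude odds ratios directly from the reported live birth rates and converts a reported odds ratio and its confidence interval into an approximate two-sided P value. It assumes the intervals are Wald-type (symmetric on the log-odds scale), which the source does not state; the crude values computed from rounded percentages may differ slightly from those in Table 2.

```python
from math import erfc, log, sqrt

def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)

def crude_or_from_rates(rate, ref_rate):
    """Unadjusted odds ratio implied by two event rates."""
    return odds(rate) / odds(ref_rate)

def wald_p_from_ci(odds_ratio, lower, upper, z_crit=1.959964):
    """Approximate z statistic and two-sided P value, assuming the 95% CI
    is symmetric around log(OR) (a Wald interval)."""
    se = (log(upper) - log(lower)) / (2.0 * z_crit)
    z = log(odds_ratio) / se
    p = erfc(abs(z) / sqrt(2.0))  # two-sided normal tail probability
    return z, p

# Live birth rates reported in the text for Clusters 1 to 4.
rates = {1: 0.477, 2: 0.459, 3: 0.392, 4: 0.348}
for cluster in (2, 3, 4):
    crude = crude_or_from_rates(rates[cluster], rates[1])
    print(f"Cluster {cluster} vs 1: crude OR ~ {crude:.3f}")

# Model 2 live birth ORs (95% CI) reported in the text.
for cluster, (or_, lo, hi) in {3: (0.733, 0.537, 0.998),
                               4: (0.620, 0.394, 0.967)}.items():
    z, p = wald_p_from_ci(or_, lo, hi)
    print(f"Cluster {cluster}: z ~ {z:.2f}, P ~ {p:.3f}")
```

Under this assumption, an upper confidence limit just below 1 (as for Cluster 3) corresponds to a P value just below 0.05, so the Cluster 3 result is significant by a narrow margin.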

Table 2 The crude and multi-variate adjusted odds ratios (95% CIs) of IVF outcomes in relation to the multi-variable co-exposure clusters

Fig. 4

Odds ratios (95% CI) for live birth, clinical pregnancy, and positive β-hCG outcomes across the 4 clusters. Model 1 was adjusted for duration of the attempt to conceive, female age, male age, female BMI, and male BMI; model 2 was additionally adjusted for controlled ovulation stimulation protocols, AMH, E2, FSH, endometrial thickness, and numbers of oocytes retrieved; C1, cluster 1 (low-level DFI/high-level semen parameter group); C2, cluster 2 (low-level DFI/median-level semen parameter group); C3, cluster 3 (low-level DFI/low-level semen parameter group); C4, cluster 4 (high-level DFI/low-level semen parameter group)

Mediation analysis of the association of IVF outcomes with clusters and fertilization rates

Supplementary Table 5 presents the results of the mediation analyses of the association of IVF outcomes with clusters and fertilization rates, adjusted for demographic characteristics and ovulation stimulation-related factors. The estimated average causal mediation effects (ACMEs) for Cluster 3 and Cluster 4, each compared with Cluster 1, were significantly different from zero (for instance, -0.02 (-0.04 ~ 0.00) and -0.06 (-0.12 ~ -0.01) for the live birth outcome, respectively), although the estimated average direct and total effects were not. For the live birth outcome, for example, the proportion of the effect mediated was 24.8% (21.2% ~ 27.9%) in Cluster 3 and 44.1% (41.0% ~ 48.1%) in Cluster 4, compared with Cluster 1.
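
The relationship between these quantities can be made concrete with a small sketch. In the usual decomposition, the total effect is the sum of the mediated effect (ACME) and the direct effect, and the proportion mediated is the ACME divided by the total effect. The back-calculated totals below are only approximate because they use the rounded values quoted in the text, and the function names are hypothetical.

```python
def proportion_mediated(acme, total_effect):
    """Share of the total effect that passes through the mediator."""
    return acme / total_effect

def implied_total_effect(acme, proportion):
    """Total effect implied by an ACME and a proportion mediated."""
    return acme / proportion

# Live birth outcome, values quoted in the text (each vs Cluster 1).
reported = {
    "Cluster 3": {"acme": -0.02, "proportion": 0.248},
    "Cluster 4": {"acme": -0.06, "proportion": 0.441},
}

for name, r in reported.items():
    total = implied_total_effect(r["acme"], r["proportion"])
    direct = total - r["acme"]  # total = ACME + direct effect
    print(f"{name}: implied total ~ {total:.3f}, implied direct ~ {direct:.3f}")
```

Read this way, the fertilization rate accounts for a larger share of the reduction in live birth probability in Cluster 4 than in Cluster 3.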
