Preovulatory progesterone levels are the top indicator for ovulation prediction based on machine learning model evaluation: a retrospective study

Patients’ information and comparison

A total of 1632 records from 771 patients were included in the study. The records were divided into three groups according to the time before ovulation: 306 records from 72 h before ovulation (72 h group), 598 records from 48 h before ovulation (48 h group), and 728 records from 24 h before ovulation (24 h group). Age, BMI, follicle diameter, E2 levels, P4 levels, and LH levels were used to establish a predictive model. Baseline demographic characteristics (age and BMI), follicle diameter, and hormone levels were compared among the three groups. As shown in Table 1 and Fig. 1, age and BMI did not differ significantly among the three groups, whereas follicle diameter differed significantly (p-overall < 0.001). E2 levels in the 48 h group were significantly higher than those in the 72 h and 24 h groups, and no significant difference was found between the 72 h and 24 h groups (p-value = 0.511). LH levels differed significantly among the three groups (all p-values < 0.001), increasing progressively from 72 h to 24 h before ovulation. Similarly, P4 levels differed significantly among the three groups (all p-values < 0.001), increasing from 72 h to 24 h before ovulation.

Table 1 Baseline characteristics, follicle diameter, and hormone levels of records grouped by ovulation timing

Fig. 1

Box plots and violin plots showing the baseline characteristics, follicle diameter, and hormone levels of all records at three time points before ovulation. The bottom, middle, and upper lines of each box represent the first quartile, the median, and the third quartile of the variable. Dots mark outliers, defined as values at least 1.5*IQR above the third quartile or at least 1.5*IQR below the first quartile. Violin plots show the distribution of each variable. 72 h: 72 h before ovulation; 48 h: 48 h before ovulation; 24 h: 24 h before ovulation

Characteristics of the dynamic changes in follicle diameter and hormone levels prior to and following ovulation

To depict the dynamic changes in follicle diameter and hormone levels before and after ovulation more precisely, this analysis included only patients with measurements of both at all four time points: 72 h before ovulation, 48 h before ovulation, 24 h before ovulation, and the day of ovulation. A total of 84 patients met this criterion. As shown in Fig. 2, follicle diameter increased continuously from 72 h to 24 h before ovulation (Fig. 2A and E). P4 levels increased from 72 h before ovulation until the day of ovulation (Fig. 2B and F). E2 levels initially increased from 72 h to 48 h before ovulation, decreased from 48 h to 24 h before ovulation, and continued to decrease post-ovulation; there was no significant difference in E2 levels between the 72 h and 24 h time points before ovulation (Fig. 2C and G). LH levels increased from 72 h to 24 h before ovulation and then decreased significantly on the day after ovulation; there was no significant difference in LH levels between 72 h before ovulation and the day of ovulation (Fig. 2D and H).

To account for variation in follicle diameter and hormone levels among individuals, we also present paired line plots of follicle diameter and hormone levels at the four time points (Fig. 2I-L), in which each line represents one individual’s trajectory. According to Fig. 2I, follicle diameter in the majority of patients increased from 72 h to 24 h before ovulation, peaking at 24 h before ovulation. Figure 2J shows that the P4 level of most patients increased from 72 h before ovulation until the day of ovulation, with minimal variability. Figure 2K reveals marked variability in the timing of the E2 peak: some patients reached their peak at 48 h and others at 24 h. Figure 2L likewise shows notable variability in the timing of the LH peak, with some patients reaching their peak at 48 h and others at 24 h.

Fig. 2

Dynamic changes in follicle diameter and hormone levels from 72 h before ovulation to the day of ovulation in selected patients with complete data at all four time points. A-D Box plots of follicle diameter and hormone levels at the four time points. The bottom, middle, and upper lines of each box represent the first quartile, the median, and the third quartile of the variable. Dots mark outliers, defined as values at least 1.5*IQR above the third quartile or at least 1.5*IQR below the first quartile. E-H Statistical tables of follicle diameter and hormone levels at the four time points. Different colors in a table indicate that the comparison between two groups was statistically significant, whereas the same color indicates that the comparison was not statistically significant. Values are presented as median [interquartile range]. I-L Paired line plots of each individual’s follicle diameter and hormone levels over the four time points. 72 h: 72 h before ovulation; 48 h: 48 h before ovulation; 24 h: 24 h before ovulation; 0 h: the day of ovulation

Classification trees model

A total of 1306 records were used as the training dataset to train a categorical regression model for predicting ovulation timing: 252 records for the 72-hour timeframe, 477 records for the 48-hour timeframe, and 577 records for the 24-hour timeframe. A validation dataset of 326 records was also used, with 54 records for the 72-hour timeframe, 121 records for the 48-hour timeframe, and 151 records for the 24-hour timeframe. The classification trees analysis of the training dataset (Fig. 3) revealed that a preovulatory P4 level ≥ 0.65 ng/ml indicates a high probability of ovulation within 24 h. When the preovulatory P4 level falls between 0.45 and 0.65 ng/ml, it should be combined with E2 levels to predict ovulation timing accurately. A preovulatory P4 level of 0.45 to 0.65 ng/ml together with an estradiol (E2) level ≥ 360.6 pg/ml is a strong indicator that ovulation will take place within 48 h. However, when E2 levels have fallen after the E2 surge, a preovulatory P4 level of 0.45 to 0.65 ng/ml together with an E2 level < 360.6 pg/ml suggests a high probability of ovulation on the subsequent day. When the P4 level is < 0.45 ng/ml, it can be combined with LH levels to obtain dependable results. If the LH level is ≥ 18.05 mIU/ml, there is a high probability of ovulation within 48 h. Conversely, if LH is < 18.05 mIU/ml and P4 is < 0.45 ng/ml, there is a high probability that ovulation will not take place within 48 h, and an ultrasound scan can be arranged two days later.
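The decision rules described above can be written out as a short function. This is a minimal Python sketch that transcribes the thresholds reported in the text; the function name `predict_ovulation_window`, the string labels, and the assignment of values exactly at a threshold to the upper branch are assumptions made for illustration, not the authors' implementation.

```python
def predict_ovulation_window(p4, e2, lh):
    """Return the predicted group ("24 h", "48 h" or "72 h") from the Fig. 3 rules.

    p4 in ng/ml, e2 in pg/ml, lh in mIU/ml.
    The labels are illustrative names for the three groups in the text.
    """
    if p4 >= 0.65:
        # High P4 alone points to ovulation within 24 h.
        return "24 h"
    if p4 >= 0.45:
        # Intermediate P4: the E2 level decides between 48 h and 24 h.
        return "48 h" if e2 >= 360.6 else "24 h"
    # Low P4: the LH level decides between 48 h and "not within 48 h" (72 h group).
    return "48 h" if lh >= 18.05 else "72 h"


# Example: intermediate P4 with E2 below the cutoff falls in the 24 h group.
print(predict_ovulation_window(p4=0.50, e2=300.0, lh=12.0))
```

Writing the tree this way makes explicit that LH is consulted only when P4 is below 0.45 ng/ml, and E2 only when P4 lies in the intermediate range.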

The confusion matrix showed that the classification trees model achieved an overall predictive accuracy of 80.70% on the training dataset, with accuracy rates of 92.72%, 67.92%, and 77.38% for predicting ovulation within 24 h, 48 h, and 72 h, respectively. On the validation dataset, the model achieved an overall predictive accuracy of 78.83%, with accuracy rates of 92.72%, 63.64%, and 74.07% for predicting ovulation within 24 h, 48 h, and 72 h, respectively (Table 2). According to the classification trees model, preovulatory P4 levels increase in the lead-up to ovulation and are the most significant parameter for predicting the timing of ovulation.
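As a consistency check, the overall accuracy is the class-size-weighted average of the per-class rates. The sketch below recomputes both overall figures from the group sizes and per-class percentages reported in the text. The helper name `overall_accuracy` is hypothetical, and rounding each product to a whole number of records is an assumption, since the exact counts appear in Table 2 rather than in the text.

```python
def overall_accuracy(class_sizes, class_rates):
    """Overall accuracy from per-class sizes and per-class accuracy rates (0 to 1)."""
    # Assumption: each rate * size corresponds to a whole number of correct records.
    correct = sum(round(n * r) for n, r in zip(class_sizes, class_rates))
    return correct / sum(class_sizes)


# Group order: 24 h, 48 h, 72 h, as reported in the text.
validation = overall_accuracy([151, 121, 54], [0.9272, 0.6364, 0.7407])
training = overall_accuracy([577, 477, 252], [0.9272, 0.6792, 0.7738])
print(f"validation: {validation:.2%}, training: {training:.2%}")
# prints "validation: 78.83%, training: 80.70%"
```

Both values agree with the overall accuracies reported for the classification trees model.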

Table 2 Confusion matrix of classification trees model

Fig. 3

Regression tree plot of the classification trees model. P4: progesterone (ng/ml); LH: luteinizing hormone (mIU/ml); E2: estradiol (pg/ml)

Random forest model

We further used the random forest method to develop an alternative predictive model. Evaluation of this model with a confusion matrix showed an accuracy rate of 100% on the training dataset and 85.8% on the validation dataset. The accuracy rates for predicting ovulation within 24 h, 48 h, and 72 h were 96.69%, 74.38%, and 77.78%, respectively (Table 3). The random forest model thus showed a notable improvement in accuracy over the classification trees model. Figure 4 shows the ranking of variable importance by the Gini index: the hormone levels P4, LH, and E2 are the three most influential variables in predicting ovulation timing, whereas follicle diameter, BMI, and age are considerably less important. This finding is consistent with the classification trees model, which also identifies P4 as the most important variable for predicting ovulation time.
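For readers unfamiliar with the Gini index used in Fig. 4: a random forest typically scores a variable by how much its splits reduce Gini impurity, accumulated over the trees. The sketch below shows only the impurity calculation, in plain Python; the function names are hypothetical and this is not the authors' code. The example input is the class distribution of the training dataset reported for the classification trees model (252, 477, and 577 records).

```python
def gini_impurity(counts):
    """Gini impurity, 1 - sum(p_k^2), for a list of class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


def gini_decrease(parent, left, right):
    """Impurity reduction when the `parent` counts are split into `left` and `right`."""
    n = sum(parent)
    weighted_children = (
        sum(left) / n * gini_impurity(left)
        + sum(right) / n * gini_impurity(right)
    )
    return gini_impurity(parent) - weighted_children


# Impurity of the unsplit training dataset (72 h, 48 h, 24 h record counts).
root = gini_impurity([252, 477, 577])
print(round(root, 3))  # 0.634
```

A variable whose splits produce larger decreases, summed over all trees, ranks higher in an importance plot of this kind.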

Table 3 Confusion matrix of random forest model

Fig. 4

Dot plot showing the importance of variables in predicting ovulation timing by Gini index in the random forest model. P4: progesterone; LH: luteinizing hormone; E2: estradiol; BMI: body mass index

Comparison of effectiveness of each variable in predicting ovulation time

To compare the effectiveness of each individual variable in predicting ovulation time, the overall accuracy rate, sensitivity, specificity, PPV, and NPV of P4 were compared to those of LH, E2, and follicle diameter using the classification trees model.

When the predictive model was trained on P4 levels alone, the cutoff values were 0.25 ng/ml and 0.45 ng/ml (Fig. 5A). In the validation dataset, the overall accuracy was 69.33%. The accuracy for predicting ovulation within 24, 48, and 72 h was 95.36%, 38.84%, and 64.81%, respectively. Sensitivity values were 76.60%, 70.15%, and 49.30%, and specificity values were 94.93%, 71.43%, and 92.55%, respectively. PPV values were 95.36%, 38.84%, and 64.81%, and NPV values were 74.86%, 90.24%, and 86.76%, respectively (Table 4). When the predictive model was trained on LH levels alone, the cutoff values were 16.75 mIU/ml and 30.04 mIU/ml (Fig. 5B). In the validation dataset, the overall accuracy was 69.94%. The accuracy for predicting ovulation within 24, 48, and 72 h was 88.08%, 45.45%, and 74.07%, respectively. Sensitivity values were 70.47%, 65.48%, and 74.07%, and specificity values were 86.96%, 72.73%, and 94.85%, respectively. PPV values were 88.08%, 45.45%, and 74.07%, and NPV values were 68.57%, 85.85%, and 94.84%, respectively (Table 4). Although P4 and LH had similar overall accuracy, P4 was more effective than LH in predicting ovulation within 24 h. Models using E2 levels alone or follicle diameter alone had low accuracy (< 60%, Table 4). The cutoff values for E2 and follicle diameter were 359.9 pg/ml and 17.75 mm, respectively (Fig. 5C and D).
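For a three-class problem, sensitivity, specificity, PPV, and NPV are usually computed per class in a one-vs-rest fashion. The following sketch shows that calculation. It assumes the confusion matrix has actual classes as rows and predicted classes as columns; with the transposed layout, sensitivity and PPV swap roles, as do specificity and NPV. The function name and the small matrix are made up for illustration and are not the values in Table 4.

```python
def one_vs_rest_metrics(matrix, k):
    """Sensitivity, specificity, PPV and NPV for class k.

    matrix[i][j] = number of records with actual class i predicted as class j.
    """
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp                 # actual k, predicted as another class
    fp = sum(row[k] for row in matrix) - tp  # actual other class, predicted as k
    tn = total - tp - fn - fp
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Hypothetical 3x3 matrix (rows: actual 24 h, 48 h, 72 h; columns: predicted).
toy = [
    [8, 2, 0],
    [3, 5, 2],
    [1, 1, 8],
]
metrics_24h = one_vs_rest_metrics(toy, 0)
print(metrics_24h)
```

Each class is treated in turn as the positive class, with the other two pooled as negatives, which is why every metric is reported as a triplet.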

Table 4 Effectiveness of each variable in predicting ovulation time in the validation dataset using classification trees model

Fig. 5

Regression tree plot of the classification trees model with each single variable. A Classification trees model with progesterone level (P4, ng/ml). B Classification trees model with luteinizing hormone level (LH, mIU/ml). C Classification trees model with estradiol level (E2, pg/ml). D Classification trees model with follicle diameter (mm)
