Cross‐Cohort Automatic Knee MRI Segmentation With Multi‐Planar U‐Nets

Single-Cohort Experiments

Table 3 summarizes the segmentation performance of the MPUnet, KIQ, and 2D U-Net methods on all three study cohorts (see Tables 1 and 2). When trained on the same number of samples, the MPUnet achieved significantly higher mean macro Dice scores (mean across compartments and patients) on the OAI dataset than KIQ (0.86 ± 0.03 vs. 0.84 ± 0.04, P < 0.05), the 2D U-Net (0.86 ± 0.03 vs. 0.85 ± 0.03, P < 0.05), and the single-view MPUnet (0.86 ± 0.03 vs. 0.85 ± 0.03, P < 0.05). The MPUnet also performed significantly better on the CCBR dataset than KIQ (0.83 ± 0.04 vs. 0.81 ± 0.06, P < 0.05), the 2D U-Net (0.83 ± 0.04 vs. 0.82 ± 0.05, P < 0.05), and the single-view MPUnet (0.83 ± 0.04 vs. 0.81 ± 0.06, P < 0.05). On the PROOF dataset, the MPUnet performed significantly better than the 2D U-Net (0.78 ± 0.07 vs. 0.73 ± 0.07, P < 0.05) and the single-view MPUnet (0.78 ± 0.07 vs. 0.75 ± 0.08, P < 0.05), and did not differ significantly from KIQ (0.78 ± 0.07 vs. 0.77 ± 0.07, P = 0.10).
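To make the macro Dice score concrete, the following is a minimal sketch and not the evaluation code used in the study. It assumes each segmentation is a flat list of integer labels with one label per compartment. The function names `dice`, `macro_dice`, and `cohort_macro_dice` are hypothetical, and treating a compartment that is absent from both maps as perfect agreement is an assumption.

```python
from statistics import mean


def dice(pred, truth, label):
    """Dice overlap for one label between two equally sized flat label maps."""
    pred_mask = [p == label for p in pred]
    truth_mask = [t == label for t in truth]
    overlap = sum(p and t for p, t in zip(pred_mask, truth_mask))
    total = sum(pred_mask) + sum(truth_mask)
    # Assumption: a compartment absent from both maps counts as perfect agreement.
    return 1.0 if total == 0 else 2.0 * overlap / total


def macro_dice(pred, truth, labels):
    """Unweighted mean of the per-compartment Dice scores for one scan."""
    return mean(dice(pred, truth, label) for label in labels)


def cohort_macro_dice(scans, labels):
    """Mean macro Dice across scans; `scans` is a list of (pred, truth) pairs."""
    return mean(macro_dice(pred, truth, labels) for pred, truth in scans)


# Toy label maps (made up for illustration): 0 = background, 1 and 2 = compartments.
pred = [0, 1, 1, 2, 2, 0]
truth = [0, 1, 1, 1, 2, 2]
print(round(macro_dice(pred, truth, labels=[1, 2]), 2))
```

For these toy label maps the per-compartment Dice scores are 0.8 and 0.5, so the macro Dice is 0.65. Background is deliberately left out of `labels`, so it does not inflate the average.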

TABLE 3. Single-Cohort Experiments: Segmentation Performance Across Subjects for the MPUnet, Single-View MPUnet, 2D U-Net, and KIQ Methods on the OAI, CCBR, and PROOF Cohorts

| Dataset | Method | Eval. Type | Eval. Images | Statistic | Tibia Bone^a | Tibial Medial Cartilage | Tibial Lateral Cartilage | Femoral Medial Cartilage | Femoral Lateral Cartilage | Patellar Cartilage | Medial Meniscus | Lateral Meniscus | Macro Dice |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CCBR | KIQ | Fixed split | 110 | Mean ± SD | — | 0.83 ± 0.06 | — | 0.79 ± 0.06 | — | — | — | — | 0.81 ± 0.06 |
| | | | | Min | — | 0.47 | — | 0.52 | — | — | — | — | 0.57 |
| | | | | P | — | P < 0.05 | — | P < 0.05 | — | — | — | — | P < 0.05 |
| CCBR | 2D U-Net | Fixed split | 110 | Mean ± SD | — | 0.83 ± 0.06 | — | 0.81 ± 0.05 | — | — | — | — | 0.82 ± 0.05 |
| | | | | Min | — | 0.57 | — | 0.64 | — | — | — | — | 0.64 |
| | | | | P | — | P = 0.18 | — | P < 0.05 | — | — | — | — | P < 0.05 |
| CCBR | MP (V = 1) | Fixed split | 110 | Mean ± SD | — | 0.82 ± 0.06 | — | 0.80 ± 0.06 | — | — | — | — | 0.81 ± 0.06 |
| | | | | Min | — | 0.60 | — | 0.57 | — | — | — | — | 0.59 |
| | | | | P | — | P < 0.05 | — | P < 0.05 | — | — | — | — | P < 0.05 |
| CCBR | MP (V = 6) | Fixed split | 110 | Mean ± SD | — | 0.84 ± 0.04 | — | 0.82 ± 0.05 | — | — | — | — | 0.83 ± 0.04 |
| | | | | Min | — | 0.68 | — | 0.65 | — | — | — | — | 0.69 |
| CCBR | MP (V = 6) | 5-CV | 140 | Mean ± SD | — | 0.85 ± 0.04 | — | 0.83 ± 0.04 | — | — | — | — | 0.84 ± 0.04 |
| | | | | Min | — | 0.65 | — | 0.68 | — | — | — | — | 0.68 |
| OAI | KIQ | Fixed split | 44 | Mean ± SD | 0.98 ± 0.00 | 0.84 ± 0.05 | 0.89 ± 0.04 | 0.83 ± 0.05 | 0.86 ± 0.04 | 0.78 ± 0.11 | 0.80 ± 0.10 | 0.86 ± 0.04 | 0.84 ± 0.04 |
| | | | | Min | 0.98 | 0.69 | 0.73 | 0.68 | 0.73 | 0.40 | 0.34 | 0.75 | 0.72 |
| | | | | P | P < 0.05 | P < 0.05 | P < 0.05 | P < 0.05 | P < 0.05 | P < 0.05 | P < 0.05 | P < 0.05 | P < 0.05 |
| OAI | 2D U-Net | Fixed split | 44 | Mean ± SD | 0.89 ± 0.01 | 0.85 ± 0.05 | 0.89 ± 0.03 | 0.85 ± 0.05 | 0.88 ± 0.04 | 0.81 ± 0.12 | 0.82 ± 0.07 | 0.87 ± 0.03 | 0.85 ± 0.03 |
| | | | | Min | 0.87 | 0.71 | 0.80 | 0.65 | 0.74 | 0.33 | 0.57 | 0.79 | 0.77 |
| | | | | P | P < 0.05 | P = 0.16 | P = 0.06 | P < 0.05 | P = 0.09 | P < 0.05 | P < 0.05 | P < 0.05 | P < 0.05 |
| OAI | MP (V = 1) | Fixed split | 44 | Mean ± SD | 0.98 ± 0.0 | 0.84 ± 0.05 | 0.89 ± 0.04 | 0.84 ± 0.05 | 0.86 ± 0.05 | 0.82 ± 0.10 | 0.82 ± 0.07 | 0.88 ± 0.04 | 0.85 ± 0.03 |
| | | | | Min | 0.98 | 0.68 | 0.75 | 0.61 | 0.70 | 0.51 | 0.60 | 0.74 | 0.76 |
| | | | | P | P < 0.05 | P < 0.05 | P = 0.06 | P < 0.05 | P < 0.05 | P < 0.05 | P < 0.05 | P < 0.05 | P < 0.05 |
| OAI | MP (V = 6) | Fixed split | 44 | Mean ± SD | 0.98 ± 0.0 | 0.85 ± 0.05 | 0.90 ± 0.04 | 0.86 ± 0.05 | 0.88 ± 0.04 | 0.83 ± 0.11 | 0.83 ± 0.06 | 0.89 ± 0.03 | 0.86 ± 0.03 |
| | | | | Min | 0.98 | 0.72 | 0.79 | 0.66 | 0.73 | 0.26 | 0.66 | 0.82 | 0.75 |
| OAI | MP (V = 6) | 5-CV | 176 (174) | Mean ± SD | — | 0.85 ± 0.05 | 0.89 ± 0.03 | | 0.88 ± 0.03 | 0.81 ± 0.10 | 0.82 ± 0.07 | 0.87 ± 0.03 | 0.85 ± 0.03 |
| | | | | Min | — | 0.67 | 0.76 | | 0.71 | 0.26 | 0.55 | 0.66 | 0.70 |
| PROOF | KIQ | 25-CV | 25 | Mean ± SD | 0.96 ± 0.02 | 0.79 ± 0.06 | 0.76 ± 0.09 | 0.77 ± 0.10 | 0.80 ± 0.05 | 0.72 ± 0.11 | — | — | 0.77 ± 0.07 |
| | | | | Min | 0.91 | 0.61 | 0.41 | 0.44 | 0.64 | 0.36 | — | — | 0.52 |
| | | | | P | P < 0.05 | P = 0.97 | P = 0.17 | P = 0.09 | P < 0.05 | P < 0.05 | — | — | P = 0.10 |
| PROOF | 2D U-Net^b | 25-CV | 25 | Mean ± SD | 0.97 ± 0.01 | 0.73 ± 0.09 | 0.67 ± 0.11 | 0.73 ± 0.08 | 0.75 ± 0.07 | 0.76 ± 0.07 | — | — | 0.73 ± 0.07 |
| | | | | Min | 0.94 | 0.48 | 0.39 | 0.52 | 0.59 | 0.60 | — | — | 0.54 |
| | | | | P | P < 0.05 | P < 0.05 | P < 0.05 | P < 0.05 | P < 0.05 | P < 0.05 | — | — | P < 0.05 |
| PROOF | MP (V = 1) | 25-CV | 25 | Mean ± SD | 0.95 ± 0.05 | 0.76 ± 0.09 | 0.69 ± 0.15 | 0.75 ± 0.09 | 0.77 ± 0.09 | 0.78 ± 0.06 | — | — | 0.75 ± 0.08 |
| | | | | Min | 0.75 | 0.41 | 0.20 | 0.48 | 0.38 | 0.60 | — | — | 0.50 |
| | | | | P | P < 0.05 | P < 0.05 | P < 0.05 | P < 0.05 | P < 0.05 | P = 0.19 | — | — | P < 0.05 |
| PROOF | MP (V = 6) | 25-CV | 25 | Mean ± SD | 0.96 ± 0.02 | 0.79 ± 0.06 | 0.72 ± 0.13 | 0.78 ± 0.08 | 0.80 ± 0.07 | 0.79 ± 0.07 | — | — | 0.78 ± 0.07 |
| | | | | Min | 0.89 | 0.63 | 0.29 | 0.50 | 0.47 | 0.59 | — | — | 0.56 |
| PROOF | MP (V = 6) | 25-CV + 88 OAI | 25 | Mean ± SD | — | 0.78 ± 0.08 | 0.73 ± 0.13 | 0.79 ± 0.09 | 0.83 ± 0.04 | 0.81 ± 0.04 | — | — | 0.79 ± 0.06 |
| | | | | Min | — | 0.53 | 0.26 | 0.45 | 0.67 | 0.70 | — | — | 0.60 |
| | | | | P | — | P = 0.58 | P = 0.44 | P = 0.09 | P < 0.05 | P < 0.05 | — | — | P < 0.05 |

Individual scores where the other models score better than the MPUnet are marked in bold. Accuracy is given as the Dice volume overlap showing mean ± SD and minimum values. P-values for the paired, two-sided Wilcoxon signed-rank statistic are shown for all compartments, comparing the MPUnet performance against itself when trained on additional data, and against the KIQ method, the single-view MPUnet, and the 2D U-Net when evaluated on the identical dataset.
CV = cross validation; LOO = leave one out (number of CV folds identical to the number of evaluation images); OAI = Osteoarthritis Initiative; CCBR = Center for Clinical and Basic Research; PROOF = Prevention of OA in Overweight Females.

Table 3 also details the performance of all methods on each individual compartment across the three datasets and shows the minimum Dice score observed for each compartment across all subjects in the cohort. Across a total of 14 segmentation compartments (tibia bone excluded as it is easily segmented by all methods), the MPUnet performed significantly better than the KIQ model on 11 compartments (TMC, TLC, FMC, FLC, PC, MM, and LM on OAI; TMC and FMC on CCBR; FLC and PC on PROOF; P < 0.05 for all) and showed no significant difference on the remaining 3 (TMC, P = 0.97, TLC, P = 0.17, and FMC, P = 0.09, all on PROOF). The MPUnet performed significantly better than the Panfilov 2D U-Net on 10 compartments (FMC, PC, MM, and LM on OAI; FMC on CCBR; TMC, TLC, FMC, FLC, and PC on PROOF; P < 0.05 for all) and showed no significant difference on the remaining 4 (TMC, P = 0.16, TLC, P = 0.06, FLC, P = 0.09 on OAI; TMC, P = 0.18 on CCBR). The MPUnet performed significantly better than its single-view counterpart on 12 compartments (TMC, FMC, FLC, PC, MM, and LM on OAI; TMC and FMC on CCBR; TMC, TLC, FMC, and FLC on PROOF; P < 0.05 for all) and showed no significant difference on the remaining 2 (TLC, P = 0.06 on OAI; PC, P = 0.19 on PROOF). None of the other models performed significantly better than the MPUnet on any compartment.
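The per-compartment comparisons above rest on a paired, two-sided Wilcoxon signed-rank test. The sketch below illustrates that test on paired per-subject Dice scores. It is an assumption-laden illustration, not the authors' analysis code: the function names are hypothetical, zero differences are dropped, tied absolute differences receive average ranks, and the p-value is computed exactly by enumerating every sign assignment, which is only practical for small samples (the cost grows as 2^n).

```python
from itertools import product


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_signed_rank(x, y):
    """Exact two-sided p-value for paired samples, by enumerating all sign flips."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    if not diffs:
        return 1.0
    ranks = average_ranks([abs(d) for d in diffs])
    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    observed = min(w_plus, total - w_plus)
    hits = 0
    # Under the null hypothesis every sign pattern is equally likely.
    for signs in product((0, 1), repeat=len(ranks)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if min(w, total - w) <= observed:
            hits += 1
    return hits / 2 ** len(ranks)


# Made-up paired scores for six subjects; method A is higher on every subject.
scores_a = [5, 6, 7, 8, 9, 10]
scores_b = [1, 1, 1, 1, 1, 1]
print(wilcoxon_signed_rank(scores_a, scores_b))
```

With six pairs that all favor the same method, only 2 of the 64 sign patterns are as extreme as the observed one, so the p-value is 2/64 = 0.03125. For cohorts the size of those in Table 3, a library routine such as `scipy.stats.wilcoxon` would be the practical choice; whether the study used it is not stated in this section.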

Table 4 details the performance of each model on the CCBR, OAI, and PROOF datasets grouped by KL grade assessments of each scan. Figure 3 shows box-plot Dice score distributions for each compartment of the CCBR dataset as segmented by the MPUnet, KIQ, and 2D U-Net models similarly grouped by KL grades. Box-plot figures for the OAI and PROOF datasets are shown in Figs. S2 and S3 in the Supplemental Material.
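The KL-grouped summaries amount to splitting per-scan scores by KL grade and reporting mean ± SD and the minimum for each group. A minimal sketch of that grouping follows. The function name `summarize_by_kl` and the input layout are hypothetical, and the use of the population standard deviation is an assumption, chosen because single-scan groups in Table 4 are reported with an SD of 0.00.

```python
from collections import defaultdict
from statistics import mean, pstdev


def summarize_by_kl(scores, kl_grades):
    """Return {KL grade: (mean, SD, min, n)} for per-scan macro Dice scores."""
    groups = defaultdict(list)
    for score, grade in zip(scores, kl_grades):
        groups[grade].append(score)
    summary = {}
    for grade in sorted(groups):
        values = groups[grade]
        # Assumption: population SD, so a single-scan group yields 0.0.
        summary[grade] = (mean(values), pstdev(values), min(values), len(values))
    return summary


# Made-up scores for three scans: two with KL 0 and one with KL 2.
toy_scores = [0.80, 0.90, 0.70]
toy_grades = [0, 0, 2]
for grade, (avg, sd, low, n) in summarize_by_kl(toy_scores, toy_grades).items():
    print(f"KL {grade}: {avg:.2f} ± {sd:.2f}, min {low:.2f}, n = {n}")
```

Groups with very few scans, such as the single-scan KL 2 and KL 3 groups in PROOF, yield summaries that say little about the grade as a whole, which is consistent with the table reporting no P-value (NA) for them.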

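TABLE 4. Single-Cohort Experiments (KL Groups): Segmentation Performance Across Subjects for the MPUnet, Single-View MPUnet, 2D U-Net, and KIQ Methods on the OAI, CCBR, and PROOF Cohorts on KL Subgroups

| Dataset | Method | Eval. Type | Eval. Images | Statistic | KL 0 | KL 1 | KL 2 | KL 3 | KL 4 |
|---|---|---|---|---|---|---|---|---|---|
| CCBR | KIQ | Fixed split | 50/24/13/22/0 | Mean ± SD | 0.84 ± 0.03 | 0.82 ± 0.03 | 0.78 ± 0.04 | 0.75 ± 0.08 | — |
| | | | | Min | 0.73 | 0.72 | 0.68 | 0.57 | — |
| | | | | P | P < 0.05 | P < 0.05 | P < 0.05 | P < 0.05 | — |
| CCBR | 2D U-Net | Fixed split | 50/24/13/22/0 | Mean ± SD | 0.84 ± 0.03 | 0.83 ± 0.03 | 0.80 ± 0.04 | 0.76 ± 0.06 | — |
| | | | | Min | 0.77 | 0.75 | 0.72 | 0.64 | — |
| | | | | P | P < 0.05 | P = 0.03 | P = 0.74 | P < 0.05 | — |
| CCBR | MP (V = 1) | Fixed split | 50/24/13/22/0 | Mean ± SD | 0.84 ± 0.02 | 0.83 ± 0.03 | 0.78 ± 0.04 | 0.73 ± 0.06 | — |
| | | | | Min | 0.79 | 0.77 | 0.68 | 0.59 | — |
| | | | | P | P < 0.05 | P < 0.05 | P < 0.05 | P < 0.05 | — |
| CCBR | MP (V = 6) | Fixed split | 50/24/13/22/0 | Mean ± SD | 0.85 ± 0.03 | 0.84 ± 0.03 | 0.81 ± 0.02 | 0.78 ± 0.06 | — |
| | | | | Min | 0.80 | 0.77 | 0.77 | 0.69 | — |
| OAI | KIQ | Fixed split | 0/2/10/30/2 | Mean ± SD | — | 0.88 ± 0.03 | 0.84 ± 0.04 | 0.83 ± 0.04 | 0.83 ± 0.02 |
| | | | | Min | — | 0.86 | 0.76 | 0.72 | 0.82 |
| | | | | P | — | P = NA | P = 0.23 | P < 0.05 | P = NA |
| OAI | 2D U-Net | Fixed split | 0/2/10/30/2 | Mean ± SD | — | 0.87 ± 0.03 | 0.85 ± 0.04 | 0.85 ± 0.03 | 0.86 ± 0.02 |
| | | | | Min | — | 0.85 | 0.78 | 0.77 | 0.85 |
| | | | | P | — | P = NA | P = 0.16 | P < 0.05 | P = NA |
| OAI | MP (V = 1) | Fixed split | 0/2/10/30/2 | Mean ± SD | — | 0.87 ± 0.02 | 0.84 ± 0.03 | 0.85 ± 0.03 | 0.86 ± 0.01 |
| | | | | Min | — | 0.85 | 0.77 | 0.76 | 0.85 |
| | | | | P | — | P = NA | P < 0.05 | P < 0.05 | P = NA |
| OAI | MP (V = 6) | Fixed split | 0/2/10/30/2 | Mean ± SD | — | 0.88 ± 0.03 | 0.86 ± 0.03 | 0.86 ± 0.04 | 0.87 ± 0.01 |
| | | | | Min | — | 0.86 | 0.78 | 0.75 | 0.87 |
| PROOF | KIQ | 25-CV | 12/11/1/1/0 | Mean ± SD | 0.76 ± 0.06 | 0.77 ± 0.09 | 0.81 ± 0.00 | 0.80 ± 0.00 | — |
| | | | | Min | 0.63 | 0.52 | 0.81 | 0.80 | — |
| | | | | P | P = 0.08 | P = 0.41 | P = NA | P = NA | — |
| PROOF | 2D U-Net^a | 25-CV | 12/11/1/1/0 | | | | | | |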
