Class imbalance on medical image classification: towards better evaluation practices for discrimination and calibration performance

Yu KH, Beam AL, Kohane IS (2018) Artificial intelligence in healthcare. Nat Biomed Eng 2:719–731

Beam AL, Manrai AK, Ghassemi M (2020) Challenges to the reproducibility of machine learning models in health care. JAMA 323:305–306

Luque A, Carrasco A, Martín A, de las Heras A (2019) The impact of class imbalance in classification performance metrics based on the binary confusion matrix. Pattern Recognit 91:216–231

Çallı E, Sogancioglu E, van Ginneken B, van Leeuwen KG, Murphy K (2021) Deep learning for chest X-ray analysis: a survey. Med Image Anal 72:102125

Wang X, Peng Y, Lu L, Lu Z, Bagheri M, Summers RM (2017) ChestX-ray8: hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In: Proceedings of the IEEE conference on computer vision and pattern recognition. IEEE, pp 2097–2106

Irvin J, Rajpurkar P, Ko M, et al (2019) CheXpert: a large chest radiograph dataset with uncertainty labels and expert comparison. In: Proceedings of the AAAI Conference on Artificial Intelligence, 33. AAAI Press, pp 590–597

Erickson BJ, Kitamura F (2021) Magician’s corner: 9. performance metrics for machine learning models. Radiology: Artificial Intelligence 3:e200126

de Hond AA, Steyerberg EW, van Calster B (2022) Interpreting area under the receiver operating characteristic curve. Lancet Digital Health 4:e853–e855

López V, Fernández A, García S, Palade V, Herrera F (2013) An insight into classification with imbalanced data: empirical results and current trends on using data intrinsic characteristics. Inf Sci 250:113–141

Saito T, Rehmsmeier M (2015) The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS One 10:e0118432

Ozenne B, Subtil F, Maucort-Boulch D (2015) The precision–recall curve overcame the optimism of the receiver operating characteristic curve in rare diseases. J Clin Epidemiol 68:855–859

Sahiner B, Chen W, Pezeshk A, Petrick N (2017) Comparison of two classifiers when the data sets are imbalanced: the power of the area under the precision-recall curve as the figure of merit versus the area under the ROC curve. In: Medical Imaging 2017: Image Perception, Observer Performance, and Technology Assessment, 10136. International Society for Optics and Photonics, p 101360G

Krawczyk B (2016) Learning from imbalanced data: open challenges and future directions. Prog Artif Intell 5:221–232

Davis J, Goadrich M (2006) The relationship between precision-recall and ROC curves. Proceedings of the 23rd International Conference on Machine Learning, pp 233–240

Varoquaux G, Colliot O (2023) Evaluating machine learning models and their diagnostic value. In: Colliot O (ed) Machine learning for brain disorders. Springer

Kompa B, Snoek J, Beam AL (2021) Second opinion needed: communicating uncertainty in medical machine learning. NPJ Digit Med 4:1–6

Blattenberger G, Lad F (1985) Separating the Brier score into calibration and refinement components: a graphical exposition. Am Stat 39:26–32

Ovadia Y, Fertig E, Ren J et al (2019) Can you trust your model’s uncertainty? Evaluating predictive uncertainty under dataset shift. Advances in neural information processing systems 32

Dawid AP (1982) The well-calibrated Bayesian. J Am Stat Assoc 77:605–610

Mukhoti J, Kulharia V, Sanyal A, Golodetz S, Torr PHS, Dokania PK (2020) Calibrating deep neural networks using focal loss. Advances in Neural Information Processing Systems 33:15288–15299

Van Calster B, Nieboer D, Vergouwe Y, De Cock B, Pencina MJ, Steyerberg EW (2016) A calibration hierarchy for risk models was defined: from utopia to empirical data. J Clin Epidemiol 74:167–176

Collins GS, Moons KGM (2019) Reporting of artificial intelligence prediction models. Lancet 393:1577–1579

Gneiting T, Raftery AE (2007) Strictly proper scoring rules, prediction, and estimation. J Am Stat Assoc 102:359–378

Brier GW (1950) Verification of forecasts expressed in terms of probability. Mon Weather Rev 78:1–3

Mosquera C, Ferrer L, Milone D, Luna D, Ferrante E (2021) Impact of class imbalance on chest X-ray classifiers: towards better evaluation practices for discrimination and calibration performance. Preprint at https://arxiv.org/abs/2112.12843

Roberts M, Driggs D, Thorpe M et al (2021) Common pitfalls and recommendations for using machine learning to detect and prognosticate for COVID-19 using chest radiographs and CT scans. Nat Mach Intell 3:199–217

Batista GE, Prati RC, Monard MC (2004) A study of the behavior of several methods for balancing machine learning training data. ACM SIGKDD Explor Newslett 6:20–29

Google Machine Learning Foundational Courses (2024) Imbalanced data. Published by Google Developers. https://developers.google.com/machine-learning/data-prep/construct/sampling-splitting/imbalanced-data. Accessed 1 Mar 2024

Cohen JP, Hashir M, Brooks R, Bertrand H (2020) On the limits of cross-domain generalization in automated X-ray prediction. Medical Imaging with Deep Learning (pp 136–155)

Rajpurkar P, Irvin J, Zhu K et al (2017) CheXNet: radiologist-level pneumonia detection on chest X-rays with deep learning. Preprint at https://arxiv.org/pdf/1711.05225

Cohen JP, Bertin P, Frappier V (2019) Chester: A web delivered locally computed chest X-ray disease prediction system. https://arxiv.org/abs/1901.11210

Huang G, Liu Z, Van Der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition. IEEE, pp 4700–4708

Larrazabal AJ, Nieto N, Peterson V, Milone DH, Ferrante E (2020) Gender imbalance in medical imaging datasets produces biased classifiers for computer-aided diagnosis. Proc Natl Acad Sci USA 117:12592–12594

Bugnon LA, Yones C, Milone DH, Stegmayer G (2019) Deep neural architectures for highly imbalanced data in bioinformatics. In: IEEE Transactions on Neural Networks and Learning Systems. IEEE

Wallace BC, Dahabreh IJ (2014) Improving class probability estimates for imbalanced data. Knowl Inf Syst 41:33–52

García V, Sánchez JS, Mollineda RA (2012) On the effectiveness of preprocessing methods when dealing with different levels of class imbalance. Knowl-Based Syst 25:13–21

Godau P, Kalinowski P, Christodoulou E et al (2023) Deployment of image analysis algorithms under prevalence shifts. International Conference on Medical Image Computing and Computer-Assisted Intervention (pp 389–399)

Ramos D, Franco-Pedroso J, Lozano-Diez A, Gonzalez-Rodriguez J (2018) Deconstructing cross-entropy for probabilistic binary classifiers. Entropy 20:208
