The number of papers presenting machine learning (ML) models submitted to and published in the Journal of Medical Internet Research and other JMIR Publications journals has steadily increased over time. The cross-journal JMIR Publications e-collection “Machine Learning” included nearly 1300 articles as of April 1, 2024 [], and additional sections in other journals collate articles related to the field (eg, “Machine Learning from Dermatological Images” [] in JMIR Dermatology). From 2015 to 2022, the number of published articles with “artificial intelligence” (AI) or “machine learning” in the title and abstract in JMIR Publications journals increased from 22 to 298 (13.5-fold growth), and 312 such articles were already published in 2023 (14-fold growth). For JMIR Medical Informatics, the number of articles increased from 10 to 160 (16-fold growth) over the same period. This is consistent with the growth in the research and application of medical AI in general: a similar PubMed search (with the keyword “medicine”) revealed a 22-fold growth (from 640 to 14,147 articles) between 2015 and 2022, with 11,272 matching articles already published in 2023.
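The fold-growth figures quoted above can be verified with a short calculation. The sketch below (an illustration, not part of the editorial) uses the article counts stated in the text; the helper function `fold_growth` is hypothetical.

```python
# Illustrative check of the growth ratios quoted in the text.
def fold_growth(start, end):
    """Ratio of end-year to start-year article counts, rounded to one decimal."""
    return round(end / start, 1)

print(fold_growth(22, 298))     # JMIR Publications, 2015 -> 2022: 13.5
print(fold_growth(10, 160))     # JMIR Medical Informatics: 16.0
print(fold_growth(640, 14147))  # PubMed "medicine" search: 22.1
```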
Many papers reporting the use of ML models in medicine have used a large clinical data set to make diagnostic or prognostic predictions [-]. However, the use of data from electronic health records and other resources is often not without pitfalls as these data are typically collected and optimized for other purposes (eg, medical billing) [].
Editors and peer reviewers involved in the review process for such manuscripts often go through multiple review cycles to enhance the quality and completeness of reporting []. The use of reporting guidelines or checklists can help ensure consistency in the quality of submitted (and published) scientific manuscripts and, for instance, help avoid missing information. In the experience of the editors-in-chief of JMIR AI, missing information is especially notable in manuscripts reporting on ML models, where it can lengthen the overall review interval by adding revision cycles.
According to the EQUATOR (Enhancing the Quality and Transparency of Health Research) network, a reporting guideline is “a simple, structured tool for health researchers to use while writing manuscripts. A reporting guideline provides a minimum list of information needed to ensure a manuscript can be, for example: understood by a reader, replicated by a researcher, used by a doctor to make a clinical decision, and included in a systematic review” []. These can be presented in the form of a checklist, flow diagram, or structured text.
In this Editorial, we discuss the general JMIR Publications policy regarding authors’ application of reporting guidelines. We then focus specifically on the reporting of ML studies in JMIR Publications journals.
Accumulating evidence suggests that when authors apply reporting guidelines and checklists in health research, these tools can benefit authors, readers, and the discipline overall by enabling the replication or reproduction of studies. Recent evidence suggests that asking reviewers, instead of authors, to use reporting checklists offers no added benefit to reporting quality []. However, Botos [] reported a positive association between reviewer ratings of adherence to reporting guidelines and favorable editorial decisions, while Stevanovic et al [] reported significant positive correlations between adherence to reporting guidelines and citations, and between adherence and publication in higher-impact-factor journals.
JMIR Publications’ editorial policy recommends that authors adhere to applicable study design and reporting guidelines when preparing manuscripts for submission []. Authors should note that most reporting guidelines are strongly recommended, particularly because they can improve the quality, completeness, and organization of the presented work. At this time, JMIR Publications requires reporting checklists to be completed and supplied as multimedia appendices for randomized controlled trials without [-] or those with eHealth or mobile health components [], systematic and scoping literature reviews across the portfolio, and Implementation Reports in JMIR Medical Informatics []. Although some medical journals have mandated the use of certain reporting guidelines and checklists, JMIR Publications recognizes that authors may have concerns about the additional burden that the formalized use of checklists may bring to the submission process. As such, JMIR Publications has chosen to begin recommending the use of ML reporting guidelines and will evaluate their benefits and gather feedback on implementation costs before considering more stringent requirements.
Regarding the reporting of prognostic and diagnostic ML studies, multiple directly relevant checklists have been developed. Klement and El Emam [] consolidated these guidelines and checklists into a single set that we refer to as the Consolidated Reporting of Machine Learning Studies (CREMLS) checklist. CREMLS serves as a reporting checklist for journals publishing research describing the development, evaluation, and application of ML models, including all JMIR Publications journals, which have officially adopted these guidelines. CREMLS was developed by first identifying existing relevant reporting guidelines and checklists through a structured literature review and expert curation; the quality of the methods used to develop each was then assessed to narrow them to a high-quality subset. This subset was further filtered against specific inclusion and exclusion criteria. The resultant items were converted into guidelines and a checklist that was reviewed by the editorial board of JMIR AI and then applied in a preliminary assessment of articles published in JMIR AI. The final checklist offers present-day best practices for high-quality reporting of studies using ML models.
Examples of the application of the CREMLS checklist are presented in Table 1. To construct it, we identified 7 articles published in JMIR Publications journals that exemplify the checklist items. Note that not all of the items are relevant to each article, and some articles are particularly good examples of how to operationalize a checklist item.
Table 1. Illustration of how various articles published in JMIR Publications journals implement each of the CREMLS (Consolidated Reporting of Machine Learning Studies) checklist items.

Item number | Item | Example illustrating the item | Study details

aML: machine learning.
bAUC: area under the curve.
cSMOTE: synthetic minority oversampling technique.
dSHAP: Shapley additive explanations.
We strongly advise authors submitting manuscripts that describe the development, evaluation, and application of ML models to the Journal of Medical Internet Research, JMIR AI, JMIR Medical Informatics, or other JMIR Publications journals to adhere to the CREMLS guidelines and checklist, ensuring that all relevant details of their work have been considered and addressed before initiating the submission and review process. More complete and high-quality reporting benefits authors by accelerating the review cycle and reducing the burden on reviewers. Hence, reporting guidelines and checklists are needed for papers describing prognostic and diagnostic ML studies. They are expected to help, for example, by reducing missing documentation on the hyperparameters of an ML model and by clarifying how data leakage was avoided. We have observed that peer reviewers have, in practice, been asking authors to improve reporting on the same topics covered in the CREMLS checklist. This is not surprising given that peer reviewers are experts in the field and would note important missing information. Nevertheless, we encourage reviewers to use the checklist regularly to ensure completeness and consistency.
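To make the data-leakage concern concrete, the sketch below (an illustration we add here, not drawn from CREMLS itself) shows one common pitfall and its remedy: preprocessing statistics must be computed on the training split only, and the exact parameters should be reported so the pipeline is reproducible. All names and the synthetic data are hypothetical.

```python
# Minimal sketch: avoiding data leakage by splitting BEFORE computing
# normalization statistics, then reporting those statistics.
import random
import statistics

random.seed(0)
# Hypothetical data set: (feature, binary label) pairs.
data = [(random.gauss(0, 1), random.choice([0, 1])) for _ in range(100)]

# Split first; derive normalization parameters from training data only.
train, test = data[:80], data[80:]
train_x = [x for x, _ in train]
mu, sigma = statistics.mean(train_x), statistics.stdev(train_x)

def normalize(x):
    """Apply training-set statistics to any split, avoiding leakage."""
    return (x - mu) / sigma

train_norm = [(normalize(x), y) for x, y in train]
test_norm = [(normalize(x), y) for x, y in test]

# Report the exact preprocessing parameters (alongside any model
# hyperparameters) so the pipeline can be reproduced -- a CREMLS-style
# reporting concern.
report = {
    "normalization": {"mean": mu, "std": sigma},
    "train_size": len(train),
    "test_size": len(test),
}
print(report)
```

The key design choice is that `normalize` closes over statistics derived from the training split alone; computing `mu` and `sigma` over the full data set before splitting would leak information from the test set into the model pipeline.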
The CREMLS checklist’s scope is limited to ML models using structured data that are trained and evaluated in silico and in shadow mode. This provides a significant opportunity to expand on the CREMLS to different data modalities and additional phases of model deployment. Should such extended reporting guidelines and checklists be developed, they may be considered for recommendation for submissions to JMIR Publications journals, incorporating lessons learned from the initial checklist for studies reporting the use of ML models.
There is evidence that the completeness of reporting of research studies is beneficial to the authors and the broader scientific community. For prognostic and diagnostic ML studies, many reporting guidelines have been developed, and these have been consolidated into CREMLS, capturing the combined value of the source guidelines and checklists in one place. In this Editorial, we extend journal policy and recommend that authors follow these guidelines when submitting articles to journals in the JMIR Publications portfolio. This will improve the reproducibility of research studies using ML methods, accelerate review cycles, and improve the quality of published papers overall. Given the rapid growth of studies developing, evaluating, and applying ML models, it is important to establish reporting standards early.
KEE and BM conceptualized this study and drafted, reviewed, and edited the manuscript. TIL and GE reviewed and edited the manuscript. WK prepared the literature summary and reviewed the manuscript.
KEE and BM are co–editors-in-chief of JMIR AI. KEE is the cofounder of Replica Analytics, an Aetion company, and has financial interests in the company. TIL is the scientific editorial director at JMIR Publications, Inc. GE is the executive editor and publisher at JMIR Publications, Inc, receives a salary, and owns equity.
Edited by T Leung; this is a non–peer-reviewed article. Submitted 04.04.24; accepted 04.04.24; published 02.05.24.
©Khaled El Emam, Tiffany I Leung, Bradley Malin, William Klement, Gunther Eysenbach. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 02.05.2024.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.