The increasing adoption of ambient artificial intelligence (AI) scribes in healthcare has created an urgent need for robust evaluation frameworks to assess their performance and clinical utility. Although these tools show promise in reducing documentation burden, there is still no standardized approach for measuring their effectiveness and safety. This manuscript reviews existing evaluation frameworks and metrics used to assess AI-assisted medical note generation from doctor-patient conversations. We analyze eight studies from multiple institutions that evaluated ambient scribe technology between 2022 and 2024, comparing their evaluation methods and metrics. Our analysis reveals substantial variation in evaluation approaches, ranging from traditional natural language processing metrics such as ROUGE and BERTScore to domain-specific measures such as clinical accuracy and bias. Critical gaps include a wide diversity of evaluation metrics that makes comparison across studies challenging, limited integration of clinical relevance, and a lack of standardized approaches for crucial metrics such as hallucinations, errors, and bias. This work provides a foundation for more rigorous and consistent evaluation of ambient scribe technology as its adoption continues to expand in healthcare settings.
Competing Interest Statement
Dr. Gebauer serves, or has served in the past, as a consultant to several healthcare AI companies, including TORTUS AI. None of these companies reviewed this manuscript, provided feedback on it, or were aware of it prior to publication.
Funding Statement
This study did not receive any funding.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
All data referenced in the present work are publicly available at the referenced locations in the manuscript.