The Utility of AI in Writing a Scientific Review Article on the Impacts of COVID-19 on Musculoskeletal Health

The main objective of this project was to determine whether AI could increase the efficiency of writing a scientific review article. To accomplish this, three approaches to writing the first draft of a scientific review article were taken, as detailed above (i.e., human-only, AI-only, and AI-assisted). Of note, both AI approaches required humans to write the queries. However, only the human-only and AI-assisted articles were completed, as the AI-only approach overlapped too heavily with the AI-assisted approach owing to the knowledge cutoff of ChatGPT.

With respect to the total time spent writing the COVID-19 and musculoskeletal health review articles, the human-only manuscript required less total time to complete (114.66 h) than the AI-assisted manuscript (219.09 h). In reviewing the data in Table 1, the higher number of hours spent during the student and faculty editing phase for the AI-assisted manuscript compared to the other two approaches may reflect the scientific writing experience of the first author. Indeed, the first author of the human-only manuscript is a senior postdoctoral fellow with several first-author published manuscripts. The first author of the AI-only manuscript is a senior PhD student with two first-author papers, whereas the first author of the AI-assisted manuscript is a medical student with only one previous first-author manuscript.

That said, ChatGPT 4.0 had a tendency to write broad, generalized statements without supporting facts and to spend much of its limited word count on transition and concluding sentences. Additionally, even when given original research articles, ChatGPT 4.0 would frequently present the conclusion of an article while omitting details regarding the experimental design and specific results. When one paper in the section discussing the effect of SARS-CoV-2 infection on bone in animal models reported results differing from the other papers, ChatGPT 4.0 could not provide a possible reason for the discrepancy and instead misreported the paper’s results to agree with the others. Furthermore, when the chatbot was told to cite a specific article in a specific section, it would not always identify and concisely summarize the relevant information from that article. For example, when prompted to cite a paper discussing various osteoporosis treatments, including targeting of the NLRP3 inflammasome, which the human-written paper used to emphasize the potential role of the NLRP3 inflammasome in bone loss, ChatGPT 4.0 described osteoporosis and discussed treatments without specifically mentioning the NLRP3 inflammasome. Overall, the initial draft was not viewed as good scientific writing by the co-authors of the paper and had to be modified to assess the research results more critically. This was likely another reason for the increased amount of time spent on writing and editing. Readers are encouraged to examine Supplementary Material 5 for the initial draft of the AI-assisted paper, which is more reflective of the writing of ChatGPT 4.0. It remains to be seen whether the total writing and editing time would decrease for ChatGPT 4.0 on a subsequent paper, once it had determined the preferences of a particular user or as the user gave more feedback and designed better queries. We did find that we received better results when we used the like/dislike buttons or gave specific feedback. Further, it was important to keep all of the queries in the same chat, as that allowed the AI to learn from its previous responses. Having to give feedback and fine-tune the results added to the total time for the current task but may reduce time in the long run.
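
For readers curious about the mechanics of keeping all queries in the same chat, the minimal Python sketch below illustrates the idea programmatically: a running message history is resent with every new query so the model can draw on its earlier responses. This is an illustration only; we used the ChatGPT web interface rather than the API, and the model name and prompts shown are placeholders.

```python
# Minimal sketch: carrying chat history forward so each query is answered
# in the context of earlier responses (illustrative; not the interface
# used in this study).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

history = [{"role": "system",
            "content": "You are assisting with a scientific review article."}]

def ask(query: str) -> str:
    """Append the query, send the full history, and record the reply."""
    history.append({"role": "user", "content": query})
    response = client.chat.completions.create(model="gpt-4", messages=history)
    answer = response.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    return answer

# Illustrative prompts: the second query benefits from the first's context.
ask("Draft an outline for a review on COVID-19 and musculoskeletal health.")
ask("Expand the bone loss section, citing only the articles I provide.")
```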

Another observation from Table 1 is the significant time spent fact-checking the AI-only and AI-assisted manuscripts; in addition, the combined student and faculty editing time was higher for the AI-assisted approach than for the human-only approach. Of note, the AI-only manuscript did not undergo complete student edits or any faculty edits, as it was during this stage that it came to light how similar approaches 2 and 3 were becoming for this topic, and approach 2 was abandoned at that point. With such extensive fact-checking, it was determined that the AI-only manuscript had the highest number of inaccuracies, with 70.8% of references containing errors, including misattributions. The AI-assisted manuscript fared better, with only 20.2% of references misattributed, but even this error rate is unacceptable in scientific writing. The large discrepancy between approaches 2 and 3 is mostly due to the AI-assisted approach using human-assigned references, so ChatGPT 4.0 was not given the opportunity to fabricate references. However, this did not prevent instances of plagiarism and misattribution. Moreover, when we subjected the initial manuscript drafts to plagiarism detection software, the AI-assisted manuscript had a similarity index of 25%, much higher than the 8% of the human-only manuscript, suggesting a higher probability of plagiarism in the AI-assisted manuscript. This may be due to the inherent methodology of the AI-assisted approach, which consisted of querying ChatGPT 4.0 for summations of the articles to generate the manuscript. Of interest, the similarity index increased to 13% for the final draft of the human-only approach and decreased to 19% for the final draft of the AI-assisted approach. The former likely reflects the numerous edits from other co-authors centered on the addition of specific new articles or ideas. The latter may reflect the numerous edits from the other co-authors addressing and reducing the incidence of AI plagiarism.

One of the greatest hindrances faced when writing the AI-only review article was the knowledge cutoff date (September 2021 for ChatGPT 4.0 at the time we used it). This made it especially difficult to write a well-informed manuscript. Indeed, most of the literature on the topic was published after this date, limiting the utility of ChatGPT in writing the manuscript without significant human assistance (e.g., providing references). Therefore, when considering the use of ChatGPT for any topic, it would first be important to determine whether there is an established pool of knowledge on the topic predating the knowledge cutoff or whether this limitation has been eliminated.

There were some interesting discoveries made during the initial fact-checks of the AI-only manuscript, prior to its being abandoned, that are worth detailing for the interested reader. Many of the references generated by the AI were claimed to have been published in 2021 and 2022. However, when fact-checked, a majority of the citations the chatbot claimed were published in 2021 were not, and some of the references were published as far back as 2008 (prior to the COVID-19 pandemic). The articles ChatGPT claimed were published in 2022 were easily identified as either incorrectly cited or nonexistent, as this was past the knowledge cutoff date. Based on this, we speculate that ChatGPT may be aware of its own knowledge cutoff and falsely cited the publication years of these references, possibly to compensate for this limitation.

ChatGPT had a propensity to fabricate information, presenting misinformation as fact and misleading the user into believing that the information provided is true. The fabrication of information by AI has been termed a “hallucination” or “artificial hallucination” [15, 16]. Hence, to use ChatGPT responsibly in writing any piece of literature, one must critically fact-check the information it synthesizes. Even in a task as simple as requesting a list of references to support an idea, it is imperative to validate that the given citations exist, are correct, and are actually relevant to the idea they are meant to support.
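
To make this kind of validation concrete, the following is a minimal sketch, assuming the public Crossref REST API, of how one could check that a citation exists and that its claimed publication year matches the published record. This is not the procedure used in this study, where fact-checking was done manually; the example title is illustrative, and a matched record must still be inspected by a human, since the top search hit may not be the cited paper.

```python
# Minimal sketch: look up a citation's title in Crossref and compare the
# recorded publication year against the year the chatbot claimed.
import requests

def check_reference(title: str, claimed_year: int) -> None:
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"query.bibliographic": title, "rows": 1},
        timeout=30,
    )
    items = resp.json()["message"]["items"]
    if not items:
        print(f"NOT FOUND: {title!r} may be fabricated")
        return
    record = items[0]
    # Crossref returns "issued" as date-parts: [[year, month, day]].
    actual_year = record["issued"]["date-parts"][0][0]
    status = "OK" if actual_year == claimed_year else "YEAR MISMATCH"
    print(f"{status}: claimed {claimed_year}, Crossref records {actual_year} "
          f"(DOI {record.get('DOI')}) -- confirm the title matches by hand")

# Illustrative call; title and year are placeholders, not study data.
check_reference("Artificial hallucinations in ChatGPT", 2023)
```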

Generating the AI-assisted manuscript presented a host of challenges distinct from those of the AI-only manuscript. The first author of this review had little previous experience using ChatGPT and consulted a colleague with more experience for assistance throughout the writing process. The free, open-access version of ChatGPT, GPT-3.5, cannot read PDFs. Thus, a limitation to writing this manuscript was having to pay for the premium GPT-4 version of the language model. Only through this paid subscription were the authors able to access the plugin “AskYourPDF,” which generated a unique ID for each PDF. Matching each ID with the appropriate PDF was a time-consuming task. The author found that as more sources were uploaded, GPT-4 became less reliable in citing where information had been pulled from; therefore, a limit of 8–10 articles per subsection was set to mitigate this issue. During the process of editing the AI-assisted manuscript, there were multiple instances of the AI plagiarizing the titles of articles in its summaries, presenting them as newly synthesized ideas. Moreover, there were instances of the AI plagiarizing partial sentences from within the same articles and joining them together, perhaps in an attempt to avoid detection. When asked to decrease the word count of a previously generated section while preserving sources, the AI had a tendency to inappropriately group citations following a sentence, introducing another source of error.
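
As a hypothetical illustration of the bookkeeping this workflow required, the sketch below maps each AskYourPDF document ID to its source PDF and caps each subsection at 10 articles. In practice, these steps were performed manually in the chat interface; the file names and IDs shown are placeholders.

```python
# Hypothetical bookkeeping sketch (done manually in this study): track which
# AskYourPDF document ID belongs to which PDF, and batch articles so no
# subsection prompt references more than 10 sources.
MAX_PER_SUBSECTION = 10

pdf_ids = {
    "bone_animal_models.pdf": "doc-id-placeholder-1",  # IDs are illustrative
    "nlrp3_inflammasome.pdf": "doc-id-placeholder-2",
}

def batch(papers: list[str], size: int = MAX_PER_SUBSECTION):
    """Yield groups of at most `size` papers, one group per subsection."""
    for start in range(0, len(papers), size):
        yield papers[start:start + size]

for group in batch(sorted(pdf_ids)):
    ids = ", ".join(pdf_ids[name] for name in group)
    print(f"Summarize only documents {ids}; cite each by its source PDF.")
```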

Ultimately, with the current limitations of AI, we argue it is not possible to write an accurate, well-informed, critical scientific review solely with ChatGPT. Indeed, due to concerns such as plagiarism, shallow depth of content, and artificial hallucinations, it would not be advised. Despite these concerns, utilizing AI as an assistant when writing scientific reviews may be possible with caveats. Perhaps the most important caveat is to combine AI writing with strict human oversight. While there is no guarantee that using AI would make the overall process faster, it could make parts of the writing process faster. For example, when ChatGPT was prompted to create an outline at the start of the AI-only paper, the initial outline generated was deemed acceptable with minor revisions. ChatGPT could also be used to overcome writer’s block and may be particularly useful to those for whom English is not their native language. Importantly, ChatGPT 4.0 contains a growing number of plugins that can streamline the process of writing a review, from reading PDFs for literature reviews to providing summaries of reference material. Thus, the expansion of these capabilities could lead to a future in which the need for extensive human intervention is more limited.
