The Use of Artificial Intelligence in Writing Scientific Review Articles

Human-Generated and Written Review Articles

Traditional methods were employed to write the human-generated review articles [13, 16, 19]. Specifically, a comprehensive literature review was completed, and an outline of relevant topics was created to help the authors organize and focus their review article. Once a complete first draft was written, the manuscript underwent extensive editing and fact checking by all co-authors. The first draft of each review article is provided in its associated Comment [21,22,23]. Reference citations were inserted into the manuscript using EndNote. A graphical abstract, conceived by the primary author with input from the other co-authors, was created on BioRender.com. This article was written and edited entirely by humans.

AI-Only (AIO) Review Articles

An author without an extensive AI background began by experimenting to determine the best queries for generating the AI-only review articles [14, 17]. A complete listing of all queries used to generate the AIO review articles is provided in the Comment for each review topic [21,22,23]. First drafts were written with ChatGPT Plus using the GPT-4 language model (OpenAI); specifically, the April 2023–August 2023 version of ChatGPT was used for research and text generation. At that time, ChatGPT's knowledge cutoff was September 2021, so all relevant articles published after this date that had been identified while preparing the human-generated manuscript were uploaded so that ChatGPT could access them. This was accomplished with the "AskYourPDF" plugin, which is available only through the paid GPT-4 subscription (i.e., not with the free GPT-3.5 version). Importantly, because most of the COVID-19 and musculoskeletal health articles were published after September 2021, the resulting AIO manuscript essentially became an AI-assisted manuscript (see below) and was therefore abandoned.

The first step of the AI-only paper was to generate an outline and a title for the review article. While each set of authors (for the 3 different review topics) used different query strategies [21,22,23], an example from the “Neural regulation of fracture healing” topic is provided below:

You are a PhD-level biological researcher who has experience writing sophisticated research paper outlines. Write a 2.5-page outline for a paper on the neural regulation of fracture healing. Include one section on an introduction, additional sections on each of the major concepts that will be included in the paper (generate these topics by synthesizing the research conducted to study the various aspects of the nervous system that regulate fracture healing), and one section on the conclusion. Create more detailed, specific sub-sections for all of the sections except for the introduction and conclusion. Include bullet points under each section with the key, detailed facts that will be expanded upon in that section. Each specific, detailed fact should be written as a complete sentence. For the introduction and conclusion, include multiple bullet points (each one specific sentence) that lay out the content of those sections. Based on the outline, generate a witty but informative title for the research paper and include the title at the beginning of the outline.

After minor edits were made, first by querying GPT and eventually by direct human editing, this outline was fed back into ChatGPT, and each section of the paper was written using variations on the following query:

Next, use the outline to write Heading 5. Ensure it is at least 300 words in length. Write at the level of a biological researcher and include all citations from the primary articles where you obtain the information in the section. When citing conclusions made by primary sources, expand upon the experiments researchers completed to come to these conclusions. It is imperative that you are very specific. Ensure this section is clear, logical, and flows well.

Because GPT's response length limit (approximately 4096 tokens) prevented the entire paper from being written from a single prompt, a new prompt was used to generate each section/subheading, and the sections were merged to form the full manuscript. A new "chat session" was started each time a new section/subheading was written. Next, each citation generated by GPT was fact checked and replaced by the authors when the citation did not exist or did not match the content of the sentence, ensuring that the final manuscript was accurate, suitable for publication, and would not mislead readers. Rewrites were completed using ChatGPT, but some human intervention was warranted. Reference citations were inserted into the manuscript using EndNote. Of note, the unedited first drafts of all articles are provided in the Comment associated with each review topic [21,22,23].
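The section-by-section workflow described above can be sketched as a simple driver loop. This is purely illustrative: the function names are invented, the prompt wording paraphrases the query shown earlier, and `generate` stands in for whatever interface submits a prompt to ChatGPT and returns its reply (no such script is described in the source).

```python
def build_section_prompt(heading: str, min_words: int = 300) -> str:
    # Illustrative paraphrase of the per-section query shown above.
    return (
        f"Next, use the outline to write {heading}. "
        f"Ensure it is at least {min_words} words in length. "
        "Write at the level of a biological researcher and include all citations."
    )

def assemble_manuscript(headings, generate):
    """Send one prompt per heading (a fresh chat session each time, per the
    method above) and merge the returned sections into a full draft."""
    sections = [generate(build_section_prompt(h)) for h in headings]
    return "\n\n".join(sections)
```

Starting a fresh session per section keeps each prompt within the model's response limit, at the cost of losing cross-section context.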

A graphical abstract was attempted using OpenAI's DALL-E program; however, the images produced were not of publishable quality. Given this, the idea for the graphical abstract was generated by ChatGPT based on its analysis of the paper's finalized abstract, and the graphical abstract was created by the authors on BioRender.com (the same process was used for the AI-assisted paper detailed below). Similarly, an attempt was made to have GPT select which important, recently published sources to annotate, a requirement for publication in Current Osteoporosis Reports. However, because GPT could not access knowledge after September 2021, the authors ultimately selected the articles to annotate based on general guidance from GPT, with ChatGPT then writing the highlight for each identified reference. Among GPT's tips were to annotate sources whose content was specific to the neural regulation of fracture healing and to look for sources that had been cited by other papers and published in higher-impact journals.

AI-Assisted (AIA) Review Articles

For the AI-assisted review articles [15, 18, 20], ChatGPT-4 was used as outlined in the AI-only section above, with the following differences. The AI-assisted article used the human-generated outline, and all references used to generate the human article were provided via the AskYourPDF plugin described above. Specifically, AskYourPDF generated unique codes recognized by ChatGPT that corresponded to each PDF, enabling the articles to be uploaded for analysis by ChatGPT; this plugin was therefore vital to writing this paper. Through trial and error, it was found that ChatGPT could not properly analyze multiple codes in a single text box; each code had to be submitted to ChatGPT in a separate chat for proper analysis to occur. Queries differed slightly between authors, but the general system described below was used. Again, a complete listing of the queries used is provided in the Comment associated with each review topic [21,22,23].

Query 1: I need help writing a subheading of a review article about (paper topic). The subheading I need help writing is (subheading topic). Can I provide you with 10 documents using AskYourPDF that we can use to synthesize this section?

Query 2: Okay, I am going to upload each ID separately so that you may better process the information. After each ID, you may write a short summary of the key findings of that document. After all documents are uploaded, I will ask you to write the review. Are you ready or do you have any questions?

> Proceed to upload each document ID in separate text boxes.

Query 3: Okay that was the last one. I have now provided you with 10 documents (Document 1, Document 2, Document 3, Document 4, Document 5, Document 6, Document 7, Document 8, Document 9, Document 10). Please write an in-depth review of the linkage between (paper topic). It’s okay if your discussion contains information outside of (paper topic), only use these directions as a framework. Write at the level of a doctorate researcher. Use in-text citations when necessary. If there are multiple documents that contain a piece of information, use multiple citations at the end of a sentence.

Query 4: That is perfect! Please condense this review into approximately 300 words while retaining citations from all (number) documents. You do not need to include an introduction or conclusion to this section, focus only on the findings of the documents.

This system was repeated multiple times until the first draft of the paper was created. Reference citations were inserted into the manuscript using EndNote. As with all manuscripts, the paper went through rounds of fact checking and editing by all co-authors. Rewrites were completed using ChatGPT, but some human intervention was required. For the AI-assisted review articles, graphical abstracts were created by humans on BioRender.com, but the concept was provided by ChatGPT based on its analysis of the finished manuscript. Annotated references were those identified during the human-generated review, but ChatGPT generated the statement of significance.
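Assuming the four queries above always run in the same fixed order for each subheading, the message sequence can be sketched as a generator. The function name and the condensed prompt text are hypothetical paraphrases for illustration, not the authors' exact wording.

```python
def subheading_messages(paper_topic, subheading, doc_ids, word_target=300):
    """Yield the chat messages for one subheading, mirroring Queries 1-4 above."""
    yield (f"I need help writing a subheading of a review article about {paper_topic}. "
           f"The subheading I need help writing is {subheading}. "
           f"Can I provide you with {len(doc_ids)} documents using AskYourPDF?")
    yield ("Okay, I am going to upload each ID separately so that you may better "
           "process the information.")
    # Each AskYourPDF code goes in its own message: ChatGPT could not parse
    # multiple codes in a single text box.
    for doc_id in doc_ids:
        yield doc_id
    yield (f"Okay that was the last one. Please write an in-depth review of "
           f"{paper_topic}, with in-text citations where necessary.")
    yield (f"Please condense this review into approximately {word_target} words "
           f"while retaining citations from all {len(doc_ids)} documents.")
```

Repeating this sequence for every subheading, then concatenating the condensed outputs, yields the first draft described above.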

All Papers

A number of parameters were measured and compared among the 3 types of review articles (2 for COVID-19), and other parameters were compared between the first draft and the final draft. The findings for each of these assessments are located in the Comment for each topic [21,22,23].

Assessments included tracking time spent during different stages/activities of the review writing process, using the "Toggl" application. Activities were divided into preparation, literature review, writing (which included writing queries for ChatGPT), fact checking, editing, other, and total time spent. Preparation refers to time spent reading articles, watching videos, and experimenting with AI before beginning official query generation. "Other" covers activities that do not fall into the defined categories, such as graphical abstract creation and reference annotation. Time spent was also attributed to trainees (the first 3 author positions of all review articles) versus faculty (position 4 through the last author).
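As an illustration only (the entry format and numbers below are invented, not the study's data), Toggl-style time entries could be aggregated by activity and by author role as follows:

```python
from collections import defaultdict

def summarize_time(entries):
    """Aggregate (author_position, activity, hours) tuples into totals per
    activity and per role; positions 1-3 are trainees, 4+ are faculty."""
    by_activity, by_role = defaultdict(float), defaultdict(float)
    for position, activity, hours in entries:
        by_activity[activity] += hours
        by_role["trainee" if position <= 3 else "faculty"] += hours
    by_activity["total"] = sum(by_role.values())
    return dict(by_activity), dict(by_role)

# Hypothetical entries for one manuscript.
entries = [(1, "writing", 4.0), (2, "fact checking", 2.5), (5, "editing", 1.0)]
```

The role split follows the trainee/faculty author-position rule stated above.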

Similarity scores between original and final drafts were calculated using software from CopyLeaks. These scores were tabulated for all papers to measure edits and changes implemented from the original to the final draft. The final draft was also examined for plagiarism similarity index scores using Turnitin software. This program compares the provided text to internet sources, academic journals, and previously submitted papers to determine a percentage of the text that is highly similar to outside sources.
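The CopyLeaks score itself comes from proprietary software, but a rough stand-in for a draft-to-draft similarity percentage can be computed with Python's standard library. This is an analogy to show the idea of the metric, not the algorithm the study used.

```python
from difflib import SequenceMatcher

def draft_similarity(first_draft: str, final_draft: str) -> float:
    """Percent similarity between two drafts (0-100), via difflib's ratio."""
    return round(100 * SequenceMatcher(None, first_draft, final_draft).ratio(), 1)
```

A low score indicates heavy editing between the first and final drafts; a high score indicates the draft survived largely intact.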

During the fact checking process, the validity of the references was examined. References were flagged as incorrect if there was any error in the citation itself, such as an incorrect year, author list, title, or journal; if the text for which the citation was listed was not relevant to the reference; or if the reference was fabricated. Additionally, the number of queries used for each step of the process was tracked. It should be noted that even the paid version of ChatGPT limited users to 25 queries per 3 hours.
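The three failure modes above can be encoded as a small classifier. The field names are hypothetical labels for judgments recorded manually during fact checking; the source describes no such script.

```python
def classify_reference(ref: dict) -> str:
    """Apply the flagging criteria above, in order of severity.

    Expected keys (booleans recorded by a human fact checker):
      exists            -- the cited work actually exists
      metadata_correct  -- year, authors, title, and journal all match
      relevant_to_text  -- the reference supports the cited sentence
    """
    if not ref.get("exists", False):
        return "fabricated"
    if not ref.get("metadata_correct", False):
        return "citation error"
    if not ref.get("relevant_to_text", False):
        return "not relevant"
    return "correct"
```

Checking existence first matters: a fabricated reference trivially fails the other two criteria, so severity ordering prevents double counting.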
