Smoking cessation therapists have long used the motivational interviewing (MI) talk therapy to guide clients toward positive behavioral change []. MI engages clients in a structured conversation that encourages them to contemplate their behavior more deeply and motivates them to change it. MI has been shown to be successful in helping clients reduce or quit their smoking habits [], but the availability of MI-trained clinicians is limited to hospitals and medical centers, and MI therapy is usually only initiated after a smoking-related health issue occurs []. These restrictions make it difficult for smokers to access therapy outside of medical centers and occur too late to have a preventative effect.
Our research seeks to automate the therapist side of an MI conversation which, if successful, could broaden access to care at a population level. We have been developing a chatbot, called MIBot [], whose purpose is to move ambivalent smokers toward the direction of quitting. MIBot is being developed by an interdisciplinary research collaboration among expert MI-trained clinicians, social scientists, and computer engineers. The initial version of the MIBot chatbot guides the client through a fairly simple MI conversation by combining scripted interactions with context-specific responses generated by natural language models, based on elements of the MI approach.
The focus of the initial version of the MIBot chatbot is on one core skill of MI: reflective listening [], in which the chatbot provides reflections on what the client has most recently said. In general, reflections are meant to express the therapist’s current understanding of the client’s most recent response and invite the client to continue further contemplation of their behavior. Reflections can be simple or complex []. A simple reflection rephrases a client’s response, sending the message that the response was understood and inviting the client to continue. A complex reflection attempts to infer relevant information about the client from the client’s utterance by linking the client’s response to relevant facts or ideas. A good quality complex reflection may further infer something about the emotional state of the client through their utterance.
In a complex reflection, when these relevant facts come from a client’s earlier responses in the conversation, we call this a backward-looking complex reflection (BLCR). Preferably, a BLCR does not simply summarize all the past conversational information in order but is composed of the information that is sensible for the context. shows an example of a conversation in which the final statement by the therapist is a BLCR.
Textbox 1. Example motivational interviewing conversation in which the last utterance by the therapist is a backward-looking complex reflection.Therapist: What is one thing you like about smoking?
Client: It makes me have less stress and keeps me connected to my friends.
Therapist: What is one thing you dislike about smoking?
Client: It leaves bad breath.
Therapist: What is one thing about your smoking addiction that you would like to change?
Client: I would like to reduce smoking.
Therapist: [backward-looking complex reflection] It seems like you want to reduce your smoking, which might help your concern about bad breath
The initial MIBot chatbot [] only generates reflections using the client’s most recent utterance and does not make use of prior utterances. The ability to generate BLCRs can expand the chatbot’s options for generating context-appropriate complex reflections.
The goal of this work is to develop and evaluate a method to automatically generate BLCRs given a prior conversation. It has become possible to do this kind of generation through recent dramatically powerful advancements in natural language processing [], and more specifically the most recent large language models (LLMs) from GPT-3.5 and later [-].
LLMs are language models which take text as input and generate textual output. GPT-4, an LLM introduced in March 2023, has significantly improved capability to generate text to satisfy particular requirements compared to previous LLMs [-]. One way to use GPT-4 is to write a prompt, which is a language-based instruction that literally tells the model the processing that is desired []. This processing is potentially anything that can be described in language, which is a truly remarkable, new capability that will have many applications. We describe a method for developing the prompts needed to “tell” the model to create BLCRs.
This paper is organized as follows: the Prior Work section introduces MI, GPT-4, and the relevant parts of the MIBot project that we build on. The Methods section describes the prompt developed to generate a BLCR, the specific structure of the input to GPT-4, the rating scale developed to assess when a BLCR is acceptable, the experimental procedure to test the acceptability of BLCRs generated by the prompt, and the data used to test this procedure. The Results section provides the evaluation, and the Discussion section interprets the results of the experiment and lists limitations. The Conclusions section suggests avenues for further work.
Prior WorkMotivational InterviewingMI is a therapeutic technique in which a therapist engages in a conversation to guide and motivate clients who are ambivalent about their behaviors to move toward changing them []. These guided conversations use 4 MI core skills: asking open-ended questions, providing reflections, affirmations, and summarization. In an MI conversation, the therapist will typically begin with an open-ended question, listen to the client’s response, and reply with 1 of the other 3 core skill types, depending on the circumstances and the direction the therapist wishes to guide the conversation.
While all 4 core skill types are integral to a successful MI, we focus on the role of reflections and the related reflective listening. Reflective listening requires the therapist to listen to what the client has most recently said and formulate a response—called a reflection—that displays the therapist’s understanding while also guiding the conversation. The content of a reflection depends on the current context of the conversation. Reflections can be divided into 2 types: simple reflections and complex reflections. Simple reflections restate the client’s response, typically using different words, so that the therapist and client can establish that they are on the same page. Complex reflections allow the therapist to link what the client has most recently said to other facts or information about the client’s life and emotional state, usually providing some kind of inference. Complex reflections are used to guide the conversation toward new topics.
MI has been shown to be a successful therapy for moving clients toward reducing their smoking habits [], and reflections in particular have been correlated with high perceived support for patient autonomy in MI sessions [].
LLMs and GPT-4LLMs are digital models of natural language that are able to generate text from an input by autoregressively predicting the next word in a given sequence []. These models learn how to predict semantically and syntactically reasonable words by being trained to “fill in the blanks” on large amounts of diverse human-written text, which encompass questions and answers, web-based conversations, informative articles, and other kinds of digitized text. The wide range of data that LLMs are trained on have made them effective systems for generating solutions to various problems in the domain of natural language processing, such as answering questions, summarizing long text, and conversational dialogue generation [].
The GPT (Generative Pretrained Transformer) family of LLMs has proven to be state of the art in a number of general-purpose tasks []. ChatGPT and the related GPT-4 [] model can generate human-like text and answer questions correctly to the point that it has successfully passed m any professional and academic examinations [].
Due to the size of the model and the large amounts of human-produced textual data it is trained on, 1 emergent characteristic of the GPT-4 is its ability to answer questions and interpret human-readable text to follow instructions. This has led researchers to try and directly “ask” GPT-4 to generate some kind of desired text given some input. The study of ways to ask GPT-4 to generate desired text is a newly emerging field called prompt engineering. Thus, this asking process is called prompting a GPT-4 model, and these “asks” are typically called prompts [,].
A prompt usually consists of a request of the model to generate or process some desired text, usually followed by requirements that the generated text must satisfy or instructions that tell the model how to generate this text []. The request can optionally be followed by an input, with the goal that the model will use the input to process and generate the requested text ().
Table 1. Prompt example and generated result. The request is the first sentence, and the input is the italicized second line. The request and input message can be tested live on the OpenAI playground by copy and pasting the entire prompt text [].Prompt example and GPT-4–generated resultPrompt (request and input)Given the Keywords below, write a paragraph that incorporates them into a story about a princess on the moon.GPT-3– and GPT-4–based prompting has been shown to be highly effective in generating text to solve various natural language processing tasks [,,] and has already found applications in a diverse set of technical fields. However, a prompted GPT model does not always produce factually correct answers [,,]. In addition, a prompted GPT model is not deterministic, and a single prompt may produce different texts each time that a prompt is used to generate a completion [,]. Recent research on prompt engineering has produced new methods to structure prompts for generating satisfactory texts [].
The ability to prompt is not restricted by the architecture of GPT-3 or GPT-4. Prompting is possible with any LLM of similar structure, and the difference in output depends on how much knowledge and prediction capability has been retained by an LLM. Thus, while our work specifically used GPT-4, this paper’s method can be used with any LLM, including future improvements on GPT-4, and we will indicate this by referring to LLMs broadly in our methods and discussions.
Existing MI Smoking Cessation Chatbots and the MIBot ProjectThe research and development of MI-based chatbots across several therapeutic domains remains an open problem, with numerous approaches incorporating different natural language processing techniques, and nothing yet deployed in a commercial or therapeutic context for mass adoption. For MI focusing on smoking cessation, several research teams have independently developed chatbots that have been tested and evaluated on experimental study participants. Our particular work has focused on an early step in smoking cessation, which is moving ambivalent smokers toward the decision to quit smoking.
Almusharraf et al [] designed an MI chatbot, which used predefined answers in a scripted conversation and measured its effectiveness on clients’ confidence to quit smoking with an 11-point scale. After testing this method on 97 participants, they found that the average confidence among clients to quit smoking increased by 0.8 (P<.001 via paired 1-tailed t test) 1 week after the conversation. The scripted nature of these MI conversations, with answers not unique to clients’ responses, was suggested as a future point of improvement to investigate further.
Independently, He et al [] sought to investigate whether chatbots using MI techniques had any differing effects from neutral chatbots. They designed 2 chatbots—an MI-based chatbot and a neutral, affirming chatbot—and found that while there were no significant differences in clients’ reception of the 2 chatbots, both chatbots increased the clients’ motivations to quit smoking. The conclusions of He et al [] combined with the results of Almusharraf et al [] indicate that nonscripted responses from chatbots may be better received.
The text produced by generative models are an alternative to scripted responses, and Shen et al [] displayed how generative models could generate reflections dependent on context. Using a GPT-2–based architecture, they created unique, context-dependent generative responses by incorporating a combination of client and therapist utterances from an existing dialogue history, and drawing from a database of previous transcripts to help select between context-relevant responses based on semantic similarity. These generated reflections were compared to a seq2seq model baseline, an older model of conditional text generation that is not LLM based, and human evaluation using a 5-point Likert scale for absolute effectiveness. The generated reflections produced by this system were considered improvements over the baseline model using standard metrics such as the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) score and, in terms of absolute effectiveness, were on-par or above ground truth reference reflections. These results indicate that custom reflections from generative models may be effective for MI-based smoking cessation chatbots to increase users’ confidence and motivation in quitting smoking.
To explore this possibility, Brown et al [] have been iteratively developing MIBot, an MI-based smoking cessation chatbot that uses GPT-2 to generate custom reflections. They tested 3 versions of the chatbot—labeled v5.0, v5.1, and v5.2—on independent groups of recruited smokers to measure the effect of GPT-2–based generative reflections on moving smokers towards changing their smoking habits. They also used a version of the chatbot that did not generate reflections—v4.7—for comparison. MIBot v5.0, v5.1, and v5.2 asked 5 core questions, shown in in sequence, expected a participant response after each question, and used a pretrained GPT-2 model to generate a custom reflection. MIBot v5.2 added extra secondary questions after questions 1 and 2 to allow participants to follow-up with their initial responses to a core question, and a specific version of question 4 if the answer to question 3 was to reduce smoking. MIBot v4.7 also asked these questions, but responded with “thank you” to each response rather than generating a reflection.
Textbox 2. The 5 motivational interviewing conversational questions in the MIBot v5.2 conversation used in this paper.What is one thing you like about smoking?What is one thing you dislike about smoking?What is one thing about your smoking addiction that you would like to change?What will your life look like once you make this change?What is one step you need to make this change?The effect of MIBot versions on readiness to quit was measured using a numerical scale called the Readiness Ruler []. Here, each participant was asked to rate their confidence, importance, and readiness to quit smoking from 0 to 10, with 10 indicating the highest value. Participants were asked to fill out the Readiness Ruler 3 times: just before, immediately after, and 1 week after the conversation with MIBot. Participants were also asked to score the perceived empathy of MIBot through the CARES (Consultation and Relational Empathy Survey) metric, a validated tool used to measure the perceived empathy of a health care interaction by asking a participant 10 statements that are each rated using a 6-point Likert scale [].
Brown et al [] found that there were statistically significant increases in participant confidence to quit smoking across all four chatbots 1 week after the conversation, with no statistically significant differences between them. This finding agreed with He et al’s [] results, and Brown et al [] posited that asking questions may be enough to evoke an impact on confidence to quit. Version v5.2 did display statistically significant increases in importance and readiness to quit smoking when the other versions did not. In addition, v5.2 did exhibit a statistically significant increase in perceived empathy compared to v4.7 (P=.004) on the CARE scale. Both results were in contrast to He et al’s [] findings that there were no statistically significant differences between neutral and MI-style chatbot conversations, and Brown et al [] postulated that this may be due to the effect of v5.2’s LLM-based generative reflections.
MIBot v5.0, v5.1, and v5.2 generate GPT-2–based reflections that only use a participant’s latest response. This precludes the generation of complex reflections that can refer to earlier responses in a conversation, which are the essential element of the BLCRs that are the focus of this paper. This work builds upon Brown et al’s [] work by creating and evaluating a method to generate BLCRs using GPT-4.
In this section, we describe the structure of the method used to generate BLCRs, the set of data we test our BLCR generation method on and how the resulting BLCRs are assessed.
Ethical ConsiderationsEthical standards and approval directly follow those of Brown et al [] as per the use of the data in the experiments described in that paper. The research used to acquire that data was approved by the University of Toronto Research Ethics Board under protocol number 35567, amended June 29, 2022, and all participants provided consent before participating in the Brown et al [] study.
BLCR Generation StructureIn a chatbot conversation with a client, the client’s latest and previous responses, along with the questions that were asked to evoke those responses, are packaged into a text called the client message input. A set of instructions, called the BLCR prompt, tells an LLM how to generate a BLCR from the client message input. These 2 texts are used together to generate a BLCR.
Client Message InputThe client message input () consists of (1) conversation—the sequence of therapist questions and client responses up to the client’s response right before the therapist’s latest question—and (2) latest question-response—the therapist’s latest question and the client’s latest response.
Textbox 3. A sample client message input.Conversation:
Therapist: What is one thing you like about smoking?
Client: It makes me to be more relaxed and releases my tension levels
Therapist: What is one thing you dislike about smoking?
Client: It would be the number of cigarettes I smoke a day plus the affordability of cigarettes theses days
Therapist: What is one thing about your smoking habit that you would like to change?
Client: The number or quantity I smoke a week
Therapist: What will your life look like when you make this change?
Client: If I can reduce by smoking 2 cigarettes a day and I would have some extra cash to do other things
Latest Question-Response:
Therapist: What are the steps you need to make this change?
Client: I need to probably set a smoking schedule that I need to stick too and also find a hobby to keep me distracted from my cravings
Backward-looking complex reflection:
The client message input is unique to each client response, and so changes on every client response. An LLM processes this input to generate a BLCR by first processing the instructions given in the BLCR prompt.
Prompt DesignThe BLCR prompt, shown in , consists of (1) a request to generate a BLCR meeting the standards of MI, using terms presented in the client message input (see Client Message Input section); (2) a description of a complex reflection, taken from Miller and Rollnick []; (3) constraints and criteria to ensure the generated text meets the criteria of a complex reflection; (4) constraints and criteria to ensure the generated text meets the criteria of a BLCR; and (5) repetition of the request to generate a BLCR, given the above constraints and criteria.
The BLCR prompt is the same regardless of the client input message used. The BLCR prompt draws upon an LLM’s implicit domain knowledge of MI [,], combined with a specific definition of a complex reflection, and constraints and criteria on what the output must follow to be an acceptable BLCR. For each client message input, an LLM can use the BLCR prompt’s guidelines to generate a BLCR.
Textbox 4. The full backward-looking complex reflection prompt.Generate a "backward-looking complex reflection" on the "Latest Question-Response" that meets the standards for Motivational Interviewing from the given "Conversation" about smoking cessation.
Refer to the following operational definition of a complex reflection in the context of Motivational Interviewing (MI):
Reflective listening statements are made by the clinician in response to client statements. A reflection may introduce new meaning or material, but it essentially captures and returns to clients something about what they have just said. Reflections are further categorized as simple or complex reflections.
Complex reflections typically add substantial meaning or emphasis to what the client has said. These reflections serve the purpose of conveying a deeper or more complex picture of what the client has said. Sometimes the clinician may choose to emphasize a particular part of what the client has said to make a point or take the conversation in a different direction. Clinicians may add subtle or very obvious content to the client's words, or they may combine statements from the client to form complex summaries.
A complex reflection has these hard constraints:
A complex reflection must be a statement and not a question.A complex reflection must not give advice or information without permission, even if this advice is helpful.A complex reflection must not direct the client by giving orders or commands.A complex reflection must not disagree or challenge what the client has said.A complex reflection must not incentivize people to smoke more, or discourage people from quitting smoking.A complex reflection must not be factually wrong about smoking.A complex reflection must be grammatically correct.Here are some additional hard constraints for backward-looking complex reflections:
A backward-looking complex reflection must directly reference the Client statement and the Therapist question it is responding to in the Latest Question-Response.A backward-looking complex reflection must include only one piece of extra information from earlier client statements in the Conversation.A backward-looking complex reflection must not summarize the conversation.A backward-looking complex reflection must use what the client has said in the last client statement, and the information from earlier client statements, and infer something about the client.Given all the context above, generate a backward-looking complex reflection on the "Latest Question-Response" from the given "Conversation" that meets the Motivational Interviewing criteria of a complex reflection and satisfies all above hard constraints.
The BLCR prompt was created through an iterative process. Starting with an initial description was set of rules describing a BLCR and the requirements to generate a BLCR. This initial prompt was used to generate reflections on preexisting conversational data from prior conversations. These reflections were evaluated using the scale described in the Evaluation of Quality of a BLCR section. The prompt was subsequently revised to improve the responses, and the method attempted again on another set of independent conversational data. The revisions consisted of additional constraints and guidance, written in English, to address the shortcomings of the generated reflections. This iterative process continued until a prompt of sufficiently high evaluation score of the generated reflections was was achieved. The following sections describe both the data and the scale used.
DataTo test the BLCR prompt and client message inputs on real conversational data, 50 conversations were randomly selected from the MIBot version5.1 experiment data []. Each conversation consisted of the 5 MIBot core questions shown in (), along with their respective participant responses. As described in Brown et al [], the participants were 50 anonymous volunteers from the Prolific platform who self-selected based on being current smokers. All 50 participants wrote their responses in text via the MIBot text-based chat interface. provides a sample conversation. Using the BLCR prompt and client message input, BLCRs would be generated for responses to Q3, Q4, and Q5 for each conversation, giving a total of 150 candidate BLCRs to assess.
Evaluation of Quality of a BLCRA rating scale was developed to numerically evaluate the quality of a BLCR. This scale allows one to determine whether a BLCR is acceptable, that is, it meets the definition of a BLCR described in the Prior Work section.
The BLCR rating scale () is an ordinal scale where higher number ratings successively include and build upon lower number ratings. If a BLCR achieves a rating of 3, this means it meets the criteria of 1 (referencing a client’s latest response), 2 (referencing previous information in the conversation), and 3 (makes an inference about the client using present and past information). Satisfying these 3 requirements meets the definition of a BLCR as defined in the Prior Work section; therefore, we call any BLCRs rated 3 or greater acceptable BLCRs. A further rating of 4 is included to meet the preference for a “good” BLCR, which does not summarize the previous contents of the conversation, an optional condition that was deemed useful for indicating an unambiguous BLCR that exceeds the minimum acceptability requirements.
Textbox 5. The backward-looking complex reflection rating scale.1: does the output reference the client’s latest response somewhere?
the output contains 1 or more references to the client’s latest response2: 1 + does the output reference some extra information from earlier in the conversation?
the output contains 1 or more references to 1 or more previous client responses3: 2 + does the output make an inference about the client using information in criteria 1 and 2?
the output generates 1 or more novel assumptions about the client using information in 1 and 24: 3 + is the output not summarizing the sequence of the conversation word for word?
the output does not repeat the information in each client response in sequenceCriteria to accept as a backward-looking complex reflection (score a 1 [True]): it is rated 3 or greater on the above rating scale.
A Python script was written to parse 50 conversations and build a formatted client message input for every Q3, Q4, and Q5 conversational sequence, creating 150 total inputs. These were fed to an LLM alongside the BLCR prompt, and the LLM generated 150 candidate BLCRs.
Three human raters were deployed to use the criteria of the BLCR Rating Scale to independently score all 150 generated BLCRs as acceptable or unacceptable. Using a binary score, an acceptable BLCR was scored 1 (true) if it received a rating of 3 or greater on the BLCR Rating Scale, while an unacceptable BLCR was scored 0 (false). The binary scoring was used to determine the acceptability: the percentage of accepted BLCRs among all generated BLCRs. The interrater reliability between the binary scores of the 3 raters was assessed using percent agreement and the calculation of Cohen κ. This metric was chosen specifically to measure interrater reliability with an ordinal scale, and was chosen instead of a similar metric such as Fleiss κ due to the latter’s unsuitability in a case where all raters rate all items, which is the case for this BLCR assessment experiment [].
This section reports the fraction of the BLCRs generated using the evaluation method described in the Methods section that were deemed acceptable by each of the 3 human raters. The first section reports the percentage of accepted BLCRs between the 3 raters and between the 3 questions, along with a breakdown of the frequency of ranking scores per question and rater. The second section reports the interrater reliability between 3 pairs of the 3 raters (rater 1 and rater 2, rater 1 and rater 3, and rater 2 and rater 3) using percent agreement, with a brief discussion on the κ results.
BLCR Acceptability Statisticsdisplays the percentage of BLCRs meeting the BLCR rating criteria as acceptable (BLCR rating of 3 or greater) broken down by the rater and the question. displays the frequency of rating ranks broken down by question and by rater.
Table 2. Percentage of backward-looking complex reflections deemed acceptable by question and rater.Q3 (n=50)Q4 (n=50)Q5 (n=50)Total (N=150)Rater 1 (%)92909693Rater 2 (%)73908884Rater 3 (%)90888688Average acceptance (%)85 (10)89 (1)90 (5)88 (5)Table 3. Frequency of rating by question and rater.Question and raterRating, n01234Q3Rater 1400344Rater 21013235Rater 3203640Q4Rater 1122046Rater 2104046Rater 3105045Q5Rater 1101049Rater 2105243Rater 3322043Total (all questions combined)Rater 16233139Rater 230224124Rater 362106129breaks down the percent of acceptable BLCRs by rater and question, and the total column indicates the percent of BLCRs scored acceptable across all 150 responses by a single rater. The percentages in parentheses indicate the SD of the acceptability percentage.
The combination of high acceptability () and high frequency of “4” ratings () indicates that the majority of BLCRs generated by this method were considered “good” among all 3 raters. This is an indication that the LLM GPT-4 is highly capable of generating a BLCR. and graph the frequencies of rating by question and rater, with both indicating a large skew toward “4” ratings.
Interrater ReliabilityTo assess the agreement of the results provided in -, displays the percent agreement and Cohen κ for each rater pair. All 3 raters agreed on results at least 80% of the time.
Table 4. Percent agreement and Cohen κ for rater pairs.Rater 1, rater 2Rater 1, rater 3Rater 2, rater 3Agreement (%)848880Cohen κ0.260.360.16Altogether, the combination of high “4” frequency and a rating agreement of 80% and above indicates that this BLCR generation method can be expected to produce “good” BLCRs in the large majority of cases. In comparison, the κ values () indicated weak to fair agreement between all 3 pairs of raters, based on standard interpretation criteria of κ. The discrepancy between high percentage agreement and weak to fair κ may be due to the majority of BLCRs being rated “4” by all 3 raters. The lack of contrastive negative examples (very few generated BLCRs that were rated 0, 1, or 2) skews the calculation of κ toward treating the labeling of widespread agreement as random chance. Therefore, percent agreement is thought to be a more realistic assessment of effectiveness in this context.
contains an example of a real conversation from Brown et al [], with Brown et al’s [] reflections (labeled MIBot [data]) and BLCRs generated by this paper’s method (labeled MIBot [BLCR]) below those reflections. Overall, the BLCRs generated successfully iterate on Brown et al’s [] provided reflections by better incorporating direct reflections on responses and linkages to previous responses to make inferences. A high-quality MI reflection would further infer about the emotional state of the client, and while the generated BLCRs are able to make rudimentary inferences about the mental state of the client (“it seems that…”), more work may be necessary to turn these inferences into those of emotional states. The high percentage of accepted BLCRs shows promise in prompt-based methods being an effective technique for MIBot to generate complex reflections that incorporate information from the past.
LimitationsThe prompt-based BLCR generation method is restricted to MI conversations for smoking cessation and has only been tested in the context of 5-question MIBot conversations. Beyond this scope, this work may not generalize to other MI smoking cessation therapeutic contexts without changes to the prompt. However, the structure of the prompt itself is not specific to the data or the situation. The prompt can in theory be modified to remove references to smoking cessation and replace these with references to other domains, potentially offering a degree of domain generalizability across different subjects of MI therapy beyond smoking cessation. GPT-4 was the LLM model used in this work, but this method is applicable to any LLM model in theory. Newer LLM models, including future GPT models, may provide more robust results.
ConclusionsThis paper presented a method to use an LLM-based prompt to generate BLCRs for a version of MIBot’s MI smoking cessation conversation. It provided a definition of a BLCR, a prompt used to generate BLCRs, and a BLCR rating scale to assess whether a BLCR is acceptable. We found that 88% (n=150) of the generated BLCRs were deemed acceptable. This paper extends the work of Brown et al [] by providing a method to generate complex reflections that incorporate information from earlier in the conversation, and uses GPT-4’s strong text-generation capability rather than GPT-2.
Future work may build upon the definitions and methods introduced by this paper in three ways. First, the definition of a BLCR and the BLCR rating scale may be further refined to provide an accurate conceptual model of what the BLCR is trying to capture in a MI conversation. Second, the BLCR’s prompt method can be adjusted to different MI therapeutic contexts beyond smoking cessation or refined to be more accurate for the smoking cessation context. Finally, the BLCR prompt method can be incorporated into MIBot, and its generated BLCRs can be assessed qualitatively and quantitatively in live experimental conversations.
This research was funded by a Natural Sciences and Engineering Research Council of Canada Discovery grant (RGPIN-2019-04395) and the Edward S Rogers Sr Department of Electrical and Computer Engineering at the University of Toronto.
None declared.
Edited by John Torous; submitted 18.10.23; peer-reviewed by Jackie Andrade, Jaimee Heffner, Steven Siddals; final revised version received 29.03.24; accepted 15.04.24; published 26.09.24.
© Ash Tanuj Kumar, Cindy Wang, Alec Dong, Jonathan Rose. Originally published in JMIR Mental Health (https://mental.jmir.org), 26.9.2024.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Mental Health, is properly cited. The complete bibliographic information, a link to the original publication on https://mental.jmir.org/, as well as this copyright and license information must be included.
留言 (0)