Evaluating the Potential and Pitfalls of AI-Powered Conversational Agents as Humanlike Virtual Health Carers in the Remote Management of Noncommunicable Diseases: Scoping Review


Introduction

Burden of Noncommunicable Diseases

Noncommunicable diseases (NCDs), also known as chronic diseases, are medical conditions that are not primarily caused by infectious agents (eg, viruses, bacteria, fungi, or parasites) and cannot be transmitted from one individual to another through close contact []. NCDs, such as cancer, cardiovascular diseases (CVDs), chronic obstructive pulmonary diseases, chronic respiratory diseases, chronic kidney diseases, cognitive disorders, metabolic syndrome, diabetes, and hypertension, are rising worldwide, with significantly higher rates in low- and middle-income countries (LMICs) [-]. In 2019, NCDs accounted for the largest share of total mortality (74.4%), with 7.1 million more deaths in 2019 than in 2009 []. Nearly half of all deaths in Asia are attributable to NCDs; moreover, >80% of CVD and diabetes deaths, 90% of chronic obstructive pulmonary disease deaths, and two-thirds of all cancer deaths occur in LMICs, and NCDs account for 47% of the global burden of disease [,]. Major risk factors for NCDs include unhealthy dietary habits; physical inactivity; stress; and use of drugs, tobacco, and alcohol, which are generally modifiable through lifestyle choices [,]. These chronic diseases are also major causes of long-term disability and prolonged, costly treatment, which can seriously strain a country’s health care resources and expenditure, especially in lower-income countries whose health systems are not sufficiently equipped to tackle the escalating challenges [,,].

In addition to regular visits to health care centers, NCDs require demanding self-care, including medication adherence, lifestyle modification, and constant symptom monitoring to prevent disease progression; however, adherence to these regimens is generally low, especially among adults with limited health literacy who struggle to understand and follow instructions from their health care professionals [,]. Moreover, the shortage of health care providers and their limited availability often leave patients without the health education and support they need to make informed decisions about self-managing their chronic illnesses [,]. Access to proper health services can also be limited in many underdeveloped areas and rural communities owing to poor health care infrastructure and transportation [].

An optimal prevention and management strategy should incorporate individual lifestyle management, societal health awareness, national health policy decisions, and global health strategy []. However, the escalating prevalence of NCDs worldwide suggests that traditional disease management techniques are not sufficiently effective, indicating an urgent need for effective supplementary management strategies that can mitigate the substantial financial burden NCDs impose on many households, particularly in LMICs [,,].

The Role of Telehealth in NCD Management

Telehealth applications have the potential to improve patients’ self-care and disease-specific knowledge and to reduce hospitalizations and mortality []. Several past studies suggest that mobile telehealth apps are effective in improving nutritional intake and physical activity, resulting in weight loss and adoption of recommended lifestyle changes, owing to their convenience and accessibility []. Supported by existing and emerging technologies, such apps can help patients manage their chronic diseases more effectively by providing continuous self-monitoring tools and promoting better self-management of health problems []. Furthermore, the ever-growing ownership of mobile devices worldwide has greatly contributed to the shift toward digital health care services, including assessment, monitoring, and treatment of physical and mental health, suggesting that mobile health apps could support self-monitoring, assessment, and treatment of NCDs at reduced cost [-]. Indeed, the COVID-19 pandemic accelerated the adoption of telehealth [,], which has been proposed as a cost-effective way to deliver health care to people with chronic illnesses in a more flexible, personalized, transparent, dynamic, and accessible manner [,].

Potential Enhancement of Existing Telehealth Apps With Artificial Intelligence

Although mobile telehealth apps offer an ideal platform for systems designed to help patients manage chronic illnesses, owing to smartphones’ computational power, connectivity, and constant availability, many individuals struggle with the complex user interfaces of existing digital health technologies []. While the ubiquity of these apps reduces some barriers to acceptance, most apps still overlook others, such as the lack of motivational, psychological, and emotional support [,]. However, advances in artificial intelligence (AI) may allow existing mobile health apps to be upgraded with more user-friendly features that support individual user needs []. For instance, many patients cannot navigate conventional telehealth apps effectively because of limited health or computer literacy or disabilities such as visual impairment; intelligent dialogue systems, termed conversational agents (CAs), may help overcome these limitations and improve usability by presenting the apps’ content orally in plain language []. AI-powered CAs are computer systems that communicate with humans through text, voice, and images on mobile, web-based, or audio-based platforms using AI techniques such as machine learning (ML; a statistical method of training models on data to make predictions based on a variety of features) and natural language processing (machines’ ability to detect and interpret human spoken and written language) [,]. A CA may serve as an empathetic listener that understands patients’ problems, help monitor a patient’s health 24/7, and notify physicians of an anticipated medical emergency [].
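To make the mapping from a user's message to a CA response more concrete, the Python sketch below shows a deliberately simple, retrieval-based text CA. It is an illustrative assumption, not the design of any system reviewed here: the intents, example phrases, replies, and the `reply` function with its `threshold` parameter are hypothetical, and bag-of-words cosine similarity stands in for the trained ML and natural language processing models that AI-powered CAs typically use.

```python
import math
import re
from collections import Counter

# Hypothetical intents for a diabetes self-management CA:
# each maps to (example user phrasings, canned reply).
INTENTS = {
    "log_glucose": (
        ["log my blood sugar", "record a glucose reading", "my sugar level is"],
        "Okay, what was your blood glucose reading?",
    ),
    "medication_reminder": (
        ["remind me to take my medication", "when is my next dose", "medicine reminder"],
        "I can set a reminder. What time do you take your medication?",
    ),
    "meal_ideas": (
        ["what should I eat", "healthy meal ideas", "diet tips"],
        "Here are some meal ideas from your care plan.",
    ),
}
FALLBACK = "Sorry, I did not understand. Could you rephrase that?"


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def reply(utterance: str, threshold: float = 0.3) -> str:
    """Return the reply of the most similar intent, or a fallback if nothing is close enough."""
    query = bag_of_words(utterance)
    best_intent, best_score = None, 0.0
    for intent, (examples, _) in INTENTS.items():
        for example in examples:
            score = cosine(query, bag_of_words(example))
            if score > best_score:
                best_intent, best_score = intent, score
    if best_intent is None or best_score < threshold:
        return FALLBACK
    return INTENTS[best_intent][1]


print(reply("Please remind me to take my medication"))
# → I can set a reminder. What time do you take your medication?
```

The fallback reply matters in a health context: when no intent is similar enough, the sketch asks the user to rephrase rather than guessing.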

The popularity of CAs, especially those that use unrestricted natural language, has increased over the last decade as consumers have come to use their smartphones to interact with CAs for daily tasks []. When deployed on mobile devices, CAs have the potential to augment human intelligence and offer multiple benefits, such as delivering health education and supporting behavior change for a range of chronic health conditions []. Moreover, AI-based CAs provide additional communication channels that are particularly effective for developing trust and a therapeutic alliance, which encourage adherence among users []. Owing to advances in natural language processing, voice recognition, and AI, CAs with human-like conversational features are increasingly substituting for human employees in service encounters, including in the health care industry, where they deliver personalized care and support to individuals with chronic health issues [].

Types of CAs

In this paper, we broadly categorize intelligent CAs into 3 main types (Table 1): chatbots without embodiment (text based or voice enabled, ie, voicebots), computer-based embodied digital avatars, and physically embodied humanoid robots []. Although robots are beyond the scope of this paper, AI humanoid or social robots can also be classified as CAs and used as human-like health caregivers for managing NCDs. Despite their differences, all CAs, including humanoid robots, aim to enhance relational outcomes through human-like communication [].

Table 1. Types of conversational agents (CAs).
Column headings: AIa-based CA type (with examples); Features and functionality; Interaction mode; Advantages; Challenges

Chatbots (eg, ChatGPT)
Features and functionality:
Computer programs integrated into messaging platforms that interact with users via free text []
Evolved from preprogrammed, fixed scripted responses to AI-powered versions enabling more human-like conversations [-]
Interaction mode:
Text based
Advantages:
Simple user interfaces
Easier and cheaper to design and develop
Challenges:
Offers mostly chat-based interactions
Lack of personalized emotional tones []

Voicebots (or voice assistants; eg, Amazon Alexa and Apple’s Siri)
Features and functionality:
Voice-enabled intelligent chatbots that interact with users and respond to speaker commands primarily through voice [,]
Interaction mode:
Voice based (or text+voice enabled)
Advantages:
Offers greater flexibility by allowing for hands-free conversations and multitasking []
Beneficial for users with limited digital expertise (eg, users who are unable to type)
Challenges:
Enhanced voice interfaces are crucial for more natural conversation []
Requires some level of emotion awareness (a significant component of voice conversations) []
Prone to speech recognition errors []

Digital avatars (eg, Replika, Mitsuku, and Soul Machines)
Features and functionality:
AI-created anthropomorphic representations of real-world characters in a computer-simulated environment, partially automated in some actions and movements, emulating human behavior [,]
Digital AI avatars can generate human-like interactions through a fusion of multimodal features [,]
Digital avatars evolved from 2D cartoonlike characters to visually realistic and interactive human faces with 3D imaging, leading to the emergence of highly realistic avatars (“digital humans”) []
Companies such as Soul Machines use CGIb, AI, and NLPc to create digital human prototypes in metaverse spaces []
Digital humans can represent fictional characters or virtual replicas of real humans; the latter require digital twin technology for data-driven personalized care [-]
Interaction mode:
Multimodal (text based, voice activated, and face-to-face)
Advantages:
Ability to mimic natural human interactions by delivering highly personalized responses [,]
Capable of face-to-face conversations, displaying physical nonverbal behaviors (eg, facial expressions, hand gestures, nodding, and head and body postures) []
Challenges:
Technical implementation is more complex, time-consuming, and resource intensive []
Real-time video interactions with digital AI avatars require high computing power and high-bandwidth connectivity []
Risk of perceived uncanniness due to increased realism, posing ethical concerns [,]

Humanoid (or anthropomorphic or social) robots (eg, Sophia and Ameca)
Features and functionality:
Designed to mimic human-like characteristics in both behavior and physical appearance, including body structure and autonomous movement [,]
Often used for education, entertainment, assistance, and personal care through various sensor channels such as hearing, sight, and touch []
Similar to digital avatars, humanoid robots can exhibit human-like communicative behaviors, including social praise, head and torso movements, and nodding, to stimulate more natural conversations []
Interaction mode:
Physical embodiment
Advantages:
Capability to perform difficult or dangerous tasks, provide companionship, and participate in social interactions, especially in circumstances where human interaction is limited [,]
Physical presence enables more distinct and natural interactions than chatbots or virtual avatars, as HRId differs from traditional HCIe by involving both linguistic and physical aspects [,]
Challenges:
Higher development cost relative to other CAs
Lower ease of access for users compared to other CAs [,]
Higher risk of ethical and safety concerns

aAI: artificial intelligence.

bCGI: computer-generated imagery.

cNLP: natural language processing.

dHRI: human-robot interaction.

eHCI: human-computer interaction.

Ethical Concerns of AI Agents

While AI agents can provide constant health surveillance support, challenges and risks associated with using AI-based CAs in health care remain. These include ethical concerns regarding data collection and the interpretability of results, patient safety risks, biases encoded in algorithms, and cybersecurity [,]. Furthermore, users are often reluctant to engage with AI-based CAs for reasons such as trust, reliability, learning curve, usability, privacy, and data security, all of which should be addressed in the design, deployment, and use of AI applications [,,-].

State-of-the-Art Summary

Our study is novel in examining CAs as human-like digital agents for the remote management of NCDs. Because AI-based CAs are a relatively new area, limited research has applied these emerging technologies in health care. While existing research on CAs as virtual caregivers for NCDs provides a seminal foundation, it has critical limitations. For instance, most studies have focused primarily on CAs for mental health conditions, overlooking other NCDs such as CVDs, metabolic syndrome, or diabetes [,]. A recent review [] that addressed research questions different from ours concluded the following: “A future chatbot could be tailored to metabolic syndrome specifically, targeting all the areas covered in the literature, which would be novel.” Our research addresses this gap directly.

Furthermore, despite some favorable anecdotal evidence, the effectiveness of CAs in NCD management has rarely been examined in large-scale trials, particularly in older adults, who have the highest risk of developing NCDs [-]. Several recent systematic reviews [,,] and scoping reviews [,,] have examined CAs in chronic disease management; however, to the best of our knowledge, no review has yet examined the application of AI-based CAs as human-like digital agents for managing NCDs remotely. These limitations and gaps present both challenges and opportunities for research on how CAs can contribute to managing NCDs.

Aim

This scoping review aimed to provide an overview of the existing evidence and research on using assistive humanoid AI-based CAs in health and social care for managing NCDs. Our primary objective was to explore the impact of AI-based CAs, including embodied avatars, as human-like health carers for the self-management of chronic diseases. By examining the current literature on this topic, we hoped to identify key areas for future research and to provide insights into how these technologies can be used effectively in health care to personalize NCD management strategies.

Research Objectives

Our research objectives were as follows: (1) to explore the current state of research on the use of AI-based CAs as human-like virtual health carers for managing NCDs, (2) to identify the potential benefits and challenges associated with the use of these technologies in the health care field, (3) to explore the efficacy of AI-based CAs in the remote management of NCDs, (4) to discover the specific target users primarily studied, and (5) to provide recommendations for future research in this area.

Research Questions

Our research questions were as follows: (1) what is the current state of research on using AI-based CAs as human-like health carers for managing NCDs? (2) What are the limitations or challenges associated with the use of these technologies in health care? (3) What are the potential benefits of using AI-based CAs in managing NCDs, and how can they be effectively used to improve health care delivery and reduce health care burden? (4) What is the efficacy of the CAs in the remote care of NCDs? (5) What are the frequently targeted user groups for such virtual agents (eg, specific age groups and individuals with special needs)?


Methods

Search Strategy

We followed the methodological frameworks proposed by Arksey and O’Malley [] and Levac et al []. First, research objectives and questions were formulated; then, a systematic literature search was conducted on July 31, 2023. For the primary search, we used 6 electronic databases relevant to the research focus (Ovid MEDLINE, Embase, PsycINFO, PubMed, CINAHL, and Web of Science), applying the same set of keywords, such as “conversational agents,” “artificial intelligence,” and “noncommunicable diseases,” together with their synonyms (as shown in Textbox 1). The truncation wildcard “*” and the Boolean operator “OR” were used to broaden the search so that different word forms and combinations were captured. The Boolean operator “AND” was used to combine the main concept groups, restricting results to articles on AI-based CAs (as health carers) in the health care field, particularly for managing NCDs or chronic diseases.
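The logic of the search string can be illustrated with a short Python sketch, assuming abbreviated term lists (the full lists appear in Textbox 1). `build_query` joins synonyms with OR inside each concept group and joins the groups with AND; `matches` applies the same logic to a title, treating “*” as a truncation wildcard. The function names and the regex-based matching are hypothetical: each database applies its own syntax and matching rules (eg, phrase handling and field tags), so this only approximates how such a query behaves.

```python
import re

# Abbreviated, illustrative concept groups; the full term lists are given in Textbox 1.
CONCEPT_GROUPS = [
    ["conversational agent*", "chatbot*", "voice assistant*", "avatar*", "digital human*"],
    ["intelligent", "artificial intelligence", "AI"],
    ["health", "self-management", "chronic disease*", "noncommunicable disease*"],
]


def build_query(groups: list[list[str]]) -> str:
    """OR the quoted synonyms within each concept group, then AND the groups together."""
    blocks = ["(" + " OR ".join(f'"{term}"' for term in terms) + ")" for terms in groups]
    return " AND ".join(blocks)


def term_to_regex(term: str) -> re.Pattern:
    """Translate a term with '*' truncation into a case-insensitive whole-word regex.

    For example, 'chatbot*' matches 'chatbot' and 'chatbots'.
    """
    pattern = r"\w*".join(re.escape(part) for part in term.split("*"))
    return re.compile(rf"\b{pattern}\b", re.IGNORECASE)


def matches(text: str, groups: list[list[str]]) -> bool:
    """A record is retrieved only if every group (AND) has at least one matching term (OR)."""
    return all(any(term_to_regex(t).search(text) for t in terms) for terms in groups)


print(build_query(CONCEPT_GROUPS))
title = "A voice assistant using artificial intelligence for chronic disease self-management"
print(matches(title, CONCEPT_GROUPS))  # True
```

A title about "chatbots for customer service" would satisfy the first group but fail the AI group, so the AND between groups keeps it out of the results.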

Additional studies were identified by hand searching the reference lists of included studies and relevant review articles. A supplementary manual search was also conducted to identify specific articles from other sources, including ProQuest Central, ResearchGate, ACM Digital Library, and Google Scholar. Some of these articles were identified through pilot hand searching and preselected because they were closely relevant but had not been retrieved from the databases used for the primary search, whereas others were found through manual searches targeting authors well known in this field. The search strategy was refined following expert recommendations, including advice from a coauthor with digital health expertise to exclude “robots” from the search terms because robotics is a distinct field. These steps were intended to ensure a comprehensive exploration of the relevant literature.

Textbox 1. Search query example.

Example

(“conversational agent*” OR “relational agent*” OR “virtual agent*” OR “dialogue agent*” OR “dialogue system*” OR “virtual assistant*” OR “chatbot*” OR “voice assistant*” OR “voicebot*” OR “voice-bot*” OR “voice bot*” OR “humanoid *bot*” OR “social *bot*” OR “avatar*” OR “human-like avatar*” OR “anthropomorphic avatar*” OR “digital human*” OR “human digital twin*” OR “virtual human*”) AND (“intelligent” OR “artificial intelligence” OR “AI” OR “AI-based”) AND (“health” OR “healthcare” OR “caregiver” OR “self-management” OR “self-monitor*” OR “non-communicable disease*” OR “noncommunicable disease*” OR “chronic disease*”)

Eligibility Criteria

Our scoping review used comprehensive inclusion and exclusion criteria to ensure the selection of pertinent studies. We did not restrict article selection by gender or age group. The search was confined to scholarly articles in English published between January 2010 and July 2023, reflecting the substantial rise in CAs after 2010 [], notably with the introduction of Apple’s Siri in 2011 []. Most of the selected papers reported studies conducted within the last 5 years, reflecting accelerated technological advancements and the recent evolution of AI. Specifically, we focused on empirical studies of CAs applied exclusively to human interaction within the health care context, emphasizing their role in the remote management of NCDs, ideally within home environments.

We excluded conference abstracts, posters, reviews, protocols, position papers or viewpoints, and certain types of studies, such as those involving noninteractive robotic devices. We also excluded studies of CAs designed for medical education or non–patient-centered applications in hospital settings, and those addressing specific health care domains such as surgery, dentistry, pregnancy or maternity, addiction or substance use disorders, and communicable diseases. Furthermore, we excluded studies focused solely on medical history data storage, telephone monitoring, data set construction methods, or user evaluations of commercial CAs for health care without practical application in the remote management of NCDs. These criteria kept the focus on patient-centered CAs contributing to the self-management and remote monitoring of NCDs, excluding studies that primarily emphasized clinical interviews, disease prediction, or decision support without a social interaction element.
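The mechanical parts of these criteria (language, publication window, publication type, and excluded domains) can be expressed as a simple record filter. The Python sketch below is hypothetical: the `Record` fields, the category labels, and the July 31, 2023, end date (taken from the search date) are assumptions, and criteria that require judgment, such as whether a CA is patient centered or includes a social interaction element, were applied by the reviewers and are not captured here.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical category labels; actual screening relied on reviewer judgment.
EXCLUDED_TYPES = {"conference abstract", "poster", "review", "protocol", "position paper", "viewpoint"}
EXCLUDED_DOMAINS = {
    "medical education", "surgery", "dentistry", "pregnancy or maternity",
    "addiction or substance use", "communicable disease",
}
WINDOW_START = date(2010, 1, 1)
WINDOW_END = date(2023, 7, 31)  # assumption: the search date


@dataclass
class Record:
    title: str
    language: str
    published: date
    publication_type: str
    domain: str


def passes_screening(record: Record) -> bool:
    """Return True if a record meets the mechanical inclusion criteria."""
    return (
        record.language == "English"
        and WINDOW_START <= record.published <= WINDOW_END
        and record.publication_type not in EXCLUDED_TYPES
        and record.domain not in EXCLUDED_DOMAINS
    )


example = Record("Voicebot for T2DM self-management", "English", date(2018, 6, 1), "original article", "diabetes")
print(passes_screening(example))  # True
```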

Screening and Selection

Two of the authors independently searched each database. Titles and abstracts were screened for inclusion according to the aforementioned criteria, and duplicates, unrelated studies, and articles that could not be retrieved were excluded. Abstract screening yielded 264 articles eligible for full-text screening, of which 70 (26.5%) were review papers comparing different types of CAs in the health care field and 156 (59.1%) were empirical studies exploring different types of CAs used in the prevention, treatment, or rehabilitation of chronic diseases involving consumers, caregivers, and health care professionals.

The authors then independently screened the full texts of the remaining articles, and 40 full-text articles that met the inclusion criteria were selected for review. In addition, 10 [] articles were obtained from hand searches of other sources (Google Scholar, ProQuest, ACM Digital Library, and ResearchGate), of which 3 (30%) [] were selected after full-text screening. The process was iterative, and any discrepancies were discussed among the authors. The search and selection process is illustrated in Figure 1.
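As a quick arithmetic check, the short Python sketch below recomputes the proportions reported for the screening stages from the raw counts and confirms that the two selection routes sum to the 43 studies included in the synthesis. The variable names and the `pct` helper are illustrative; only the counts come from the text.

```python
def pct(part: int, whole: int, digits: int = 1) -> float:
    """Percentage of `whole` represented by `part`, rounded to `digits` decimals."""
    return round(100 * part / whole, digits)


# Counts as reported in the text.
full_text_eligible = 264
review_papers = 70
empirical_studies = 156
included_from_databases = 40
hand_searched = 10
included_from_hand_search = 3

print(pct(review_papers, full_text_eligible))                # 26.5
print(pct(empirical_studies, full_text_eligible))            # 59.1
print(pct(included_from_hand_search, hand_searched))         # 30.0
print(included_from_databases + included_from_hand_search)   # 43
```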

Figure 1. PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flowchart of the search and screening process. AI: artificial intelligence; CA: conversational agent; NCD: noncommunicable disease.

Data Collation and Reporting

An Excel (Microsoft Corp) spreadsheet was initially created to aid the screening and selection process. Following the screening process, a total of 43 articles were eventually selected for synthesis. Quantitative and qualitative data from the included studies were extracted and summarized in a tabular format, including information such as intervention, type of CA, target population, number of participants with their age, methods, study duration, location, measures and outcomes, and limitations. The relevant extracted information was collated and summarized using the narrative synthesis approach, which was deemed appropriate for capturing the breadth of evidence in scoping reviews, identifying themes aligned with the research questions as well as patterns observed across the included studies and relevant reviews. The results were reported according to the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) guidelines. Given the divergent methodologies of the included studies, predominantly using mixed methods and qualitative approaches, no quantitative synthesis or meta-analysis was conducted.
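The extraction table described above can be mirrored in code as a fixed schema, which helps keep fields consistent across reviewers and simplifies export for synthesis. The Python sketch below is a hypothetical equivalent of the spreadsheet, not the authors' actual template: the `ExtractionRow` field names and the `to_csv` helper are assumptions, and the example row abbreviates the Watson et al entry from Table 2.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class ExtractionRow:
    """One included study; fields mirror the extraction items listed in the text."""
    study: str
    intervention: str
    ca_type: str
    target_population: str
    participants: int
    age: str
    methods: str
    duration: str
    location: str
    measures_and_outcomes: str
    limitations: str


def to_csv(rows: list[ExtractionRow]) -> str:
    """Serialize extraction rows to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(ExtractionRow)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buffer.getvalue()


watson = ExtractionRow(
    study="Watson et al, 2012",
    intervention="Behavior change intervention for obesity self-management",
    ca_type="2D-animated human-like avatar (rule-based AI)",
    target_population="Adults with overweight or obesity",
    participants=62,
    age="20-55 years",
    methods="2-arm RCT",
    duration="12 weeks",
    location="United States",
    measures_and_outcomes="No significant difference in percentage change in step count (P=.07)",
    limitations="Participants were primarily White, college-educated women",
)
print(to_csv([watson]))
```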


Results

Characteristics of the Included Studies

Most of the included studies (22/43, 51%) were feasibility, acceptability, and usability studies (Table 2). These studies mostly reported positive results for the feasibility and usability of the proposed CAs. Some studies (4/43, 9%) used the System Usability Scale questionnaire to measure usability and predominantly found high usability, with System Usability Scale scores of >70 [-]. Other studies (10/43, 23%) used participants’ subjective ratings to evaluate the feasibility and acceptability of CAs, applying indicators such as perceived usefulness [-], ease of use [,], user satisfaction [,], engagement rate [,,], perceived closeness [], and Net Promoter Score []. However, 1 study (1/43, 2%) reported negative usability and acceptability outcomes, indicating unsatisfactory reliability, usability, and goal structure, which hindered health care professionals’ acceptance and trust [].
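For readers unfamiliar with the two standardized instruments mentioned above, the Python sketch below shows how they are conventionally scored. The System Usability Scale has 10 items rated 1 to 5: odd-numbered (positively worded) items contribute the response minus 1, even-numbered (negatively worded) items contribute 5 minus the response, and the sum is multiplied by 2.5 to give a score from 0 to 100. The Net Promoter Score is the percentage of promoters (ratings of 9 or 10 on a 0 to 10 likelihood-to-recommend question) minus the percentage of detractors (ratings of 0 to 6). The function names and example responses are hypothetical and do not come from any included study.

```python
def sus_score(responses: list[int]) -> float:
    """Score one participant's 10 System Usability Scale responses (each 1-5) on a 0-100 scale."""
    if len(responses) != 10 or any(r < 1 or r > 5 for r in responses):
        raise ValueError("SUS needs 10 responses, each between 1 and 5")
    total = 0
    for item_number, response in enumerate(responses, start=1):
        if item_number % 2 == 1:  # odd items are positively worded
            total += response - 1
        else:  # even items are negatively worded
            total += 5 - response
    return total * 2.5


def net_promoter_score(ratings: list[int]) -> float:
    """Percentage of promoters (9-10) minus percentage of detractors (0-6)."""
    if not ratings:
        raise ValueError("at least one rating is required")
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100 * (promoters - detractors) / len(ratings)


print(sus_score([4, 2] * 5))                  # 75.0
print(net_promoter_score([10, 9, 8, 6, 10]))  # 40.0
```

A study-level System Usability Scale result is usually the mean of participants' scores; the example participant's 75.0 would sit above the >70 level reported by the included studies.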

Table 2. Characteristics of the included studies.
Column headings: Study, year; Intervention and study duration; CAa type and delivery platform; Target population, sample size, and participants’ age; Methods and location; Measures and outcomes; Limitations

Watson et al [], 2012
Intervention and study duration:
Behavior change intervention for obesity self-management
Duration: 12 weeks (divided into four 3-week periods: period 1, period 2, period 3, and period 4)
CA type and delivery platform:
2D-animated human-like avatar (rule-based AIb)
Platform: desktop and laptop computers (web based)
Target population, sample size, and participants’ age:
Adults with overweight or obesity
N=62
Participants aged 20-55 years (inclusive)
Methods and location:
Pretest-posttest observational study (quantitative); 2-arm RCTc
Location: United States
Measures and outcomes:
No significant difference in percentage change in step count between intervention and control arms from start to end (2.9% vs −12.8%, respectively; P=.07)
Significant difference in percentage change in step count across all study periods between the intervention and control arms (P=.02)
Secondary outcomes: no significant changes in BMI, 7-day physical activity recall, physical activity stage of change, self-efficacy and exercise benefits and barriers, or program satisfaction (eg, mean decrease in BMI of 0.04 and 0.25 in the control and intervention groups, respectively; P=.44)
Mean step count: significant decrease for the control group from period 1 (7174) to period 4 (6149; P=.01) but no significant change for the intervention group from period 1 (6943) to period 4 (7024; P=.85)
Mean activity level: significant percentage change in mean activity levels between period 1 and period 3 (P=.02) but no significant change between period 1 and period 2 (P=.12) or between period 1 and period 4 (P=.07)
58% of the intervention participants agreed that the virtual coach contributed to their increased activity
Limitations:
Participants were primarily White, college-educated women, limiting the generalizability of findings to the wider population of patients with overweight or obesity
Lack of baseline step count data for participants, although survey results showed no substantial differences in baseline activity levels
The initial increase in step count observed after enrollment likely reflects a change from baseline, but this cannot be confirmed without baseline data

Kimani et al [], 2016
Intervention and study duration:
Lifestyle intervention and patient counseling support for AFd self-management
Duration: not identified
CA type and delivery platform:
Animated human-like avatar (network-based AI with XML scripting language)
Platform: mobile devices or smartphones
Target population, sample size, and participants’ age:
Patients with AF
N=16 (5 female and 11 male)
Participants aged ≥18 years (range 20-58 years)
Methods and location:
Feasibility assessment: mixed methods pilot study with self-report scale measures and a semistructured interview
Location: not identified
Measures and outcomes:
High overall user satisfaction with the agent (mean 3.45 on a 4-point scale) and with the AliveCor heart rhythm monitor (mean 3.54)
High ratings for ease of use (mean 3.54)
Interaction duration: participants reported interactions with the agent lasting 7 to 10 minutes; older participants reported longer interactions owing to the relevance of the content
Satisfaction with the agent correlated with participants’ satisfaction with the AliveCor heart rhythm monitor
Acceptable feasibility of delivering AF counseling via a smartphone-based humanoid avatar as a virtual agent
Limitations:
Small sample size and limited patient diversity
Short study duration and no evaluation of long-term use
No control group for comparison to provide additional insights into the effectiveness of the virtual agent
No assessment of the virtual agent’s impact on patient outcomes or behavior change

Shamekhi et al [], 2017
Intervention and study duration:
Behavioral intervention for self-management of stress and depression
Duration: 9-21 weeks
CA type and delivery platform:
3D-animated human-like avatar (rule-based AI)
Platform: touch-screen tablets
Target population, sample size, and participants’ age:
Adults with chronic pain and depression
N=154
Participants aged ≥18 years
Methods and location:
Mixed methods observational study; 2-arm RCT
Location: United States
Measures and outcomes:
Impact on patient performance and satisfaction: significantly more positive stress management behaviors among CA intervention participants than among control participants after 9 weeks (t(136)=3.74; P<.001)
The avatar (Gabby) was found to be very useful for group visits, allowing participants to review class lessons and detailed information at any time and location
Most participants found the meditation, yoga, and mindfulness sessions provided by Gabby to be very “useful” and “relaxing”
Participants were very likely to recommend Gabby to others
Limitations:
Relatively small sample size from health centers in the Boston area, limiting generalizability of findings to other populations
Inability to isolate the effects of the weekly face-to-face group visits from those of the home-based CA
Potential usability issues, especially for less computer-literate users, which may affect user satisfaction with Gabby

Bickmore et al [], 2018
Intervention and study duration:
AF self-management through symptom monitoring and patient counseling
Duration: 30 days
CA type and delivery platform:
3D-animated human-like avatar (network-based scripted AI)
Platform: mobile devices or smartphones
Target population, sample size, and participants’ age:
Older adult patients with AF
N=120
Participants aged ≥60 years (mean 72.1, SD 9.10 years)
Methods and location:
Mixed methods pretest-posttest (observational) study: quasi-experimental demonstration of the CA and a 2-arm RCT for usability and effectiveness evaluation
Location: United States
Measures and outcomes:
Change in AFEQTe score: significantly higher AFEQT scores among intervention participants after 30 days than among control participants (P<.05)
89.7% of participants rated the CA positively, whereas 4.4% rated it negatively and 5.9% had mixed opinions
Participants praised the personalized interaction with the avatar
Limitations:
Relatively small convenience samples with limited participant details and a singular focus on a specific chronic condition (AF) may limit generalizability
Lack of long-term, objectively measured health outcomes
Insufficient emphasis on the complexity of self-care management regimens, especially for patients with low health literacy

Cheng et al [], 2018
Intervention and study duration:
Personalized patient education and medication adherence intervention for improved self-management of T2DMf
Duration: 1 month
CA type and delivery platform:
AI voice assistant and voicebot (MLg based)
Platform: Google Home devices (API.AI platform)
Target population, sample size, and participants’ age:
Older adult patients with T2DM
N=10
Participant age not identified
Methods and location:
Feature-based comparison between the proposed voicebot and other similar mHealthh apps, and usability evaluation (qualitative assessment)
Location: not identified
Measures and outcomes:
AADEi requirement evaluation: the feature-based comparison indicates that, unlike the proposed voicebot, most similar mHealth apps fail to meet the AADE requirements for effective T2DM self-management
Practical usability evaluation:
There were more satisfied users than unsatisfied ones, primarily owing to the voicebot’s speaker functionality and natural conversation flow
Unsatisfied users reported difficulty learning commands and limited answer choices
80% of older adult participants would prefer using Healthy Coping on Google Home over a smartphone
Limitations:
Narration speed concern: older adult users may struggle with application commands because of the narration speed; a speed setting could allow users to adjust the pace
Accessibility limitations: Healthy Coping’s voice interface may not fully support users with hearing or speech disabilities; integrating devices such as Bluetooth-enabled hearing aids could improve accessibility
Suganuma et al [], 2018Internet-based cognitive behavioral intervention for self-monitoring of mental health
Duration: 1 month
AI (rule-based) chatbot via SABORI (mental health) app
Platform: web based
Adults with psychological distress and mental health problems
N=454 (intervention group: n=191; control group: n=263)
Participants’ age: mean 38.07 (SD 10.75) years
Feasibility and acceptability (pilot) study—nonrandomized prospective study with quantitative pretest-posttest intervention questionnaire
Location: Japan
Japanese version of the WHO-5j: a 1-factor scale to measure positive mental health based on physical aspects using a 5-item, 6-point Likert scale
On the variable of positive mental health, a significant main effect (P=.02) and interaction for time (P=.02), including a significant simple main effect for the intervention group (P=.002)
K10: a 1-factor scale to measure negative mental health based on physical aspects using a 10-item, 5-point Likert scale
On the variable of negative mental health, a significant main effect (P=.02) and interaction for time (P=.005), including a significant simple main effect for the intervention group (P=.001)
BADS: a scale to measure 4 factors: “Activation” (BADS-AC), “Avoidance/Rumination” (BADS-AR), “Work/School Impairment” (BADS-WS), and “Social Impairment” (BADS-SI)
A 2-way ANOVA showed a marginally significant trend for behavioral activation, suggesting some degree of effectiveness
BADS-AC: a trend toward a main effect for time (P=.10), including a trend toward a simple main effect in the intervention group (P=.06)
BADS-AR: no significant main or interaction effect
Nonrandomized comparison between control and experimental groups
Short-term study duration (1 month) for assessing behavioral activation effects
Recruitment of healthy participants in the control group may have influenced outcomes, resulting in a lack of effect for the behavioral activation factors of avoidance and rumination
Hussain and Athula [], 2018
To provide patient education for improved self-management of diabetes
Duration: not identified
AI-ML chatbot (VDMS)
Platform: computer or mobile devices (web based)
Patients with diabetes and their carers seeking diabetes-related information
N=10
Participants aged 20-50 years
Quantitative and qualitative study—performance evaluation of VDMS compared to other information sources (eg, search and websites)
Location: not identified
Evaluation metrics (VDMS vs other sources):
Correct replies: VDMS=65%; search engine=80%
User satisfaction with VDMS was satisfactory (very close to that of the search engine), owing to its timely responses and the quality and clarity of its correct answers
Small sample size (10 participants) may restrict diversity and evaluation scope
VDMS chatbot’s implementation is incomplete and lacks full testing
Conversation is mostly controlled by the chatbot, limiting user input
Data extracted from Wikipedia may lack reliability, and no published work has explored integrating Wikipedia knowledge with a diabetes management chatbot
Inkster et al [], 2018
To deliver positive psychology and mental well-being techniques for improved self-management of depression and anxiety
Duration: 8 weeks
Commercial CA: text-based AI chatbot (Wysa)
Platform: mobile devices or smartphones
Individuals with self-reported symptoms of depression
N=not identified
Participants’ age not identified
Quasi-experimental (pretest-posttest) design—mixed methods study using the Wysa app’s inbuilt assessment questionnaire (for effectiveness evaluation)
Location: global (not specified)
Impact (pre-post) analysis outcomes:
PHQ-9 score (within groups) measured using a Wilcoxon signed rank test: significant reduction in PHQ-9 score among both high users (P<.001) and low users (P=.01), indicating an improvement in depression symptoms from before to after the intervention
High users’ group showed significantly higher average mood improvement compared with the low users’ group (Mann-Whitney P=.03, with a moderate effect size of 0.63)
User engagement and feedback: 73.6% of users provided at least one response to the in-app feedback questions (indicating high user engagement), and 60.9% of them reported feeling better after app use
Small sample size restricts investigation of user reactions to app design elements
Small and unbalanced comparison group sizes undermine the findings’ generalizability
Limited detailed feedback on users’ app experience hinders qualitative analysis
Lack of a randomized controlled environment may introduce biases
Absence of users’ previous health information hampers comprehensive understanding
Quasi-experimental design used, slightly lower in design quality compared to interrupted time-series designs
Neerincx et al [], 2019
Behavioral intervention and patient education for self-management of T1DM
Duration: 3 months
Humanoid robot (PAL) and its robotlike digital avatar version for home (MyPAL; rule based+ML with cloud computing)
Platform (avatar): tablets
Children with T1DM
N=49
Participants aged 7-14 years
Qualitative study—prototype design and usability evaluation using an iterative, incremental development process
Location: Italy and the Netherlands
Unsatisfactory reliability, usability, and goal structure of the PAL system, which hindered the acceptance and trust of health care professionals
Improvement in children’s diabetes knowledge when using the PAL system and increased motivation in performing diabetes-related activities due to enjoyable interaction with the PAL robot and avatar
Improvement suggestions: personalization enhancement required to establish patients’ adherence
Innovative PAL functions (eg, experience-sharing function) were identified and tested with positive results
Reduced bonding effect with higher perceived similarity between the robot and its digital avatar
Most children stopped using the avatar (MyPAL app) some weeks after the study started
Small sample size may affect the findings’ generalizability
Limited study duration may not capture the PAL system’s long-term effects on T1DM management
Comparison between physical robot and avatar function may introduce biases in user preference assessment
Insufficient detailed information on AI algorithms or ML techniques used restricts insights into the PAL system’s technical aspects
Easton et al [], 2019
Self-management for people with comorbid long-term conditions and mental health problems
Duration: not identified
AI chatbot with an animated human-like 2D avatar (Avachat)
Platform: computers, tablets, smartphones, and televisions
Older adults with comorbid long-term conditions and mental health problems
N=10
Participants aged ≥55 years (56-86 years)
Co-design and acceptability testing involving stakeholders—qualitative study with snowball sampling and workshops for initial user requirement gathering of prototype and user feedback
Location: United Kingdom
Acceptability and feasibility:
Patients found Avachat to be helpful, informative, and easy to use
Health care professionals are optimistic about the avatar’s potential to improve patient outcomes and reduce the health care burden
Improvement suggestions: enhance personalization of contents and ensure accessibility for users with visual and hearing impairments
The small, homogeneous sample of recruited participants (White, British, medically stable, and regional) may limit the findings
No current mental health problems reported by participants, although past instances of low mood or worry were mentioned
Chaix et al [], 2019
To provide personalized patient education for improved quality of life and medication adherence in patients with breast cancer
Duration: 1 year
Commercial CA—AI chatbot (Vik)
Platform: mobile devices or smartphones (iOS or Android) and web browser via Messenger app
Patients with breast cancer and their relatives
N=4737
Participants’ age not identified
Quantitative study using user-chatbot conversational data
Location: not identified
Users’ interactivity level: average of 132,970 messages exchanged per month, with 147 total average interactions per question (for open-ended questions), resulting in 2.7 interactions per person per question
Overall user satisfaction rate: 93.9% (900/958), and 88% (843/958) found Vik helpful in following treatment effectively
Average medication adherence rate of patients improved by >20% (P=.04) after using Vik for 4 weeks
Absence of a control group for comparison
Lack of long-term evaluation of the chatbot’s impact on clinical outcomes
Reliance on users’ self-reported data, introducing potential bias
No investigation of the viewpoints of health care providers or other stakeholders in patient care
Potential technical issues or limitations of the chatbot not addressed
Stephens et al [], 2019
Behavioral intervention for constant management of obesity and prediabetes symptoms
Duration: 6 months
Commercial CA—AI behavioral coaching chatbot (Tess)
Platform: mobile devices or smartphones via messaging apps (eg, WhatsApp and Messenger); Google Home or Amazon Alexa (for voice conversations)
Adolescent pediatric patients with obesity and prediabetes symptoms
N=23
Participants aged 9-19 years
Feasibility study—mixed methods approach with qualitative interviews, usability testing, and quantitative surveys (SUS questionnaire)
Location: United States
Positive progress toward goals was reported by the participants 81% of the time
4123 messages were exchanged, with patients rating the chatbot’s usefulness 96% of the time, indicating its high perceived usefulness and feasibility among adolescents
Small sample size, causing limited generalizability
Gradual program adjustments may introduce variability and potential inconsistency
Lack of an experimental design to control for factors
Inability to ensure detection of a treatment effect
Balsa et al [], 2020
Behavioral intervention for self-management of T2DM
Duration: 8-10 days
Animated human-like avatar (Vitória) in a 3D environment (rule-based AI)
Platform: mobile devices or smartphones
Older adults with T2DM
N=20
Participants aged ≥65 years (67-80 years)
Usability study—qualitative assessment
Location: Portugal
Usability evaluation for T2DM medication adherence and lifestyle improvement
Usability (SUS) score: 73.75 (SD 13.31), indicating high usability of Vitória
Small sample size and skewed participant sample toward expert technology users
Reliance on previous experience and available resources for sample size estimation
Criticism of questionnaires for yielding only overall measures without addressing specific concerns
Potential bias in field-testing compared to laboratory setting for usability tests
Gong et al [], 2020
Behavioral intervention with personalized support and motivational coaching for remote self-management of T2DM
Duration: 12 months
Human-like (text+voice enabled) avatar (Laura; BCT-based prescripted AI)
Platform: mobile devices or smartphones
Adults with T2DM
N=187
Participants aged ≥18 years (mean 57, SD 10 years)
Pretest-posttest (quantitative) observational study—2-arm, open-label RCT for adoption, usefulness, and effectiveness evaluation
Location: Australia
Program adoption and use:
Number of valid chats with Laura completed per person: 18.4 (SD 15.0; range 1-53)
Total duration of valid chats per person: 242.7 (SD 212.3; range 0-1050) minutes
Number of glucose level uploads per person: 181.8 (SD 192.1; range 1-966)
Number of clinical alerts: total=297; average per month=13.7 (SD 8.8)
Number of technical alerts: total=179; average per month=8.3 (SD 6.5)
Number of posts on the web-based discussion forum: total=19; average per month=1.1
Program effectiveness (in terms of coprimary and secondary outcomes)—coprimary outcomes:
Statistically significant between-arm difference at 12 months in the mean change in HRQoL (AQoL-8D utility value: 0.04, 95% CI 0.00-0.07; P=.04)
Reduction in HbA1c levels during the trial but no statistically significant between-arm difference at 6 months (0.06, 95% CI −0.35 to 0.47; P=.78) or 12 months (−0.04, 95% CI −0.45 to 0.36; P=.84)
Significant improvement in HRQoL from baseline to 12 months (mean estimated change in AQoL-8D score: 0.04, 95% CI 0.01-0.06; P=.007)
Increase in the score of the physical health and mental health subscales compared with baseline
Significant between-arm difference in the mean change in the HADS anxiety score at 6 months (−0.89, 95% CI −1.74 to −0.04; P=.04) but not at 12 months or for other secondary outcomes reported
Small sample size may restrict the generalizability of the findings, and subgroup analyses require cautious interpretation
Absence of blinding of participants and their GPs to the study arm allocation might lead to potential self-report bias and Hawthorne effects
Control arm participants showed a higher rate of completed assessments, possibly due to their interest in program access or higher attrition in the intervention arm
Subgroup analyses were underpowered, with multiple testing increasing the risk of false positives
Issom et al [], 2020
Health behavioral intervention for self-management of SCD
Duration: not identified
AI chatbot
Platform: mobile devices or smartphones
People with SCD
N=19
Participants’ age not specified
Preliminary feasibility study—quantitative posttest survey
Location: France
88% of participants rated the statement “The chatbot contains all the information I need” at least 3/4, for a total score of 54/68
58% rated the statement “The chatbot encouraged me to be more active in order to improve my condition” at least 3/4, for a total score of 51/68
Results indicate high perceived usefulness of the chatbot in promoting knowledge and motivation for improved self-care practices
Small sample size: only 17 participants completed the evaluation, potentially limiting generalizability
2 withdrawals due to smartphone issues may have impacted data collection and results
Reliance on self-reported data in the posttest survey could introduce bias or inaccuracies
Limited scope: evaluation focused solely on perceived usefulness of the chatbot, lacking assessment of long-term impacts on self-care or health outcomes
Lack of comparison with other support forms or interventions hinders assessment of relative effectiveness
Anastasiadou et al [], 2020
To offer continuous health education and interaction in English, Spanish, and Bulgarian for self-management of diabetes
Duration: 6 months
Multilingual AI-ML chatbot (EVA)
Platform: mobile devices or smartphones
People diagnosed with diabetes
N=not identified
Participants’ age not identified
Qualitative pilot study—validation and acceptance evaluation by integrating EVA into an mHealth app (CHRODIS PLUS Joint Action) to collect data on user queries and responses provided based on the educational content
Location: Greece
Users sent a total of 940 unique messages to EVA
Users’ common questions related to diabetes varied based on EVA’s language:
English: self-measurement and understanding what diabetes is
Spanish: glucose self-monitoring
Bulgarian: insulin and high blood pressure
A comprehensive analysis of the effectiveness of EVA in improving diabetes management or patient outcomes was not provided
Lack of detailed user demographic information may affect generalizability
Study duration limited to 6 months, possibly overlooking long-term user interactions and feedback
Absence of specified measures to address potential biases in user interactions or data collection
No discussion of potential technical limitations and challenges during EVA system development and testing
Roca et al [], 2021
Medication adherence intervention for improved self-management of T2DM and depressive disorder
Duration: 9 months
AI-ML–based chatbot
Platform: mobile devices or smartphones
People with comorbid T2DM and depressive disorder
N=13
Participants aged ≥18 years
Quantitative pilot trial—pretest-posttest observational study (1 arm)
Location: Spain
Significant improvement in the average HbA1c level, from 7.6 (SD 0.7) to 7.3 (SD 0.8), and in PHQ-9 scores, from 13.2 (SD 6.0) to 8.6 (SD 3.6)
Reduction in the number of in-person medical consultations per month from 2.0 (SD 2.6) to 1.3 (SD 1.5) in 30.8% of the patients
Health care professionals participating in the study found the chatbot useful in improving medication adherence
Participants who used the chatbot daily found it useful in fulfilling their medication reminder needs
The chatbot’s language and vocabulary were appropriate and easy to understand
38% of participants had difficulty learning to use the chatbot, and 15.4% reported the chatbot’s inability to understand the users’ instructions
Almost 70% of the patients (9/13) expressed willingness to continue using the chatbot after the study
Small sample size (13 participants) may limit generalizability of the study findings
Some patients did not update medication information after the initial configuration, leading to reminders being stopped
Patients require digital literacy or assistance for configuring the messaging platform
Reminder sounds may be accidentally disabled by patients, affecting the use of the chatbot
Yao et al [], 2021
Remote monitoring of patients after stroke and automation of suspected stroke screening
Duration: not identified
Hyperrealistic human-like 3D avatar (iLAMA; ML based with computer vision algorithms)
Platform: mobile devices or smartphones and tablet
Older adult patients after stroke
N=140 (videos)
Participants aged ≥60 years
Quantitative beta testing of the prototype on 140 videos with volunteers via email
Location: not identified
The app was able to recognize body parts and extract 68 facial landmarks from the facial videos
The app could provide accurate stroke screening results for neurologists and stroke specialists to review
No major technical issues or problems with the app during beta testing were reported
iLAMA has the potential to improve stroke assessment and care, particularly in areas with limited access to stroke specialists
Participants were regular volunteers (friends and family) rather than patients after stroke
Beta testing involved a relatively small sample size of 140 videos
The app may struggle to differentiate minor mistakes from actual stroke signs, especially without supervision
Screening process takes approximately 5 minutes, which some users may find burdensome
Krishnakumar et al [], 2021
Lifestyle intervention and self-monitoring of diet, exercise, weight, and blood glucose for T2DM management
Duration: 16 weeks
A
