The etiopathological underpinnings of psychosis, defined as a pathological brain state (1), as well as psychotic disorders as diagnostic constructs, still elude us after more than 50 years of research (2). Apart from their inherent heterogeneity regarding clinical manifestations, it is challenging to demystify the causality of psychotic disorders due to their seemingly random onset, chronic course and recurrent nature, which leaves a lasting and progressive impact on patient functioning. Crucially, over 80% of individuals with psychotic disorders will experience relapses (3), or transitions to a state of psychosis. The current best approach to prevent them is via continuation of antipsychotic and/or mood stabilizing treatment for years (4), which then exposes patients to a variety of serious medication side effects. It is established in the literature that this leads to issues regarding compliance (5), while treatment non-adherence has been shown to be the single most significant predictor of relapse (6). Furthermore, even among those who follow treatment, there is still a 20-30% chance of symptom recurrence after First-Episode Psychosis (FEP) (7). From a clinical perspective, early identification of psychotic relapse would be of vital importance since the clinician would then be able to stop the vicious cycle of symptom recurrence after treatment discontinuation.
Nevertheless, an overwhelming percentage of studies in the field of psychotic disorders revolve around distinguishing between patients and healthy controls. The most prominent study design is a cross-sectional comparison of a potentially implicated etiological factor between patients and healthy controls, or between patients with different diagnoses. While this approach has unraveled several risk factors for psychotic disorders, it does not allow for predictions regarding the course of the disease for individual patients, and therefore provides limited clinical benefit. Studies involving relapse, on the other hand, could shed light on possible determinants of disease course, but are significantly harder to design and perform. Given the chronicity and random trajectory of the phenomenon, these studies must be prospective, and measurements or monitoring should ideally be close to continuous, so that data are available at or around the time of relapse and can be compared with data originating from periods of remission. Additionally, the amount of data that must be amassed for a sufficient number of relapse events to be recorded is massive, given their relative sparsity [an epidemiological study (8) measured 751 events in 3980 participant-years]. Analyzing such a long-term phenomenon entails diligent patient monitoring for years. Another caveat that needs to be accounted for is that treatment adherence cannot be controlled in such studies. The lack of a scalable way to confirm medication status (9) at relapse introduces confounding factors that could hinder result reliability.
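To make the sparsity argument concrete, the short sketch below converts the figures quoted from (8) into a crude relapse rate and estimates the follow-up time needed to observe a given number of events. It assumes a constant rate across participants and over time, which is a simplification, and the target of 100 events is a hypothetical value chosen only for illustration.

```python
# Figures quoted in the text from the epidemiological study (8).
events = 751
participant_years = 3980

# Crude incidence rate, assuming relapses occur at a constant rate
# over the whole follow-up (a simplifying assumption).
rate = events / participant_years  # relapses per participant-year


def participant_years_needed(target_events: int, rate_per_year: float) -> float:
    """Follow-up time needed to expect `target_events` relapses at a constant rate."""
    return target_events / rate_per_year


# Hypothetical target, used only to illustrate the scale of follow-up required.
needed = participant_years_needed(100, rate)

print(f"crude rate: {rate:.3f} relapses per participant-year")
print(f"participant-years needed for 100 expected relapses: {needed:.0f}")
```

Under these assumptions, a study would have to accumulate several hundred participant-years of monitoring before even a modest number of relapse events could be expected.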
Despite these obstacles, there have been attempts to map the course of psychotic disorders and identify potential risk factors, or predictors, for relapse. The vast majority of these efforts involve models with solely clinical variables as predictors and include no biological factors (or biomarkers) [see (10) for a meta-review]. Treatment non-adherence and premorbid functioning have been isolated as the most significant predictors of relapse. However, two distinct issues arise when developing exclusively clinical prediction models. Firstly, little to no new insight is gained regarding pathophysiology, thus no progress can be achieved regarding intervention effectiveness and new medications. Secondly, clinical models provide no information regarding the exact temporal onset of a relapse occurrence, which would allow for early intervention. Clinical variables such as family history, alcohol consumption, or drug abuse are represented as binary variables measured at one instance in time (cross-sectional design). Other variables, such as premorbid functioning, do not change over time at all. Yet the course of psychotic disorders is dynamic in time, characterized by psychotic episodes followed by remission phases. Moreover, given the relatively sudden onset of symptom recurrence, it would be reasonable to assume that some biological change happens on short time scales. To capture it, one would have to monitor some biological factors, or biomarkers, at a sufficiently high frequency. The term biomarker is defined as “a characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes, or pharmacologic responses to a therapeutic intervention” (11), and it could refer to anything from the serum concentration of a specific hormone to the time elapsed when a human responds to some stimulus (reaction time).
Notably, biomarkers evolve on various time scales, ranging from milliseconds to hours or even days, in stark contrast to clinical markers. Heart rate, for example, has been shown to exhibit variability on very short time scales [in the 0.15 to 0.4 Hz frequency range (12)], but also fluctuates diurnally, especially between night and daytime (13). To conclude, clinical variables seem unsuitable for predicting the temporal onset of relapse, whereas the same cannot be said for biomarkers (14), whose short-term alteration could correlate with symptom reignition.
It becomes apparent that the utilization of biomarkers in predictive models for psychotic relapse, either exclusively or in conjunction with clinical parameters, could provide the missing piece to solve the conundrum at hand. In this systematic review, we consolidate and present the findings of all studies using genetic, blood-based, neuroimaging, cognitive and behavioral biomarkers as predictors of psychotic relapse. We also cover a distinct category of studies in which data are continuously accrued via wearable devices or smartphones. Data from these studies include accelerometer or heart rate measurements, which are commonly used biomarkers, but also information regarding geolocation, text messages, duration of phone calls, or screen activity, which we consider proxies of behavior. The detailed inclusion criteria are reported in the methods section, but to outline the process, we included studies that longitudinally monitored biomarker levels and clinically evaluated patients to identify relapse. Biomarker levels were either measured continuously, or at two or more distinct time points (usually with one corresponding to a period of relative health and one corresponding to relapse). We included cross-sectional studies if and only if the entire sample consisted of patients experiencing symptom recurrence, and not first-episode psychosis. The objective of the present review is to delineate the progress that has been accomplished so far regarding relapse prediction via biomarker monitoring, but also to underline potential methodological caveats.
2 Methods
2.1 Main outcome
The main outcome of this study is psychotic relapse, which is defined clinically, and refers to the occurrence of a non-initial psychotic episode after a period of symptom remission. We largely base our definition of remission on Andreasen’s criteria (20), where the authors propose a clinical framework for defining remission in SCZ based on score thresholds in the Positive and Negative Syndrome Scale (PANSS), for items such as P1 (Delusions), P2 (Conceptual Disorganization), P3 (Hallucinatory behavior), and G9 (Unusual thought content), as well as in the Brief Psychiatric Rating Scale (BPRS), regarding items 8 (Grandiosity), 11 (Suspiciousness), 12 (Hallucinatory behavior) and others. Andreasen et al. suggest that these scores must remain at below-threshold levels for 6 months for remission to be defined, but we impose a lower bound of 1 month in the present review. Moreover, we deemed that if patients were discharged from the hospital after a clinical evaluation, it is implied that they entered a period of potential remission, even if the actual scores of the evaluations were not reported. We only excluded studies that treated rehospitalizations as adverse outcomes, with no mention of SCZ diagnosis or psychotic symptomatology as the reason for readmission to the hospital. The relative leniency of these criteria is due to the objective of this study, which is to bring the findings of biomarker research to the forefront. Given the state of the field, we do not believe it is yet appropriate to formulate standardized guidelines that would be directly applied in clinical practice.
2.2 Study design overview
This systematic review was conducted in alignment with the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines [(15, 16), see Supplementary Materials]. The selection of articles broadly consisted of three phases, which started in February of 2024 and were concluded in April of the same year. During the first step of the process, a list of relevant keywords was identified and then run against records in the PubMed and Scopus databases. We also hand-checked citations of all retrieved papers, obtaining no new, unique records. Search results were then screened (title and abstract initially, then full papers) based on a set of inclusion and exclusion criteria tailored to the PICOS/PECOS worksheet (15, 16). The final step included a categorization of studies into subgroups according to the nature of the examined biomarkers, namely genetic, blood-based, neuroimaging, cognitive, behavioral and related to wearable devices and smartphones. Studies were then meticulously analyzed, and relevant information was distilled in the form of tables.
2.3 Selection and analysis procedure
To begin with, the PubMed and Scopus databases were searched based on a list of predetermined keywords (exact search queries depicted in Table 1), with no temporal restrictions and no other applied filter.
Table 1. Keyword combinations used in queries in both PubMed and Scopus, alongside the number of results produced in the initial search.
After removing duplicate records, two reviewers (AS and CT) independently scanned the titles and abstracts of the retrieved studies and excluded those that were irrelevant to the research question. Any discrepancies were addressed by a third independent reviewer (PF or NS). The main reason for exclusions at this stage of the process was that in search queries using the keyword “prediction”, which yielded more results, the utilization of biomarkers instead of clinical variables as predictors was relatively rare. We then defined (see Table 2 below) and applied the PICOS/PECOS worksheet criteria to a pool of 121 full papers, ultimately selecting a total of 42 studies for inclusion.
Table 2. Inclusion and exclusion criteria designed according to the blueprint of the Population, Intervention or Exposure, Comparison, Outcomes, and Study Design (PICOS/PECOS) worksheet.
Since the goal of the present review is to assess available means for relapse prediction via biomarker monitoring, we only included original studies using biological or behavioral factors and involving relapsed patients. The notion of prediction implies monitoring the evolution of some phenomenon in time and drawing conclusions regarding occurrence rate and time of onset relative to some fixed time point. This would suggest that only longitudinal studies should be included. However, we also include cross-sectional studies where patients experiencing relapse are compared to FEP patients or healthy controls. While this design introduces a plethora of confounding factors (for instance, FEP patients are usually drug-naïve, whereas relapsed patients have taken or are still taking antipsychotic medication), it is still meaningful to consider such studies, since for certain categories, such as blood-based biomarkers, continuous and even sequential monitoring presents extreme practical difficulties. Regarding study outcomes, we only included studies where the main objective was either to predict relapse in a temporal sense, or at least to shape an a priori relative-risk profile. Studies related to diagnosis or treatment effects were excluded.
The 42 studies that were finally selected were thoroughly analyzed, and the following information was extracted: authors and country of origin, study design (longitudinal or cross-sectional), sample size and diagnosis for patient groups, data collection process, examined biomarkers, analysis tools, main objectives related to relapse, statistical results and synthesis of main findings. Data extraction was initially performed by AS, and was independently validated by CT, PF, and NS. No protocol was registered beforehand for this review. Excel files and data used in this systematic review are available upon request.
2.4 Quality assessment of included studies
The Appraisal tool for Cross-Sectional Studies [AXIS (17), Table 3] was used to assess risk of bias in the included studies. It encompasses questions related to study objective and design, sample size justification, sample selection process and reasons for exclusion of participants, internal validity, presentation and replicability of statistical tools and their corresponding results, as well as limitations, funding sources or conflicting interests. Note that although the AXIS tool is in principle tailored to cross-sectional studies, almost all questions can apply to a wider range of study designs. Reviewers had the same assigned roles as in the data extraction procedure.
Table 3. Quality assessment of included studies via the appraisal tool for cross-sectional studies (AXIS).
3 Results
Figure 1 depicts a schematic representation of the selection process, which starts with a series of queries in the PubMed and Scopus databases yielding a total of 1891 results. Given the conceptual closeness of the various keyword combinations, duplicate records were expected, and after their removal a list of 808 unique papers was compiled. Of these 808 papers, 687 were excluded based on title and abstract, on grounds of non-relevance to the research question. After application of the defined PICOS/PECOS criteria to the remaining 121 papers, a total of 42 were finally selected and analyzed. Of the 79 papers excluded based on PICOS/PECOS, 19 incorporated models using predominantly clinical parameters and not biomarkers, 38 were focused on distinguishing patients from controls, 12 made comparisons between groups receiving different treatment, 8 examined illness course via symptom scales such as PANSS without identifying and classifying relapse events, 1 (18) sought to predict relapse, but no relapse instances occurred due to limited study duration, and 1 (19) revolved around the transition of patients from high-risk to psychosis.
Figure 1. Flowchart of the selection process of the final 42 papers analyzed in the review.
The final 42 studies were grouped based on the nature of the examined biomarkers. This categorization consisted of:
1. the genetic biomarker subgroup (n = 4, or 9%) summarized in Table 4,
2. the blood-based biomarker subgroup (n = 15, or 36%) summarized in Table 5,
3. the neuroimaging biomarker subgroup (n = 10, or 24%) including studies in structural Magnetic Resonance Imaging (MRI), functional MRI, electroencephalogram (EEG) and Positron Emission Tomography, summarized in Table 6,
4. the cognitive-behavioral biomarker subgroup (n = 5, or 12%) including markers used to assess performance in cognitive domains such as memory, attention, perception and executive function, as well as behavior markers based on internet search history and Facebook posting habits, summarized in Table 7,
5. and the wearables biomarker subgroup (n = 8, or 19%), which encompasses studies that apply machine learning models on passively collected data from wearable devices such as smartwatches, but also from smartphones, to identify sudden pattern breaks constituting the signature of impending relapse, summarized in Table 8.
Table 4. Presentation of analysis results for papers using genetic biomarkers.
Table 5. Presentation of synthesized findings from papers using non-genetic, blood-based biomarkers.
Table 6. Main points regarding sample, methodology, analysis, and results for papers examining neuroimaging/neurophysiology biomarkers.
Table 7. Synthesized findings from studies using cognitive/behavioral markers as predictors of relapse.
Table 8. Presentation of samples, data collection process, analysis tools and key findings from studies using passively collected data from wearable devices and smartphones.
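The counts reported in this section can be cross-checked with a few lines of arithmetic. The sketch below is only a bookkeeping aid: all numbers are taken from the text, while the variable names and the short labels used for the exclusion reasons are our own shorthand.

```python
# Counts reported in the Results text and in Figure 1.
records_identified = 1891
unique_records = 808
excluded_title_abstract = 687
full_text_assessed = 121
included = 42

# Reasons for exclusion at the full-text (PICOS/PECOS) stage.
full_text_exclusions = {
    "predominantly clinical predictors": 19,
    "patients versus controls": 38,
    "treatment group comparisons": 12,
    "symptom scales without relapse events": 8,
    "no relapse occurred during study": 1,
    "high-risk to psychosis transition": 1,
}

# Sizes of the five biomarker subgroups.
subgroups = {
    "genetic": 4,
    "blood-based": 15,
    "neuroimaging": 10,
    "cognitive/behavioral": 5,
    "wearables": 8,
}


def flow_is_consistent() -> bool:
    """Check that each stage of the selection flow follows from the previous one."""
    return (
        unique_records - excluded_title_abstract == full_text_assessed
        and full_text_assessed - sum(full_text_exclusions.values()) == included
        and sum(subgroups.values()) == included
    )


duplicates_removed = records_identified - unique_records
shares = {name: 100 * n / included for name, n in subgroups.items()}

print("flow consistent:", flow_is_consistent())
print("duplicates removed:", duplicates_removed)
for name, share in shares.items():
    print(f"{name}: {share:.1f}%")
```

The subgroup shares printed here are unrounded, so they may differ by a fraction of a percentage point from the whole-number percentages quoted in the list above.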
It should be noted that in many studies relapse prediction constitutes only one of the pursued goals. Here we focus solely on this axis, disregarding unrelated findings, for example any group differences between patients and healthy individuals, which relate to diagnosis and not relapse.
3.1 Genetic biomarker subgroup
Table 4 presents synthesized findings from the 4 (9%) studies using genetic biomarkers.
Meier et al. (21) analyzed approximately 3000 genomic risk profiles (GPRS), each of which is calculated using alleles at thousands of different loci. The study examined correlations between the GPRS and admission frequency, specifically for SCZ relapse. The authors reported that, when a high number of single-nucleotide polymorphisms was included in the GPRS, the score correlated significantly with the number of admissions. Segura et al. (24) calculated the polygenic risk score (PRS) [a metric similar to the GPRS in (21)], but found no significant differences in the PRS between SCZ patients who relapsed in a span of three years and those who did not. Pawelczyk et al. (22) calculated the length of telomeres (regions at the end of chromosomes), whose faster than normal decay has been linked to a plethora of non-psychiatric disorders. Their results indicate that shorter telomere length correlates with a higher number of psychotic episodes. Gasso et al. (23) implemented a prospective design and collected the total RNA (337,000 transcripts and variants corresponding to 20,800 genes) twice, at baseline and after 3 years or at relapse if it occurred, in 91 patients with SCZ. They created gene connectivity networks, or clusters, consisting of 41 to 5627 different genes. This approach captures co-expression of certain genes that form larger coalitions, and via preservation analysis the stability of these networks in time can be assessed. The authors identified two main networks that were semi-conserved after 3 years, but their ability to predict relapse did not reach significance. Patients were then split based on gene expression of the most stable network, with those at the highest percentiles showing higher relapse risk.
3.2 Blood-based biomarker subgroup
In this next category we review the 15 (36%) studies using non-genetic biomarkers derived from blood samples. Table 5 showcases aggregated findings from our analysis.
In (30, 34, 39), prospective designs (2-3 years; 221, 69, and 105 SCZ patients, respectively) are used to assess the relation of BDNF and NGF with psychotic relapse. In Pillay et al. (30), Martínez-Pinteño et al. (34), and Isayeva et al. (39), neither BDNF nor NGF predicted relapse, either in ROC curve analyses or in linear effects models. Borovcanin et al. (27), Luo et al. (33), and Miller et al. (37) assessed the predictive value of various cytokines, such as IL-2, IL-4, IL-6, IL-8, IL-23, TNF-α and others (see Table 5). Borovcanin et al. (27) implemented a cross-sectional design to compare IL-23 levels between 78 FEP and 47 relapsed patients. No significant differences in IL-23 levels between FEP and relapsed patients were reported. Luo et al. (33) collected blood samples at admission and discharge from 68 patients and compared the levels of IL-6, IL-18, IFN-γ, and TNF-α. Paired t-tests revealed significant differences only in IL-6 levels. Miller et al. (37) collected blood samples with a maximum frequency of once every 3 weeks for up to 30 months in 200 patients (70 relapses during the study). Group comparisons between relapsed and non-relapsed patients yielded no significant results. However, comparisons between pre- and post-relapse cytokine levels within the relapse group revealed significant decreases in IL-6 and IFN-γ values. Ozdin and Boke (32) retrospectively compared ratios involving white blood cells, platelets, neutrophils, monocytes, and lymphocytes in 105 patients, at admission and at discharge from the hospital. Multiple markers indicative of inflammation, most notably white blood cells and the monocyte-to-lymphocyte ratio, were found to be significantly elevated during relapse compared to remission. Piotrowski et al. (31) used a cross-sectional design comparing 42 FEP patients with 25 relapsed patients. They used the allostatic index (AI), which encompasses a variety of biomarkers, namely systolic and diastolic blood pressure, BMI, waist-to-hip ratio, high-sensitivity CRP, fibrinogen, albumin, fasting glucose and insulin, total cholesterol, LDL, HDL, triglycerides, cortisol and DHEA-S. ANCOVA revealed significant differences between relapsed patients and FEP patients, with blood pressure and waist-to-hip ratio being the most important individual contributors. Marques and Ouakinin (35) prospectively followed a sample of 60 patients, measuring unconjugated bilirubin (UCB) during relapse and remission. UCB differed significantly (ANOVA) between relapse and remission. Fabrazzo et al. (36) performed a cross-sectional comparison between 74 acutely relapsed patients and 78 stable outpatients, measuring Vitamin D and Parathyroid Hormone (PTH). Both Vitamin D and PTH were found to be significantly lower in patients experiencing relapse. Morera-Fumero et al. (28) collected blood samples from 43 patients at admission and discharge and calculated the Total Antioxidant Capacity (TAC), which refers to the antioxidant capacity of water-soluble molecules such as albumin, caeruloplasmin and other proteins. TAC values did not differ significantly between relapse and remission. The final four studies, namely (25, 26, 29, 38), used biochemistry markers. Kaddurah-Daouk et al. (25) compared the plasma lipid profiles (5 phospholipids) of 20 patients experiencing relapse with those of 20 FEP patients. Wilcoxon rank sum tests revealed no significant differences between FEP and relapsed patients. Schwartz et al. (26) used multiplex immunoassays to obtain a panel of 191 proteins and small molecules in 77 patients, who were prospectively evaluated clinically (18 relapsed, 59 did not). Wilcoxon rank sum tests were applied to test for group differences, while random forest analysis was used for classification based on baseline biomarker values. Significant group differences were present in 27 molecules. Random forest analysis predicted the time to relapse (as a binary classification problem, i.e. short- vs long-term relapse) with 94.5% accuracy. Weight change alone predicted the same result with an accuracy of 83.4%. Szymona et al. (29) compared the levels of certain kynurenines (such as Kynurenic Acid and 3-Hydroxykynurenine) in blood samples of 51 patients between relapse and remission. Mann-Whitney U tests did not reveal significant differences within the patient group at the two time points. Lin et al. (38) used weighted correlation networks to identify clusters of metabolites (Liquid Chromatography-Mass Spectrometry metabolomics) that could differ between 34 first-episode and 30 recurrent patients. Weighted correlation network analysis isolated a cluster of 317 metabolites correlating with status (FEP, recurrent patient), with phenylalanylphenylalanine being the single most influential predictor.
3.3 Neuroimaging/neurophysiology biomarker subgroup
We proceed with the findings of neuroimaging and neurophysiology studies, including structural Magnetic Resonance Imaging (MRI), functional MRI, electroencephalogram (EEG) and Positron Emission Tomography. Our search yielded 10 (24%) such papers, briefly summarized in Table 6.
In (40, 41, 43, 46, 47), metrics from baseline structural MRI scans are used to predict relapse. Only in (40) were repeat MRI scans performed (every 18 months). Liebermann et al. (40) performed structural MRI scans every 18 months on 107 patients (with 51 remaining in the study after a year). The first and last MRI scan of every individual were used in a mixed model multivariate analysis of covariance, which showed significant ventricular enlargement in chronic patients when compared to those who maintained remission. De Castro-Manglano et al. (41) used structural MRI scans to compute Grey Matter (GM) volume, which was then used for comparisons between patients with good versus poor outcome. Two-sample t-tests showed that GM volume was significantly lower, most notably in the right hippocampus, in patients experiencing poor outcomes. Nieuwenhuis et al. (43) performed a baseline MRI scan and clinically followed up 212 patients for 3-7 years. They constructed a probability map for grey matter values in approximately 170,000 voxels and applied a support vector machine classifier to test the feasibility of illness course classification based on baseline MRI data. Classification accuracy did not significantly differ from chance level when using aggregated data from 7 research centers. Solanes et al. (46) performed baseline MRI scans and clinically monitored 277 patients for up to one year, aiming to develop a risk stratification framework based on MRI and clinical data. While only 16 relapse incidents were reported in the study, the authors report that Cox regression resulted in a hazard ratio of 4.58 for the high-risk group, albeit in the combined clinical and biomarker model. The sole use of biomarker data did not result in accurate high- versus low-risk classification. Sasabayashi et al. (47) calculated the Local Gyrification Index (LGI) from the MRI scans of 52 patients, 19 of whom experienced relapse during the 3-year follow-up period. The LGI is a measure of cortical folding (i.e. area of peaks and troughs). Significant differences between relapsed and non-relapsed patients were reported in 3 of 800 regions of interest, providing some evidence that higher LGI, associated with neurodevelopmental anomalies, could play a role in relapse.
In (42, 48, 49), the authors utilize data from functional MRI scans. Yamadar et al. (42) implemented a cross-sectional design with 13 FEP and 27 acutely relapsed patients, who performed the Semantic Association Retrieval Task (SORT) while undergoing a functional MRI scan. The main region of interest was the inferior parietal lobule. Behavioral results were mixed (no differences in accuracy but significantly faster reaction times for the FEP group compared to the relapse group), while fMRI results showed no differences in inferior parietal lobule activation levels. Rubio et al. (48) obtained 20-minute resting-state fMRI scans of 50 acutely relapsed patients, further split into those who relapsed despite taking antipsychotic medication (n = 23) and those not receiving treatment (n = 27). After calculating the Striatal Connectivity Index (SCI), linear regression models were applied to test for differences between the two patient groups, as well as between all relapsed patients and healthy controls. SCI was significantly lower in patients, with the difference being more prominent in individuals experiencing relapse despite treatment. Odkhuu et al. (49) gathered 5 minutes of resting-state fMRI in 30 patients, split into those who relapsed and those who did not. The Global Functional Connectivity Strength (GFC) was found to differ significantly between relapsed and non-relapsed patients via one-way ANOVA.
Kim et al. (44) gathered two PET scans from 25 patients, before and after medication discontinuation. Linear mixed effects models showed significant group by time differences in the influx rate constant related to the cerebellum (Ki[cer]), between patients who relapsed and those who did not. Within the relapse group, baseline Ki[cer] correlated negatively with time elapsed until relapse.
Mi et al. (45) collected EEG data from 32 patients during the vowel and consonant Mismatch Negativity task (see table above for more details). Pearson correlations were significant for 6 out of 9 electrodes in the vowel MMN paradigm (0 of 9 for consonant). A reduced amplitude of the ERP correlated with more hospitalizations, higher medication dosages and suicidal ideation.
3.4 Cognitive/behavioral/internet activity biomarker subgroup
Another dimension of symptoms in psychotic disorders lies in the cognitive domain, with deficits not explained by positive or negative symptoms being present in most patients. Markers used to assess cognitive performance include output (i.e. performance) metrics in memory, attention, perception, executive function, and various other tasks. Two further studies focus on internet search history and Facebook posting habits, respectively, and are classified as behavioral. Our search yielded 5 (12%) papers related to cognition and behavior, briefly presented in Table 7.
Chen et al. (50) performed two cognitive assessments, including memory, executive function and lexical tasks, on 93 patients at admission and after stabilization. Using a multiple logistic regression model, the relative risk for the occurrence of relapse based on various cognitive factors was estimated. Only the perseverative error in the Wisconsin Card Sorting task yielded a significant odds ratio, of 2.46. Rund et al. (51) implemented a prospective design, with yearly cognitive evaluations in 207 patients (111 remained after the first year), on five basic cognitive indices, namely Working Memory (WM), Executive Function (EF), Verbal Learning (VL), Impulsivity (Im) and Motor Speed (MS). WM and VL were found to correlate with relapses within the first year.
Tao et al. (54) conducted monthly cognitive assessments on 110 patients for one year, focusing on working memory. Of the 17 cognitive markers examined at baseline, and the 4 differences (i.e. value at baseline - value at visit x), only one, the deterioration of performance in the Letter-Number Sequence (LNS) paradigm, was significantly associated with relapse risk in a binary logistic regression model. Birnbaum et al. (52, 53) retrospectively analyzed Facebook posting habits and search activity in two separate studies, using data from 51 and 42 patients respectively. In the first (52), they assembled the entire spectrum of the patients' Facebook activity, which included messages, posts, likes, photographs, shares and comments. Words related to anger, swearing or death were used significantly more frequently during pre-relapse periods. The authors used Support Vector Machine (SVM) classifiers to label periods of data as relapse or relative health data. The best performing model achieved a sensitivity of 0.38 and a specificity of 0.71. In the second study, they focused on internet search activity, utilizing random forests, SVMs, and gradient boosting. The best performing model (gradient boosting) had a classification accuracy of 0.65 (AUC = 0.71), which significantly differed from chance level. The most important features revolved around search length and the use of words from categories such as “sexual”, “anger” or “sadness”.
3.5 Wearables subgroup
In this section we analyze results from an emergent area of research, which concerns the application of machine learning models on passively collected data from wearable devices such as smartwatches, but also from smartphones, to identify sudden pattern breaks constituting the signature of impending relapse. In Table 8, the findings of 8 (19%) such studies are illustrated.
In (55, 57, 59, 61) the CrossCheck data set is analyzed, consisting of 1 year of continuous data passively collected via smartphones (accelerometer, GPS location, speech frequency and duration, number of calls and others) from SCZ patients. Ben-Zeev et al. (55) published data from only 5 patients for demonstrative purposes. Notably, abrupt changes in GPS activity were found in one patient, who stopped spending time in their identified primary location (likely their home). The temporal onset of this sudden behavioral shift corresponded to the first signs of psychotic symptomatology. Adler et al. (57), Zhou et al. (59) and Lamichhane et al. (61) utilized various machine learning tools, such as encoder-decoder neural networks, partitioning around medoids, Gaussian mixture models, balanced random forests, and easy ensemble models, to classify hourly data as coming from a relapse or non-relapse period. All these approaches yielded prediction accuracies significantly different from chance level, but none of the values were high in absolute terms (F2 scores were calculated to assess model performance and the maximum score was 0.23 for one of the models used in (59), which can be interpreted as yielding medium predictive power). Barnett et al. (56) followed a very similar approach, using the Beiwe app to collect 15 mobility features (e.g. accelerometer or GPS location) and 16 sociability features (e.g. number of phone calls) from 15 patients for 3 months. The number of anomalies, i.e. extremely abnormal marker values, was compared between patients who relapsed and those who did not. While the anomaly rate was found to be 71% higher in relapsed patients, only 3 instances of relapse were recorded. Henson et al. (58) implemented a similar data acquisition procedure, recruiting 83 patients (63 provided sufficient data) for 3-12 months. Paired anomalies (i.e. anomalies occurring simultaneously in more than one feature) were found to distinguish between the state of relapse and non-relapse with a positive predictive value of 60% and a negative predictive value of 94%. Cohen et al. (62) used an anomaly detection approach similar to that in (58), collecting data from 76 patients, who used a smartphone with the mindLAMP app installed for 12 months. The anomaly rate was 2.12 times higher in the months preceding relapse. Of the identified anomalies, 13 (6.9%) were found to correspond to relapse. Zlatintsi et al. (60) present preliminary results (24 patients, monitored for up to 2 years) from the ePrevention study, which aimed to use heart rate, accelerometer, gyroscope and sleep data, passively collected via smartwatches, to predict relapse. Various autoencoder neural network architectures were utilized, with the best performing model being a Convolutional Neural Network model, which yielded a PR AUC of 0.76 (significantly different from 0.68 for random classification).
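As a reading aid for the metrics quoted in this section, the following minimal Python sketch shows how the F2 score and the positive and negative predictive values follow from confusion-matrix counts. The function names and the counts in the usage lines are illustrative assumptions; the counts were chosen only so that the ratios come out at 60% and 94%, and they are not taken from any of the reviewed studies.

```python
def f_beta(tp, fp, fn, beta=2.0):
    """F-beta score from confusion counts.

    With beta=2 (the F2 score), recall is weighted more heavily than
    precision, so missed relapses cost more than false alarms.
    """
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def ppv(tp, fp):
    """Positive predictive value: share of flagged periods that were relapses."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def npv(tn, fn):
    """Negative predictive value: share of unflagged periods that were relapse-free."""
    return tn / (tn + fn) if (tn + fn) else 0.0


# Hypothetical counts, NOT study data: 6 true positives, 4 false positives,
# 94 true negatives, 6 false negatives.
print(ppv(6, 4))                  # 0.6
print(npv(94, 6))                 # 0.94
print(round(f_beta(6, 4, 6), 3))  # 0.517
```

Because relapse events are rare, a high negative predictive value is comparatively easy to obtain, which is one reason the positive predictive value and recall-weighted scores such as F2 are the more informative figures here.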
4 Discussion
It has been proposed that understanding the root cause of psychosis as a state may constitute a more well-defined research objective than aiming for a grand unification of heterogeneous diagnostic constructs, such as SCZ and BP (63). Moreover, given the burden of recurrent psychotic episodes on individual patients, any progress related to relapse prediction would not only push the frontiers of current knowledge, but would also come with immense clinical benefits, allowing for early and thus more effective intervention (64). To develop a concrete strategy for individualized relapse prediction, one must first collectively assess results from diverse fields of research.
4.1 Genetic biomarker subgroup
Regarding genetic biomarkers, we found that 3 (21, 22, 24) of the 4 studies used traditional regression models to correlate a compiled score with the number of admissions in a cross-sectional design, while 1 (23) used a prospective design and the ROC curve to evaluate the predictive power of semi-conserved networks of co-expressed genes for the state of relapse. At first glance, the results of (24) seem to contradict those of (21), since in (21) the genetic risk profile correlates with relapse risk whereas in (24) it does not. However, it must be taken into consideration that in (21), although the sample size is larger, a lower threshold is set for inclusion of single nucleotide polymorphisms in the overall score, which could lead to interpretability or overfitting issues. In (22) the authors also report significant telomere shortening in relapsed individuals; however, the implication of telomeres in a wide variety of diseases may pose specificity concerns. Overall, although there is promise in genetic research related to relapse, it is not yet feasible to concretely predict disease trajectory based on genetic markers.
4.2 Blood-based biomarker subgroup
We now turn to the most thoroughly studied category of biomarkers, the diverse set of blood-based biomarkers, encompassing everything from inflammatory cytokines and hormones regulating metabolism to nerve growth factors. Interestingly, the growth factors BDNF and NGF, hypothesized to play a key role in SCZ pathogenesis (65), were conclusively ruled out as predictors of relapse in three separate prospective studies (30, 34, 39), providing possibly the most concrete, albeit negative, result covered in the present systematic review. In (25, 28, 29) the levels of various phospholipids, kynurenines and the Total Antioxidant Capacity are compared between relapsed and FEP patients at one point in time. All three studies yielded the same qualitative results: while the examined biomarkers differed significantly between patients and healthy controls, they did not differ between the relapse and FEP patient groups. Although the variation of these biomarkers in time was not assessed and thus their implication in relapse cannot be ruled out conclusively, the evidence points to them being potential trait, but not state, markers for relapse. In studies focusing on cytokines (27, 33, 37), only the positive correlation of Interleukin-6 (IL-6) with relapse is replicated (33, 37). In (37), Interferon-γ (IFN-γ) values before and after relapse were found to differ significantly within the group of patients who relapsed. Comparisons and results from (37) are representative of the complexity of the studied phenomenon. While direct comparisons between the relapse and non-relapse groups showed no differences in group means, longitudinal variation of IL-6 and IFN-γ was found to correlate with the temporal onset of psychotic episodes. It is noteworthy that baseline values were not predictive of relapse for either of the two cytokines.
Nevertheless, IL-6 in particular should be considered a valid candidate state marker, and it should be highlighted that it is also the most widely accepted blood-based trait marker for psychotic disorders (66, 67). Significant results are also reported in (32, 35, 36), examining white blood cells (and various ratios among subtypes), Unconjugated Bilirubin (UCB), and Vitamin D (and PTH) respectively. In the first two studies, differences in time were observed between relapse and remission, whereas in (36), relapsed patients were compared to FEP patients. Even though (32, 35) both report significant results, potential study limitations should not be overlooked. Firstly, specificity issues arise in both studies, since white blood cell and bilirubin levels may be abnormal in a wide variety of transient or chronic syndromes not related to psychiatry. Moreover, concerns related to absolute biomarker values should be considered. In (35), for instance, mean UCB levels at relapse and remission (0.38 ± 0.19 and 0.34 ± 0.16), although different from each other, lie well within the normal range of 0.2–0.8 mg/dL (68). Lastly, (26, 31, 38) measure multiple markers, which form panels consisting of numerous individual substances (26), a single combined index [Allostatic Load, (31)], or weighted networks (38). All three papers report significant results, with the most important predictors of relapse being phenylalanylphenylalanine, systemic blood pressure, leptin, proinsulin, betacellulin and transforming growth factor-α. However, the effects of another crucial confounding factor, antipsychotic treatment, should be considered, as is evident in (26). While their model consisting of 12 proteins and molecules predicted the relative time to relapse with 94.5% accuracy, a model using just BMI as a predictor achieved an accuracy of 83.4%.
This could be explained by the fact that included proteins such as leptin or insulin relate to metabolism, which is affected by antipsychotic treatment. Treatment adherence has already been identified as the single most significant clinical predictor of relapse. Therefore, it would not be unreasonable to hypothesize that individuals showing good compliance with medication gain more weight as a medication side effect and experience significantly fewer relapses. But perhaps the most substantial limitation of the blood-based relapse prediction approach pertains to the estimation of the temporal onset of psychotic relapse. It is highly impractical to monitor blood-based markers at a frequency sufficient to capture the onset of a phenomenon that is acute, yet may happen at any point in time over an extremely long period. To conclude, even though there are some promising results that warrant further investigation, it cannot yet be stated that relapse prediction is possible by blood examination.
4.3 Neuroimaging/neurophysiological biomarker subgroup
With respect to neuroimaging, the most prominent study design revolves around predicting long-term clinical outcome based on data from a baseline structural MRI scan (40, 41, 43, 46, 47). In (40, 41), post-hoc comparisons between good and bad outcome patients yielded significant differences between the two groups in ventricular volume (higher in bad outcomes) and grey matter volume in the right hippocampus (lower in bad outcomes). However, studies utilizing classification models, namely SVM [in (43)] and Cox regression [in (46)], failed to achieve above chance level predictive accuracy. In (47), the Local Gyrification Index (a measure of cortical folding, characteristically higher in neurodevelopmental anomalies) was significantly higher in relapsed patients at specific loci, including the left precuneus and cuneus cortex, the isthmus cingulate gyrus, the pericalcarine cortex, and the lingual gyrus. Of the remaining 5 studies, 3 included functional MRI scans, 2 (48, 49) of which were resting state, whereas in one (42), the scans were obtained during the Semantic association retrieval task (SORT). In (48, 49) measures related to functional connectivity across multiple regions, in particular the Striatal functional connectivity index (SCI) and the Global functional connectivity strength (GFC), were found to be significantly different (SCI was lower, while GFC was higher) in the group of patients experiencing psychotic relapse, hinting at impaired connectivity as a possible generative mechanism for relapse. In (42), only the behavioral component of the task, specifically reaction times, and not the imaging data, yielded significant results for relapsed versus FEP patients (relapsed patients responded more slowly). In a prospective PET scan study (44), temporal changes in striatal dopamine levels were found to correlate with psychotic relapse, potentially implicating dopamine autoregulation as another contributing factor in relapse.
Finally, in the only electroencephalography (EEG) study (45), it was shown that the amplitude of an event-related potential called the phonetic Mismatch Negativity (induced by vowel change) was positively correlated with re-hospitalization frequency and medication dose increase (the amplitude was significantly higher in 6 of the 9 studied electrodes in the vowel change case, but in 0 of 9 electrodes in the consonant change case). To sum up, significant group differences seem to implicate structural anomalies (e.g. ventricular volume growth), functional connectivity dysregulation (lower SCI, higher GFC), aberrant dopamine autoregulation, as well as automatic speech processing dysfunction as potential factors in relapse pathogenesis. However, it should be highlighted that the only two studies attempting a priori classification both failed to achieve accurate predictions. Furthermore, the same issues and confounders described for blood-based biomarkers are present in neuroimaging studies. Even though signals originating from neuroimaging scans are information-rich, it is impossible to obtain them at the frequency necessary to capture relapse onset, and thus no estimate for the temporal onset can even be formulated. To conclude, no tangible clinical benefit for relapse prediction can be claimed from neuroimaging/neurophysiological biomarker monitoring.
4.4 Cognitive/behavioral/internet activity biomarker subgroup
In the cognitive/behavioral biomarker section, 3 studies (50, 51, 54) pertaining to typical cognitive assessments were analyzed. Generally, they consisted of various tasks evaluating five main pillars of cognitive function, defined as Working Memory (WM), Executive Function (EF), Verbal Learning (VL), Impulsivity (Im) and Motor Speed (MS) (51). In two of these studies (51, 54) working memory deficits were the only significant predictor of relapse, while in the third (50) it was impaired executive function, captured by the perseverative error rate in the Wisconsin Card Sorting Test. Inconsistent results could be attributed to confounders, mainly heterogeneity in clinical manifestations, or different treatment regimens and adherence. Each of the three studies yielded one statistically significant predictor among almost 20, which does not support detailed cognitive assessments as viable tools for efficient relapse prediction. In (52, 53) the authors used data from Facebook (i.e. messages, posts, comments) and internet search activity, both known to exhibit distinct patterns, and tried to detect pattern breaks related to relapse. The authors report highly significant differences in the frequency of use of words related to “anger”, “death” or “sadness”. They also utilized supervised machine learning techniques, such as SVMs or gradient boosting, to classify a random time series as coming from a period of relapse or remission. These analyses yielded predictions that differed significantly from chance level but were not highly accurate. For reference, AUC values for an informative classifier range from 0.5 to 1, with 0.5 representing chance-level discrimination and 1 indicating perfect discrimination. Anything over 0.8, which was not reached, is generally considered accurate, whereas values in the 0.7-0.8 range (0.71 was the value obtained from the best performing model) are considered adequate depending on context.
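To make the AUC scale above concrete, the ROC AUC can be read as the probability that a randomly chosen relapse period receives a higher model score than a randomly chosen remission period (ties counted as one half). The following minimal Python sketch computes it directly from that definition; the function name and the scores are hypothetical and serve only as an illustration, not as a reproduction of the analyses in (52, 53).

```python
def roc_auc(scores_relapse, scores_remission):
    """ROC AUC via pairwise comparison (Mann-Whitney formulation).

    Counts the fraction of (relapse, remission) pairs in which the relapse
    period is scored higher; ties contribute one half.
    """
    wins = 0.0
    for r in scores_relapse:
        for m in scores_remission:
            if r > m:
                wins += 1.0
            elif r == m:
                wins += 0.5
    return wins / (len(scores_relapse) * len(scores_remission))


# Hypothetical model scores, NOT study data.
relapse = [0.9, 0.6, 0.4]
remission = [0.5, 0.3, 0.2, 0.7]
print(roc_auc(relapse, remission))  # 0.75: 9 of 12 pairs are ranked correctly
```

A model that cannot separate the two kinds of period at all ranks half of the pairs correctly on average, which is why 0.5 marks chance level.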
4.5 Wearables biomarker subgroup
Regarding the emerging field of relapse prediction via analysis of passively collected smartphone and smartwatch data, our search identified 8 studies conducted in the past 7 years. In (55, 57, 59, 61) various tools are used to analyze the data from the CrossCheck data set. Smartphones with the CrossCheck app installed were provided to all subjects and were utilized to gather a variety of mobility and sociability features, for instance GPS location, accelerometer data, phone call duration and number of texts. After showcasing the potential of this approach in (55), whe