Enabling Early Obstructive Sleep Apnea Diagnosis With Machine Learning: Systematic Review


Introduction

Background

Obstructive sleep apnea (OSA) is a common sleep-related breathing disorder characterized by recurrent episodes of partial (hypopnea) or complete (apnea) upper airway obstruction repeated throughout sleep. Its prevalence varies significantly according to how OSA is defined (methodology, criteria used such as the apnea index, apnea-hypopnea index [AHI], or respiratory disturbance index, and threshold definitions) and the population being studied []. Benjafield et al [] estimated that, worldwide, 936 million adults aged 30 to 69 years have OSA. Despite this high prevalence, many cases remain undiagnosed and untreated, leading to a decrease in patients’ quality of life and an increased risk of adverse events, with a high impact on morbidity and mortality []. Polysomnography (PSG) is the gold standard test for diagnosing OSA []. However, PSG is costly, time-consuming, and labor-intensive. Most sleep laboratories face long waiting lists, as PSG is neither routine clinical practice nor a suitable universal screening tool [].

Given these limitations, it would be useful to develop a clinical prediction model that could reliably identify the patients most likely to benefit from PSG, that is, exclude the diagnosis of OSA when its probability is low, establish an a priori probability before considering PSG, and prioritize patients for PSG according to the probability of a positive result. This idea was endorsed by the American Academy of Sleep Medicine (AASM) in its latest guidelines []. Clinical prediction models should be easy to use and easy to calculate. A model must be built against the gold standard and must be validated, and when used for screening, its requirements depend on whether it follows a rule-out or a rule-in approach. In the first case, a high-sensitivity model is needed, obviating the need to perform PSG in healthy patients. By contrast, in a rule-in approach, a high-specificity model is needed to select the patients with a high probability of having OSA, who are suitable for undergoing PSG.
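The rule-out versus rule-in trade-off can be sketched numerically. The counts below are hypothetical and purely illustrative, not taken from any included study:

```python
# Illustrative only: sensitivity and specificity from hypothetical
# screening counts (tp/fn/tn/fp are invented for this sketch).

def sensitivity(tp: int, fn: int) -> float:
    """True positive rate: proportion of patients with OSA the model flags."""
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    """True negative rate: proportion of healthy patients the model clears."""
    return tn / (tn + fp)

# A rule-out screen aims for high sensitivity, so a negative result
# can safely exclude OSA without performing PSG.
rule_out_sens = sensitivity(tp=95, fn=5)   # 0.95

# A rule-in screen aims for high specificity, so a positive result
# justifies prioritizing the patient for PSG.
rule_in_spec = specificity(tn=90, fp=10)   # 0.90
```
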

Objective

Given these shortcomings, this systematic review aimed to identify, gather, and analyze existing machine learning approaches that are being used for disease screening in adult patients with suspected OSA.


Methods

This systematic review was carried out according to a protocol registered with PROSPERO (International Prospective Register of Systematic Reviews; CRD42021221339).

Search Strategy and Selection Criteria

We searched all evidence available in the MEDLINE (PubMed), Scopus, and ISI Web of Knowledge databases, published until June 2020 in English, French, Spanish, or Portuguese. Specific queries were used (with a refresh in October 2021), and a manual search was also performed using the references of the included studies and of pertinent reviews on the topic. In addition, specialists in the field were contacted to check whether all pertinent information had been retrieved. Three reviewers independently selected articles (blinded to each other’s assessment) by applying the criteria to each title and abstract and then to the full text. Divergent opinions were resolved through consensus. All processes were performed in Rayyan, a web application and mobile app for systematic reviews [].

Studies including adult patients with suspected OSA (population) that assessed the accuracy of predictive models using known symptoms and signs of OSA (exposure and comparator) and had PSG as the gold standard (outcome) were eligible as per the selection criteria.

Data Extraction

Once the articles were selected, data were extracted into a prespecified Excel spreadsheet and included (1) article information: title, author(s), publication date, country, and journal and (2) methods: study design, setting, study period, type of model, inclusion and exclusion criteria, participant selection, sample size, clinical factors analyzed, diagnostic test analyzed, and potential bias. For each type of model, a specific data extraction form was created and completed, as shown in the tables in the following sections. We ordered the identified studies by the results obtained: first, the articles that only developed an algorithm; then those that internally validated the algorithm; and finally, those that externally validated the prediction algorithm. Within each subsection, we organized the published works by year of publication. Any information missing from the studies is reported in the Results section as “—” (not available), and the best obtained predictive model is marked in italic. In addition, if a study applied different machine learning approaches, the clinical factors analyzed and the discrimination measures are described only for the best obtained model.

Risk of Bias

At 2 points in time, 1 reviewer assessed the risk of bias and applicability by applying the Prediction Model Risk of Bias Assessment Tool (PROBAST) to all the included studies. This tool is specific to studies developing, validating, or updating diagnostic prediction models; more details are available in the study by Moons et al []. One important aspect should be noted: the tool states that “if a prediction model was developed without any external validation, and it was rated as low risk of bias for all domains, consider downgrading to high risk of bias. Such a model can only be considered as low risk of bias if the development was based on a very large data set and included some form of internal validation.” This means that included studies performing only model development were marked as high risk of bias. For those with internal validation, the risk of bias depended on the sample size, based on the number of events per variable (a ratio of ≥20 events per variable in development studies and ≥100 participants with OSA in model validation studies). In addition, studies that randomly split a single data set into development and validation sets were considered internal validation.
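The PROBAST sample-size rules described above can be expressed as a minimal sketch; the study numbers used in the examples are hypothetical:

```python
# A minimal sketch of the PROBAST sample-size rules of thumb described
# above. The example numbers (180 events, 6 variables, 80 cases) are
# invented for illustration, not taken from any included study.

def adequate_development_sample(n_events: int, n_variables: int) -> bool:
    """Development studies: >=20 events per candidate predictor variable."""
    return n_events / n_variables >= 20

def adequate_validation_sample(n_events: int) -> bool:
    """Validation studies: >=100 participants with the outcome (OSA)."""
    return n_events >= 100

# Example: 180 patients with OSA and 6 candidate variables -> 30 events
# per variable, which meets the development threshold.
print(adequate_development_sample(180, 6))   # True
# Example: an internal validation set with only 80 OSA cases falls short.
print(adequate_validation_sample(80))        # False
```
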


Results

Overview

We retrieved 6769 articles, of which 1290 were duplicates. From the remaining 5479 articles, we kept 63 studies that fulfilled the inclusion criteria, as shown in Figure 1.

The gold-standard examination, PSG, was performed in all the articles assessed, with one also adding the diagnostic part of a split-night exam []. The oldest reported age was 96 years [], and 54% (34/63) of studies included patients aged >18 years. To be certain of including all OSA clinical prediction algorithms, we kept the studies that reported only a mean age and SD, provided the mean was >42 years (with SD varying between 6 and 16 years). In addition, 10% (6/63) of studies used an age cutoff of <18 years (>14 or >15 years in 2/6, 33% of studies and >16 or >17 years in the other 4/6, 67%). Suspicion of OSA was described in 65% (41/63) of studies, whereas 32% (20/63) included patients with suspected OSA or any other sleep disorder. In addition, one study included both healthy participants and patients with suspected OSA [], and another did not state this explicitly; instead, the authors reported that patients already diagnosed with OSA were excluded. The frequency of occurrence of the clinical factors analyzed in more than 1 study is shown in Table 1.

There were disagreements between the reviewers in both phases, with an overall concordance rate of 78% in the title and abstract screening and 95% in the full-text assessment.

Figure 1. Flow diagram of the study selection process.

Table 1. The frequency of occurrence of the various clinical factors analyzed that appear more than once in all the included studies (n=63).

Clinical factor analyzed: Frequency of occurrence, n (%)
BMI: 37 (59)
Age: 32 (51)
Sex: 29 (46)
Neck circumference: 25 (40)
Snoring: 14 (22)
Epworth Somnolence Scale: 10 (16)
Witnessed apneas: 8 (13)
Waist circumference: 8 (13)
Breathing cessation: 7 (11)
Daytime sleepiness: 7 (11)
Hypertension: 7 (11)
Gasping: 6 (10)
Oxygen saturation: 6 (10)
Oxygen desaturation: 6 (10)
Blood pressure: 5 (8)
Smoking: 5 (8)
Tonsil size grading: 5 (8)
Modified Mallampati score: 4 (6)
Alcohol consumption: 3 (5)
Awakenings: 3 (5)
Diabetes: 3 (5)
Height: 3 (5)
Nocturia: 3 (5)
Restless sleep: 3 (5)
Weight: 3 (5)
Craniofacial abnormalities: 2 (3)
Driving sleepy: 2 (3)
Face width: 2 (3)
Friedman tongue score: 2 (3)
Snorting: 2 (3)

Prediction Models Development

New prediction models were developed in 23 studies, as presented and described in Table 2. The most common approach was regression: logistic (6/23, 26%), linear (6/23, 26%), or both logistic and linear (6/23, 26%), as well as logistic regression compared with decision trees or support vector machines (3/23, 13%). In addition, 1 article (4%) used a Pearson correlation and another (1/23, 4%) a decision tree. The oldest model, developed in 1991, included sex, age, BMI, and snoring, whereas a 2020 model additionally included height, weight, waist size, hip size, neck circumference (NC), modified Friedman score, daytime sleepiness, and Epworth Somnolence Scale score. Only 13% (3/23) of studies described both the study design and period, and 22% (5/23) were retrospective. Regarding the definition of OSA by PSG, 1 study (4%) did not report the cutoff, whereas 4 (17%) reported an AHI>10 and another 4 (17%) an AHI≥15. The largest sample size was 953 and the smallest was 96 patients with suspected OSA. The reported prevalence of OSA ranged from 31% to 87%, with 2 studies (9%) presenting incorrect percentage values [,]. Regarding discrimination measures, although no validation was performed, the best area under the receiver operating characteristic curve (AUC), sensitivity, and specificity were 99%, 100%, and 95%, respectively. It should also be noted that 1 study (4%) did not identify the best prediction model (not marked in italic in Table 2).
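To make the dominant approach concrete, the following is a minimal sketch of a logistic regression screening model of the kind summarized above, fitted on synthetic (invented) data with scikit-learn. The predictors mirror the 1991 model (sex, age, BMI, snoring), but the data, coefficients, and threshold are assumptions for illustration and reproduce no included study:

```python
# Hypothetical sketch: development-only logistic regression for OSA
# screening on synthetic data. Nothing here reflects any study's
# actual data set or fitted model.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 500
sex = rng.integers(0, 2, n)        # 1 = male
age = rng.normal(50, 12, n)
bmi = rng.normal(30, 6, n)
snoring = rng.integers(0, 2, n)    # 1 = habitual snorer

# Invented "true" risk: older, heavier, male snorers more often have AHI>10.
logit = -8 + 0.8 * sex + 0.05 * age + 0.12 * bmi + 1.0 * snoring
osa = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X = np.column_stack([sex, age, bmi, snoring])
model = LogisticRegression(max_iter=1000).fit(X, osa)
prob = model.predict_proba(X)[:, 1]

# Apparent (in-sample) discrimination only, as in development-only studies.
print(f"apparent AUC: {roc_auc_score(osa, prob):.2f}")

# A low rule-out threshold trades specificity for high sensitivity.
flag = prob >= 0.2
sens = (flag & (osa == 1)).sum() / (osa == 1).sum()
print(f"sensitivity at 0.2 threshold: {sens:.2f}")
```

Because the model is evaluated on its own training data, the printed AUC is an optimistic apparent estimate, which is precisely why PROBAST downgrades development-only studies.
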

Table 2. Studies’ characteristics of prediction model development without internal or external validation, with the best obtained model marked in italic in the respective model column.

Study | Study design; study period | Machine learning approach | Clinical factors analyzed | OSAa definition | Sample size, n | OSA prevalence, n (%) | AUCb, % (95% CI) | Sensitivity, % (95% CI) | Specificity, % (95% CI)
Viner et al [], 1991 | Prospective; —c | Logistic regression | Sex, age, BMI, and snoring | AHId>10 | 410 | 190 (46) | 77 (73-82) | 28 (—) | 95 (—)
Keenan et al [], 1993 | — | Logistic regression | NCe, age, WAf, daytime sleepiness, driving sleepy, oxygen desaturation, and heart rate frequency | AHI>15 | 96 | 51 (53) | — | 20 (—) | 5 (—)
Hoffstein et al [], 1993 | — | Linear regression | Subjective impression | AHI>10 | 594 | 275 (46) | — | 60 (—) | 63 (—)
Flemons et al [], 1994 | —; February 1990 to September 1990 | Logistic and linear regression | NC, hypertension, snoring, and gasping or choking | AHI>10 | 175 | 82 (46) | — | — | —
Vaidya et al [], 1996 | —; July 1993 to December 1994 | Logistic and linear regression | Age, BMI, sex, and total number of symptoms | RDIg>10 | 309 | 226 (73) | — | 96 (—) | 23 (—)
Deegan et al [], 1996 | Prospective; — | Logistic and linear regression | Sex, age, snoring, WA, driving sleepy, alcohol consumption, BMI, number of dips ≥4%, lowest oxygen saturation, and NC | AHI≥15 | 250 | 135 (54) | — | — | —
Pradhan et al [], 1996 | Prospective; August 1994 to February 1995 | Logistic regression | BMI, lowest oxygen saturation, and bodily pain score | RDI>10 | 150 | 85 (57) | — | 100 (—) | 31 (—)
Friedman et al [], 1999 | Prospective; — | Linear regression | Modified Mallampati class, tonsil size grading, and BMI | RDI>20 | 172 | — | — | — | —
Dixon et al [], 2003 | — | Logistic and linear regression | BMI, WA, glycosylated hemoglobin, fasting plasma insulin, sex, and age | AHI≥30 | 99 | 36 (36) | 91 (—) | 89 (—) | 81 (—)
Morris et al [], 2008 | Prospective; — | Pearson correlation | BMI and snoring severity score | RDI≥15 | 211 | 175 (83) | — | 97 (—) | 40 (—)
Martinez-Rivera et al [], 2008 | — | Logistic regression | Sex, waist-to-hip ratio, BMI, NC, and age | AHI>10 | 192 | 124 (65) | — | — | —
Herzog et al [], 2009 | Retrospective; — | Logistic and linear regression | Tonsil size grading, uvula size, dorsal movement during simulated snoring, collapse at tongue level, BMI, and ESSh score | AHI>5 | 622 | — | — | Female: 98 (—) | Female: 22 (—)
Yeh et al [], 2010 | Retrospective; April 2006 to December 2007 | Linear regression | BMI, NC, and ESS score | AHI≥15 | 101 | 83 (82) | — | 98 (—) | —
Hukins et al [], 2010 | Retrospective; January 2005 to July 2007 | Linear regression | Mallampati class IV | AHI>30 | 953 | 297 (31) | — | 40 (36-45) | 67 (64-69)
Musman et al [], 2011 | —; December 2006 to March 2007 | Logistic and linear regression | NC, WA, age, BMI, and allergic rhinitis | AHI>5 | 323 | 229 (71) | — | — | —
Sareli et al [], 2011 | —; November 2005 to January 2007 | Logistic regression | Age, BMI, sex, and sleep apnea symptom score | AHI≥5 | 342 | 264 (77) | 80 (—) | — | —
Tseng et al [], 2012 | — | Decision tree | Sex, age, preovernight systolic blood pressure, and postovernight systolic blood pressure | AHI≥15 | 540 | 394 (73) | — | — | —
Sahin et al [], 2014 | Retrospective; — | Linear regression | BMI, WCi, NC, oxygen saturation, and tonsil size grading | AHI>5 and symptoms | 390 | — | — | — | —
Ting et al [], 2014 | Prospective; — | Logistic regression and decision trees | Sex, age, and blood pressure | AHI≥15 | 540 | 394 (73) | 99 (—) | 98 (—) | 93 (—)
Sutherland et al [], 2016 | —; 2011 to 2012 | Logistic regression and classification and regression tree | Face width and cervicomental angle | AHI≥10 | 200 | 146 (73) | 76 (68-83) | 89 (—) | 28 (—)
Lin et al [], 2019 | Retrospective; — | Linear regression | Sex, updated Friedman tongue position, tonsil size grading, and BMI | AHI≥5 | 325 | 283 (87) | 80 (74-87) | 84 (—) | 58 (—)
Del Brutto et al [], 2020 | — | Logistic regression | Neck grasp | AHI≥5 | 167 | 114 (68) | 62 (54-69) | 83 (75-89) | 40 (27-54)
Haberfeld et al [], 2020 | — | Logistic regression and support vector machine | Height, weight, WC, hip size, BMI, age, neck size, modified Friedman score, snoring, sex, daytime sleepiness, and ESS score | — | 620 | 357 (58) | Male: 61 (—) | Male: 86 (—) | Male: 70 (—)

aOSA: obstructive sleep apnea.

bAUC: area under receiver operating characteristic curve.

cNot available.

dAHI: apnea-hypopnea index.

eNC: neck circumference.

fWA: witnessed apnea.

gRDI: respiratory disturbance index.

hESS: Epworth somnolence scale.

iWC: waist circumference.

As stated in the Methods section, given that all these models performed development only, with in-sample validation metrics, they were all considered at high risk of bias in the Analysis domain (Table 3). Concerning the Outcome domain, most studies were marked as high risk, as most did not have a prespecified or standard outcome definition. Most included studies were at low risk of bias in the Predictors domain (ie, they did not include predictors assessed after performing PSG), although some were marked as high risk and one as unclear. Most studies (15/23, 65%) were rated as unclear for the Participants domain, as almost none stated the study design or exclusion criteria. Assessing the applicability component of PROBAST, all studies (23/23, 100%) were at low risk for the Participants domain (all included patients with suspected OSA), but several were at high risk of applicability concerns for the Outcome domain (an OSA definition not in concordance with current OSA guidelines).

Table 3. Prediction Model Risk of Bias Assessment Tool (PROBAST) for prediction model development without internal or external validation.

Domains rated per study: Risk of bias (Participants, Predictors, Outcome, Analysis); Applicability (Participants, Predictors, Outcome); Overall (Risk of bias, Applicability). [Per-study ratings were presented graphically and are not recoverable here.] Studies assessed: Viner et al [], 1991; Keenan et al [], 1993; Hoffstein et al [], 1993; Flemons et al [], 1994; Vaidya et al [], 1996; Deegan et al [], 1996; Pradhan et al [], 1996; Friedman et al [], 1999; Dixon et al [], 2003; Morris et al [], 2008; Martinez-Rivera et al [], 2008; Herzog et al [], 2009; Yeh et al [], 2010; Hukins [], 2010; Musman et al [], 2011; Sareli et al [], 2011; Tseng et al [], 2012; Sahin et al [], 2014; Ting et al [], 2014; Sutherland et al [], 2016; Lin et al [], 2019; Del Brutto et al [], 2020; Haberfeld et al [], 2020.
