Identifying factors associated with periodontal disease using machine learning

   Abstract 

Objective: This study aimed to identify combinations of chronic conditions associated with the presence and severity of periodontal disease (PD) after accounting for a series of demographic and behavioral characteristics in a nationally representative sample of US adults. Materials and Methods: This was a cross-sectional study of the 2013–2014 National Health and Nutrition Examination Survey (n = 4555). The outcome measure was PD, defined using clinical attachment loss (measured as none, mild, moderate, or severe). The main independent variables were self-reported chronic conditions, while other covariates included demographic and behavioral variables. Classification and regression tree analysis was used to identify combinations of specific chronic conditions associated with PD and PD with higher severity. Random forest was used to identify the most important variables associated with the presence and severity of PD. Results: The prevalence of PD was 77% among the study population. The percentage of those with PD was higher among younger and middle-aged (< 61 years old) than older (> 61 years old) adults. Age and education level were the two most important predictors for the presence and severity of PD. Other significant factors included alcohol use, type of medical insurance, sex, and non-white race. Accounting for only chronic conditions, hypertension and diabetes were the two chronic conditions associated with the presence and severity of PD. Conclusions: Sociodemographic and behavioral factors emerged as more strongly associated with the presence and severity of PD than chronic conditions. Accounting for the co-occurrence of sociodemographic and behavioral factors will be informative for identifying people vulnerable to the development of PD.

Keywords: Machine learning, periodontal medicine, periodontal-systemic disease interactions, periodontitis, risk factor(s)


How to cite this article:
Alqahtani HM, Koroukian SM, Stange K, Schiltz NK, Bissada NF. Identifying factors associated with periodontal disease using machine learning. J Int Soc Prevent Communit Dent 2022;12:612-20
How to cite this URL:
Alqahtani HM, Koroukian SM, Stange K, Schiltz NK, Bissada NF. Identifying factors associated with periodontal disease using machine learning. J Int Soc Prevent Communit Dent [serial online] 2022 [cited 2022 Dec 31];12:612-20. Available from: 
https://www.jispcd.org/text.asp?2022/12/6/612/366471

   Introduction

Periodontal disease (PD) is a chronic multifactorial inflammatory disease. It is associated with dysbiotic plaque biofilms, which result in chronic destructive inflammatory responses.[1] In its mildest form, it affects 45%–50% of adults.[2] Severe PD is the sixth most common disease and is estimated to affect 11.2% of the global adult population.[2]

PD shares several risk factors and indicators with systemic conditions, including cardiovascular disease, obesity, rheumatoid arthritis, prostate cancer, stroke, cognitive impairment, and hypertension.[3] Suggested mechanisms include bacteremia through the epithelial lining of periodontal pockets and elevation of systemic inflammatory cytokines.[4]

Prior studies have confirmed the relationship between systemic conditions and PD.[3],[4],[5],[6] However, the growing number of people living with multiple chronic conditions is likely to shift priorities in health care management toward conditions affecting systemic health, placing PD care at a lower priority.[7],[8],[9] Consequently, multiple chronic conditions are likely to affect the periodontal condition of these individuals negatively. Assessing the relationship between PD and a single chronic condition does not account for commonly co-occurring chronic conditions and therefore underestimates their compounding effects.[10] We therefore hypothesize that there are combinations of specific chronic conditions associated with the presence of PD and with PD of higher severity, after accounting for a series of demographic and behavioral characteristics.

To the best of our knowledge, no previous studies have identified co-occurring chronic conditions associated with PD or its severity. Identifying the most prevalent combinations of chronic conditions that are associated with the presence and severity of PD will not only elucidate etiologic and pathophysiologic pathways but will also help to raise awareness among healthcare providers regarding PD, and prompt them to recommend periodic periodontal checkups for people who are most vulnerable to the development or worsening of PD.

This study aimed to identify combinations of chronic conditions that are associated with the presence and severity of PD, after accounting for a series of demographic and behavioral characteristics.

   Materials and Methods

This cross-sectional study used publicly available data from the 2013–2014 National Health and Nutrition Examination Survey (NHANES), the most recent cycle with data on PD collected using a full-mouth periodontal examination protocol. NHANES is a cross-sectional survey intended to observe the overall health and nutritional status of a nationally representative sample of the U.S. population. This study was deemed research not involving human subjects by the Case Western Reserve University Institutional Review Board (#2021-0469).

Data source and study population

NHANES conducts health interviews as well as examinations ranging from laboratory tests to physiological measurements. The interviews and examinations are performed by trained medical personnel, and participants are selected through a multistage probability sampling design. Our study population included 4,669 individuals.

Variables of interest

Outcome variable

Our outcome variable was PD (present/absent), defined using clinical attachment loss (CAL) according to the 2018 American Academy of Periodontology (AAP) classification.[11] Participants categorized as having PD were further grouped as mild (CAL = 1–2 mm), moderate (CAL = 3–4 mm), or severe PD (CAL ≥ 5 mm).
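The grouping above can be illustrated with a minimal sketch. It is not the authors' code: the function names are hypothetical, CAL is assumed to be a single summary value per participant recorded in whole millimetres, and a CAL of exactly 5 mm is assumed to fall in the severe group.

```python
def classify_pd(cal_mm: int) -> str:
    """Map a CAL value (whole millimetres, an assumption) to a PD category."""
    if cal_mm < 0:
        raise ValueError("CAL cannot be negative")
    if cal_mm == 0:
        return "none"
    if cal_mm <= 2:
        return "mild"      # CAL = 1-2 mm
    if cal_mm <= 4:
        return "moderate"  # CAL = 3-4 mm
    return "severe"        # assumption: 5 mm and above


def has_pd(cal_mm: int) -> bool:
    """First outcome: PD present versus absent."""
    return classify_pd(cal_mm) != "none"


def is_moderate_or_severe(cal_mm: int) -> bool:
    """Second outcome, evaluated only among participants with PD."""
    return classify_pd(cal_mm) in ("moderate", "severe")


example = [classify_pd(v) for v in (0, 1, 3, 6)]
```

The two helper functions mirror the two outcomes analyzed later: PD as present/absent, and moderate/severe PD among those with PD.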

Independent variables

The main independent variables were self-reported chronic conditions, that is, conditions that a physician had ever told the participant he or she had: hypertension, hyperlipidemia, diabetes, arthritis, coronary heart disease, overweight, stroke, asthma, chronic obstructive pulmonary disease, emphysema, chronic bronchitis, cancer, liver condition, thyroid problems, psoriasis, and weak or failing kidneys.

Other variables of interest included age (<30, 30–34, 35–39, 40–44, 45–49, 50–54, 55–59, 60–64, 65–69, 70–74, 75–79, >80), sex (male, female), race/ethnicity (Hispanic, White, Black, Other), marital status (married, widowed, divorced, never married), education (< 9th grade, 9–12th grade, high school graduate or equivalent, some college or associate's degree, college graduate or higher), the ratio of family income to poverty (<1, 1–1.99, 2–2.99, 3–3.99, 4–4.99, >5), smoking status (current, former, none), alcohol use (yes, no), body mass index (underweight (<18 kg/m²), normal/overweight (18.1–30 kg/m²), obese (>30 kg/m²)), vigorous recreational activities (yes, no), insurance status (Medicare, Medicaid, private insurance, and all other), and whether the insurance plan provides coverage for dental procedures (yes, no). For variables with more than 1% missing values (insurance status, ratio of family income to poverty, and body mass index), we created a missing category. For variables with less than 1% missing values (education, marital status, and smoking behavior), we excluded the observations with missing values. Our final sample included 4555 participants after excluding 114 participants with missing data in our variables of interest.
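The missing-data rule described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the function name `handle_missing`, the use of `None` to mark a missing value, and the treatment of a share of exactly 1% (handled here like the "less than 1%" case) are all assumptions, and the demo data are synthetic.

```python
def handle_missing(rows, variables, threshold=0.01, label="Missing"):
    """Recode or drop missing values depending on how common they are.

    Variables whose missing share exceeds `threshold` get an explicit
    missing category; rows missing any other variable are excluded.
    """
    n = len(rows)
    if n == 0:
        return [], {}
    share = {v: sum(1 for r in rows if r[v] is None) / n for v in variables}
    recode = [v for v in variables if share[v] > threshold]
    drop_on = [v for v in variables if 0 < share[v] <= threshold]

    cleaned = []
    for r in rows:
        if any(r[v] is None for v in drop_on):
            continue  # rare missingness: exclude the observation
        new = dict(r)
        for v in recode:
            if new[v] is None:
                new[v] = label  # common missingness: keep as its own category
        cleaned.append(new)
    return cleaned, share


# Synthetic demo: insurance is missing for 5% of rows, education for 0.5%.
demo = [
    {"insurance": None if i < 10 else "Private",
     "education": None if i == 150 else "College"}
    for i in range(200)
]
cleaned, share = handle_missing(demo, ["insurance", "education"])
n_missing_category = sum(1 for r in cleaned if r["insurance"] == "Missing")
```

Keeping a missing category preserves sample size and lets missingness itself appear as a split in a tree, which is relevant to how the alcohol and insurance variables behave in the results.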

Statistical analysis

We began by conducting a descriptive analysis of our study variables. Regarding our outcome variable, we first examined correlates of PD as present/absent; then, among those with PD, we identified factors associated with moderate/severe PD. Next, we identified the most common combinations of chronic conditions associated with the presence and severity of PD using a conditional inference regression tree (CTree), described below.[12] Although CTree is similar to the classification and regression tree (CART),[13] it uses a statistical significance test as the splitting criterion. We addressed our aim in two steps. First, we identified which combinations of chronic conditions and sociodemographic and behavioral variables were associated with PD. Subsequently, limiting our analysis to respondents with PD, we identified the combinations of chronic conditions most highly associated with moderate/severe PD.

CTree, a machine learning method, performs recursive binary partitioning of the data, with each variable considered as a potential split. Each node can split and form two child nodes, which can successively split and create two more child nodes. This process continues until a node can no longer be split given the stopping criteria, thereby creating a terminal node. We used the following stopping criteria: a maximum tree depth of five splits, a minimum terminal node size of 100 participants, and a p-value threshold of 0.001. The CTree analysis of PD severity used only chronic conditions with a minimum of 500 positive cases, i.e., hypertension, hyperlipidemia, diabetes, arthritis, asthma, bronchitis, cancer, liver condition, and thyroid problems. Because a threshold of 0.001 might be too strict given the smaller sample size of the PD subpopulation, we used an alpha of 0.05 for that analysis.
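The partitioning logic and stopping criteria can be sketched in a few lines. This is a simplified illustration, not the "partykit" implementation: it handles only binary (0/1) predictors and a binary outcome, substitutes a Pearson chi-square test with a Bonferroni adjustment for the conditional inference tests used by CTree, and runs on synthetic data. All function names are hypothetical.

```python
import math


def chi2_pvalue_2x2(a, b, c, d):
    """Pearson chi-square p-value (df = 1) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    r1, r2, c1, c2 = a + b, c + d, a + c, b + d
    if min(r1, r2, c1, c2) == 0:
        return 1.0  # degenerate table: no evidence of association
    chi2 = n * (a * d - b * c) ** 2 / (r1 * r2 * c1 * c2)
    # With df = 1 the chi-square survival function equals erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def best_split(rows, outcome, predictors, alpha, min_node):
    """Return (variable, adjusted p) for the most significant split, or None."""
    best = None
    for var in predictors:
        a = sum(1 for r in rows if r[var] == 1 and r[outcome] == 1)
        b = sum(1 for r in rows if r[var] == 1 and r[outcome] == 0)
        c = sum(1 for r in rows if r[var] == 0 and r[outcome] == 1)
        d = sum(1 for r in rows if r[var] == 0 and r[outcome] == 0)
        if a + b < min_node or c + d < min_node:
            continue  # a child node would be smaller than the minimum size
        p_adj = min(1.0, chi2_pvalue_2x2(a, b, c, d) * len(predictors))
        if best is None or p_adj < best[1]:
            best = (var, p_adj)
    if best is None or best[1] > alpha:
        return None  # nothing passes the significance threshold
    return best


def grow(rows, outcome, predictors, alpha=0.001, min_node=100,
         max_depth=5, depth=0):
    """Recursively partition rows; each node records size and prevalence."""
    node = {"n": len(rows),
            "prevalence": sum(r[outcome] for r in rows) / len(rows)}
    if depth >= max_depth:
        return node
    split = best_split(rows, outcome, predictors, alpha, min_node)
    if split is None:
        return node  # terminal node
    var, p_adj = split
    node["split"], node["p"] = var, p_adj
    for key, value in (("yes", 1), ("no", 0)):
        subset = [r for r in rows if r[var] == value]
        node[key] = grow(subset, outcome, predictors, alpha, min_node,
                         max_depth, depth + 1)
    return node


# Synthetic demo data: PD prevalence is 60% with and 85% without hypertension.
rows = []
for i in range(1000):
    htn = 1 if i < 400 else 0
    pd_present = 1 if (i % 5 < 3 if htn else i % 20 < 17) else 0
    rows.append({"hypertension": htn, "asthma": i % 2, "pd": pd_present})

tree = grow(rows, "pd", ["hypertension", "asthma"])
```

Passing `alpha=0.05` to `grow` corresponds to the relaxed threshold used for the PD severity analysis restricted to chronic conditions.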

We built the CTree model by partitioning the data into training and test datasets, and we used the held-out test dataset to assess the accuracy of the CTree model. In addition, we used a random forest approach to identify the most important predictors of each outcome. Random forest is a bootstrap aggregation method that builds many trees, each grown on a bootstrap sample using random variable selection.[14] For each random forest model, we created 3,000 trees and sampled three of the explanatory variables at each node split. Next, we compared the two models, CTree and random forest, to check their agreement on the most important predictors for PD and moderate/severe PD. We used R version 3.6 with the “partykit” (CTree) and “randomForest” (random forest) packages.
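The two ingredients of random forest named above, bootstrap resampling and random variable selection, can be sketched as follows. This is a deliberately reduced illustration and not the "randomForest" package: each "tree" is a single root split on binary predictors, importance is the summed Gini decrease rather than the package's importance measures, and the data are synthetic. The function names and the demo's tree count of 200 are assumptions chosen to keep the example fast; the defaults of 3,000 trees and three candidate variables follow the text.

```python
import random
from collections import Counter


def gini(rows, outcome):
    """Gini impurity of a binary outcome within a set of rows."""
    if not rows:
        return 0.0
    p = sum(r[outcome] for r in rows) / len(rows)
    return 2 * p * (1 - p)


def best_of_subset(rows, outcome, candidates):
    """Pick the candidate binary variable with the largest Gini decrease."""
    parent = gini(rows, outcome)
    best_var, best_gain = None, 0.0
    for var in candidates:
        yes = [r for r in rows if r[var] == 1]
        no = [r for r in rows if r[var] == 0]
        weighted = (len(yes) * gini(yes, outcome)
                    + len(no) * gini(no, outcome)) / len(rows)
        gain = parent - weighted
        if gain > best_gain:
            best_var, best_gain = var, gain
    return best_var, best_gain


def forest_importance(rows, outcome, predictors, n_trees=3000, mtry=3, seed=0):
    """Rank predictors by total Gini decrease over bootstrapped root splits."""
    rng = random.Random(seed)
    total_gain = Counter()
    for _ in range(n_trees):
        boot = [rng.choice(rows) for _ in rows]  # sample with replacement
        candidates = rng.sample(predictors, min(mtry, len(predictors)))
        var, gain = best_of_subset(boot, outcome, candidates)
        if var is not None:
            total_gain[var] += gain
    return total_gain.most_common()


# Synthetic demo: PD prevalence is 90% for age_under_61 = 1 and 50% otherwise;
# the other four predictors are unrelated to the outcome by construction.
names = ["age_under_61", "hypertension", "diabetes", "asthma", "arthritis"]
rows = []
for i in range(640):
    row = {name: (i >> bit) & 1 for bit, name in enumerate(names)}
    block = (i // 32) % 10
    row["pd"] = 1 if block < (9 if row["age_under_61"] else 5) else 0
    rows.append(row)

ranking = forest_importance(rows, "pd", names, n_trees=200)
```

Comparing the top of `ranking` with the variables that appear in a single fitted tree is the kind of agreement check between the two methods that the text describes.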

   Results

The characteristics of the study population by presence and severity of PD are presented in [Table 1]. Of the 4555 participants, 77.4% had PD. Among those with PD, 27.0% presented with mild PD, 54.3% with moderate PD, and 18.7% with severe PD. The percentage of those with PD was higher among middle-aged (two-thirds) than older individuals. There were more men in the severe PD group. Across all PD groups, a higher proportion of participants were married, were non-smokers, were normal/overweight or obese, and did not engage in vigorous physical activity. Nearly 30% of study participants with severe PD had missing values on their type of health insurance. About 96% of the study population across all PD groups did not have dental insurance.

Findings from [Table 2] showed that hypertension, hyperlipidemia, diabetes, arthritis, and overweight were the most common chronic conditions across all PD groups. Among those without PD, coronary heart disease, stroke, chronic obstructive pulmonary disease, and emphysema were the most common chronic conditions. Out of all chronic conditions, hypertension, hyperlipidemia, diabetes, arthritis, overweight, and asthma had a minimum of 500 positive cases of PD.

[Figure 1] shows the CTree analysis for the presence and severity of PD using chronic conditions and sociodemographic and behavioral variables. The tree shows how the distribution of PD differs across combinations of sociodemographic and behavioral variables. The highest percentage of PD (over 90%) was observed among participants who had a non-missing value for alcohol consumption, were privately insured or had missing values on type of insurance, and were 36 years of age or younger (nodes 1, 2, 3, 4, and 5). Conversely, the lowest prevalence of PD (about 55%) was observed among participants who were 61 years of age or younger and had missing values on alcohol consumption (nodes 1, 2, and 8). The highest percentage of moderate/severe PD (about 90%) was observed among those who had a high school graduate education level or less and were male (nodes 1, 2, and 3). Among participants with some college education or higher who were 50 years of age or younger, those of white race had a low risk of moderate/severe PD, at nearly 45% (nodes 1, 5, 6, and 7), compared with other races.

Figure 1: Conditional inference regression tree analysis to predict the presence of PD (left) and moderate/severe PD among those with PD (right). PD, Periodontitis; No_PD, No Periodontitis; Mild_PD, Mild Periodontitis; Severe_PD, Severe Periodontitis; Alcohol_YN, Alcohol consumption (Yes, No, missing values)


To address our primary research question, we subsequently limited our CTree analysis for the presence and severity of PD to chronic conditions only [Figure 2]. The highest percentage of PD (over 80%) was observed among individuals with no hypertension, no arthritis, and no diabetes (nodes 1, 5, 7, and 9). Although the prevalence of PD was generally high across combinations of chronic conditions, the lowest prevalence (slightly higher than 60%) was observed among participants with hypertension and arthritis (nodes 1, 2, and 3). In addition, [Figure 2] shows that the highest percentage of moderate/severe PD (over 80%) was observed among those with diabetes (nodes 1 and 2). In contrast, the lowest percentage of moderate/severe PD (about 60%) was observed among those who had no diabetes but had hypertension and asthma (nodes 1, 3, 4, and 5, leading to the second bar at the left-hand side). Overall, the prevalence of moderate/severe PD ranged from 60% to slightly above 80% across combinations of chronic conditions.

Figure 2: Conditional inference regression tree analysis to predict the presence of PD (left) and moderate/severe PD among those with PD (right) using only chronic conditions. PD, Periodontitis; No_PD, No Periodontitis; Mild_PD, Mild Periodontitis; Severe_PD, Severe Periodontitis; BP, Hypertension; DM, Diabetes Mellitus


The random forest analysis shows which variables contribute most to the model's accuracy across the 3,000 trees grown for each outcome [Figure 3] and [Figure 4]. Using chronic conditions and sociodemographic and behavioral variables [Figure 3], the three most important variables for PD were age, alcohol use, and health insurance. For moderate/severe PD, the six most important variables were age, education level, type of health insurance, the ratio of family income to poverty, sex, and race. The frequent appearance of these variables in the CTree models in [Figure 1] supports those models and demonstrates the importance of sociodemographic and behavioral variables in explaining the presence and severity of PD. We show logistic regression analyses for the presence of PD [Table 1S] and moderate/severe PD [Table 2S] in the supplemental material.

Figure 3: Random forest plot ranking the factors that most influence the distribution of PD (left plot) and moderate/severe PD (right plot). BP, Hypertension; CHD, Coronary Heart Disease; COPD, Chronic Obstructive Pulmonary Disease; DM, Diabetes; Alcohol_YN, Alcohol consumption


Figure 4: Random forest plot ranking the factors that most influence the distribution of PD (left plot) and moderate/severe PD (right plot), using only chronic conditions. BP, Hypertension; DM, Diabetes


Table 2S: Logistic regression model predicting moderate/severe PD among those with PD


[Figure 4] shows the three most important variables for PD and moderate/severe PD from the random forest using only chronic conditions. Hypertension, arthritis, and diabetes were the most important predictors for PD. For moderate/severe PD, diabetes, hypertension, and asthma were the most important variables. Because most of these variables also appear frequently in the CTree models in [Figure 2], they further support those models. These results also demonstrate the importance of hypertension and diabetes as factors explaining the presence and severity of PD, respectively.

   Discussion

We analyzed the presence and severity of periodontal disease using a machine learning approach on NHANES data for a nationally representative sample of the US population. The higher prevalence of PD compared to previous estimates is likely due to the lower CAL threshold in the new definition of PD.[11] In addition, the most recent case definition of PD uses the CAL value on two non-adjacent teeth as the primary criterion, whereas the old definition was based on CAL values and/or probing depth values on adjacent teeth. For comparison, we show the CTree analysis for the presence of PD using the old definition in the supplemental material [Figure 1S]. Furthermore, the full-mouth periodontal examination protocol in the 2013–2014 NHANES allowed us to examine all teeth, which likely increases the accuracy of diagnosing and classifying PD compared with the partial-mouth periodontal examination protocols used in earlier NHANES cycles.[15]

Surprisingly, when we accounted for covariates in our CTree models, sociodemographic and behavioral variables were better predictors than chronic conditions for PD and PD severity. Age and education level were the most important variables for PD and moderate/severe PD, respectively. This can be explained by the high proportion of middle-aged adults in our study, who are less likely to have chronic conditions than older adults. In addition, the low prevalence of PD among older participants is likely because our sub-sample of NHANES participants with periodontal data may be biased, representing a healthier subgroup of older people. For example, the 2013–2014 NHANES periodontal examination included only dentate participants; therefore, edentate people, who may have lost their teeth to PD, may not have participated in the oral health module of the NHANES and were not represented in our sample. In addition, dentate older people, who are more likely to have lost teeth due to periodontal disease, may retain fewer teeth that are in better periodontal condition, which would explain the low prevalence of PD among older people. Moreover, the use of self-reported data may underestimate the true prevalence of chronic conditions because participants may forget a diagnosis or may not have been diagnosed. Furthermore, about 45% of the study population had a low level of education, which may put their periodontal care at a lower priority. Other significant covariates included alcohol use, type of medical insurance, sex, and race. For example, two-thirds of the study population consumed alcohol compared to only one-fifth who were current smokers. This combination of sociodemographic and behavioral factors identified people who are at risk for the development of PD, given that PD is a prevalent disease that starts at an earlier age, regardless of medical condition.

Given that chronic conditions did not appear in the models, we further developed CTree models including only chronic conditions to identify the characteristics of people who are medically vulnerable to the development of PD. In these models, hypertension and diabetes were the most important chronic conditions associated with the presence and severity of PD, respectively, as evidenced by the first splitting variable in each classification tree. Arthritis and asthma were the other important chronic conditions that emerged from our machine learning approach.

Our results do not contradict the current knowledge on the impact of several chronic conditions on the development of PD. However, they highlight the sociodemographic and behavioral factors that may play a critical role in the development of PD at an earlier age and probably before chronic conditions occur. Except for a few chronic conditions, the prevalence of chronic conditions in the study population was very low; therefore, sociodemographic and behavioral factors emerged as the characteristics that identify people vulnerable to PD, regardless of their chronic conditions.

These findings have important implications for clinical practice and research in the periodontal field. In clinical practice, identifying the most common combinations of sociodemographic and behavioral variables associated with the presence and severity of PD will elucidate etiologic, pathophysiologic, and behavioral pathways beyond plaque biofilm. It will also help raise awareness among periodontists of the importance of including sociodemographic and behavioral factors in their evaluation, specifically among middle-aged adults, who are less prone to chronic conditions. Moreover, it should prompt periodontists to recommend periodic periodontal checkups at an earlier age for the people most vulnerable to the development or worsening of PD, regardless of chronic conditions. As a result, patient-centered care rather than disease-centered care is crucial in managing potential etiological factors involved in the development and severity of PD.

Regarding research implications, these findings indicate that accounting for sociodemographic and behavioral factors, in addition to the co-occurrence of chronic conditions, will be critical in evaluating PD. Research in this area has assessed the relationship between PD and a given chronic condition but has not accounted for commonly co-occurring chronic conditions, thus underestimating the compounding effects of other chronic conditions. As a result, the traditional focus on a single chronic condition limits our knowledge of the impact of co-occurring chronic conditions and sociodemographic and behavioral factors on PD.

To our knowledge, this is the first study to evaluate the patterns of co-occurrence of several factors that are associated with the presence and severity of PD using CTree, a novel machine learning method. This analytic approach allows us to discover the emerging combinations of the study factors without any prior hypotheses. CTree can capture complex relationships and produce an easily interpretable decision tree model that identifies specific combinations of the included factors highly associated with the presence and severity of PD. Random forest uses subsets of our data and bootstrapping to measure and rank the most important variables for our outcome. Although random forest cannot identify the most common combination of variables associated with PD, it can determine whether the top identified predictors agree with the most important variables that appear in the CTree models. Both machine learning methods detect interactions and nonlinear relationships among our variables automatically. Another important strength of this study is the availability of several chronic conditions and sociodemographic and behavioral factors in a nationally representative sample of the U.S. population, allowing us to obtain deeper insight into the possible etiologies involved in developing PD.

Our study has several limitations. First, with the use of cross-sectional data, it was not possible to capture worsening of chronic conditions or PD. Second, all the chronic conditions were self-reported, and only a small percentage of the population presented with chronic conditions. Third, CTree produces a single tree, whereas random forest is a bootstrap aggregation method that produces multiple trees. However, many variables identified in CTree were also identified as the most important ones in random forest. Fourth, our models recognized that sociodemographic and behavioral factors were stronger predictors for PD. However, for older people, for example, expanded models that include other types of variables, such as tooth loss and oral hygiene, may yield more accurate estimates and identify additional predictors for PD and moderate/severe PD.

   Conclusions

Sociodemographic and behavioral factors were better predictors for periodontal disease than chronic conditions. Hypertension and diabetes were the most important chronic conditions predicting the presence and severity of periodontal disease, respectively. Compared to chronic conditions, accounting for the co-occurrence of sociodemographic and behavioral factors is more informative when identifying people who are at heightened risk of developing PD.

Acknowledgements

None.

Financial support and sponsorship

None.

Conflicts of interest

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. The authors report no conflicts of interest related to this study.

Author’s contributions

Not applicable.

Ethical policy and institutional review board statement

This study was deemed research not involving human subjects by the Case Western Reserve University Institutional Review Board (#2021-0469).

Patient declaration of consent

Not applicable.

Data availability statement

Not applicable.

 

   References
1. Papapanou PN, Sanz M, Buduneli N, Dietrich T, Feres M, Fine DH, et al. Periodontitis: Consensus report of workgroup 2 of the 2017 world workshop on the classification of periodontal and peri-implant diseases and conditions. J Clin Periodontol 2018;45(Suppl 20):S162-S170.
2. Global Burden of Disease Study 2013 Collaborators. Global, regional, and national incidence, prevalence, and years lived with disability for 301 acute and chronic diseases and injuries in 188 countries, 1990–2013: A systematic analysis for the global burden of disease study 2013. Lancet 2015;386:743-800.
3. Hajishengallis G, Chavakis T. Local and systemic mechanisms linking periodontal disease and inflammatory comorbidities. Nat Rev Immunol 2021;21:426-40.
4. Hajishengallis G, Chavakis T, Lambris JD. Current understanding of periodontal disease pathogenesis and targets for host-modulation therapy. Periodontol 2000 2020;84:14-34.
5. Wu CZ, Yuan YH, Liu HH, Li SS, Zhang BW, Chen W, et al. Epidemiologic relationship between periodontitis and type 2 diabetes mellitus. BMC Oral Health 2020;20:204.
6. Sanz M, Marco Del Castillo A, Jepsen S, Gonzalez-Juanatey JR, D’Aiuto F, Bouchard P, et al. Periodontitis and cardiovascular diseases: Consensus report. J Clin Periodontol 2020;47:268-88.
7. Mariotti A, Hefti AF. Defining periodontal health. BMC Oral Health 2015;15 Suppl 1:S6.
8. Hajat C, Stein E. The global burden of multiple chronic conditions: A narrative review. Prev Med Rep 2018;12:284-93.
9. Boersma P, Black LI, Ward BW. Prevalence of multiple chronic conditions among US adults, 2018. Prev Chronic Dis 2020;17:E106.
10. Alqahtani HM, Koroukian SM, Stange K, Bissada NF, Schiltz NK. Combinations of chronic conditions, functional limitations and geriatric syndromes associated with periodontal disease. Fam Med Community Health 2022;10:e001733. doi: 10.1136/fmch-2022-001733. PMID: 35998996; PMCID: PMC9403150.
11. Tonetti MS, Greenwell H, Kornman KS. Staging and grading of periodontitis: Framework and proposal of a new classification and case definition. J Periodontol 2018;89 Suppl 1:159-72.
12. Hothorn T, Hornik K, Zeileis A. Unbiased recursive partitioning: A conditional inference framework. J Comput Graph Stat 2006;15:651-674.
13. Breiman L, Friedman JH, Olshen RA, Stone CJ. Classification and Regression Trees. Routledge; 2017. doi: 10.1201/9781315139470.
14. Yoo W, Ference BA, Cote ML, Schwartz A. A comparison of logistic regression, logic regression, classification tree, and random forests to identify effective gene-gene and gene-environmental interactions. Int J Appl Sci Technol 2012;2:268.
15. Eke PI, Thornton-Evans GO, Wei L, Borgnakke WS, Dye BA, Genco RJ. Periodontitis in US adults: National health and nutrition examination survey 2009–2014. J Am Dent Assoc 2018;149:576-588.e6.
    