A machine learning approach to predict foot care self-management in older adults with diabetes

The study had a cross-sectional design and aimed to identify predictors of lower levels of foot care self-management in older adults with diabetes using a machine learning approach. Data were collected between November 2023 and February 2024. The study was conducted and reported in line with the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guideline.

Study setting and sample

The study was conducted in a public hospital in Istanbul, Turkey, between November 2023 and January 2024. Patients admitted to the endocrinology and metabolic disorders department of the hospital were included, and convenience sampling was used. The inclusion criteria were: age 65 years or older, a diabetes diagnosis of at least six months, absence of cognitive disorders, and willingness to take part in the study. Patients with a current diagnosis or history of diabetic foot ulcers, or a history of amputation, were excluded. Forty-two patients were excluded because of a history of diabetic foot ulcers, and two patients were excluded because of cognitive impairment; both had been diagnosed with dementia by a neurologist, as confirmed in their electronic records. Written permission was obtained from the medical research ethics committee of Uskudar University under protocol number 2023-54.

Data collection

Data were collected by the researchers in the outpatient clinic of the Department of Endocrinology and Metabolic Disorders at a university hospital. A patient introduction form, developed by the researchers in accordance with the literature, captured the patients' sociodemographic and disease-related characteristics. The Foot Care Scale for Older Diabetics (FCS-OD) was used to assess foot care self-management skills in older adults.

Patient identification form

The form included 18 items and was developed by the researchers in accordance with the relevant literature [11,12,13,14]. The patient's age, gender, education, marital status, income level, place of residence, duration of diagnosis, type and duration of treatment, HbA1c level, smoking and alcohol consumption, comorbidities, hospital admissions, and follow-ups were recorded in the form. Additionally, patients self-assessed their overall health status and quality of life on a scale of 0 (worst) to 100 (best).

Foot Care Scale for Older Diabetics (FCS-OD)

The tool was developed by Sable-Morita et al. (2021) for use with older Japanese adults and has two versions: a long version and a short version. The long version comprises 22 items and six subscales: skin condition, nail clipping, attention to wounds, relationships with others, attention to feet, and self-efficacy. The short version consists of nine of the 22 items and four subscales: skin condition, nail clipping, attention to wounds, and relationships with others. Cronbach's alpha was 0.797 for the short version and 0.879 for the long version [14]. The scale was adapted to the Turkish population by Toygar et al. (2024). The long version was used in this study to predict the foot care self-management ability of older adults. No cut-off score has been reported for the scale; therefore, the sample's mean score was used as the reference point for identifying patients with low levels of self-management.

Data analysis

The study presents the sociodemographic and disease-related characteristics of the patients using percentages (%), numbers (n), means (M), and standard deviations (SD). Chi-square and independent-samples t-tests were used to compare the sociodemographic and disease-related characteristics of patients whose FCS-OD scores fell below versus above the average. IBM SPSS v27 was used to compare the frequencies and mean scores between the groups.
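For illustration only, equivalent group comparisons could be reproduced outside SPSS. The following is a minimal sketch in Python using scipy, assuming a hypothetical DataFrame with a below-average-score indicator `low_fcs` and example columns `gender` and `age`; all file and column names are illustrative, not from the study's dataset.

```python
# Minimal sketch of the group comparisons; the study itself used IBM SPSS v27.
# Column and file names below are hypothetical.
import pandas as pd
from scipy import stats

df = pd.read_csv("foot_care_data.csv")  # hypothetical file

# Chi-square test: categorical characteristic vs. below/above-average FCS-OD group
contingency = pd.crosstab(df["gender"], df["low_fcs"])
chi2, p_chi, dof, _ = stats.chi2_contingency(contingency)

# Independent-samples t-test: continuous characteristic between the two groups
below = df.loc[df["low_fcs"] == 1, "age"]
above = df.loc[df["low_fcs"] == 0, "age"]
t_stat, p_t = stats.ttest_ind(below, above)

print(f"chi-square: {chi2:.2f} (p={p_chi:.3f}); t-test: {t_stat:.2f} (p={p_t:.3f})")
```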

The machine learning analyses were conducted on Ddsv4-series Azure Virtual Machines with 32 vCPUs and 128 GiB of memory. The results and parameters of the best model were obtained from analyses conducted in Azure Automated ML. Three models were used to predict foot care self-management: XGBoost, LightGBM, and Random Forest.

Predictive model performance was evaluated with 10-fold cross-validation on a 70% training and 30% test split [15], and the parameters of all ML algorithms were tuned using a hyperparameter optimization method [16].
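As a hedged sketch of this protocol (the study itself used Azure Automated ML, not scikit-learn), the split, 10-fold cross-validation, and hyperparameter search could look as follows; the model choice, parameter grid, and scoring metric are illustrative assumptions.

```python
# Sketch: 70/30 train-test split, 10-fold CV, and hyperparameter search.
# Synthetic stand-in data replaces the study's patient dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=200, n_features=10, random_state=42)  # stand-in data

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42, stratify=y  # 70% train / 30% test
)

param_grid = {"n_estimators": [100, 300], "max_depth": [None, 5, 10]}  # illustrative grid
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=10,              # 10-fold cross-validation on the training set
    scoring="roc_auc",
)
search.fit(X_train, y_train)

print("best CV AUC:", search.best_score_)
print("held-out test accuracy:", search.best_estimator_.score(X_test, y_test))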

XGBoost

In the context of gradient boosting for regression, the fundamental building blocks are regression trees. Each regression tree maps an input data point to one of its leaf nodes, where a continuous score is assigned. XGBoost regularizes its objective function with both L1 and L2 terms, which control the complexity of individual trees, mitigating overfitting and promoting model generalization.

The objective function unites a convex loss function, responsible for quantifying the disparity between predicted and target outputs, with a penalty term aimed at addressing model complexity, specifically the functions represented by the regression trees.
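Concretely, following the formulation in the original XGBoost paper (the symbols below are standard notation, not taken from this study), the regularized objective can be written as

$$\mathcal{L}(\phi)=\sum_{i} l\left(\hat{y}_i, y_i\right)+\sum_{k}\Omega(f_k),\qquad \Omega(f)=\gamma T+\tfrac{1}{2}\lambda\lVert w\rVert^{2},$$

where $l$ is a differentiable convex loss comparing the prediction $\hat{y}_i$ with the target $y_i$, $f_k$ is the $k$-th regression tree, $T$ is the number of leaves in a tree, and $w$ are the leaf weights; adding a term $\alpha\lVert w\rVert_{1}$ gives the L1 regularization mentioned above.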

The training process in XGBoost is iterative: new trees are added that predict the residuals (errors) of the previous trees, and these new trees are then combined with the existing ensemble to make the final prediction. The term 'gradient boosting' reflects the fact that XGBoost uses a gradient descent algorithm to minimize the loss when adding these new models [17] (Fig. 1).

Fig. 1 Training and classification phases of different machine learning models

XGBoost generally provides very high accuracy. Trees are optimized sequentially through gradient boosting, and the algorithm is fast thanks to its optimization techniques and hardware acceleration (e.g., GPU support). It can work with various loss functions and offers a streamlined model-building process. However, hyperparameter tuning can be complex and time-consuming, and its high flexibility carries a risk of overfitting if not carefully adjusted [18].
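For illustration, a minimal XGBoost classifier for a binary "below-average FCS-OD score" target might look as follows; all parameter values are assumptions for the sketch, not the settings selected by Azure Automated ML in this study.

```python
# Sketch of an XGBoost classifier with L1/L2-regularized trees.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)  # stand-in data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)

model = XGBClassifier(
    n_estimators=200,   # boosting rounds: trees added sequentially on residuals
    learning_rate=0.1,  # shrinkage applied to each new tree's contribution
    max_depth=4,        # limits individual tree complexity
    reg_alpha=0.1,      # L1 regularization on leaf weights
    reg_lambda=1.0,     # L2 regularization on leaf weights
    eval_metric="logloss",
)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```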

LightGBM

LightGBM is a high-performance gradient boosting algorithm that uses a tree-based learning approach. It was developed by Microsoft Research Asia as part of the Distributed Machine Learning Toolkit (DMTK) project in 2017 (source: https://lightgbm.readthedocs.io/en/latest). The algorithm offers several advantages over other boosting algorithms, including more effective handling of prediction problems involving big data, efficient use of resources (RAM), high prediction performance, and parallel learning. Its rapid processing speed is reflected in the "Light" in its name; in the paper "LightGBM: A Highly Efficient Gradient Boosting Decision Tree," it was found to be 20 times faster than other algorithms [19] (Fig. 1).

The algorithm provides high-speed training thanks to its histogram-based approach, which discretizes data points into bins and is especially effective on large datasets; it works very well with large, high-dimensional data. On small datasets, however, it may be less effective than other methods. Tuning its hyperparameters can be complex and requires careful optimization [18].
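A minimal LightGBM sketch, analogous to the XGBoost example above, is shown below; the parameters are illustrative assumptions, with `num_leaves` controlling leaf-wise tree growth and `max_bin` controlling the histogram-based binning just described.

```python
# Sketch of a LightGBM classifier using its histogram-based learning.
import lightgbm as lgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, n_features=10, random_state=0)  # stand-in data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)

model = lgb.LGBMClassifier(
    n_estimators=200,
    learning_rate=0.1,
    num_leaves=31,  # leaf-wise growth: maximum leaves per tree
    max_bin=255,    # histogram bins used to discretize feature values
)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```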

Random Forest

Random Forest is an ensemble learning algorithm widely used for both classification and regression tasks in machine learning. It works by building numerous decision trees during training and outputs the mode of the classes for classification tasks, or the mean prediction for regression tasks, across the individual trees [20] (Fig. 1).

Random Forest is a simple, easy-to-use algorithm, and its hyperparameters are easier to tune during model optimization than those of the other algorithms. It is robust to overfitting because averaging over multiple decision trees increases the model's generalization ability, and because each tree is built independently, the trees can be computed in parallel, reducing computation time. On the other hand, it can be slow on large datasets because it builds a large number of decision trees, and with many trees the model can become large, increasing memory usage [18].
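For illustration, a minimal Random Forest sketch is given below; `n_jobs=-1` exploits the tree independence noted above to build trees in parallel, and all parameter values are assumptions, not the study's selected settings.

```python
# Sketch of a Random Forest classifier averaging many independent trees.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, n_features=10, random_state=0)  # stand-in data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)

model = RandomForestClassifier(
    n_estimators=500,     # many independent trees, combined by majority vote
    max_features="sqrt",  # random feature subset considered at each split
    n_jobs=-1,            # build trees in parallel across CPU cores
    random_state=0,
)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```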
