Auxiliary Diagnosis of Children With Attention-Deficit/Hyperactivity Disorder Using Eye-Tracking and Digital Biomarkers: Case-Control Study


Introduction

Background

Attention-deficit/hyperactivity disorder (ADHD) is a common neurodevelopmental disorder in school-aged children, characterized by deficits in attention, hyperactivity, and impulsivity. Globally, the estimated prevalence of ADHD in children and adolescents is approximately 5.29% []; in China, the prevalence is approximately 6.4% []. People with ADHD typically exhibit deficiencies in various cognitive domains, and these symptoms can persist into adolescence and adulthood, which can result in academic underachievement and societal issues, such as substance abuse and violence []. Therefore, early identification, diagnosis, and intervention for ADHD are essential.

Despite recent advances, the diagnosis of ADHD relies heavily on subjective judgments based on observations of children’s behavior. This can lead to both overdiagnosis and underdiagnosis, as well as inappropriate treatments. Therefore, there is an urgent need to develop methods to identify reliable ADHD biomarkers. Furthermore, given that poor academic performance is the most common concern of individuals with ADHD, it is crucial to improve awareness and understanding of ADHD among parents and teachers to ensure timely identification. However, most nonmedical professionals cannot be expected to gain specialized medical expertise, and physicians cannot frequently visit campuses to aid in ADHD assessment. This situation has resulted in delays in diagnosing children with ADHD. Developing mobile screening equipment will enable on-campus ADHD screening and thus facilitate timely identification and diagnosis.

Eye-tracking technology is particularly suited for the assessment and diagnosis of ADHD because it offers an objective measurement of children’s neuropsychological behavior. Studies have shown that there is a significant overlap between the neural networks responsible for attention and those responsible for eye-movement control []. Children with ADHD experience difficulties with spatial perception and visual-motor integration [], and these neurophysiological features associated with ADHD can be identified using eye-tracking assessments. In addition, children with ADHD often find lengthy and complex assessments challenging, particularly if they are required to wear additional equipment. Eye-tracking technology surpasses other neurophysiological techniques in its ability to record the neuropsychological activity of participants in a more natural setting []. This leads to better cooperation of children during assessments and higher reliability and generalizability of results.

Recent advances in computational psychiatry have enabled the extraction of eye-tracking metrics to discern behavioral alterations in children with ADHD [-]. These metrics encompass various aspects of visual attention, such as fixation duration, saccade velocity, and gaze entropy [-], which may serve as digital biomarkers for neurodevelopmental disorders [,]. By analyzing the temporal and spatial characteristics of eye movements, computational models can capture differences in visual behaviors between ADHD and typically developing (TD) children. Machine learning (ML) techniques have emerged as powerful tools for processing and interpreting large amounts of eye-tracking data [-]. Training ML models on labeled eye-tracking metrics has allowed the construction of robust and accurate classifiers to identify whether individuals belong to an ADHD or a TD group. Precise eye-tracking measurements and digital biomarkers hold great promise as objective and automated screening tools for ADHD, which will facilitate the development of early intervention strategies and improve the clinical outcomes of affected children [,,]. Moreover, the evolution of mobile eye-tracking technology and devices, coupled with portable computing sources, such as smartphones and tablets, will allow the implementation of eye-tracking assessments in various scenarios and thus address the need for ADHD screening in the community [-].

Related Work

Neuroimaging studies have shown that children with ADHD have multidimensional brain function abnormalities. The impairment of inhibitory control is a fundamental factor contributing to cognitive and executive functioning deficiencies in individuals with ADHD []. However, these individuals also have motor coordination difficulties, poorer spatial perception [-], reduced auditory sensitivity, and problems with attentional integration of audiovisual stimuli [].

Recently, there has been growing interest in using eye-tracking technology to study the neurophysiological features of ADHD. A meta-analysis of the various behavioral tests developed over the last 5 decades to evaluate eye movement and cognitive control [] revealed that eye-tracking evaluations of children with ADHD yielded the most reliable and consistent outcomes when eliminating bias. Most of these tests focused on saccades, which are one of the most crucial types of eye movement. Children with ADHD perform significantly worse than TD children across all tasks, with greater variability for each metric in the antisaccade task [].

To ensure that the screening method is appropriate for children with ADHD, we must use a paradigm that is brief and simple to perform yet capable of highlighting cognitive deficits. In addition, the extracted eye-movement metrics should be able to comprehensively characterize children’s task performance. Several recent studies have used eye tracking to explore the characteristics of ADHD. Lemel et al [] incorporated spoken-word recognition accuracy, gaze duration, and the number of transitions in response to a phonological competitor to analyze spoken-word processing in adverse listening conditions in individuals with ADHD. However, this paradigm was complex and required word recognition and was thus more suited to adult patients. Another study used a paradigm to assess children’s working memory; however, the task took 30 minutes to complete [], which is not conducive to task completion in children with ADHD. Siqueiros et al [] used the antisaccade task, which is a simple and reliable paradigm that suits children. However, only directional errors and expected eye movements were assessed; moreover, the paradigm was not sufficiently comprehensive to assess children’s task performance.

Objectives

Studies conducted to date have provided valuable insight into automatic screening approaches for ADHD in children using eye-tracking devices. However, these studies have drawbacks that have hindered the development of a more robust and accurate auxiliary diagnostic system. For example, the paradigms were too time-consuming or complex for clinical ADHD screening, and the extracted metrics were not sufficiently comprehensive. ML models used in previous studies have typically achieved only modest accuracy and sensitivity, which limits clinical applicability. Furthermore, small sample sizes have limited the robustness of the results.

To address the aforementioned challenges, we aimed to develop an accurate and reliable auxiliary diagnostic system for ADHD in children using eye-tracking technology. Specifically, the objectives of this study were as follows:

To design an eye-tracking assessment paradigm that is easy to implement and can identify differences in eye-movement patterns between children with ADHD and TD children.

To extract effective eye-tracking metrics as digital biomarkers that quantitatively represent various aspects of eye-movement behaviors and use these biomarkers to construct and validate ML models to enable automatic screening of children for ADHD.

To achieve high accuracy and reliability of the ML model using a large dataset, which will facilitate early screening for ADHD and timely intervention for children with ADHD and thus contribute to improving the effectiveness of the health care system.

Methods

Participants

To ensure the representativeness of the ADHD and TD groups in this case-control study, we recruited participants from hospitals and schools separately. Children with ADHD were recruited from an outpatient clinic at a public pediatric hospital in Shanghai, China, whereas TD children were recruited from 2 general public elementary schools in Shanghai (one from an urban area and another from a suburban area). The children were divided into 3 age groups: group 1 (5-6 years), group 2 (7-8 years), and group 3 (9-10 years).

The inclusion criteria for the ADHD group were children in grades 1 to 3 with a clinical diagnosis of ADHD who were not currently receiving treatment. The inclusion criteria for the TD group were children in grades 1 to 3 with a negative assessment on the Swanson, Nolan, and Pelham Rating Scale (SNAP-IV) [].

The exclusion criteria were children with a full-scale score of <75 on the Wechsler Intelligence Scale for Children; children who had a history of severe traumatic brain injury, neurological disorders, severe physical illnesses, and psychiatric illnesses (eg, mood disorders and schizophrenia); and those unable to undergo eye-tracking examinations.

From December 2022 to April 2023, a total of 100 children with a clinical diagnosis of ADHD were recruited. Of these, 4 participants with a history of severe traumatic brain injury, neurological disorders, and other severe physical and psychiatric disorders and 2 participants who were unable to tolerate the eye-tracking assessment were excluded. This resulted in 94 participants in the ADHD group.

A total of 150 children were randomly selected as the TD group. Of these, 15 children refused to participate in the program. In addition, 2 children with a history of severe traumatic brain injury, neurological disorders, and other severe physical and psychiatric disorders and 11 children who were considered to have ADHD after the interviews and evaluations were excluded. Finally, 122 children were included in the study as the TD control group.

All personnel involved in administering the assessments in this study were full-time child health practitioners who had been working in child health care for more than 3 years. Standardized survey administration training was provided before the tests were administered.

Ethical Considerations

Before the assessment began, the purpose of the project was explained to the children and their guardians, and written informed consent was obtained from the guardians. All participants could withdraw at any stage of the study. Interviews were then conducted with the guardians to gather data on the basic conditions of the children. Children who fulfilled the inclusion and exclusion criteria were formally enrolled in the study and underwent the SNAP-IV and eye-tracking assessments. All data will be stored in a deidentified form. No participants will receive any benefit from participating in this study, but they will receive a booklet reporting the results of the assessments involved in this study as a souvenir.

The study protocol and informed consent form were approved by the Shanghai Children’s Hospital Institutional Review Board (2022R126-F01).

Paradigm Design

Overview

Eye movements were recorded at a sampling rate of 1200 Hz using the Tobii Pro Spectrum eye tracker (Tobii Pro AB), a screen-based eye tracker that captures eye movements and pupillary responses. Visual stimuli were presented at a screen response rate of <5 milliseconds on a 24-inch monitor with a resolution of 1920×1080 pixels (16:9 ratio). The Tobii Pro Lab software (version 1.194; Tobii Pro AB) was used to set up the experiment.

The assessment procedure was performed in a quiet room with only 1 overhead light source (Figure 1). Participants were seated in a special seat with a chest shield to limit upper body movement and help stabilize the head. The cushion was adjusted to ensure that the center of the screen was at the same level as the participant’s head. Participants were positioned so that they could not observe the assessor’s screen or operations, which minimized distractions. Participants maintained a distance of 65 cm from the screen and began the formal assessment following a 5-point calibration. Before each task, a prompt screen appeared, and the assessor provided detailed instructions to ensure that the participant fully understood the task content before proceeding with formal testing.

Figure 1. Eye-tracking assessment scenario settings.

During the assessment, participants were asked to complete 3 saccade tasks sequentially (Figure 2): prosaccade, antisaccade, and delayed saccade. The stimulus was 5 cm high and 5 cm wide and appeared randomly on the left or right side of the screen. A central fixation cross was displayed in the middle of the screen, and the stimulus was presented at an eccentricity of 7°, 15°, or 20° from the cross, giving 6 possible positions (3 eccentricities × 2 sides). Within each task, the stimulus appeared twice at each of these 6 positions in random order.
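The stimulus geometry can be illustrated with a short sketch that converts the stated eccentricities into horizontal offsets on the screen. It assumes a flat screen perpendicular to the line of sight, so that offset = distance × tan(angle); the function and constant names are hypothetical and are not taken from the study software.

```python
import math

VIEWING_DISTANCE_CM = 65.0        # viewing distance stated in the text
ECCENTRICITIES_DEG = (7, 15, 20)  # eccentricities stated in the text


def eccentricity_to_offset_cm(angle_deg, distance_cm=VIEWING_DISTANCE_CM):
    """Horizontal offset from the fixation cross for a given visual angle.

    Assumption: flat screen, perpendicular to the line of sight.
    """
    return distance_cm * math.tan(math.radians(angle_deg))


def stimulus_centers_cm():
    """Signed offsets of the 6 stimulus positions (negative = left of the cross)."""
    offsets = [eccentricity_to_offset_cm(a) for a in ECCENTRICITIES_DEG]
    return sorted([-o for o in offsets] + offsets)
```

Under this assumption, the 20° position lies roughly 23.7 cm from the cross, so a 5 cm stimulus centered there would still fit within the half-width of a 24-inch 16:9 monitor (about 26.6 cm).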

Figure 2. The eye-tracking assessment paradigm.

Prosaccade Task

Prosaccade, also known as reflexive saccade or visually guided saccade, is an abrupt eye movement triggered by the sudden appearance of a stimulus []. It is primarily induced by exogenous stimuli and serves as a baseline measure. In the prosaccade task, participants were instructed to initially fixate on the central fixation cross. After 1500 milliseconds, a stimulus appeared randomly in one of the aforementioned 6 positions. Participants were required to quickly shift their gaze toward the stimulus. Once participants fixated on the stimulus area (SA) for more than 300 milliseconds, the next trial was started automatically.

Antisaccade Task

In the antisaccade task, participants were required to first fixate on the central fixation cross. After 1500 milliseconds, 1 stimulus appeared randomly in one of the 6 aforementioned positions. Participants were required to quickly shift their gaze to the target area (TA), which was the location symmetrically opposite to the stimulus relative to the central fixation cross. Upon maintaining fixation at the TA for more than 300 milliseconds, a white feedback cross automatically appeared at the TA position to indicate success before proceeding to the next trial. If the participant decided to abandon the trial, the assessor pressed the space bar to skip the trial, and a white cross was displayed at the TA before moving on to the next trial. Previous studies have used a paradigm in which the central fixation cross disappears when the stimulus is presented []. However, this can make accurately localizing the TA more challenging, which may result in children being unable to complete the task. Therefore, in this study, the central cross was retained to assist participants in locating the TA.

Delayed Saccade Task

The delayed saccade task, based on the go–no-go paradigm [], was adapted to the cognitive abilities of children with ADHD. This task not only directly assesses inhibition but also requires participants to combine auditory discrimination and visuomotor modulation. Thus, the task assesses the multisensory integration and coordination capacity of individuals with ADHD. During the task, participants were instructed to fixate on the central fixation cross. After 1500 milliseconds, 1 stimulus appeared randomly in one of the 6 aforementioned positions. Participants were asked to maintain fixation on the central cross until they heard a sound cue after 1000 milliseconds, after which they were required to shift their gaze toward the SA as fast as possible. Then, after another 3000 milliseconds, the next trial was started automatically.

For each saccade task, there were 12 formal trials (2 trials for each position). Before the formal test, practice trials were provided, where stimuli were presented randomly in the 6 positions, to allow participants to familiarize themselves with the task.

Area of Interest Division Across Tasks

To quantify the eye movements made during the different tasks, we divided the area viewed by participants into different areas (Figure 3): the TA, the SA, the center area (CA), the unrelated area (UA), the proper-side area (PSA), and the wrong-side area (WSA). The TA represented the area that participants were required to fixate on, and the SA represented the area of the stimulus. For the delayed saccade task, we further divided the TA into TA during the proper period (TA-P) and TA during the wrong period (TA-W) to represent fixations on the TA in the proper or wrong time periods, respectively (Figure 4). The TA and SA were the same in the prosaccade and delayed saccade tasks, whereas in the antisaccade task, they were horizontally symmetrical. The CA represented a 5 cm × 5 cm area around the central fixation cross. The UA was unrelated to the task requirements and was expected to attract minimal attention during the tasks. The PSA and WSA were defined for the antisaccade task only and represented the areas on the proper and wrong sides of the CA, respectively.

Figure 3. Illustration of the division of areas for extracting area-based eye-tracking metrics. CA: center area; PSA: proper-side area; SA: stimulus area; TA: target area; UA: unrelated area; WSA: wrong-side area.

Figure 4. The different completion statuses in the delayed saccade task. From 0 to 1500 milliseconds, participants were asked to gaze at the center area (shaded area in a). If fixation fell into the shaded area in b, this indicated the occurrence of an intrusive saccade. From 1500 to 2500 milliseconds, participants were asked to maintain their fixation on the center area (shaded area in c) until they heard the cue. Thus, if fixation fell into the shaded area in d during this period, this was defined as a target area during the wrong period fixation (ie, saccade to the target area [TA] but during the wrong period). At 2500 milliseconds, the sound cue was presented, and participants were required to fixate on the TA (shaded area in e) as fast as possible. Fixation on the shaded area after 2500 milliseconds was defined as a target area during the proper period fixation (ie, saccade to the TA during the proper period).

Extraction of Digital Biomarkers

Overview

On the basis of the eye-tracking paradigm, we calculated 28 digital biomarkers from the raw data recorded by the eye tracker. These biomarkers quantitatively reflect various behaviors of participants during the task and were divided into 5 categories: general metrics (8/28, 29%), pupil-based metrics (4/28, 14%), area-based metrics (11/28, 39%), search-based metrics (3/28, 11%), and entropy-based metrics (2/28, 7%). For each assessment trial, we recorded 4 trial attributes (ie, task: prosaccade, antisaccade, or delayed saccade; target side: left or right; target eccentricity: 7°, 15°, or 20°; and trial order: first or second) and 6 participant attributes (ie, name, ID, category [ADHD or TD], sex [male or female], age, and age group). Table 1 summarizes these biomarkers in terms of category, symbol, description, and task.

Table 1. Descriptions of the digital biomarkers.

| Category and symbol | Description | Task |
| --- | --- | --- |
| General metrics | | |
| NFix. | Total number of fixations | All^a |
| NSac. | Total number of saccades | All |
| TTotal | Total duration of the trial | All |
| TFix. Avg. | Average fixation duration | All |
| TSac. Avg. | Average saccade duration | All |
| VSac. Avg. | Average saccade velocity | All |
| VSac. Peak | Peak value of saccade velocity | All |
| ASac. Avg. | Average saccade amplitude | All |
| Pupil-based metrics | | |
| DPupil Avg. | Average pupil diameter | All |
| DPupil Max. | Maximum pupil diameter | All |
| DPupil Min. | Minimum pupil diameter | All |
| DPupil Sd. | SD of pupil diameter | All |
| Area-based metrics | | |
| BTA Fix. | Boolean value to signify the occurrence of fixations in the TA^b (TA-P^c for the delayed saccade task) | All |
| LTA Fix. | Fixation latency of the TA (TA-P for the delayed saccade task) | All |
| NUA Fix. | Number of fixations in the UA^d | P^e and A^f |
| NTA Fix. | Number of fixations in the TA for the whole period | D^g |
| NTA-P Fix. | Number of fixations in the TA for the proper period | D |
| NTA-W Fix. | Number of fixations in the TA for the wrong period | D |
| NSA Fix. | Number of fixations in the SA^h | A |
| BPSA Fix. | Boolean value to signify the occurrence of fixations in the PSA^i | A |
| BWSA Fix. | Boolean value to signify the occurrence of fixations in the WSA^j | A |
| BPSA Fix. 1st | Boolean value to signify whether the first fixation was located in the PSA | A |
| BIntrusive Sac. | Boolean value to signify the occurrence of an intrusive saccade during the center fixation period | D |
| Search-based metrics | | |
| BSearch | Boolean value to signify the occurrence of the search behavior | A |
| NSearch | Number of search behavior occurrences | A |
| TSearch | Total duration of search behavior | A |
| Entropy-based metrics | | |
| SGEnorm | Normalized stationary gaze entropy | All |
| GTEnorm | Normalized gaze transition entropy | All |

^a All: all tasks, including prosaccade, antisaccade, and delayed saccade tasks.

^b TA: target area.

^c TA-P: target area during the proper period in the delayed saccade task.

^d UA: unrelated area.

^e P: prosaccade task.

^f A: antisaccade task.

^g D: delayed saccade task.

^h SA: stimulus area.

^i PSA: proper-side area.

^j WSA: wrong-side area.

General Metrics

Human eye-movement patterns can be divided into fixations, saccades, and pursuits [], of which the first 2 are the focus of our paradigm. Using the Tobii Pro Lab software, we extracted the fixations and saccades of participants in chronological order from the raw gaze data. Subsequently, we calculated the total number of fixations (NFix.) and saccades (NSac.) and their average durations (TFix. Avg. and TSac. Avg.), which reflect participants’ holistic visual behavior. The velocity and amplitude of saccades were automatically recorded by the software. We calculated the average and peak saccade velocity (VSac. Avg. and VSac. Peak) and the average saccade amplitude (ASac. Avg.) for each trial. These values reflect the scanning and information retrieval process, respectively. In addition, the total time taken for each trial (TTotal) was recorded.

Pupil-Based Metrics

Pupil size is a crucial physiological measure that reflects autonomic nervous system activity, cognitive load, and emotional arousal. It has been applied extensively in various research fields [-]. The eye tracker continuously recorded participants’ pupil diameter during each trial. We preprocessed the raw data and extracted pupil-based metrics in 5 steps (Textbox 1) [].

Textbox 1. Preprocessing of raw data and extraction of pupil-based metrics.

Step 1: We removed samples labeled by the eye tracker as “invalid” and pupil diameters that fell outside the feasible range of 1.5 to 9.0 mm.

Step 2: We calculated pupil dilation speed to remove samples with a disproportionately large change in pupil size, which was usually caused by blinks or system errors. Because of the inconsistent sampling intervals, pupil diameter changes were not directly comparable between adjacent samples. Therefore, we calculated the normalized dilation speed between samples using the formula:

$$ s_i = \max\left( \left| \frac{p_i - p_{i-1}}{t_i - t_{i-1}} \right|, \left| \frac{p_{i+1} - p_i}{t_{i+1} - t_i} \right| \right), \tag{1} $$

where $p_i$ and $t_i$ are the pupil diameter sequence and timestamp sequence, respectively. To detect outliers in the dilation speed sequence ($s_i$), we calculated the threshold, $T$, using the median absolute deviation (MAD):

$$ \mathrm{MAD} = \operatorname{median}\left( \left| s_i - \operatorname{median}(s_i) \right| \right), \tag{2} $$

$$ T = \operatorname{median}(s_i) + n \cdot \mathrm{MAD}, \tag{3} $$

where the scalar $n$ was chosen as 1.5. Samples with an $s_i$ larger than $T$ were removed as outliers. Because the eye tracker simultaneously collected data from both the left and right pupils, we performed steps 1 and 2 for each pupil separately.

Step 3: We excluded samples in which data of 1 pupil was missing and calculated the mean data sequence of the left and right pupil diameters.

Step 4: Because of nonuniform sampling and the presence of noise, we used a size 20 sliding window to resample and smooth the data sequence at 500 Hz. This involved an exponential moving average based on the timestamp and skipped data gaps ≥50 milliseconds.

Step 5: Following the above preprocessing steps, we obtained a valid, uniform, and smooth sequence of pupil diameter data. We then calculated the average (DPupil Avg.), maximum (DPupil Max.), minimum (DPupil Min.), and SD (DPupil Sd.) pupil diameter values of the sequence for each trial, which reflect various aspects of the pupil state of participants.
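Steps 1 and 2 can be sketched for a single eye as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the function names are hypothetical, samples flagged "invalid" by the eye tracker are assumed to have been dropped already, timestamps are assumed to be strictly increasing, and the first and last samples use only the one neighbor they have.

```python
from statistics import median


def dilation_speeds(diameters, timestamps):
    """Normalized dilation speed s_i (equation 1) for each sample.

    Assumption: edge samples use only the single neighbor available.
    """
    n = len(diameters)
    speeds = []
    for i in range(n):
        candidates = []
        if i > 0:
            candidates.append(abs((diameters[i] - diameters[i - 1])
                                  / (timestamps[i] - timestamps[i - 1])))
        if i < n - 1:
            candidates.append(abs((diameters[i + 1] - diameters[i])
                                  / (timestamps[i + 1] - timestamps[i])))
        speeds.append(max(candidates) if candidates else 0.0)
    return speeds


def filter_pupil_samples(diameters, timestamps, n_mad=1.5,
                         min_mm=1.5, max_mm=9.0):
    """Steps 1 and 2 for one eye: range check, then MAD-based speed filter."""
    # Step 1: keep samples inside the feasible diameter range.
    kept = [(p, t) for p, t in zip(diameters, timestamps)
            if min_mm <= p <= max_mm]
    if len(kept) < 2:
        return kept
    ps, ts = zip(*kept)
    # Step 2: threshold T = median(s) + n * MAD (equations 2 and 3).
    s = dilation_speeds(ps, ts)
    med = median(s)
    mad = median(abs(x - med) for x in s)
    threshold = med + n_mad * mad
    # Samples with s_i larger than T are removed as outliers.
    return [(p, t) for p, t, si in zip(ps, ts, s) if si <= threshold]
```

Because equation 1 takes the larger of the two neighboring speeds, a single spike also raises the speed of the samples on either side of it, so those neighbors tend to be removed along with the spike.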

Area-Based Metrics

We extracted a range of metrics according to the area of interest (AOI) divisions. A Boolean value for fixation incidence (BTA Fix.) was recorded to signify the completion of the task by detecting whether the TA (or TA-P for the delayed saccade task) contained any fixations. The latency of the first fixation in the TA (or TA-P) was recorded as the fixation latency (LTA Fix.). The number of fixations was counted for the SA (only in the antisaccade task), UA (in the prosaccade and antisaccade tasks), TA-P (only in the delayed saccade task), and TA-W (only in the delayed saccade task), which were denoted as NSA Fix., NUA Fix., NTA-P Fix., and NTA-W Fix., respectively. For the delayed saccade task, fixations outside of the CA during the center fixation period were defined as intrusive saccades and thus recorded as a Boolean value (BIntrusive Sac.). For the antisaccade task, if fixations were detected in the PSA (BPSA Fix.) or WSA (BWSA Fix.), these were recorded as Boolean values. We also used a Boolean metric to signify that the first fixation that occurred after the stimulus appeared was located in the PSA (BPSA Fix. 1st).
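As a minimal sketch of how such area-based metrics could be derived for one trial, the following assumes that each fixation has already been assigned an area label and an onset time relative to stimulus onset. The record layout, function name, and dictionary keys are hypothetical.

```python
def area_metrics(fixations, target_label="TA"):
    """Fixation incidence, fixation latency, and per-area counts for a trial.

    `fixations` is a chronological list of (onset_ms, area_label) pairs, with
    onset measured from stimulus onset (hypothetical record layout).
    """
    counts = {}
    latency = None
    for onset_ms, label in fixations:
        counts[label] = counts.get(label, 0) + 1
        if label == target_label and latency is None:
            latency = onset_ms  # first fixation that lands in the target area
    return {
        "B_TA_Fix": latency is not None,  # task completed if the TA was fixated
        "L_TA_Fix": latency,              # None when the TA was never fixated
        "N_Fix_by_area": counts,
    }
```

For the delayed saccade task, the same function could be called with `target_label="TA-P"`, provided fixations are labeled by period as well as by location.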

Search-Based Metrics

During the antisaccade task, participants may have had difficulty determining the correct fixation position, which may have led to a series of consecutive fixations around the TA before finally reaching the TA. To capture this, we detected fixations that fell in the surrounding area, that is, outside the TA but within a distance of 1.5 ∙ LTA from the TA center, where LTA is the length of the TA edge. Consecutive sequences of ≥2 such fixations were then extracted as search behaviors. For each antisaccade trial, we recorded the following search-based metrics: the occurrence of search behaviors (BSearch), the number of search behaviors (NSearch), and their total duration (TSearch).

Successful antisaccade trials required both a reversed saccade as well as an accurate landing position. Therefore, these metrics based on search behavior represent participants’ vision control and distance perception abilities.
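The search-behavior rule can be sketched as a run-length scan over the fixation sequence. The sketch below rests on assumptions that the text does not settle: distance is taken as Euclidean distance to the TA center, the TA is treated as an axis-aligned square, and the duration of a search behavior is taken as the sum of its fixation durations. All names are hypothetical.

```python
import math


def search_metrics(fixations, ta_center, ta_edge, min_run=2):
    """Return (B_Search, N_Search, T_Search) for one antisaccade trial.

    `fixations` is a chronological list of (x, y, duration_ms) tuples in the
    same length unit as `ta_center` and `ta_edge` (hypothetical layout).
    """
    cx, cy = ta_center
    half = ta_edge / 2.0

    def in_surround(x, y):
        inside_ta = abs(x - cx) <= half and abs(y - cy) <= half
        # Assumption: "distance" means Euclidean distance to the TA center.
        return (not inside_ta) and math.hypot(x - cx, y - cy) <= 1.5 * ta_edge

    runs, current = [], []
    for x, y, duration in fixations:
        if in_surround(x, y):
            current.append(duration)
        else:
            if len(current) >= min_run:
                runs.append(sum(current))
            current = []
    if len(current) >= min_run:  # a run may end with the last fixation
        runs.append(sum(current))

    return bool(runs), len(runs), sum(runs)
```

A single fixation in the surrounding area is not counted, which matches the requirement of at least 2 consecutive fixations.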

Entropy-Based Metrics

Drawing on the concept of entropy in information theory [], gaze entropy quantifies the degree of uncertainty or predictability exhibited by the human eye during visual exploration. Thus, gaze entropy can provide valuable insight into the cognitive processes involved in visual perception and attention. There are 2 types of gaze entropy: stationary gaze entropy (SGE) and gaze transition entropy (GTE) []. SGE evaluates the spatial distribution of fixations, with a higher value indicating a more dispersed eye-movement pattern []. GTE focuses on the randomness of eye movements between fixations and reflects the flexibility and complexity of the scanning pattern.

As shown in Figure 5, the images were divided into $n$ different areas, which served as the individual state spaces of a discrete system. We calculated the proportion of fixations located in each area, denoted as $p_i$ for the i-th area, which formed the approximate probability distribution of the states [,]. On the basis of the entropy equation by Shannon [], SGE was calculated as follows:

$$ \mathrm{SGE} = -\sum_{i} p_i \log_2 p_i. \tag{4} $$

Figure 5. Division of areas for the calculation of gaze entropy metrics. It should be noted that the areas here are different from those for the area-based metrics shown in Figure 3.

Applying the first-order Markov transition matrix [], we derived p(j|i) from the fixation sequence, which represented the conditional probability of a gaze transitioning from the i-th to the j-th area. Then, GTE was computed based on the conditional entropy equation [,] as follows:

$$ \mathrm{GTE} = -\sum_{i} p_i \sum_{j} p(j \mid i) \log_2 p(j \mid i). \tag{5} $$

The maximum entropy of a system is determined by the number of available state spaces, which occurs when they are equally distributed []. To enable a comparison between different tasks, we used the corresponding maximum value, $H_{\max} = \log_2 n$, to normalize the computed SGE and GTE into a range from 0 to 1:

$$ \mathrm{SGE}_{\mathrm{norm}} = \frac{\mathrm{SGE}}{\log_2 n}, \tag{6} $$

$$ \mathrm{GTE}_{\mathrm{norm}} = \frac{\mathrm{GTE}}{\log_2 n}. \tag{7} $$

As introduced earlier, n represents the number of areas, where n=6 for the prosaccade and delayed saccade tasks, and n=8 for the antisaccade task.
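Equations 4 to 7 can be sketched in a few lines, assuming each fixation has already been mapped to an area index. The function name is hypothetical. The text does not state whether consecutive fixations within the same area count as transitions; this sketch keeps them, which is an assumption.

```python
import math
from collections import Counter


def gaze_entropies(area_sequence, n_areas):
    """Return normalized (SGE, GTE) for a chronological list of area indices."""
    total = len(area_sequence)
    if total == 0 or n_areas < 2:
        return 0.0, 0.0
    h_max = math.log2(n_areas)  # maximum entropy, log2(n)

    # Stationary distribution p_i: share of fixations in each area (equation 4).
    visit_counts = Counter(area_sequence)
    p = {area: c / total for area, c in visit_counts.items()}
    sge = -sum(pi * math.log2(pi) for pi in p.values())

    # First-order transition counts i -> j between consecutive fixations.
    pairs = list(zip(area_sequence, area_sequence[1:]))
    transitions = Counter(pairs)
    outgoing = Counter(i for i, _ in pairs)

    # Conditional entropy weighted by p_i (equation 5).
    gte = 0.0
    for (i, j), c in transitions.items():
        p_j_given_i = c / outgoing[i]
        gte -= p[i] * p_j_given_i * math.log2(p_j_given_i)

    # Normalization by log2(n) (equations 6 and 7).
    return sge / h_max, gte / h_max
```

For example, a gaze that strictly alternates between 2 of 4 areas has a normalized SGE of 0.5 (1 bit out of a maximum of 2) and a GTE of 0, because each transition is fully predictable.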

Statistical Analysis

We reviewed and uniformly numbered basic information and scale data. After eliminating data with incomplete information, data were entered in duplicate using the Chinese version of EpiData 3.1 (The EpiData Association), and Excel (version 2019; Microsoft Corp) was used to clean and organize the data.

The Tobii Pro Lab software was used to analyze basic eye-movement metrics and export data. Participants with >80% valid data were included in the analysis. Python (version 3.8) was used to extract the eye-tracking metrics.

All data were tested for normality and homogeneity of variance. Samples conforming to a normal or approximately normal distribution are represented as means and SDs, and nonnormally distributed data are described as means and 95% CIs. Count data are expressed as n (%), and differences between groups were calculated using the chi-square test. For visual harmonization, 4 valid digits were retained for the eye-tracking metrics. We used independent samples 2-tailed t tests to compare normally distributed data between the 2 groups. To compare nonnormally distributed data between the 2 groups, we used the Wilcoxon Mann-Whitney U test, and the Kruskal-Wallis test was used to compare among multiple groups. Paired comparisons for significant multiple-group comparisons were performed using the Bonferroni method. A 2-sided P<.05 was considered statistically significant.
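As a worked check of the chi-square comparison for count data, the sketch below computes the Pearson statistic for a 2×2 table and applies it to the sex distribution reported in Table 2. The function name is hypothetical. The result agrees with the reported value when no continuity correction is applied, although the text does not say whether a correction was used.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


# Sex counts from Table 2: TD 61 male / 61 female, ADHD 84 male / 10 female.
chi2_sex = chi_square_2x2(61, 61, 84, 10)
```

Rounded to 2 decimal places, `chi2_sex` matches the 37.28 (df=1) reported for sex in Table 2.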

ML Analysis

Overview

To validate the effectiveness of the proposed digital biomarkers, we conducted an ML analysis of the eye-tracking metrics to classify the ADHD and TD groups. First, we preprocessed the extracted metrics to meet the requirements of ML analysis and sequentially performed variable filtering, model construction, and model evaluation to verify the effectiveness of the extracted biomarkers. To ensure the reliability and generalizability of the model, we applied 5-fold cross-validation.

Data Preprocessing

The eye-tracking metrics were subdivided into multiple variables according to trial attributes (ie, task, target eccentricity, target side, and trial order). For each metric, we performed an average calculation for the target side and trial order, while maintaining different values for different task types and target eccentricities. For example, the metric NFix. was obtained from the prosaccade, antisaccade, and delayed saccade tasks with 7°, 15°, and 20° target eccentricities, respectively, which were subdivided into 9 variables as follows: P7NFix., P15NFix., P20NFix., A7NFix., A15NFix., A20NFix., D7NFix., D15NFix., and D20NFix. This ensured that the variability of the metrics would be reasonably preserved. The preprocessing resulted in 183 eye-tracking variables, and each participant became 1 data point for the ML analysis.
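The count of 183 variables can be reproduced from Table 1 by expanding each metric over the tasks it is defined for and the 3 target eccentricities. The sketch below follows the naming pattern given in the text (eg, P7NFix.); the dictionary layout and constant names are hypothetical.

```python
# Task codes per metric, taken from Table 1: P = prosaccade, A = antisaccade,
# D = delayed saccade.
ALL, PA, A, D = "PAD", "PA", "A", "D"
METRIC_TASKS = {
    # General metrics
    "NFix.": ALL, "NSac.": ALL, "TTotal": ALL, "TFix. Avg.": ALL,
    "TSac. Avg.": ALL, "VSac. Avg.": ALL, "VSac. Peak": ALL, "ASac. Avg.": ALL,
    # Pupil-based metrics
    "DPupil Avg.": ALL, "DPupil Max.": ALL, "DPupil Min.": ALL, "DPupil Sd.": ALL,
    # Area-based metrics
    "BTA Fix.": ALL, "LTA Fix.": ALL, "NUA Fix.": PA, "NTA Fix.": D,
    "NTA-P Fix.": D, "NTA-W Fix.": D, "NSA Fix.": A, "BPSA Fix.": A,
    "BWSA Fix.": A, "BPSA Fix. 1st": A, "BIntrusive Sac.": D,
    # Search-based metrics
    "BSearch": A, "NSearch": A, "TSearch": A,
    # Entropy-based metrics
    "SGEnorm": ALL, "GTEnorm": ALL,
}
ECCENTRICITIES_DEG = (7, 15, 20)

# One variable per (metric, task, eccentricity); target side and trial order
# are averaged out, as described in the text.
variable_names = [
    f"{task}{ecc}{metric}"
    for metric, tasks in METRIC_TASKS.items()
    for task in tasks
    for ecc in ECCENTRICITIES_DEG
]
```

The 16 metrics defined for all tasks contribute 144 variables, the 1 metric defined for 2 tasks contributes 6, and the 11 single-task metrics contribute 33, for a total of 183.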

Model Construction

Before model training, we performed filtering to remove redundant variables and enhance computational efficiency. Variables that were significantly different between groups, compared using the Mann-Whitney U test, were retained.

To predict the categories of participants, we used the extreme gradient boosting (XGBoost) algorithm as the classification model. XGBoost is an advanced implementation of the gradient boosting decision tree framework, which sequentially builds an ensemble of decision trees to refine the prediction. The learning process minimizes the loss function by fitting each new tree to its gradient, thereby enhancing the model’s performance. The XGBoost algorithm applies regularization techniques to efficiently boost the model and has thus demonstrated better performance than the conventional gradient boosting decision tree framework in similar studies [,]. We implemented the XGBoost model in Python (version 3.8) using the packages xgboost (version 2.0.1) and scikit-learn (version 1.3.0). The hyperparameter settings of the model are listed in , which are mainly the default values without adjustment to objectively illustrate the model’s performance.

Model Evaluation

We applied 5-fold cross-validation with 500 repeats to evaluate classification performance. In each fold, the model was trained with approximately 173 samples and tested with approximately 43 samples. To evaluate the models, we used the receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC), which consider the trade-off between the true positive rate and false positive rate at various classification thresholds and provide a holistic assessment of the model’s classification performance. We also used accuracy, sensitivity, specificity, precision, and F1-score to quantify classification performance.
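The evaluation scheme can be illustrated with a dependency-free sketch. The function names are hypothetical, the split is a plain (unstratified) repeated k-fold because the text does not say whether stratification was used, and the labels and scores are toy values rather than study data.

```python
import random


def repeated_kfold(n_samples, n_splits=5, n_repeats=500, seed=0):
    """Yield (train_indices, test_indices) for every fold of every repeat."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        for k in range(n_splits):
            test = indices[k::n_splits]
            test_set = set(test)
            train = [i for i in indices if i not in test_set]
            yield train, test


def classification_metrics(y_true, y_pred):
    """Threshold-based metrics from confusion counts (1 = ADHD, 0 = TD in this sketch)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "specificity": tn / (tn + fp),
        "precision": precision,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }


def auc(y_true, scores):
    """AUC as the fraction of positive-negative pairs ranked correctly (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# With 216 participants and 5 folds, each test fold holds 43 or 44 samples.
fold_sizes = [len(test) for _, test in repeated_kfold(216, n_repeats=1)]
print(fold_sizes)

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(classification_metrics(y_true, y_pred), auc(y_true, scores))
```

With 500 repeats, the generator yields 2500 train/test splits, and the reported metrics would be averaged over the model's scores on those splits.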

Variable Importance

When training the XGBoost model, the split gain was calculated at each node of each decision tree, which indicated the contribution of the splitting variable to the model. After training, the split gain was aggregated for each variable across all the decision trees to provide a comprehensive measure of the variable’s relative importance in classifying participants into the ADHD or TD group.
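The aggregation step can be sketched as follows. The tree representation, the gain values, and the normalization to a sum of 1 are assumptions made for illustration; the text does not specify whether gains were summed or averaged or how the scores were scaled.

```python
from collections import defaultdict

# Hypothetical ensemble: each tree is a list of (variable, split gain) pairs,
# one pair per internal node.
trees = [
    [("A15NFix.", 12.0), ("P7NFix.", 3.0)],
    [("A15NFix.", 8.0), ("D20NFix.", 5.0)],
    [("D20NFix.", 2.0)],
]


def total_gain_importance(trees):
    """Sum split gains per variable over all trees and normalize them to sum to 1."""
    totals = defaultdict(float)
    for tree in trees:
        for variable, gain in tree:
            totals[variable] += gain
    grand_total = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda item: -item[1])
    return {variable: gain / grand_total for variable, gain in ranked}


importance = total_gain_importance(trees)
for variable, score in importance.items():
    print(f"{variable}: {score:.3f}")
```

Variables that are never used for a split receive no entry, so they are implicitly assigned an importance of 0.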


Results

Characteristics of the Participants

A total of 216 participants (n=122, 56.5% in the TD group and n=94, 43.5% in the ADHD group) were enrolled in the study (). Overall, there was no significant difference in age (t214=–0.30; P=.76); full-scale IQ (t214=1.14; P=.25); or verbal IQ (t214=0.03; P=.98) between the TD and ADHD groups. However, the ADHD group scored significantly lower than the TD group for performance IQ (t214=2.08; P=.04). On the SNAP-IV, children in the TD group scored within the normal range, whereas the ADHD group scored significantly higher than the TD group on all 3 core symptoms (all P<.001).

Table 2. The basic information of the participants.

| Variables | TD^a (n=122) | ADHD^b (n=94) | t test or chi-square test (df)^c | P value |
| --- | --- | --- | --- | --- |
| Sex, n (%) | | | 37.28 (1) | <.001 |
| Male | 61 (50) | 84 (89.4) | | |
| Female | 61 (50) | 10 (10.6) | | |
| Age (y), mean (SD) | 7.18 (1.19) | 7.24 (1.39) | –0.30 (214) | .76 |
| Age group, n (%) | | | 0.63 (2) | .73 |
| Group 1 (5-6 y) | 44 (36.1) | 36 (38.3) | | |
| Group 2 (7-8 y) | 45 (36.9) | 37 (39.4) | | |
| Group 3 (9-10 y) | 33 (27) | 21 (22.3) | | |
| IQ, mean (SD) | | | | |
| Verbal IQ | 97.36 (12.51) | 97.41 (12.90) | 0.03 (214) | .98 |
| Performance IQ | 103.02 (13.25) | 99.01 (15.02) | 2.08 (214) | .04 |
| Full-scale IQ | 100.06 (12.44) | 98.11 (12.43) | 1.14 (214) | .25 |
| SNAP-IV^d, mean (SD) | | | | |
| Inattentive | 0.63 (0.25) | 15.09 (0.75) | –199.20 (214) | <.001 |
| Hyperactivity or impulsive | 0.50 (0.30) | 11.59 (0.82) | –137.43 (214) | <.001 |
| Oppositional defiant | 0.36 (0.12) | 7.66 (0.56) | –118.54 (214) | <.001 |

^a TD: typically developing.

^b ADHD: attention-deficit/hyperactivity disorder.

^c t tests were used for variables presented as means and SDs (age, IQ, and SNAP-IV scores), and chi-square tests were used for variables presented as numbers and percentages (sex and age group).

^d SNAP-IV: Swanson, Nolan, and Pelham Rating Scale.
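As a worked check of the table, the chi-square statistic for sex can be recomputed from the reported counts. The sketch below assumes a Pearson chi-square test without continuity correction, which reproduces the tabulated value; the function name is illustrative.

```python
def pearson_chi_square(table):
    """Pearson chi-square statistic for a contingency table, without continuity correction."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    return chi2


# Counts from Table 2: rows are male and female; columns are TD and ADHD.
sex_counts = [[61, 84], [61, 10]]
chi2 = pearson_chi_square(sex_counts)
df = (len(sex_counts) - 1) * (len(sex_counts[0]) - 1)
print(round(chi2, 2), df)  # 37.28 1
```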

Comparison of Digital Biomarkers Between the ADHD and TD Groups

Eye-Tracking Metrics Across the 3 Tasks

The analysis of the biomarkers identified for all 3 tasks (; and ) showed that, in terms of task completion, TA fixation incidence (calculated based on BTA Fix.) and LTA Fix. differed significantly between the ADHD and TD groups in all 3 tasks (both P<.001). ASac. Avg. was significantly smaller in the ADHD group than in the TD group in the prosaccade and antisaccade tasks (all P<.001), whereas VSac. Avg. and VSac. Peak were significantly lower in the ADHD group than in the TD group in all tasks (all P<.001). DPupil Sd. was significantly greater in the ADHD group than in the TD group in all tasks (P=.03 for the prosaccade task, P<.001 for the antisaccade task, and P=.02 for the delayed saccade task).

In terms of attention control, in both the prosaccade and antisaccade tasks, more irrelevant fixations (ie, NUA Fix.) occurred in the ADHD group than in the TD group (all P<.001). In addition, the ADHD group fixated more frequently on the UA during the antisaccade task than during the prosaccade task.

Figure 6. Comparisons of eye-tracking metrics between the attention-deficit/hyperactivity disorder (ADHD) and typically developing (TD) groups. Results of the corresponding data analyses are presented in Multimedia Appendices 3 and 4. *P<.05, **P<.01. Fix.: fixation; GTE: gaze transition entropy; PSA: proper-side area; SA: stimulus area; Sac.: saccade; SGE: stationary gaze entropy; TA: target area; UA: unrelated area; WSA: wrong-side area.

Eye-Tracking Metrics of the Antisaccade Task

The heat maps () of the analysis of the different target eccentricities () revealed that the TD group’s fixations were concentrated along the horizontal position where the SA and TA were located, whereas the ADHD group’s fixations were more widespread. Moreover, the TD group was more accurate than the ADHD group in fixating on the TA, whereas the ADHD group showed more erroneous localization deviations in both the 7° and 15° trials. Interestingly, in the 20° trial, we noted that the fixation concentration of the ADHD group deviated from the stimulus: there was a longitudinal distribution of fixations along the edge of the correct side of the screen, which suggested that the ADHD group did not localize fixation according to the logic of symmetry; rather, they relied purely on the edge of the screen to assist in their fixation positioning.

Figure 7. Heat maps of fixations of the typically developing (TD) and attention-deficit/hyperactivity disorder (ADHD) groups for stimuli of different target eccentricities in the antisaccade task.

As shown in and , the ADHD group had more WSA fixations (calculated from BWSA Fix.) and fewer PSA fixations (calculated from BPSA Fix.) than the TD group (all P<.001). Among the 3 eccentricities, the number of WSA fixations differed significantly between the groups in the 15° and 20° trials (U=81,316 for 15°, U=80,812 for 20°, all P<.001), whereas in the 7° trials, both groups showed a higher number of WSA fixations (U=87,841, P=.52) than PSA fixations. However, the TD group had more PSA fixations in the 7° trials and a higher incidence of the first fixation in the PSA (calculated from BPSA Fix. 1st) than the ADHD group (all P<.001).

Comparisons of search incidence (calculated from BSearch), NSearch, and TSearch between the ADHD and TD groups showed that all 3 metrics were significantly higher in the ADHD group than in the TD group (P<.001, P<.001, and P=.008, respectively). Both SGE and GTE were significantly higher in the ADHD group than in the TD group (all P<.001).

Eye-Tracking Metrics in the Delayed Saccade Task

As shown in and , TA-P fixation incidence (calculated from BTA Fix.) and LTA Fix. were significantly different between the 2 groups at all eccentricities. Moreover, the TD group had a lower NTA-W Fix. than the ADHD group (all P<.001).

As the stimulus eccentricity increased from the center point, only the TD group showed an improvement in performance. The TD group showed a lower NTA-W Fix. when the eccentricity was 15° than when the eccentricity was 7°, whereas the decrease in NTA-W Fix. in the ADHD group from an eccentricity of 15° to 20° was more gradual than that in the TD group.

The assessment of intrusive saccades for stability of eye movements showed that the ADHD group had more intrusive saccades (calculated from BIntrusive Sac.) and less stable eye-movement patterns than the TD group (P<.001).

Comparisons of Digital Biomarkers Among Age Groups

We found that several digital biomarkers showed consistent changes with age (; and ). In the prosaccade task, the overall TTotal of both groups showed a decreasing trend with age (P=.02 for ADHD, P<.001 for TD). In addition, an age-related decrease in ASac. Avg. was observed in the TD group only (P=.007), whereas VSac. Avg. and VSac. Peak remained stable in both groups (P=.71 for VSac. Avg. and P=.46 for VSac. Peak). In the antisaccade task, both the TD and ADHD groups showed an increasing trend for accuracy (P<.001 for ADHD, P=.63 for TD) and efficiency (P<.001 for ADHD and TD) in completing the task. In fact, the ADHD group showed significantly greater improvement than the TD group (P<.001). The ADHD group also exhibited a propensity for DPupil Sd. to decrease with age (P<.001). Across all age groups, the ADHD group had a higher NUA Fix. than the TD group (P<.001), and this did not significantly improve with age, although the NSA Fix. dropped significantly with age (P<.001). We also found that there was a greater tendency for SGE and GTE to decline with age in the TD group than in the ADHD group (P<.001 for SGE and P=.001 for GTE).

The TA-P fixation incidence (P=.06) did not significantly differ with age in the ADHD group for the delayed saccade task. This was true despite the ADHD group showing improvements in LTA Fix. (P=.01), NTA-W Fix. (P=.005), and intrusive saccade incidence (calculated from BIntrusive Sac.; P=.003) with age.

Figure 8. Comparisons of eye-tracking metrics among age groups. Letters above the bars indicate the results of the post hoc tests using Bonferroni correction among different age groups in the attention-deficit/hyperactivity disorder (ADHD) and typically developing (TD) groups. Lower case letters indicate P<.05; upper case letters indicate P<.01. *P<.05, **P<.01. Fix.: fixation; GTE: gaze transition entropy; SA: stimulus area; Sac.: saccade; SGE: stationary gaze entropy; TA: target area; UA: unrelated area.

ML Analysis With the Proposed Digital Biomarkers

The evaluation metrics (AUC, accuracy, sensitivity, specificity, precision, and F1-score) are reported as means (95% CIs). The XGBoost model trained on the eye-tracking variables achieved an AUC of 0.965 (0.964-0.966), an accuracy of 0.908 (0.907-0.910), a sensitivity of 0.877 (0.874-0.880), a specificity of 0.932 (0.930-0.934), a precision of 0.913 (0.910-0.915), and an F1-score of 0.892 (0.890-0.894). The averaged ROC curve is shown in , which illustrates the effectiveness of the proposed digital biomarkers for discriminating the ADHD and TD groups. The 10 most important variables for the model are reported with their scores in .
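One way to obtain a mean and 95% CI from per-fold scores is sketched below. The text does not state how the intervals were computed, so the normal-approximation formula, the treatment of fold scores as independent, and the toy values are all assumptions.

```python
import math
from statistics import mean, stdev


def mean_ci95(values):
    """Mean with a normal-approximation 95% CI: mean ± 1.96 * SD / sqrt(n)."""
    m = mean(values)
    half_width = 1.96 * stdev(values) / math.sqrt(len(values))
    return m, m - half_width, m + half_width


# Toy per-fold AUC values, not study results.
fold_aucs = [0.95, 0.97, 0.96, 0.98, 0.94]
m, low, high = mean_ci95(fold_aucs)
print(f"{m:.3f} ({low:.3f}-{high:.3f})")
```

Because the half-width shrinks with the square root of the number of scores, an interval computed in this way over the 2500 fold-level evaluations of a 500-repeat, 5-fold scheme would be very narrow.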

Figure 9. Receiver operating characteristic curve of the classification model. AUC: area under the receiver operating characteristic curve.
