The autism biomarkers consortium for clinical trials: evaluation of a battery of candidate eye-tracking biomarkers for use in autism clinical trials

Autism biomarker consortium for clinical trials (ABC-CT) protocol

The first ABC-CT study was a five-site observational study involving clinician, caregiver, and lab-based measures as well as a battery of electroencephalography (EEG) and ET tasks. Participants were school-age children with ASD or typical development (TD) assessed across three timepoints: Time 1 (T1), Time 2 (T2: T1 + 6 weeks), and Time 3 (T3: T1 + 24 weeks), with each timepoint conducted over two days. ET tasks were administered on both days at each timepoint. This report focused on data from T1 and T2, as the six-week span between the two timepoints approximates the duration of many clinical trials and is relevant to understanding short-term stability. T3 data are being analyzed elsewhere in the context of longer-term developmental change and change in clinical status.

Informed consent/assent was obtained from all guardians and participants after procedures were fully explained and the opportunity to ask questions offered. The protocol was approved and overseen by a central IRB at Yale University.

An overview of the ABC-CT history and protocol is available in [23], with data acquisition and quality control details in [25]. More extensive protocol, participant, and ET methodological details are provided in Supplemental Information. Study data are available in [26].

Participant characteristics

Participants were children 6;0 to 11;6 years old at T1, an age range selected to constrain age-related developmental heterogeneity and increase likelihood of successful biomarker data acquisition [23]. Children in the ASD group (n = 280) met DSM-5 diagnostic criteria for ASD [1] based on gold-standard research diagnostic criteria with the ADOS-2 and the ADI-R and had full scale IQ between 60 and 150. TD children (n = 119) were screened for the presence of ASD and emotional and behavioral disorders (based on [27] and medical history), and had full scale IQs between 80 and 150. Exclusions for both groups included genetic or neurological conditions, or sensory challenges that would impact protocol completion. In the ASD group, medications were stable for 8 weeks prior to enrollment. See Supplemental Information for additional inclusion, exclusion, and assessment details. Groups did not differ by age (t = 0.199, p = 0.843) or sex (χ² = 2.19, p = 0.139) but differed in diagnostic and clinical characterization (Table 1). Patterns of results were unchanged when considering subsets of participants with valid data for each ET biomarker (Additional file 1: Tables S1ab).

Table 1 Participant characteristics. Mean and standard deviation are presented for clinical assessments for the full sample at T1. For characterization associated with subsets completing ET tasks, see Additional file 1: Tables S1ab. For clinical variable descriptions, see Additional file 1: Table S2.

Data acquisition

ET data acquisition was stringently standardized [25], with all sites achieving and maintaining protocol fidelity through rigorous training, manualization, and quality control procedures overseen by the Data Acquisition and Analysis Core (DAAC) of the ABC-CT. Manuals (see Supplemental Information) are available upon request.

Equipment

Sites used SR Research Eyelink 1000 Plus binocular remote eye trackers operating at 500 Hz. Stimuli were presented on 24″ 1920 × 1200 pixel 60 Hz monitors and controlled via identically configured presentation computers using Neurobehavioral Systems Presentation v18.1. Video cameras recorded the face and upper torso of the child and were multiplexed with video feeds from the ET control (host) computer and the presentation screen for subsequent behavioral review and quality assurance. See [25] for additional equipment details.

Protocol

ET sessions began with children seated (eye-to-monitor distance: 65 cm) in front of the stimulus presentation monitor. No head supports/restraints were used. A child-appropriate movie was played to capture the child’s attention, followed by a 5-point ET calibration procedure, and then administration of ET tasks.

Site behavioral assistants added supplemental verbal directions (e.g., “Sit back”, “Talk later”, “Watch TV”) and behavioral supports appropriate to the cognitive level and behavioral needs of children.

ET sessions were conducted on both days of each timepoint, with each session lasting approximately 14.5 min (involving 9.7 min/54 trials of experiments; see Additional file 1: Table S3 for experimental task administration details). Trials from ET tasks were interleaved in blocks to reduce fatigue and optimize child engagement. Validation targets were periodically administered to facilitate error estimation and scanpath recalibration. Task order was counterbalanced across participants.

Acquisition metrics, quality control, and derived variables

Subsequent to transfer of data from sites to the ABC-CT Data Coordinating Core, acquired ET data were processed centrally by the DAAC to extract acquisition metrics and derived variables.

Trial validity criteria for ActivityMonitoring, SocialInteractive, StaticScenes, and Biomotion tasks were percent of acquired ET data relative to stimulus presentation time (%Valid Data) ≥ 50% and calibration error (Cal Error) ≤ 2.5° (visual degrees, 1° = 42 pixels). For PLR, additional criteria were imposed to ensure rigor of latency and constriction size estimates.
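The trial-level criteria, and the stated conversion of 1° to 42 pixels, can be sketched in Python as follows. This is an illustrative sketch rather than the ABC-CT pipeline: the function and constant names are hypothetical, and the geometry check assumes square pixels, a 24-inch diagonal spanning the full 1920 × 1200 panel, and the 65 cm viewing distance given in the protocol.

```python
import math

# Values stated in the text.
VIEWING_DISTANCE_CM = 65.0
DIAGONAL_INCHES = 24.0
RES_X, RES_Y = 1920, 1200
MIN_VALID_DATA_PCT = 50.0   # %Valid Data threshold
MAX_CAL_ERROR_DEG = 2.5     # Cal Error threshold, in visual degrees


def pixels_per_degree(distance_cm=VIEWING_DISTANCE_CM):
    """Pixels subtended by 1 degree of visual angle at screen center.

    Assumes square pixels (hypothetical helper, not from the source).
    """
    diagonal_px = math.hypot(RES_X, RES_Y)
    pixel_pitch_cm = DIAGONAL_INCHES * 2.54 / diagonal_px
    one_degree_cm = 2.0 * distance_cm * math.tan(math.radians(0.5))
    return one_degree_cm / pixel_pitch_cm


def trial_is_valid(valid_data_pct, cal_error_deg):
    """Apply both trial-level criteria; a trial must pass both."""
    return (valid_data_pct >= MIN_VALID_DATA_PCT
            and cal_error_deg <= MAX_CAL_ERROR_DEG)
```

Under these assumptions, pixels_per_degree() evaluates to roughly 42, consistent with the conversion quoted above. The additional PLR criteria are not specified here and are therefore not modeled.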

Data from an ET session (single day) were invalidated if experimental counterbalancing errors, technical malfunctions, or non-standardized verbal cues (e.g., specific direction of attention to the stimuli) occurred. Data from an ET timepoint (both days) were invalidated if fewer than 25% of trials were valid (%Valid Trials). The OMI biomarker was considered valid only if all three constituent tasks (ActivityMonitoring, SocialInteractive, and StaticScenes) were valid. Aggregated acquisition metrics at the task level were: %Valid Data, Cal Error, and %Valid Trials.
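The timepoint-level and OMI-level rules can be expressed as a short Python sketch. The names are hypothetical, and the sketch assumes that trials from an invalidated session have already been flagged as invalid before the percentage is computed; the source does not state how such trials enter the denominator.

```python
MIN_VALID_TRIALS_PCT = 25.0
OMI_TASKS = ("ActivityMonitoring", "SocialInteractive", "StaticScenes")


def percent_valid_trials(trial_flags):
    """trial_flags: one boolean per administered trial, pooled over both days."""
    if not trial_flags:
        return 0.0
    return 100.0 * sum(trial_flags) / len(trial_flags)


def timepoint_is_valid(trial_flags):
    """A timepoint is invalidated when fewer than 25% of trials are valid."""
    return percent_valid_trials(trial_flags) >= MIN_VALID_TRIALS_PCT


def omi_is_valid(task_validity):
    """task_validity: dict mapping task name to True/False.

    OMI requires every constituent task to be valid; a missing task
    is treated as invalid (an assumption of this sketch).
    """
    return all(task_validity.get(task, False) for task in OMI_TASKS)
```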

Derived measures for each individual at each timepoint were averaged over all valid trials for that task. OMI, ActivityMonitoring, SocialInteractive, StaticScenes, and Biomotion involved region-of-interest (ROI) analysis (Additional file 1: Figure S1), where presented scenes were divided into zones associated with semantic labels and the proportion of valid gaze data within those zones calculated (e.g., %Face for percentage of time spent looking at faces). For PLR, latency and relative pupil constriction were computed as in [28].
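A minimal Python sketch of an ROI-based derived variable is given here for illustration. It assumes a rectangular ROI in screen pixel coordinates and represents lost gaze samples as None; the actual ROIs (Additional file 1: Figure S1) may have other shapes and may change frame by frame, and all names are hypothetical.

```python
def percent_in_roi(gaze_samples, roi):
    """Percentage of valid gaze samples falling inside a rectangular ROI.

    gaze_samples: list of (x, y) tuples, or None where no gaze was acquired.
    roi: (left, top, right, bottom) in the same pixel coordinates.
    Returns None when the trial has no valid samples.
    """
    valid = [sample for sample in gaze_samples if sample is not None]
    if not valid:
        return None
    left, top, right, bottom = roi
    hits = sum(1 for x, y in valid
               if left <= x <= right and top <= y <= bottom)
    return 100.0 * hits / len(valid)


def mean_over_valid_trials(trial_values):
    """Average a derived variable over trials, skipping invalid (None) trials."""
    kept = [value for value in trial_values if value is not None]
    if not kept:
        return None
    return sum(kept) / len(kept)
```

The denominator is the number of valid samples rather than the stimulus duration, matching the definition of the proportion as relative to validly acquired gaze data.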

All quality control (QC) criteria and derived variable definitions were formulated before ABC-CT main study enrollment and maintained throughout the entirety of the study. See Supplementary Information for additional details regarding QC, acquisition metrics, derived variables, and pre-hypothesized effects.

Experimental tasks

Five experimental ET tasks were administered (Fig. 1). Based on preliminary findings from the ABC-CT Feasibility Study [25], conducted prior to the main study reported here, an additional biomarker, the Oculomotor Index of Gaze to Human Faces (OMI), was constructed as the average of %Face from ActivityMonitoring, SocialInteractive, and StaticScenes tasks. See Additional file 1: Table S3 and Supplementary Information for details regarding experimental tasks including OMI derivation (Additional file 1: Tables S4-5).

Fig. 1 Experimental Tasks. (Top row) Tasks comprising the Oculomotor Index of Gaze to Human Faces (OMI): ActivityMonitoring (AM, videos depicting two actors engaged in a shared activity), SocialInteractive Scenes (SI, videos depicting two children involved in interactive and parallel play activities), and StaticScenes (SS, Social Static Scene images showing everyday scenes involving social interactions). (Bottom row) Biomotion (BM, Biological Motion preferential looking videos with point-light displays of human actions paired with non-human control conditions; lines in human figure added for illustrative purposes only), and Pupillary Light Reflex task (PLR, images depict frames in the video sequence including the bright screen flash)

Activity monitoring (activitymonitoring)

This task [29, 30] interleaved eight trials of static images (10 s each) with eight trials of dynamic videos (20 s each) of two actresses playing with children’s toys. During static image trials, a wordless soundtrack was played. During video trials, the actresses spoke in child-friendly language and directed their eyes to each other (mutual gaze) or the joint activity (activity gaze). The primary dependent variable was percentage of time spent looking at the heads and faces of the actresses (%Face), relative to the amount of validly acquired ET data during a trial. Secondary variables included percentage of valid time spent looking at actress activities (%Activity).

Social interactive task (socialinteractive)

This task [31] showed silent 15-s videos of two school-aged children engaged in parallel (11 trials) or cooperative play (11 trials) with toys. The primary dependent variable was percentage of valid time spent looking at heads and faces of actors (%Face). Secondary variables included percentage of valid time spent looking at any part of the actors (%Social: sum of face, body, and activity regions).

Static social scenes task (staticscenes)

This task showed six photographs, for 20 s each, of solitary and social interactions of children or of children and adults [32]. It was repeated on each day of each timepoint, with images flipped horizontally on the second day. As in the SocialInteractive task, the primary variable was %Face and the secondary variable was %Social.

Oculomotor index of gaze to human faces (OMI)

A principal component analysis of ET derived variable data from the Feasibility stage of the ABC-CT study (see [23]) revealed a primary component dominated by %Face variables from ActivityMonitoring, SocialInteractive, and StaticScenes tasks. As the weights for all of these variables were comparable, we created the OMI biomarker as a composite score averaging ActivityMonitoring, SocialInteractive, and StaticScenes %Face with equal weights.
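The equal-weight composite amounts to a simple mean of the three task-level %Face values. A minimal Python sketch, with hypothetical names, is:

```python
OMI_TASKS = ("ActivityMonitoring", "SocialInteractive", "StaticScenes")


def omi_score(face_pct_by_task):
    """Equal-weight average of %Face across the three OMI tasks.

    face_pct_by_task: dict mapping task name to that task's %Face.
    Returns None if any task value is missing, since OMI is only
    defined when all three constituent tasks are available.
    """
    values = [face_pct_by_task.get(task) for task in OMI_TASKS]
    if any(value is None for value in values):
        return None
    return sum(values) / len(values)
```

For example, hypothetical %Face values of 30, 40, and 50 would give an OMI of 40.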

Biological motion preference task (biomotion)

The Biological Motion Preference task involved 40 trials of soundless point light displays of human biological motion side-by-side with a non-biological motion control based on [33]. Human biological motion included primitive motor, affective, communicative, tool-oriented, or goal-oriented movements from [34]. Control conditions were either rotating or scrambled point light displays. The primary variable was biological motion preference percentage (%Bio, time looking at biological motion divided by time looking at biological motion or control). Secondary variables included biological motion preference from affective stimuli (%BioAffect).

Pupillary light reflex task (PLR)

The Pupillary Light Reflex task included 18 trials of a dark screen with a small, 0.7 degree animation at the center, then a flash of white for four frames, followed by the return of the dark screen and central animation [28]. A sound effect accompanied the animation throughout each trial. The primary variable was latency to minimum pupil size acceleration (Latency). Secondary variables included relative pupil constriction (Constrict) [28, 35].

Analytic plan

Analyses were pre-specified as highlighted in [23, 25]. Notably, examination of distributional characteristics of biomarker outputs [25] did not reveal statistical pathologies that would interfere with analytical interpretation. Nonetheless, ANOVA methods used heteroskedasticity-consistent covariance matrices to accommodate unequal group variances; correlations relied upon Spearman rank correlation coefficients for robustness against potential leverage effects due to outliers or severe non-normality. See Supplemental Information for additional details on correlation method rationale.

Because this was a primarily descriptive study, no corrections for multiple comparisons were applied. However, we note that hypotheses for primary analytical aims were pre-specified; secondary analyses are presented primarily in Supplemental Information.

Acquisition

For each ET biomarker, we examined rates of data acquisition (percentage of children generating any data) and data validity (percentage of children whose data passed all quality control criteria) (Tables 2, S6ab). We considered > 70% data validity in both ASD and TD groups to index suitability for clinical trials based on data acquisition rates reported in prior published experimental studies, consultation with statistical and biomarker-domain experts, and consensus across project stakeholders and external reviewers. Diagnostic group and potential site differences in acquisition rates were assessed with chi-square tests. Differences in acquisition metrics (%Valid Trials, %Valid Data, and Cal Error) were assessed with univariate ANOVA (Additional file 1: Table S7). Relationships among acquisition metrics and child characteristics were assessed using Spearman’s rank correlation (Additional file 1: Table S8). Analyses were conducted both unadjusted and adjusted for age, IQ, and site.
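For illustration, the validity-rate criterion and a chi-square comparison of two groups can be sketched in Python using only the standard library. The sketch assumes a 2 × 2 table (group by valid/invalid) and a Pearson chi-square without continuity correction; the source does not state which variant was used, and the function names are hypothetical.

```python
import math

VALIDITY_THRESHOLD_PCT = 70.0


def validity_rate(n_valid, n_total):
    """Percentage of children whose data passed all QC criteria."""
    return 100.0 * n_valid / n_total


def meets_validity_threshold(rate_asd, rate_td):
    """The criterion requires > 70% validity in both groups."""
    return (rate_asd > VALIDITY_THRESHOLD_PCT
            and rate_td > VALIDITY_THRESHOLD_PCT)


def chi_square_p_df1(statistic):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    statistic = n * (a * d - b * c) ** 2 / denom
    return statistic, chi_square_p_df1(statistic)
```

As a consistency check on the 1-degree-of-freedom p-value, chi_square_p_df1(2.19) is approximately 0.139, which agrees with the sex comparison reported in the participant characteristics.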

Table 2 Biomarker properties. For extended data see Supplemental Tables.

Construct validity

To ascertain whether tasks successfully tapped constructs of interest, we examined pre-defined hypotheses for each task in the TD group (Tables 2, S11a). These hypotheses primarily served to verify that tasks were eliciting expected responses from TD children based on their intended design. ActivityMonitoring, SocialInteractive, and StaticScenes tasks were all designed wholly or in part to examine attentional predispositions for directing gaze toward social information as present in faces, motivated by studies indicating that faces are a privileged target for visual attention in TD individuals [36, 37]. For these tasks, we used one-sample t-tests of %Face against the scene percentage occupied by the Face region, examining whether completely randomly directed attention could explain the proportion of time spent by TD children looking at faces. As a stronger benchmark, we also used a variation of the most well-studied low-level computational model of visual saliency [38], extended for motion saliency calculation [39, 40], to compute gaze probability fields (see Supplemental Material for additional notes on Construct Validity). For Biomotion, construct validity tested biological motion preference, i.e., greater than chance looking at biological compared to control motion (one-sample t-test against 50%), reflecting attentional preferences for biological movements as expected in typically developing individuals [41, 42]. For PLR, we tested whether the pupil constricted after the screen flash (one-sample t-test against 0), indicating expected behavior of the pupil to light [43].
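Each construct validity check described above is a one-sample t-test against a fixed reference value: the percentage of the scene occupied by the Face region for %Face, 50 for %Bio, and 0 for pupil constriction. A minimal Python sketch of the test statistic follows; the function name is hypothetical, and the p-value (which requires the t distribution) is deliberately left to a statistics package.

```python
import math
import statistics


def one_sample_t(values, reference):
    """Return (t statistic, degrees of freedom) for H0: mean == reference."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are required")
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    if sd == 0:
        raise ValueError("values have zero variance")
    t = (mean - reference) / (sd / math.sqrt(n))
    return t, n - 1
```

A positive t for %Bio against 50, for instance, indicates looking at biological motion more than expected by chance.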

Six-week stability

In the ASD and TD groups, we assessed short-term stability of individual biomarkers from T1 to T2 (~ 6 weeks) using intraclass correlation (ICC, via two-way mixed models with absolute agreement) (Table 2). We defined ICC ≥ 0.5 as a moderate relationship and ICC ≥ 0.75 as a high relationship across 6 weeks. To examine whether participant age or IQ influenced stability within the ASD group, we also examined children younger and older than 8.5 years of age and with IQs below or above 75. We distinguish six-week stability from a focus on test–retest reliability, which would require repetition of the biomarker assessments in close temporal proximity on the scale of hours or days.
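An absolute-agreement ICC from a two-way layout can be computed from the ANOVA mean squares. The Python sketch below implements the single-measurement form, (MSR − MSE) / (MSR + (k − 1)·MSE + (k/n)·(MSC − MSE)), where MSR, MSC, and MSE are the between-subject, between-timepoint, and residual mean squares for n children and k timepoints. Whether the reported ICCs are single-measurement or average-measurement is not stated in the text, so that choice is an assumption here, as are the function names and the label used for ICC below 0.5.

```python
def icc_absolute_agreement(scores):
    """Single-measurement, absolute-agreement ICC from a two-way layout.

    scores: list of rows, one per child, each row holding that child's
    biomarker value at each timepoint, e.g. [t1_value, t2_value].
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))
    denom = msr + (k - 1) * mse + k * (msc - mse) / n
    if denom == 0:
        raise ValueError("ICC is undefined when all scores are identical")
    return (msr - mse) / denom


def stability_label(icc):
    """Thresholds from the text; the label below 0.5 is this sketch's own."""
    if icc >= 0.75:
        return "high"
    if icc >= 0.5:
        return "moderate"
    return "below moderate"
```

Because agreement is absolute, a uniform shift between T1 and T2 lowers the ICC even when children keep the same rank order: rows [1, 2], [2, 3], [3, 4] give 2/3 rather than 1.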

Group discrimination

We examined group discrimination at T1 and T2 using ANOVAs (Tables 2, S12ab) with heteroskedasticity-consistent covariance matrix (HC3) correction due to unequal group variances. To verify that results were not driven by age, IQ, site, or %Valid Data, we included them as simultaneous covariates in follow-up models. We note that the development of a discrimination biomarker is not the primary intention of this analysis. Rather, examination of between-group discrimination serves two purposes. First, because biomarkers were selected on the basis of prior findings and preliminary studies, it is necessary to replicate prior findings so as to verify the reproducibility and generalizability of targeted constructs. Because the foundational literature associated with ET paradigms involves between-group differences in biomarker performance, this process served as a “secondary construct validity criterion,” providing evidence that ET biomarkers were performing “as expected.” Second, because the selected ET biomarkers were developed to investigate mechanistic phenomena, the presence of between-group differences (especially in reference to a typically-developing control population) signifies atypical function of associated mechanisms at a group level in ASD. These differences are not expected to have effect sizes at the level of individual diagnostic precision, but rather to associate with broad group-level distributional asymmetries in biomarker performance. These asymmetries, in turn, are expected to point to the presence of more homogeneous subsets within the heterogeneity of the autism spectrum, allowing for the indexing of individuals within the autism spectrum with specific patterns of outlying biomarker performance.
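To illustrate what the HC3 correction does in the simplest case, the Python sketch below covers an unadjusted two-group comparison only. When the model contains just a group indicator, each observation's leverage is 1/n for its own group, so the HC3 variance of each group mean is the sum of squared residuals, each inflated by 1/(1 − 1/n)², divided by n². This is a sketch under that simplification, not the authors' analysis code; the covariate-adjusted follow-up models require a general regression routine, and the function name is hypothetical.

```python
import math


def hc3_group_difference(group_a, group_b):
    """Mean difference (a - b), its HC3 standard error, and the Wald statistic."""

    def mean_and_hc3_variance(values):
        n = len(values)
        if n < 2:
            raise ValueError("each group needs at least two observations")
        mean = sum(values) / n
        leverage = 1.0 / n  # leverage of every observation in this group
        inflated = sum(((x - mean) / (1.0 - leverage)) ** 2 for x in values)
        return mean, inflated / n ** 2

    mean_a, var_a = mean_and_hc3_variance(group_a)
    mean_b, var_b = mean_and_hc3_variance(group_b)
    diff = mean_a - mean_b
    se = math.sqrt(var_a + var_b)
    if se == 0:
        raise ValueError("both groups have zero variance")
    return diff, se, (diff / se) ** 2
```

Because each group contributes its own variance term, the standard error does not rely on the equal-variance assumption of a classical ANOVA.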

Clinical correlations

To examine the extent to which biomarkers could explain known heterogeneity and areas of vulnerability in ASD, we examined relationships between biomarkers and clinical and behavioral characteristics at T1 in the ASD group (Tables 2, S13a). As with acquisition measure correlations with clinical phenotype, analyses were conducted using Spearman’s correlations both with and without partialing for age, IQ, and %Valid Data (with comparisons of Pearson’s and Kendall’s correlation in Additional file 1: Tables S13a1 and S13a2, respectively).
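Spearman’s correlation is Pearson’s correlation computed on ranks, with tied values assigned their average rank. A minimal standard-library Python sketch of the unadjusted coefficient follows; the function names are hypothetical, and partialing for age, IQ, and %Valid Data is not implemented here.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with ties assigned the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2.0 + 1.0
        for position in range(i, j + 1):
            ranks[order[position]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because only ranks enter the calculation, a single extreme biomarker value cannot dominate the coefficient, which is the robustness property cited in the analytic plan.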
