Toward More Accessible Fully Automated 3D Volumetric MRI Decision Trees for the Differential Diagnosis of Multiple System Atrophy, Related Disorders, and Age-Matched Healthy Subjects

One of the major objectives of the current study was to develop an accessible approach to the differential diagnosis of MSA from other clinically related NMD diagnoses. We envision that our approach might be adapted to a community setting where 3D MRI is readily available. To facilitate this adaptation, we focused on a fully automated approach that employed open-source software and a statistical method that is unbiased, intuitive, and widely translatable (i.e., decision tree). Moreover, we purposely included patients from across a large tertiary level hospital center to create a more clinically heterogeneous cohort that had been subjected to a range of MRI scanners and methodologies. We reasoned such a cohort would better mimic a community-based population.

The post-hoc analysis of presenting symptom at the time of initial MRI revealed mixed or atypical presenting symptoms in 24% MSAc, 32% MSAp, and 12% PSP patients (Table 1). Taken together with the 4% of the cohort with non-motor symptoms only at time of initial MRI, these findings illustrate that early in the course of NMD, differentiation on clinical grounds can be quite difficult, underscoring the importance of creating objective methods to assist early differential diagnosis of NMD patients that are independent of clinical differentiation between ataxic and parkinsonian presentation.

For differentiating NMD, a novel measure, the 3D-PMR, proved valuable. Drawing inspiration from the 2D literature in which midbrain and pons areas on the mid-sagittal plane were utilized on their own, as a ratio, or as part of a formula, we developed this measure as a ratio of pons to midbrain volumes [5,6,7,8,9,10]. Interestingly, 3D-PMR was not only the key upstream node for distinguishing MSA from other NMDs, likely because of pontine atrophy in MSA, but was also useful in distinguishing PSP from PD, likely because of the midbrain atrophy observed in PSP (Fig. 4). MSA patients who did not meet the criterion of low 3D-PMR were correctly classified further downstream in the decision tree by the absence of thalamic atrophy together with a suggestion of midbrain atrophy, which agrees with prior reports of midbrain atrophy in MSAp [31]. Overall, the presence of 3D-PMR at multiple points of the tree suggests that it can serve as a gradient biomarker for differentiating multiple NMDs, rather than as a binary decision maker.
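As a minimal sketch of how the 3D-PMR behaves, the ratio can be computed directly from two segmented volumes. The function name, the units (mm^3), and all volume values below are hypothetical and chosen only for illustration; they are not study data, and no decision-tree threshold is implied.

```python
def pons_to_midbrain_ratio(pons_mm3: float, midbrain_mm3: float) -> float:
    """Return the 3D pons-to-midbrain volume ratio (3D-PMR)."""
    if pons_mm3 <= 0 or midbrain_mm3 <= 0:
        raise ValueError("volumes must be positive")
    return pons_mm3 / midbrain_mm3


# Hypothetical volumes (pons, midbrain) in mm^3, for illustration only.
example_volumes = {
    "reference": (15000.0, 6000.0),
    "pontine atrophy": (10500.0, 6000.0),    # smaller pons lowers the ratio
    "midbrain atrophy": (15000.0, 4800.0),   # smaller midbrain raises the ratio
}

ratios = {
    label: pons_to_midbrain_ratio(pons, midbrain)
    for label, (pons, midbrain) in example_volumes.items()
}

for label, ratio in ratios.items():
    print(f"{label}: 3D-PMR = {ratio:.2f}")
```

Because the two atrophy patterns move the ratio in opposite directions, a single continuous measure can contribute to more than one split in a tree, which is consistent with its appearance at multiple nodes.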

Notably, our two decision-tree approaches (Figs. 3 and 4, respectively) were intended for two distinct purposes. Our initial approach, distinguishing NMD patients from healthy subjects (Fig. 3), was designed to simulate the type of initial filter that community clinicians and imagers might employ to identify NMD patients among the wide variety of patients with pathologic and functional disorders that can mimic NMDs, such as essential tremor, Wernicke’s syndrome, communicating hydrocephalus, and spinal disorders, among others. The screening decision tree produced a high sensitivity/specificity of 84/94% and a cross-validated accuracy of 84.4% in our relatively technically heterogeneous cohort. This supports the hypothesis that our approach may help community physicians facing cohorts of patients with movement issues of various etiologies to identify NMD patients for specialist referral. The tree may be useful even before motor symptom onset, as it showed an even higher accuracy of 88% for predicting the presence of an NMD in patients with MRI before motor symptom onset. Both of these results will need to be validated in larger and even more heterogeneous cohorts incorporating patients with various movement-related impairments unrelated to NMDs.

This screening tree (Fig. 3) identified lower volumes — likely from atrophy — of the cerebellar white matter, striatum, putamen, and midbrain in NMD patients compared to healthy subjects. For thalamus, however, large volume suggested NMD, which may be due to thalamic enlargement observed in PD [32]. The consistency of these results with previous literature and with the generally accepted neuropathology and pathophysiology of NMDs is very encouraging for clinical translation because it suggests that the correlations embedded in this decision-tree method are biologically relevant and thus potentially generalizable.

The second decision tree (Fig. 4) was designed to assist neurology specialists and movement disorder sub-specialists in differentiating among the various NMDs, once patients with unrelated movement issues have been excluded. This tree provided a more granular understanding of the specific volumetric differences that had emerged in our first decision tree: the small pons volumes were driven by pontine atrophy in MSAc and the small striatum volumes by putaminal atrophy in MSAp. Thalamic atrophy suggested PSP and distinguished it from PD as well as MSA, a finding that, as noted above, is consistent with previously reported enlarged thalami in PD in addition to previously reported thalamic atrophy in PSP [32, 33]. SCP atrophy also suggested PSP, as observed in prior literature [34, 35].

The second tree (Fig. 4) demonstrated specificity of 83.8–95.9% and sensitivity of 72.0–94.4% for pairwise comparison of MSA vs PD vs PSP. Further branching of this tree to differentiate MSA into MSAp and MSAc was also attempted; it identified MSAp with high specificity (91.9%) but a low sensitivity of 63.6%. We presume this weaker performance resulted from our much smaller sample of MSAp patients. Additionally, overlapping phenotypes between MSAp and MSAc, as well as between MSAp and PD, likely contribute to the lower sensitivity for the MSAp group, which again emphasizes the difficulty of differential diagnosis [35, 36]. Consistent with this possibility, 5 of the 8 incorrectly classified MSAp patients were classified as MSAc and 3 as PD in the second decision tree (Fig. 4). Furthermore, of our autopsy-proven MSAp cases, 1/4 was incorrectly classified by the decision tree as MSAc. The high performance in patients with autopsy-confirmed diagnoses (82%) and MRIs before motor symptom onset (88%) is notable. While the numbers are small, this result suggests that our approach may prove robust in identifying patients with NMDs, particularly MSA, early in the disease course. This would be highly desirable, not only to decrease patient suffering and healthcare costs but also to triage patients for therapeutic trials at the early stages, when disease-modifying interventions are most likely to work. This will be synergistic with other efforts toward early and accurate diagnosis, for example through CSF analysis for α-synucleinopathies or skin biopsy for PD (also see accompanying submission, Ndayisaba, Pitaro, Willett et al. The Cerebellum, this issue) [37, 38].
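For readers less familiar with multi-class performance measures, the sketch below shows one common way to obtain per-diagnosis sensitivity and specificity from a three-class confusion matrix, treating each diagnosis in turn as "positive" against the other two (one-vs-rest). This is an assumption about the general technique, not a description of how the figures above were computed, and the counts are hypothetical rather than study data.

```python
def one_vs_rest_metrics(confusion, labels):
    """Per-class sensitivity and specificity from a square confusion matrix.

    confusion[i][j] is the number of patients whose true class is labels[i]
    and whose predicted class is labels[j].
    """
    total = sum(sum(row) for row in confusion)
    metrics = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]                          # true class k, predicted k
        fn = sum(confusion[k]) - tp                   # true class k, predicted other
        fp = sum(row[k] for row in confusion) - tp    # other class, predicted k
        tn = total - tp - fn - fp                     # other class, predicted other
        metrics[label] = {
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
        }
    return metrics


# Hypothetical counts, for illustration only (rows: true, columns: predicted).
labels = ["MSA", "PD", "PSP"]
confusion = [
    [40, 6, 4],
    [5, 30, 5],
    [3, 2, 25],
]

metrics = one_vs_rest_metrics(confusion, labels)
for label, m in metrics.items():
    print(f"{label}: sensitivity {m['sensitivity']:.1%}, specificity {m['specificity']:.1%}")
```

With few patients in a class, a handful of misclassifications changes that class's sensitivity substantially, which is one reason the smallest subgroup tends to show the least stable estimate.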

Finally, our supplementary decision tree (Supplementary Fig. 1) generated with data from a sub-group of patients who had parkinsonism resulted in SCP as its first node with atrophy suggesting MSA, PD, or PSP. Without SCP atrophy, a larger 4th ventricular size, which is an indirect measure of cerebellar atrophy, suggested MSAp rather than PD. Smaller 3D-PMR suggested MSA, among which pontine atrophy pointed towards MSAc. Larger 3D-PMR, along with larger 3rd ventricular size, which is an indirect measure of thalamic atrophy, suggested PSP. Although underpowered, this tree demonstrated relatively high specificity (79–99%) which suggests that with additional data, this approach can assist with differentiation of those presenting with parkinsonism, which is among the most challenging for differential diagnosis in NMDs.

Although our sample size is relatively small, especially in the MSAp cohort, this is still one of the largest comprehensive cohorts yet published, consistent with the rarity of atypical parkinsonism-ataxia spectrum disorders (Table 2). Future studies with still larger and multi-center cohorts are needed. Heterogeneity of the disease cohort, such as the older age of the PSP and PD groups, is a potential source of bias, although the inclusion of age- and sex-matched healthy subjects should somewhat mitigate this. Among the sources of this heterogeneity, the older age of the PSP and PD groups likely reflects later average symptom onset in these diseases compared to MSA. This may have degraded the performance of our classifiers. However, because this age distribution should more closely simulate a real community-clinic population, the strong overall performance of our decision trees suggests that our method will continue to perform well in prospective community application. The longer interval from symptom onset to MRI in the PD group is another limitation, although it may mirror future clinical use because of the slower progression of symptoms in PD. Interestingly, the two PD patients with earlier MRIs performed before any symptom onset were correctly classified as PD, supporting the notion that the classifier may perform better in the critical early detection task. Prior to prospective clinical use, studies with longitudinal analysis of patients with MRIs at multiple time points will be needed to understand the effect of symptom duration on MRI volumetric diagnosis of NMDs [17]. Variability in the timing of MRI in relation to symptom onset is another limitation, inherent to a retrospective study. While this likely decreases the overall measured sensitivity and specificity of the decision trees, it reflects the variability in practice patterns among providers ordering MRIs. As such, the reported sensitivities and specificities are likely to approximate those that will be encountered in clinical use of the trees.

As noted, we included only 3D-PMR and volumes of midbrain, pons, SCP, caudate, putamen, striatum, thalamus, pallidum, cerebellar white matter, hippocampus, amygdala, lateral ventricles, third ventricle, fourth ventricle, and choroid plexus in the input set when performing the decision tree analysis. This excluded a number of other structures that are known not to be involved in the NMDs of interest, or for which low segmentation reliability is either known from prior literature or was identified in our study. In particular, we excluded cerebellar gray matter, cerebral cortex and white matter, and corpus callosum volumes, which might otherwise have been of interest, because of known limitations in the accuracy of FreeSurfer v6.0 for cerebellar gray matter and because of inaccurate segmentation of the supratentorial volumes in a number of patients on visual review. We also eliminated the nucleus accumbens, a relevant structure of interest, as an independent volume, merging it with putamen and caudate in our striatum volume, because the accumbens boundaries cannot be discerned readily with T1-weighted imaging. Notably, we used a more advanced version of FreeSurfer, v6.0 (Table 2), which was a major upgrade from the previous v5.3, with improvements in registration, segmentation, and classification. In particular, this version has improved putamen segmentation and an automatic brainstem sub-segmentation module [17, 20, 39]. For these reasons, the resulting classifier may prove to be more generalizable than earlier approaches.

Our study illustrates the major advantage of the decision-tree approach: it is a more interpretable and transparent statistical method than more complex SVM and deep learning approaches, which can appear as a “black box” to the broader clinical community. The ability to understand the basis for individual patient classifications is one of the main advantages of this simple modeling technique. Such methods reduce the likelihood of data overfitting, classifier performance drift, and performance degradation in application to external datasets, while at the same time engendering physician acceptance and trust in the method. Bayes’ theorem dictates that the positive predictive value of any test depends heavily on the pre-test prevalence of disease in the test population [40]. Therefore, the accuracy of any classifier is expected to decrease when it is applied to less carefully selected populations, and careful stepwise validation in progressively more heterogeneous populations is required to translate diagnostic methods developed in carefully characterized patient cohorts to broader use. Validation across multiple independent datasets is warranted for this technology to become widely accepted.
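The dependence of positive predictive value (PPV) on prevalence can be made concrete with a short worked example. The sketch below applies Bayes' theorem to the screening tree's reported sensitivity and specificity (84% and 94%); the three prevalence values are hypothetical and are chosen only to contrast an enriched referral cohort with less selected populations.

```python
def positive_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """PPV via Bayes' theorem: P(disease | positive test)."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


SENSITIVITY = 0.84  # screening tree, as reported in the text
SPECIFICITY = 0.94  # screening tree, as reported in the text

# Hypothetical pre-test prevalences, for illustration only.
prevalences = (0.50, 0.10, 0.01)
ppv_by_prevalence = {
    p: positive_predictive_value(SENSITIVITY, SPECIFICITY, p) for p in prevalences
}

for p, ppv in ppv_by_prevalence.items():
    print(f"prevalence {p:.0%}: PPV {ppv:.1%}")
```

Under these assumed prevalences the PPV falls from roughly 93% at 50% prevalence to about 61% at 10% and about 12% at 1%, even though sensitivity and specificity are unchanged.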
