TraMiner: Vision-Based Analysis of Locomotion Traces for Cognitive Assessment in Smart-Homes

In this section, we report our experimental evaluation, which was carried out with real-world locomotion data acquired in an instrumented smart-home from a large set of seniors, including cognitively healthy seniors, persons with MCI, and PwD. In the following subsections, we describe the dataset, the experimental setup, and the achieved results. We also experimentally compare our technique with existing state-of-the-art methods.

Fig. 5 The smart-home layout used in our experiments [43]

Dataset

The experiments were carried out considering real-world trajectories acquired from 153 individuals in a smart-home of the CASAS test-bed [44] and annotated by the researchers of the Center for Advanced Studies in Adaptive Systems (CASAS) at Washington State University (WSU) [22]. The smart-home layout is represented in Fig. 5. The smart-home is a two-story apartment equipped with different kinds of sensors, and its floor plan includes a living/dining room, three bedrooms, a kitchen, and a bathroom. For our experiments, we relied on passive infrared (PIR) motion sensors and door sensors to track the movements of the individuals in the home. PIR sensors are mounted on the ceiling and their accuracy is about one meter. In total, the apartment included 51 motion sensors and 16 door sensors.

Participants were recruited through advertisements and physician referrals. After obtaining informed consent, participants underwent a multidimensional clinical assessment by neuropsychologists, in order to assess their cognitive health status. For the sake of anonymity, explicit identifiers of the individuals involved in the study were removed, and quasi-identifying personal data, such as age, were generalized to age ranges. The protocol for recruiting and data collection was approved by the Institutional Review Board of WSU [22]. Based on the clinical examination, each participant was classified as either a PwD, a person with MCI, or a cognitively healthy person.

Participants were asked to individually execute scripted Day-Out Tasks (DOTs) in the smart-home test-bed. DOTs are naturalistic tasks involving the execution of interleaved activities for reaching a certain goal [45]. Each participant executed the DOTs in a single day, for an average of three hours. Data collection occurred in the morning or in the early afternoon. During the execution of the DOTs, the sensor infrastructure acquired the sensor data triggered by the participants' actions and movements. The detailed descriptions of the DOTs, together with the collected dataset, are available on the Web (Footnote 1). The smart-home setup is described in detail in [46].

While 40 PwD were recruited, only part of them were able to participate in the data collection in the smart-home. Hence, in our experiments we considered only the PwD who were able to carry out activities in the home (19 individuals). We also considered the data acquired from the 80 cognitively healthy seniors aged 60 to 74 years, and from the 54 persons with MCI.

Comparison with State-of-the-Art Methods

In order to experimentally compare our method with the state of the art, we implemented both a baseline numeric feature extraction method and the image-based method proposed by Gochoo et al. [37].

State-of-the-Art Numeric Feature Extraction (NFE)

As a comparison, we implemented a baseline feature extraction method, in which each feature corresponds to a locomotion-based clinical indicator of cognitive decline proposed in the literature. We call this method numeric feature extraction (NFE). We consider the following indicators:

Pacing [13] travel pattern, defined in the Martino–Saltzman model: this feature counts the number of observations of this pattern in the last day.

Lapping [13] travel pattern, defined in the Martino–Saltzman model: this feature counts the number of observations of this pattern in the last day.

Random [13] travel pattern, defined in the Martino–Saltzman model: this feature counts the number of observations of this pattern in the last day.

Jerk [15], computed as the first time derivative of acceleration: this feature represents the average jerk observed in the individual's trajectories in the last day.

Straightness [16]: this feature represents the average straightness computed on the individual's trajectories of the last day.

Sharp angles [6]: this feature counts the number of sharp angles observed in the individual's trajectories during the last day.

In order to compute the above-mentioned features, we adopt the algorithms recently presented in [47].
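
For illustration, the following Python sketch shows how two of the above indicators, straightness and sharp angles, could be computed from a trajectory given as a sequence of 2D points. This is only a simplified sketch: the 90-degree turn threshold is a hypothetical parameter, and the actual implementations follow the algorithms of [47].

```python
import numpy as np

def straightness(points):
    """Straightness index: straight-line distance between the first and
    last point divided by the total path length (1.0 = perfectly straight)."""
    points = np.asarray(points, dtype=float)
    steps = np.diff(points, axis=0)
    path_length = np.linalg.norm(steps, axis=1).sum()
    if path_length == 0:
        return 1.0
    return np.linalg.norm(points[-1] - points[0]) / path_length

def count_sharp_angles(points, threshold_deg=90.0):
    """Count turns sharper than threshold_deg (hypothetical threshold)."""
    points = np.asarray(points, dtype=float)
    steps = np.diff(points, axis=0)
    count = 0
    for v1, v2 in zip(steps[:-1], steps[1:]):
        denom = np.linalg.norm(v1) * np.linalg.norm(v2)
        if denom == 0:
            continue
        turn = np.degrees(np.arccos(np.clip(np.dot(v1, v2) / denom, -1, 1)))
        if turn > threshold_deg:
            count += 1
    return count

trajectory = [(0, 0), (1, 0), (2, 0), (2, 1), (1, 1)]
print(straightness(trajectory), count_sharp_angles(trajectory))
```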

State-of-the-Art Visual Feature Extraction (GVFE)

In the original paper where the method proposed by Gochoo et al. was presented [37], the GVFE technique was used to recognize those abnormal locomotion patterns that are strong indicators of neurocognitive diseases according to the Martino–Saltzman model; i.e., pacing, lapping, and random patterns. The authors evaluated their GVFE technique in the same smart-home environment used in our experiments, with data acquired from a cognitively healthy senior over 21 months. They obtained very good results, achieving accuracy above 97%. Those results indicate that the GVFE technique is effective in recognizing abnormal travel patterns that indicate cognitive impairment. For this reason, we experimentally compare the GVFE technique with our visual feature extraction method.

In the GVFE technique, an image is generated for each trajectory based on the sequence of activations of the position sensors. Each trajectory is converted to a binary image, where the x axis represents the temporal order of the sensor activations, and the y axis represents the numeric identifier of the fired sensors. For example, suppose that the temporal sequence of activated motion sensors in a trajectory is: M005, M003, M005, M010, M011, M001. Then, the only non-zero pixels in the image are those at the following coordinates: (1,5), (2,3), (3,5), (4,10), (5,11), (6,1).
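
As a minimal sketch, the binary image of the example above can be generated as follows; the mapping from sensor names such as M005 to the numeric identifier 5 is our assumption for illustration.

```python
import numpy as np

def gvfe_image(activations, num_sensors, width):
    """Binary GVFE image: column = temporal order of the activation,
    row = numeric identifier of the fired sensor (1-based)."""
    img = np.zeros((num_sensors, width), dtype=np.uint8)
    for t, sensor_id in enumerate(activations[:width]):
        img[sensor_id - 1, t] = 1
    return img

# Example sequence M005, M003, M005, M010, M011, M001 -> ids 5, 3, 5, 10, 11, 1
img = gvfe_image([5, 3, 5, 10, 11, 1], num_sensors=67, width=32)
print(img.shape, img.sum())  # (67, 32) with 6 non-zero pixels
```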

Fig. 6 Example of images generated through the GVFE feature extraction method. The left-hand side image is extracted from a trajectory using \(T_s = 90 \, s\); the right-hand side image is extracted from a trajectory using \(T_s = 150 \, s\)

For this feature extraction method too, we apply trajectory segmentation using different values of the threshold \(T_s\). The width of the corresponding images is chosen based on the maximum trajectory length. For example, in our dataset, using \(T_s = 60 \, s\), the maximum trajectory length over all considered individuals is 32 sensor activations. Hence, images computed with that threshold value are 32 pixels wide.
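
As a sketch, and assuming for simplicity that a new trajectory starts whenever the time gap between two consecutive sensor activations exceeds \(T_s\) (the precise segmentation criterion is the one described earlier in the paper), segmentation can be implemented as follows:

```python
def segment(events, t_s=120.0):
    """Split a time-ordered list of (timestamp_s, sensor_id) events into
    trajectories, starting a new one when the gap between two consecutive
    activations exceeds t_s (simplifying assumption)."""
    trajectories, current, last_t = [], [], None
    for t, sensor_id in events:
        if last_t is not None and t - last_t > t_s:
            trajectories.append(current)
            current = []
        current.append((t, sensor_id))
        last_t = t
    if current:
        trajectories.append(current)
    return trajectories

events = [(0, 5), (20, 3), (200, 10), (230, 11)]
print(len(segment(events, t_s=120.0)))  # 2 trajectories
```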

As mentioned before, the y axis represents the numeric identifier of the fired sensor. Since in our dataset we consider 67 sensors (i.e., 51 PIR sensors and 16 door sensors), the y axis includes 67 different values. Consequently, the images are 67 pixels high, irrespective of the value of the threshold \(T_s\). For thresholds \(T_s\) set to \(30 \, s\), \(60 \, s\), \(90 \, s\), \(120 \, s\), and \(150 \, s\), the image sizes are \((14 \times 67)\), \((32 \times 67)\), \((60 \times 67)\), \((110 \times 67)\), and \((169 \times 67)\), respectively. Figure 6 shows two sample images obtained using different threshold values.

State-of-the-Art DNN (DCNN)

In order to compare our DNN architecture with a state-of-the-art one, we consider the deep convolutional NN (DCNN) used by Gochoo et al. in [37]. As illustrated in Fig. 7, that network has three zero-padding convolution layers with \(5 \times 5\) feature filters, each followed by a max-pooling layer, and three fully connected layers. The pooling window size is set to \(2 \times 2\); since max-pooling creates a smaller version of the input maps, the output maps are half the size of the input ones. The first convolution layer has 32 kernels; the second one, which receives the output of the first max-pooling layer as input, has 128 feature filters; and the last convolution layer receives the output of the second max-pooling layer and convolves it with 256 feature filters.

Fig. 7 The state-of-the-art deep convolutional neural network (DCNN) used by Gochoo et al. in [37]

Finally, the output of the third max-pooling layer is flattened into a feature vector and fed to the fully connected part. In this part, the first, second, and third fully connected layers have 512, 128, and 64 neurons, respectively, and the neurons of the last fully connected layer are connected to the three outputs; i.e., cognitively healthy, MCI, and PwD. In order to obtain the probability distribution of the classes, the softmax function is applied. Since the reference paper [37] does not mention internal details such as the optimizer and its learning rate, or the chosen loss function, we used the same parameters chosen for our proposed DNN configuration described in Section 5.
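
For concreteness, a Keras sketch consistent with the above description follows. The ReLU activations and the input shape (here, the \(110 \times 67\) images obtained with \(T_s = 120 \, s\)) are our assumptions, as are the Adam optimizer and cross-entropy loss reused from our own configuration.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_dcnn(input_shape=(67, 110, 1), num_classes=3):
    """Sketch of the DCNN of Gochoo et al. [37]: three zero-padded 5x5
    convolution layers (32, 128, 256 filters), each followed by 2x2
    max-pooling, then fully connected layers of 512, 128, and 64 neurons,
    and a softmax over the three classes (healthy, MCI, PwD)."""
    model = keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Conv2D(32, (5, 5), padding="same", activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(128, (5, 5), padding="same", activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(256, (5, 5), padding="same", activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(512, activation="relu"),
        layers.Dense(128, activation="relu"),
        layers.Dense(64, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    # [37] does not report the optimizer or loss; as stated in the text, we
    # reuse the settings of our own DNN (assumed here: Adam + cross-entropy).
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```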

Experimental Setup

We developed all the algorithms in Python. The code is available on the Web (Footnote 2). To experiment with the NFE technique, we used the machine learning algorithms implemented in the Weka toolkit [48]. The code for extracting the NFE features is available online (Footnote 3). We used the Python Keras neural network library (Footnote 4) to develop the proposed DNN classification systems. In order to support scalability, the general architecture of our system envisions the use of a cloud-based system for training the DNN model. However, given the relatively small size of the training set used in our experiments, we trained the DNN on a departmental server. We ran the experiments on a Linux server with four NVIDIA Tesla P6 graphics boards, each featuring a single NVIDIA Pascal GP104 graphics processing unit (GPU) and 16 GB of GDDR5 memory. To evaluate the effectiveness of our TRAJ and SPEED visual feature extraction techniques, we experimented with different values of the \(T_s\) threshold for trajectory segmentation, ranging from 30 seconds up to 180 seconds. We also implemented the feature extraction method and the DNN used by Gochoo et al. [37] to experimentally compare our methods with the state of the art.

In all the experiments, we applied a leave-one-person-out cross-validation approach: we used the data of one individual as the test set, and the data of the other persons for training and validation, iterating over each person so that the tests covered all the considered participants. With this approach, the data of the same person is never used for both training/validation and testing at the same time. For tuning the hyper-parameters of the DNN, the training trajectories are split into 10% of each category for validation and 90% for training.
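
This protocol can be sketched with scikit-learn's LeaveOneGroupOut, where the group is the person. The data below are toy placeholders, not the actual dataset.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, train_test_split

# Placeholder data: one row per trajectory, a class label (0 = healthy,
# 1 = MCI, 2 = PwD), and the anonymized id of the person who walked it.
X = np.random.rand(30, 6)
y = np.tile([0, 0, 1, 1, 2], 6)
groups = np.repeat(np.arange(6), 5)   # 6 persons, 5 trajectories each

logo = LeaveOneGroupOut()
for train_idx, test_idx in logo.split(X, y, groups=groups):
    # 90/10 stratified split of the remaining persons' trajectories for
    # training vs hyper-parameter validation, as described in the text
    X_tr, X_val, y_tr, y_val = train_test_split(
        X[train_idx], y[train_idx], test_size=0.10,
        stratify=y[train_idx], random_state=0)
    # ... train on (X_tr, y_tr), tune on (X_val, y_val),
    # then evaluate on the held-out person (X[test_idx], y[test_idx])
```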

For the overall performance evaluation, we used the macro-precision, macro-recall, and macro-\(F_1\) score metrics, which are standard for imbalanced problems. In particular, the macro-\(F_1\) score is a reliable metric in imbalanced cases, since it gives equal weight to the different classes, regardless of their size. Since the accuracy measure is inappropriate for imbalanced classification problems like the one we are tackling, we do not consider it in our evaluation. Indeed, depending on the degree of imbalance, the accuracy on the majority class would dominate the accuracy on the minority classes.
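
For instance, with scikit-learn the three macro-averaged metrics can be computed as follows (the labels are toy values for illustration):

```python
from sklearn.metrics import precision_recall_fscore_support

# Toy predictions: 0 = healthy, 1 = MCI, 2 = PwD, pooled over all test folds.
# Macro averaging weights each class equally, regardless of its support.
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 2, 2, 2]
p, r, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0)
print(f"macro-P={p:.2f}  macro-R={r:.2f}  macro-F1={f1:.2f}")
```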

Fig. 8 Trajectory images classification: macro-\(F_1\) score for the different techniques

Results of NFE Technique

At first, we evaluated the numeric feature extraction technique explained in Section 6.2.1. For each participant, we built the feature vector using all the trajectory data collected during the day. Each feature vector was labeled with the cognitive status of the individual; i.e., cognitively healthy, MCI, or PwD. We experimented with several classifiers:

The well-known Naive Bayes [49] classifier;

Logistic regression classifier [50], relying on a multinomial logistic regression model with a ridge estimator;

Multilayer perceptron (MLP) feed-forward artificial neural network algorithm;

Support Vector Machines [51];

k Nearest Neighbours (kNN) [52] lazy classifier, with \(k=5\);

Ripper [53] propositional rule learner;

C4.5 [54] decision tree;

Random tree [55] classifier;

Random forest (RF) [56] classifier.

Results are reported in Table 1. Overall, the results achieved by the NFE technique are poor. Indeed, all classifiers achieved a macro-average \(F_1\) score close to that of a random classifier. The classifier achieving the best performance in this pool of experiments is the Random forest algorithm, with an \(F_1\) score of 0.37. These results seem to indicate that, in a home context, numerical statistics about locomotion-based indicators of cognitive decline are ineffective for automatic cognitive assessment. This is probably due to the high level of noise introduced by two factors that influence movement patterns; i.e., obstacles in the home, and the execution of daily living activities.

Table 1 Results of the numeric feature extraction (NFE) method

Results of Single Trajectory Image Classification

First, we experimentally compare the effectiveness of our TRAJ feature extraction method with that of Gochoo et al.'s visual feature extraction method (GVFE). For this experiment, we used the DCNN of Gochoo et al. described above. The goal is to assess the ability to correctly recognize the cognitive status of a person from the observation of a single trajectory.

According to the results shown in Table 2, for both techniques, the best results are achieved with the trajectories obtained by setting \(T_s = 120 \, s\). As can be observed in Fig. 8, the technique using TRAJ and DCNN slightly outperforms the one relying on GVFE and DCNN in terms of macro-\(F_1\) score.

Table 2 Trajectory images classification: results obtained using our TRAJ feature extraction method and DCNN vs GVFE and DCNN

In a second set of experiments, we compare the performance of the TRAJ and SPEED feature extraction methods, using our MLP DNN. Table 3 shows the achieved results: the two feature extraction methods achieve comparable performance. As can be observed in Fig. 8, for both methods, the best results in terms of average macro-\(F_1\) score are obtained using \(T_s = 120 \, s\). It is evident that the results obtained using the TRAJ feature extraction method strongly improve when our MLP DNN is used instead of the DCNN. Indeed, using the MLP DNN we achieve a macro-\(F_1\) score larger than 0.57 with less computation time, while the best macro-\(F_1\) score obtained using the more complex DCNN was close to 0.36. We believe that this result may depend on the relatively limited size of the training set: with a larger training set, the more complex DCNN could possibly outperform the MLP DNN, at the cost of additional time and resource consumption. Due to these results, in the rest of the experiments we use our MLP DNN for performing the classification tasks.

Table 3 Trajectory images classification: results of our TRAJ vs SPEED feature extraction methods, using our MLP DNN

Results of Two Input Trajectory Images Classification

Since the results obtained using the TRAJ and SPEED techniques separately are encouraging, we performed additional experiments using both TRAJ and SPEED trajectory images as input to our proposed MLP DNN. Indeed, since TRAJ and SPEED images represent different features of trajectories, their combined use may increase recognition rates. This setup corresponds to the TraMiner architecture shown in Figs. 2 and 4.
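
For illustration, a minimal two-input network in Keras could be structured as follows. The layer sizes and image shape are hypothetical; the actual TraMiner architecture is the one depicted in Figs. 2 and 4.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_two_input_mlp(img_shape=(64, 64, 3), num_classes=3):
    """Minimal sketch (not the exact TraMiner architecture): the TRAJ and
    SPEED images are flattened, concatenated, and classified by fully
    connected layers with a softmax over healthy / MCI / PwD."""
    traj_in = keras.Input(shape=img_shape, name="traj_image")
    speed_in = keras.Input(shape=img_shape, name="speed_image")
    x = layers.Concatenate()(
        [layers.Flatten()(traj_in), layers.Flatten()(speed_in)])
    x = layers.Dense(256, activation="relu")(x)   # hypothetical sizes
    x = layers.Dense(64, activation="relu")(x)
    out = layers.Dense(num_classes, activation="softmax")(x)
    model = keras.Model(inputs=[traj_in, speed_in], outputs=out)
    model.compile(optimizer="adam", loss="categorical_crossentropy")
    return model
```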

The achieved results are presented in Table 4. As in the previous experiments, the best results in terms of macro-\(F_1\) score are obtained using \(T_s = 120 \, s\), while the worst results are obtained with threshold values of \(30 \, s\) and \(180 \, s\). Considering these results, as also shown in Fig. 8, it is evident that the combined use of TRAJ and SPEED features significantly improves the recognition performance with respect to the single feature extraction methods. Indeed, the best achieved \(F_1\) score is larger than 0.71, while the single feature extraction methods achieve \(F_1\) scores close to 0.57.

The detailed results can be inspected through the confusion matrices reported in Fig. 9. By observing the confusion matrices, it is evident that the total number of samples changes depending on the chosen value of \(T_s\). For instance, with \(T_s = 30 \, s\) we have 5,195 trajectories, while with \(T_s = 180 \, s\) we have only 540 trajectories. Indeed, in general, the lower the segmentation threshold, the larger the number of generated trajectories. Hence, by using larger values of \(T_s\) we obtain fewer samples, but we can encode more information in the single trajectory images, since trajectories are generally longer. On the negative side, too large values of \(T_s\) may produce overly intricate images that may confuse the DNN. On the contrary, too small values of \(T_s\) may produce very short trajectories that do not encode enough information for the DNN. According to our experiments, the value \(T_s = 120 \, s\) provides a good trade-off in this sense.

Table 4 Trajectory images classification: results of our TRAJ+SPEED model and MLP DNN

Fig. 9 Trajectory images classification: confusion matrices of our TRAJ+SPEED model and MLP DNN

However, the achieved results, which are computed by classifying single trajectories in isolation, are not sufficient to provide a reliable hypothesis about the cognitive status of the individual. For this reason, in the following experiments we evaluate the performance of the module for long-term trajectory analysis, which considers the whole history of trajectories acquired during a certain period of time.

Results of Long-Term Trajectory Analysis

In these experiments, we apply the algorithm for long-term analysis described in Section 5.2 with all the different techniques for trajectory image classification evaluated in Sections 6.5 and 6.6. Figure 10 provides an overview of the achieved results.
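
The long-term analysis algorithm itself is the one of Section 5.2 and is not reproduced here; purely to fix ideas, a naive aggregation over the history of per-trajectory predictions could look like the following majority vote:

```python
from collections import Counter

def long_term_hypothesis(trajectory_predictions):
    """Naive sketch (NOT the algorithm of Section 5.2): hypothesize the
    person's cognitive status as the majority class over the per-trajectory
    predictions collected during the observation period."""
    return Counter(trajectory_predictions).most_common(1)[0][0]

history = ["MCI", "PwD", "MCI", "healthy", "MCI"]
print(long_term_hypothesis(history))  # -> MCI
```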

As expected, the results achieved using the DCNN with the TRAJ or GVFE feature extraction methods are rather poor. Indeed, those techniques achieve the lowest recognition rates for trajectory image classification. In particular, the TRAJ method with DCNN achieves an \(F_1\) score slightly larger than 0.4 with \(T_s = 120 \, s\); its \(F_1\) score is lower with the other values of the threshold. The best result of the GVFE method with DCNN is similar, but is obtained with \(T_s = 60 \, s\). Overall, these two techniques do not provide significant results for cognitive assessment, even in the long term.

Fig. 10 Long-term analysis: macro-\(F_1\) score for the different techniques

Table 5 Long-term analysis: results obtained using two input trajectory images classification (TRAJ + SPEED features) with our MLP DNN

Results are significantly better using our MLP DNN with the TRAJ or SPEED feature extraction methods. For both techniques, the best results are obtained using \(T_s = 120 \, s\). In particular, the TRAJ method achieves a macro-\(F_1\) score of 0.7, while the SPEED method achieves a macro-\(F_1\) score of 0.65. With these techniques, the increase in recognition performance introduced by the long-term evaluation algorithm is evident: indeed, both techniques obtain a lower macro-\(F_1\) score, close to 0.57, for single trajectory image classification. However, the results obtained with these techniques are still insufficient for providing a reliable diagnostic hypothesis for cognitive assessment.

Fig. 11 Long-term analysis: confusion matrices obtained using two input trajectory images classification (TRAJ + SPEED features) with our MLP DNN

The best results in this pool of experiments are achieved using two input trajectory images classification (TRAJ + SPEED features) with our MLP DNN: with \(T_s = 120 \, s\), it achieves a macro-\(F_1\) score of 0.873. The detailed results are shown in Table 5. As can be observed, with \(T_s = 120 \, s\) the best results are achieved for the class of cognitively healthy subjects (\(F_1\) score = 0.92), which is the most frequent one. The class of subjects with MCI obtains an \(F_1\) score of 0.87, while the class of PwD achieves an \(F_1\) score of 0.83.

By closely inspecting the confusion matrices shown in Fig. 11, we can observe that, with \(T_s = 120 \, s\), 17 PwD subjects out of 19 are correctly recognized. Of the remaining two, one is classified as a person with MCI, and one as a cognitively healthy person. Hence, the false negative rate for PwD is very low. Regarding false positive predictions of dementia, three persons with MCI out of 54 are classified as PwD, and out of 80 cognitively healthy subjects, only two are classified as PwD. Hence, the false positive rate for PwD is also low. Regarding the 54 persons with MCI, 43 are correctly recognized, while 8 are classified as cognitively healthy and 3 as PwD. We consider these results positive, since MCI is an intermediate state between cognitive health and dementia, which is difficult to diagnose, especially with automatic tools. Among the 80 cognitively healthy seniors, we observed only 4 false positives: two of them were classified as persons with MCI, and two as PwD.

Dashboard for Clinicians

In order to allow clinicians to inspect the predictions of our system, we developed a user-friendly dashboard using the Google Data Studio framework. The dashboard allows inspecting the predictions obtained through the various techniques experimented with in our work, achieved using the threshold value \(T_s = 120 \, s\). The dashboard can be freely accessed on the Web (Footnote 5).

Fig. 12 A screenshot of the TraMiner dashboard

A screenshot of the dashboard is shown in Fig. 12. The user can select a patient through a drop-down list. The actual diagnosis for the current patient is shown on the left-hand side of the dashboard. On the right-hand side, the user can inspect the predicted diagnosis according to the five different methods: 'GVFE and DCNN', 'SPEED and MLP DNN', 'TRAJ and DCNN', 'TRAJ and MLP DNN', and 'TRAJ+SPEED and MLP DNN'. We recall that the latter is the actual method implemented by TraMiner, while the other ones are shown only as a reference.

When a patient is selected, the lower part of the dashboard shows the history of all input trajectories. For each trajectory, the user can visualize the extracted visual features according to the three experimented methods: 'TRAJ', 'SPEED', and 'GVFE'. Those images are available in a table and can be opened in a separate window through a hyperlink. A sample of the three images shown in the dashboard for one trajectory is reported in Fig. 13.

Fig. 13 A sample of images for one trajectory in the TraMiner dashboard. From left to right, the images are obtained using the TRAJ, SPEED, and GVFE feature extraction methods, respectively

Discussion and Limitations

Our experimental evaluation shows that the use of our combined visual features (TRAJ and SPEED), coupled with the MLP DNN, outperforms a state-of-the-art method based on a different visual feature extraction technique and a more complex DNN. The advantage of our solution lies in the encoding of additional features, such as speed, intersections, and low-level anomaly indicators, which are not captured by existing image-based solutions. Indeed, those locomotion features are known in the literature to be reliable indicators of cognitive diseases.

In general, our system achieves better results on the more frequent classes. Hence, we expect that recognition rates may improve using a training set composed of larger sets of individuals with MCI and PwD. The achieved macro-\(F_1\) score suggests that TraMiner may be a useful support for clinicians in the clinical evaluation of the cognitive health status of the elderly. However, this hypothesis should be confirmed by a large trial, with the support of clinicians and the deployment of our system in real-world conditions.

The experiments show that the recognition performance of TraMiner strongly improves when the whole history of trajectories is considered. In this respect, we recall that, although the dataset was acquired from more than 150 individuals, each person was monitored only for a few hours in one day. Such a short observation period may be insufficient to reliably predict the cognitive status of all individuals. Hence, we expect to achieve more accurate predictions by considering a longer history of observations. However, this intuition needs to be verified by additional experiments in a large trial.

A limitation of our vision-based method is the difficulty of manually analyzing the reasons that determined the diagnostic hypothesis produced by TraMiner. Indeed, it is not straightforward for a human observer to distinguish the trajectory images produced by our feature extraction techniques for the different classes of seniors. In order to provide explainable AI capabilities, TraMiner should be complemented with other methods for cognitive assessment, possibly considering other behavioral models of cognitive decline based on overt [57] or subtle anomalies [58], which are easier for a domain expert to interpret. In order to recognize those behavioral anomalies, TraMiner should be extended with additional sensors and algorithms to recognize activities at a fine-grained level.

Given the nature of the dataset used in our experiments, all the patients' data were acquired in the same smart-home context. Whether the learned model is portable to a different context is still an open research question. We believe that advanced transfer learning methods specifically designed for image classification may enable training data portability [59], but this aspect should be confirmed by additional experiments with other datasets. Even without the use of transfer learning methods to enhance data portability, our system could be applied to important domains. In particular, a residence for elderly people may consist of several similar apartments. The DNN model could be trained on the trajectories walked by those inhabitants for whom a cognitive status diagnosis is known, and then used for the cognitive assessment of the other inhabitants.
