Application of machine learning to predict transport modes from GPS, accelerometer, and heart rate data

The diversity of available methodologies (classical machine learning techniques vs. deep learning approaches), data sources (public vs. tailored for the study), sensors and instruments, and pre- and post-processing approaches opens a multitude of avenues in transport mode detection. Our study used random forest (RF) models to predict transport modes at the minute level. We examined the added impact of heart rate on the prediction, assessed the impact of splitting observations at the participant level rather than at the observation level during the estimation procedure, and investigated the influence of bandwidth size in the post-processing moving average on the final prediction rate. Recent studies have approached transport mode detection with different methods, including classical machine learning techniques (RF, SVM) [40, 41], Convolutional Neural Networks (CNN) [42], Long Short-Term Memory (LSTM) [43], Temporal Convolutional Networks (TCN) [44], and Multilayer Perceptrons (MLP) [45]. Some have compared multiple algorithms to evaluate their chosen method [40, 41, 42, 44, 45, 46]. For example, Alotaibi presents an ensemble method that combines three different machine learning algorithms for classification or model learning [45]. Two additional algorithms were stacked with the ensemble and fed into a neural network architecture, the multilayer perceptron (MLP), in order to predict five different transport modes. While this multilayer algorithm outperformed twelve other independent machine learning algorithms in their study, their use of smartphone motion sensors with a 5-s window size differs markedly from our approach, in which we use minute-level observations of GPS, accelerometer, and heart rate data.

A similar use of smartphone sensors, with a 4-s window size, was presented by Mantellos et al., who proposed a smartphone application that automatically predicts transport modes using only motion-based sensors and a rule-based algorithm known as PART [47]. Their study, which focused on developing a smartphone application to raise environmental awareness and help people follow a sustainable way of life, also gave significant attention to comparing different algorithms in order to select a method that was simple to use for the automatic recognition of transport modes on smartphones. The study also reported a difficulty similar to ours in differentiating between being in a car and being in a bus. We attempted to overcome this problem by using heart rate data, which Mantellos et al. [47] report as a potential improvement for their future work by following a hierarchical classification approach like RF, and for classifying different motorized and non-motorized transportation modes. Some recent studies have also used shorter window sizes (ranging from 5 to 8.7 s) [43, 44, 46], while others used trips [40, 41, 42] as the unit for transport mode detection.

Most of the recent studies in our literature review used accelerometer, gyroscope, and magnetometer data derived from smartphones [42, 43, 44, 45, 46, 47], with one study using GPS only [40] and one using GPS, accelerometer, and heart rate data from a smartphone and smartwatch, as in our study [41]. Although smartphones make it easy to collect mobility data, sensor information may vary across smartphone devices, in contrast to bespoke sensor devices (like the ones used in our study) whose primary function is to collect sensor data. For reliable accelerometer measures, a smartphone would have to be firmly attached to a fixed place on the body (for example, on the hips) and, in practice, not be used as a smartphone. Moreau et al. and Wang et al. used a real-world dataset that contains multi-modal data collected using a body-worn camera and multiple smartphones fixed at typical body locations [42, 44]. Using this dataset, Moreau et al. [42] predicted six transport modes with 98% precision using CNN, while Wang et al. [44] predicted eight transport modes with a precision of 87% using TCN. Deep learning methods require more pre-processing, have complex training processes, and are computationally expensive compared to RF. According to Hasan et al., RF was the most accurate in identifying different transport modes in a comparison of Extreme Gradient Boosting, RF, SVM, and ANN based on GPS, accelerometer, and heart rate data at the trip level [41]. They employed a web-based application for trip generation similar to the one we used to identify trips and trip stages in the GPS-based mobility survey. One difference, though, is that we further disaggregated the trips into minutes of observation, because we did not want to assume that the trips’ start and end times were known a priori. To the best of our knowledge, no recent study has attempted to predict the use of transport modes at the minute level while accounting for GPS, accelerometer, and heart rate data, with post-processing done via homogenization.

Main results

Participant-level vs. observation-level split of observations

In the model without heart rate, the naïve out-of-bag prediction rate for being in transport in the Training sets was 77%. As expected, the overall prediction rate for being in transport was lower (74%) in our Test sets. Within the Training sets, RF splits minute-level observations from the same participants between the observations used to grow each tree and the out-of-bag observations used to test it. Thus, the naïve out-of-bag prediction rate in the Training set is upwardly biased because it is based on observations from the same participants that were used to train the model. The overall prediction rate from our Test sets provides more accurate information on the model’s performance, as it is derived from participants other than those used to grow the model, which is precisely the situation we aim for with our prediction efforts. Because the participant-level split between the Training set and Test set was repeated 126 times, the unbiased prediction rate derived from the Test sets takes into account information from all participants.
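The difference between the two splitting strategies can be illustrated with a minimal sketch. It is not the code used in the study: the function names, the participant identifiers, and the toy data (6 participants with 50 minute-level observations each) are hypothetical, and only the Python standard library is assumed.

```python
import random

def participant_level_split(participant_ids, n_test, seed=0):
    """Hold out n_test whole participants; return (train_idx, test_idx)."""
    rng = random.Random(seed)
    unique_ids = sorted(set(participant_ids))
    test_ids = set(rng.sample(unique_ids, n_test))
    train_idx = [i for i, p in enumerate(participant_ids) if p not in test_ids]
    test_idx = [i for i, p in enumerate(participant_ids) if p in test_ids]
    return train_idx, test_idx

def observation_level_split(n_obs, test_fraction, seed=0):
    """Shuffle individual observations, ignoring who produced them."""
    rng = random.Random(seed)
    idx = list(range(n_obs))
    rng.shuffle(idx)
    n_test = int(round(n_obs * test_fraction))
    return sorted(idx[n_test:]), sorted(idx[:n_test])

def shared_participants(participant_ids, train_idx, test_idx):
    """Participants contributing observations to both sides of a split."""
    train_people = {participant_ids[i] for i in train_idx}
    test_people = {participant_ids[i] for i in test_idx}
    return train_people & test_people

# Hypothetical data: 6 participants, 50 minute-level observations each.
participant_ids = [f"P{k}" for k in range(6) for _ in range(50)]

train_idx, test_idx = participant_level_split(participant_ids, n_test=2)
leak_free = shared_participants(participant_ids, train_idx, test_idx)

obs_train, obs_test = observation_level_split(len(participant_ids), 0.33)
leaky = shared_participants(participant_ids, obs_train, obs_test)

print("participant-level overlap:", leak_free)
print("observation-level overlap:", sorted(leaky))
```

With the participant-level split, no participant appears on both sides, so the test rate reflects performance on unseen people. With the observation-level split, the same participants typically contribute to both sides, which is the source of the upward bias discussed above.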

Brondeel and Chaix achieved a prediction rate of 90% using GPS, accelerometer, and GIS data in an RF model at the trip level [19]. However, because observations from the same participants were used to grow the model and validate it, we expect the prediction rates reported in that study to be upwardly biased. Similarly, using a Bayesian Belief Model, Feng et al. obtained a prediction rate of approximately 90% at the trip level using GPS, accelerometer, and survey data [9]. It is indicated without further details that 65% of observations were assigned to the calibration set while 35% were used as a validation set. If the same participants contributed observations to both the calibration and validation sets, the reported prediction rate should be seen as overestimating the correct prediction rate that would be obtained with a new set of participants.

Ellis et al. used a combination of strategies to predict a mix of body posture and transport mode with their RF model based on GPS and accelerometer data [17]. Some of their strategies were unbiased, such as when models were grown and tested in different samples and when a “leave-one-day-out cross-validation” was used (each time, data from one day was used for testing). Shafique et al. used a mix of data (GPS, accelerometer, personal attributes, and Google Maps information) to reach an impressive 99.6% correct prediction rate using Random Forest with Stepwise Feature Inclusion [20]. Their prediction accuracies ranged from 95.44% to 99.84% for four transport modes (walk, bicycle, car, and train). The authors applied a more complicated moving average filter during pre-processing to account for the variability in the accelerometer data. The near-perfect accuracy in their study can possibly be attributed to the fact that the train-test split was done at the outcome level (similar to the observation level): 70% of the data from each transport mode was randomly included in the training set and the rest in the test set, with the same participants providing data to both sets.

Other studies by Gong et al. [48] and Chen et al. [49] have yielded prediction rates of 82.6% and 79.1% at the trip stage level, respectively. Because they used GIS rule-based algorithms, the methodological issue related to RF that we raise does not apply to their work.

Heart rate in addition to GPS and accelerometer data

Previous studies have used a combination of GPS, accelerometer, and heart rate data for the prediction of other outcomes such as energy expenditure and physical activity [50,51,52,53], but to the best of our knowledge, only very rarely for the prediction of transport modes, e.g., for the prediction of cycling or for the identification of the most significant variables in transport mode prediction [41]. Our a priori hypothesis that heart rate data would substantially contribute to a better distinction between motorized transport modes (especially between driving a car and using public transport) was not confirmed overall. Heart rate probably provided information strongly correlated with that provided by accelerometers. Given that accelerometer measurement is easier to implement than heart rate measurement, we did not examine whether accelerometer variables added to prediction accuracy based on GPS and heart rate data. It should be emphasized, however, that heart rate seemed to improve the prediction for biking, which is easily understandable as waist-worn accelerometers are unable to adequately capture biking physical activity.

Predictive contribution of variables

Predictors that were tested included variables from the accelerometer (55 variables), the GPS receiver (51 variables), and the heart rate monitor (12 variables). The variable importance plots (for models including heart rate data) give a glimpse of which variables contributed most in terms of the overall accuracy of the prediction and related performance of the model (mean decrease in accuracy) as illustrated in Fig. 3, and in terms of purity of the final subgroups in the tree through splits with this variable (mean decrease in Gini) as depicted in Fig. 4.

Fig. 3 Variable importance plot: Mean decrease in accuracy

Fig. 4 Variable importance plot: Mean decrease in Gini
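To make the notion of a decrease in Gini impurity concrete, the following sketch computes it for a single split on a single variable. In an RF, the mean decrease in Gini for a variable aggregates such decreases over all splits that use the variable and averages them over the trees. The data, the threshold, and the function names below are hypothetical and are not taken from the study.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity of a node: 1 minus the sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def gini_decrease(values, labels, threshold):
    """Impurity of the parent node minus the size-weighted impurity
    of the two child nodes created by splitting at `threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted_children = (
        (len(left) / n) * gini_impurity(left)
        + (len(right) / n) * gini_impurity(right)
    )
    return gini_impurity(labels) - weighted_children

# Hypothetical minute-level data: standard deviation of GPS speed (km/h).
speed_sd = [0.4, 0.6, 0.5, 0.7, 6.0, 7.5, 8.2, 5.1]
modes = ["walk"] * 4 + ["car"] * 4

informative = gini_decrease(speed_sd, modes, threshold=2.0)
uninformative = gini_decrease(speed_sd, modes, threshold=100.0)
print(informative, uninformative)
```

In this toy example, the split at 2.0 km/h separates the two modes perfectly and removes all of the parent impurity, whereas the split at 100 km/h leaves every observation in the same child node and removes none.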

Half of the top 30 predictive variables contributing to the model’s accuracy were GPS variables, followed by 13 accelerometer variables, one heart rate variable, and one time-related variable. It should be noted that none of the accelerometer variables made it into the top 10 contributing predictors. The standard deviation of speed (from GPS data) contributed most to the predictive accuracy of the models (mean decrease in accuracy). The information on weekday vs. weekend was ranked second, followed by maximum speed, heart rate, and then other speed and GPS indicators. The mean number of satellites used was ranked ninth. While the importance of weekday/weekend (ranked second) was surprising, the importance of speed was expected since speed, and also the variance in speed, differ between transport modes, even between motorized transport modes (for example, public transport vehicles may have a more constant speed and more regular stops compared to private motorized vehicles). It is interesting that heart rate contributed to the predictive accuracy of the entire model even though it was not found to improve the prediction rate when added to the other variables, which is likely due to substitutions among variables. Variables related to the number of satellites in view are likely relevant for predicting public transport, which is often underground in the Paris area. We had assumed that accelerometer data measuring body acceleration would be important, but the model indicated otherwise.

In contrast, Ellis et al. and Brondeel et al. found that accelerometer variables contributed more in terms of predictive power than the GPS data [18, 19]. This is difficult to explain, given that the latter study was based on data collected by our team with comparable methods for another sample over similar territory.

A posteriori homogenization of predictions

Our a posteriori homogenization systematically improved the reported prediction rates, as we expected. Ellis et al. reported significant improvements in their final prediction rates after using a moving average filter [18]. Similarly, Prelipcean et al. observed higher accuracy growth using their “Explicit-consensus methods” compared to other performance metrics used in their study [54]. This method also used a voting principle on each point of the trip segment, but it took into account the whole trip segment rather than a window of points before and after the point in question. Such a strategy is difficult to apply when the trip start and end are not known beforehand, as in our case.

In our study, using the moving average filter increased the overall successful prediction rate of transport modes by 6 percentage points. The prediction of public transport use improved by 4 percentage points and that of using a private motorized vehicle by 10 percentage points. Thus, our study demonstrates that the a posteriori homogenization moderately improved the final prediction rates and that it is a useful step in a prediction process.

There was evidence that excessively large homogenization windows tended to obscure the prediction of walking episodes, which are typically shorter than those with other modes. Thus, our work suggests that investigators need to pay close attention to the size of the homogenization window and that a unique window size may not identically apply to all transport modes.
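The effect of window size can be sketched with a simple centered majority vote over minute-level predictions. This is only one plausible form of a posteriori homogenization and is an assumption made for illustration: the exact filter used in the study is not reproduced here, and the function name, the tie rule (keep the original label), and the toy sequence are hypothetical.

```python
from collections import Counter

def homogenize(predictions, half_window):
    """Replace each minute's label by the most frequent label in a
    centered window of up to 2 * half_window + 1 minutes.
    On ties, the minute keeps its original label."""
    smoothed = []
    for i, current in enumerate(predictions):
        lo = max(0, i - half_window)
        hi = min(len(predictions), i + half_window + 1)
        counts = Counter(predictions[lo:hi])
        top = max(counts.values())
        if counts[current] == top:
            smoothed.append(current)
        else:
            smoothed.append(counts.most_common(1)[0][0])
    return smoothed

# Hypothetical minute-level predictions: one isolated "bus" error inside a
# car trip, followed by a genuine 3-minute walking episode.
raw = ["car", "car", "bus", "car", "car",
       "walk", "walk", "walk",
       "car", "car", "car", "car"]

small_window = homogenize(raw, half_window=1)   # 3-minute window
large_window = homogenize(raw, half_window=4)   # 9-minute window
print(small_window)
print(large_window)
```

In this toy sequence, the 3-minute window removes the isolated "bus" minute while keeping the walking episode, whereas the 9-minute window relabels the whole sequence as "car" and erases the short walk, which mirrors the concern raised above about excessively large windows.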

Overall and mode-specific prediction rates

Although the overall prediction rate (in our Test sets, after applying a posteriori homogenization) seemed high (90%), it was greatly influenced by the extended stays at visited places, which are relatively easy to predict (91%). We addressed this issue in two ways: first, by deriving an overall prediction rate for transport modes (excluding stays at visited places), which was 80% at its peak, and second, by calculating mode-specific prediction rates. For the latter approach, we applied class-wise weights based on the observed proportion of the modes among all trips, which prevents rare modes from having their prediction rate penalized due to their low prevalence [19, 37]. Our final prediction rate of transport modes (80%) suggests that there is room for improvement in our model, both for predicting being on a trip rather than at a visited place and for predicting transport modes. Biking achieved a higher prediction rate than the other modes. Particular efforts are needed in the future for predicting public transport use, for example, by taking into account the location of public transport stations to aid the prediction.
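The following sketch illustrates why an overall rate dominated by stays at visited places can mask weaker mode-specific performance. It does not reproduce the class-wise weighting applied in the study, and all counts, labels, and function names are hypothetical.

```python
from collections import defaultdict

def mode_specific_rates(observed, predicted):
    """Share of correctly predicted minutes within each observed class."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for obs, pred in zip(observed, predicted):
        total[obs] += 1
        correct[obs] += int(obs == pred)
    return {mode: correct[mode] / total[mode] for mode in total}

def overall_rate(observed, predicted, exclude=()):
    """Share of correct minutes, optionally excluding some observed classes."""
    pairs = [(o, p) for o, p in zip(observed, predicted) if o not in exclude]
    return sum(o == p for o, p in pairs) / len(pairs)

# Hypothetical minutes: 80 at a visited place, 10 walking, 10 in public transport.
observed = ["place"] * 80 + ["walk"] * 10 + ["public"] * 10
predicted = (
    ["place"] * 76 + ["walk"] * 4      # place minutes: 76 of 80 correct
    + ["walk"] * 8 + ["place"] * 2     # walk minutes: 8 of 10 correct
    + ["public"] * 5 + ["car"] * 5     # public transport minutes: 5 of 10 correct
)

all_minutes = overall_rate(observed, predicted)
transport_only = overall_rate(observed, predicted, exclude=("place",))
per_mode = mode_specific_rates(observed, predicted)
print(all_minutes, transport_only, per_mode)
```

Here the rate over all minutes is 0.89, but it falls to 0.65 once stays at visited places are excluded, and the mode-specific rates show that public transport (0.5) is the weakest class.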

Strengths and limitations

The main strength of this study is that it relies on a large sample of accurately identified trips using GPS tracking and a GPS-based mobility survey. Our sample includes a large number of trip configurations from participants in free-living conditions, along with a large number of different types of personal motorized vehicles and public transport vehicles. Thus, it is logical to expect a lower rate of correct prediction in our study than in others with less variability in trip conditions. Still, the relatively low final prediction rate is the major limitation of this work and will have to be improved in the future. Moreover, the region of Paris has a specific transport system, with a densely connected public transport network and highly walkable areas. Our particular prediction model is likely not generalizable to contexts with different transport and urban systems; however, our data collection and data processing methodology is. The second strength is that our study is one of the first to include heart rate data for the prediction of transport mode. However, the hypothesized contribution of heart rate data to improving the prediction of private and public transport modes was not verified in our analyses, and the inclusion of heart rate data only very slightly improved the prediction of biking. A third, related strength is the large set of potentially relevant predictors generated for our modelling from GPS, accelerometer, and heart rate data.

The fourth strength is that we developed an algorithm for minute-level prediction, which assesses both being at a visited place and being on trips. Thus, contrary to our previous work [19], the present algorithm is a standalone algorithm that does not require a pre-identification of trips. In the present work, we tested an alternative two-step approach in which we first used an algorithm based on GPS speed to identify trip stages and, in a second step, attempted to predict transport mode at the trip stage level; we then compared the predictions to the ground truth from our GPS-based mobility survey. However, this approach yielded very poor prediction rates due to the combination of uncertainty in the first step (identifying trips) and the second step (identifying modes). The fifth strength of the work is related to the methodological developments implemented, including improving on the straightforward split of observations in RF, which ignores the nesting of observations within individuals, and investigating the impact of the window size for the a posteriori homogenization of predictions.

Limitations to overcome in the future include the restricted age range of our sample population. Consideration should be given to expanding the selection of participants to cover a wider age range (preferably 18 years and above). Also, the inclusion of participants’ sociodemographic characteristics should be considered to determine whether personal information can contribute to prediction accuracy. Finally, applying our method to mobility data from various and differing urban settings would provide the level of variability our algorithm needs to improve the generalizability of our findings.

Conclusions

Our study shows that it is feasible to use sensor-based prediction models of transport modes. Our work suggests that GPS and accelerometer data provide relevant information for the prediction and that heart rate adds only minor information, and only for specific transport modes. Finally, our work demonstrates that a two-phase approach, combining RF prediction and a posteriori homogenization, improves over RF prediction alone.
