Fusion of Video and Inertial Sensing Data via Dynamic Optimization of a Biomechanical Model

Accessible motion tracking could transform rehabilitation research and therapy. The traditional marker-based approach is limited to specialized laboratories equipped with expensive optical motion tracking systems that require infrared cameras and trained personnel. Inertial sensing and computer vision approaches applied to standard red-green-blue (RGB) videos offer greater flexibility, given their low cost and portability, but collective understanding of the strengths and weaknesses of kinematics estimation algorithms associated with each technology is still evolving (Table 1). Additionally, efforts to merge the strengths of these complementary technologies are sparse.

Vision-based methods using RGB cameras are successful in camera-dense environments, but occlusion continues to pose challenges in reduced-camera settings (Joo et al., 2019). Although now widely used in robotics applications, adoption of vision-based methods in the human movement sciences lags behind due to accuracy limitations (Seethapathi et al., 2019). Computer vision models are data-driven and typically not constrained to satisfy physiological constraints. Biomechanical modeling has been considered a possible approach for improving the accuracy of computer vision methods and making them more accessible to the biomechanics community (Kanko et al., 2021, Strutzenberger et al., 2021, Uhlrich et al., 2022). Although comparisons with marker-based data suggest that the accuracy of these methods ranges widely between 3°–20°, depending on the degree of freedom, no study to date has systematically discerned how this accuracy compares to alternative approaches and to what degree the incorporation of biomechanical models improves results.

Similarly, converting multimodal time series data from inertial measurement units (IMUs) into accurate joint kinematics remains challenging due to the many possible sources of uncertainty, including bias noise, thermo-mechanical white noise, flicker noise, temperature effects, calibration errors, and soft-tissue artifacts (Park and Gao, 2008, Picerno, 2017). Traditional sensor fusion filters used to mitigate drift (Madgwick, 2010, Mahony et al., 2008, Sabatini, 2011) typically rely on magnetometers, which are susceptible to ferromagnetic interference (de Vries et al., 2009). The results of sensor fusion filters have been refined with biomechanical models (Al Borno et al., 2022), but whether these findings will translate to natural environments remains uncertain, because marker-based motion capture was used for sensor-to-body calibration and the effect of soft-tissue motion was partly eliminated by attaching the IMUs to solid marker cluster plates, so that the IMUs moved rigidly with the marker clusters. Deep learning has been proposed as an alternative (Mundt et al., 2020, Rapp et al., 2021) but has been limited by datasets that are not representative of all activities and clinical populations. Constrained optimization via biomechanical modeling, both static and dynamic, has also been used to estimate both kinematics and kinetics. Static optimization approaches rely on zero-velocity detection derived from joint constraints, external contacts, and additional sensors (GPS, RF-based local positioning sensors, barometers, etc.) to correct the position of the model at each step (Karatsidis et al., 2019, Roetenberg et al., 2013), while dynamic optimization approaches currently require that the motion be periodic (Dorschky et al., 2019), both of which limit ease of implementation and generalizability.
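To make the drift problem concrete, the sketch below illustrates the core idea behind the classical sensor fusion filters cited above, using a minimal complementary filter for a single tilt angle: integrated gyroscope rates are accurate over short horizons but drift due to bias, while the accelerometer's gravity direction is noisy but drift-free, so blending the two bounds the error without a magnetometer. This is an illustrative sketch, not the Madgwick or Mahony algorithm itself; the function name, the single-axis simplification, and the blend weight `alpha` are our own choices for exposition.

```python
import numpy as np

def complementary_filter(gyro, accel, dt, alpha=0.98):
    """Illustrative single-axis complementary filter (not Madgwick/Mahony).

    gyro  : (N,) angular rate about the pitch axis, rad/s
    accel : (N, 3) accelerometer samples, m/s^2 (columns x, y, z)
    dt    : sample period, s
    alpha : blend weight; high alpha trusts the gyro short-term.

    Returns the (N,) pitch-angle estimate in radians.
    """
    theta = np.zeros(len(gyro))
    # Initialize from the accelerometer's gravity direction.
    theta[0] = np.arctan2(accel[0, 0], accel[0, 2])
    for k in range(1, len(gyro)):
        gyro_est = theta[k - 1] + gyro[k] * dt           # short-term: integrate rate (drifts)
        accel_est = np.arctan2(accel[k, 0], accel[k, 2]) # long-term: gravity tilt (noisy, drift-free)
        theta[k] = alpha * gyro_est + (1 - alpha) * accel_est
    return theta
```

With a constant gyro bias, pure integration drifts linearly in time, whereas the blended estimate settles near the accelerometer-referenced angle; this bounded-error behavior is what the paper refers to as mitigating drift.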

IMU and vision data have complementary strengths that can be leveraged to overcome their individual limitations, but it is unclear whether fusion via a dynamically constrained biomechanical model would improve estimation of kinematics over unconstrained optimization (Halilaj et al., 2021). Inertial sensing can compensate for occlusions in videos, videos can compensate for drift in inertial data, and biomechanical models can add physiological plausibility and dynamical robustness. Here we fuse video and IMU data via dynamic optimization of a nine degree-of-freedom (DOF) model (Fig. 1) and investigate the circumstances under which this approach outperforms (1) standard computer vision techniques using video data, (2) dynamic optimization of a biomechanical model using IMU data, and (3) fusion of IMU and video data via unconstrained optimization (i.e., without a biomechanical model). In addition to comparing these methods using experimental data, we quantified their sensitivity to IMU and video data noise by scaling each subject's unique noise background. We hypothesized that fusion of video and IMU data with biomechanically constrained optimization would improve estimation of kinematics over the alternatives under all noise profiles. We have shared a MATLAB library to encourage testing of these techniques with additional data and the exploration of new scientific questions.
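The intuition behind the unconstrained fusion baseline (alternative 3) can be illustrated with the simplest possible case: when each modality supplies an independent per-frame estimate of the same joint angle with a known noise variance, the weighted least-squares solution is inverse-variance weighting, with no biomechanical model linking degrees of freedom or time steps. The sketch below is our own simplified illustration of that concept, not the optimization formulation used in the paper; the function name and variance inputs are assumptions for exposition.

```python
def fuse_inverse_variance(theta_video, theta_imu, var_video, var_imu):
    """Illustrative per-sample unconstrained fusion of two angle estimates.

    Closed-form minimizer of
        (theta - theta_video)^2 / var_video + (theta - theta_imu)^2 / var_imu,
    i.e., inverse-variance weighting. Works elementwise on scalars or arrays.
    No biomechanical constraints couple joints or time steps.
    """
    w_video = 1.0 / var_video  # more weight to the less noisy modality
    w_imu = 1.0 / var_imu
    return (w_video * theta_video + w_imu * theta_imu) / (w_video + w_imu)
```

When the two modalities are equally noisy the fused estimate is their mean; as one modality's noise grows (e.g., video under occlusion, or IMU under drift), its weight shrinks, which is the mechanism by which each modality compensates for the other's weakness.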
