In recent years, Yoga has gained prominence, and people all over the world have started to practice it. Performing Yoga with proper postures is beneficial; hence, an instructor is needed to monitor the correctness of the postures. However, an instructor is not always available. This study aims to provide a system that acts as a personal Yoga instructor so that practitioners can practice Yoga in their comfort zone. The device is interactive and provides audio guidance for performing different Yoga asanas. It uses a camera to capture an image of the person performing Yoga in a particular position, and this captured pose is compared with benchmark postures. A pretrained deep learning model, trained on a standard dataset, is used to classify the different Yoga postures. Based on the comparison, the practitioner's posture is corrected by a voice message instructing them to move body parts in a certain direction. Because the device performs all operations in real time, it responds within a few seconds. Currently, the system assists practitioners with five asanas, namely, Ardha Chandrasana/Half-moon pose, Tadasana/Mountain pose, Trikonasana/Triangular pose, Veerabhadrasana/Warrior pose, and Vrikshasana/Tree pose.
Keywords: Human posture recognition, Mediapipe, pose detection and pose correction, real-time, Yoga
Yoga is a popular form of exercise practiced all over the world for its physical, mental, and spiritual benefits. Practicing Yoga inaccurately without professional guidance, or pushing beyond one's flexibility limits into a wrong posture, can lead to pain and further muscular problems. Hence, performing Yoga postures accurately is important. Having a trainer monitor the correctness of the postures during practice is beneficial, but this can be difficult: time is often short because of work pressure, and personal classes with an instructor can be expensive.
The COVID pandemic has raised awareness of the health benefits of practicing Yoga, but it has also made people apprehensive about seeking assistance from Yoga practitioners. Computer vision techniques for Yoga posture estimation and correction could be a promising solution, but they are seldom used in the health and exercise domain, owing to the limited literature.[1] In this study, an artificially intelligent system has been created that identifies the performed posture and guides the user, visually and through audio, on the correctness of the posture in real time through the various steps of the yogic cycle.[2] OpenCV, an optimized open-source library for computer vision, machine learning, and image processing that supports multiple languages including Python, was used for detection and identification.[3] A model was trained to estimate how closely the performed pose matches the reference pose, in real time or in images and videos. The pose estimation technique determines the key points, that is, the positions of the joints, to form a skeleton; the angles between these points are then used to give corrective feedback on the performed pose. A dataset of standard postures for Ardha Chandrasana/Half-moon pose, Tadasana/Mountain pose, Trikonasana/Triangular pose, Veerabhadrasana/Warrior pose, and Vrikshasana/Tree pose was created and used for training and testing the model. The dataset consists of approximately 6000 images of these five postures, of which 75% were used for training and 25% for testing. Human body modeling and human pose estimation are described below.
Human Body Modelling
The human body can be modeled for pose estimation by determining the positions of the joints of a human skeleton drawn over the image. Kinematic models, which describe the body by its limbs and joints, are commonly used. Human body modeling can be done by different methods: the skeleton-based model, in which key points represent the joints of the human body; the planar model, in which multiple rectangular boxes represent the shape of the body; or the volumetric model, which represents the human body in 3D with its shapes and poses, as shown in [Figure 1].
However, the skeleton-based model does not represent the texture or shape of the body. Identifying the joints during pose estimation is challenging because it depends on the background, clothing, lighting, and viewing angle.
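A minimal sketch of the skeleton-based model as a data structure, assuming a simplified 12-joint body: joints are named 2D coordinates and limbs are the edges between them. The joint names, the edge list, and the helper `limb_segments` are hypothetical, not the topology used in the study. Because only joint coordinates are stored, the structure also shows why this model carries no texture or shape information.

```python
# Hypothetical simplified kinematic skeleton: each edge is a limb joining two joints.
SKELETON_EDGES = [
    ("left_shoulder", "right_shoulder"),
    ("left_shoulder", "left_elbow"), ("left_elbow", "left_wrist"),
    ("right_shoulder", "right_elbow"), ("right_elbow", "right_wrist"),
    ("left_shoulder", "left_hip"), ("right_shoulder", "right_hip"),
    ("left_hip", "right_hip"),
    ("left_hip", "left_knee"), ("left_knee", "left_ankle"),
    ("right_hip", "right_knee"), ("right_knee", "right_ankle"),
]


def limb_segments(keypoints, edges=SKELETON_EDGES):
    """Return (start, end) coordinate pairs for every limb whose two joints were detected.

    keypoints maps joint name -> (x, y); joints missing from the dict (for example,
    occluded or undetected ones) drop the limbs that touch them.
    """
    return [(keypoints[a], keypoints[b]) for a, b in edges
            if a in keypoints and b in keypoints]


if __name__ == "__main__":
    arm = {"left_shoulder": (0.40, 0.30), "left_elbow": (0.35, 0.45), "left_wrist": (0.30, 0.60)}
    print(limb_segments(arm))  # two segments: upper arm and forearm
```

Drawing these segments over the image gives the skeleton overlay; every other quantity in the pipeline (angles, positions) is derived from the same joint coordinates.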
Human Pose Estimation
Computer vision is used to estimate the human pose by identifying human joints as key points in images or videos, for example, the left shoulder, right knee, elbows, and wrists.[4] Pose estimation seeks the exact pose within the space of all possible poses. It can be performed as single-pose or multi-pose estimation: single-pose estimation estimates the pose of a single person, whereas multi-pose estimation estimates the poses of multiple people.[5] Human posture assessment can follow mathematical estimation, called generative strategies, or pictorial estimation, called discriminative strategies.[6] Image processing techniques use AI-based models such as convolutional neural networks (CNNs), with the CNN architecture tailored for human pose inference.[7] Pose estimation can follow either a bottom-up or a top-down approach. In the bottom-up approach, body joints are first estimated and then grouped to form unique poses, whereas top-down methods first detect a bounding box and only then estimate the body joints.[8]
[TAG:2]Pose Estimation Methods [/TAG:2]
Pose estimation with deep learning
When it comes to object detection, deep learning systems outperform traditional computer vision techniques; as a result, deep learning approaches can significantly improve pose estimation.[9],[10] Epipolar Pose, Open Pose, PoseNet, and Mediapipe are a few popular pose estimation techniques. Epipolar Pose creates a 3D structure from a 2D snapshot of a human pose. The primary benefit of this architecture is that no ground-truth data are needed.[11] First, a 2D image of the human pose is taken, and then a 3D pose estimator is trained using epipolar geometry.[12] Its principal drawback is the requirement for at least two cameras. Open Pose is a 2D pose estimation method whose input images can also come from sources such as webcams and CCTV cameras. The benefit of Open Pose is the simultaneous identification of key points on the torso, face, and limbs.[13] PoseNet can estimate single or multiple poses from video input and is independent of image size, so it provides accurate results whether the image is enlarged or reduced.[14],[15],[16] Mediapipe is a reliable pose estimation architecture that identifies 33 key points in a color image. It can estimate poses from videos, detect sign language and gestures, and track physical workouts such as Yoga, dance, and other fitness postures. It can also recognize gestures used for control.[17]
Pose estimation for fitness-related gestures is difficult because it involves multiple postures with much room for interpretation and depends on the clothing worn at the time. The literature reports that Mediapipe is fast, precise, and reliable, and we concur, since our own comparison found Mediapipe to be superior to several current techniques. However, Mediapipe cannot detect a neck key point, has problems with lighting and background contrast, and requires considerable processing time.
In a separate study, we investigated different pose estimation techniques, including Epipolar Pose, Open Pose, PoseNet, and Mediapipe, and found that Mediapipe provided the highest accuracy. In this work, we focus exclusively on posture estimation and correction using Mediapipe.
Pose Estimation with Mediapipe
For pose estimation, the input image is fed to the Mediapipe library to extract the key points. A set of X, Y, and Z coordinates is generated for 33 major key points (landmarks) of the human body. The major parts of the body in the input image are identified from the extracted coordinates, and a skeleton is formed on the image. The key points are indexed from 0 to 32: the first 11 landmarks (0–10) describe the facial region; the next 12 (11–22) describe the upper body, namely the shoulders, elbows, wrists, and three points on each hand (little finger, index finger, and thumb); and the remaining 10 (23–32) describe the lower body, namely the hips, knees, ankles, heels, and feet. Together, the key points give the complete orientation of the body in 3D space. A skeleton of the human pose is drawn with the help of these points and is then used to derive the angles between them, which enables the user's Yoga poses to be corrected effectively. Because our work involves full-body pose estimation, the facial landmarks are not used; instead, 14 key points other than the facial features are extracted.
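A minimal sketch of this extraction step in Python, assuming the legacy `mp.solutions.pose` interface of the MediaPipe Python package together with OpenCV (newer MediaPipe releases also provide a Tasks-based pose landmarker). The study uses 14 non-facial key points but does not list them, so the 12 major joints in `BODY_LANDMARKS` are an assumption, as are the function names `extract_landmarks` and `select_body_points`.

```python
# MediaPipe Pose landmark indices for major body joints (indices 0-10 are facial).
BODY_LANDMARKS = {
    "left_shoulder": 11, "right_shoulder": 12,
    "left_elbow": 13, "right_elbow": 14,
    "left_wrist": 15, "right_wrist": 16,
    "left_hip": 23, "right_hip": 24,
    "left_knee": 25, "right_knee": 26,
    "left_ankle": 27, "right_ankle": 28,
}


def extract_landmarks(image_path):
    """Run MediaPipe Pose on one image; return 33 (x, y, z) tuples, or None if no person is found."""
    # Imported lazily so the pure helper below works without OpenCV/MediaPipe installed.
    import cv2
    import mediapipe as mp

    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(image_path)
    with mp.solutions.pose.Pose(static_image_mode=True) as pose:
        # MediaPipe expects RGB input; OpenCV loads images as BGR.
        results = pose.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    if results.pose_landmarks is None:
        return None
    # x and y are normalized to [0, 1] by image width and height; z is a relative depth.
    return [(lm.x, lm.y, lm.z) for lm in results.pose_landmarks.landmark]


def select_body_points(landmarks, names=BODY_LANDMARKS):
    """Keep only the named body joints, discarding the facial landmarks."""
    if len(landmarks) != 33:
        raise ValueError("expected the 33 MediaPipe Pose landmarks")
    return {name: landmarks[index] for name, index in names.items()}
```

In use, `select_body_points(extract_landmarks("pose.jpg"))` (with a hypothetical file name) yields a dictionary of named joint coordinates, after first checking that `extract_landmarks` did not return `None`.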
Literature Review
Practicing Yoga without an instructor has become a necessity, but improper practice could lead to injury. Several researchers have reported Yoga posture estimation and correction methods to address this problem. One work provided concise feedback for Natarajasana, Trikonasana, Vrikshasana, Virbhadrasana 1 and 2, and Utkatasana; it achieved a classification accuracy of 95% for pose identification and provided feedback to improve the performed posture. In that work, the difference in the location of each key point with respect to its neighboring key point was computed and tested for mistakes.[18] Pose Tutor is an AI-based, explainable pose recognition and correction system that combines vision and pose skeleton models to predict the pose; an angle detection mechanism was used for pose prediction and to detect wrongly formed joints.[19] TAGteach, an acoustic guidance system that provides auditory feedback, such as a sound generated when the desired behavior occurs, has been used to correct posture in sports, dance, and walking; however, no correction procedure is discussed in that work.[20] The Yoga Tutor system captures user motion through a mobile camera and sends it to a pose detection system implemented with Open Pose, which uses a time-distributed CNN, LSTM, and softmax regression to analyze and predict the user's pose.[21] A similar pose detection method has been implemented using only a PC camera,[22] and a similarly efficient, low-cost human pose estimation technique based on computer vision has been reported.[23] Yoga hand gesture identification for five postures using XGBoost with Random Search CV models has been presented, achieving an accuracy of 99.2%.[24] A correction method using a cosine technique has been proposed with the Open Pose architecture: all human joints were extracted and connected using a greedy technique, and the Euclidean angle between body parts was calculated and compared with reference angles.[25]
However, because few reports describe the use of Mediapipe for full-body posture estimation, in our work we have explored the Mediapipe architecture for full-body posture estimation and performed correction by dividing the image into four quadrants and comparing them using a rule-based algorithm.
Methodology
The participants involved in creating the database of selected Yoga postures included both men and women, graduates and undergraduates from medical backgrounds. The participants were BNYS, naturopathy, or BSc/MSc Yoga degree holders, or Yoga practitioners, aged 18–35 years. Only participants who were mentally and psychologically fit, had no history of serious disease, and had no significant addiction to alcohol or smoking were considered.
An online questionnaire covering Yoga practice, the presence of any ailment, and any addiction to alcohol or smoking was administered, along with a consent form confirming willingness to have their images included in the database. During selection, only those who had regularly practiced some form of Yoga for over 2–3 years were considered; physically inactive people and people with serious health issues were excluded.
Procedure of Data Collection
A large dataset was collected because no datasets were available for the chosen Yoga postures. For dataset creation, the participants were asked to perform Ardha Chandrasana/Half-moon pose, Tadasana/Mountain pose, Trikonasana/Triangular pose, Veerabhadrasana/Warrior pose II, and Vrikshasana/Tree pose. The dataset comprises 6000 postures; the majority of the images were taken with a high-definition Panasonic camera, a web camera, or the participants' mobile cameras at a place and time convenient to them. The Yoga pose videos and images were captured at a distance of 4–5 m in front of the camera.
The Yoga pose dataset comprises both males and females performing at different locations. To make the data realistic and to train the model for real-life environments, the images and videos were captured in living rooms, garden areas, terraces, and studios. A few images with poor illumination were deliberately included to improve the robustness of the model during training.
The complete posture correction workflow is shown in [Figure 2]. First, an image of a Yoga practitioner performing an asana is captured and fed to Mediapipe, a pretrained pose estimation model that detects human postures in images or videos by extracting key points. A rule-based algorithm then divides the input image into four quadrants and compares the key points lying within each quadrant with the standard key points. Using the trained dataset, real-time pose estimation and correction are implemented. If the pose does not match any of the selected Yoga postures (asanas) in the database, an error is shown.
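The study does not specify where the quadrant boundaries lie or how a match is scored, so the following is only a minimal sketch of a quadrant-based rule. It assumes the split is at the image center in Mediapipe's normalized coordinates (x to the right, y downward, both in 0–1) and flags every key point that lands in a different quadrant from the reference; the function names and sample coordinates are hypothetical, and "left"/"right" refer to the image, not the body.

```python
def quadrant(point, center=(0.5, 0.5)):
    """Label the image quadrant of a normalized (x, y) point; y grows downward in image coordinates."""
    x, y = point[0], point[1]
    vertical = "upper" if y < center[1] else "lower"
    horizontal = "left" if x < center[0] else "right"
    return f"{vertical}-{horizontal}"


def quadrant_mismatches(test_points, reference_points, center=(0.5, 0.5)):
    """Map each joint whose quadrant differs from the reference to (found, expected)."""
    mismatches = {}
    for name, ref_point in reference_points.items():
        expected = quadrant(ref_point, center)
        if name not in test_points:
            mismatches[name] = ("missing", expected)
            continue
        found = quadrant(test_points[name], center)
        if found != expected:
            mismatches[name] = (found, expected)
    return mismatches


if __name__ == "__main__":
    reference = {"left_wrist": (0.2, 0.3), "right_ankle": (0.7, 0.9)}
    performed = {"left_wrist": (0.2, 0.7), "right_ankle": (0.72, 0.88)}
    print(quadrant_mismatches(performed, reference))
    # {'left_wrist': ('lower-left', 'upper-left')}
```

Splitting at a body-centered point, such as the midpoint of the hips, instead of the image center would make the check less sensitive to where the user stands in the frame.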
Data analysis
A step-by-step description of the implementation of the proposed artificial intelligence system:
1. Obtaining the dataset of 6000 images and dividing it into training (80%) and testing (20%) sets.
2. Classifying the images into the five Yoga poses and labeling them.
3. Pretraining a deep learning model using Google Teachable Machine with 100 epochs, a batch size of 32, and a learning rate of 0.001.
4. Using Mediapipe to extract the key points; the image is divided into four quadrants, and the key points lying within each quadrant are compared with standard key points extracted from the reference images.
5. Capturing a test image with a camera and giving it as input to the pretrained model to detect all the key points.
6. The key points detected by the pretrained model give a skeletal view of the pose.
7. In the correction model, the slope formula and the tangent formula are used to find the angles between the key points.
8. The angles from the key points of the test image are compared with those of the reference image.
9. The difference in angle between the test and reference images is used to correct the posture: if the difference is positive, the correction direction is downward; if it is negative, the direction is upward.
10. Pose correction is conveyed by voice and text. Key point extraction was combined with Google text-to-speech and speech-to-text for assistance.
Results
Pose estimation has been performed using Mediapipe for the five Yoga postures (asanas). For simplicity, images of the same individual are shown (with consent) for all estimations and comparisons. The five Yoga poses considered for posture estimation are as follows.
Ardha Chandrasana/Half-moon pose: Keep a Yoga block handy at the front right-hand corner of the mat. Start in Warrior II with the right foot at the front of the mat and the front knee in line with the toes. Place the left hand on the hip, reach out and then down with the right arm, and place the fingertips in front of the right toes. Step the back foot a little forward and shift the weight onto the right leg. While pressing the right foot down, begin to straighten the standing leg as the left leg floats up in line with the hips. Place the right hand on the block directly under the shoulder, toward the little-toe side of the right foot. For stability in this pose, bring the left leg slightly forward rather than backward, as it tends to drift into the space behind.
Tadasana/Mountain pose: Tada means a mountain. Sama means upright, straight, unmoved. Sthiti is standing still, steadiness. Tadasana therefore implies a pose in which one stands firm and erect like a mountain. In Tadasana, the arms are stretched out over the head, but for convenience, the subject placed them by the sides of the thighs.
Trikonasana/Triangular pose: Stand straight. Separate the feet comfortably wide apart. Turn the right foot out by 90° and the left foot in by 15°. Align the center of the right heel with the center of the arch of the left foot. Ensure that the feet press into the ground and that the body weight is balanced equally on both feet.
Inhale deeply and, as you exhale, bend the body to the right, downward from the hips, keeping the waist straight and allowing the left hand to come up in the air while the right hand comes down toward the floor. Keep both arms in a straight line.
Veerabhadrasana/Warrior pose II: Stand in Tadasana. Raise both arms above the head, stretch up, and join the palms. Take a deep inhalation and, with a jump, spread the legs sideways 4–4½ feet apart. Exhale and turn to the right: simultaneously turn the right foot 90° to the right and the left foot slightly to the right. Flex the right knee until the right thigh is parallel to the floor and the right shin is perpendicular to it, forming a right angle between the right thigh and the right calf. The bent knee should not extend beyond the ankle but should be in line with the heel. Stretch out the left leg and tighten it at the knee. The face, chest, and right knee should face the same direction as the right foot. Throw the head up, stretch the spine from the coccyx, and gaze at the joined palms.
Vrikshasana/Tree pose: In this posture, the subject bends the right leg at the knee and places the right heel at the root of the left thigh, resting the foot on the left thigh with the toes pointing downward. Balancing on the left leg, the subject joins the palms and raises the arms straight over the head. The same was then repeated on the other side, balancing on the right leg.
The results of the pose estimation using Mediapipe are given in [Figure 3].
Figure 3: Key point detection by Mediapipe for postures (a)–(e)
Discussion
Pose estimation of the five Yoga postures was performed with the Mediapipe architecture, as shown in [Figure 3]. As described in the data analysis, sample images were captured in real time and fed individually to the model, and the posture estimation accuracy was computed. The average accuracy is tabulated in [Table 1]. Accuracy is measured as the classification score: the ratio of the number of correct predictions (CP) to the total number of predictions (TP), where TP is the sum of CP and the number of wrong predictions (WP).
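The classification score defined above, CP/TP with TP = CP + WP, can be computed from lists of true and predicted labels as in this minimal sketch. The function name and the label strings in the example are hypothetical and are not the study's data.

```python
def classification_score(true_labels, predicted_labels):
    """Return CP / TP, where CP = correct predictions, WP = wrong ones, and TP = CP + WP."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("no predictions to score")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))  # CP
    wrong = len(true_labels) - correct  # WP
    return correct / (correct + wrong)


if __name__ == "__main__":
    truth = ["tadasana", "tadasana", "tadasana", "tadasana"]
    preds = ["tadasana", "tadasana", "tadasana", "vrikshasana"]
    print(classification_score(truth, preds))  # 0.75
```

Averaging this score over the test images of each asana gives per-pose values of the kind reported in the accuracy table.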
It may be observed from [Table 1] that the prediction accuracy using Mediapipe is around 85%. This accuracy is higher than that of Epipolar Pose, Open Pose, and PoseNet, which we have reported in another work.[26] Mediapipe is preferred because it does not require a dedicated platform for deployment, detects better in low light by applying low-light filtering, and offers the best library for accurately detecting whole-body key points using a single camera.
Pose correction
Pose correction is performed by first extracting the coordinates of the 14 key points using Mediapipe, as shown in [Figure 4]. The angle at each joint, for example, between the shoulder, elbow, and wrist, can then be calculated. Let P, Q, and R represent three joints (points), with Q as the common point. The lines PQ and QR intersect at Q, and the angle between them can be calculated using the basic slope formula.
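For reference, the slope-based calculation can be written out explicitly, taking P = (x_P, y_P), Q = (x_Q, y_Q), and R = (x_R, y_R) as the image coordinates of the three joints. This is the standard angle-between-two-lines formula; the paper does not state its exact variant, so this form is an assumption. It yields only the acute angle between the lines and is undefined for a vertical segment or when 1 + m_1 m_2 = 0 (perpendicular lines), so the vector form on the last line is one common way to obtain the interior joint angle in the 0–180° range instead.

```latex
m_1 = \frac{y_P - y_Q}{x_P - x_Q}, \qquad m_2 = \frac{y_R - y_Q}{x_R - x_Q}

\tan\theta = \left| \frac{m_2 - m_1}{1 + m_1 m_2} \right|

\theta_Q = \arccos\!\left( \frac{(P - Q)\cdot(R - Q)}{\lVert P - Q \rVert \, \lVert R - Q \rVert} \right)
```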
Referring to [Figure 4], PQ and QR can be considered two bones of the human skeleton: taking P, Q, and R as the shoulder, elbow, and wrist, PQ corresponds to the upper arm and QR to the forearm, so the angle at Q is the elbow angle. Applying the same analysis to all the other joints gives the angle at each joint. As an example, [Table 2] lists the angles at each joint after attaining the final posture of the Warrior II pose (shown in [Figure 5]) in the left and right directions.
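A minimal Python sketch of the joint-angle calculation. Rather than computing slopes directly, it uses `math.atan2` (a swapped-in technique, not necessarily the authors' code), which handles vertical limbs without division by zero and returns the interior angle at Q in the 0–180° range. The function name `joint_angle` and the sample coordinates are hypothetical.

```python
import math


def joint_angle(p, q, r):
    """Interior angle at joint q, in degrees (0-180), between segments q->p and q->r."""
    angle_qp = math.atan2(p[1] - q[1], p[0] - q[0])
    angle_qr = math.atan2(r[1] - q[1], r[0] - q[0])
    angle = abs(math.degrees(angle_qp - angle_qr))
    # The difference of two atan2 values can reach almost 360 degrees; fold it back.
    return 360.0 - angle if angle > 180.0 else angle


if __name__ == "__main__":
    shoulder, elbow, wrist = (0.40, 0.30), (0.50, 0.30), (0.60, 0.30)
    print(joint_angle(shoulder, elbow, wrist))  # straight arm, approximately 180 degrees
```

Because the result depends only on relative positions, it is unaffected by Mediapipe's y axis pointing downward: mirroring the image flips the sign of both atan2 angles but leaves the interior angle unchanged.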
During data preparation, the joint angles of each Yoga pose are calculated in advance and stored in the database. This is done for all five Yoga poses before the system is used, so that the stored angles serve as the reference.
Giving the user feedback on the performed pose is of utmost importance, as it guides the user to correct a wrongly performed posture and thus to learn to practice the Yoga pose correctly. Feedback on the user's performance is provided in real time via the display or audio messages. When the user deviates beyond a threshold value, the user is notified; depending on the user's level of flexibility, the threshold can be set to 10° or 20° in either direction. Users can observe the correction and make the necessary adjustments to practice the Yoga routine accurately. The feedback is given both as a visual alert on the screen and as an audio message through a speaker, so that the user need not turn their head to read the screen. Moreover, depending on the Yoga pose, the user may be too far from the screen to read the text correctly. A Bluetooth-connected speaker is therefore a good option for delivering the feedback message to the user.
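A minimal sketch of the threshold check, assuming the difference is taken as measured angle minus reference angle and applying the direction rule from the data-analysis steps (positive means downward, negative means upward). The 90° front-knee value follows the Warrior II description; the other reference angles, the dictionary layout, and the function name `posture_feedback` are hypothetical and are not the values in [Table 2].

```python
# Illustrative reference angles (degrees) for one side of Warrior II. The 90-degree front
# knee follows the pose description; the other values are hypothetical, not Table 2.
REFERENCE_ANGLES = {
    "warrior_ii_right": {
        "right_knee": 90.0,
        "left_knee": 180.0,
        "left_elbow": 180.0,
        "right_elbow": 180.0,
    },
}


def posture_feedback(measured, reference, threshold=10.0):
    """Return one correction message per joint deviating from its reference by more than threshold.

    Difference = measured - reference; a positive difference means "down" and a
    negative one means "up", following the rule in the data-analysis steps.
    """
    messages = []
    for joint, ref_angle in reference.items():
        label = joint.replace("_", " ")
        if joint not in measured:
            messages.append(f"Your {label} was not detected")
            continue
        diff = measured[joint] - ref_angle
        if abs(diff) > threshold:
            direction = "down" if diff > 0 else "up"
            messages.append(f"Move your {label} {direction} by about {abs(diff):.0f} degrees")
    return messages


if __name__ == "__main__":
    measured = {"right_knee": 110.0, "left_knee": 175.0, "left_elbow": 178.0, "right_elbow": 172.0}
    print(posture_feedback(measured, REFERENCE_ANGLES["warrior_ii_right"]))
    # ['Move your right knee down by about 20 degrees']
```

The messages are plain strings, so the same list could be shown on screen and passed to a text-to-speech engine for the audio channel. Raising `threshold` to 20 relaxes the check for less flexible users.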
Step-by-step description of the result from the viewpoint of the end user:
1. The voice assistant greets the user with a good morning/good afternoon/good evening message and displays the Yoga poses available to perform.
2. Voice input is obtained from the user regarding the pose he/she wishes to perform.
3. Once the input is obtained, a demo video of the selected asana is displayed to the user.
4. The voice assistant also gives the steps to be followed to perform the pose.
5. In addition, an image of the selected pose is displayed for the user's convenience.
6. The user now has all the information about the asana he/she wants to perform.
7. The voice assistant then asks the user which of the available poses is to be performed.
8. The user inputs the time required to perform the asana (a trained person may require less time than others). The selected time is displayed to the user using a countdown timer.
9. When the user performs the pose, a screenshot is captured when the countdown reaches 0 s and is used for prediction.
10. The key points on the user's pose are highlighted and, at the back end, compared with the reference image.
11. The correctness of the pose is checked. If there are any deviations from the reference image, the correction is conveyed to the user by voice and text, for example, "Your accuracy is 85%; move your hand 10° up/down."
12. Based on the correction, the user is able to perform the Yoga pose effectively; hence, pose correction is achieved.
Limitations and future suggestions
The prediction accuracy using Mediapipe is around 85%. Because this work captures images from only one camera, the pose accuracy is lower. The accuracy for a few Yoga postures is also lower because Mediapipe does not detect a neck key point. The accuracy for each of these postures could be improved further by enlarging the training dataset.
Further research would be needed to expand this technique for other advanced postures for pose estimation and correction using the same methodology which involves simple tools with better accuracy to assist individuals practicing Yoga postures as a self-evaluation as well as a bio-feedback mechanism.
Conclusion
Advancements in machine learning, artificial intelligence, and computer vision have made it possible to implement human pose estimation and correction that can be used effectively in the health and fitness sector. Because Yoga is popular and widely accepted around the globe, an assistive system has been implemented that can guide a person to perform Yoga on their own premises without the need for a trainer. Pose estimation for fitness applications is challenging and involves creating a large database of asanas; a further challenge is the variety of appearances and outfits encountered while creating the database. In this work, a complete pose correction system using voice assistance and display messages has been implemented. As a further development, this work can be extended to areas such as gym workouts, Zumba, aerobics, and physiotherapy for particular health conditions, and to effectively treating a few chronic diseases through proper Yoga practice.
Financial support and sponsorship
Nil.
Conflicts of interest
There are no conflicts of interest.
References
Correspondence Address:
D Mohan Kishore
No. 19, Ekanth Bhavan, Gavipuram Circle, K.G. Nagar, Bengaluru - 560 019, Karnataka
India
DOI: 10.4103/ijoy.ijoy_137_22