Learning-based control approaches for service robots on cloth manipulation and dressing assistance: a comprehensive review

Application overview

Interest in service robots involved in dressing tasks has grown, and we collected the papers concerning dressing and divided them according to the task the robot performs. In particular, of the selected papers, 8 (20.51%) were published before 2015 and 31 (79.49%) were published within the past five years (Table 1).

In Fig. 2, the robot-assisted dressing process is described. The first step is cloth detection, followed by cloth classification and manipulation planning. The human position is then tracked to prevent the robot from hurting the patient during the dressing task. Finally, the tasks already accomplished by robots are shown in green, while the ones that should be investigated in the future are crossed in red.

Fig. 2

Robot-assisted dressing process

Fig. 3

a Robot folding cloths using an SL strategy [23]

The tasks accomplished by robots in several papers are putting a t-shirt or a jacket on the user's arm, putting a t-shirt over a person's head, putting a shoe on the user's feet, or pulling on trousers. The tasks that should be accomplished in the future are, for example, folding a complex shape or ironing a garment.

Cloth folding/untangling/covering

In this section, the papers concerning cloth folding, untangling or covering are evaluated and divided into subsections according to the control approach applied.

Supervised learning

In SL, each example is a pair consisting of an input object (typically a vector) and a desired output value (also called the supervisory signal). An SL algorithm analyses the training data and produces an inferred function, which can be used for mapping new examples [17]. Bersch et al. [24] used a DL approach with a PR2 robot for cloth manipulation, specifically for laundry folding. The quality of each grasp pose was evaluated using a function that calculates a score based on a set of geometric features, and the score function was automatically trained using an SVM. Another SL strategy was developed by Lui et al. [25], who used a learning algorithm based on max-margin learning to manipulate deformable objects such as ropes with a PR2. Starting with a point cloud obtained from an RGB-D camera, the authors designed appropriate features that their learning algorithm uses to first infer the rope's knot structure and then choose an appropriate manipulation action to untangle the knots in the rope. Concerning humanoid robots, Yang et al. [23] also used DL to let a humanoid robot acquire folding task skills. The proposed approach used a real-time user interface with a monitor and provided a first-person perspective through a head-mounted display. Through this interface, teleoperation was used for collecting task operating data, especially for tasks that are difficult to program with conventional methods. A DL model was also utilized in the proposed approach: a deep convolutional autoencoder extracted image features and reconstructed images, and a fully connected deep time-delay neural network learnt the dynamics of the robot task process from the extracted image features and motion angle signals (Fig. 3).
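
As a rough illustration of the grasp-scoring idea in Bersch et al. [24], the sketch below trains an SVM on hand-crafted geometric features of candidate grasp poses and uses the signed decision value as a quality score; the feature set, data, and hyperparameters are placeholders rather than those of the original work.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Placeholder geometric features per candidate grasp pose
# (e.g. distance to the cloth border, local curvature, gripper clearance).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 3))                               # 200 candidate grasps, 3 features
y_train = (X_train[:, 0] + 0.5 * X_train[:, 2] > 0).astype(int)   # 1 = good grasp (synthetic label)

# Train the score function: an SVM whose decision value ranks grasp quality.
scorer = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
scorer.fit(X_train, y_train)

# Rank new candidate grasps by their signed distance to the separating hyperplane.
candidates = rng.normal(size=(10, 3))
scores = scorer.decision_function(candidates)
best = candidates[np.argmax(scores)]
print("best candidate grasp features:", best)
```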

Tanaka et al. [14] used a particular NN called the Encode-Manipulate-Decode Network (EMD Net) for cloth folding. The EMD Net is essentially a 3D convolutional auto-encoder (providing the encoder and decoder modules) with a fully connected network (the manipulation module) inserted at the bottleneck layer.
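
A minimal sketch of this encode-manipulate-decode structure is given below, assuming a voxelized cloth state and a low-dimensional action vector; the layer sizes are illustrative and do not reproduce the architecture of [14].

```python
import torch
import torch.nn as nn

class EMDNet(nn.Module):
    """Minimal Encode-Manipulate-Decode sketch: a 3D convolutional auto-encoder with a
    fully connected 'manipulation' module inserted at the bottleneck."""
    def __init__(self, action_dim=6, latent=128):
        super().__init__()
        self.encoder = nn.Sequential(                        # 1x32x32x32 voxel grid -> latent
            nn.Conv3d(1, 8, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(8, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(), nn.Linear(16 * 8 ** 3, latent))
        self.manipulate = nn.Sequential(                     # latent + action -> predicted latent
            nn.Linear(latent + action_dim, 256), nn.ReLU(),
            nn.Linear(256, latent))
        self.decoder = nn.Sequential(                        # latent -> reconstructed voxel grid
            nn.Linear(latent, 16 * 8 ** 3), nn.ReLU(),
            nn.Unflatten(1, (16, 8, 8, 8)),
            nn.ConvTranspose3d(16, 8, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose3d(8, 1, 4, stride=2, padding=1), nn.Sigmoid())

    def forward(self, cloth_voxels, action):
        z = self.encoder(cloth_voxels)
        z_next = self.manipulate(torch.cat([z, action], dim=1))
        return self.decoder(z_next)                          # predicted post-manipulation cloth state

# Predict the cloth state after a candidate fold (random tensors stand in for real data).
net = EMDNet()
pred = net(torch.rand(2, 1, 32, 32, 32), torch.rand(2, 6))
print(pred.shape)  # torch.Size([2, 1, 32, 32, 32])
```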

Furthermore, in Hu et al. [26], limitations of the user's movements (modelled with a Gaussian Process Latent Variable Model) were studied and related to the online update of the dressing trajectory. The authors validated their idea by letting the robot fold towels.

A different approach was used by Jia et al. [27], who adopted a random forest approach. They used imitation data consisting of visual features and control signals to learn a random forest controller that maps the visual features observed by an RGB-D camera to optimal control signals of a robotic manipulator for cloth manipulation. The controller parameters are learnt in two steps: online dataset sampling and controller optimization. The dataset is generated from an expert (a ground-truth hard-coded control algorithm in their case, but it could also be a human) performing the manipulation task, and the RGB-D images collected from the camera are transformed into a low-dimensional feature space by computing HOW features [27]. The random forest imitation-learning controller parameters are learnt in an online fashion: a set of cloth simulation trajectories is first generated, and during each time step of these trajectories the expert is queried for an optimal control action. This action is combined with the action proposed by the random forest controller and fed into the simulator to generate a new observation. The process is repeated until the imitation learning has converged to an optimal solution. The authors validated their approach by folding towels.
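
The sketch below illustrates this kind of online dataset aggregation with a random forest controller; the simulator, expert, and feature extractor are toy stand-ins (the original work uses a cloth simulator and HOW features), so it only shows the structure of the sampling and optimization loop.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Toy stand-ins for the components described in [27]: a cloth simulator,
# a hard-coded expert controller, and a low-dimensional visual feature extractor.
def simulate_step(state, action):
    return state + 0.1 * action + 0.01 * np.random.randn(*state.shape)

def expert_action(state):
    return -state                          # drive features toward a folded goal (toy expert)

def extract_features(state):
    return state                           # the real system computes HOW features from RGB-D images

# Online dataset aggregation: roll out the current controller,
# but label every visited state with the expert's action.
dataset_X, dataset_y = [], []
forest = RandomForestRegressor(n_estimators=50, random_state=0)
state_dim, trained = 8, False

for iteration in range(5):
    state = np.random.randn(state_dim)
    for t in range(20):
        feats = extract_features(state)
        dataset_X.append(feats)
        dataset_y.append(expert_action(state))                  # query the expert for a label
        if trained:
            action = forest.predict(feats.reshape(1, -1))[0]    # act with the learned controller
        else:
            action = expert_action(state)
        state = simulate_step(state, action)
    forest.fit(np.array(dataset_X), np.array(dataset_y))        # controller optimization step
    trained = True
```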

Finally, Corona et al. [28] used a hierarchy of three CNNs with different levels of specialization to grasp a garment and fold it using a WAM robot. First, one robot arm grasps the garment from any point and shows it to an RGB-D camera, and the garment is recognized using the first CNN. Then, the visibility and locations of two reference grasping points are identified using the second CNN. Finally, the second grasping point is located with a third, more specialized, CNN.

Learning from demonstration

Learning from demonstration is conceived to provide and transfer assistive skills from non-expert users to the robot. It can be achieved using kinaesthetic teaching or a motion capture system, so that demonstrations of the task executed in several situations can be used to adapt rapidly to new situations. For these reasons, LfD is widely used for robotic manipulation tasks such as assistive dressing and folding towels and ropes.

Sannapaneni et al. [29] proposed an algorithm that folds cloth using the Amrita Dual Anthropomorphic Manipulator (ADAM). Cloth coordinates (composed of four points) are extracted using depth images and are used to classify the cloth shape as trousers or a t-shirt. The main algorithm uses the marker coordinates together with the stored cloth dimensions and type. The marker set is composed of four pick-and-place points, which are then used to fold the cloth through simple geometric calculation. The approach was implemented for cloth folding, but its limitation is that this LfD algorithm can only be used for a specific cloth shape in robotic clothing assistance. To develop more complex assistive dressing algorithms, well-known LfD methods such as Dynamic Movement Primitives (DMP) and hidden Markov models (HMM), in combination with traditional methods, are applied.

Reinforcement learning

Balaguer et al. [30, 31] were among the first groups of researchers to formulate cloth assistance problems as an RL problem. They combined imitation and RL to learn a control policy for two independent manipulators, working collaboratively, to achieve a towel-folding task. Imitation training data were acquired by a motion capture system tracking the reflective markers placed on the towel while a human performed momentum folds—the kind of fold where the force applied to the grasping points on the towel is used to give momentum to the towel and lay half of it flat on the table. Rewards were computed as the exponential function of the negative smallest error between an observation and the training samples. This error was calculated by the Iterative Closest Point (ICP) algorithm. PoWER [32], a state-of-the-art RL algorithm, was used to learn a policy based on the human samples.
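
A minimal sketch of this reward, assuming a nearest-neighbour residual as a stand-in for the full ICP registration, could look as follows.

```python
import numpy as np

def alignment_error(observed, reference):
    """Mean nearest-neighbour distance between two marker point clouds.
    A simple stand-in for the residual returned by a full ICP registration."""
    dists = np.linalg.norm(observed[:, None, :] - reference[None, :, :], axis=-1)
    return dists.min(axis=1).mean()

def reward(observed, training_samples, scale=1.0):
    """Reward structure from [30, 31]: exponential of the negative smallest error
    between the observation and any of the human training samples."""
    errors = [alignment_error(observed, sample) for sample in training_samples]
    return np.exp(-scale * min(errors))

# Toy example: three recorded towel-marker configurations and one new observation.
training_samples = [np.random.rand(20, 3) for _ in range(3)]
observation = training_samples[0] + 0.01 * np.random.randn(20, 3)
print(reward(observation, training_samples))   # close to 1 when the fold matches a demonstration
```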

Yaqiang et al. [33] also targeted a t-shirt folding task in which the shirt is laid on the chest of a human being. They taught the general motion expected of a dual-arm robot through a teaching approach: a human demonstration of the expected folding behaviour is captured by a 3D range image capture system, and coloured markers placed on the shirt help recognize the state and shape of the cloth. Since the final state of the cloth is explicitly defined by the marker positions, the problem is reduced to a search problem, and so the PILCO algorithm [34] is used for policy search.

Finally, Wu et al. [35] proposed a conditional learning approach for learning to fold deformable objects, improving sample complexity.

Putting a cloth on user’s arm

In this part, the papers concerning putting a cloth on the user's arm are evaluated and divided into subsections according to the control strategy applied.

Supervised learning

Zhang et al. [36] proposed the offline learning of a cloth dynamics model by incorporating reliable motion capture data and applied this model to the online tracking of the human-cloth relationship using a depth sensor. The authors tested the approach using a robot that puts a cloth on the user's arm. Furthermore, Chance et al. [37] used a Support Vector Machine (SVM) to dress a jacket onto a mannequin or human participants, considering several combinations of user pose and clothing type. In detail, their SVM method involved searching for an optimal hyperplane that separates the data by class and is optimized by finding the largest margin at the boundaries.

Moreover, Stria et al. [38] used an SVM for the classification of garment categories, focusing particularly on putting a shirt on a user's arm.

Erickson et al. [39] used a fully connected NN that estimated the local pose of a human limb in real time. A key benefit of this sensing method is that it can sense the limb through opaque materials, including fabrics and wet cloth, enabling a robot that can assist a person during dressing and bathing tasks. The authors tested their approach by putting a hospital gown on the user's arm.

Finally, Gao et al. [40] used a random forest approach. They presented an end-to-end approach to build user-specific models for home-environment humanoid robots to provide personalised dressing assistance (a robot puts a cloth on the user's arm). By mounting a depth camera on top of the head of a Baxter humanoid robot, they recognised the upper-body pose of users from a single depth image using randomised decision forests. From sequences of upper-body movements, the movement space of each upper-body joint is modelled as a mixture of Gaussians learned by an expectation–maximization (EM) algorithm. The experimental results showed that their method of modelling the upper-body joint movements of users, combined with real-time human upper-body pose recognition, enables a humanoid robot to provide personalised dressing assistance and has potential use in rehabilitation robotics and long-term human–robot interactions.
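
The sketch below illustrates modelling a joint's movement space as a mixture of Gaussians fitted with EM; the joint positions and cluster structure are synthetic placeholders, not data from [40].

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical recorded elbow-joint positions over many upper-body movement sequences
# (3-D coordinates); the real system extracts these from depth-based pose recognition.
rng = np.random.default_rng(0)
elbow_positions = np.vstack([
    rng.normal([0.3, 0.0, 1.1], 0.02, size=(300, 3)),   # resting-pose cluster
    rng.normal([0.4, 0.2, 1.3], 0.03, size=(300, 3)),   # raised-arm cluster
])

# Model the movement space of the joint as a mixture of Gaussians fitted with EM.
movement_model = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
movement_model.fit(elbow_positions)

# The robot can then score how plausible a candidate dressing waypoint is for this user.
waypoint = np.array([[0.35, 0.1, 1.2]])
print("log-likelihood of waypoint:", movement_model.score_samples(waypoint)[0])
```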

Learning from demonstration

Pignat et al. [41] proposed a different approach that encodes a joint distribution in a hidden semi-Markov model (HSMM) for adaptive dressing assistance. The parameters of this model, which represents the sequence of complex behaviours, were learned from human demonstration data using an EM algorithm. This method provided a solution for movement primitives (MPs), which usually encode only motor commands, and it improved robot behaviour that could be controlled both time-dependently and time-independently. Another HMM [42] method was used to classify the time series of forces in robot-assisted dressing [43]. To classify the forces, an HMM was used for pattern recognition: raw forces and the movement of the end-effector in the x and z directions were measured, providing a dataset from 12 human participants. The performance of the HMMs was validated using univariate and bivariate models with the force in the x-direction. The limitation of these methods (DMP and HMM) was that the workspace in which the robot can move for dressing assistance was inadequate compared to human body movement. In addition, demonstrations for a specific task form a one-to-one relationship, with the restriction that motor commands are always linked to a unique perception distribution. To overcome this problem, a combination of each demonstration and the point cloud of the scene was developed for towel-folding manipulation. First, the method recorded the demonstrated pose and force trajectories; the authors found that five demonstrations were sufficient for achieving generalization. The point cloud of the scene was retrieved using a Kinect depth sensor at the beginning of each demonstration, and the thin-plate spline robust point matching (TPS-RPM) algorithm [56] was used to match each of the demonstrations to the current point cloud scene. After the demonstration, a mean trajectory and a sequence of time-varying feedback gains were extracted, and the gains were learned using a joint Gaussian distribution. This method is beneficial for learning dressing from a small number of demonstrations, and the point cloud scene captures new situations well, but optimal gains still need to be obtained to optimize the task.
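
As an illustration of the force-classification step with HMMs, the sketch below fits one Gaussian HMM per outcome class and labels a new force sequence by comparing log-likelihoods; the synthetic data, number of hidden states, and class names are assumptions, not values from [43].

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

# Toy force time series for two outcome classes (e.g. successful sleeve pull vs. snag).
# The data in [43] came from end-effector forces recorded with 12 participants.
rng = np.random.default_rng(0)
def make_sequences(mean_drift, n_seq=10, length=50):
    return [np.cumsum(rng.normal(mean_drift, 0.1, size=(length, 1)), axis=0) for _ in range(n_seq)]

success_seqs, snag_seqs = make_sequences(0.02), make_sequences(0.08)

def fit_hmm(sequences, n_states=3):
    X = np.vstack(sequences)
    lengths = [len(s) for s in sequences]
    model = GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=50, random_state=0)
    model.fit(X, lengths)
    return model

hmm_success, hmm_snag = fit_hmm(success_seqs), fit_hmm(snag_seqs)

# Classify a new force sequence by comparing its log-likelihood under each class model.
test_seq = make_sequences(0.08, n_seq=1)[0]
label = "snag" if hmm_snag.score(test_seq) > hmm_success.score(test_seq) else "success"
print("classified as:", label)
```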

Kapusta et al. [44] provided evidence for the value of data-driven haptic perception for robot-assisted dressing through a carefully controlled experiment. To design an informative and replicable experiment, they deliberately focused on a representative sub-task of dressing with a commonly used article of clothing, and they tested their approach by putting a hospital gown on the user's arm.

Reinforcement learning

Clegg et al. [45] approached the dressing problem differently, by viewing the long-horizon task of dressing as a sequence of smaller subtasks. They argued that learning to dress is challenging because humans rely heavily on haptic information and the task itself is a prolonged sequence of motions which are very costly to learn together, especially in the right order. They therefore proposed to learn a separate policy for each subtask and introduced a policy sequencing algorithm that matches the output state distribution of one subtask to the input state distribution of the next, while the transitions between the different subtasks are managed by a state machine. To deal with the high-dimensional observation space typically associated with dressing tasks, they defined their observation space as a 163-dimensional vector which includes information on the human's joint angles, garment feature locations (e.g. a sleeve opening), haptics (contact forces between human and cloth), surface information (information on the inner and outer surfaces of the garment) and a task vector. The reward function is then defined as the weighted sum of a progress reward (the extent to which a limb is dressed), a deformation penalty (penalization of undesired cloth deformation), a geodesic reward, a reward for moving the end effector in the direction of the task vector, and another reward that attracts the character to a target position. With these definitions of reward and state, which are queried from a dressing simulation, the Trust Region Policy Optimization (TRPO) algorithm [46] was used to update the policy parameters represented by a neural network. They validated their approach by putting a hospital gown on a virtual user. The same authors presented a DRL-based approach for modelling collaborative strategies for robot-assisted dressing tasks in simulation. Their approach applied co-optimization to enable distinct robot and human policies to explore the space of joint solutions to maximize a shared reward. In addition, they presented a strategy for modelling impairments in human capability. They demonstrated that their approach enables a robot, unaware of the exact capability of the human, to assist with dressing tasks.
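
The weighted-sum structure of such a reward can be sketched as below; the weights, term values, and their exact definitions are illustrative assumptions rather than those used in [45].

```python
import numpy as np

# Hypothetical weights and reward terms for one time step of a dressing sub-task,
# mirroring the weighted-sum structure described in [45] (all values are illustrative).
def dressing_reward(progress, deformation, geodesic, task_align, target_dist,
                    w=(10.0, -1.0, 2.0, 1.0, -0.5)):
    """progress: how far the limb is through the sleeve opening [0, 1]
    deformation: magnitude of undesired cloth stretching (penalized, negative weight)
    geodesic: reward for reducing geodesic distance to the garment feature
    task_align: alignment of end-effector motion with the task vector
    target_dist: distance of the character from a target position (penalized)"""
    terms = np.array([progress, deformation, geodesic, task_align, target_dist])
    return float(np.dot(np.array(w), terms))

print(dressing_reward(progress=0.4, deformation=0.05, geodesic=0.2,
                      task_align=0.8, target_dist=0.1))
```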

Other methods

Chance et al. [47] created strategies for an assistive robot to support dressing using a compliant robotic arm on a mannequin. A tracking system is used to find the arm position of the mannequin and supports trajectory planning using waypoints. Torque feedback and sensor tag data provide failure detection, and speech commands allow detected dressing errors to be corrected successfully. The authors tested the proposed method on ten different poses of the mannequin and showed that assistive dressing tasks could be developed without complex learning algorithms. Further, the method has the advantage of using a small number of low-cost sensors, which can be used to sense unplanned movement in smooth trajectories. The problem with this strategy was the absence of force feedback from the mannequin, which is important to have (people could be hurt by the robot). They validated their approach by putting a t-shirt and a jumper on the user's arm.

Erickson et al. [48] showed how task-specific LSTMs can estimate force magnitudes along a human limb for two simulated dressing tasks. At each time step their LSTM networks took a 9-dimensional input vector consisting of the force and torque applied to the end effector by the garment and the velocity of the end effector. The networks then output a force map at each time step consisting of hundreds of inferred force magnitudes across the person’s body. Their work was tested on a simulated robot that puts a shirt on a virtual user’s arm.
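
A minimal sketch of this input/output structure, assuming a 200-point force map and an arbitrary hidden size, is shown below.

```python
import torch
import torch.nn as nn

class ForceMapLSTM(nn.Module):
    """Sketch of the task-specific estimator in [48]: at each time step a 9-D input
    (end-effector force, torque and velocity) is mapped to a map of inferred force
    magnitudes along the limb. The map size (200 points) and hidden size are assumptions."""
    def __init__(self, input_dim=9, hidden_dim=128, map_points=200):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, map_points)

    def forward(self, x):                  # x: (batch, time, 9)
        h, _ = self.lstm(x)
        return self.head(h)                # (batch, time, map_points)

model = ForceMapLSTM()
dummy_sequence = torch.rand(1, 100, 9)     # 100 time steps of wrench + velocity readings
force_maps = model(dummy_sequence)
print(force_maps.shape)                    # torch.Size([1, 100, 200])
```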

Putting a cloth on user’s head

In this section, the papers concerning putting a cloth on the user's head are evaluated and divided into subsections according to the control strategy applied.

Supervised learning

Koganti et al. [49] proposed a data-efficient representation to encode task-specific motor skills of the robot, using Bayesian non-parametric latent variable models to learn a dynamics model of the human-cloth relationship and using this model as a prior for robust tracking in real time. They reduced their policy search space by first learning a low-dimensional latent space using the BGPLVM [44]. A dataset of successful clothing assistance trajectories was then used to train a latent space that encodes the motor skills. Each of the trajectories was then transformed into a sequence of points in the latent space, forming latent space trajectories, followed by a policy search using the PoWER algorithm [32]. The authors validated their idea by putting a t-shirt on a person. The same authors learnt the underlying cloth dynamics using the shared Gaussian Process Latent Variable Model (shared GP-LVM) and by incorporating accurate state information obtained from a motion capture system into the dynamics model. The shared GP-LVM provides a probabilistic framework to infer the accurate cloth state from noisy depth sensor readings. The experimental results showed that the shared GP-LVM was able to learn reliable motion models of the T-shirt state for robotic clothing assistance tasks. They also demonstrated three key factors that contribute to the performance of the trained dynamics model. The advantage of using GP-LVM is that a corresponding latent space manifold can be learned for any representation used in the observation spaces.

Saxena et al. [50] also used SL for grasping point detection and for garment recognition; the challenge of their work was to use the Kinect camera near the garment so as to test the algorithm with an occluded view of the object. The authors tested their approach by putting a t-shirt on a person.

Learning from demonstration

Joshi et al. [51] (Fig. 4) presented a framework for robotic clothing assistance by DMP on a Baxter robot. The authors divided the dressing task into three phases (reaching, arm dressing, and body dressing), and each phase required different skills. The reaching phase moves the robot arm to a specific location without collision and can thus be achieved through a point-to-point trajectory, while the arm dressing phase has to reach the end of the arm at the elbow position. To make the robotic arm reach this position, DMP, which can be applied for global trajectory modification, was used. DMP parameters can be acquired from a kinaesthetic demonstration and support generating a trajectory globally using the start and goal parameters of the DMP. Compared to reaching the elbow position, generating a trajectory to the torso position is more complicated, so the authors introduced the Bayesian Gaussian Process Latent Variable Model (BGPLVM) for the body dressing phase. They applied BGPLVM to encode complicated motor skills, generalize the trajectory in latent space, and modify the trajectory locally. The authors validated their idea using a manipulator that puts a t-shirt on a person.
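
A minimal 1-D discrete DMP sketch for the reaching phase is given below; the gains, basis functions, and weights are illustrative, and in practice the weights are fitted from the kinaesthetic demonstration.

```python
import numpy as np

def dmp_rollout(y0, goal, weights, n_steps=200, alpha=25.0, beta=6.25, tau=1.0):
    """Minimal 1-D discrete DMP: a critically damped spring-damper pulled toward the
    goal plus a learned forcing term; changing the start (y0) and goal parameters
    re-shapes the whole trajectory globally."""
    n_bf = len(weights)
    centers = np.exp(-3.0 * np.linspace(0, 1, n_bf))   # basis centres along the phase variable
    widths = n_bf ** 1.5 / centers                      # common width heuristic
    y, dy, x = float(y0), 0.0, 1.0
    dt = tau / n_steps
    traj = []
    for _ in range(n_steps):
        psi = np.exp(-widths * (x - centers) ** 2)
        forcing = (psi @ weights) / (psi.sum() + 1e-10) * x * (goal - y0)
        ddy = alpha * (beta * (goal - y) - dy) + forcing    # transformation system
        dy += ddy * dt
        y += dy * dt
        x += -4.0 * x * dt                                  # canonical system (phase decay)
        traj.append(y)
    return np.array(traj)

# Weights would normally be fitted from a kinaesthetic demonstration; random here.
weights = np.random.default_rng(0).normal(0.0, 50.0, size=20)
reach_trajectory = dmp_rollout(y0=0.0, goal=0.6, weights=weights)  # e.g. move toward the elbow
print(round(reach_trajectory[-1], 3))  # ends close to the goal (0.6)
```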

Fig. 4

LfD example where a Baxter robot is dressing a man with a T-shirt [51]

Reinforcement learning

Koganti et al. [52] used a depth sensor to extract and filter point clouds of the t-shirt collar and sleeve, which in turn were detected by a colour extraction method. Once retrieved, both point clouds were approximated with an ellipse, followed by computing the same topological relationship, this time in real time. They also modified the reward function to calculate the Mahalanobis distance between the current and target states, to account for the different scales of the different state variables. The authors tested their model using a robot that puts a t-shirt over the head of the person. Twardon et al. [53], instead, used a dual-arm robot equipped with anthropomorphic hands that learned to put a knit cap on a styrofoam head. They modelled the head as an ellipsoid using point cloud data and constructed a head-centric policy space where the policy search takes place. The policy was then defined in this space as parameterized end-effector trajectories (parameterized as B-splines) from the back of the head (back pole) to its front (front pole). They then defined an objective function which gives the robot a fixed reward for successful task completion while supporting the robot in finding a trade-off between minimizing the risk of early failure and establishing contact between the fabric and the head. This setting allowed the authors to use a gradient-free direct policy search approach, the Active-CMA-ES algorithm [80], to find the optimal policy by optimizing the objective function.
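
A sketch of a Mahalanobis-distance-based reward over a toy cloth state is given below; the state layout and covariance values are assumptions for illustration.

```python
import numpy as np

def mahalanobis_reward(state, target, covariance):
    """Reward variant described for [52]: negative Mahalanobis distance between the
    current and target cloth states, so that differently scaled state variables
    (e.g. ellipse position vs. axis lengths) contribute comparably."""
    diff = state - target
    d2 = diff @ np.linalg.inv(covariance) @ diff
    return -np.sqrt(d2)

# Toy 4-D cloth state: collar-ellipse centre (x, z) plus its two axis lengths.
target_state = np.array([0.0, 1.2, 0.18, 0.12])
state_cov = np.diag([0.05, 0.05, 0.01, 0.01]) ** 2   # assumed per-dimension scales
current_state = np.array([0.03, 1.15, 0.20, 0.11])
print(mahalanobis_reward(current_state, target_state, state_cov))
```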

Furthermore, Tamei et al. [54] presented a novel learning system for an anthropomorphic dual-arm robot to perform the clothing assistance task. The keys to their system were applying a reinforcement learning method to cope with the posture variation of the assisted person, and defining a low-dimensional state representation utilizing the topological relationship between the assisted person and the non-rigid material. With their experimental system for T-shirt clothing assistance, including an anthropomorphic dual-arm robot and a soft mannequin, they demonstrated that the robot quickly learns to modify its arm motion to put the mannequin's head into a T-shirt.

Additionally, Matsubara et al. [55] and Shinoara et al. [56] proposed a novel learning framework for learning motor skills for interacting with non-rigid materials by RL. Their learning framework focuses on the topological relationship between the configuration of the robot and the non-rigid material. They constructed an experimental setting with an anthropomorphic dual-arm robot and a tailor-made T-shirt for the robot. They both applied the method to the robot to perform the motor task of wearing a T-shirt.

Other methods

Klee et al. [57] focused on the motion interaction between the robot and the person. The authors found a solution involving manipulator motions and user repositioning requests. Specifically, the solution allows the robot and user to take turns moving in the same space and is cognizant of the user’s limitations. To accomplish this, a vision module monitors the human’s motion, determines if they are following the repositioning requests, and infers mobility limitations when they cannot. The learned constraints were used during future dressing episodes to personalize the repositioning requests. Their contributions included a turn-taking approach to human–robot coordination for the dressing problem and a vision module capable of learning user limitations. They validated their approach using a robot that puts a hat on the user’s head.

Putting a shoe on the user's feet

In this section, the papers concerning putting a shoe on the user's feet are evaluated and divided into subsections according to the control strategy applied.

Learning from demonstration

Canal et al. [58] defined a method to guide a planner to choose the actions preferred by the user. The user model was included in the planning domain as predicates, and the actions' associated costs depend on them, the costliest actions being those that do not satisfy the user model. Moreover, they used a stochastic planner with NID rules that contemplate the possibility of different action outcomes and failures. The initial user model was inferred by asking two simple questions to the user, related to his/her confidence and comfort. A Fuzzy Inference System (FIS) was then used to translate the answers into planning predicates. To make the planner adapt to changes in user behaviour and to cope with wrongly inferred user models, each rule's probabilities and costs were updated. First, an initial refinement was performed to favour the inferred user model. Then, after each task completion, the satisfaction of the user was used to refine each rule's cost, and the outcome of each action was used to refine the success probabilities. This defines a separation between the user model and the action outcomes, as user satisfaction should not be measured only by the success of the actions, which may fail due to events unrelated to the user's preferences. Moreover, the system was able to plan with task-related actions as well as with interaction actions, asking the user to move when needed and informing them about the next action when this increased the success rate of the action. They showed how the system was able to adapt to changes in user behaviour, and how the use of feedback to update the action costs with the decreasing m-estimate produced more stable behaviour and faster convergence to the preferred solution.

Putting an item on the user's leg

In this part, the papers concerning putting an item on the user's leg are evaluated and divided into subsections according to the control strategy applied.

Other method

Yamazaki et al. [59] focused on a different task: the actions by which the robot can pull a pair of trousers along the subject's legs. These actions are frequently requested by people requiring dressing assistance and are potentially automatable. The authors implemented the dressing procedure on a life-sized humanoid robot. Estimating the shape of the legs from images captured by a three-dimensional range camera, they proposed a method for modifying the trajectory from a basic one estimated from statistical human-body data.

Multiple tasks

In this section, the papers concerning multiple tasks are evaluated and divided into subsections according to the control strategy applied.

Learning from demonstration

Lee et al. [5] presented an approach for generalizing force-based demonstrations of deformable object manipulation skills to novel situations. Their method uses non-linear geometric warping based on point cloud registration to adapt the demonstrations to a novel test scene, and then learns appropriate feedback gains to trade off position and force goals in a manner consistent with the data, providing for variable impedance control. Their results showed that including forces in the manipulation tasks allows for significantly greater generalization than purely kinematic execution: knots could be tightened more tightly in ropes with greater length variation and could be tied to a pipe without slipping off, towels of varying geometries could be stretched and laid flat, and whiteboards could be erased effectively. They chose their tasks to include both phases that were determined primarily by pose, such as positioning the gripper to grasp the rope, and phases that were primarily force-driven, such as tightening the knot. Performing such tasks kinematically is unreliable, because some parts are defined primarily by the force exerted on the object, while others require precise positioning. Automatically determining whether force or pose is important at each phase is essential for effectively generalizing demonstrations of such tasks. The authors validated their work using a robot that tied a knot, folded a towel, erased a whiteboard, and tied a rope to a pipe.
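
The trade-off between position and force goals can be sketched as a variable-impedance control law with time-varying gains, as below; the gain values and force-term weighting are illustrative assumptions, not those learned in [5].

```python
import numpy as np

def variable_impedance_command(x, x_ref, f_meas, f_ref, K_t, damping=2.0, dx=None):
    """One step of the variable-impedance idea in [5]: track the demonstrated pose
    trajectory with time-varying stiffness K_t while adding a force-tracking term,
    so that phases labelled as force-driven rely less on exact positioning.
    All names and gains here are illustrative placeholders."""
    dx = np.zeros_like(x) if dx is None else dx
    pos_term = K_t @ (x_ref - x) - damping * dx          # stiffness is high when pose matters
    force_term = 0.5 * (f_ref - f_meas)                  # pull the measured force toward the demo force
    return pos_term + force_term

# Toy step: low stiffness in z (force-driven pressing phase), high stiffness in x and y.
K_t = np.diag([300.0, 300.0, 20.0])
u = variable_impedance_command(
    x=np.array([0.40, 0.00, 0.10]), x_ref=np.array([0.42, 0.00, 0.08]),
    f_meas=np.array([0.0, 0.0, -2.0]), f_ref=np.array([0.0, 0.0, -8.0]), K_t=K_t)
print(u)
```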

Reinforcement learning

Tsurumine et al. [60] (see Fig. 5a, b) proposed two DRL algorithms, a deep policy network and a duelling deep policy network structure, which combine the nature of smooth policy update with the capability of automatic feature extraction in deep neural networks to enhance sample efficiency and learning stability with fewer samples. To exploit the nature of smooth policy update, they used dynamic policy programming [61], which incorporates the Kullback–Leibler divergence between the current policy π and the baseline policy π̄ into the reward function to minimize the difference between the current and baseline policy while maximizing the expected reward. A novel DDQN-inspired architecture was also presented that learned separate value and advantage functions and then used human demonstrations to drastically reduce the exploration space for the RL agent. The state was defined as raw RGB images, which are then mapped to optimal actions by the neural network. The reported results indicated stable and sample-efficient learning for cloth manipulation tasks such as folding a t-shirt and flipping a handkerchief, compared to the deep Q-learning (DQN) [62] algorithm, while simultaneously earning a higher total reward. In this approach, the robot folded a t-shirt and flipped a handkerchief.
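
The KL-regularized reward at the core of this smooth policy update can be sketched as follows; the action set and the weighting factor are illustrative assumptions.

```python
import numpy as np

def kl_regularized_reward(env_reward, pi_current, pi_baseline, eta=0.1):
    """Reward shaping in the spirit of dynamic policy programming [61]: the environment
    reward is penalized by the KL divergence between the current policy and the baseline
    policy at this state, which keeps successive policy updates smooth.
    eta is an assumed weighting factor, not a value from the paper."""
    kl = np.sum(pi_current * np.log((pi_current + 1e-12) / (pi_baseline + 1e-12)))
    return env_reward - eta * kl

# Toy discrete action distributions over four folding primitives at one cloth state.
pi_baseline = np.array([0.25, 0.25, 0.25, 0.25])
pi_current = np.array([0.40, 0.30, 0.20, 0.10])
print(kl_regularized_reward(env_reward=1.0, pi_current=pi_current, pi_baseline=pi_baseline))
```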

Fig. 5

a RL example of a robot folding a T-shirt [57] and b RL example of the network implementation with the folding steps of the T-shirt [57]

Matas et al. [63], instead, proposed a task-agnostic algorithm based on deep RL which bypasses the need to explicitly model cloth behaviour and does not require reward shaping to converge. The agent was able to learn three long-horizon tasks: folding a towel up to a tape mark, diagonally folding a face towel, and draping a small towel over a hanger. Training was seeded with 20 demonstrations and happened entirely in simulation, with a couple of adaptations to account for imperfections in the simulator's deformable-body support and with domain randomization to enable easy transfer of the policy. The learning algorithm incorporated nine improvements proposed in the recent literature, and the authors presented ablation studies to understand the role of these improvements. The robot in this approach folded a towel up to a mark, folded a face towel diagonally, and draped a piece of cloth over a hanger.
