Creating autonomous robots that can learn to assist humans in situations of daily life is a fascinating challenge for machine learning. While this aim has been a long-standing vision of artificial intelligence and the cognitive sciences, we have yet to achieve the first step of creating robots that can learn to accomplish many different tasks triggered by environmental context or higher-level instruction. The goal of our robot learning laboratory is the investigation of the ingredients for such a general approach to motor skill learning, to get closer towards human-like performance in robotics. We thus focus on the solution of basic problems in robotics while developing domain-appropriate machine-learning methods. Starting from theoretically well-founded approaches to representing the required control structures for task representation and execution, we replace the analytically derived modules by more flexible, learned ones.
This page presents an overview of our work on the following topics:
In the context of Motor Skill Learning, we are interested in a variety of topics that can be classified by three layers of abstraction:
A) Learning to Execute: An essential problem in robotics is the accurate execution of desired movements using only low-gain controls such that the robot will accomplish the desired task while not harming human beings in its environment. Following a trajectory with little feedback requires the accurate prediction of the needed torques, which cannot be achieved using classical methods for sufficiently complex robots. However, learning such models is hard as the joint-space can never be fully explored and the learning algorithm has to cope with a never-ending data stream in real time. We have developed learning methods both for accomplishing tasks represented in operational space as well as in joint-space. For more information on learning to execute see:
B) Learning new Elementary Tasks: While learning to execute tasks is a component essential to a framework for motor skill learning, learning the actual task is of even higher importance. We focus on the learning of elementary tasks or movement primitives, which are parameterized task representations based on nonlinear differential equations with desired attractor properties. We mimic how children learn new motor tasks, using imitation learning to initialize these movement primitives and reinforcement learning to subsequently improve the task performance. We have learned tasks such as Ball-in-a-Cup or bouncing a ball on a string using this approach. In hitting and batting tasks, movement templates with a learned global shape need to be adapted during the execution so that the racket reaches a target position and velocity that will return the ball over to the other side of the net or court. This requires a reformulation of motor primitives to hitting primitives. A key motor skill for manipulating the environment is grasping, which is why one of our research goals is adapting machine learning algorithms to make them applicable in the robot grasping task domain. For grasping or hitting, several alternative motor primitives might be available.
C) Learning to Compose Complex Tasks: Most complex tasks require several motor primitives to be executed in parallel or in sequence. The selection and composition of motor primitives requires a perceptuo-motor perspective and is necessary for learning complex tasks. An example of a complex task which requires motor primitive selection and hitting primitives is the task of learning to play ping-pong. Moving towards learning complex tasks requires the solution of a variety of hard problems. Among these are the decomposition of large tasks into movement primitives (MPs), the acquisition and self-improvement of MPs, the determination of the number of MPs in a data set, the determination of the relevant task-space, perceptual context estimation and goal learning for MPs, as well as the composition of MPs for new complex tasks. These questions are tackled in order to make progress towards fast and general motor skill learning for robotics.
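To make the idea of a movement primitive built on a nonlinear differential equation with attractor properties more concrete, the following minimal sketch integrates a one-dimensional primitive in the style of dynamic movement primitives. The function name, the gains, the basis-function settings, and the example weights are all illustrative assumptions and are not taken from the work described on this page.

```python
import math


def rollout_dmp(y0, goal, weights, centers, widths,
                tau=1.0, duration=3.0, dt=0.001,
                alpha_z=25.0, beta_z=6.25, alpha_x=4.0):
    """Euler-integrate a one-dimensional movement primitive; return positions."""
    y, z, x = y0, 0.0, 1.0  # position, scaled velocity, phase variable
    trajectory = [y]
    for _ in range(int(round(duration / dt))):
        # Normalized Gaussian basis functions of the phase shape the movement.
        psi = [math.exp(-h * (x - c) ** 2) for c, h in zip(centers, widths)]
        forcing = sum(w * p for w, p in zip(weights, psi)) / (sum(psi) + 1e-10)
        forcing *= x * (goal - y0)  # vanishes as the phase decays to zero
        # Critically damped spring-damper that pulls the state towards the goal.
        dz = (alpha_z * (beta_z * (goal - y) - z) + forcing) / tau
        dy = z / tau
        dx = -alpha_x * x / tau
        z, y, x = z + dz * dt, y + dy * dt, x + dx * dt
        trajectory.append(y)
    return trajectory


# Hypothetical basis functions and weights, chosen only for illustration.
centers, widths = [0.8, 0.5, 0.2], [50.0, 50.0, 50.0]
plain = rollout_dmp(0.0, 1.0, [0.0, 0.0, 0.0], centers, widths)
shaped = rollout_dmp(0.0, 1.0, [200.0, -100.0, 50.0], centers, widths)
```

Both rollouts end at the goal because the forcing term fades with the phase, while the weights change the shape of the path in between. In such a representation, imitation learning would fit the weights to a demonstration and reinforcement learning would then adjust them.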
Selected Related Research Topics: learning models for control, learning operational space control, reinforcement learning, learning motor primitives, learning complex tasks, learning to grasp, learning to play ping-pong, motor primitives for hitting, brain-robot interfaces
Research in robotics and artificial intelligence has led to the development of complex robots such as humanoids and androids. In order to be meaningfully applied in human-inhabited environments, robots need to possess a variety of physical abilities and skills. However, programming such skills is a labour- and time-intensive task which requires a large amount of expert knowledge. In particular, it often involves transforming intuitive concepts of motions and actions into formal mathematical descriptions and algorithms.
To overcome such difficulties, we use imitation learning to teach robots new motor skills. A human demonstrator first provides one or several examples of the skill. Information recorded through motion capture or physical interaction is used by the robot to automatically generate a controller that can replicate the seen movements. This is done using modern machine learning techniques. Imitation learning also allows robots to improve upon the observed behavior. This so-called self-improvement of the task can help the robot to adapt the learned movement to the characteristics of its own body or the requirements of the current context. Hence, even if the examples presented by the human are not optimal, the robot can still use them to bootstrap its behavior.
At IAS, imitation learning has already been used to teach complex motor skills to various kinds of robots. This includes skills such as locomotion, grasping of novel objects, ping-pong, ball-in-the-cup and tetherball. We also develop new machine learning methods that reduce the time needed to acquire a motor skill. The goal of this research is to have intelligent robots that can autonomously enlarge their repertoire of skills by observing or interacting with human teachers.
Additionally, we recently developed a novel movement primitive representation suitable for imitation learning. The new approach is called Probabilistic Movement Primitives (ProMPs) and allows for blending between motions, adapting to altered task variables, and co-activating multiple MPs in parallel. We based our approach on a probabilistic formulation of the movement primitive concept that maintains a distribution over trajectories. Our probabilistic approach allows for the derivation of new operations which are essential for implementing all aforementioned properties in one framework. In order to use such a trajectory distribution for robot movement control, we analytically derive a stochastic feedback controller which reproduces the given trajectory distribution. We have evaluated our approach and compared it to existing methods in several simulated as well as real robot scenarios.
Supervised learning is not always sufficient for motor learning problems, partly because an expert teacher or idealized version of the behavior is often not available. Because of that, one of our goals is the development of reinforcement learning methods which scale into the dimensionality of humanoid robots and can generate actions for seven or more degrees of freedom.
Efficient reinforcement learning for continuous states and actions is essential for robotics and control. We follow two approaches depending on the dimensionality of the domain. For high-dimensional state and action spaces, it is often easier to directly learn policies without estimating accurate system models. The resulting algorithms are parametric policy search algorithms inspired by expectation-maximization methods or information-theoretic policy updates and can be employed for motor primitive learning. Information-theoretic policy updates seem to be particularly suited for the high-dimensional problems that occur in robotics, as the update rules result in a smooth and robust policy update that always 'stays close to the data'. If we can learn a model of the robot and its environment, we can employ model-based reinforcement learning to drastically improve the sample efficiency of policy search algorithms. As a result, these methods can learn good policies at a rapid pace based on only little interaction with the system.
Our general goal in reinforcement learning is the development of methods which scale into the dimensionality of humanoid robots. Such high dimensionality is a tremendous challenge for reinforcement learning, as a complete exploration of the underlying state-action spaces is impossible and few existing techniques scale into this domain. Therefore, we rely upon a combination of watching a teacher and subsequent self-improvement. In more technical terms: first, a control policy is obtained by imitation and then improved using reinforcement learning.
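One operation that a distribution over trajectories makes possible is adapting a primitive to an altered task variable, for instance a desired via-point. The sketch below assumes a single degree of freedom whose trajectory is a weighted sum of normalized radial basis functions, with a Gaussian distribution over the weights; conditioning on a via-point is then ordinary Gaussian conditioning. All names, the basis layout, and the prior are illustrative assumptions, not the implementation used in the ProMP work.

```python
import math


def rbf_features(t, centers, width):
    """Normalized Gaussian basis functions evaluated at time t in [0, 1]."""
    raw = [math.exp(-0.5 * ((t - c) / width) ** 2) for c in centers]
    total = sum(raw)
    return [r / total for r in raw]


def predict(mean_w, cov_w, phi):
    """Mean and variance of the trajectory value for the feature vector phi."""
    n = len(phi)
    mean = sum(p * m for p, m in zip(phi, mean_w))
    var = sum(phi[i] * cov_w[i][j] * phi[j] for i in range(n) for j in range(n))
    return mean, var


def condition_on_via_point(mean_w, cov_w, phi, y_star, obs_var=1e-6):
    """Gaussian conditioning of the weight distribution on phi^T w = y_star."""
    n = len(mean_w)
    cov_phi = [sum(cov_w[i][j] * phi[j] for j in range(n)) for i in range(n)]
    s = obs_var + sum(phi[i] * cov_phi[i] for i in range(n))
    gain = [c / s for c in cov_phi]
    residual = y_star - sum(p * m for p, m in zip(phi, mean_w))
    new_mean = [m + k * residual for m, k in zip(mean_w, gain)]
    # cov_w is symmetric, so phi^T cov_w equals cov_phi transposed.
    new_cov = [[cov_w[i][j] - gain[i] * cov_phi[j] for j in range(n)]
               for i in range(n)]
    return new_mean, new_cov


# Hypothetical prior: five basis functions, zero mean, unit covariance.
centers = [0.0, 0.25, 0.5, 0.75, 1.0]
prior_mean = [0.0] * 5
prior_cov = [[1.0 if i == j else 0.0 for j in range(5)] for i in range(5)]

phi_mid = rbf_features(0.5, centers, width=0.15)
post_mean, post_cov = condition_on_via_point(prior_mean, prior_cov, phi_mid, 0.4)
mid_mean, mid_var = predict(post_mean, post_cov, phi_mid)
```

After conditioning, the predicted trajectory passes through the via-point with very small variance at that time step, while the variance elsewhere remains governed by the prior.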
Parametrized policies allow an efficient abstraction of the high-dimensional continuous action spaces which is often needed in robotics. We can directly optimize the parameters of the primitive by the use of policy search methods. Members of IAS have developed a variety of novel algorithms for this context which have been applied for learning to play table tennis, tetherball, the game 'ball in the cup' and darts.
A) Natural Actor-Critic (NAC): The NAC is currently considered the most efficient policy gradient method. It exploits the fact that a natural gradient usually outperforms a vanilla gradient. For more information read:
B) EM-like Reinforcement Learning: We formulated policy search as an inference problem. This has led to efficient algorithms like reward-weighted regression and PoWER. For more information read:
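As a rough illustration of the EM-like view, the sketch below samples policy parameters around the current mean, weights each sample by an exponential transformation of its return, and moves the mean to the weighted average. This is a simplified episodic variant in the spirit of reward-weighted regression, not the published reward-weighted regression or PoWER algorithms; the toy return function, the fixed exploration noise, and all parameter values are assumptions made for the example.

```python
import math
import random


def reward_weighted_update(samples, returns, beta=1.0):
    """Return the reward-weighted average of the sampled parameter vectors."""
    best = max(returns)
    # Subtracting the best return keeps the exponentials numerically stable.
    weights = [math.exp(beta * (r - best)) for r in returns]
    total = sum(weights)
    dim = len(samples[0])
    return [sum(w * s[d] for w, s in zip(weights, samples)) / total
            for d in range(dim)]


def run_policy_search(episode_return, mean, std=0.5,
                      n_samples=200, n_iters=30, seed=0):
    """Alternate between sampling parameters and reward-weighted averaging."""
    rng = random.Random(seed)
    for _ in range(n_iters):
        samples = [[m + std * rng.gauss(0.0, 1.0) for m in mean]
                   for _ in range(n_samples)]
        returns = [episode_return(s) for s in samples]
        mean = reward_weighted_update(samples, returns)
    return mean


def toy_return(theta):
    """Hypothetical episodic return with its optimum at theta = (3, 3)."""
    return -sum((t - 3.0) ** 2 for t in theta)


learned = run_policy_search(toy_return, [0.0, 0.0])
```

No gradient of the return is needed: the update is a weighted maximum-likelihood fit, which is what makes this family of methods convenient for motor primitive parameters.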
C) Information-theoretic Policy Search: The optimization in policy search can rapidly change the control policy, which might lead to jumps in the policy as well as in the resulting trajectory distribution. While this behavior might already be dangerous for a real robot, it might also lead to premature convergence to suboptimal solutions or even oscillations. Information-theoretic policy search solves this problem by bounding the relative entropy between two subsequent policies. Hence, the new policy always tries to stay close to the 'data' that has been generated by the old policy, while maximizing the reward locally. Such policy updates are mathematically sound and allow the derivation of a whole range of new algorithms, including contextual policy search and learning hierarchical policies. For more information read:
For an overview of these approaches, please also consult the following survey papers from IAS members:
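The effect of a relative entropy bound can be sketched for the episodic case, assuming the samples were drawn from the old policy. Each sample receives a weight proportional to exp(R / eta), and the temperature eta is chosen so that the relative entropy between the weighted and the unweighted sample distribution equals the bound epsilon. The bisection search and all names below are illustrative assumptions; published algorithms typically obtain eta by minimizing a dual function, whose stationary point satisfies the same condition when the bound is active.

```python
import math


def softmax_weights(returns, eta):
    """Weights proportional to exp(return / eta), normalized to sum to one."""
    best = max(returns)
    raw = [math.exp((r - best) / eta) for r in returns]
    total = sum(raw)
    return [x / total for x in raw]


def kl_from_uniform(weights):
    """Relative entropy between the weighted and the unweighted sample set."""
    n = len(weights)
    return sum(w * math.log(n * w) for w in weights if w > 0.0)


def bounded_update_weights(returns, epsilon,
                           eta_low=1e-6, eta_high=1e6, n_iters=100):
    """Bisect on eta (in log scale) until the relative entropy meets epsilon."""
    for _ in range(n_iters):
        eta = math.sqrt(eta_low * eta_high)
        if kl_from_uniform(softmax_weights(returns, eta)) > epsilon:
            eta_low = eta   # update too greedy: raise the temperature
        else:
            eta_high = eta  # update within the bound: try a greedier one
    return softmax_weights(returns, eta_high), eta_high


# Hypothetical returns of five sampled rollouts.
returns = [0.0, 1.0, 2.0, 3.0, 10.0]
weights, eta = bounded_update_weights(returns, epsilon=0.5)
```

A weighted maximum-likelihood fit of the policy to the samples with these weights then yields a new policy that favors high-return samples without moving further from the old one than the bound allows. A smaller epsilon gives flatter weights and a more cautious update.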
Contact: Jan Peters, Marc Deisenroth, Gerhard Neumann, Jens Kober, Christian Daniel, Abdeslam Boularias, Simone Parisi, Joni Pajarinen, Herke van Hoof, Takayuki Osa, Riad Akrour
A key long-term goal of robotics is to create autonomous robots that can perform a wide range of tasks to help humans in daily life. One of the main requirements of such domestic and service robots is the ability to manipulate a wide range of objects in their surroundings.
Modern industrial robots usually only have to manipulate a single set of identical objects, using preprogrammed actions. However, future service robots will need to operate in unstructured environments and perform tasks with novel objects. These robots will therefore need to learn to optimize their actions to specific objects, as well as generalize their actions between objects.
A) Improving Grasps through Experience: The ability to grasp objects is an important prerequisite for performing various manipulation tasks. Using trial-and-error, a robot can autonomously optimize its grasps of objects. In particular, the grasp selection process can be framed as a continuum-armed bandit reinforcement learning problem. Thus, the robot can actively balance executing grasps that are known to be good and exploring new grasps that may be better.
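The balance between executing grasps that are known to be good and exploring new ones can be illustrated with a simple upper-confidence-bound rule. The sketch below uses a finite set of candidate grasps and the standard UCB1 rule as a discretized stand-in for the continuum-armed setting mentioned above; the simulated success probabilities and all names are hypothetical.

```python
import math
import random


def ucb1_grasp_selection(try_grasp, n_candidates, n_trials, seed=0):
    """Run UCB1 over candidate grasps; return how often each one was tried."""
    rng = random.Random(seed)
    counts = [0] * n_candidates
    successes = [0.0] * n_candidates
    for t in range(1, n_trials + 1):
        untried = [g for g in range(n_candidates) if counts[g] == 0]
        if untried:
            grasp = untried[0]  # try every candidate once before comparing
        else:
            # Empirical success rate plus a bonus that shrinks with experience.
            grasp = max(
                range(n_candidates),
                key=lambda g: successes[g] / counts[g]
                + math.sqrt(2.0 * math.log(t) / counts[g]),
            )
        successes[grasp] += try_grasp(grasp, rng)
        counts[grasp] += 1
    return counts


# Hypothetical success probabilities of three candidate grasps.
success_prob = [0.2, 0.5, 0.9]


def simulated_grasp(grasp, rng):
    """Stand-in for a real grasp attempt: 1.0 on success, 0.0 on failure."""
    return 1.0 if rng.random() < success_prob[grasp] else 0.0


counts = ucb1_grasp_selection(simulated_grasp, n_candidates=3, n_trials=1000)
```

Over time most attempts go to the most reliable grasp, while the weaker candidates are still tried occasionally in case their early failures were bad luck.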
B) Affordance Learning: An object's affordances are the actions that the robot can perform using the object. The affordances of basic objects, such as tools, are usually defined by their surface structures. By finding similar surface structures in different objects, the robot can transfer its knowledge of afforded actions between objects. In this manner, the robot can predict whether a novel object affords a specific action, as well as adapt the action to this object.
The robot can learn an initial action from a human demonstration. Adapting this action to new objects is achieved autonomously by the robot, using a trial-and-error learning approach.
C) Multi-Phase Manipulations: Manipulation tasks can be decomposed into phases, wherein the robot's actions have distinct effects depending on the current phase. In order to perform a task, the robot will first need to reach a phase that affords the desired manipulation, which may require transitioning through other phases first. The phases thus define a sequence of subtasks for the robot to complete in order to manipulate different parts of the environment.
Our research has focused on developing methods for learning the phase structure of tasks, as well as learning manipulation skills for transitioning between the phases. The robot first learns the conditions for transitioning between phases and then optimizes its motor skills accordingly. In order to learn versatile manipulation skills, we have also investigated representations for generalizing phase transitions between different objects. As part of our research, the Darias robot has learned the phases of a variety of tasks, including stacking objects, pouring, two-handed grasping, and turning a pepper mill.
Semi-autonomous robots are robots that physically interact with a human partner in order to achieve a task in a collaborative manner. Fundamental research in semi-autonomous robotics has potential applications in a variety of scenarios where humans need assistance: from the assembly of products in factories, to the aid of the elderly at home, to the control of actuated prosthetics, to the shared control in repetitive tele-operated processes.
Research for semi-autonomous robots needs to go beyond the usual sense-plan-act paradigm to account for the interaction with humans. Sensing and perception must involve the observation and interpretation of the human movement. Planning has to take into account the inferred human intent and, from a possibly limited repertoire of robot skills, search for an appropriate combination of actions. Acting for collaboration requires algorithms that model and predict human trajectories and generate safe and appropriate commands for both human and robot.
One of the challenges that we are currently addressing is related to the fact that humans usually execute a variety of unforeseen tasks; hence pre-programming a robot for all possible tasks is infeasible. We approach this problem by investigating new algorithms that allow a robot: (1) to learn and maintain a dynamic repertoire of skills and to assess its own ability to assist the human in accomplishing a given task, and (2) to make requests for human demonstration in order to acquire a new skill via imitation and reinforcement learning.
We also investigate ways to model collaborative interaction. Such a model is used to predict the intention and movement of the human while simultaneously generating collaborative control actions for the robot. Our current approach leverages probabilistic tools for the realization of interaction primitives. In essence, a prior model of the interaction is encoded as a joint distribution of the human and robot movements, from which a posterior distribution of the interaction can be inferred by observing only the human.
We validate and demonstrate our algorithms on real robotic systems, with the milestones of the 3rd Hand Project (visit the website here, download the brochure here) as one of our driving applications. You can watch our latest results here.
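The inference step can be sketched for the simplest possible case: a joint Gaussian over one human movement parameter and several robot movement parameters, conditioned on the observed human value. The restriction to a single observed human parameter, the example numbers, and the function name are assumptions made to keep the sketch short; they do not describe the actual interaction primitive implementation.

```python
def condition_on_human(mean, cov, human_value):
    """Condition a joint Gaussian on its first coordinate (the human part).

    mean and cov describe the joint distribution over [human, robot_1, ...].
    Returns the posterior mean and covariance of the robot parameters.
    """
    n = len(mean)
    # Regression coefficients of each robot parameter on the human parameter.
    gain = [cov[i][0] / cov[0][0] for i in range(1, n)]
    residual = human_value - mean[0]
    post_mean = [mean[i] + gain[i - 1] * residual for i in range(1, n)]
    post_cov = [[cov[i][j] - gain[i - 1] * cov[0][j] for j in range(1, n)]
                for i in range(1, n)]
    return post_mean, post_cov


# Hypothetical prior learned from demonstrations: one human and two robot
# parameters.
prior_mean = [0.0, 1.0, 2.0]
prior_cov = [[2.0, 1.0, 0.5],
             [1.0, 2.0, 0.3],
             [0.5, 0.3, 1.0]]

robot_mean, robot_cov = condition_on_human(prior_mean, prior_cov, human_value=2.0)
```

The posterior mean gives the robot parameters that best match the observed human movement, and the reduced covariance reflects how much the observation has narrowed down the interaction.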
Many current robots lack fine manipulation skills. A major reason for this is that most industrial robotic arm-hand systems do not receive sufficient feedback about the contact with the object. Neuroscience has shown that such feedback from tactile sensors is a critical component in the human ability to perform such tasks. Therefore, we aim to equip our robots with tactile sensors and use the sensor feedback for robot control. Research issues include how to interpret the signals from the sensors (that are often noisy and high-dimensional) and how to include such signals in a robot control loop. Among others, we aim to address such issues by using supervised, unsupervised, and reinforcement learning techniques.
A robot needs to be aware of the properties of objects to efficiently perform tasks with them. Currently, most robots are provided with this object information by a programmer. However, autonomous robots working in service industries and domestic settings will need to perform tasks with novel objects. Hence, relying on predefined object knowledge will not be an option.
Instead, robots will have to learn about objects by exploring their properties through physical interactions, such as pushing, stroking, and lifting. As random exploration is an inefficient approach, we develop methods for efficiently gathering information.
The fundamental knowledge learned about objects and primitive actions will later on form the basis for learning complex behaviors and predicting the properties of novel objects. By discovering accurate representations of objects, the robot will be able to plan and execute manipulations more precisely.
A) Learning Tactile Sensing using Vision: The textures of object surfaces can be observed both by visual inspection and by sliding a dynamic tactile sensor across the surface. The robot can combine these types of sensor readings to determine which components of the data contain information pertaining to the texture.
In particular, the robot can find the components that are maximally correlated between the two sensor modalities. Given the relevant data components, the robot can create a compact representation of object textures, which allows it to distinguish between surfaces more accurately.
B) Active Learning of Object Properties: Learning about objects is not a passive perceptual process: its embodiment allows a robot to discover object properties by actively changing its point of view and by interacting with objects. At the same time, the robot observes the effect of its actions and learns how it can bring about desired effects.
To learn efficiently, the robot should select the exploratory action that yields the highest expected information gain out of all possible actions. By efficiently exploring its environment, a robot develops object knowledge grounded in its sensorimotor experience.
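For a discrete set of hypotheses about an object and a discrete set of possible outcomes, the expected information gain of an action is the prior entropy minus the expected posterior entropy. The sketch below computes it and picks the most informative action; the two hypotheses, the actions "push" and "look", and their outcome probabilities are invented for illustration.

```python
import math


def entropy(probabilities):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probabilities if p > 0.0)


def expected_information_gain(prior, likelihood):
    """likelihood[h][o] is P(outcome o | hypothesis h) for a single action."""
    n_hyp = len(prior)
    gain = entropy(prior)
    for o in range(len(likelihood[0])):
        p_outcome = sum(prior[h] * likelihood[h][o] for h in range(n_hyp))
        if p_outcome == 0.0:
            continue
        # Bayes' rule gives the belief the robot would hold after outcome o.
        posterior = [prior[h] * likelihood[h][o] / p_outcome
                     for h in range(n_hyp)]
        gain -= p_outcome * entropy(posterior)
    return gain


def select_action(prior, action_models):
    """Pick the action whose outcome is expected to be most informative."""
    return max(action_models,
               key=lambda a: expected_information_gain(prior, action_models[a]))


# Hypothetical example: is the object heavy or light?
prior = [0.5, 0.5]
action_models = {
    "push": [[0.9, 0.1], [0.1, 0.9]],  # outcome depends strongly on weight
    "look": [[0.5, 0.5], [0.5, 0.5]],  # outcome says nothing about weight
}
best_action = select_action(prior, action_models)
```

Here pushing is preferred because its outcome separates the two hypotheses, whereas looking leaves the belief unchanged.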
The developed object knowledge can be used to manipulate previously unknown objects in unstructured environments. For example, the robot could teach itself to perform tasks in domestic environments.
Optimizing parameters in robotics is a necessary but often hard task. The challenges arise from the presence of multiple local minima, noise in the measurements, the lack of analytical gradients and the limited number of experiments possible on a real robot. Hence, typical approaches such as grid search, random search, gradient descent and genetic algorithms often perform poorly.
Bayesian optimization is designed to naturally deal with these challenges of optimization in robotics. Thanks to the use of a response surface model, Bayesian optimization drastically reduces the number of experiments needed. Hence, it can be successfully employed on real systems where other approaches would fail.
Dealing with the whole-body movement of humanoid robots is a highly challenging domain involving various problems, ranging from stabilizing behaviors (e.g., balancing, supporting oneself against objects) through static motions (e.g., getting up from a chair, push-ups) up to locomotion (e.g., walking, running). As of now, no robot exhibits the same dexterity, efficiency, speed and robustness as a human, due to the difficulty of controlling and planning with a high number of degrees of freedom as well as the complexity of modeling and estimating physical contacts.
We aim to develop new approaches to improve the current state of the art for whole-body movement of humanoid robots in general and for robot locomotion in particular.
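The loop behind Bayesian optimization can be sketched in a few lines. The example assumes a Gaussian process with a squared-exponential kernel as the response surface model and an upper-confidence-bound rule for choosing the next experiment; both are common choices but are assumptions here, as are the one-dimensional toy objective and all parameter values.

```python
import math


def rbf(a, b, length=0.2):
    """Squared-exponential kernel with unit prior variance."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a positive definite matrix."""
    n = len(matrix)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def forward_solve(low, rhs):
    """Solve low * x = rhs."""
    out = []
    for i, row in enumerate(low):
        out.append((rhs[i] - sum(row[k] * out[k] for k in range(i))) / row[i])
    return out


def backward_solve(low, rhs):
    """Solve low^T * x = rhs."""
    n = len(low)
    out = [0.0] * n
    for i in reversed(range(n)):
        s = rhs[i] - sum(low[k][i] * out[k] for k in range(i + 1, n))
        out[i] = s / low[i][i]
    return out


def fit_gp(xs, ys, noise=1e-6):
    """Factor the kernel matrix and precompute the weights for the mean."""
    n = len(xs)
    gram = [[rbf(xs[i], xs[j]) + (noise if i == j else 0.0) for j in range(n)]
            for i in range(n)]
    low = cholesky(gram)
    return low, backward_solve(low, forward_solve(low, ys))


def gp_predict(xs, low, alpha, x_query):
    """Posterior mean and standard deviation of the response surface."""
    k_star = [rbf(x, x_query) for x in xs]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = forward_solve(low, k_star)
    var = max(rbf(x_query, x_query) - sum(e * e for e in v), 0.0)
    return mean, math.sqrt(var)


def bayes_opt(objective, candidates, initial, n_evals=15, kappa=2.0):
    """Maximize objective over candidates with a GP and a UCB acquisition."""
    xs = list(initial)
    ys = [objective(x) for x in xs]
    while len(xs) < n_evals:
        low, alpha = fit_gp(xs, ys)

        def ucb(c):
            mean, std = gp_predict(xs, low, alpha, c)
            return mean + kappa * std  # optimistic estimate of the objective

        x_next = max((c for c in candidates if c not in xs), key=ucb)
        xs.append(x_next)
        ys.append(objective(x_next))
    best = max(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best], xs


# Hypothetical experiment: one tunable parameter with its optimum at 0.3.
grid = [i / 100 for i in range(101)]
best_x, best_y, tried = bayes_opt(lambda x: -(x - 0.3) ** 2, grid, [0.0, 1.0])
```

Each "experiment" is a single call to the objective, so the budget of fifteen evaluations mirrors the small number of trials available on a real robot. The model decides where the next trial is most promising instead of sweeping the whole grid.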
The human central nervous system has an incredible ability for learning new motor skills. For example, when learning to control a novel sport device, new movement and balancing strategies may be discovered within seconds. Such rapid learning processes are based on learned abstract concepts, current beliefs of the environment and future expectations of desired postures, reached targets or achieved rewards (among others). Little is known about how these underlying processes develop and how they interplay during motor learning.
In our research, we develop probabilistic models and inference techniques to gain a better understanding of the amazing human learning abilities. Our models reproduce characteristic features like motor variability, continuous exploration, stochastic decisions and the ability to learn task abstractions. Applications range from medical diagnoses and rehabilitation research to smart robot learning and control frameworks.
Spiking neural networks are powerful computational models of brain functions and are a key technology in, e.g., the Human Brain Project, with estimated total costs of 1.19 billion Euros. A second driving force for intensive research efforts is their massively parallel computing ability. Companies like IBM, SPS or HP develop 'brain-like' memory and computing chips with the long-term goal of developing neuromorphic computers.
We investigate how spiking neural network models for neuromorphic chips can be used for robot motor control and learning. A strong focus is put on the real-time processing of huge sensory streams from tactile, visual and other sensors and on learning from rewards. The implemented computational principles are based on powerful machine learning algorithms like probabilistic inference, contrastive divergence or stochastic policy search that are used in a large variety of other applications like visual scene understanding, speech processing and cognitive reasoning.
In robotics, learning algorithms often return a single policy that is optimal with respect to a given cost or reward function. Learning a set of policies instead of a single one can be of particular interest in two cases. First, the cost function has a significant impact on the shape of the returned policy. However, the exact definition of the cost can sometimes be arbitrary if it is a trade-off between several objectives (e.g. a trade-off between state and action costs). In this case, it is more informative to explore the set of all policies optimal for a given trade-off. Second, even when the cost is known, finding all the policies that perform above a given performance threshold can result in a faster adaptation to a changing environment (e.g. adapting to a different opponent in robot table tennis).
A) Multi-Objective Reinforcement Learning
Many real-world applications are characterized by multiple conflicting objectives. In such problems optimality is replaced by Pareto optimality and the goal is to find the Pareto frontier, a set of solutions representing different compromises among the objectives. Despite recent advances in multi-objective optimization, achieving an accurate representation of the Pareto frontier is still an important challenge.
We formulate the problem of approximating Pareto frontiers as MOMDPs and solve it with a manifold-based approach. We combine episodic exploration strategies and importance sampling to efficiently learn a manifold in the policy parameter space such that its image in the objective space accurately approximates the Pareto frontier.
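The notion of Pareto optimality itself is easy to state in code. The sketch below filters a finite set of candidate solutions, each scored on several objectives to be maximized, down to the non-dominated ones. It only illustrates the definition of the frontier and is not the manifold-based approach described above; the example scores are invented.

```python
def dominates(a, b):
    """True if a is at least as good as b everywhere and better somewhere."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def pareto_front(points):
    """Keep the points that no other point dominates (objectives maximized)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical policies scored on two conflicting objectives.
scores = [(1, 5), (2, 4), (3, 3), (2, 2), (1, 1), (4, 1)]
front = pareto_front(scores)
```

Every point on the resulting front represents a different compromise: improving one objective from there necessarily worsens the other.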
B) Intrinsically Motivated Reinforcement Learning
Intrinsic motivation defines a set of objectives the robot can pursue even in the absence of a task-related reward. We are especially interested in combining intrinsic motivation and task-specific objectives. For instance, a robot can decide to maximize the entropy of its sensory-motor stream, resulting in an exploratory behavior. When combined with a task reward, this can ensure that exploration of diverse behaviors is restricted to the area of interest of the task.
Learning a diverse set of well-performing behaviors can have several advantages. In an adversarial setting such as robot table tennis, diversity of behaviors renders the robot harder to predict and hence harder for the opponent to counter. In a collaborative task with a human, an emphasis on diversity gives the human more opportunity to guide the robot and to keep it from getting stuck in local optima.