Operational space control (OSC) is one of the most elegant approaches to task control for complex, redundant robots. Its potential for dynamically consistent control, compliant control, force control, and hierarchical control has not been exhausted to date. Applications of OSC range from end-effector control of manipulators up to balancing and gait execution for humanoid robots [1].

If the robot model is accurately known, operational space control is well understood and yields a variety of solution alternatives. These include resolved-motion rate control, resolved acceleration control, and direct force-based task-space control. However, as many new robotic systems are supposed to operate safely in human environments, compliant, low-gain operational space control is desired. As a result, the practical use of operational space control becomes increasingly difficult in the presence of unmodeled nonlinearities, leading to reduced accuracy or even unpredictable and unstable null-space behavior in the robot system.
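The first of these alternatives, resolved-motion rate control, maps a desired task-space velocity to joint velocities through the Jacobian pseudoinverse. A minimal sketch for a planar three-link arm (link lengths, gains, and the closed-loop error feedback are illustrative assumptions, not the controller used in this project):

```python
import numpy as np

def fk(q, l=(1.0, 1.0, 1.0)):
    """Forward kinematics of a planar 3-link arm: joint angles -> end-effector (x, y)."""
    s = np.cumsum(q)
    return np.array([sum(li * np.cos(si) for li, si in zip(l, s)),
                     sum(li * np.sin(si) for li, si in zip(l, s))])

def jacobian(q, l=(1.0, 1.0, 1.0)):
    """Analytic task Jacobian (2x3); column i collects links i..end."""
    s = np.cumsum(q)
    J = np.zeros((2, 3))
    for i in range(3):
        J[0, i] = -sum(l[k] * np.sin(s[k]) for k in range(i, 3))
        J[1, i] = sum(l[k] * np.cos(s[k]) for k in range(i, 3))
    return J

def resolved_rate_step(q, x_des, gain=1.0):
    """One resolved-motion rate control step: qdot = J^+ (gain * (x_des - x))."""
    err = x_des - fk(q)
    return np.linalg.pinv(jacobian(q)) @ (gain * err)

# Integrate the joint velocities toward a reachable target.
q = np.array([0.3, 0.3, 0.3])
x_des = np.array([1.5, 1.0])
dt = 0.05
for _ in range(200):
    q = q + dt * resolved_rate_step(q, x_des)
print(np.round(fk(q), 3))  # converges toward x_des
```

Because the arm has three joints but only a two-dimensional task, the pseudoinverse silently picks one of infinitely many joint-space solutions; that redundancy resolution is exactly what becomes problematic when the model is inaccurate.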

Learning control methods are a promising potential solution to this problem. However, learning methods do not easily provide the highly structured knowledge required by traditional operational space control laws, e.g., Jacobians, inertia matrices, and Coriolis/centripetal and gravity forces, since these terms are not always directly observable. This makes it difficult to cast the problem as the kind of supervised learning traditionally used in learning control. In this project, we have designed novel approaches to learning operational space control that avoid extracting such structured knowledge and instead aim at learning the operational space control law directly, i.e., we pose OSC as a direct inverse model learning problem. A first important insight for this project is that a physically correct solution to the inverse problem with redundant degrees of freedom does exist when learning of the inverse map is performed in a suitable piecewise linear way [2, 3]. The second crucial component of our work is the insight that many operational space controllers can be understood in terms of a constrained optimal control problem [1]. The cost function associated with this optimal control problem allows us to formulate a learning algorithm that automatically synthesizes a globally consistent resolution of redundancy while learning the operational space controller. From the machine learning point of view, this learning problem corresponds to a reinforcement learning problem that maximizes an immediate reward. We employ an expectation-maximization policy search algorithm to solve this problem. Evaluations on a simulated three degree-of-freedom robot arm show that the approach always converges to the globally optimal solution if provided with sufficient data [3].

The application to a physically realistic simulator of the anthropomorphic SARCOS Master arm demonstrates feasibility for complex, high degree-of-freedom robots. We also show that the proposed method works in the setting of learning resolved-motion rate control on a real, physical Mitsubishi PA-10 medical robotics arm [4] and a high-speed Barrett WAM robot arm. In future work, we will extend the kernel-based approaches from learning models for control to operational space control.

**Contact persons:** Jan Peters, Duy Nguyen-Tuong

- Peters, J.; Schaal, S. (2007). Reinforcement learning by reward-weighted regression for operational space control, *Proceedings of the International Conference on Machine Learning (ICML 2007)*.
- Peters, J.; Schaal, S. (2008). Learning to control in operational space, *International Journal of Robotics Research*, **27**, pp. 197-212.
- Peters, J.; Nguyen-Tuong, D. (2008). Real-time learning of resolved velocity control on a Mitsubishi PA-10, *International Conference on Robotics and Automation (ICRA)*.