The Natural Actor-Critic is a policy-gradient-based actor-critic architecture for reinforcement learning that employs both natural policy gradients and compatible function approximation. It has been described as "blazingly fast" (David Wingate from MIT at NIPS 2007), as "the current method of choice" (Douglas Aberdeen from NICTA at MLSS 2006), and as "a great algorithm for improving upon imitations" (Aude Billard from EPFL at NIPS 2007). For more information, see:

Peters, J.; Schaal, S. (2008). Natural Actor-Critic, Neurocomputing, 71(7-9), pp. 1180-1190.

Peters, J.; Schaal, S. (2008). Reinforcement learning of motor skills with policy gradients, Neural Networks, 21, pp. 682-697.

Peters, J.; Schaal, S. (2006). Policy gradient methods for robotics, Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2006).

Peters, J.; Vijayakumar, S.; Schaal, S. (2005). Natural Actor-Critic, in: Gama, J.; Camacho, R.; Brazdil, P.; Jorge, A.; Torgo, L. (eds.), Proceedings of the 16th European Conference on Machine Learning (ECML 2005), 3720, pp. 280-291, Springer.

Peters, J.; Vijayakumar, S.; Schaal, S. (2003). Reinforcement learning for humanoid robotics, Humanoids2003, Third IEEE-RAS International Conference on Humanoid Robots.

These publications should be sufficient for understanding this architecture.
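
As a rough orientation to the two ingredients named at the top of this page, the following is a minimal sketch, not the algorithm from the papers. It assumes a one-step problem (no state, no value function), a Gaussian policy with mean theta and fixed standard deviation sigma, and a hypothetical toy reward; the names natural_gradient_estimate and toy_reward are made up for illustration. It regresses sampled rewards on the policy's score function (the compatible features) plus a constant baseline; under these assumptions, the regression weight on the score is an estimate of the natural policy gradient.

```python
import numpy as np


def natural_gradient_estimate(theta, sigma, reward_fn, n_samples=20000, seed=0):
    """Estimate the natural gradient w.r.t. the mean of a Gaussian policy.

    Assumption: one-step setting with policy a ~ N(theta, sigma^2).
    """
    rng = np.random.default_rng(seed)
    actions = rng.normal(theta, sigma, size=n_samples)
    rewards = reward_fn(actions)

    # Score function d/dtheta log pi(a) for a Gaussian with fixed sigma.
    # These are the "compatible" features.
    score = (actions - theta) / sigma**2

    # Least-squares fit: reward ~ score * w + baseline.
    features = np.column_stack([score, np.ones(n_samples)])
    coeffs, *_ = np.linalg.lstsq(features, rewards, rcond=None)

    # The weight on the score estimates the natural gradient.
    return coeffs[0]


def toy_reward(a):
    # Hypothetical reward with its optimum at a = 3.
    return -(a - 3.0) ** 2


w = natural_gradient_estimate(theta=0.0, sigma=1.0, reward_fn=toy_reward)
print("estimated natural gradient:", w)
```

For this toy reward the natural gradient can be worked out by hand as sigma^2 * (-2 * (theta - 3)), so with theta = 0 and sigma = 1 the estimate should land close to 6. The full architecture in the publications above handles multi-step problems with a critic; this sketch only illustrates why fitting compatible features yields the natural gradient direction.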

Follow-up Work by Other Researchers

Other researchers have published follow-up work on the Natural Actor-Critic, and I would like to point the interested reader to these papers:


If you think your work should be added here, drop me a line!


