Notation and Symbols used for Robot Learning

General Notation

  • Vectors will always be written in bold lower-case font, e.g., \mathbf{a}.
  • A vector always denotes a column vector, i.e., \mathbf{a} = \left[\begin{array}{c} a_1 \\ \vdots \\ a_n \end{array}\right]. The corresponding row vector is denoted as \mathbf{a}^T.
  • Matrices will always be written in bold upper-case font, e.g., \mathbf{A}.
  • Gradients are always defined as row vectors, i.e., \frac{d f}{d \mathbf{x}} = \left[ \frac{d f}{d x_1}, \dots, \frac{d f}{d x_n} \right].
  • The gradient of a vector-valued function \mathbf{f}(\mathbf{x}) is a matrix defined as \frac{d \mathbf{f}}{d \mathbf{x}} = \left[\begin{array}{ccc} \frac{d f_1}{d x_1} & \dots & \frac{d f_1}{d x_n} \\ \vdots & \ddots & \vdots \\ \frac{d f_m}{d x_1} & \dots & \frac{d f_m}{d x_n}\end{array} \right].
  • The expectation of a function f(\mathbf{x}) with respect to a distribution p(\mathbf{x}) will be written as \mathbb{E}_{p(\mathbf{x})}[f(\mathbf{x})] = \int p(\mathbf{x}) f(\mathbf{x}) \, d\mathbf{x}.
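As a sanity check on these conventions, the shape of the gradient matrix can be verified numerically. The sketch below (assuming NumPy; the example function f and the finite-difference helper are purely illustrative) confirms that for \mathbf{f}: \mathbb{R}^n \to \mathbb{R}^m the gradient is an m \times n matrix whose rows index outputs and whose columns index inputs:

```python
import numpy as np

# Finite-difference check of the gradient convention above:
# for f: R^n -> R^m, df/dx is the m x n matrix with entry (i, j) = df_i/dx_j.
def jacobian_fd(f, x, eps=1e-6):
    m = f(x).shape[0]
    n = x.shape[0]
    J = np.zeros((m, n))
    for j in range(n):
        dx = np.zeros(n)
        dx[j] = eps
        # central difference along input dimension j fills column j
        J[:, j] = (f(x + dx) - f(x - dx)) / (2 * eps)
    return J

f = lambda x: np.array([x[0] * x[1], np.sin(x[0])])  # illustrative f: R^2 -> R^2
x = np.array([1.0, 2.0])
J = jacobian_fd(f, x)
print(J.shape)  # (2, 2): rows index outputs f_i, columns index inputs x_j
```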


Robotics

  • \mathbf{q} ... joint positions, \dot{\mathbf{q}} ... joint velocities, \ddot{\mathbf{q}} ... joint accelerations
  • \mathbf{u} ... motor command, controls
  • \mathbf{\tau} ... 1. torques (a motor command), 2. a trajectory, or 3. the temporal scaling parameter of movement primitives (\tau)
  • \mathbf{a} ... action (\mathbf{a} and \mathbf{u} are often used interchangeably)
  • \mathbf{s} ... state of the agent (used in most RL literature)
  • \mathbf{x} ... 1. state of the system (used in the control literature; \mathbf{x} and \mathbf{s} are often used interchangeably), 2. task-space coordinates (for example, end-effector coordinates), or 3. input sample for supervised learning methods
  • \mathbf{y} ... 1. state of a dynamical movement primitive, 2. output sample for supervised learning methods
  • \mathbf{f} ... 1. \mathbf{f}(\mathbf{q}) ... forward kinematics, 2. \mathbf{f}(\mathbf{x}, \mathbf{u}) (or similar notation for state and control) ... forward dynamics
  • \mathbf{J} ... Jacobian (of the forward kinematics)
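To make these symbols concrete, here is a minimal sketch of the forward kinematics \mathbf{f}(\mathbf{q}) and the Jacobian \mathbf{J} for a hypothetical planar 2-link arm (the link lengths and joint values are made up for illustration; NumPy assumed):

```python
import numpy as np

# Hypothetical planar 2-link arm: f(q) maps joint positions q to
# end-effector task-space coordinates x; J(q) = df/dq.
L1, L2 = 1.0, 0.5  # assumed link lengths

def fkin(q):
    """Forward kinematics x = f(q)."""
    return np.array([
        L1 * np.cos(q[0]) + L2 * np.cos(q[0] + q[1]),
        L1 * np.sin(q[0]) + L2 * np.sin(q[0] + q[1]),
    ])

def jacobian(q):
    """Analytic Jacobian J(q) = df/dq (2 x 2, rows = task dimensions)."""
    return np.array([
        [-L1 * np.sin(q[0]) - L2 * np.sin(q[0] + q[1]), -L2 * np.sin(q[0] + q[1])],
        [ L1 * np.cos(q[0]) + L2 * np.cos(q[0] + q[1]),  L2 * np.cos(q[0] + q[1])],
    ])

q = np.array([0.3, 0.8])
xdot = jacobian(q) @ np.array([0.1, -0.2])  # task-space velocity from joint velocities
```

The Jacobian relates joint-space to task-space velocities via \dot{\mathbf{x}} = \mathbf{J}(\mathbf{q}) \dot{\mathbf{q}}, as in the last line.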

Machine Learning

  • \mathbf{\theta} ... 1. parameter vector, 2. (occasionally) joint angles
  • \mathbf{\phi} ... feature vector of a single sample
  • \mathbf{\Phi} ... feature matrix containing the feature vectors of all samples (each row is a transposed feature vector)
  • \lambda... regularization constant = precision of the prior over the parameters
  • \sigma^2... measurement noise
  • \mathbf{X} ... matrix of all input vectors (in each row a sample)
  • \mathbf{Y} ... matrix of all output vectors (in each row a sample)
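In this notation, regularized least squares reads \mathbf{\theta} = (\mathbf{\Phi}^T\mathbf{\Phi} + \lambda \mathbf{I})^{-1}\mathbf{\Phi}^T \mathbf{Y}. A minimal sketch (the polynomial features, data, and noise level are made up for illustration; NumPy assumed):

```python
import numpy as np

# Ridge regression in the notation above: Phi is the N x d feature matrix
# (one transposed feature vector per row), Y the outputs, lam the
# regularization constant.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(50, 1))             # input samples, one per row
Phi = np.hstack([np.ones_like(X), X, X**2])      # assumed polynomial features
theta_true = np.array([0.5, -1.0, 2.0])          # illustrative ground truth
Y = Phi @ theta_true + 0.1 * rng.standard_normal(50)  # noisy outputs

lam = 1e-3
# Solve (Phi^T Phi + lam I) theta = Phi^T Y instead of forming the inverse
theta = np.linalg.solve(Phi.T @ Phi + lam * np.eye(3), Phi.T @ Y)
```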

Optimal Decision Making

  • \pi(\mathbf{s}) ... deterministic policy
  • \pi(\mathbf{a}|\mathbf{s}) ... stochastic policy
  • \mathbf{\mu}^{\pi}(\mathbf{s}) ... state visitation distribution of policy \pi
  • \mathbf{\mu}_0(\mathbf{s}) ... initial state distribution
  • r(\mathbf{s}, \mathbf{a}) ... reward function
  • J_{\pi} ... expected long term reward of policy \pi
  • V^{\pi} ... value function of policy \pi
  • Q^{\pi} ... state-action value function of policy \pi
  • V^{*} ... optimal value function
  • Q^{*} ... optimal state-action value function
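These quantities can be illustrated with value iteration on a toy MDP (the transition probabilities, rewards, and discount factor below are invented for illustration; NumPy assumed):

```python
import numpy as np

# Toy 2-state, 2-action MDP illustrating V* and Q*.
P = np.array([  # P[s, a, s'] transition probabilities (rows sum to 1)
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.0, 1.0], [0.7, 0.3]],
])
r = np.array([[1.0, 0.0],   # r(s, a) reward function
              [0.0, 2.0]])
gamma = 0.9

V = np.zeros(2)
for _ in range(500):
    Q = r + gamma * P @ V   # Q[s, a] = r(s, a) + gamma * sum_s' P[s, a, s'] V(s')
    V = Q.max(axis=1)       # Bellman optimality backup gives V*
pi = Q.argmax(axis=1)       # greedy deterministic policy pi(s)
```

At convergence, V satisfies the Bellman optimality equation V^{*}(\mathbf{s}) = \max_{\mathbf{a}} Q^{*}(\mathbf{s}, \mathbf{a}).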

Policy Gradients

  • \pi(\mathbf{a}|\mathbf{s}; \mathbf{\theta}) ... lower-level policy for controlling the robot (stochastic)
  • \mathbf{a} = \pi(\mathbf{s}; \mathbf{\theta}) ... lower-level policy for controlling the robot (deterministic)
  • \mathbf{\theta} ... parameter vector of the lower-level policy
  • \pi(\mathbf{\theta}|\mathbf{\omega}) ... upper-level policy (for choosing the parameters of the lower-level policy)
  • \mathbf{\omega} ... parameter vector of the upper-level policy
  • J_{\mathbf{\theta}} or J_{\mathbf{\omega}} ... expected return as a function of the lower-level (left) or upper-level (right) policy parameters
  • \nabla_{\mathbf{\theta}} or \nabla_{\mathbf{\omega}} ... gradient with respect to the lower-level (left) or upper-level (right) policy parameters
  • R^{[i]} ... return of the i-th executed episode
  • Q_t^{[i]} ... reward-to-come from time step t in the i-th executed episode
  • \mathbf{G}(\mathbf{\theta}) ... Fisher information matrix (FIM)
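A minimal sketch tying these symbols together: a likelihood-ratio (REINFORCE-style) gradient estimate for a one-dimensional Gaussian upper-level policy, with \mathbf{G} its Fisher information. The policy, return function, and all constants below are illustrative, not part of the notation:

```python
import numpy as np

# Toy episodic setting: sample parameters theta^[i] from a Gaussian
# upper-level policy pi(theta) = N(mu, sigma^2), observe returns R^[i],
# and form the likelihood-ratio gradient estimate w.r.t. mu.
rng = np.random.default_rng(2)
mu, sigma = 0.0, 0.5
theta = rng.normal(mu, sigma, size=1000)   # sampled episode parameters
R = -(theta - 1.0) ** 2                    # assumed return function (made up)
grad_log = (theta - mu) / sigma**2         # d/dmu of log N(theta; mu, sigma^2)
baseline = R.mean()                        # variance-reducing baseline
grad = np.mean(grad_log * (R - baseline))  # policy-gradient estimate

G = 1.0 / sigma**2        # Fisher information of N(mu, sigma^2) w.r.t. mu
natural_grad = grad / G   # natural-gradient step direction G^-1 grad
```

For this return function the exact gradient at \mu = 0 is 2, so the estimate should come out close to that value.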

