Probabilistic Differential Dynamic Programming

Authors: Yunpeng Pan, Evangelos A. Theodorou

Published: 2014 (Conference Paper)

Source: Neural Information Processing Systems (NeurIPS)

Algorithm: PDDP

DOI: 10.5555/2969033.2969040

Summary

Extends DDP to probabilistic dynamics models learned with Gaussian processes, propagating uncertainty through both the backward and forward passes. The resulting locally optimal trajectories are robust to model error and directly compatible with data-driven, learned dynamics.
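The "learned probabilistic dynamics" part rests on standard GP regression: rather than a point prediction of the next state, the model returns a posterior mean and variance, and it is this variance that PDDP propagates through its passes. A minimal sketch of that predictive step (my own helper names, squared-exponential kernel, fixed hyperparameters; not the paper's implementation) could look like:

```python
import numpy as np

def rbf_kernel(A, B, ell=1.0, sf=1.0):
    # Squared-exponential kernel between row-vector inputs A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sf**2 * np.exp(-0.5 * d2 / ell**2)

def gp_predict(X, y, Xs, noise=1e-2):
    # GP posterior mean and per-point variance at test inputs Xs,
    # given training pairs (X, y). Cholesky for numerical stability.
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    Ks = rbf_kernel(Xs, X)
    mean = Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    var = np.diag(rbf_kernel(Xs, Xs)) - (v**2).sum(axis=0)
    return mean, var
```

Near the training data the predictive variance is small; far from it, the variance grows back toward the prior — exactly the uncertainty signal a probabilistic trajectory optimizer can exploit.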

Abstract

We present a data-driven, probabilistic trajectory optimization framework for systems with unknown dynamics, called Probabilistic Differential Dynamic Programming (PDDP). PDDP takes into account uncertainty explicitly for dynamics models using Gaussian processes (GPs). Based on the second-order local approximation of the value function, PDDP performs Dynamic Programming around a nominal trajectory in Gaussian belief spaces. Different from typical gradient-based policy search methods, PDDP does not require a policy parameterization and learns a locally optimal, time-varying control policy. We demonstrate the effectiveness and efficiency of the proposed algorithm using two nontrivial tasks. Compared with the classical DDP and a state-of-the-art GP-based policy search method, PDDP offers a superior combination of data-efficiency, learning speed, and applicability.
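The "second-order local approximation of the value function" in the abstract is the classical DDP/LQR backbone that PDDP extends into Gaussian belief space. As an illustrative sketch only (deterministic linear dynamics and quadratic cost — not the paper's GP-based belief-space formulation), the backward pass keeps the value function quadratic and yields a time-varying linear feedback policy:

```python
import numpy as np

def lqr_backward(A, B, Q, R, Qf, T):
    # Riccati-style backward pass over horizon T.
    # The value function V_t(x) = x' P_t x stays quadratic; minimizing the
    # quadratic Q-function over u gives a linear feedback law u = -K_t x.
    P = Qf
    gains = []
    for _ in range(T):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    return gains[::-1], P  # gains ordered from t = 0 to t = T-1
```

PDDP performs an analogous recursion, but over the mean and covariance of a Gaussian belief rather than a deterministic state, with the GP model supplying the (uncertain) dynamics linearization — which is how it avoids any explicit policy parameterization while still returning a locally optimal, time-varying controller.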

Tags

  • Trajectory optimization

  • Differential dynamic programming

  • DDP

  • Probabilistic differential dynamic programming

  • PDDP

  • Probabilistic methods

  • Learning

  • Uncertainty