Sampled Differential Dynamic Programming

Authors: Joose Rajamäki, Kourosh Naderi, Ville Kyrki, Perttu Hämäläinen

Published: 2016 (Conference Paper)

Source: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

Algorithm: SaDDP

DOI: 10.1109/IROS.2016.7759229

Summary

Combines DDP and path integral control by estimating the DDP Hessian via zero-order sampling rather than analytical differentiation, yielding a trajectory optimizer that blends the structure and efficiency of DDP with the robustness and simplicity of sampling-based methods. Think of it as Hessian-free optimization (using a zero-order oracle to estimate the Hessian; cf. "Deep Learning via Hessian-free Optimization", James Martens, 2010) specialized to trajectory optimization problems.
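To make the core idea concrete, here is a minimal sketch of zero-order Hessian estimation: fitting a local quadratic model of a black-box cost from sampled evaluations, in place of analytic second derivatives. This is an illustration of the general technique, not the SaDDP algorithm itself; the function name and parameters are hypothetical.

```python
import numpy as np

def sampled_quadratic_model(f, x0, n_samples=200, sigma=0.1, rng=None):
    """Fit a local quadratic model f(x0 + d) ~ c + g.d + 0.5 d^T H d
    from zero-order samples of f, as a stand-in for analytic derivatives.
    Illustrative sketch only -- not the SaDDP update itself."""
    rng = np.random.default_rng(rng)
    n = x0.size
    D = rng.normal(scale=sigma, size=(n_samples, n))   # random perturbations
    y = np.array([f(x0 + d) for d in D])               # sampled costs
    # Design matrix: constant, linear terms, and upper-triangular quadratic terms
    iu = np.triu_indices(n)
    quad = np.array([np.outer(d, d)[iu] for d in D])
    A = np.hstack([np.ones((n_samples, 1)), D, quad])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    g = coef[1:1 + n]                                  # gradient estimate
    H = np.zeros((n, n))
    H[iu] = coef[1 + n:]
    H = H + H.T   # diagonal doubles (fitted coef is H_ii / 2), off-diagonal symmetrizes
    return g, H
```

For a truly quadratic cost the least-squares fit recovers the exact gradient and Hessian; for a general rollout cost it gives a local estimate whose quality depends on the sampling radius `sigma` and the number of rollouts.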

Abstract

We present SaDDP, a sampled version of the widely used differential dynamic programming (DDP) control algorithm. We contribute by establishing a novel connection between two major branches of robotics control research: gradient-based methods such as DDP, and Monte Carlo methods such as path integral control (PI) that utilize random simulated trajectory rollouts. One of our key observations is that the Taylor expansion central to DDP can be reformulated in terms of second-order statistics computed from the sampled trajectories. SaDDP makes few assumptions about the controlled system and works with black-box dynamics simulations with non-smooth contacts. Our simulation results show that the method outperforms PI and CMA-ES in both a simple linear-quadratic problem and a multilink arm reaching task with obstacles.

Tags

  • Trajectory optimization

  • Differential dynamic programming

  • Sampled differential dynamic programming

  • Sampling-based control

  • Sampling-based planning

  • Hessian-free optimization

  • Path integral control

  • Evolution strategies

  • CMA-ES

  • Taylor expansion

  • Gradient-based