Sampled Differential Dynamic Programming

Authors: Joose Rajamäki, Kourosh Naderi, Ville Kyrki, Perttu Hämäläinen

Published: 2016 (Conference Paper)

Source: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

Algorithm: SaDDP

DOI: 10.1109/IROS.2016.7759229

Summary

Combines DDP and path integral control by estimating the DDP Hessian via zero-order sampling rather than analytical differentiation, yielding a trajectory optimizer that blends the structure and efficiency of DDP with the robustness and simplicity of sampling-based methods. Think of it as Hessian-free optimization (using a zero-order oracle to estimate the Hessian; cf. "Deep Learning via Hessian-free Optimization", James Martens, 2010) specialized to trajectory optimization problems.
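To make the core idea concrete, here is a minimal sketch of zero-order Hessian estimation: fitting a local quadratic model of a black-box cost from sampled evaluations, in place of analytic second derivatives. This is an illustration of the general technique, not the SaDDP algorithm itself; the function name and parameters are hypothetical.

```python
import numpy as np

def sampled_quadratic_model(f, x0, n_samples=200, sigma=0.1, rng=None):
    """Fit a local quadratic model f(x0 + d) ~ c + g.d + 0.5 d^T H d
    from zero-order samples of f, as a stand-in for analytic derivatives.
    Illustrative sketch only -- not the SaDDP update itself."""
    rng = np.random.default_rng(rng)
    n = x0.size
    D = rng.normal(scale=sigma, size=(n_samples, n))   # random perturbations
    y = np.array([f(x0 + d) for d in D])               # sampled costs
    # Design matrix: constant, linear terms, and upper-triangular quadratic terms
    iu = np.triu_indices(n)
    quad = np.array([np.outer(d, d)[iu] for d in D])
    A = np.hstack([np.ones((n_samples, 1)), D, quad])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    g = coef[1:1 + n]                                  # gradient estimate
    H = np.zeros((n, n))
    H[iu] = coef[1 + n:]
    H = H + H.T   # diagonal doubles (fitted coef is H_ii / 2), off-diagonal symmetrizes
    return g, H
```

For a truly quadratic cost the least-squares fit recovers the exact gradient and Hessian; for a general rollout cost it gives a local estimate whose quality depends on the sampling radius `sigma` and the number of rollouts.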

Abstract

We present SaDDP, a sampled version of the widely used differential dynamic programming (DDP) control algorithm. We contribute by establishing a novel connection between two major branches of robotics control research: gradient-based methods such as DDP, and Monte Carlo methods such as path integral control (PI) that utilize random simulated trajectory rollouts. One of our key observations is that the Taylor expansion central to DDP can be reformulated in terms of second-order statistics computed from the sampled trajectories. SaDDP makes few assumptions about the controlled system and works with black-box dynamics simulations with non-smooth contacts. Our simulation results show that the method outperforms PI and CMA-ES in both a simple linear-quadratic problem and a multilink arm reaching task with obstacles.

Tags

  • Trajectory optimization

  • Differential dynamic programming

  • Sampled differential dynamic programming

  • Sampling-based control

  • Sampling-based planning

  • Hessian-free optimization

  • Path integral control

  • Evolution strategies

  • CMA-ES

  • Taylor expansion

  • Gradient-based