Path Integral Policy Improvement with Covariance Matrix Adaptation

Authors: Freek Stulp, Olivier Sigaud

Published: 2012 (Conference Paper)

Source: International Conference on Machine Learning (ICML)

Algorithm: PI2-CMA

arXiv: 1206.4621

DOI: 10.5555/3042573.3042771

Summary

Shows that PI2, CEM, and CMA-ES all share the concept of probability-weighted averaging for parameter updates, unifying them into a common family. Derives PI2-CMA, which inherits PI2's stochastic optimal control foundations while automatically adapting the exploration noise covariance, eliminating the need to manually tune exploration magnitude.
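The shared concept the paper identifies can be sketched in a few lines: each sampled parameter vector is weighted by the exponentiated negative of its cost, and the new parameters are the weighted average. This is a minimal illustrative sketch, not the paper's exact update rule; the function name and the temperature argument `lam` are hypothetical.

```python
import numpy as np

def probability_weighted_update(thetas, costs, lam=1.0):
    """Probability-weighted averaging of parameter samples (PI2/CEM/CMA-ES-style).

    thetas: (K, D) array of K sampled parameter vectors
    costs:  length-K array of scalar costs (lower is better)
    lam:    temperature controlling how greedily low-cost samples dominate
    """
    costs = np.asarray(costs, dtype=float)
    # Shift costs for numerical stability before exponentiating
    w = np.exp(-(costs - costs.min()) / lam)
    w /= w.sum()  # normalize into probability weights
    # New parameters: weighted average of the sampled parameter vectors
    return w @ np.asarray(thetas, dtype=float)
```

With equal costs this reduces to the plain mean of the samples; as `lam` shrinks, the update concentrates on the lowest-cost sample.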

Abstract

There has been a recent focus in reinforcement learning on addressing continuous state and action problems by optimizing parameterized policies. PI2 is a recent example of this approach. It combines a derivation from first principles of stochastic optimal control with tools from statistical estimation theory. In this paper, we consider PI2 as a member of the wider family of methods which share the concept of probability-weighted averaging to iteratively update parameters to optimize a cost function. We compare PI2 to other members of the same family - Cross-Entropy Methods and CMA-ES - at the conceptual level and in terms of performance. The comparison suggests the derivation of a novel algorithm which we call PI2-CMA for "Path Integral Policy Improvement with Covariance Matrix Adaptation". PI2-CMA's main advantage is that it determines the magnitude of the exploration noise automatically.
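The automatic adaptation of exploration noise works by applying the same probability weights to the covariance as to the mean: the new exploration covariance is the weighted sum of outer products of each sample's deviation from the current mean, in the spirit of CMA-ES's rank-mu update. The sketch below is a simplified illustration under that assumption, not the paper's full algorithm (which operates per time step along trajectories); the function name and `lam` parameter are illustrative.

```python
import numpy as np

def pi2_cma_update(theta_mean, samples, costs, lam=1.0):
    """One simplified PI2-CMA-style update of the mean and exploration covariance.

    theta_mean: (D,) current parameter mean
    samples:    (K, D) sampled parameter vectors drawn around theta_mean
    costs:      length-K scalar costs (lower is better)
    """
    costs = np.asarray(costs, dtype=float)
    samples = np.asarray(samples, dtype=float)
    # Probability weights from exponentiated negative costs
    w = np.exp(-(costs - costs.min()) / lam)
    w /= w.sum()
    # Mean update: probability-weighted average of the samples
    new_mean = w @ samples
    # Covariance update: weighted outer products of deviations from the
    # current mean, so exploration noise grows along low-cost directions
    diffs = samples - theta_mean
    new_cov = (w[:, None] * diffs).T @ diffs
    return new_mean, new_cov
```

Because the covariance is rebuilt from the weighted sample deviations each iteration, the exploration magnitude no longer has to be tuned by hand, which is the advantage the abstract highlights.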

Tags

  • Path integral control

  • Policy optimization

  • Reinforcement learning

  • Covariance matrix adaptation

  • Exploration noise

  • Continuous control