Path Integral Policy Improvement with Covariance Matrix Adaptation

Authors: Freek Stulp, Olivier Sigaud

Published: 2012 (Conference Paper)

Source: International Conference on Machine Learning (ICML)

Algorithm: PI2-CMA

arXiv: 1206.4621

DOI: 10.5555/3042573.3042771

Summary

Shows that PI2, CEM, and CMA-ES all share the concept of probability-weighted averaging for parameter updates, unifying them into a common family. Derives PI2-CMA, which inherits PI2's stochastic optimal control foundations while automatically adapting the exploration noise covariance, eliminating the need to manually tune exploration magnitude.
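The shared concept the paper identifies can be sketched in a few lines: each sampled parameter vector is weighted by the exponentiated negative of its cost, and the new parameters are the weighted average. This is a minimal illustrative sketch, not the paper's exact update rule; the function name and the temperature argument `lam` are hypothetical.

```python
import numpy as np

def probability_weighted_update(thetas, costs, lam=1.0):
    """Probability-weighted averaging of parameter samples (PI2/CEM/CMA-ES-style).

    thetas: (K, D) array of K sampled parameter vectors
    costs:  length-K array of scalar costs (lower is better)
    lam:    temperature controlling how greedily low-cost samples dominate
    """
    costs = np.asarray(costs, dtype=float)
    # Shift costs for numerical stability before exponentiating
    w = np.exp(-(costs - costs.min()) / lam)
    w /= w.sum()  # normalize into probability weights
    # New parameters: weighted average of the sampled parameter vectors
    return w @ np.asarray(thetas, dtype=float)
```

With equal costs this reduces to the plain mean of the samples; as `lam` shrinks, the update concentrates on the lowest-cost sample.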

Abstract

There has been a recent focus in reinforcement learning on addressing continuous state and action problems by optimizing parameterized policies. PI2 is a recent example of this approach. It combines a derivation from first principles of stochastic optimal control with tools from statistical estimation theory. In this paper, we consider PI2 as a member of the wider family of methods which share the concept of probability-weighted averaging to iteratively update parameters to optimize a cost function. We compare PI2 to other members of the same family - Cross-Entropy Methods and CMA-ES - at the conceptual level and in terms of performance. The comparison suggests the derivation of a novel algorithm which we call PI2-CMA for "Path Integral Policy Improvement with Covariance Matrix Adaptation". PI2-CMA's main advantage is that it determines the magnitude of the exploration noise automatically.
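The automatic adaptation of exploration noise works by applying the same probability weights to the covariance as to the mean: the new exploration covariance is the weighted sum of outer products of each sample's deviation from the current mean, in the spirit of CMA-ES's rank-mu update. The sketch below is a simplified illustration under that assumption, not the paper's full algorithm (which operates per time step along trajectories); the function name and `lam` parameter are illustrative.

```python
import numpy as np

def pi2_cma_update(theta_mean, samples, costs, lam=1.0):
    """One simplified PI2-CMA-style update of the mean and exploration covariance.

    theta_mean: (D,) current parameter mean
    samples:    (K, D) sampled parameter vectors drawn around theta_mean
    costs:      length-K scalar costs (lower is better)
    """
    costs = np.asarray(costs, dtype=float)
    samples = np.asarray(samples, dtype=float)
    # Probability weights from exponentiated negative costs
    w = np.exp(-(costs - costs.min()) / lam)
    w /= w.sum()
    # Mean update: probability-weighted average of the samples
    new_mean = w @ samples
    # Covariance update: weighted outer products of deviations from the
    # current mean, so exploration noise grows along low-cost directions
    diffs = samples - theta_mean
    new_cov = (w[:, None] * diffs).T @ diffs
    return new_mean, new_cov
```

Because the covariance is rebuilt from the weighted sample deviations each iteration, the exploration magnitude no longer has to be tuned by hand, which is the advantage the abstract highlights.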

Tags

  • Path integral control

  • Policy optimization

  • Reinforcement learning

  • Covariance matrix adaptation

  • Exploration noise

  • Continuous control