A Natural Policy Gradient
Authors: Sham M. Kakade
Published: 2001 (Conference Paper)
Source: Advances in Neural Information Processing Systems
Algorithm: Natural Policy Gradient
Summary
Proposes using the natural gradient (steepest ascent under the Fisher information metric) for policy optimization in RL, showing that it moves toward the greedy action of a policy-iteration step rather than merely a better action. The paper motivated and formalized the natural gradient in RL, directly inspiring later work such as TRPO and other NPG-based algorithms.
Abstract
We provide a natural gradient method that represents the steepest descent direction based on the underlying structure of the parameter space. Although gradient methods cannot make large changes in the values of the parameters, we show that the natural gradient is moving toward choosing a greedy optimal action rather than just a better action. These greedy optimal actions are those that would be chosen under one improvement step of policy iteration with approximate, compatible value functions, as defined by Sutton et al. We then show drastic performance improvements in simple MDPs and in the more challenging MDP of Tetris.
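The core idea can be sketched in a few lines: precondition the vanilla policy gradient by the inverse Fisher information matrix of the policy. Below is a minimal illustrative sketch (not the paper's code) for a softmax policy on a two-armed bandit; the paper's setting is a full MDP, and the small `damping` term is an assumption added here to keep the softmax Fisher matrix invertible.

```python
import numpy as np

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

def natural_pg_step(theta, rewards, lr=0.1, damping=1e-3):
    """One natural policy gradient step for a softmax policy on a bandit.

    Illustrative sketch only: maximizes J(theta) = sum_a pi(a) r(a)
    by following F^{-1} grad J instead of the vanilla gradient.
    """
    pi = softmax(theta)
    # Rows of grad_log are d log pi(a) / d theta, i.e. e_a - pi for softmax.
    grad_log = np.eye(len(theta)) - pi
    # Vanilla policy gradient: sum_a pi(a) r(a) * grad log pi(a).
    g = (pi * rewards) @ grad_log
    # Fisher information F = E_pi[grad_log grad_log^T], damped (assumption)
    # because the softmax Fisher is singular along the shift direction.
    F = grad_log.T @ (pi[:, None] * grad_log) + damping * np.eye(len(theta))
    # Natural gradient direction F^{-1} g.
    return theta + lr * np.linalg.solve(F, g)
```

A notable property visible even in this toy example: the natural gradient step size along the decision boundary is nearly independent of the current policy's parameters, which is what lets it keep making progress where the vanilla gradient vanishes as the policy becomes deterministic.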
Links
Primary
Tags
- Reinforcement learning
- Policy gradient
- Natural gradient
- Fisher information matrix
- Policy optimization
- MDP