Least-Squares Policy Iteration
Authors: Michail G. Lagoudakis, Ronald Parr
Published: 2003 (Journal Paper)
Source: Journal of Machine Learning Research
Algorithm: LSPI
Summary
Introduces LSPI (Least-Squares Policy Iteration), which combines approximate policy iteration with LSTDQ, an LSTD-style least-squares estimator of the state-action value function under linear function approximation. Because one fixed batch of samples can be reused to evaluate every successive policy, the method supports offline, model-free, off-policy reinforcement learning. Widely cited as a foundation for batch RL methods.
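The loop described in the summary can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' reference implementation: the function names (`lstdq`, `greedy_policy`, `lspi`), the small ridge term `reg`, the stopping tolerance, and the two-state toy problem are all assumptions made for this example.

```python
import numpy as np

def lstdq(samples, phi, policy, n_features, gamma=0.9, reg=1e-6):
    """Fit weights w so that Q(s, a) ~ phi(s, a) @ w for the given policy."""
    # The ridge term reg * I is an assumption of this sketch; it keeps A invertible.
    A = reg * np.eye(n_features)
    b = np.zeros(n_features)
    for s, a, r, s_next in samples:
        f = phi(s, a)
        f_next = phi(s_next, policy(s_next))
        A += np.outer(f, f - gamma * f_next)
        b += r * f
    return np.linalg.solve(A, b)

def greedy_policy(w, phi, actions):
    """Return the policy that picks the action with the highest estimated Q-value."""
    return lambda s: max(actions, key=lambda a: phi(s, a) @ w)

def lspi(samples, phi, actions, n_features, gamma=0.9, tol=1e-6, max_iter=50):
    """Alternate policy evaluation (lstdq) and greedy improvement on a fixed batch."""
    w = np.zeros(n_features)
    for _ in range(max_iter):
        w_new = lstdq(samples, phi, greedy_policy(w, phi, actions),
                      n_features, gamma)
        if np.linalg.norm(w_new - w) < tol:
            return w_new
        w = w_new
    return w

# Hypothetical toy problem: two states, action 0 stays, action 1 switches state,
# and the reward is 1 whenever the next state is state 1.
def phi(s, a):
    f = np.zeros(4)          # one-hot (tabular) features, index = 2 * s + a
    f[2 * s + a] = 1.0
    return f

samples = []
for s in (0, 1):
    for a in (0, 1):
        s_next = s if a == 0 else 1 - s
        samples.append((s, a, float(s_next == 1), s_next))

w = lspi(samples, phi, actions=(0, 1), n_features=4, gamma=0.9)
pi = greedy_policy(w, phi, (0, 1))
```

Every iteration reuses the same `samples` list, which is the data-reuse property the summary refers to: only the next-state action `policy(s_next)` changes between policy evaluations, so no new interaction with the environment is needed.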
Abstract
Links
Primary
Alternate
Tags
- Reinforcement learning
- Policy iteration
- Least squares
- Value function approximation
- LSPI
- Model-free control