Least-Squares Policy Iteration

Authors: Michail G. Lagoudakis, Ronald Parr

Published: 2003 (Journal Paper)

Source: Journal of Machine Learning Research

Algorithm: LSPI

Summary

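Introduces Least-Squares Policy Iteration (LSPI), which combines approximate policy iteration with LSTDQ, a variant of least-squares temporal-difference learning (LSTD) that estimates the state-action value function under linear function approximation. Because LSTDQ can evaluate any policy from one fixed set of samples, LSPI supports offline, model-free, off-policy reinforcement learning and reuses the same data at every policy update. Widely cited as a foundation for batch RL methods.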
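To make the loop described above concrete, here is a minimal sketch of LSPI in NumPy. It is not the authors' code: the function names `lstdq` and `lspi`, the one-hot feature map `phi`, the two-state toy MDP, and the use of `np.linalg.lstsq` to solve the linear system are all assumptions chosen for illustration.

```python
import numpy as np


def lstdq(samples, phi, policy, k, gamma):
    """Fit weights w so that phi(s, a) @ w approximates Q under `policy`."""
    A = np.zeros((k, k))
    b = np.zeros(k)
    for s, a, r, s_next in samples:
        f = phi(s, a)
        f_next = phi(s_next, policy(s_next))
        A += np.outer(f, f - gamma * f_next)
        b += f * r
    # Least-squares solve of A w = b (illustrative choice of solver).
    return np.linalg.lstsq(A, b, rcond=None)[0]


def lspi(samples, phi, n_actions, k, gamma=0.9, tol=1e-8, max_iter=50):
    """Alternate LSTDQ evaluation and greedy improvement on fixed samples."""
    w = np.zeros(k)
    for _ in range(max_iter):
        def greedy(s, w=w):
            # Greedy policy with respect to the current weights.
            return int(np.argmax([phi(s, a) @ w for a in range(n_actions)]))

        w_new = lstdq(samples, phi, greedy, k, gamma)
        if np.linalg.norm(w_new - w) < tol:
            return w_new
        w = w_new
    return w


# Hypothetical toy MDP: action 0 stays, action 1 switches state;
# the reward is 1 whenever the next state is state 1.
N_STATES, N_ACTIONS = 2, 2


def phi(s, a):
    """One-hot (tabular) features, a special case of linear approximation."""
    f = np.zeros(N_STATES * N_ACTIONS)
    f[s * N_ACTIONS + a] = 1.0
    return f


samples = []
for s in range(N_STATES):
    for a in range(N_ACTIONS):
        s_next = s if a == 0 else 1 - s
        samples.append((s, a, float(s_next == 1), s_next))

w = lspi(samples, phi, N_ACTIONS, k=N_STATES * N_ACTIONS, gamma=0.9)
q = w.reshape(N_STATES, N_ACTIONS)
greedy_policy = q.argmax(axis=1)
```

The same four transitions are reused in every iteration, which is the data-reuse property the summary refers to. With one-hot features the fit is exact, so on this toy problem the learned policy switches out of state 0 and stays in state 1.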

Abstract

Links

Primary

Paper jmlr.org

Alternate

PDF users.cs.duke.edu

Tags

Reinforcement learning
Policy iteration
Least squares
Value function approximation
LSPI
Model-free control