Least-Squares Policy Iteration

Authors: Michail G. Lagoudakis, Ronald Parr

Published: 2003 (Journal Paper)

Source: Journal of Machine Learning Research

Algorithm: LSPI

Summary

Introduces Least-Squares Policy Iteration (LSPI), which combines approximate policy iteration with LSTDQ, a least-squares temporal-difference (LSTD) method that estimates state-action value functions under linear function approximation. LSPI supports offline, model-free, off-policy reinforcement learning and reuses the same batch of samples across every policy update. Widely cited as a foundation for batch RL methods.
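The loop described in the summary can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' reference implementation: the function names (`lstdq`, `lspi`), the sample layout `(s, a, r, s_next, done)`, the small ridge term `reg`, and the weight-change stopping rule are all assumptions made for the example. `phi(s, a)` is a hypothetical user-supplied feature map that returns a length-`k` vector.

```python
import numpy as np

def lstdq(samples, phi, policy, gamma, k, reg=1e-6):
    """Estimate weights w such that Q^pi(s, a) ~= phi(s, a) @ w.

    samples: iterable of (s, a, r, s_next, done) tuples (assumed layout).
    policy:  function mapping a state to the action being evaluated.
    reg:     small ridge term keeping A invertible (an assumption here).
    """
    A = reg * np.eye(k)
    b = np.zeros(k)
    for s, a, r, s_next, done in samples:
        f = phi(s, a)
        # Terminal transitions contribute no bootstrapped next-state term.
        f_next = np.zeros(k) if done else phi(s_next, policy(s_next))
        A += np.outer(f, f - gamma * f_next)
        b += r * f
    return np.linalg.solve(A, b)

def lspi(samples, phi, actions, gamma, k, max_iter=20, tol=1e-6):
    """Alternate LSTDQ evaluation and greedy improvement on one fixed batch."""
    w = np.zeros(k)
    for _ in range(max_iter):
        # Greedy policy with respect to the current weights (ties: first action).
        def greedy(s, w=w):
            return max(actions, key=lambda a: phi(s, a) @ w)
        w_new = lstdq(samples, phi, greedy, gamma, k)
        if np.linalg.norm(w_new - w) < tol:
            return w_new
        w = w_new
    return w
```

The same `samples` batch is passed to every `lstdq` call, which is the data-reuse property noted above: only the policy used to pick the next-state action changes between iterations, so no new interaction with the environment is needed.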

Abstract

Tags

  • Reinforcement learning

  • Policy iteration

  • Least squares

  • Value function approximation

  • LSPI

  • Model-free control