
On the Convergence of Policy Iteration in Stationary Dynamic Programming

Authors: Martin L. Puterman, Shelby L. Brumelle

Published: 1979 (Journal Paper)

Source: Mathematics of Operations Research

Algorithm: Policy Iteration

DOI: 10.1287/moor.4.1.60

Summary

Proves that policy iteration in stationary dynamic programming is equivalent to Newton–Kantorovich iteration applied to the Bellman operator, yielding superlinear convergence in general and quadratic convergence under additional regularity conditions. Provides error bounds for finite-state MDPs and establishes a foundational connection between policy iteration and Newton-type numerical methods.
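The equivalence can be checked numerically: a Newton step on the Bellman residual B(v) = max_a (r_a + γ P_a v) − v, using the Jacobian γ P_π − I of the greedy policy π, reduces to exact policy evaluation (I − γ P_π)v = r_π, which is precisely the policy-iteration update. The sketch below verifies this on a hypothetical 2-state, 2-action MDP (the transition and reward data are illustrative, not from the paper):

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP (illustrative data only).
# P[a] is the transition matrix under action a; r[a, s] the reward.
gamma = 0.9
P = np.array([[[0.8, 0.2],
               [0.3, 0.7]],
              [[0.5, 0.5],
               [0.9, 0.1]]])
r = np.array([[1.0, 0.0],
              [0.5, 2.0]])

def greedy(v):
    # Greedy policy: argmax_a r(s,a) + gamma * sum_{s'} P(s'|s,a) v(s')
    q = r + gamma * P @ v          # shape (actions, states)
    return np.argmax(q, axis=0)

def bellman_residual(v):
    # B(v) = max_a (r_a + gamma P_a v) - v
    return np.max(r + gamma * P @ v, axis=0) - v

def policy_iteration_step(v):
    # Policy iteration: evaluate the greedy policy exactly,
    # i.e. solve (I - gamma P_pi) v' = r_pi.
    pi = greedy(v)
    P_pi = P[pi, np.arange(2)]
    r_pi = r[pi, np.arange(2)]
    return np.linalg.solve(np.eye(2) - gamma * P_pi, r_pi)

def newton_step(v):
    # Newton-Kantorovich step on B: v' = v - B'(v)^{-1} B(v),
    # where B'(v) = gamma P_pi - I for the greedy policy pi.
    pi = greedy(v)
    P_pi = P[pi, np.arange(2)]
    J = gamma * P_pi - np.eye(2)
    return v - np.linalg.solve(J, bellman_residual(v))

v = np.zeros(2)
for _ in range(20):
    # The two updates agree at every iterate.
    assert np.allclose(policy_iteration_step(v), newton_step(v))
    v = policy_iteration_step(v)
print("v* =", v, "residual =", np.max(np.abs(bellman_residual(v))))
```

Algebraically, with π greedy at v, the Newton step gives v − (γP_π − I)⁻¹((γP_π − I)v + r_π) = (I − γP_π)⁻¹ r_π, which is exactly policy evaluation; the assertion above confirms this agreement at every iterate.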

Tags

  • Policy iteration

  • Dynamic programming

  • Convergence

  • Newton-Kantorovich iteration

  • Markov decision processes

  • Reinforcement learning