
On the Convergence of Policy Iteration in Stationary Dynamic Programming

Authors: Martin L. Puterman, Shelby L. Brumelle

Published: 1979 (Journal Paper)

Source: Mathematics of Operations Research

Algorithm: Policy Iteration

DOI: 10.1287/moor.4.1.60

Summary

Proves that policy iteration in stationary dynamic programming is equivalent to Newton–Kantorovich iteration applied to the Bellman operator, yielding superlinear convergence in general and quadratic convergence under additional regularity conditions. Provides error bounds for finite-state MDPs and establishes a foundational connection between policy iteration and Newton-type numerical methods.
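The equivalence can be checked numerically: a Newton step on the Bellman residual B(v) = max_a (r_a + γ P_a v) − v, using the Jacobian γ P_π − I of the greedy policy π, reduces to exact policy evaluation (I − γ P_π)v = r_π, which is precisely the policy-iteration update. The sketch below verifies this on a hypothetical 2-state, 2-action MDP (the transition and reward data are illustrative, not from the paper):

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP (illustrative data only).
# P[a] is the transition matrix under action a; r[a, s] the reward.
gamma = 0.9
P = np.array([[[0.8, 0.2],
               [0.3, 0.7]],
              [[0.5, 0.5],
               [0.9, 0.1]]])
r = np.array([[1.0, 0.0],
              [0.5, 2.0]])

def greedy(v):
    # Greedy policy: argmax_a r(s,a) + gamma * sum_{s'} P(s'|s,a) v(s')
    q = r + gamma * P @ v          # shape (actions, states)
    return np.argmax(q, axis=0)

def bellman_residual(v):
    # B(v) = max_a (r_a + gamma P_a v) - v
    return np.max(r + gamma * P @ v, axis=0) - v

def policy_iteration_step(v):
    # Policy iteration: evaluate the greedy policy exactly,
    # i.e. solve (I - gamma P_pi) v' = r_pi.
    pi = greedy(v)
    P_pi = P[pi, np.arange(2)]
    r_pi = r[pi, np.arange(2)]
    return np.linalg.solve(np.eye(2) - gamma * P_pi, r_pi)

def newton_step(v):
    # Newton-Kantorovich step on B: v' = v - B'(v)^{-1} B(v),
    # where B'(v) = gamma P_pi - I for the greedy policy pi.
    pi = greedy(v)
    P_pi = P[pi, np.arange(2)]
    J = gamma * P_pi - np.eye(2)
    return v - np.linalg.solve(J, bellman_residual(v))

v = np.zeros(2)
for _ in range(20):
    # The two updates agree at every iterate.
    assert np.allclose(policy_iteration_step(v), newton_step(v))
    v = policy_iteration_step(v)
print("v* =", v, "residual =", np.max(np.abs(bellman_residual(v))))
```

Algebraically, with π greedy at v, the Newton step gives v − (γP_π − I)⁻¹((γP_π − I)v + r_π) = (I − γP_π)⁻¹ r_π, which is exactly policy evaluation; the assertion above confirms this agreement at every iterate.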

Tags

  • Policy iteration

  • Dynamic programming

  • Convergence

  • Newton-Kantorovich iteration

  • Markov decision processes

  • Reinforcement learning