On the Convergence of Policy Iteration in Stationary Dynamic Programming¶
Authors: Martin L. Puterman, Shelby L. Brumelle
Published: 1979 (Journal Paper)
Source: Mathematics of Operations Research
Algorithm: Policy Iteration
DOI: 10.1287/moor.4.1.60
Summary¶
Proves that policy iteration in stationary dynamic programming is equivalent to the Newton-Kantorovich iteration applied to the Bellman operator, yielding superlinear convergence, with quadratic rates under additional regularity conditions. Provides error bounds for finite-state MDPs and establishes a foundational connection between policy iteration and Newton-type numerical methods.
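The equivalence in the summary can be checked numerically: a policy-iteration step (evaluate the greedy policy exactly) produces the same value function as one Newton step on the residual F(v) = v - Tv, whose Jacobian at v is I - γP_π for the greedy policy π. A minimal sketch on a toy 2-state, 2-action MDP (the transition and reward numbers below are illustrative assumptions, not from the paper):

```python
import numpy as np

# Toy MDP (illustrative numbers): P[a, s, s'] transitions, r[a, s] rewards.
gamma = 0.9
P = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.5, 0.5], [0.9, 0.1]]])
r = np.array([[1.0, 0.0],
              [0.5, 2.0]])

def greedy(v):
    """Greedy policy w.r.t. v: argmax_a [r(a,s) + gamma * sum_s' P(s'|s,a) v(s')]."""
    q = r + gamma * P @ v            # q[a, s]
    return q.argmax(axis=0)

def evaluate(pi):
    """Exact policy evaluation: solve (I - gamma * P_pi) v = r_pi."""
    idx = np.arange(2)
    P_pi, r_pi = P[pi, idx], r[pi, idx]
    return np.linalg.solve(np.eye(2) - gamma * P_pi, r_pi)

def newton_step(v):
    """One Newton-Kantorovich step on F(v) = v - Tv, Jacobian I - gamma * P_pi."""
    pi = greedy(v)
    Tv = (r + gamma * P @ v).max(axis=0)
    J = np.eye(2) - gamma * P[pi, np.arange(2)]
    return v - np.linalg.solve(J, v - Tv)

v = np.zeros(2)
for _ in range(10):
    v_pi = evaluate(greedy(v))       # policy-iteration update
    v_nk = newton_step(v)            # Newton-Kantorovich update
    assert np.allclose(v_pi, v_nk)   # the two updates coincide
    if np.allclose(v, v_pi):
        break
    v = v_pi
```

The assertion holds because, for the greedy policy, Tv = r_π + γP_π v, so the Newton step v - (I - γP_π)⁻¹(v - Tv) simplifies to (I - γP_π)⁻¹ r_π, which is exactly policy evaluation.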
Tags¶
- Policy iteration
- Dynamic programming
- Convergence
- Newton-Kantorovich iteration
- Markov decision processes
- Reinforcement learning