Adaptive linear quadratic control using policy iteration

Authors: S.J. Bradtke, B.E. Ydstie, A.G. Barto

Published: 1994 (Conference Paper)

Source: American Control Conference (ACC)

Algorithm: Policy Iteration LQR

DOI: 10.1109/ACC.1994.735224

Summary

Abstract

In this paper we present the stability and convergence results for dynamic programming-based reinforcement learning applied to linear quadratic regulation (LQR). The specific algorithm we analyze is based on Q-learning and it is proven to converge to an optimal controller provided that the underlying system is controllable and a particular signal vector is persistently excited. This is the first convergence result for DP-based reinforcement learning algorithms for a continuous problem.
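The idea in the abstract can be sketched in a few lines. For a fixed linear policy u = Kx, the Q-function of an LQR problem is quadratic, Q(x, u) = [x; u]' H [x; u], so H can be fitted from observed transitions and the policy improved as K = -H_uu^{-1} H_ux. The code below is a minimal illustration under stated assumptions, not the authors' implementation. It fits H by batch least squares instead of a recursive estimator, uses Gaussian exploration noise to excite the regressors, and defaults to an undiscounted cost with a stabilizing initial gain. The function names, the cost matrices `E` and `F`, and the example system are all hypothetical.

```python
import numpy as np


def quad_features(z):
    """Monomials z_i * z_j (i <= j), scaled so that features @ theta == z' H z."""
    i, j = np.triu_indices(len(z))
    scale = np.where(i == j, 1.0, 2.0)  # off-diagonal terms appear twice in z' H z
    return scale * z[i] * z[j]


def theta_to_H(theta, dim):
    """Rebuild the symmetric matrix H from its upper-triangular entries."""
    H = np.zeros((dim, dim))
    H[np.triu_indices(dim)] = theta
    return H + H.T - np.diag(np.diag(H))


def q_policy_iteration(A, B, E, F, K0, gamma=1.0, n_iters=10, n_samples=60, seed=0):
    """Policy iteration on the Q-function of u = K x (illustrative sketch).

    A and B are used only to simulate transitions; the learner itself sees
    just (x, u, cost, x_next) samples. K0 is assumed to be stabilizing.
    """
    rng = np.random.default_rng(seed)
    n, m = B.shape
    K = K0.copy()
    x = rng.standard_normal(n)
    for _ in range(n_iters):
        rows, costs = [], []
        for _ in range(n_samples):
            u = K @ x + rng.standard_normal(m)  # exploration noise for excitation
            x_next = A @ x + B @ u
            z = np.concatenate([x, u])
            z_next = np.concatenate([x_next, K @ x_next])
            # Bellman equation for the current policy:
            # Q(x, u) - gamma * Q(x_next, K x_next) = cost(x, u)
            rows.append(quad_features(z) - gamma * quad_features(z_next))
            costs.append(x @ E @ x + u @ F @ u)
            x = x_next
        theta, *_ = np.linalg.lstsq(np.array(rows), np.array(costs), rcond=None)
        H = theta_to_H(theta, n + m)
        # Policy improvement: minimize the fitted Q over u
        K = -np.linalg.solve(H[n:, n:], H[n:, :n])
    return K, H


if __name__ == "__main__":
    # Hypothetical stable, controllable two-state system with one input
    A = np.array([[0.9, 0.2], [0.0, 0.8]])
    B = np.array([[0.0], [1.0]])
    K, H = q_policy_iteration(A, B, np.eye(2), np.eye(1), np.zeros((1, 2)))
    print("learned gain:", K)
```

Because the simulated system is deterministic, each least-squares fit is exact once the regressors span all entries of H, which is the role the persistent excitation condition plays in the abstract. The learned gain can be checked against the solution of the discrete-time Riccati equation for the same A, B, E, and F.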

Links

Primary

Paper: https://doi.org/10.1109/ACC.1994.735224

Standard

DOI: 10.1109/ACC.1994.735224