Adaptive Dynamic Programming for Discrete-Time Zero-Sum Games

Qinglai Wei; Derong Liu; Qiao Lin; Ruizhuo Song

doi:10.1109/TNNLS.2016.2638863

School of Computer Science

Research output: Journal Publication › Article › peer-review

151 Citations (Scopus)

Abstract

This paper develops a novel adaptive dynamic programming (ADP) algorithm, called the “iterative zero-sum ADP algorithm,” to solve infinite-horizon discrete-time two-player zero-sum games of nonlinear systems. The algorithm permits arbitrary positive semidefinite functions to initialize the upper and lower iterations. A novel convergence analysis guarantees that the upper and lower iterative value functions converge to the upper and lower optima, respectively. When the saddle-point equilibrium exists, both the upper and lower iterative value functions are proved to converge to the optimal solution of the zero-sum game, without requiring the existence criteria of the saddle-point equilibrium. If the saddle-point equilibrium does not exist, the upper and lower optimal performance index functions are obtained separately and are proved not to be equivalent. Finally, simulation results and comparisons illustrate the performance of the method.
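The upper and lower iterations described in the abstract can be illustrated with a minimal sketch. It assumes a finite state space and finite action sets, which is a simplification: the paper treats nonlinear systems, and nothing below reproduces its implementation. The names `zero_sum_value_iteration`, `toy_step`, and `toy_utility` and the toy game itself are hypothetical. The upper iteration applies min over the control of max over the disturbance, the lower iteration applies max over the disturbance of min over the control, and both start from the same positive semidefinite function.

```python
from typing import Callable, Dict, Iterable, Tuple

Value = Dict[int, float]


def zero_sum_value_iteration(
    states: Iterable[int],
    controls: Iterable[int],
    disturbances: Iterable[int],
    step: Callable[[int, int, int], int],
    utility: Callable[[int, int, int], float],
    v0: Callable[[int], float],
    n_iter: int,
) -> Tuple[Value, Value]:
    """Run upper (min-max) and lower (max-min) value iterations side by side."""
    states, controls, disturbances = list(states), list(controls), list(disturbances)
    # Both iterations start from the same initial function v0.
    upper: Value = {x: v0(x) for x in states}
    lower: Value = dict(upper)

    def q(v: Value, x: int, u: int, w: int) -> float:
        # One-step cost plus the current value estimate at the next state.
        return utility(x, u, w) + v[step(x, u, w)]

    for _ in range(n_iter):
        # Upper iteration: the control minimizes against the worst disturbance.
        upper = {
            x: min(max(q(upper, x, u, w) for w in disturbances) for u in controls)
            for x in states
        }
        # Lower iteration: the disturbance maximizes against the best control.
        lower = {
            x: max(min(q(lower, x, u, w) for u in controls) for w in disturbances)
            for x in states
        }
    return upper, lower


# Hypothetical toy game (an assumption, not taken from the paper):
# the state drifts toward 0, and the stage cost is x when the two players
# choose the same action and 0 otherwise, so no saddle point exists for x > 0.
def toy_step(x: int, u: int, w: int) -> int:
    return max(0, x - 1)


def toy_utility(x: int, u: int, w: int) -> int:
    return x if u == w else 0


upper, lower = zero_sum_value_iteration(
    range(4), (0, 1), (0, 1), toy_step, toy_utility, lambda x: x * x, n_iter=5
)
print(upper)  # {0: 0, 1: 1, 2: 3, 3: 6}
print(lower)  # {0: 0, 1: 0, 2: 0, 3: 0}
```

In this toy game the two limits differ for every nonzero state, which corresponds to the case where the saddle-point equilibrium does not exist; the lower values never exceed the upper values at any iteration.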

Original language: English
Pages (from-to): 957-969
Journal: IEEE Transactions on Neural Networks and Learning Systems
Volume: 29
Issue number: 4
DOI: https://doi.org/10.1109/TNNLS.2016.2638863
Publication status: Published - 1 Apr 2018

Keywords

Adaptive critic designs
adaptive dynamic programming (ADP)
approximate dynamic programming
neurodynamic programming
optimal control
zero-sum game

Access to Document

10.1109/TNNLS.2016.2638863

https://ieeexplore.ieee.org/document/7835683/

Cite this

@article{17400c160b7f45dbaa22b09dbecd320e,
  title = "Adaptive Dynamic Programming for Discrete-Time Zero-Sum Games",
  abstract = "In this paper, a novel adaptive dynamic programming (ADP) algorithm, called “iterative zero-sum ADP algorithm,” is developed to solve infinite-horizon discrete-time two-player zero-sum games of nonlinear systems. The present iterative zero-sum ADP algorithm permits arbitrary positive semidefinite functions to initialize the upper and lower iterations. A novel convergence analysis is developed to guarantee the upper and lower iterative value functions to converge to the upper and lower optimums, respectively. When the saddle-point equilibrium exists, it is emphasized that both the upper and lower iterative value functions are proved to converge to the optimal solution of the zero-sum game, where the existence criteria of the saddle-point equilibrium are not required. If the saddle-point equilibrium does not exist, the upper and lower optimal performance index functions are obtained, respectively, where the upper and lower performance index functions are proved to be not equivalent. Finally, simulation results and comparisons are shown to illustrate the performance of the present method.",
  keywords = "Adaptive critic designs, adaptive dynamic programming (ADP), approximate dynamic programming, neurodynamic programming, optimal control, zero-sum game",
  author = "Qinglai Wei and Derong Liu and Qiao Lin and Ruizhuo Song",
  year = "2018",
  month = apr,
  day = "1",
  doi = "10.1109/TNNLS.2016.2638863",
  language = "English",
  volume = "29",
  pages = "957--969",
  journal = "IEEE Transactions on Neural Networks and Learning Systems",
  issn = "2162-237X",
  publisher = "IEEE Computational Intelligence Society",
  number = "4",
}

TY - JOUR
T1 - Adaptive Dynamic Programming for Discrete-Time Zero-Sum Games
AU - Wei, Qinglai
AU - Liu, Derong
AU - Lin, Qiao
AU - Song, Ruizhuo
PY - 2018/4/1
Y1 - 2018/4/1

N2 - In this paper, a novel adaptive dynamic programming (ADP) algorithm, called “iterative zero-sum ADP algorithm,” is developed to solve infinite-horizon discrete-time two-player zero-sum games of nonlinear systems. The present iterative zero-sum ADP algorithm permits arbitrary positive semidefinite functions to initialize the upper and lower iterations. A novel convergence analysis is developed to guarantee the upper and lower iterative value functions to converge to the upper and lower optimums, respectively. When the saddle-point equilibrium exists, it is emphasized that both the upper and lower iterative value functions are proved to converge to the optimal solution of the zero-sum game, where the existence criteria of the saddle-point equilibrium are not required. If the saddle-point equilibrium does not exist, the upper and lower optimal performance index functions are obtained, respectively, where the upper and lower performance index functions are proved to be not equivalent. Finally, simulation results and comparisons are shown to illustrate the performance of the present method.

KW - Adaptive critic designs
KW - adaptive dynamic programming (ADP)
KW - approximate dynamic programming
KW - neurodynamic programming
KW - optimal control
KW - zero-sum game
U2 - 10.1109/TNNLS.2016.2638863
DO - 10.1109/TNNLS.2016.2638863
M3 - Article
SN - 2162-237X
VL - 29
SP - 957
EP - 969
JO - IEEE Transactions on Neural Networks and Learning Systems
JF - IEEE Transactions on Neural Networks and Learning Systems
IS - 4
ER -
