Abstract
Decentralized partially observable Markov decision processes (Dec-POMDPs) are general multi-agent models for planning under uncertainty, but they are intractable to solve. The doubly exponential growth of the search space as the horizon increases makes brute-force search infeasible. Heuristic methods can quickly guide the search in the right direction and have been successful in different domains. In this paper, we propose a new Q-value function representation, the Monte Carlo Q-value function Q_MC, which is proven to be an upper bound of the optimal Q-value function Q*. We introduce two Monte Carlo tree search enhancements, heavy playout for the simulation policy and adaptive samples, to speed up the computation of Q_MC. We then present the clustering and expansion with Monte Carlo algorithm (CEMC), an offline planning algorithm that uses Q_MC as its Q-value function and is based on generalized multi-agent A* with incremental clustering and expansion (GMAA*-ICE, or ICE). CEMC calculates Q-values only as required, rather than computing and storing all Q-value functions, and it uses an extended policy pruning strategy. Finally, we present empirical results demonstrating that CEMC outperforms the best heuristic algorithm with a compact Q-value representation in terms of runtime for the same horizon, and uses less memory on larger problems.
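To make the rollout idea behind a Monte Carlo Q-value estimate concrete, the following is a minimal Python sketch. It is not the paper's CEMC implementation: the toy simulator, the playout policy, and all function names (`toy_step`, `heuristic_playout`, `q_mc_estimate`) are hypothetical placeholders used only to illustrate estimating a Q-value by averaging sampled rollouts that follow a heuristic ("heavy") playout policy.

```python
import random
from typing import Tuple

# Toy two-agent simulator used only for illustration; states and joint actions
# are small integers. Everything here is a hypothetical stand-in, not the
# Dec-POMDP model or algorithm from the paper.

def toy_step(state: int, joint_action: Tuple[int, int]) -> Tuple[int, float]:
    """Hypothetical stochastic transition: returns (next_state, reward)."""
    reward = 1.0 if joint_action[0] == joint_action[1] else 0.0
    next_state = (state + sum(joint_action) + random.randint(0, 1)) % 5
    return next_state, reward

def heuristic_playout(state: int) -> Tuple[int, int]:
    """Stand-in for a 'heavy playout' policy: a cheap heuristic rather than
    uniformly random actions, so each rollout is more informative."""
    return (state % 2, state % 2)

def q_mc_estimate(state: int, joint_action: Tuple[int, int],
                  horizon: int, num_samples: int) -> float:
    """Monte Carlo estimate of the value of taking `joint_action` in `state`
    and then following the playout policy for the remaining steps."""
    total = 0.0
    for _ in range(num_samples):
        s, ret, a = state, 0.0, joint_action
        for _ in range(horizon):
            s, r = toy_step(s, a)
            ret += r
            a = heuristic_playout(s)  # playout policy for the rest of the rollout
        total += ret
    return total / num_samples

if __name__ == "__main__":
    # More samples tighten the estimate; an adaptive scheme would vary
    # num_samples with the accuracy actually needed at each node.
    print(q_mc_estimate(state=0, joint_action=(1, 1), horizon=4, num_samples=500))
```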
Original language | English |
---|---|
Article number | 1430 |
Journal | Applied Sciences (Switzerland) |
Volume | 9 |
Issue number | 7 |
DOIs | |
Publication status | Published - 1 Apr 2019 |
Externally published | Yes |
Keywords
- Dec-POMDP
- Monte Carlo
- Multi-agent
- Q-value function
- Uncertainty
ASJC Scopus subject areas
- General Materials Science
- Instrumentation
- General Engineering
- Process Chemistry and Technology
- Computer Science Applications
- Fluid Flow and Transfer Processes