Multi-agent planning under uncertainty with monte carlo Q-value function

Jian Zhang, Yaozong Pan, Ruili Wang, Yuqiang Fang, Haitao Yang

Research output: Journal PublicationArticlepeer-review

Abstract

Decentralized partially observable Markov decision processes (Dec-POMDPs) are general multi-agent models for planning under uncertainty, but are intractable to solve. Doubly exponential growth of the search space as the horizon increases makes a brute-force search impossible. Heuristic methods can guide the search towards the right direction quickly and have been successful in different domains. In this paper, we propose a new Q-value function representation-Monte Carlo Q-value function Q MC , which is proved to be an upper bound of the optimal Q-value function Q*. We introduce two Monte Carlo tree search enhancements-heavy playout for a simulation policy and adaptive samples-to speed up computation of Q MC . Then, we present a clustering and expansion with Monte-Carlo algorithm (CEMC)-an offline planning algorithm using Q MC as Q-value function, which is based on the generalized multi-agent A* with incremental clustering and expansion (GMAA*-ICE or ICE). CEMC calculates Q-value functions as required, without computing and storing all Q-value functions. An extended policy pruning strategy is used in CEMC. Finally, we present empirical results demonstrating that CEMC outperforms the best heuristic algorithm with a compact Q-value presentation in term of runtime for the same horizon, and has less memory usage for larger problems.

Original languageEnglish
Article number1430
JournalApplied Sciences (Switzerland)
Volume9
Issue number7
DOIs
Publication statusPublished - 1 Apr 2019
Externally publishedYes

Keywords

  • Dec-POMDP
  • Monte Carlo
  • Multi-agent
  • Q-value function
  • Uncertainty

ASJC Scopus subject areas

  • General Materials Science
  • Instrumentation
  • General Engineering
  • Process Chemistry and Technology
  • Computer Science Applications
  • Fluid Flow and Transfer Processes

Fingerprint

Dive into the research topics of 'Multi-agent planning under uncertainty with monte carlo Q-value function'. Together they form a unique fingerprint.

Cite this