Abstract
Decentralized partially observable Markov decision processes (Dec-POMDPs) are general multi-agent models for planning under uncertainty, but they are intractable to solve. The doubly exponential growth of the search space as the horizon increases makes brute-force search infeasible. Heuristic methods can quickly guide the search in the right direction and have been successful in different domains. In this paper, we propose a new Q-value function representation, the Monte Carlo Q-value function Q_MC, which is proven to be an upper bound of the optimal Q-value function Q*. We introduce two Monte Carlo tree search enhancements, heavy playout for the simulation policy and adaptive samples, to speed up the computation of Q_MC. We then present the clustering and expansion with Monte Carlo algorithm (CEMC), an offline planning algorithm that uses Q_MC as its Q-value function and is based on generalized multi-agent A* with incremental clustering and expansion (GMAA*-ICE, or ICE). CEMC calculates Q-values only as required, rather than computing and storing all Q-value functions, and it uses an extended policy pruning strategy. Finally, we present empirical results demonstrating that CEMC outperforms the best heuristic algorithm with a compact Q-value representation in terms of runtime for the same horizon, and uses less memory on larger problems.
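To make the rollout idea behind a Monte Carlo Q-value estimate concrete, the following is a minimal Python sketch. It is not the paper's CEMC implementation: the toy simulator, the playout policy, and all function names (`toy_step`, `heuristic_playout`, `q_mc_estimate`) are hypothetical placeholders used only to illustrate estimating a Q-value by averaging sampled rollouts that follow a heuristic ("heavy") playout policy.

```python
import random
from typing import Tuple

# Toy two-agent simulator used only for illustration; states and joint actions
# are small integers. Everything here is a hypothetical stand-in, not the
# Dec-POMDP model or algorithm from the paper.

def toy_step(state: int, joint_action: Tuple[int, int]) -> Tuple[int, float]:
    """Hypothetical stochastic transition: returns (next_state, reward)."""
    reward = 1.0 if joint_action[0] == joint_action[1] else 0.0
    next_state = (state + sum(joint_action) + random.randint(0, 1)) % 5
    return next_state, reward

def heuristic_playout(state: int) -> Tuple[int, int]:
    """Stand-in for a 'heavy playout' policy: a cheap heuristic rather than
    uniformly random actions, so each rollout is more informative."""
    return (state % 2, state % 2)

def q_mc_estimate(state: int, joint_action: Tuple[int, int],
                  horizon: int, num_samples: int) -> float:
    """Monte Carlo estimate of the value of taking `joint_action` in `state`
    and then following the playout policy for the remaining steps."""
    total = 0.0
    for _ in range(num_samples):
        s, ret, a = state, 0.0, joint_action
        for _ in range(horizon):
            s, r = toy_step(s, a)
            ret += r
            a = heuristic_playout(s)  # playout policy for the rest of the rollout
        total += ret
    return total / num_samples

if __name__ == "__main__":
    # More samples tighten the estimate; an adaptive scheme would vary
    # num_samples with the accuracy actually needed at each node.
    print(q_mc_estimate(state=0, joint_action=(1, 1), horizon=4, num_samples=500))
```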
Original language | English |
---|---|
Article number | 1430 |
Journal | Applied Sciences (Switzerland) |
Volume | 9 |
Issue number | 7 |
DOIs | |
Publication status | Published - 1 Apr 2019 |
Externally published | Yes |
Keywords
- Dec-POMDP
- Monte Carlo
- Multi-agent
- Q-value function
- Uncertainty
ASJC Scopus subject areas
- General Materials Science
- Instrumentation
- General Engineering
- Process Chemistry and Technology
- Computer Science Applications
- Fluid Flow and Transfer Processes