Partially observable Markov decision processes (POMDPs) provide a mathematical framework for agent planning in stochastic, partially observable environments. The classic Bayesian optimal solution can be obtained by transforming the problem into a Markov decision process (MDP) over belief states. However, because the belief-state space is continuous, the resulting problem is highly intractable. Many practical heuristic-based methods have been proposed, but most of them require complete prior knowledge of the environment. This article presents a memory-based reinforcement learning algorithm, namely Reinforcement-based U-Tree, which not only learns state transitions from experience, but also builds its own state model from raw sensor inputs. The article describes an enhancement of the original U-Tree's state-generation process that makes the generated model more compact, and demonstrates its performance on a car-driving task with 31,224 world states. The article also presents a modification to the statistical test for reward estimation, which allows the algorithm to be benchmarked against model-based algorithms on a set of well-known POMDP problems.
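For context, the belief-state transformation mentioned above is the standard Bayesian filter over the hidden state; the following is a textbook sketch rather than a formulation taken from this article, and it assumes the usual POMDP components (a state set $S$, transition function $T$, and observation function $O$), none of which are spelled out in the abstract itself. After taking action $a$ in belief $b$ and receiving observation $o$, the updated belief over next states $s'$ is

\[
  b'(s') \;=\; \frac{O(o \mid s', a) \sum_{s \in S} T(s' \mid s, a)\, b(s)}{\Pr(o \mid b, a)},
  \qquad
  \Pr(o \mid b, a) \;=\; \sum_{s' \in S} O(o \mid s', a) \sum_{s \in S} T(s' \mid s, a)\, b(s).
\]

Because $b$ ranges over the continuous probability simplex on $S$, the induced belief-state MDP has a continuous state space, which is the source of the intractability the abstract refers to.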