TY - GEN
T1 - A memory-based reinforcement learning algorithm for partially observable Markovian decision processes
AU - Zheng, Lei
AU - Cho, Siu Yeung
AU - Quek, Chai
N1 - Copyright:
Copyright 2008 Elsevier B.V., All rights reserved.
PY - 2008
Y1 - 2008
AB - This paper presents a modified version of U-Tree [1], a memory-based reinforcement learning (RL) algorithm that uses selective perception and short-term memory to handle partially observable Markovian decision processes (POMDPs). Conventional RL algorithms rely on a set of pre-defined states to model the environment, even though they can learn the state transitions from experience. U-Tree not only learns these transitions but also builds the state model itself from raw sensor inputs. This paper enhances U-Tree's model-generation process. It also shows that, because of the simplified yet effective state model generated by U-Tree, it is feasible and preferable to adopt the classical Dynamic Programming (DP) algorithm for average-reward MDPs to solve some difficult POMDP problems. The new U-Tree is tested on a car-driving task with 31,224 world states, in which the agent has very limited sensory information and little knowledge of the dynamics of the environment.
KW - Average reward
KW - Dynamic programming
KW - Partially observable Markovian decision processes
KW - Reinforcement learning algorithm
UR - http://www.scopus.com/inward/record.url?scp=56349146117&partnerID=8YFLogxK
U2 - 10.1109/IJCNN.2008.4633888
DO - 10.1109/IJCNN.2008.4633888
M3 - Conference contribution
AN - SCOPUS:56349146117
SN - 9781424418213
T3 - Proceedings of the International Joint Conference on Neural Networks
SP - 800
EP - 805
BT - 2008 International Joint Conference on Neural Networks, IJCNN 2008
T2 - 2008 International Joint Conference on Neural Networks, IJCNN 2008
Y2 - 1 June 2008 through 8 June 2008
ER -