A memory-based reinforcement learning algorithm for partially observable Markovian decision processes

Lei Zheng, Siu Yeung Cho, Chai Quek

Research output: Chapter in Book/Conference proceeding, Conference contribution, peer-review

2 Citations (Scopus)

Abstract

This paper presents a modified version of U-Tree [1], a memory-based reinforcement learning (RL) algorithm that uses selective perception and short-term memory to handle partially observable Markovian decision processes (POMDPs). Conventional RL algorithms rely on a set of predefined states to model the environment, even though they can learn the state transitions from experience. U-Tree can not only learn the transitions but also build the state model itself from raw sensor inputs. This paper enhances U-Tree's model generation process. It also shows that, because the state model generated by U-Tree is simplified yet effective, it is feasible and preferable to apply the classical Dynamic Programming (DP) algorithm for average-reward MDPs to solve some difficult POMDP problems. The new U-Tree is tested on a car-driving task with 31,224 world states, in which the agent has very limited sensory information and little knowledge of the environment's dynamics.
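The abstract does not say which average-reward DP variant the authors use. As an illustration only, the sketch below implements relative value iteration, one classical DP method for average-reward MDPs, over a small tabular model. It assumes the model is already available as transition probabilities `P[a][s][s2]` and expected rewards `R[a][s]` over the learned states (for example, estimated from transition counts between U-Tree leaves; this is an assumption, not the paper's stated procedure), and that the MDP is unichain and aperiodic so the iteration converges. The function name and parameters are hypothetical.

```python
from typing import List, Tuple


def relative_value_iteration(
    P: List[List[List[float]]],  # hypothetical model: P[a][s][s2] = Pr(s2 | s, a)
    R: List[List[float]],        # hypothetical model: R[a][s] = expected reward
    ref_state: int = 0,          # state whose relative value is pinned to 0
    tol: float = 1e-9,
    max_iter: int = 10_000,
) -> Tuple[float, List[float], List[int]]:
    """Relative value iteration for an average-reward MDP (sketch).

    Assumes a unichain, aperiodic MDP; otherwise the iteration may not converge.
    Returns (gain, relative values h, greedy policy).
    """
    n_actions = len(P)
    n_states = len(P[0])
    h = [0.0] * n_states
    gain = 0.0

    def q_value(a: int, s: int, values: List[float]) -> float:
        # One-step lookahead: immediate reward plus expected relative value.
        return R[a][s] + sum(P[a][s][t] * values[t] for t in range(n_states))

    for _ in range(max_iter):
        # Bellman backup for every state.
        v = [max(q_value(a, s, h) for a in range(n_actions)) for s in range(n_states)]
        # Subtracting the reference state's value keeps h bounded;
        # at convergence the subtracted amount is the optimal average reward.
        gain = v[ref_state]
        h_new = [x - gain for x in v]
        diff = max(abs(x - y) for x, y in zip(h_new, h))
        h = h_new
        if diff < tol:
            break

    # Greedy policy with respect to the converged relative values.
    policy = [
        max(range(n_actions), key=lambda a: q_value(a, s, h))
        for s in range(n_states)
    ]
    return gain, h, policy


# Toy two-state example: action 0 = stay, action 1 = switch.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # stay
    [[0.0, 1.0], [1.0, 0.0]],  # switch
]
R = [
    [1.0, 2.0],  # staying in state 0 pays 1, in state 1 pays 2
    [0.0, 0.0],  # switching pays nothing
]
gain, h, policy = relative_value_iteration(P, R)
print(gain, h, policy)  # 2.0 [0.0, 2.0] [1, 0]
```

In this toy model the optimal behaviour is to switch out of state 0 once and then stay in state 1 forever, so the long-run average reward (gain) is 2. Working with average reward rather than a discounted return avoids choosing a discount factor, which is one reason average-reward formulations suit continuing tasks such as driving.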

Original language: English
Title of host publication: 2008 International Joint Conference on Neural Networks, IJCNN 2008
Pages: 800-805
Number of pages: 6
Publication status: Published, 2008
Externally published: Yes
Event: 2008 International Joint Conference on Neural Networks, IJCNN 2008, Hong Kong, China
Duration: 1 Jun 2008 to 8 Jun 2008

Publication series

Name: Proceedings of the International Joint Conference on Neural Networks

Conference

Conference: 2008 International Joint Conference on Neural Networks, IJCNN 2008
Country/Territory: China
City: Hong Kong
Period: 1/06/08 to 8/06/08

Keywords

  • Average reward
  • Dynamic programming
  • Partially observable Markovian decision processes
  • Reinforcement learning algorithm

ASJC Scopus subject areas

  • Software
  • Artificial Intelligence
