A modified memory-based reinforcement learning method for solving POMDP problems

Lei Zheng, Siu Yeung Cho

Research output: Journal Publication › Article › peer-review

7 Citations (Scopus)


Partially observable Markov decision processes (POMDPs) provide a mathematical framework for agent planning in stochastic and partially observable environments. The classic Bayesian optimal solution can be obtained by transforming the problem into a Markov decision process (MDP) over belief states. However, because the belief state space is continuous and multi-dimensional, the problem is highly intractable. Many practical heuristic-based methods have been proposed, but most of them require a complete POMDP model of the environment, which is not always available in practice. This article introduces a modified memory-based reinforcement learning algorithm, called modified U-Tree, that is capable of learning from raw sensor experiences with minimal prior knowledge. It describes an enhancement of the original U-Tree's state generation process that makes the generated model more compact, and it proposes a modification of the statistical test for reward estimation, which allows the algorithm to be benchmarked against some traditional model-based algorithms on a set of well-known POMDP problems.
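As an illustration of the belief-state transformation mentioned in the abstract, the following is a minimal sketch of the standard Bayesian belief update for a discrete POMDP. It is not taken from the article and is not the modified U-Tree algorithm; the function name `belief_update`, the table layouts `T[a][s][s2]` and `O[a][s2][z]`, and the two-state numbers are all assumptions chosen for illustration. It also shows why the model-based route needs a complete POMDP model: both the transition and observation tables must be known in advance.

```python
def belief_update(belief, T, O, action, observation):
    """Return the posterior belief after taking `action` and seeing `observation`.

    belief: list of probabilities over hidden states (sums to 1)
    T: T[a][s][s2] = P(s2 | s, a), assumed transition table
    O: O[a][s2][z] = P(z | s2, a), assumed observation table
    """
    n = len(belief)
    # Prediction step: push the current belief through the transition model.
    predicted = [
        sum(belief[s] * T[action][s][s2] for s in range(n))
        for s2 in range(n)
    ]
    # Correction step: weight each state by the likelihood of the observation.
    unnormalized = [predicted[s2] * O[action][s2][observation] for s2 in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    # Normalize so the result is again a probability distribution.
    return [p / total for p in unnormalized]


# Hypothetical two-state example: one action that leaves the state unchanged
# and yields an observation that matches the true state 85% of the time.
T = [[[1.0, 0.0], [0.0, 1.0]]]
O = [[[0.85, 0.15], [0.15, 0.85]]]
b0 = [0.5, 0.5]
b1 = belief_update(b0, T, O, action=0, observation=0)
```

Starting from a uniform belief, observing `0` shifts most of the probability mass to state 0. Because every such posterior is a point in a continuous simplex, planning over beliefs quickly becomes expensive, which is the intractability the abstract refers to.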

Original language: English
Pages (from-to): 187-200
Number of pages: 14
Journal: Neural Processing Letters
Issue number: 2
Publication status: Published - Apr 2011
Externally published: Yes


  • Markov decision processes
  • Memory-based reinforcement learning
  • Partially observable Markov decision processes
  • Reinforcement learning

ASJC Scopus subject areas

  • Software
  • Neuroscience (all)
  • Computer Networks and Communications
  • Artificial Intelligence

