Abstract
Partially observable Markov decision processes (POMDPs) provide a mathematical framework for agent planning in stochastic, partially observable environments. The classic Bayesian optimal solution can be obtained by transforming the problem into a Markov decision process (MDP) over belief states. However, because the belief state space is continuous and high-dimensional, this formulation is computationally intractable. Many practical heuristic-based methods have been proposed, but most of them require a complete POMDP model of the environment, which is not always available. This article introduces a modified memory-based reinforcement learning algorithm, called modified U-Tree, that is capable of learning from raw sensor experiences with minimal prior knowledge. The article describes an enhancement of the original U-Tree's state generation process that makes the generated model more compact, and also proposes a modification of the statistical test for reward estimation, which allows the algorithm to be benchmarked against traditional model-based algorithms on a set of well-known POMDP problems.
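As background for the belief-state transformation mentioned above (a standard POMDP identity, not taken from the article; the notation T for transitions, O for observations, and b for the belief is assumed here), the Bayesian belief update after taking action a and receiving observation o can be sketched as:

```latex
% Standard POMDP belief update (background sketch; notation assumed, not from the article).
% b  : current belief over states s
% b' : updated belief over successor states s' after action a and observation o
b'(s') = \frac{O(o \mid s', a) \sum_{s} T(s' \mid s, a)\, b(s)}
              {\sum_{s''} O(o \mid s'', a) \sum_{s} T(s'' \mid s, a)\, b(s)}
```

Planning on the resulting belief MDP is exact in principle, but the belief simplex is continuous and its dimension grows with the number of hidden states, which is the intractability the abstract refers to.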
| Original language | English |
| --- | --- |
| Pages (from-to) | 187-200 |
| Number of pages | 14 |
| Journal | Neural Processing Letters |
| Volume | 33 |
| Issue number | 2 |
| Publication status | Published - Apr 2011 |
| Externally published | Yes |
Keywords
- Markov decision processes
- Memory-based reinforcement learning
- Partially observable Markov decision processes
- Reinforcement learning
ASJC Scopus subject areas
- Software
- General Neuroscience
- Computer Networks and Communications
- Artificial Intelligence