Reinforcement Based U-Tree: A Novel Approach for Solving POMDP

Lei Zheng; Siu Yeung Cho; Chai Quek

doi:10.1007/978-3-642-13639-9_9

Research output: Chapter in book/conference proceeding (book chapter, peer-reviewed)

2 Citations (Scopus)

Abstract

Partially observable Markov decision processes (POMDPs) provide a mathematical framework for agent planning in stochastic and partially observable environments. The classic Bayesian optimal solution can be obtained by transforming the problem into a Markov decision process over belief states. However, because the belief state space is continuous, the problem is highly intractable. Many practical heuristic-based methods have been proposed, but most of them require complete prior knowledge of the environment. This article presents a memory-based reinforcement learning algorithm, Reinforcement Based U-Tree, which is able not only to learn the state transitions from experience but also to build the state model by itself from raw sensor inputs. This article describes an enhancement to the original U-Tree's state generation process that makes the generated model more compact, and demonstrates its performance on a car-driving task with 31,224 world states. The article also presents a modification to the statistical test for reward estimation, which allows the algorithm to be benchmarked against some model-based algorithms on a set of well-known POMDP problems.
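The belief-state transformation mentioned above rests on the standard Bayesian belief update, b'(s') ∝ O(o | s', a) · Σ_s T(s' | s, a) · b(s). The sketch below is a generic illustration of that textbook update for a discrete POMDP. It is not code from the chapter, whose U-Tree approach learns its state model from experience rather than maintaining a model-based belief. The function name `belief_update`, the array layouts `T[a][s][s2]` and `O[a][s2][z]`, and the two-state "listen" example (modelled on the classic Tiger problem with 85% listening accuracy) are illustrative assumptions.

```python
from typing import List, Sequence

Belief = List[float]


def belief_update(
    belief: Sequence[float],
    action: int,
    observation: int,
    T: Sequence[Sequence[Sequence[float]]],
    O: Sequence[Sequence[Sequence[float]]],
) -> Belief:
    """Bayesian belief update for a discrete POMDP (generic textbook form).

    T[a][s][s2] = P(s2 | s, a)   (hypothetical layout)
    O[a][s2][z] = P(z | s2, a)   (hypothetical layout)
    """
    n = len(belief)
    # Prediction step: sum over s of T(s2 | s, a) * b(s)
    predicted = [sum(belief[s] * T[action][s][s2] for s in range(n)) for s2 in range(n)]
    # Correction step: weight each successor state by the observation likelihood
    unnormalised = [O[action][s2][observation] * predicted[s2] for s2 in range(n)]
    total = sum(unnormalised)  # equals P(o | b, a)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief and action")
    return [p / total for p in unnormalised]


# Hypothetical two-state example modelled on the classic Tiger problem:
# state 0 = tiger-left, state 1 = tiger-right; the only action here is "listen".
LISTEN, HEAR_LEFT = 0, 0
T = [[[1.0, 0.0], [0.0, 1.0]]]       # listening does not move the tiger
O = [[[0.85, 0.15], [0.15, 0.85]]]   # hearing is correct 85% of the time

b = [0.5, 0.5]
for _ in range(2):
    b = belief_update(b, LISTEN, HEAR_LEFT, T, O)
    print([round(p, 4) for p in b])
# prints [0.85, 0.15] then [0.9698, 0.0302]
```

Each update yields a new point in a continuous probability simplex. This is why planning directly over beliefs quickly becomes intractable as the number of hidden states grows.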

Original language	English
Title of host publication	Handbook on Decision Making
Subtitle of host publication	Vol 1: Techniques and Applications
Editors	Lakhmi Jain, Chee Peng Lim
Pages	205-232
Number of pages	28
DOIs	https://doi.org/10.1007/978-3-642-13639-9_9
Publication status	Published - 2010
Externally published	Yes

Publication series

Name	Intelligent Systems Reference Library
Volume	4
ISSN (Print)	1868-4394
ISSN (Electronic)	1868-4408

Keywords

Dynamic programming
Markov decision processes
memory-based reinforcement learning
partially observable Markov decision processes
reinforcement learning

ASJC Scopus subject areas

General Computer Science
Information Systems and Management
Library and Information Sciences

Cite this

@incollection{300319fd1c4b4f3bb24b0ec28a49a875,

title = "Reinforcement Based {U-Tree}: A Novel Approach for Solving {POMDP}",

abstract = "Partially observable Markov decision processes (POMDPs) provide a mathematical framework for agent planning in stochastic and partially observable environments. The classic Bayesian optimal solution can be obtained by transforming the problem into a Markov decision process over belief states. However, because the belief state space is continuous, the problem is highly intractable. Many practical heuristic-based methods have been proposed, but most of them require complete prior knowledge of the environment. This article presents a memory-based reinforcement learning algorithm, Reinforcement Based U-Tree, which is able not only to learn the state transitions from experience but also to build the state model by itself from raw sensor inputs. This article describes an enhancement to the original U-Tree's state generation process that makes the generated model more compact, and demonstrates its performance on a car-driving task with 31,224 world states. The article also presents a modification to the statistical test for reward estimation, which allows the algorithm to be benchmarked against some model-based algorithms on a set of well-known POMDP problems.",

keywords = "Dynamic programming, Markov decision processes, memory-based reinforcement learning, partially observable Markov decision processes, reinforcement learning",

author = "Zheng, Lei and Cho, Siu Yeung and Quek, Chai",

year = "2010",

doi = "10.1007/978-3-642-13639-9_9",

language = "English",

isbn = "9783642136382",

series = "Intelligent Systems Reference Library",

volume = "4",

publisher = "Springer",

pages = "205--232",

editor = "Jain, Lakhmi and Lim, Chee Peng",

booktitle = "Handbook on Decision Making, Vol. 1: Techniques and Applications",

}

TY  - CHAP
T1  - Reinforcement Based U-Tree: A Novel Approach for Solving POMDP
AU  - Zheng, Lei
AU  - Cho, Siu Yeung
AU  - Quek, Chai
PY  - 2010
Y1  - 2010
N2  - Partially observable Markov decision processes (POMDPs) provide a mathematical framework for agent planning in stochastic and partially observable environments. The classic Bayesian optimal solution can be obtained by transforming the problem into a Markov decision process over belief states. However, because the belief state space is continuous, the problem is highly intractable. Many practical heuristic-based methods have been proposed, but most of them require complete prior knowledge of the environment. This article presents a memory-based reinforcement learning algorithm, Reinforcement Based U-Tree, which is able not only to learn the state transitions from experience but also to build the state model by itself from raw sensor inputs. This article describes an enhancement to the original U-Tree's state generation process that makes the generated model more compact, and demonstrates its performance on a car-driving task with 31,224 world states. The article also presents a modification to the statistical test for reward estimation, which allows the algorithm to be benchmarked against some model-based algorithms on a set of well-known POMDP problems.
AB  - Partially observable Markov decision processes (POMDPs) provide a mathematical framework for agent planning in stochastic and partially observable environments. The classic Bayesian optimal solution can be obtained by transforming the problem into a Markov decision process over belief states. However, because the belief state space is continuous, the problem is highly intractable. Many practical heuristic-based methods have been proposed, but most of them require complete prior knowledge of the environment. This article presents a memory-based reinforcement learning algorithm, Reinforcement Based U-Tree, which is able not only to learn the state transitions from experience but also to build the state model by itself from raw sensor inputs. This article describes an enhancement to the original U-Tree's state generation process that makes the generated model more compact, and demonstrates its performance on a car-driving task with 31,224 world states. The article also presents a modification to the statistical test for reward estimation, which allows the algorithm to be benchmarked against some model-based algorithms on a set of well-known POMDP problems.
KW  - Dynamic programming
KW  - Markov decision processes
KW  - memory-based reinforcement learning
KW  - partially observable Markov decision processes
KW  - reinforcement learning
UR  - http://www.scopus.com/inward/record.url?scp=84885405576&partnerID=8YFLogxK
U2  - 10.1007/978-3-642-13639-9_9
DO  - 10.1007/978-3-642-13639-9_9
M3  - Book Chapter
AN  - SCOPUS:84885405576
SN  - 9783642136382
T3  - Intelligent Systems Reference Library
VL  - 4
PB  - Springer
SP  - 205
EP  - 232
BT  - Handbook on Decision Making, Vol. 1: Techniques and Applications
A2  - Jain, Lakhmi
A2  - Lim, Chee Peng
ER  - 
