TY - GEN
T1 - A memory-based reinforcement learning algorithm for partially observable Markovian decision processes
AU - Zheng, Lei
AU - Cho, Siu Yeung
AU - Quek, Chai
N1 - Copyright:
Copyright 2008 Elsevier B.V., All rights reserved.
PY - 2008
Y1 - 2008
AB - This paper presents a modified version of U-Tree [1], a memory-based reinforcement learning (RL) algorithm that uses selective perception and short-term memory to handle partially observable Markovian decision processes (POMDPs). Conventional RL algorithms rely on a set of pre-defined states to model the environment, even though they can learn the state transitions from experience. U-Tree not only learns these transitions but also builds the state model itself from raw sensor inputs. This paper enhances U-Tree's model-generation process. It also shows that, because of the simplified yet effective state model generated by U-Tree, it is feasible and preferable to adopt the classical Dynamic Programming (DP) algorithm for average-reward MDPs to solve some difficult POMDP problems. The new U-Tree is tested on a car-driving task with 31,224 world states, in which the agent has very limited sensory information and little knowledge of the dynamics of the environment.
KW - Average reward
KW - Dynamic programming
KW - Partially observable Markovian decision processes
KW - Reinforcement learning algorithm
UR - http://www.scopus.com/inward/record.url?scp=56349146117&partnerID=8YFLogxK
U2 - 10.1109/IJCNN.2008.4633888
DO - 10.1109/IJCNN.2008.4633888
M3 - Conference contribution
AN - SCOPUS:56349146117
SN - 9781424418213
T3 - Proceedings of the International Joint Conference on Neural Networks
SP - 800
EP - 805
BT - 2008 International Joint Conference on Neural Networks, IJCNN 2008
T2 - 2008 International Joint Conference on Neural Networks, IJCNN 2008
Y2 - 1 June 2008 through 8 June 2008
ER -