This paper presents a modified version of U-Tree, a memory-based reinforcement learning (RL) algorithm that uses selective perception and short-term memory to handle partially observable Markov decision processes (POMDPs). Conventional RL algorithms rely on a set of pre-defined states to model the environment, even though they can learn the state transitions from experience. U-Tree not only learns the transitions; it also builds the state model itself from raw sensor inputs. This paper enhances U-Tree's model-generation process. It also shows that, because the state model U-Tree generates is compact yet effective, it is feasible and preferable to apply the classical Dynamic Programming (DP) algorithm for average-reward MDPs to solve some difficult POMDP problems. The new U-Tree is tested on a car-driving task with 31,224 world states, in which the agent has very limited sensory information and little knowledge of the dynamics of the environment.