Reinforcement Learning with History Lists

Title: Reinforcement Learning with History Lists
Authors: Timmer, Stephan
Thesis advisor: Prof. Dr. Martin Riedmiller
Thesis referee: Prof. Dr. Kai-Uwe Kühnberger
Abstract: A very general framework for modeling uncertainty in learning environments is given by Partially Observable Markov Decision Processes (POMDPs). In a POMDP setting, the learning agent infers a policy for acting optimally in all possible states of the environment while receiving only observations of these states. The basic idea for coping with partial observability is to include memory in the representation of the policy. Perfect memory is provided by the belief space, i.e., the space of probability distributions over environmental states. However, computing policies defined on the belief space requires a considerable amount of prior knowledge about the learning problem and is expensive in terms of computation time. In this thesis, we present a reinforcement learning algorithm for solving deterministic POMDPs based on short-term memory. Short-term memory is implemented by sequences of past observations and actions, called history lists. In contrast to belief states, history lists are not capable of representing optimal policies, but they are far more practical and require no prior knowledge about the learning problem. The presented algorithm learns policies consisting of two separate phases. During the first phase, the learning agent collects information by actively establishing a history list that identifies the current state. This phase is called the efficient identification strategy. After the current state has been determined, the Q-Learning algorithm is used to learn a near-optimal policy. We show that such a procedure can also be used to solve large Markov Decision Processes (MDPs). Solving MDPs with continuous, multi-dimensional state spaces requires some form of abstraction over states. One particular way of establishing such an abstraction is to ignore the original state information and consider only features of states. This form of state abstraction is closely related to POMDPs, since features of states can be interpreted as observations of states.
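To make the history-list idea from the abstract concrete, the following is a minimal Python sketch of tabular Q-Learning in which the unobservable environment state is replaced by a fixed-length list of past (action, observation) pairs. This is not the thesis's two-phase algorithm (it omits the efficient identification strategy); the environment interface (reset/step/actions) and all hyperparameters are illustrative assumptions.

    # Minimal sketch, not the thesis implementation: Q-Learning over
    # history lists. Assumes a hypothetical env with an `actions` list,
    # `reset() -> obs`, and `step(a) -> (obs, reward, done)`.
    import random
    from collections import defaultdict, deque

    def q_learning_with_history(env, k=3, episodes=500, alpha=0.1,
                                gamma=0.95, epsilon=0.1):
        # Q-values indexed by a hashable history list: a tuple of the
        # last k (action, observation) pairs.
        Q = defaultdict(float)

        def greedy(history):
            return max(env.actions, key=lambda a: Q[(history, a)])

        for _ in range(episodes):
            obs = env.reset()
            history = deque([(None, obs)], maxlen=k)
            done = False
            while not done:
                h = tuple(history)
                # epsilon-greedy exploration over the history-list "state"
                if random.random() < epsilon:
                    a = random.choice(env.actions)
                else:
                    a = greedy(h)
                obs, reward, done = env.step(a)
                history.append((a, obs))
                h_next = tuple(history)
                # standard Q-Learning backup, with history lists standing
                # in for the hidden environment states
                best_next = 0.0 if done else max(Q[(h_next, b)]
                                                 for b in env.actions)
                Q[(h, a)] += alpha * (reward + gamma * best_next - Q[(h, a)])
        return Q

If k is large enough that every reachable history list uniquely identifies the underlying state of a deterministic POMDP, this update behaves like ordinary Q-Learning on the induced MDP; for shorter histories the representation is ambiguous, which is the gap the thesis's identification phase is designed to close.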
Subject Keywords: Reinforcement Learning; POMDP; State Abstraction; Short-Term Memory
Issue Date: 13-Mar-2009
Type of publication: Dissertation or habilitation thesis [doctoralThesis]
Appears in Collections: FB06 - E-Dissertationen

Files in This Item:
File                  Description          Size     Format
E-Diss873_thesis.pdf  Presentation format  1.06 MB  Adobe PDF

Items in osnaDocs repository are protected by copyright, with all rights reserved, unless otherwise indicated.