Reinforcement Learning with History Lists

Please use this identifier to cite or link to this item:
https://osnadocs.ub.uni-osnabrueck.de/handle/urn:nbn:de:gbv:700-2009031619
Full metadata record
dc.contributor.advisor: Prof. Dr. Martin Riedmiller
dc.creator: Timmer, Stephan
dc.date.accessioned: 2010-01-30T14:53:46Z
dc.date.available: 2010-01-30T14:53:46Z
dc.date.issued: 2009-03-13T10:25:17Z
dc.date.submitted: 2009-03-13T10:25:17Z
dc.identifier.uri: https://repositorium.ub.uni-osnabrueck.de/handle/urn:nbn:de:gbv:700-2009031619
dc.description.abstract: A very general framework for modeling uncertainty in learning environments is given by Partially Observable Markov Decision Processes (POMDPs). In a POMDP setting, the learning agent infers a policy for acting optimally in all possible states of the environment while receiving only observations of these states. The basic idea for coping with partial observability is to incorporate memory into the representation of the policy. Perfect memory is provided by the belief space, i.e., the space of probability distributions over environmental states. However, computing policies defined on the belief space requires a considerable amount of prior knowledge about the learning problem and is expensive in terms of computation time. In this thesis, we present a reinforcement learning algorithm for solving deterministic POMDPs based on short-term memory. Short-term memory is implemented by sequences of past observations and actions, called history lists. In contrast to belief states, history lists are not capable of representing optimal policies, but they are far more practical and require no prior knowledge about the learning problem. The presented algorithm learns policies consisting of two separate phases. During the first phase, the learning agent collects information by actively establishing a history list that identifies the current state. This phase is called the efficient identification strategy. After the current state has been determined, the Q-Learning algorithm is used to learn a near-optimal policy. We show that such a procedure can also be used to solve large Markov Decision Processes (MDPs). Solving MDPs with continuous, multi-dimensional state spaces requires some form of abstraction over states. One particular way of establishing such an abstraction is to ignore the original state information and consider only features of states. This form of state abstraction is closely related to POMDPs, since features of states can be interpreted as observations of states. (eng)
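The two-phase procedure described in the abstract can be sketched on a toy problem. The following is an illustrative sketch, not the thesis's actual algorithm or benchmark: a deterministic 4-state ring POMDP in which states 0 and 2 emit the same observation, so a single observation cannot identify the state, but a short history list can. All names (`trans`, `identify`, `q_learn`) and the reward structure are assumptions made for this example.

```python
from collections import defaultdict
import random

# Illustrative deterministic POMDP: a 4-state ring. States 0 and 2 are
# aliased (both emit observation 'A'); states 1 and 3 are distinguishable.
OBS = ['A', 'B', 'A', 'C']   # observation emitted by each state
N = len(OBS)

def trans(state, action):
    """Deterministic transitions: action 0 moves +1, action 1 moves +2."""
    return (state + 1 + action) % N

def identify(state):
    """Phase 1 (identification): repeatedly take action 0, appending each
    new observation to a history list, until only one state remains
    consistent with the list. Returns (true state, identified state,
    history list)."""
    history = [OBS[state]]
    candidates = [s for s in range(N) if OBS[s] == history[0]]
    while len(candidates) > 1:
        state = trans(state, 0)                        # act to gain information
        history.append(OBS[state])
        candidates = [trans(s, 0) for s in candidates]
        candidates = [s for s in candidates if OBS[s] == history[-1]]
    return state, candidates[0], history

def q_learn(episodes=500, alpha=0.5, gamma=0.9, eps=0.2):
    """Phase 2: once the state is identified, plain tabular Q-Learning
    applies as if the problem were fully observable. Here the (assumed)
    reward is 1 for entering state 3, else 0."""
    Q = defaultdict(float)
    for _ in range(episodes):
        s, _, _ = identify(random.randrange(N))        # Phase 1 per episode
        for _ in range(10):
            if random.random() < eps:
                a = random.randrange(2)                # explore
            else:
                a = max((0, 1), key=lambda a: Q[(s, a)])  # exploit
            s2 = trans(s, a)
            r = 1.0 if s2 == 3 else 0.0
            Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, 0)], Q[(s2, 1)]) - Q[(s, a)])
            s = s2
    return Q
```

Starting in state 0, `identify` observes 'A' (candidates {0, 2}), takes one step, observes 'B', and concludes the agent is now in state 1; the belief space is never constructed, only the history list.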
dc.language.iso: eng
dc.subject: Reinforcement Learning
dc.subject: POMDP
dc.subject: State Abstraction
dc.subject: Short-Term Memory
dc.subject.ddc: 004 - Informatik (ger)
dc.title: Reinforcement Learning with History Lists (eng)
dc.type: Dissertation oder Habilitation [doctoralThesis]
thesis.location: Osnabrück
thesis.institution: Universität
thesis.type: Dissertation [thesis.doctoral]
thesis.date: 2009-02-06T12:00:00Z
elib.elibid: 873
elib.marc.edt: jost
elib.dct.accessRights: a
elib.dct.created: 2009-03-12T15:40:06Z
elib.dct.modified: 2009-03-13T10:25:17Z
dc.contributor.referee: Prof. Dr. Kai-Uwe Kühnberger
dc.subject.dnb: 27 - Mathematik (ger)
dc.subject.dnb: 28 - Informatik, Datenverarbeitung (ger)
dc.subject.ccs: I.2.6 - Learning (eng)
vCard.ORG: FB6 (ger)
Appears in Collections: FB06 - E-Dissertationen

Files in This Item:
File: E-Diss873_thesis.pdf | Description: presentation format | Size: 1.06 MB | Format: Adobe PDF


Items in osnaDocs repository are protected by copyright, with all rights reserved, unless otherwise indicated. rightsstatements.org