Reinforcement Learning with History Lists

Please use this identifier to cite or link to this item:
https://osnadocs.ub.uni-osnabrueck.de/handle/urn:nbn:de:gbv:700-2009031619
Full metadata record
dc.contributor.advisor: Prof. Dr. Martin Riedmiller
dc.creator: Timmer, Stephan
dc.date.accessioned: 2010-01-30T14:53:46Z
dc.date.available: 2010-01-30T14:53:46Z
dc.date.issued: 2009-03-13T10:25:17Z
dc.date.submitted: 2009-03-13T10:25:17Z
dc.identifier.uri: https://repositorium.ub.uni-osnabrueck.de/handle/urn:nbn:de:gbv:700-2009031619
dc.description.abstract: A very general framework for modeling uncertainty in learning environments is given by Partially Observable Markov Decision Processes (POMDPs). In a POMDP setting, the learning agent infers a policy for acting optimally in all possible states of the environment while receiving only observations of these states. The basic idea for coping with partial observability is to incorporate memory into the representation of the policy. Perfect memory is provided by the belief space, i.e., the space of probability distributions over environmental states. However, computing policies defined on the belief space requires a considerable amount of prior knowledge about the learning problem and is expensive in terms of computation time. In this thesis, we present a reinforcement learning algorithm for solving deterministic POMDPs based on short-term memory. Short-term memory is implemented by sequences of past observations and actions, called history lists. In contrast to belief states, history lists are not capable of representing optimal policies, but they are far more practical and require no prior knowledge about the learning problem. The presented algorithm learns policies consisting of two separate phases. During the first phase, the learning agent collects information by actively establishing a history list that identifies the current state. This phase is called the efficient identification strategy. After the current state has been determined, the Q-Learning algorithm is used to learn a near-optimal policy. We show that such a procedure can also be used to solve large Markov Decision Processes (MDPs). Solving MDPs with continuous, multi-dimensional state spaces requires some form of abstraction over states. One particular way of establishing such an abstraction is to ignore the original state information and consider only features of states. This form of state abstraction is closely related to POMDPs, since features of states can be interpreted as observations of states. (eng)
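The two-phase procedure described in the abstract can be sketched on a toy problem. The following is an illustrative sketch, not the thesis's actual algorithm or benchmark: a deterministic 4-state ring POMDP in which states 0 and 2 emit the same observation, so a single observation cannot identify the state, but a short history list can. All names (`trans`, `identify`, `q_learn`) and the reward structure are assumptions made for this example.

```python
from collections import defaultdict
import random

# Illustrative deterministic POMDP: a 4-state ring. States 0 and 2 are
# aliased (both emit observation 'A'); states 1 and 3 are distinguishable.
OBS = ['A', 'B', 'A', 'C']   # observation emitted by each state
N = len(OBS)

def trans(state, action):
    """Deterministic transitions: action 0 moves +1, action 1 moves +2."""
    return (state + 1 + action) % N

def identify(state):
    """Phase 1 (identification): repeatedly take action 0, appending each
    new observation to a history list, until only one state remains
    consistent with the list. Returns (true state, identified state,
    history list)."""
    history = [OBS[state]]
    candidates = [s for s in range(N) if OBS[s] == history[0]]
    while len(candidates) > 1:
        state = trans(state, 0)                        # act to gain information
        history.append(OBS[state])
        candidates = [trans(s, 0) for s in candidates]
        candidates = [s for s in candidates if OBS[s] == history[-1]]
    return state, candidates[0], history

def q_learn(episodes=500, alpha=0.5, gamma=0.9, eps=0.2):
    """Phase 2: once the state is identified, plain tabular Q-Learning
    applies as if the problem were fully observable. Here the (assumed)
    reward is 1 for entering state 3, else 0."""
    Q = defaultdict(float)
    for _ in range(episodes):
        s, _, _ = identify(random.randrange(N))        # Phase 1 per episode
        for _ in range(10):
            if random.random() < eps:
                a = random.randrange(2)                # explore
            else:
                a = max((0, 1), key=lambda a: Q[(s, a)])  # exploit
            s2 = trans(s, a)
            r = 1.0 if s2 == 3 else 0.0
            Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, 0)], Q[(s2, 1)]) - Q[(s, a)])
            s = s2
    return Q
```

Starting in state 0, `identify` observes 'A' (candidates {0, 2}), takes one step, observes 'B', and concludes the agent is now in state 1; the belief space is never constructed, only the history list.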
dc.language.iso: eng
dc.subject: Reinforcement Learning
dc.subject: POMDP
dc.subject: State Abstraction
dc.subject: Short-Term Memory
dc.subject.ddc: 004 - Informatik (ger)
dc.title: Reinforcement Learning with History Lists (eng)
dc.type: Dissertation oder Habilitation [doctoralThesis]
thesis.location: Osnabrück
thesis.institution: Universität
thesis.type: Dissertation [thesis.doctoral]
thesis.date: 2009-02-06T12:00:00Z
elib.elibid: 873
elib.marc.edt: jost
elib.dct.accessRights: a
elib.dct.created: 2009-03-12T15:40:06Z
elib.dct.modified: 2009-03-13T10:25:17Z
dc.contributor.referee: Prof. Dr. Kai-Uwe Kühnberger
dc.subject.dnb: 27 - Mathematik (ger)
dc.subject.dnb: 28 - Informatik, Datenverarbeitung (ger)
dc.subject.ccs: I.2.6 - Learning (eng)
vCard.ORG: FB6 (ger)
Appears in Collections: FB06 - E-Dissertationen

Files in This Item:
File: E-Diss873_thesis.pdf | Description: presentation format | Size: 1.06 MB | Format: Adobe PDF


Items in osnaDocs repository are protected by copyright, with all rights reserved, unless otherwise indicated. rightsstatements.org