Abstract

We consider the problem of performing inverse reinforcement learning when the expert's trajectory is not directly observable from the learner's viewpoint; instead, the learner receives only a noisy observation of the trajectory. This problem has wide-ranging applications, and the specific application we consider here is a scenario in which the learner tries to penetrate a perimeter patrolled by a robot. Because the learner is hidden in a secret location, it cannot observe the patroller and therefore does not have direct access to the expert's trajectory. Instead, the learner listens to the sound of the expert's movement and estimates the expert's state and action using an observation model. We treat the expert's state and action as hidden data and present an algorithm based on the expectation-maximization and maximum-entropy frameworks to solve this non-linear, non-convex problem. Previous work in this area treats only the state of the robot as hidden data and maximizes the likelihood of the observations. In contrast, our technique takes expectations over both the state and the action of the expert, enabling learning even in the presence of extreme noise and broadening the range of applications.
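The core idea can be sketched in a toy one-step setting. The following is a minimal illustration, not the paper's actual algorithm: the feature map, the symbol-confusion observation model, and all parameter values below are assumptions chosen for the sketch. The E-step computes a posterior over the hidden (state, action) pair given each noisy observation, and the M-step ascends the expected maximum-entropy log-likelihood in the reward weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative toy problem (not from the paper): 3 states, 2 actions,
# 2-dimensional features per (state, action) pair.
n_states, n_actions, n_feats = 3, 2, 2
phi = rng.normal(size=(n_states, n_actions, n_feats))  # feature map
theta_true = np.array([1.5, -1.0])                     # true reward weights

def policy(theta):
    """MaxEnt (softmax) expert policy: pi(a|s) proportional to exp(theta . phi(s,a))."""
    logits = phi @ theta                                # (n_states, n_actions)
    logits -= logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

# Assumed observation model: each (s, a) emits its own symbol but is
# confused with every other symbol with total probability eps.
n_obs = n_states * n_actions
eps = 0.3
obs_model = np.full((n_states, n_actions, n_obs), eps / (n_obs - 1))
for s in range(n_states):
    for a in range(n_actions):
        obs_model[s, a, s * n_actions + a] = 1.0 - eps

# Generate noisy observations of the expert (uniform state prior).
pi_true = policy(theta_true)
observations = []
for _ in range(500):
    s = rng.integers(n_states)
    a = rng.choice(n_actions, p=pi_true[s])
    observations.append(rng.choice(n_obs, p=obs_model[s, a]))

def loglik(theta):
    """Observed-data log-likelihood under a uniform state prior."""
    pi = policy(theta)
    return sum(np.log((obs_model[:, :, o] * pi).sum() / n_states)
               for o in observations)

# EM: posterior over the hidden (state, action) pair, then a gradient
# step on the expected complete-data MaxEnt log-likelihood.
theta = np.zeros(n_feats)
for _ in range(200):
    pi = policy(theta)
    exp_feats = (pi[:, :, None] * phi).sum(axis=1)      # E_pi[phi | s]
    grad = np.zeros(n_feats)
    for o in observations:
        # E-step: P(s, a | o) proportional to P(o | s, a) * pi(a | s) * P(s)
        post = obs_model[:, :, o] * pi
        post /= post.sum()
        # M-step gradient: posterior-expected features minus the
        # policy-expected features in each state.
        grad += (post[:, :, None] * (phi - exp_feats[:, None, :])).sum(axis=(0, 1))
    theta += 0.05 * grad / len(observations)
```

Because expectations are taken over both the hidden state and the hidden action, the update remains well defined even when the observation gives little information about which action was taken; the posterior simply spreads over the plausible (state, action) pairs.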
