Abstract

Preparing an intelligent system in advance to respond optimally in every possible situation is difficult. Machine learning approaches such as Inverse Reinforcement Learning (IRL) can help learn behavior from a limited number of demonstrations. We present a model-free technique that applies maximum likelihood estimation to the IRL problem. To make our approach model-free, we model the environment with the canonical Markov Decision Process tuple but exclude the transition function. We define the reward function as a linear function of a known set of features and use a modified Q-learning technique called Q-Averaging. The direction of optimization is guided by the gradient of the likelihood function with respect to the current feature weights until the unknown reward function is identified. Experimental results on a grid-world problem support our model-free formulation of an IRL technique. We also extend our experiments to the real-world problem of freeway merging for autonomous cars, and the results are significant.
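
The abstract only sketches the method, but the general shape of a maximum-likelihood IRL loop with a linear reward and a sample-based value update can be illustrated in code. The sketch below is an illustrative reconstruction under stated assumptions, not the paper's implementation: the feature map `phi`, the demonstration format, the Boltzmann policy, the finite-difference gradient, and all hyperparameters are invented for the example, and the soft value backup is only one plausible reading of "Q-Averaging".

```python
import numpy as np

# Illustrative maximum-likelihood IRL sketch on a toy problem. Sizes, features,
# demonstrations, and hyperparameters are all made up for the example.
rng = np.random.default_rng(0)
N_STATES, N_ACTIONS, N_FEATURES = 25, 4, 3
GAMMA, ALPHA, BETA, LR = 0.95, 0.2, 5.0, 0.05

# Known feature map phi(s, a); the reward is linear in it: r(s, a) = w . phi(s, a).
phi = rng.normal(size=(N_STATES, N_ACTIONS, N_FEATURES))

# Expert demonstrations as (state, action, next_state) transitions (assumed format);
# real demonstrations would come from the expert, not a random generator.
demos = [(s, int(rng.integers(N_ACTIONS)), (s + 1) % N_STATES)
         for s in range(N_STATES)]

def soft_value(q_row):
    # Boltzmann-weighted average of action values: a differentiable stand-in
    # for the max backup, one plausible reading of "Q-Averaging" (assumed form).
    p = np.exp(BETA * (q_row - q_row.max()))
    p /= p.sum()
    return p @ q_row

def q_values(w, sweeps=30):
    # Model-free value estimate: replay the demonstrated transitions with a
    # Q-learning-style update instead of learning a transition function.
    Q = np.zeros((N_STATES, N_ACTIONS))
    for _ in range(sweeps):
        for s, a, s2 in demos:
            target = phi[s, a] @ w + GAMMA * soft_value(Q[s2])
            Q[s, a] = (1 - ALPHA) * Q[s, a] + ALPHA * target
    return Q

def log_likelihood(w):
    # Log-likelihood of the demonstrated actions under a Boltzmann policy on Q.
    Q = q_values(w)
    z = np.exp(BETA * (Q - Q.max(axis=1, keepdims=True)))
    pi = z / z.sum(axis=1, keepdims=True)
    return sum(np.log(pi[s, a]) for s, a, _ in demos)

# Gradient ascent on the demonstration likelihood over the feature weights w.
# A finite-difference gradient is used here for brevity; the paper's analytic
# likelihood gradient is not reproduced.
w, eps = np.zeros(N_FEATURES), 1e-4
for _ in range(100):
    base = log_likelihood(w)
    grad = np.array([
        (log_likelihood(w + eps * np.eye(N_FEATURES)[k]) - base) / eps
        for k in range(N_FEATURES)
    ])
    w += LR * grad

print("recovered feature weights:", w)
```

The recovered weights define the reward estimate r(s, a) = w . phi(s, a); on real demonstrations, convergence of this outer loop is what identifies the unknown reward function.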
