Abstract
Preparing an intelligent system in advance to respond optimally in every possible situation is difficult. Machine learning approaches such as Inverse Reinforcement Learning (IRL) can help learn such behavior from a limited number of demonstrations. We present a model-free IRL technique based on maximum likelihood estimation. To keep the approach model-free, we model the environment with the canonical Markov Decision Process tuple, excluding the transition function, and we define the reward function as a linear combination of a known set of features. Action values are estimated with a modified Q-learning technique called Q-Averaging, and the direction of optimization is guided by the gradient of the likelihood function with respect to the current feature weights until the unknown reward function is identified. Experimental results on a grid-world problem support our model-free IRL formulation. We further evaluate the approach on the real-world problem of freeway merging for autonomous cars, where the results are significant.
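The abstract compresses the method into a single loop: estimate action values model-free under the current reward weights, score the demonstrations under a stochastic policy, and follow the gradient of the demonstration likelihood. The Python sketch below illustrates one plausible instantiation of that loop; it is our reading, not the authors' implementation. The toy grid world, one-hot features, Boltzmann (softmax) likelihood, finite-difference gradient, and the plain Q-learning update standing in for the paper's Q-Averaging rule are all illustrative assumptions.

```python
import numpy as np

# --- Tiny 1-D grid world, used only for illustration (not from the paper) ---
N_STATES, N_ACTIONS, GAMMA = 5, 2, 0.9   # actions: 0 = left, 1 = right

def step(s, a):
    """Sample the next state; the learner never sees the transition model."""
    return max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)

def phi(s, a):
    """One-hot state-action features; the reward is assumed linear: r = w . phi."""
    f = np.zeros(N_STATES * N_ACTIONS)
    f[s * N_ACTIONS + a] = 1.0
    return f

def estimate_q(w, episodes=200, horizon=20, alpha=0.1):
    """Model-free Q estimate from sampled transitions under reward weights w.
    A plain Q-learning update with a fixed seed stands in for the paper's
    Q-Averaging rule, which is not reproduced here."""
    rng = np.random.default_rng(0)  # common random numbers keep the
                                    # likelihood deterministic in w
    q = np.zeros((N_STATES, N_ACTIONS))
    for _ in range(episodes):
        s = rng.integers(N_STATES)
        for _ in range(horizon):
            a = rng.integers(N_ACTIONS)
            s2 = step(s, a)
            q[s, a] += alpha * (w @ phi(s, a) + GAMMA * q[s2].max() - q[s, a])
            s = s2
    return q

def log_likelihood(w, demos, beta=2.0):
    """Log-likelihood of demonstrated (s, a) pairs under the Boltzmann
    policy induced by the Q estimate for the current feature weights."""
    q = estimate_q(w)
    ll = 0.0
    for s, a in demos:
        z = beta * (q[s] - q[s].max())          # stabilized softmax logits
        ll += z[a] - np.log(np.exp(z).sum())
    return ll

# Demonstrations: the expert always moves right, toward the last state.
demos = [(s, 1) for s in range(N_STATES - 1)] * 5

# Gradient ascent on the demonstration likelihood via finite differences.
w, lr, eps = np.zeros(N_STATES * N_ACTIONS), 0.5, 1e-2
for _ in range(30):
    base = log_likelihood(w, demos)
    grad = np.array([(log_likelihood(w + eps * e, demos) - base) / eps
                     for e in np.eye(len(w))])
    w += lr * grad

print("learned feature weights:", np.round(w, 2))
```

Under these assumptions the learned weights assign higher reward to the demonstrated (rightward) actions, which is the qualitative behavior the abstract describes: the likelihood gradient steers the feature weights until the reward explains the demonstrations.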