Abstract
Inverse reinforcement learning (IRL) seeks to learn the preferences of an expert agent performing a task from the expert's demonstrations. More specifically, it seeks to recover the reward function of the expert, modelled as a Markov decision process (MDP), from observations of the expert's state-action trajectories. IRL is significant because it can use an expert agent's demonstrations of real-world activities, such as driving, locomotion, and other robotic tasks, to build intelligent agents. This research provides a novel method for preference learning by developing a model-based IRL algorithm for continuous action spaces. It generalizes a previous Bayesian approach to IRL to continuous action spaces and incorporates trust region policy optimization. Action-space densities are generated for each state using a random walk, and an online transition model is used. Our method learns the reward function of an expert agent with a continuous action space and uses this learned function to complete the underlying MDP and predict an optimal policy. Experimental results on a benchmark problem domain, Object-World, and on modelling driver behavior on congested freeways offer evidence of the benefits of this approach.
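The idea of generating an action-space density per state via a random walk can be illustrated with a minimal sketch. The code below is an assumption-laden toy, not the paper's algorithm: it walks over a bounded 1-D continuous action interval with Gaussian steps and turns the visited actions into a normalized empirical density; the function names, step scale, and bounds are all hypothetical choices made for illustration.

```python
import numpy as np

def random_walk_action_samples(a0, n_steps=500, step_scale=0.1,
                               bounds=(-1.0, 1.0), rng=None):
    # Hypothetical sketch: sample candidate actions for one state by a
    # bounded random walk over a 1-D continuous action interval.
    rng = rng if rng is not None else np.random.default_rng(0)
    lo, hi = bounds
    samples = np.empty(n_steps)
    a = a0
    for t in range(n_steps):
        # Gaussian step, clipped so the walk stays inside the action bounds.
        a = float(np.clip(a + rng.normal(0.0, step_scale), lo, hi))
        samples[t] = a
    return samples

def empirical_action_density(samples, n_bins=20, bounds=(-1.0, 1.0)):
    # Histogram-based density estimate over the action interval;
    # density=True normalizes so the histogram integrates to 1.
    hist, edges = np.histogram(samples, bins=n_bins, range=bounds,
                               density=True)
    return hist, edges

samples = random_walk_action_samples(a0=0.0)
density, edges = empirical_action_density(samples)
```

In a full method, a density like this would serve only as a discretized stand-in for the continuous action space at each state; the sketch shows the mechanics, not how the densities feed into the Bayesian IRL inference.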