Abstract
Autonomous systems predominantly deploy inverse reinforcement learning (IRL) to model the task preferences of a user (often called an expert) as a reward function, by observing the user execute the task. While IRL continues to receive sustained attention, the related problem of online IRL – where observations accrue incrementally, yet the real-time demands of the application often prohibit a full rerun of an IRL method – has received far less attention. Furthermore, most of the current online learning literature assumes perfect, noise-free, fully observable training data, along with prior knowledge of the features of the task model being learned. Unfortunately, these assumptions do not hold in real-world applications. The following data imperfections and gaps in prior knowledge degrade learning accuracy: 1) some of the data in the input trajectories is missing; 2) the data is mixed with data from other sources; 3) the input data contains perception noise and the observation model is unknown; 4) the input data contains perception noise and manual engineering of system features is not possible. The research contributions from my team address these gaps. Experimental evaluation of these cases on robotic domains (navigation and manipulation) and OpenAI Gym domains showed significant performance improvements over state-of-the-art baselines.
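To make the online-IRL setting concrete, the following is a minimal sketch, not the method from this work: it assumes a linear reward R(s) = w · φ(s) over known features and a hypothetical `policy_feature_expectation` callback standing in for the learner's forward model. When each new demonstration trajectory arrives, the sketch applies one incremental feature-matching gradient step rather than rerunning a full batch IRL solve.

```python
import numpy as np

def demo_feature_expectation(trajectory, phi):
    """Average feature vector along one observed demonstration trajectory."""
    return np.mean([phi(s) for s in trajectory], axis=0)

def online_irl_step(w, trajectory, phi, policy_feature_expectation, lr=0.1):
    """One incremental reward-weight update when a new demonstration arrives.

    Assumption (illustrative, not from the abstract): a max-entropy-style
    log-likelihood gradient, i.e. demonstrated features minus the features
    expected under the learner's current policy.
    """
    mu_demo = demo_feature_expectation(trajectory, phi)
    mu_policy = policy_feature_expectation(w)
    return w + lr * (mu_demo - mu_policy)

# Toy 1-D example: states are floats, features are [s, 1].
phi = lambda s: np.array([s, 1.0])
# Hypothetical stand-in for the learner's expected features under w.
policy_fe = lambda w: np.array([0.0, 1.0])

w = np.zeros(2)
for traj in ([1.0, 2.0], [2.0, 3.0]):  # demonstrations arrive one at a time
    w = online_irl_step(w, traj, phi, policy_fe)
```

Each arriving trajectory costs one gradient step instead of a full IRL solve, which is what makes the update compatible with real-time constraints; the data imperfections listed above (missing segments, mixed sources, perception noise, unknown features) each break one of this sketch's assumptions.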