Abstract

Single-agent reinforcement learning has achieved extraordinary results in recent years. However, acting optimally in the multiagent setting is fundamentally more challenging: the focus shifts from interacting with a stationary environment to interacting with non-stationary agents. Allies and adversaries share the environment, and cooperating with the former while competing with the latter becomes the key to reaching optimality. These relationships are not unconditional; allies can easily turn into adversaries given a change in circumstances, and vice versa. In addition, agents leaving or joining the environment further exacerbate the non-stationarity. Modeling other agents is therefore critical in the multiagent setting. To this end, I propose a multiagent cooperative-competitive domain named Organization to model these complex relationships. The Organization domain features mixed cooperation and competition, partial observability, agent openness, and a history-dependent reward, the last being a bonus term based on the agent's previous performance.

To find optimal policies for agents in the Organization domain, I introduce a multiagent reinforcement learning method named interactive advantage actor-critic (IA2C), in which a belief filter is incorporated into the A2C network. The belief filter accurately predicts other agents' actions and thus drastically reduces the number of episodes required to converge.

Next, I investigate scaling IA2C to the many-agent setting by utilizing the permutation invariance of joint actions, a property shared by most many-agent domains. I introduce many-agent IA2C (IA2CDM), a new scalable actor-critic MARL algorithm built on the property of action anonymity that scales polynomially with the number of agents. Owing to a quadratic-time Dirichlet-multinomial approach to modeling agent populations under partial observability, IA2CDM is able to accurately and efficiently predict action configurations for large agent populations.

Lastly, I introduce Latent Interactive A2C (LIA2C), which utilizes an encoder-decoder network to model the underlying hidden state and the predicted actions of the agent population. The latent embedding from the encoder-decoder leads to lower variance and improved sample complexity.
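To make the belief-filter idea concrete, here is a minimal Python sketch of the underlying mechanism: maintain a belief over candidate models of another agent and update it by Bayes' rule as that agent's actions are observed. The candidate models, prior, and likelihood values below are illustrative assumptions, not the dissertation's implementation.

    import numpy as np

    def update_belief(belief, likelihoods):
        # Bayes update: P(model | obs) is proportional to P(obs | model) * P(model).
        posterior = belief * likelihoods
        total = posterior.sum()
        if total == 0.0:
            # Degenerate evidence: fall back to a uniform belief.
            return np.full_like(belief, 1.0 / belief.size)
        return posterior / total

    # Uniform prior over two hypothetical models of the other agent.
    belief = np.array([0.5, 0.5])
    # Probability each model assigns to the action we just observed.
    obs_likelihoods = np.array([0.8, 0.2])
    belief = update_belief(belief, obs_likelihoods)
    print(belief)  # belief mass shifts toward the first model

The resulting belief yields a predicted distribution over the other agents' next actions, which is the quantity the abstract credits with shortening convergence.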
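The scaling argument behind IA2CDM can likewise be illustrated in a few lines: under action anonymity, what matters is the action configuration (how many agents chose each action), and a Dirichlet-multinomial models that configuration directly. The pseudo-counts and population size below are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)

    n_agents = 1000                      # population size
    alpha = np.array([2.0, 5.0, 3.0])    # Dirichlet pseudo-counts, one per action

    # Posterior predictive sampling: draw action proportions, then counts.
    theta = rng.dirichlet(alpha)               # proportions over the 3 actions
    config = rng.multinomial(n_agents, theta)  # counts summing to n_agents
    print(config)  # e.g. [146 593 261]; length |A|, independent of n_agents

The configuration vector has length |A| no matter how many agents are present, whereas the joint action space grows as |A|^N with N agents; this is the gap that action anonymity closes.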
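Finally, a minimal PyTorch sketch of the encoder-decoder idea behind LIA2C: compress the observation into a low-dimensional latent embedding, trained by reconstruction, which the actor-critic then consumes. The layer sizes, dimensions, and loss are illustrative assumptions rather than the dissertation's architecture.

    import torch
    import torch.nn as nn

    class EncoderDecoder(nn.Module):
        def __init__(self, obs_dim=64, latent_dim=8):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Linear(obs_dim, 32), nn.ReLU(), nn.Linear(32, latent_dim))
            self.decoder = nn.Sequential(
                nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, obs_dim))

        def forward(self, obs):
            z = self.encoder(obs)       # latent embedding fed to the actor-critic
            return self.decoder(z), z   # reconstruction trains the embedding

    model = EncoderDecoder()
    obs = torch.randn(4, 64)                    # a batch of observations
    recon, z = model(obs)
    loss = nn.functional.mse_loss(recon, obs)   # reconstruction objective

Training on the compact embedding z rather than the raw observation is what the abstract credits with lower variance and improved sample complexity.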
