Abstract

Single-agent reinforcement learning has achieved extraordinary results in recent years. However, acting optimally in the multiagent setting is fundamentally more challenging: the focus shifts from interacting with a stationary environment to interacting with non-stationary agents. Allies and adversaries share the environment, and cooperating with the former while competing with the latter becomes the key to reaching optimality. These relationships are not unconditional; allies can easily turn into adversaries given a change in circumstances, and vice versa. In addition, agents leaving or joining the environment further exacerbate the non-stationarity. Modeling other agents is therefore critical in the multiagent setting. To this end, I propose a multiagent cooperative-competitive domain named Organization to model these complex relationships. The Organization domain features mixed cooperation and competition, partial observability, agent openness, and a history-dependent reward, the last being a bonus term based on the agent's previous performance.

To find optimal policies for agents in the Organization domain, I introduce a multiagent reinforcement learning method named interactive advantage actor-critic (IA2C), in which a belief filter is incorporated into the A2C network. The belief filter accurately predicts other agents' actions and thus drastically reduces the number of episodes required to converge.

Next, I investigate scaling IA2C to the many-agent setting by utilizing the permutation invariance of joint actions, a property shared by most many-agent domains. I introduce many-agent IA2C (IA2CDM), a new scalable actor-critic MARL algorithm built on the property of action anonymity that scales polynomially with the number of agents. Owing to a quadratic-time Dirichlet-multinomial approach to modeling agent populations under partial observability, IA2CDM is able to accurately and efficiently predict action configurations for large agent populations.

Lastly, I introduce Latent Interactive A2C (LIA2C), which utilizes an encoder-decoder network to model the underlying hidden state and the predicted actions of the agent population. The latent embedding from the encoder-decoder leads to lower variance and improved sample complexity.
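To make the belief-filter idea concrete, here is a minimal Python sketch of the underlying mechanism: maintain a belief over candidate models of another agent and update it by Bayes' rule as that agent's actions are observed. The candidate models, prior, and likelihood values below are illustrative assumptions, not the dissertation's implementation.

    import numpy as np

    def update_belief(belief, likelihoods):
        # Bayes update: P(model | obs) is proportional to P(obs | model) * P(model).
        posterior = belief * likelihoods
        total = posterior.sum()
        if total == 0.0:
            # Degenerate evidence: fall back to a uniform belief.
            return np.full_like(belief, 1.0 / belief.size)
        return posterior / total

    # Uniform prior over two hypothetical models of the other agent.
    belief = np.array([0.5, 0.5])
    # Probability each model assigns to the action we just observed.
    obs_likelihoods = np.array([0.8, 0.2])
    belief = update_belief(belief, obs_likelihoods)
    print(belief)  # belief mass shifts toward the first model

The resulting belief yields a predicted distribution over the other agents' next actions, which is the quantity the abstract credits with shortening convergence.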
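The scaling argument behind IA2CDM can likewise be illustrated in a few lines: under action anonymity, what matters is the action configuration (how many agents chose each action), and a Dirichlet-multinomial models that configuration directly. The pseudo-counts and population size below are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)

    n_agents = 1000                      # population size
    alpha = np.array([2.0, 5.0, 3.0])    # Dirichlet pseudo-counts, one per action

    # Posterior predictive sampling: draw action proportions, then counts.
    theta = rng.dirichlet(alpha)               # proportions over the 3 actions
    config = rng.multinomial(n_agents, theta)  # counts summing to n_agents
    print(config)  # e.g. [146 593 261]; length |A|, independent of n_agents

The configuration vector has length |A| no matter how many agents are present, whereas the joint action space grows as |A|^N with N agents; this is the gap that action anonymity closes.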
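Finally, a minimal PyTorch sketch of the encoder-decoder idea behind LIA2C: compress the observation into a low-dimensional latent embedding, trained by reconstruction, which the actor-critic then consumes. The layer sizes, dimensions, and loss are illustrative assumptions rather than the dissertation's architecture.

    import torch
    import torch.nn as nn

    class EncoderDecoder(nn.Module):
        def __init__(self, obs_dim=64, latent_dim=8):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Linear(obs_dim, 32), nn.ReLU(), nn.Linear(32, latent_dim))
            self.decoder = nn.Sequential(
                nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, obs_dim))

        def forward(self, obs):
            z = self.encoder(obs)       # latent embedding fed to the actor-critic
            return self.decoder(z), z   # reconstruction trains the embedding

    model = EncoderDecoder()
    obs = torch.randn(4, 64)                    # a batch of observations
    recon, z = model(obs)
    loss = nn.functional.mse_loss(recon, obs)   # reconstruction objective

Training on the compact embedding z rather than the raw observation is what the abstract credits with lower variance and improved sample complexity.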
