Abstract

Offensive language detection (OLD) has received increasing attention due to its societal impact. Recent work shows that methods based on bidirectional transformers (BERT) obtain impressive performance on OLD. However, such methods usually rely on large OLD datasets for training. To address the issue of data scarcity in OLD, we propose an effective domain adaptation approach for training bidirectional transformers. Our approach introduces domain adaptation to A Lite BERT (ALBERT), enabling it to exploit auxiliary data from source domains to improve OLD performance in a target domain. We take two approaches to domain adaptation: first, we use the auxiliary dataset unmodified; second, we remap the auxiliary dataset's labels to match the target labels. Experimental results show that the first approach, ALBERT (SA), obtains state-of-the-art performance in most cases. In particular, our approach significantly benefits underrepresented and underperforming classes, with an improvement of about 40% over ALBERT.
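To make the second strategy concrete, below is a minimal sketch of training ALBERT on target-domain data combined with label-remapped auxiliary data, using Hugging Face `transformers` and `datasets`. The toy examples, the label map, the model checkpoint, and all hyperparameters are illustrative assumptions, not the authors' configuration; the first strategy (using the auxiliary data unmodified) would skip the remapping step.

```python
# Illustrative sketch of domain adaptation for OLD with ALBERT.
# All data, label mappings, and hyperparameters are assumptions.
from datasets import Dataset, concatenate_datasets
from transformers import (
    AlbertForSequenceClassification,
    AlbertTokenizerFast,
    Trainer,
    TrainingArguments,
)

tokenizer = AlbertTokenizerFast.from_pretrained("albert-base-v2")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=64)

# Target-domain OLD data: 0 = not offensive, 1 = offensive.
target = Dataset.from_dict({
    "text": ["have a nice day", "get lost, idiot"],
    "label": [0, 1],
}).map(tokenize, batched=True)

# Auxiliary source-domain data with its own (hypothetical) label
# scheme: 0 = neutral, 1 = abusive, 2 = hateful.
source = Dataset.from_dict({
    "text": ["what a lovely view", "I despise people like you"],
    "label": [0, 2],
})

# Second approach: remap source labels onto the target scheme
# (here, anything non-neutral becomes "offensive"), then train
# jointly on the concatenated data.
remapped = source.map(lambda ex: {"label": 0 if ex["label"] == 0 else 1})
joint = concatenate_datasets([target,
                              remapped.map(tokenize, batched=True)])

model = AlbertForSequenceClassification.from_pretrained(
    "albert-base-v2", num_labels=2
)
args = TrainingArguments(output_dir="old-albert",
                         num_train_epochs=3,
                         per_device_train_batch_size=16)
Trainer(model=model, args=args, train_dataset=joint).train()
```

The remapping step is what distinguishes the two strategies: it trades label fidelity in the source domain for a single shared label space, so the model can be trained jointly rather than having to reconcile two classification heads.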
