Abstract

Offensive language detection (OLD) has received increasing attention due to its societal impact. Recent work shows that methods based on bidirectional transformers (BERT) obtain impressive performance on OLD. However, such methods usually rely on large OLD datasets for training. To address the issue of data scarcity in OLD, we propose an effective domain adaptation approach for training bidirectional transformers. Our approach introduces domain adaptation to A Lite BERT (ALBERT), enabling it to exploit auxiliary data from source domains to improve OLD performance in a target domain. We take two approaches to domain adaptation: first, we use the auxiliary dataset unmodified; second, we remap the auxiliary dataset's labels to match the target labels. Experimental results show that the first approach, ALBERT (SA), obtains state-of-the-art performance in most cases. In particular, our approach significantly benefits underrepresented and underperforming classes, with an improvement of about 40% over ALBERT.
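To make the second strategy concrete, below is a minimal sketch of training ALBERT on target-domain data combined with label-remapped auxiliary data, using Hugging Face `transformers` and `datasets`. The toy examples, the label map, the model checkpoint, and all hyperparameters are illustrative assumptions, not the authors' configuration; the first strategy (using the auxiliary data unmodified) would skip the remapping step.

```python
# Illustrative sketch of domain adaptation for OLD with ALBERT.
# All data, label mappings, and hyperparameters are assumptions.
from datasets import Dataset, concatenate_datasets
from transformers import (
    AlbertForSequenceClassification,
    AlbertTokenizerFast,
    Trainer,
    TrainingArguments,
)

tokenizer = AlbertTokenizerFast.from_pretrained("albert-base-v2")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=64)

# Target-domain OLD data: 0 = not offensive, 1 = offensive.
target = Dataset.from_dict({
    "text": ["have a nice day", "get lost, idiot"],
    "label": [0, 1],
}).map(tokenize, batched=True)

# Auxiliary source-domain data with its own (hypothetical) label
# scheme: 0 = neutral, 1 = abusive, 2 = hateful.
source = Dataset.from_dict({
    "text": ["what a lovely view", "I despise people like you"],
    "label": [0, 2],
})

# Second approach: remap source labels onto the target scheme
# (here, anything non-neutral becomes "offensive"), then train
# jointly on the concatenated data.
remapped = source.map(lambda ex: {"label": 0 if ex["label"] == 0 else 1})
joint = concatenate_datasets([target,
                              remapped.map(tokenize, batched=True)])

model = AlbertForSequenceClassification.from_pretrained(
    "albert-base-v2", num_labels=2
)
args = TrainingArguments(output_dir="old-albert",
                         num_train_epochs=3,
                         per_device_train_batch_size=16)
Trainer(model=model, args=args, train_dataset=joint).train()
```

The remapping step is what distinguishes the two strategies: it trades label fidelity in the source domain for a single shared label space, so the model can be trained jointly rather than having to reconcile two classification heads.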
