Abstract

Distributional corpus analysis (DCA) is an approach that reveals lexical relations by applying computational techniques from natural language processing to large-scale corpora. Its advantage is that it processes and analyzes lexical relations in a quantitative, consistent, and objective way. Although DCA allows analysts to process large-scale linguistic data efficiently, few studies in corpus linguistics have used it to investigate language phenomena. This study therefore aims to bridge the gap between DCA and corpus linguistics by designing and describing DCA from the perspective of corpus linguistics. Specifically, it uses DCA to analyze the distributional behaviors of three Korean neologisms, leyal, lwuce, and kay-, and to track their semantic change. For this analysis, Korean Twitter data spanning about ten years were collected and three state-of-the-art techniques were employed: word2vec with cosine similarity for leyal, Latent Dirichlet Allocation for lwuce, and long short-term memory for kay-. For kay-, the connotational and attitudinal meaning is also investigated. The results show that (i) of the two meanings of leyal, ‘really’ has always been more dominant than ‘Real Madrid’; (ii) of the new and existing meanings of lwuce, the existing meaning has always been more dominant, and use of the new meaning decreased most significantly in 2015; and (iii) the semantic prosody of kay- has shifted from negative toward positive. This study makes several “first attempts”. First, it is the first study to use artificial intelligence and Korean social media data to analyze the distributional behaviors of Korean neologisms and track their semantic change over time. Second, it is the first study to present DCA from the perspective of corpus linguistics. Third, it is the first to establish specific methods for validating the DCA approach using collocation analysis from corpus linguistics. Through these “first attempts”, the study can encourage interdisciplinary research between corpus linguistics and artificial intelligence and serve as a foundation on which further DCA studies in corpus linguistics can build.
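
As an illustration of the cosine-similarity comparison mentioned for leyal, the following is a minimal sketch only, not the study's actual pipeline. The three-dimensional vectors, the sense labels, and the helper names `cosine_similarity` and `closer_sense` are hypothetical; in practice the vectors would come from a trained word2vec model.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def closer_sense(target, senses):
    """Return the label of the sense vector most similar to the target."""
    return max(senses, key=lambda label: cosine_similarity(target, senses[label]))


# Toy vectors (hypothetical, for illustration only).
leyal_vector = [0.9, 0.1, 0.2]
sense_vectors = {
    "really": [1.0, 0.0, 0.1],
    "Real Madrid": [0.0, 1.0, 0.3],
}

dominant = closer_sense(leyal_vector, sense_vectors)
```

Repeating such a comparison on vectors trained separately for each time slice is one way a shift in the dominant sense could be tracked over time, under the assumptions stated above.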
