Abstract
A corpus is a collection of texts and an important language resource for observing how people actually use language. Corpora have been widely used in linguistics as well as in fields such as lexicography and natural language processing (e.g., Hanks, 2012; Pustejovsky & Stubbs, 2012). Despite this importance, however, the Korean national corpora have not been updated since 2007, and the Yonsei Twitter Corpus, a large-scale Korean Twitter corpus, consists of old data. This paper therefore aims to build a new Korean Twitter corpus from up-to-date data and to show how such a corpus can be created with Python.