Building a Korean Twitter Corpus using Python

Kim, Wonbin

2022

Abstract

A corpus, a collection of texts, is an important language resource because it shows how people actually use language. Besides linguistics, corpora have been widely used in fields such as lexicography and natural language processing (e.g., Hanks, 2012; Pustejovsky & Stubbs, 2012). Despite this importance, however, the Korean national corpora have not been updated since 2007, and the Yonsei Twitter Corpus, a large-scale Korean Twitter corpus, consists of old data. This paper therefore aims to build a new Korean Twitter corpus from up-to-date data and to show how to create a Korean Twitter corpus using Python.

Details

Record ID

21640

Record Created

2024-12-05

Title

Building a Korean Twitter Corpus using Python

Author

Kim, Wonbin

College or School

The Linguistics Society of UGA

Date

2022-07

Publisher

The Linguistics Society at UGA

Content Type

Working Paper

File Format

pdf

Language

English

Series

UGA Working Papers in Linguistics, Volume 5

Standard Rights Statement

ScholarWorksUGA Author Deposit Agreement

Record Appears in

Curated Collections > UGA Working Papers in Linguistics
Curated Collections > The Linguistics Society of UGA
All Resources
Publications

Note

UGA Working Papers in Linguistics

System Control Number

9949517685302959