Comparison of the effects lexical and ontological information on text categorization

Koirala, Cesar

2008

Abstract

This thesis compares the effectiveness of lexical and ontological information for text categorization. Lexical information was induced using stemmed features, while ontological information was induced in the form of WordNet hypernyms. Text representations based on stemming and on WordNet hypernyms were evaluated with four different machine learning algorithms on two datasets, and the results are reported as average F1 measures. For the larger dataset, the stemming-based text representation performs better than the hypernym-based text representation, even though the latter uses a novel hypernym formation approach. For the smaller dataset, which has relatively lower feature overlap, hypernym-based text representations produce results comparable to those of the stemming-based text representation. The results also indicate that combining the stemming-based and hypernym-based representations improves performance on the smaller dataset.
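The abstract reports results as average F1 measures but does not say which average is meant. As an illustrative sketch only, assuming single-label categorization, the Python below computes both macro-averaged F1 (the unweighted mean of per-category F1) and micro-averaged F1 (F1 over true/false positive and false negative counts pooled across categories). The function names and toy labels are hypothetical and are not taken from the thesis.

```python
def counts(y_true, y_pred, label):
    """Return (tp, fp, fn) for one category treated as 'label vs. rest'."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    return tp, fp, fn


def f1_from_counts(tp, fp, fn):
    """Harmonic mean of precision and recall; 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-category F1, so every category counts equally."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = [f1_from_counts(*counts(y_true, y_pred, label)) for label in labels]
    return sum(scores) / len(scores)


def micro_f1(y_true, y_pred):
    """F1 over counts pooled across categories, so frequent categories dominate."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = fp = fn = 0
    for label in labels:
        t, f_p, f_n = counts(y_true, y_pred, label)
        tp, fp, fn = tp + t, fp + f_p, fn + f_n
    return f1_from_counts(tp, fp, fn)


if __name__ == "__main__":
    # Hypothetical toy labels, not data from the thesis.
    y_true = ["sports", "sports", "politics", "politics", "science"]
    y_pred = ["sports", "politics", "politics", "politics", "sports"]
    print(round(macro_f1(y_true, y_pred), 3))  # 0.433
    print(round(micro_f1(y_true, y_pred), 3))  # 0.6
```

On these toy labels the per-category F1 scores are 0.5, 0.8 and 0.0, so the macro average (about 0.433) is pulled down by the missed rare category, while the micro average (0.6) reflects overall correctness. Which of the two a study reports can therefore change how representations compare, particularly on a small dataset.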

Details

Record ID

19231

Record Created

2024-12-05

Title

Comparison of the effects lexical and ontological information on text categorization

Author

Koirala, Cesar

Contributor

Rasheed, Khaled (Advisor)
Potter, Walter D. (Committee Member)
Unsworth, Nash (Committee Member)

College or School

Franklin College of Arts and Sciences

Department

Institute for Artificial Intelligence

Date

2008

Publisher

University of Georgia

Content Type

Thesis

Language

English

Dissertation/Thesis Note

Graduate

Degree Type

Master of Science (MS)

Name of Granting Institution

University of Georgia, Summer 2008

Year Degree Granted

2008

Keywords

Text Categorization; Stemming; WordNet Hypernyms; Machine Learning

System Control Number

9949333222502959