Evolutionary instance resampling for difficult data sets

Richardson, William Dale

2013

Abstract

In the field of machine learning, properties of data sets such as class imbalance and overlap often pose difficulties for classifier algorithms. A number of methods alleviate these difficulties by adjusting the distribution of the training data prior to classifier construction. Resampling is typically effected by weighting, removing, or duplicating instances, but finding a good resampling for the data set is a nontrivial problem. Genetic algorithms are frequently used to search for solutions in large, difficult search spaces. In this thesis, four evolutionary approaches are applied to the problem of instance resampling across a variety of data sets and classifier paradigms. In many cases, evolutionary pre-processing is able to produce better classifiers. In particular, an integer-based, one-to-one representation and a cluster-based, real-valued weighting encoding are shown to improve classifier performance on difficult data sets.
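As an illustration only, the idea of searching for a resampling with a genetic algorithm can be sketched as follows. This is not the thesis's implementation: the record does not describe the actual encodings, operators, or classifiers, so every name here (`apply_counts`, `centroid_classifier`, `balanced_accuracy`, `fitness`, `evolve`) and every parameter value is hypothetical. The sketch assumes an integer chromosome with one gene per training instance (0 removes the instance, 1 keeps it, 2 or more duplicates it), a toy one-dimensional nearest-centroid classifier as a stand-in for a real learner, and balanced accuracy on a held-out set as the fitness.

```python
import random

def apply_counts(instances, counts):
    """Build a resampled training set: a count of 0 removes an instance,
    1 keeps it, and 2 or more duplicates it."""
    resampled = []
    for inst, c in zip(instances, counts):
        resampled.extend([inst] * c)
    return resampled

def centroid_classifier(train):
    """Toy 1-D nearest-centroid classifier (a stand-in for a real learner).
    Returns None if fewer than two classes survive the resampling."""
    sums, ns = {}, {}
    for x, y in train:
        sums[y] = sums.get(y, 0.0) + x
        ns[y] = ns.get(y, 0) + 1
    if len(ns) < 2:
        return None
    cents = {y: sums[y] / ns[y] for y in ns}
    return lambda x: min(cents, key=lambda y: abs(x - cents[y]))

def balanced_accuracy(predict, data):
    """Mean of per-class recall, so the minority class counts equally."""
    hits, totals = {}, {}
    for x, y in data:
        totals[y] = totals.get(y, 0) + 1
        hits[y] = hits.get(y, 0) + (predict(x) == y)
    return sum(hits[y] / totals[y] for y in totals) / len(totals)

def fitness(counts, train, valid):
    """Score a chromosome by training on the resampled data it encodes."""
    model = centroid_classifier(apply_counts(train, counts))
    return 0.0 if model is None else balanced_accuracy(model, valid)

def evolve(train, valid, pop_size=20, generations=30, max_count=3, seed=0):
    """Minimal GA: truncation selection, one-point crossover, one-gene mutation."""
    rng = random.Random(seed)
    n = len(train)
    # Seed the population with the unmodified data set (all counts = 1).
    pop = [[1] * n] + [[rng.randint(0, max_count) for _ in range(n)]
                       for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=lambda c: fitness(c, train, valid), reverse=True)
        parents = pop[: pop_size // 2]  # the best half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)
            child = a[:cut] + b[cut:]
            child[rng.randrange(n)] = rng.randint(0, max_count)
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda c: fitness(c, train, valid))
```

Calling `evolve(train, valid)` on lists of `(feature, label)` pairs returns one count per training instance. Because the best half of each generation is carried over unchanged and the initial population includes the all-ones chromosome, the returned resampling never scores below the unmodified training set on the held-out data in this sketch. Selecting on a single held-out set is a simplification that can overfit that set.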

Details

Record ID

14097

Record Created

2024-12-05

Title

Evolutionary instance resampling for difficult data sets

Author

Richardson, William Dale

Contributor

Rasheed, Khaled (Advisor)
Doshi, Prashant (Committee Member)
Potter, Walter D. (Committee Member)

College or School

Franklin College of Arts and Sciences

Department

Institute for Artificial Intelligence

Date

2013

Publisher

University of Georgia

Content Type

Thesis

Language

English

Dissertation/Thesis Note

Graduate

Degree Type

Master of Science (MS)

Name of Granting Institution

University of Georgia, Winter 2013

Year Degree Granted

2013

Keywords

machine learning; imbalance; undersampling; oversampling; instance selection

System Control Number

9949334128402959
