Analyzing the performance of machine learning algorithms on metagenomic data

Mahamuda, Vasim

2010

Metagenomics is a branch of bioinformatics that deals with the study and analysis of micro-organisms in natural environments. Some micro-organisms, including many species of bacteria, archaea, and viruses, must be studied in their natural habitat because they cannot be cultivated in the laboratory using standard techniques. Machine learning techniques are being applied in this field to predict novel genes. In this thesis, we address the problem of classifying metagenomic sequences. First, we compare the performance of several machine learning approaches, including ensemble learners, to identify which algorithms can bin metagenomic data into taxa-specific bins with high accuracy. Then we conduct scalability studies to investigate how the performance of those algorithms degrades as the number of species in the metagenomic sample increases. We also study how performance degrades as the number of unknown sequences in the data increases. The results are promising and show that machine learning algorithms perform well in this domain. Furthermore, performance degrades gracefully as the number of species and the number of unknown sequences increase.
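
The scalability study described in the abstract can be illustrated with a minimal sketch. This record does not state which sequence features or exact classifiers the thesis uses, so everything below is an assumption made for illustration: reads are simulated from a hypothetical per-species base-composition bias, features are normalized k-mer frequencies, and a nearest-centroid rule stands in for the decision-tree and ensemble learners named in the keywords. The function names (`kmer_profile`, `binning_accuracy`, and so on) are likewise invented here.

```python
import itertools
import random

ALPHABET = "ACGT"


def kmer_profile(seq, k=3):
    """Return normalized k-mer frequencies for seq as a fixed-order list."""
    kmers = ["".join(p) for p in itertools.product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip windows with ambiguous bases such as N
            counts[word] += 1
    total = sum(counts.values()) or 1  # avoid division by zero
    return [counts[w] / total for w in kmers]


def random_species(rng):
    """Hypothetical species model: one sampling weight per base."""
    return [rng.random() + 0.2 for _ in ALPHABET]


def sample_read(weights, length, rng):
    """Simulate a read by drawing bases independently with the given weights."""
    return "".join(rng.choices(ALPHABET, weights=weights, k=length))


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def nearest(centroids, vec):
    """Return the label whose centroid is closest (squared Euclidean) to vec."""
    def dist(label):
        return sum((a - b) ** 2 for a, b in zip(centroids[label], vec))
    return min(centroids, key=dist)


def binning_accuracy(n_species, n_train=20, n_test=20, read_len=300, seed=0):
    """Train on simulated reads, then report the fraction of test reads binned correctly."""
    rng = random.Random(seed)
    species = {f"sp{i}": random_species(rng) for i in range(n_species)}
    centroids = {}
    for label, weights in species.items():
        train = [kmer_profile(sample_read(weights, read_len, rng))
                 for _ in range(n_train)]
        centroids[label] = centroid(train)
    correct = 0
    for label, weights in species.items():
        for _ in range(n_test):
            vec = kmer_profile(sample_read(weights, read_len, rng))
            correct += nearest(centroids, vec) == label
    return correct / (n_species * n_test)


if __name__ == "__main__":
    # Accuracy as the number of species in the simulated sample grows.
    for n in (2, 5, 10, 20):
        print(n, round(binning_accuracy(n), 3))
```

Running the script prints one accuracy value per species count. The numbers depend entirely on the simulated data and say nothing about the results reported in the thesis; the sketch only shows the shape of such an experiment.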

Record Created

2024-12-05

Title

Analyzing the performance of machine learning algorithms on metagenomic data

Author

Mahamuda, Vasim

Contributor

Rasheed, Khaled (Advisor)
Arabnia, Hamid R. (Committee Member)
Potter, Walter D. (Committee Member)

College or School

College of Engineering

Department

School of Computing

Date

2010

Publisher

University of Georgia

Content Type

Thesis

Language

English

Dissertation/Thesis Note

Graduate

Degree Type

Master of Science (MS)

Name of Granting Institution

University of Georgia, Summer 2010

Year Degree Granted

2010

Keywords

Binning, decision trees, machine learning, metagenomics, ensemble methods, supervised learning.

System Control Number

9949333750402959
