Abstract

This work presents two papers in which machine learning techniques are used to analyze genetic sequence information. In the first paper, feature vectors were created from influenza virus protein data using N-gram methods common in text classification. Several classifiers were trained on these feature vectors and successfully predicted the host organisms of the corresponding viral strains. The best classifier achieved an accuracy of 97.2% on a set of over 700,000 sequences, the largest experiment of its kind to date. The second paper explores N-gram feature vectors in phylogenetic construction. Methods are presented that speed up feature vector creation by 26% over the state of the art. GPU-optimized functions were examined for the distance matrix calculation task of phylogenetic construction and showed up to a 33x speedup compared with CPU-based methods.
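
As a rough illustration of the N-gram idea described above, the following Python sketch turns protein sequences into normalized N-gram frequency vectors and computes a pairwise distance matrix from them. It is not the method used in the papers: the 20-letter amino acid alphabet, the choice of n = 2, the Euclidean distance, and the function names `ngram_vector` and `distance_matrix` are all assumptions made for this example, and the plain CPU loops stand in for the GPU-optimized routines the abstract mentions.

```python
from collections import Counter
from itertools import product
import math

# Assumption: the 20 standard amino acids; other symbols are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def ngram_vector(sequence, n=2):
    """Return a fixed-order vector of overlapping n-gram frequencies."""
    # Vocabulary of all possible n-grams, in a stable order (20**n entries).
    vocab = ["".join(p) for p in product(AMINO_ACIDS, repeat=n)]
    # Count every overlapping window of length n in the sequence.
    counts = Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))
    # Normalize over n-grams that are in the vocabulary.
    total = sum(counts[g] for g in vocab)
    return [counts[g] / total if total else 0.0 for g in vocab]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distance_matrix(vectors):
    """Symmetric pairwise distance matrix (hypothetical CPU baseline)."""
    size = len(vectors)
    matrix = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1, size):
            d = euclidean(vectors[i], vectors[j])
            matrix[i][j] = d
            matrix[j][i] = d
    return matrix


if __name__ == "__main__":
    # Made-up toy sequences, not real influenza proteins.
    sequences = ["ACACGT", "ACACGA", "WWYYWW"]
    vectors = [ngram_vector(s) for s in sequences]
    for row in distance_matrix(vectors):
        print(["%.3f" % d for d in row])
```

A matrix like this could serve as input to a distance-based tree-building method, and the same vectors could be passed to a classifier for host prediction. Both uses are sketched here only under the assumptions stated above.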
