Abstract

Machine learning (ML) methods are increasingly used in genetics and have shown promise for characterizing genetic mutations. Mutations can significantly affect the activity of the human Epidermal Growth Factor Receptor (EGFR), a protein instrumental in cell proliferation; over-activation of EGFR is a major cause of tumor growth. Although many computational methods have been proposed to identify disease-causing mutations, they are not designed to predict a mutation's impact on protein activity. We explored feature selection strategies suited to the small, complex data in this domain and tested a variety of machine learning algorithms. We generated a model, a Support Vector Machine with a Gaussian radial basis function kernel using a set of 6 features, that achieved 85.9% accuracy and an F-measure of 0.70. Combined with other classifiers through weighted probability voting, this classifier achieved an area under the ROC curve of 0.83.
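
As a rough illustration of weighted probability voting, the sketch below averages the positive-class probabilities reported by several classifiers, weighting each one. It is not the implementation used in this work: the function names, the example probabilities, the weights, and the 0.5 decision threshold are all assumptions chosen for illustration.

```python
from typing import Sequence


def weighted_probability_vote(
    probabilities: Sequence[float], weights: Sequence[float]
) -> float:
    """Return the weighted average of per-classifier positive-class probabilities.

    Both arguments are hypothetical inputs: one probability and one
    non-negative weight per classifier in the ensemble.
    """
    if len(probabilities) != len(weights):
        raise ValueError("need exactly one weight per classifier")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(p * w for p, w in zip(probabilities, weights))
    return weighted_sum / total_weight


def classify(
    probabilities: Sequence[float],
    weights: Sequence[float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the source
) -> bool:
    """Label a sample positive when the combined probability reaches the threshold."""
    return weighted_probability_vote(probabilities, weights) >= threshold


# Illustrative values only: three classifiers, the first trusted twice as much.
combined = weighted_probability_vote([0.9, 0.4, 0.6], [2.0, 1.0, 1.0])
label = classify([0.9, 0.4, 0.6], [2.0, 1.0, 1.0])
```

Sweeping the threshold over the combined probabilities, rather than fixing it at 0.5, is what traces out a ROC curve of the kind summarized by the area reported above.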
