

Abstract

In this thesis, multiple advanced machine learning algorithms, such as random forests, boosted trees, and support vector machines, were applied to the problem of malware file classification. Feature engineering was performed on a large dataset (400 GB) of malware files provided by Kaggle.com. Four feature sets were generated: file sizes and header string frequencies, byte-sequence n-grams, opcode n-grams, and image features. Each feature set was first studied individually, and then different combinations of them were investigated in detail. The importance of the different features was also studied and discussed.
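
As a rough illustration of what a byte-sequence n-gram feature might look like, the following minimal sketch counts overlapping byte n-grams in a byte string and pairs them with the file size. It is not the thesis's actual pipeline: the function names, the choice of n = 2, the `top_k` parameter, and the sample bytes are all assumptions made for this example.

```python
from collections import Counter


def byte_ngram_counts(data: bytes, n: int = 2) -> Counter:
    """Count overlapping byte n-grams in a byte string (hypothetical helper)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Slide a window of width n over the bytes; each slice is one n-gram.
    # If the input is shorter than n, the range is empty and no n-grams are counted.
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def file_size_and_ngrams(data: bytes, n: int = 2, top_k: int = 3) -> dict:
    """Combine a file-size feature with the top_k most frequent n-grams (illustrative only)."""
    counts = byte_ngram_counts(data, n)
    return {"file_size": len(data), "top_ngrams": counts.most_common(top_k)}


# Made-up sample bytes standing in for the contents of a binary file.
sample = bytes([0x55, 0x8B, 0xEC, 0x55, 0x8B, 0xEC, 0x90])
counts = byte_ngram_counts(sample, n=2)
features = file_size_and_ngrams(sample, n=2, top_k=2)
print(features)
```

In practice the counts for each file would be mapped onto a fixed vocabulary of n-grams so that every file yields a feature vector of the same length, which a classifier such as a random forest could then consume.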
