Abstract
Big data are indispensable for machine learning and complex data modeling, but computation with big data is expensive, requiring extensive computer memory and computing time. Methods are therefore sought that scale down the amount of raw data or the model size without discarding substantial information in the original data. Such approximation aims to reduce the required computing resources while maintaining high performance on the related machine learning tasks. In this research, we investigate the approximation issue in two machine learning areas: feature selection, which learns a small set of critical features in big data, and neural network sparsification, which determines a small set of pertinent connections between neurons in a neural network. An additional goal of this research is to reveal pertinent relationships across these two areas.
Information theory has great potential in machine learning, offering an alternative way to extract information from data and to approximate data models. In particular, mutual information-based methods have been developed for feature learning and for sparsifying neural networks, albeit with mixed results, and previous work has yet to establish connections across the two areas. We propose a mutual information-based framework that addresses the approximation issue in both subtopics and reveals pertinent relationships between them.
In this research, the proposed mutual information-based framework is tested on a large collection of microarray gene expression data from human cancers for disease classification. Microarray expression data, containing tens of thousands of genes, are ideal for evaluating methods for both feature learning and neural network sparsification.
In particular, the significant gene subset identified by our method reduces the number of genes required for classification ten- to hundred-fold while outperforming previous methods. Sparsifying neural networks with mutual information between neuron outputs lets us remove up to $90\%$ of unnecessary connections while maintaining or even improving performance. Our experiments reveal that the sparsified neural network ignores unimportant (irrelevant) genes and relies only on the significant or pseudo-significant genes identified through gene filtering in the first part of this research.
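The gene-filtering step described above scores each gene by its mutual information with the class label. The sketch below illustrates the general idea on synthetic data; the equal-width binning, bin count, and the synthetic "informative" versus "irrelevant" genes are illustrative assumptions, not the specific estimator or data used in this research.

```python
import numpy as np

def mutual_information(x, y, bins=8):
    """Estimate I(X; Y) in bits between a continuous feature x
    (discretized into equal-width bins) and a discrete label y."""
    # Assign each value of x to one of `bins` equal-width bins.
    edges = np.histogram_bin_edges(x, bins=bins)
    x_binned = np.digitize(x, edges[1:-1])  # values in 0 .. bins-1
    classes = {c: i for i, c in enumerate(np.unique(y))}
    # Joint distribution over (binned feature, label).
    joint = np.zeros((bins, len(classes)))
    for xi, yi in zip(x_binned, y):
        joint[xi, classes[yi]] += 1
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)  # marginal of the feature
    py = joint.sum(axis=0, keepdims=True)  # marginal of the label
    nz = joint > 0
    return float((joint[nz] * np.log2(joint[nz] / (px @ py)[nz])).sum())

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=500)                    # two disease classes
informative = y + 0.5 * rng.standard_normal(500)    # expression tracks the class
irrelevant = rng.standard_normal(500)               # expression ignores the class
mi_informative = mutual_information(informative, y)
mi_irrelevant = mutual_information(irrelevant, y)
```

Ranking genes by such a score and keeping only the top-scoring ones is one way to realize the ten- to hundred-fold reduction reported above: informative genes receive a high score, while irrelevant ones score near zero.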