Abstract
Artificial intelligence (AI) has become an essential tool in medical data analysis. This dissertation presents new AI methods for medical data analysis, focusing on improving classification and prediction tasks. The dissertation is divided into three projects, the first two of which address medical image classification. Unlike natural image datasets, medical image datasets usually have severely imbalanced class labels and limited sample sizes, since they depend on the availability of patients. Aggregating medical images from multiple sources can also be challenging due to policy restrictions, privacy concerns, communication costs, and data heterogeneity caused by equipment differences and labeling discrepancies. Further, medical image classes often overlap heavily. In the first project, we propose to address the data imbalance and communication issues with the help of transfer learning and artificial samples created by generative models. Instead of requesting medical images from the source data, our method needs only a parsimonious supplement of model parameters pre-trained on the source data. The second project introduces an augmented tensor factor model to address the challenge of overlapping classes in image classification. Our model combines samples from multiple classes and decomposes the tensor data into a pervasive component that shares the same pattern across classes and a specific component that varies from class to class. Finally, the third project studies statistical and machine learning models for COVID-19 mortality prediction, where the proposed SARIMAX model with exogenous hospital information variables improves prediction accuracy and outperforms most of the models from the CDC. This dissertation presents new insights and solutions to important problems in medical data analysis, highlighting the potential of artificial intelligence in healthcare applications.