Abstract
For high-dimensional, low-sample-size (HDLSS) data, traditional Canonical Correlation
Analysis (CCA) is difficult both to execute and to interpret. When there are more
variables than observations, the sample covariance matrices are singular, so the inverses
that CCA requires cannot be computed. Interpretation of CCA results also relies on the
magnitudes of the loadings in the canonical vectors, which can be subjective. When more
than two datasets are in use, both difficulties become more severe. In this
dissertation, we develop two different sparse canonical correlation methods based on
soft-thresholding that can be applied to more than two datasets and can assess the
importance of variables by controlling sparsity parameters in HDLSS settings. We investigate the performance of
the proposed methods comprehensively and compare them with existing approaches through
an extensive simulation study. We then apply the proposed methods to multimodal
HDLSS data from a stroke-related clinical study on pigs to identify key biomarkers
and patterns of recovery from stroke based on physiological changes. Because
pigs and humans share many anatomical similarities, this study helps in understanding the
recovery process for humans affected by ischemic stroke.
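
Two ideas in the abstract can be illustrated with a minimal sketch: the rank deficiency of a sample covariance matrix when variables outnumber observations, and the elementwise soft-thresholding operator that sparse methods commonly use to set small loadings to zero. This is not the dissertation's algorithm. The function name `soft_threshold`, the dimensions, the loading vector `w`, and the threshold `0.1` are all assumptions chosen only for illustration.

```python
import numpy as np


def soft_threshold(v, lam):
    """Elementwise soft-thresholding: shrink each entry toward zero by lam
    and set entries with magnitude at most lam to zero."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)


# Hypothetical HDLSS setting: n = 10 observations, p = 50 variables.
rng = np.random.default_rng(0)
n, p = 10, 50
X = rng.standard_normal((n, p))

# The p x p sample covariance has rank at most n - 1 after centering,
# so it is singular and has no ordinary inverse when p > n.
S = np.cov(X, rowvar=False)
rank = np.linalg.matrix_rank(S)

# Hypothetical loading vector; thresholding zeroes the small entries,
# leaving a sparse vector whose nonzero entries flag the retained variables.
w = np.array([0.9, -0.05, 0.3, -0.6, 0.02])
w_sparse = soft_threshold(w, 0.1)

print("covariance shape:", S.shape, "rank:", rank)
print("sparse loadings:", w_sparse)
```

A larger threshold zeroes more loadings, which is the sense in which a sparsity parameter controls how many variables are kept.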