Abstract
Supervised learning problems in high-dimensional settings have a wide range of applications across disciplines, such as prediction from high-throughput data in molecular biology. High dimensionality poses many challenges for traditional supervised learning and has attracted considerable attention in the statistics and machine learning communities. One solution is to use regularization methods. This dissertation considers new regularization approaches for high-dimensional data in the context of two supervised learning topics. The first topic concerns ordinal classification problems, which lie between standard classification and regression. We propose two novel methods built on a new regularization idea, which weights the features by their rank correlations with the class labels. In the first method, we incorporate the feature weights into the framework of linear discriminant analysis and add the group Lasso penalty to achieve sparse solutions. In the second method, we add the weights to sparse optimal scoring with an adaptive Lasso penalty. Both proposed methods project the original data onto a lower-dimensional subspace that reveals the underlying ordinal structure. This distinguishes our methods from existing work, which assumes a strict underlying linear ordinality within the data. We also demonstrate the difference between linear and nonlinear ordinality and show that our methods can detect nonlinear ordinality and are applicable to high-dimensional data. Simulation studies and real data examples show that the proposed methods have superior performance for ordinal classification with respect to various evaluation metrics. The second topic revisits the trace ratio optimization problems that arise in dimension reduction. Solving the trace ratio problem is not straightforward, so it is conventionally replaced by a sub-optimal alternative, the ratio trace problem.
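For reference, the two problems named above are commonly written as follows. The notation is an assumption and is not taken from the dissertation: A and B denote symmetric matrices (B positive definite), and V is a projection matrix with d columns.

```latex
\[
\text{trace ratio:}\quad
\max_{V^\top V = I_d} \frac{\operatorname{tr}(V^\top A V)}{\operatorname{tr}(V^\top B V)},
\qquad
\text{ratio trace:}\quad
\max_{V} \operatorname{tr}\!\left[ (V^\top B V)^{-1} V^\top A V \right].
\]
```

The ratio trace form reduces to the generalized eigenvalue problem $A v = \lambda B v$, which is why it is the conventional substitute. The trace ratio form has no comparable closed-form solution and is typically solved iteratively.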
We consider a trace regularization method and adapt it to high-dimensional canonical correlation analysis (CCA). Numerical studies demonstrate the efficiency of the modified trace regularization method compared with other well-known high-dimensional CCA approaches.
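As a rough illustration of the rank-correlation feature weighting described for the first topic, the sketch below computes one weight per feature. The abstract does not state the exact weighting formula, so the use of the absolute Spearman correlation is an assumption, and the function names and toy data are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    # A constant feature has no rank variation; treat its correlation as zero.
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def rank_correlation_weights(columns, labels):
    """Assumed weighting: absolute Spearman correlation of each feature with the labels."""
    return [abs(spearman(col, labels)) for col in columns]


# Toy data (hypothetical): 6 samples, ordinal labels 1 < 2 < 3, features stored column-wise.
labels = [1, 1, 2, 2, 3, 3]
columns = [
    [0.1, 0.3, 1.2, 1.0, 2.5, 2.2],  # increases with the class label
    [5.0, 4.8, 3.1, 3.3, 1.0, 1.2],  # decreases with the class label
    [0.7, 0.7, 0.7, 0.7, 0.7, 0.7],  # constant, carries no ordinal signal
]
weights = rank_correlation_weights(columns, labels)
print(weights)
```

Features that move monotonically with the class order, in either direction, receive weights near one, while the constant feature receives zero. How such weights enter the group Lasso or adaptive Lasso penalty is not specified in the abstract, so the sketch stops at computing them.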