Abstract
With advances in science and technology over the past decade, the amount of data generated and recorded has grown enormously in virtually all fields of industry and science.
This extraordinary amount of data provides unprecedented opportunities for data-driven decision-making and knowledge discovery. However, analyzing such large-scale datasets poses significant challenges and calls for innovative statistical methods designed specifically for greater speed and efficiency. In this thesis, I cover some state-of-the-art data reduction methods for large-scale data analysis, with a focus on design-based subsampling methods and some applications of sufficient dimension reduction in optimal transport methods.