

Abstract

The advent of the age of big data poses challenges for statistical data analysis. On the one hand, the ultra-large size of datasets makes many classical statistical methods computationally demanding. On the other hand, as the systems studied become more complicated, the model set-up of some popular off-the-shelf methods may no longer be applicable; accounting for model uncertainty, in particular, becomes important, since it is difficult to impose a single assumption (or set of assumptions) on a complicated system. Developing new, theoretically justifiable, and computationally efficient methods for tackling big data problems from both computational and modeling perspectives is the primary motivation for my research. The first focus of my work is the theoretical properties of subsampling methods for dealing with the sheer size of big data. In the framework of the linear model, I show, under certain regularity conditions, the asymptotic normality of subsampling estimators both for estimating the parameter (unconditional inference) and for approximating the full-sample estimate (conditional inference). Based on these asymptotic results, I propose optimal subsampling estimators under different scenarios. The second focus is a Bayesian hierarchical model that integrates model uncertainty into statistical inference. Under the smoothing spline model, I incorporate the uncertainty in the model assumption, i.e., the choice of penalty, as a mixture prior for the function to be estimated, and carefully choose innovative (partially) noninformative priors for the parameters in the model. The propriety of the resulting posterior distribution is established to provide theoretical underpinnings. The advantages of the proposed methods are shown using both simulated and real-world examples. Finally, I discuss the applied part of my research, which concerns the generation of small RNAs and their function in gene silencing in C. elegans.
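
To make the idea of a subsampling estimator in the linear model concrete, the following is a minimal sketch and not the method developed in the thesis. It assumes a simple linear regression y = intercept + slope * x, draws r of the n observations with replacement according to sampling probabilities, and fits least squares with inverse-probability weights. The abstract does not say which sampling probabilities are optimal; uniform and leverage-proportional probabilities are used here only as two common illustrative choices. All function names and the simulated data are hypothetical.

```python
import random


def weighted_least_squares(xs, ys, ws):
    """Closed-form weighted least squares fit of y = intercept + slope * x."""
    total = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / total
    y_bar = sum(w * y for w, y in zip(ws, ys)) / total
    s_xx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    s_xy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


def leverage_probabilities(xs):
    """Sampling probabilities proportional to the leverage scores.

    In simple linear regression the leverage of point i is
    1/n + (x_i - x_bar)^2 / S_xx, and the leverages sum to 2.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return [(1.0 / n + (x - x_bar) ** 2 / s_xx) / 2.0 for x in xs]


def subsample_estimate(xs, ys, probs, r, rng):
    """Draw r points with replacement using probs, then fit with weights 1/probs."""
    idx = rng.choices(range(len(xs)), weights=probs, k=r)
    return weighted_least_squares(
        [xs[i] for i in idx],
        [ys[i] for i in idx],
        [1.0 / probs[i] for i in idx],
    )


# Simulated data (assumed for illustration): true intercept 1, true slope 2.
rng = random.Random(0)
n, r = 20000, 500
xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
ys = [1.0 + 2.0 * x + rng.gauss(0.0, 1.0) for x in xs]

full = weighted_least_squares(xs, ys, [1.0] * n)
uniform = subsample_estimate(xs, ys, [1.0 / n] * n, r, rng)
leverage = subsample_estimate(xs, ys, leverage_probabilities(xs), r, rng)

print("full sample:       ", full)
print("uniform subsample: ", uniform)
print("leverage subsample:", leverage)
```

Comparing each subsample estimate with `full` corresponds to the conditional view in the abstract (approximating the full-sample estimate), while comparing it with the true values (1, 2) corresponds to the unconditional view (estimating the parameter).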
