Grouped variable screening for ultrahigh dimensional data under linear model

Qiu, Debin

2016

Abstract

High- or ultrahigh-dimensional data sets with group structure now arise in a wide range of scientific research and applications. Such grouped data may be sparse, with only a few groups related to the outcome. In that case, the primary goal is to select the important groups, that is, those significantly correlated with the outcome. Grouped variable selection plays a critical role here, both in selecting groups and in estimating the nonzero coefficients of the covariates within the important groups. With ultrahigh-dimensional grouped data, however, many grouped variable selection algorithms fail to converge or yield nonsensical results, and even when an algorithm does work, it carries a heavy computational load. In this dissertation, we propose a two-stage procedure, grouped variable screening followed by grouped variable selection, to address these issues. In the first stage, grouped variable screening reduces the dimensionality of the data by filtering out unimportant groups that do not contribute to the outcome. A sure screening property is established, which ensures that, under suitable conditions, all important groups are retained after screening with overwhelming probability. This work focuses on four grouped variable screening criteria. In the second stage, because the dimensionality has been reduced from ultrahigh to moderate, or even below the sample size, grouped variable selection methods can select the important groups effectively and estimate the nonzero coefficients accurately. The running time and computational complexity of grouped variable selection also drop dramatically. The performance of the proposed two-stage procedure is evaluated on various simulated examples and a real data set from genetic analysis. An R package called grpss is developed to make the two-stage procedure available for real applications.
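The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the dissertation's method and not the grpss API: the function name `screen_groups`, the screening utility (the norm of within-group marginal correlations, scaled by the square root of the group size), the simulated data, and the plain least-squares refit standing in for a penalized grouped selection method are all assumptions made for illustration.

```python
import numpy as np


def screen_groups(X, y, groups, n_keep):
    """Rank groups by a marginal utility and keep the top n_keep.

    Hypothetical helper. The utility used here is an illustrative
    assumption, not one of the four criteria studied in the dissertation.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    n = X.shape[0]

    # Standardize so that Xs.T @ ys / n holds the marginal correlations.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    ys = (y - y.mean()) / y.std()
    corr = Xs.T @ ys / n

    # One utility per group: root mean square of its marginal correlations.
    labels = np.unique(groups)
    utility = np.array([
        np.linalg.norm(corr[groups == g]) / np.sqrt(np.sum(groups == g))
        for g in labels
    ])

    order = np.argsort(-utility)  # largest utility first
    return labels[order[:n_keep]], utility


# Simulated data (assumed setup): 100 groups of 5 covariates, n = 200 < p = 500.
rng = np.random.default_rng(0)
n, n_groups, group_size = 200, 100, 5
groups = np.repeat(np.arange(n_groups), group_size)
X = rng.standard_normal((n, n_groups * group_size))
beta = np.zeros(n_groups * group_size)
beta[groups <= 1] = 3.0  # only groups 0 and 1 affect the outcome
y = X @ beta + rng.standard_normal(n)

# Stage 1: screening reduces 100 groups to 10.
kept, utility = screen_groups(X, y, groups, n_keep=10)

# Stage 2 stand-in: ordinary least squares on the retained columns.
# A penalized grouped selection method would normally be used here instead.
cols = np.isin(groups, kept)
coef, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)

print("retained groups:", sorted(kept.tolist()))
```

After screening, the retained design has 50 columns against 200 observations, so the second-stage fit is an ordinary low-dimensional problem. The number of groups to keep (`n_keep`) is a tuning choice in this sketch.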

Details

Record ID

15558

Record Created

2024-12-05

Title

Grouped variable screening for ultrahigh dimensional data under linear model

Author

Qiu, Debin

Contributor

Ahn, Jeongyoun Advisor
Ji, Pengsheng Committee Member
McCormick, William Committee Member
Wang, Lily Committee Member

College or School

Statistics

Date

2016

Publisher

University of Georgia

Content Type

Dissertation

Language

English

Dissertation/Thesis Note

Doctoral

Degree Type

Doctor of Philosophy (PhD)

Name of Granting Institution

University of Georgia, Spring 2016

Year Degree Granted

2016

Keywords

grouped variables; grouped variable selection; grouped variable screening; marginal correlation learning; penalized regression; random permutation; sure screening property

System Control Number

9949334026602959
