Statistical and Deep Learning for Data Objects with Non-Euclidean Metrics

Kang, Ilsuk

Abstract

In this era of Big Data, large-scale data storage motivates statisticians to analyze new types of data. Standard statistical techniques based on the Euclidean metric are typically not designed to handle these new types of data. Because extracting useful information from vast amounts of complex data is critical in modern research, there is a strong need to develop new approaches with non-Euclidean metrics for analyzing highly structured data. Among the complex data emerging in various fields of science, our research focuses on the analysis of data objects.

The first topic focuses on classification problems when predictors are observed as, or aggregated into, histograms. Because conventional classification methods take vectors as input, a natural approach converts histograms into vector-valued data using summary values, such as the mean or median. However, this approach forgoes the distributional information available in histograms. To address this issue, we propose a margin-based classifier called the support histogram machine (SHM) for histogram-valued data. We adopt the support vector machine framework and use the Wasserstein-Kantorovich metric to measure distances between histograms. The proposed optimization problem is solved by a dual approach. We then test the proposed SHM on simulated and real examples and demonstrate that it outperforms summary-value-based methods.

In the second topic, we propose two clustering methods based on the Fréchet distance for longitudinal data: Multivariate Fréchet K-means (MFKmL) and Sparse Fréchet K-medoids (SFKmL). The Fréchet distance is a useful tool for measuring the similarity between trajectories based on their shapes. The MFKmL method follows the standard K-means algorithm with the Fréchet distance in multiple dimensions, and the SFKmL method introduces sparsity with the variable-wise Fréchet distance in the K-medoids algorithm.
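
To illustrate how a Fréchet-type distance compares trajectories by shape, the following is a minimal sketch of the discrete Fréchet distance computed by dynamic programming. It is an illustration only: the function name `discrete_frechet` and the representation of a trajectory as a list of (time, value) points are assumptions made here, and the dissertation may use a different variant of the Fréchet distance.

```python
from math import dist


def discrete_frechet(p, q):
    """Discrete Fréchet distance between two trajectories.

    p and q are non-empty lists of points (tuples of equal dimension),
    e.g. (time, value) pairs. This is an illustrative sketch, not the
    dissertation's implementation.
    """
    n, m = len(p), len(q)
    if n == 0 or m == 0:
        raise ValueError("both trajectories must be non-empty")

    # ca[i][j] holds the coupling distance between the prefixes
    # p[0..i] and q[0..j].
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = dist(p[i], q[j])  # Euclidean distance between two points
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                # Advance along p, along q, or along both, keeping the
                # cheapest way to reach (i, j).
                best_prev = min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1])
                ca[i][j] = max(best_prev, d)
    return ca[n - 1][m - 1]


# Two trajectories with the same shape, shifted vertically by one unit.
a = [(0, 0), (1, 1), (2, 0)]
b = [(0, 1), (1, 2), (2, 1)]
shift_distance = discrete_frechet(a, b)
```

In this toy example the two trajectories differ only by a vertical shift of one unit, so `shift_distance` equals 1.0. A pairwise distance of this kind could then be supplied to a K-means or K-medoids style procedure.
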
A simulation study suggests that SFKmL outperforms MFKmL and an existing clustering method. Moreover, the real data analysis using SFKmL provides a clustering result that is interpretable from a clinical perspective.

Lastly, we propose a conditional distribution estimator in a regression setting with a histogram-valued target variable and vector-valued covariates. By partitioning the support of the target variable, the cumulative relative frequencies associated with the histogram bins are obtained and embedded into the output layer of a neural network. We explore the prediction performance of the proposed method in various simulation settings.
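
As a rough illustration of the partitioning step, the sketch below converts one histogram into a vector of cumulative relative frequencies evaluated on a grid that partitions the support. The function names `histogram_cdf` and `cumulative_targets`, the grid, and the assumption that mass is spread uniformly within each bin are all hypothetical choices made for this example; the abstract does not specify how the dissertation constructs these values or how the network consumes them.

```python
def histogram_cdf(edges, freqs, x):
    """Cumulative relative frequency of a histogram at the point x.

    edges: increasing bin edges (len(freqs) + 1 values).
    freqs: non-negative counts or relative frequencies per bin.
    Assumes mass is uniformly distributed within each bin
    (an assumption of this sketch).
    """
    if len(edges) != len(freqs) + 1:
        raise ValueError("need exactly one more edge than bins")
    total = sum(freqs)
    if total <= 0:
        raise ValueError("histogram must have positive total mass")

    acc = 0.0
    for lo, hi, f in zip(edges[:-1], edges[1:], freqs):
        if x >= hi:
            acc += f  # the whole bin lies at or below x
        elif x > lo:
            acc += f * (x - lo) / (hi - lo)  # partial bin, linear interpolation
            break
        else:
            break
    return acc / total


def cumulative_targets(edges, freqs, grid):
    """Cumulative relative frequencies at each grid point of the partition."""
    return [histogram_cdf(edges, freqs, g) for g in grid]


# One histogram-valued response with bins [0, 1), [1, 2), [2, 4].
targets = cumulative_targets([0, 1, 2, 4], [2, 1, 1], [0, 1, 2, 3, 4])
```

Here `targets` is `[0.0, 0.5, 0.75, 0.875, 1.0]`. A vector like this, one per observation, could serve as the training target for an output layer with one unit per grid point.
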

Details

Record ID

4611

Record Created

2024-12-05

Title

Statistical and Deep Learning for Data Objects with Non-Euclidean Metrics

Author

Kang, Ilsuk

Contributor

Park, Cheolwoo (Advisor)
Liu, Liang (Committee Member)
Ke, Yuan (Committee Member)
Strait, Justin (Committee Member)

College or School

Franklin College of Arts and Sciences

Department

Statistics

Subjects

Statistics

Content Type

Dissertation

Pagination

129

File Format

pdf

Language

English

Degree Type

Doctor of Philosophy (PHD)

Name of Granting Institution

University of Georgia

Year Degree Granted

2021-08

Keywords

Deep learning; Fréchet distance; Histogram-valued data; K-means; Support Vector Machines; Wasserstein-Kantorovich distance

Record Appears in

Electronic Theses and Dissertations > Doctoral Dissertation
Franklin College of Arts and Sciences
All Resources
Doctoral

System Control Number

9949391256702959
