Abstract
In this thesis, we focus on developing novel statistical and machine learning methods for network data analysis, with emphasis on both appealing statistical properties and computational efficiency. In particular,
the first project studies the problem of network cross-validation in graphon estimation. A graphon, short
for graph function, provides a generative model for networks. The success of most graphon estimation
methods depends on a proper specification of hyperparameters. Existing network cross-validation methods
suffer from restrictive model assumptions, expensive computational costs, and a lack of theoretical
guarantees. To address these issues, we propose a masked mirror validation (MMV) method. The second
project studies the problem of network sampling. In recent decades, many large graphs with millions of
nodes have been collected or constructed. Their high computational cost and the difficulty of visualizing them hinder the analysis of such graphs. To overcome the computational challenge of a large graph, we
propose a graph subsampling algorithm, the Ollivier-Ricci curvature Gradient-based subsampling (ORG-sub)
algorithm, which employs Riemannian geometric information. The superiority of the proposed
methods is demonstrated by experiments on synthetic and real-world data. The third project develops and
applies network analysis methods to analyze transnational advocacy networks (TANs). We build a dataset
of 3,903 NGOs connected through 1.3 million ties formed at meetings and conferences for
NGOs organized or coordinated by the United Nations. Using community detection methods, we identify four distinct communities in the overall NGO network, with differences in the distributions of brokerage
roles across communities. These findings help us better understand how TANs simultaneously provide social
power and exacerbate global inequalities.