Abstract
Network data analysis is an essential topic in the field of statistical learning, with ubiquitous applications in social science, physics, biology, and other areas. In the first part of this dissertation, we propose a set of novel models and algorithms for community detection in networks with node attributes, and we study them both theoretically and experimentally. In the second part, we answer a fundamental question in network data analysis: testing for the existence of communities. To this end, we propose the Peak dEnsity raTio (PET) statistic. An experimental study with simulated networks and real-world benchmark data sets shows that our approach can effectively distinguish between the presence and the absence of communities. Finally, we apply a generalized community detection method to phylogenomic data under a mixture multispecies coalescence model to understand the evolutionary history of species, which is often described as a phylogenetic network. The generalized method successfully reconstructs the phylogenetic network from phylogenomic data.