Abstract
A quantitative understanding of local transmission dynamics is crucial for designing effective disease prevention strategies. Since the COVID-19 pandemic, large-scale sequencing has enabled real-time mutation surveillance, helping to track emerging variants and inform public health policies. Genomic data also provide high-resolution insights into transmission patterns, offering a detailed view of local-scale disease spread. However, existing methods often struggle to process large-scale genomic datasets efficiently, limiting their ability to extract meaningful epidemiological insights. These challenges highlight the need for scalable statistical and computational approaches. This dissertation develops bioinformatic tools and computational frameworks to reconstruct viral introductions and local transmission patterns. Through tools like Subsamplerr, Clusterfinder, and TTAT, we improve dataset representativeness, detect viral introductions, and quantify phylogeny-trait associations. In addition, spatial transmission count statistics, the Source-Sink Score, and the Local Import Score allow us to quantify transmission dynamics and identify viral sources and sinks. Integrating these metrics into a Bayesian phylodynamic framework further improves the assessment of uncertainty. Applying these methods, we examine how regional heterogeneity, particularly urban-rural differences, shapes viral spread in Texas, using over 12,000 full genomes and linked epidemiological data. Our findings indicate that urban centers acted as primary epidemic sources, closely linked to the global pandemic, while rural outbreaks were largely driven by repeated introductions. A more detailed analysis of 26,000 SARS-CoV-2 genomes in Greater Houston identified over 1,000 independent introduction events. Most introductions were domestically sourced, while earlier international introductions were associated with larger cluster sizes. An analysis of locally circulating clusters revealed age-structured transmission dynamics. Geographic reconstruction of cluster spread identified Harris County as the primary viral source for surrounding areas. Together, these results show that this dissertation provides data-driven approaches for monitoring infectious diseases, guiding targeted control measures, and strengthening public health responses to ongoing and future pandemics.