Abstract
Reconstructing transmission networks is critical for identifying epidemiological factors such as superspreaders and high-risk locations, and for informing targeted strategies for pandemic prevention and control. This dissertation introduces two Bayesian frameworks designed to reconstruct infectious disease transmission networks by integrating genomic, temporal, and social network data. The proposed models accommodate within-host genetic diversity, unobserved infection times, incomplete sampling, latent periods, and symptom onset, significantly enhancing the precision of inferred transmission dynamics. Simulation studies demonstrate the robustness of the Bayesian transmission model without network data, which achieves high accuracy in identifying direct transmission pairs: 93% at a genome length of 1 × 10^6 and 100% at 4.4 × 10^6. Hypothesis testing reliably identifies direct transmission events, maintaining an average false positive proportion of approximately 1%. Sensitivity, however, declines with decreasing sample size owing to increased misclassification of indirect transmissions. Implementing Nelder–Mead optimization improved sensitivity by approximately 30% but raised false positives by around 10%, highlighting an inherent trade-off. Furthermore, an Exponential Random Graph Model (ERGM) fitted to the inferred transmission tree demonstrated a robust effect of social distance on transmission dynamics, revealing that each unit increase in social distance decreased transmission likelihood. Perturbation analyses with 5%, 10%, and 20% noise confirmed that the ERGM reliably captured the social-distance effect and remained robust to network uncertainty. Real-world analysis using a Bayesian model on genomic and temporal data from 93 tuberculosis cases identified 28 direct transmission pairs, highlighting limited within-neighborhood transmission.
ERGM analysis further suggested a trend toward increased transmission likelihood with greater social distance, implying that contacts outside immediate neighborhoods may drive transmission, though this association was not statistically significant and weakened with increasing network uncertainty. Notably, this study represents the first network investigation of tuberculosis transmission in an endemic region. In future work, we will use machine learning to combine GPS and cell-phone trajectory data with traditional social network data and derive personal network information, thereby refining contact probability estimation. Additionally, adopting more advanced substitution models and relaxing the assumption of a uniform effective population size may further enhance model accuracy. Leveraging parallel computing will improve computational efficiency, increasing the practicality and scalability of Bayesian methods in epidemiological research.