Abstract
Public health agencies are entrusted with the monumental task of identifying emerging human pathogens and performing outbreak surveillance to limit the spread of these pathogens within the population. Samples often contain minuscule amounts of multiple strains of target pathogens mixed into a complex background. To improve public health surveillance, various enrichment methods have been developed to identify and characterize target DNA within metagenomic samples. Multiplex polymerase chain reaction (PCR) and hybridization capture are common enrichment methods used to increase the proportion of target DNA present. Implementing these target enrichment methods can reduce costs while increasing the efficiency of surveillance research. Here, we present three studies that advance our understanding of pathogen mixtures: 1) a comparison of three previously developed PCR enrichment methods for SARS-CoV-2 using mock wastewater communities, 2) the development of new hybridization capture baits to enrich the whole genomes of human-infecting Cryptosporidium spp., and 3) the development of a novel bioinformatics pipeline (StrainSort) to detect variants and estimate their abundance within sample mixtures. In the first study, we found that the ARTIC V4 approach yielded the most accurate SARS-CoV-2 variant identification and abundance estimates. In the second study, we determined that our baits increase the proportion of target reads by more than 1,000x for single enrichments and 10,000x for double enrichments. The enriched libraries not only yield higher genome coverage at greater depth, but also dramatically improve the detection and correct assignment of sequences to Cryptosporidium spp. in mock communities and real-world human fecal DNA samples. Finally, we introduce a novel analysis pipeline, StrainSort, that combines commonly used bioinformatics tools to separate sequence reads by strain and perform phylogenetic analysis on the recovered strains.
The StrainSort pipeline can resolve mixed infections or multi-strain mixtures within a sample, allowing researchers and public health officials to identify emerging human pathogens and conduct surveillance using samples that were previously too challenging to analyze. Together, these tools increase the information available for understanding emerging pathogens and support better-informed public health decisions, thereby reducing the impact of these pathogens on populations.