Abstract
Leptospirosis, caused by the bacterial spirochete Leptospira, is a neglected tropical disease that infects many mammalian hosts and causes substantial morbidity and
mortality in humans and livestock around the world. Despite the large social and
economic damage it causes, the molecular mechanisms underlying Leptospira’s pathogenicity are still
not fully understood. With the advancement of Next Generation Sequencing technologies,
modern Leptospira research has an unprecedented opportunity to further our understanding of
this complex organism. In this thesis, I tested and developed computational methods to facilitate
the understanding of Leptospira genome evolution and structural compositions across biological
scales using multi-omics approaches.
First, I used metagenomics to evaluate some of the most widely used direct read shotgun
metagenomics taxonomic profiling software and databases to determine their accuracy and
sensitivity in detecting Leptospira in biological samples. I showed that discrepancies between
the profiling results of different software and databases could lead to significant variation in the
microbial taxa classified and thus cause false positives or false negatives in the
detection of Leptospira, especially at species-level resolution.
Second, I used transcriptomics to characterize the transcriptomes of different Leptospira
serovars using the Oxford Nanopore Technologies (ONT) long-read platform. I identified novel
RNA molecules and compared the transcriptomes of pathogenic and non-pathogenic Leptospira
to identify signals of Leptospira pathogenicity. I also provided evidence for the existence of
post-transcriptional polyadenylation in the regulation of Leptospira RNA expression and for the use of
ONT sequencing without polyadenylation as a tool to improve our understanding of prokaryotic
RNA polyadenylation.
Third, I characterized a whole genome sequence dataset of Leptospira collected from
public databases to determine associations between genetic variations identified from different
Leptospira genome sequences and their pathogenicity. I designed and implemented two
automatic workflows, BactASM and BactPrep, for the cleaning, organization, annotation, and
characterization of large whole-genome sequence datasets. Using these workflows, I identified
genetic variations between and within different Leptospira species genomes. These variations
reflect the history of Leptospira’s non-clonal evolutionary mechanisms related to pathogenesis.
Overall, this thesis provides a comprehensive omics framework that furthers our
understanding of Leptospira detection and evolution at multiple biological scales.