Abstract
A prokaryote is a single-celled organism whose genome takes the form of a single continuous string of the letters A, C, G and T, representing the four nucleotide bases of a DNA strand. Since the first whole-genome sequencing of Haemophilus influenzae in 1995, sequencing technology and the computational study of complete genomes have developed extensively. Among the many open problems, identifying and understanding repetitive patterns is still not fully resolved. In this dissertation, we present a novel method for detecting pairs of motifs separated by spacers of variable nucleotide sequence but conserved length. Unlike existing methods for motif discovery, the method does not require the motifs themselves to occur at unusually high frequency, only to exhibit a significant preference for occurring at a specific distance from each other. The method was used to scan 569 complete prokaryotic genomes for novel sequence motifs. The results suggest that the method detects previously known motifs that are expected to be found, as well as new motifs that are unlikely to be discovered by traditional motif-finding methods. Monte Carlo simulations were then conducted to explore important statistical properties of the spaced motif detection methodology, such as its false positive rate and power. Periodic spacing of A-tracts has been associated with intrinsic DNA curvature, whose physiological role in prokaryotes is not fully understood. One hypothesis centers on a possible role of intrinsic DNA bends in nucleoid compaction. We use comparative genomics to investigate a possible relationship between A-tract periodicity and nucleoid-associated proteins in prokaryotes. We found that genomes with DNA-bridging proteins tend to exhibit stronger A-tract periodicity, presumably indicative of more prevalent intrinsic DNA curvature. A weaker relationship was detected for nucleoid-associated proteins that do not form DNA bridges.
We consider these results an indication that intrinsic DNA curvature acts collaboratively with DNA-bridging proteins in maintaining the compact structure of the nucleoid.
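To make the idea of a conserved spacer length concrete, the following is a minimal illustrative sketch in Python, not the statistical method developed in the dissertation. It tallies, for one chosen pair of motifs, how often the second motif begins at each distance after the first one ends. The function name `spacer_counts`, its parameters, and the toy sequence are assumptions introduced here for illustration only; a real analysis would also need a null model (for example, Monte Carlo shuffling) to judge whether a peak at one spacer length is significant.

```python
from collections import Counter


def spacer_counts(genome, motif_a, motif_b, max_spacer):
    """Count occurrences of motif_b starting exactly s bases after the end
    of motif_a, for every spacer length s in 0..max_spacer.

    Hypothetical helper for illustration; exact string matching only.
    """
    counts = Counter()
    start = genome.find(motif_a)
    while start != -1:
        end_a = start + len(motif_a)
        for s in range(max_spacer + 1):
            # str.startswith with an offset avoids building slices.
            if genome.startswith(motif_b, end_a + s):
                counts[s] += 1
        # Step forward by one base so overlapping occurrences are kept.
        start = genome.find(motif_a, start + 1)
    return counts


# Toy sequence (made up): both motif pairs are separated by 4 bases,
# although the spacer sequences themselves differ (AAAA vs. CCCC).
toy = "TTGACAAAAATATAATCCTTGACACCCCTATAAT"
print(spacer_counts(toy, "TTGACA", "TATAAT", max_spacer=6))
```

In this toy case all of the counts fall on a single spacer length, which is the kind of distance preference the abstract describes, independent of how frequent either motif is on its own.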