Abstract
Prokaryotic genomes are diverse in their nucleotide and oligonucleotide composition, as well as in the presence of various sequence features that can affect the physical properties of the DNA molecule. We present a survey of local sequence patterns that have the potential to promote non-canonical DNA conformations (i.e., conformations different from the standard B-DNA double helix) and interpret the results in terms of relationships with organisms' habitats, phylogenetic classifications, and other characteristics. Our work differs from earlier similar surveys not only by investigating a wider range of sequence patterns in a large number of genomes but also by using a more realistic null model to assess significant deviations. Our results show that simple sequence repeats and Z-DNA-promoting patterns are generally suppressed in prokaryotic genomes, whereas palindromes and inverted repeats are overrepresented. Representation of patterns that promote Z-DNA and intrinsic DNA curvature increases with increasing optimal growth temperature (OGT) and decreases with increasing oxygen requirement. The observed relationships with environmental characteristics, particularly OGT, suggest possible evolutionary scenarios of structural adaptation of DNA to particular environmental niches. As a natural next step, we develop software for identifying specific occurrences of structure-related patterns and regulatory motifs that are under selective constraints, which would indicate a physiological role for such patterns. This is achieved in two major steps. First, the program finds orthologous sites matching the given sequence pattern in a collection of related genomes. Second, the level of pattern conservation is evaluated by comparing the information entropy within each pattern occurrence with that of its immediate flanking sequences in the multiple sequence alignment of the orthologous sites.
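The entropy comparison in the second step can be illustrated with a minimal sketch. It assumes the orthologous sites have already been found and aligned, uses per-column Shannon entropy in bits, and ignores gap characters. The function names, the flank width, and the simple difference score are hypothetical choices made for illustration and are not taken from the program itself.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of one alignment column.

    Assumption: only A, C, G, T are counted; gaps and other symbols are ignored.
    """
    residues = [c for c in column.upper() if c in "ACGT"]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def entropy_contrast(alignment, pat_start, pat_end, flank=5):
    """Compare mean entropy inside a pattern with that of its flanks.

    alignment: list of equal-length aligned sequences (orthologous sites).
    pat_start, pat_end: half-open column range of the pattern occurrence.
    flank: number of columns taken on each side (hypothetical default).
    Returns (pattern_mean, flank_mean, flank_mean - pattern_mean).
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    if not 0 <= pat_start < pat_end <= length:
        raise ValueError("pattern range is outside the alignment")

    # Entropy of every column, computed once.
    entropies = [
        column_entropy("".join(seq[i] for seq in alignment))
        for i in range(length)
    ]

    pattern_cols = range(pat_start, pat_end)
    flank_cols = list(range(max(0, pat_start - flank), pat_start)) + list(
        range(pat_end, min(length, pat_end + flank))
    )
    if not flank_cols:
        raise ValueError("no flanking columns available for comparison")

    pattern_mean = sum(entropies[i] for i in pattern_cols) / len(pattern_cols)
    flank_mean = sum(entropies[i] for i in flank_cols) / len(flank_cols)
    return pattern_mean, flank_mean, flank_mean - pattern_mean


# Toy alignment: a conserved GAATTC palindrome (columns 3-8) with variable flanks.
toy_alignment = [
    "ACGGAATTCTGA",
    "CATGAATTCACG",
    "GGAGAATTCGTC",
    "TTCGAATTCCAT",
]
pattern_mean, flank_mean, contrast = entropy_contrast(toy_alignment, 3, 9, flank=3)
```

A positive contrast means the pattern columns vary less than their surroundings, which is the kind of signal that would be consistent with selective constraint. A real analysis would also need to account for phylogenetic relatedness of the genomes and for statistical significance, neither of which this sketch attempts.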
The new tools have been demonstrated in several pilot studies, including analyses of palindromic sequence patterns and intrinsically curved segments in Campylobacter and of σ54 binding site motifs in Salmonella and E. coli. Our methodology for investigating the evolution of regulatory motifs is an important step towards understanding the evolution of regulatory networks and how organisms adapt to changing conditions or environments. The program for detecting sequence patterns that are under selective constraint can serve as an exploratory and hypothesis-generating tool, which can be of significant interest to the scientific community.