Abstract
Flowering plants vary tremendously in their nuclear genome sizes, chromosome numbers and frequencies of genomic rearrangement, but changes in genic content are relatively rare. The research goal of this dissertation is to investigate the genetic basis for the great variation in morphological and physiological features among plants. However, precise genome comparison among plant species remains challenging because of their genomic complexity and lability. To minimize the errors introduced by mis-annotations and incomplete genome assemblies, we designed a set of filtering steps and identified high-confidence gene content differences between Arabidopsis and grasses. The gene content of flowering plants was shown to be very stable: Arabidopsis and the investigated grasses differ by only 6% to 8%, even though they diverged from a common ancestor 150 to 200 million years ago. A positive correlation was observed between divergence time and gene presence/absence variation between species.
A further step is to investigate differences in gene family content between species, because the complete loss or gain of a family is most likely to have dramatic effects on the function of a genome. Our study identified 2,357 gene families that were shared by all of the investigated plants. Functional categorization and enrichment tests identified several GO terms that are over-represented in the dynamically changing gene and gene family sets, suggesting one origin for the adaptive evolution of plant genomes.
Positive selection is an important source of evolutionary novelty. To understand how genetic diversity is created and maintained, we studied genomic loci that are targeted by positive selection in Arabidopsis. In total, 4,478 genes were identified as exhibiting positive selection, and they were mostly enriched for functions related to defense responses. Knockout mutants for three of 26 randomly sampled positively selected genes of unknown function showed unusually high sensitivity to Pseudomonas syringae infection, compared with none of 29 controls. Therefore, we propose that the selection profile of a gene can be used as a guide to infer its possible function, and that the positively selected genes identified in this study can serve as candidates to facilitate future resistance gene discovery and study.