Abstract
The epigenome consists of proteins and DNA modifications that influence how genetic information is used. This allows for varied gene expression across cell and tissue types despite
all cells sharing the same DNA sequence. This work explores the role of epigenomic data in
enhancing our understanding of plant genomes. It includes refining genome annotations and
studying the evolution of regulatory sequences in various grass species.
In the first study, we employed chromatin immunoprecipitation sequencing (ChIP-seq) to pinpoint
transcriptional units in plant genomes, focusing on maize (Zea mays). We identified regions with
histone modifications correlated with active transcription: one set of modifications distributed
throughout the gene body, and a second set enriched at transcriptional start sites. We used
these two types of marks in tandem to scan the genome for transcriptional units. Many of these
regions corresponded to known protein-coding genes, but we also discovered new regions distant
from any known gene annotations. We then leveraged this dataset to identify incorrectly
annotated protein-coding genes, either those that were fractured or those missing their transcription
start site. This method was then extended to other plants, revealing widespread annotation errors
across numerous plant genomes.
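The tandem use of the two mark types described above can be illustrated with a minimal sketch. It is not the pipeline used in this work: the interval coordinates, the function names `call_units` and `classify`, and the three labels are all hypothetical, and real data would first require peak and domain calling from ChIP-seq signal. The sketch only shows the interval logic: a gene-body domain counts as a transcriptional unit when it overlaps a start-site peak, and each unit is then compared against existing gene annotations.

```python
from typing import List, Tuple

# Half-open interval [start, end) on a single chromosome (illustrative only).
Interval = Tuple[int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True when two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def call_units(body_domains: List[Interval], tss_peaks: List[Interval]) -> List[Interval]:
    """Keep gene-body domains that overlap a start-site peak.

    Each retained unit is extended to cover the overlapping peak(s).
    """
    units = []
    for dom in body_domains:
        hits = [p for p in tss_peaks if overlaps(dom, p)]
        if hits:
            start = min(dom[0], min(p[0] for p in hits))
            end = max(dom[1], max(p[1] for p in hits))
            units.append((start, end))
    return units


def classify(unit: Interval, genes: List[Interval]) -> str:
    """Label a unit by how many annotated genes it overlaps (hypothetical labels)."""
    n = sum(overlaps(unit, g) for g in genes)
    if n == 0:
        return "unannotated"
    if n == 1:
        return "consistent"
    return "possibly_fractured"


# Made-up coordinates, purely for illustration.
body_domains = [(1000, 5000), (8000, 9000), (12000, 20000), (30000, 34000)]
tss_peaks = [(900, 1300), (11900, 12400), (29900, 30200)]
genes = [(1000, 5000), (12100, 15000), (16000, 19500)]

units = call_units(body_domains, tss_peaks)
labels = [classify(u, genes) for u in units]
for unit, label in zip(units, labels):
    print(unit, label)
```

In this toy input, the domain at 8000 to 9000 has no start-site peak and is dropped, the unit spanning two annotated genes is flagged as a candidate fractured annotation, and the unit with no overlapping gene is a candidate unannotated region.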
In the second and third chapters, we utilize single-cell indexed Assay for Transposase-
Accessible Chromatin sequencing (sciATAC-seq), a technique that identifies open genomic regions using a
hyperactive Tn5 transposase. These open regions, devoid of nucleosomes, often contain regulatory
sequences critical for gene expression. This method was applied to five plant species:
Zea mays, Sorghum bicolor, Panicum miliaceum, Urochloa fusca, and Oryza sativa. Using
this data, we developed novel methods to annotate cell types across diverse plant species.
Furthermore, we leverage this data to investigate the regulatory regions associated with C4
photosynthesis, a variant of photosynthesis that is more efficient in hot, arid climates than the
common C3 type. By generating genome-wide maps of chromatin regulatory regions, we
identify potentially crucial regulatory regions for the expression of C4 photosynthesis genes. In
the third chapter, this data is repurposed to explore the evolution of regulatory regions across this
sample of monocots. In short, we find conservation of the gene regulatory networks associated
with specific cell types, but massive turnover of sequence, indicating that conservation of
cell-type specificity operates at different levels in the genome.