Abstract

The genome cartology approach to genomics is a spatially explicit framework for the study of patterns in the distribution and abundance of sequence features within and among genomes. The ultimate goal of this approach is to identify the demographic and selective processes that have given rise to these extant patterns. However, tools for the accurate annotation and taxonomic assignment of sequence features must be implemented before these goals can be realized. I have designed, implemented and assessed the accuracy of novel annotation and taxonomic classification software and applied these tools to a genome cartology of maize LTR retrotransposons (LRPs).

The DAWGPAWS pipeline facilitates combined-evidence human curation of ab initio and similarity-search-based computational results. I verified the value of DAWGPAWS by using this pipeline to annotate genes and transposable elements in 220 BAC insertions from the hexaploid wheat genome. To illustrate that these techniques can scale to entire genomes, the pipeline was applied to the annotation of LRPs in the B73 maize genome, where it discovered over 31,000 intact elements.

The RepMiner suite of programs allows for the clustering of sequences into families based on networks of shared homology. I applied the RepMiner approach to the database of intact maize LRPs annotated by DAWGPAWS. RepMiner further illuminated previously identified family relationships, indicated an unrecognized split in the Huck family, and recognized over 350 new families of LRPs. Affinity-propagation-based clustering of intact LRPs identified a subset of ~500 exemplar sequences that can serve as a representative database of all maize LRPs. The exemplar database of maize LRPs was used to map the location of intact and fragmented LRPs in the assembled genome of maize. These LRPs comprise over 75% of the genome and are nonrandomly distributed, with preferential accumulation in pericentromeric heterochromatin. Surprisingly, the regions of the genome with the highest accumulation of LRPs had the lowest diversity of LRP families. These results indicate that genome cartology will provide new insight into genome dynamics, and that continued development of this approach to genomics will further enlighten the study of genome evolution.
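
The idea of clustering sequences into families from a network of shared homology can be illustrated with a minimal sketch. This is not RepMiner's actual algorithm, which the abstract does not describe. It assumes the simplest possible rule, that every connected component of the homology network forms one family, and the function name, sequence identifiers and homology pairs are all hypothetical.

```python
from collections import defaultdict


def cluster_families(sequence_ids, homology_pairs):
    """Group sequences into families, assuming one family per connected
    component of the homology network (an illustrative rule only)."""
    # Union-find: each sequence starts as its own family representative.
    parent = {seq: seq for seq in sequence_ids}

    def find(seq):
        # Follow parent links to the representative, shortening the path.
        while parent[seq] != seq:
            parent[seq] = parent[parent[seq]]
            seq = parent[seq]
        return seq

    # Each homology hit merges the two families it connects.
    for a, b in homology_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    members = defaultdict(list)
    for seq in sequence_ids:
        members[find(seq)].append(seq)
    # Return families in a stable order for easy inspection.
    return sorted((sorted(group) for group in members.values()),
                  key=lambda group: group[0])


# Hypothetical input: five elements and three pairwise homology hits.
ids = ["ltr1", "ltr2", "ltr3", "ltr4", "ltr5"]
pairs = [("ltr1", "ltr2"), ("ltr2", "ltr3"), ("ltr4", "ltr5")]
families = cluster_families(ids, pairs)
print(families)
```

In this toy input, ltr1 and ltr3 share no direct hit but land in the same family through ltr2, which is the property that distinguishes network-based grouping from pairwise thresholding.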
