Abstract
Genome projects continue to produce large quantities of sequence data. Annotating this sequence data to indicate the locations of genes, start and stop codons, inverted and direct repeats, and other patterns of interest is a challenging problem. In this thesis we present three contributions toward solving this problem. First, we have analyzed several gene-finding programs for the fungus N. crassa, applying both standard metrics and new metrics that we have defined. Next, we have developed a general tool that can automatically evaluate any gene-finding program and report performance metrics. Finally, we have developed an Interactive Pattern Search Tool (IPST) to facilitate finding complex patterns in nucleotide sequence data. The hash-table-based approach employed in IPST is compared with the suffix tree approach to pattern search. IPST is applied to the problems of locating Long Terminal Repeat (LTR) retrotransposons and Miniature Inverted-repeat Transposable Elements (MITEs) in rice sequences.
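
The abstract does not describe how IPST is implemented, so the following is only a minimal sketch of what a hash-table-based search for direct and inverted repeats might look like. It assumes fixed-length words (k-mers) and exact matching; the function names and the choice of k are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict

# Translation table for complementing DNA bases; any other character
# (for example N) is left unchanged. This is an illustrative assumption.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_kmer_index(seq, k):
    """Map every length-k substring of seq to the list of its start positions."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def find_direct_repeats(seq, k):
    """Return {kmer: positions} for every k-mer that occurs more than once."""
    index = build_kmer_index(seq, k)
    return {kmer: pos for kmer, pos in index.items() if len(pos) > 1}


def find_inverted_repeats(seq, k):
    """Return sorted (i, j) pairs with i < j where the k-mer at j is the
    reverse complement of the k-mer at i."""
    index = build_kmer_index(seq, k)
    pairs = []
    for kmer, positions in index.items():
        # .get avoids inserting new keys into the index while iterating over it.
        partners = index.get(reverse_complement(kmer), [])
        for i in positions:
            for j in partners:
                if i < j:
                    pairs.append((i, j))
    return sorted(pairs)


if __name__ == "__main__":
    example = "ACGTTTACGT"
    print(find_direct_repeats(example, 4))
    print(find_inverted_repeats(example, 4))
```

Building the index takes one pass over the sequence, after which each k-mer lookup is a single hash probe. A suffix tree, by contrast, supports matches of arbitrary length without fixing k in advance, at the cost of a more involved construction and a larger memory footprint.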