Abstract
Non-coding RNAs (ncRNAs) are RNA molecules that lack the potential to produce proteins. The computational detection of ncRNA genes in genomic backgrounds requires capturing the signals that distinguish ncRNAs. However, unlike for protein-coding genes, strong, universal sequence signals for identifying ncRNAs have not yet been discovered. RNA secondary structure has been widely used as an exploitable feature for identifying ncRNA genes. Several structure-based approaches have been developed under the traditional Boltzmann ensemble of secondary structures, but their performance varies across ncRNA species. This mixed success suggests that some features of ncRNA sequences may not be captured by the canonical secondary structure space defined by the Boltzmann structure ensemble.

In this dissertation, I introduce novel models of RNA secondary structure that narrow this space by incorporating structural elements favored by tertiary structures. The significant performance improvement achieved by the new models in separating ncRNAs from other sequences suggests that investigating the secondary structure space is a promising approach to designing an effective ncRNA gene-finding method.