TRADITIONAL AND REPRESENTATION LEARNING APPROACHES FOR PROTEIN SEQUENCE ANALYSIS

Yeung, Wayland

Abstract

The analysis of protein sequence information is an important part of bioinformatics, used for high-throughput predictions of protein structure, function, and evolution. While traditional analytical methods rely on sequence alignments, recent advances in representation learning enable alternative, alignment-independent strategies. In this work, I develop and apply both alignment-based and alignment-independent approaches to analyze the protein kinase superfamily, a biomedically relevant and highly conserved class of signaling enzymes. Using a large curated sequence alignment, I characterized sequence variations of the αC-β4 loop across diverse protein kinases and identified the region as a major kinase regulatory hotspot. Using a more focused alignment, I characterized the functional evolution of tyrosine kinase families across diverse holozoan taxa and proposed a new representative phylogeny. Finally, I inferred the evolutionary relationships that connect the protein kinase superfamily to structurally divergent lipid and small-molecule kinases using an alignment-independent approach based on sequence embeddings learned from Transformer protein language models. My work provides new insights into the functional evolution of the protein kinase superfamily by combining traditional approaches with novel ones inspired by unsupervised analytical techniques from representation learning. The broad applicability of my sequence embedding-based framework is further demonstrated in pilot analyses of phosphatase enzymes and of the radical S-adenosyl-L-methionine (SAM) superfamily.

Details

Record ID

4298

Record Created

2024-12-05

Title

TRADITIONAL AND REPRESENTATION LEARNING APPROACHES FOR PROTEIN SEQUENCE ANALYSIS

Author

Yeung, Wayland

Contributor

Kannan, Natarajan (Advisor)
Li, Sheng (Committee Member)
Kennedy, Eileen (Committee Member)
Woods, Robert (Committee Member)

College or School

Franklin College of Arts and Sciences

Department

Genetics

Subjects

Bioinformatics
Artificial intelligence

Content Type

Dissertation

Pagination

182

File Format

pdf

Language

English

Degree Type

Doctor of Philosophy (PHD)

Name of Granting Institution

University of Georgia

Year Degree Granted

2022-05

Keywords

deep learning; evolution; protein kinase; representation learning; sequence analysis; Biochemistry

System Control Number

9949450526802959
