Unveiling the Complexity of Protein Sequence Through Effective and Interpretable Deep Learning

Zhou, Zhongliang

Abstract

Building effective representations for protein sequences has been a longstanding challenge in computational biology, necessitating sophisticated approaches for precise analysis and interpretation. This thesis capitalizes on recent advancements in protein language models (PLMs), which have revolutionized our ability to understand and interpret the complex language of proteins. Inspired by breakthroughs in natural language processing, PLMs have emerged as powerful tools for addressing intricate biological questions. The research presented here focuses on two critical areas: kinase-substrate phosphorylation prediction and protein sequence conservation. We introduce Phosformer, an innovative deep learning model that sets a new benchmark in predicting kinase-specific phosphosites with unparalleled accuracy across the entire kinome. Phosformer not only enhances the understanding of kinase-peptide interactions but also brings much-needed transparency and generalizability to these predictions. In the realm of protein sequence conservation, this thesis proposes an alignment-free method using PLMs, a significant leap from traditional alignment-based approaches, to accurately identify conserved functional sites in complex protein structures. Overall, this work contributes groundbreaking models and methodologies, advancing our understanding of kinase-substrate interactions and protein sequence conservation, with implications that extend beyond biological research to therapeutic applications, showcasing the transformative potential of PLMs in deciphering the language of proteins.

Details

Record ID

2082

Record Created

2024-12-05

Title

Unveiling the Complexity of Protein Sequence Through Effective and Interpretable Deep Learning

Author

Zhou, Zhongliang

Contributor

Li, Sheng (Advisor)
Kannan, Natarajan (Advisor)
Liu, Tianming (Committee Member)
Cai, Liming (Committee Member)

College or School

College of Engineering

Department

School of Computing

Content Type

Dissertation

Pagination

92

File Format

pdf

Language

English

Degree Type

Doctor of Philosophy (PhD)

Name of Granting Institution

University of Georgia

Year Degree Granted

2024-01

Keywords

Bioinformatics; Deep Learning; Kinase; XAI

System Control Number

9949644930402959
