Abstract
Heparin (HP) and heparan sulfate (HS) are linear glycans that undergo a complex variety of modifications and are known to play important roles in human development and disease. Structural characterization of highly sulfated glycosaminoglycans such as HP/HS by LC-MS/MS remains challenging because of extensive sulfate loss and the difficulty of separating these molecules. Our lab has introduced an LC-MS/MS method for HP/HS structural sequencing that uses chemical derivatization to replace sulfate groups with more stable markers. However, the MS data produced by these experiments cannot be annotated with existing glycomics software. The work presented here describes GAG-ID, the first database-driven search engine for such data. GAG-ID uses the multivariate hypergeometric distribution to score an experimental MS/MS spectrum against theoretical spectra generated by GAG-DB. A defined mixture of twenty-one synthesized tetrasaccharides, as well as longer HS oligosaccharides, was used to evaluate the performance of GAG-ID. To assess the confidence of HS oligosaccharide identifications, a multivariate expectation-maximization (EM) model was designed to estimate the observed distribution of GAG-ID scores. The results show that this analysis makes it possible to filter large numbers of GAG-ID search results at predictable false identification rates. To further improve the interpretability and reproducibility of the reported data, a GAG-ID plugin has been developed for integration into the GRITS toolbox. Coupled with the GAG-ID search engine, the multivariate EM model demonstrated the capacity for high-throughput HP/HS sequencing, which could help uncover the rich information encoded in HP/HS chains and support the development of HS-based drugs targeting a broad spectrum of diseases.
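The abstract names the multivariate hypergeometric distribution but does not spell out GAG-ID's scoring function. The following is a minimal sketch, assuming that m/z space is divided into bins and that the bins are partitioned into categories (hypothetically: bins matching theoretical glycosidic fragments, bins matching theoretical cross-ring fragments, and bins matching no theoretical fragment). It computes the probability of the observed per-category peak counts and converts it into a score as -log10 P. The function names `log_mv_hypergeom_pmf` and `match_score`, the three-way partition, and the example counts are illustrative assumptions rather than GAG-ID's actual implementation. A production scorer might also use a tail probability instead of the point probability used here.

```python
import math
from typing import Sequence


def log_comb(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k), via lgamma for large n."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def log_mv_hypergeom_pmf(population: Sequence[int], drawn: Sequence[int]) -> float:
    """ln P(drawn) under the multivariate hypergeometric distribution.

    population[i]: number of items (e.g. m/z bins) in category i.
    drawn[i]:      number of sampled items (e.g. observed peaks) in category i.
    P = prod_i C(K_i, k_i) / C(N, n), with N = sum(K_i) and n = sum(k_i).
    """
    if len(population) != len(drawn):
        raise ValueError("population and drawn must have the same length")
    if any(k < 0 or k > K for K, k in zip(population, drawn)):
        raise ValueError("each drawn count must lie in [0, population count]")
    total, n = sum(population), sum(drawn)
    log_num = sum(log_comb(K, k) for K, k in zip(population, drawn))
    return log_num - log_comb(total, n)


def match_score(population: Sequence[int], drawn: Sequence[int]) -> float:
    """Hypothetical score: -log10 of the point probability (higher = less likely by chance)."""
    return -log_mv_hypergeom_pmf(population, drawn) / math.log(10)


if __name__ == "__main__":
    # Hypothetical counts: 1000 m/z bins split into bins matching theoretical
    # glycosidic fragments, cross-ring fragments, and no theoretical fragment.
    bins = [30, 20, 950]
    peaks = [12, 6, 32]  # 50 experimental peaks distributed over those categories
    print(f"score = {match_score(bins, peaks):.2f}")
```

Working in log space with `math.lgamma` keeps the calculation numerically stable for realistic bin counts, where the raw binomial coefficients would overflow or lose precision.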