Abstract
Untargeted metabolomics studies measure tens of thousands of features in a single biological sample, yet most detected features correspond to unknown compounds. This creates a great need for a reliable approach to identifying unknown compounds. One factor contributing to the large number of unknowns is that metabolomics studies usually apply genetics and pathway mapping only after analytical measurements are collected. The problem with this approach is that unknown spectral features are challenging to resolve outside the context of a pathway. Here, we place genetic strain selection before data collection, so that established and hypothesized pathways can put unknown spectral features into context and help narrow the possibilities during compound identification. The model organism Caenorhabditis elegans is used to develop, test, and validate a pipeline to identify unknown metabolites. First, culturing and assaying large mixed-stage C. elegans populations in large-scale culture plates yields enough animals to collect phenotypic and population data along with analytical chemical data. This method standardizes culturing conditions, which is crucial for reproducible data. Second, three disparate study groups of C. elegans strains (genetically distinct natural strains, primary metabolism mutants, and secondary metabolism mutants) are compared to showcase how an augmented design coupled with meta-analysis effectively handles known obstacles to comparing data across long-term metabolomics studies. Our approach overcomes technical obstacles including non-linear batch variation, limited overlap in technology coverage, instability of spectral features, and challenging statistical analysis caused by heteroscedasticity. This project demonstrates the importance of pipeline validation and proper study design, which yield reliable data for downstream unknown compound identification and metabolic pathway interpretation.