Abstract
Metabolomics is the comprehensive study of the small molecules within a biological system. Because metabolites provide a close measure of phenotype, their study offers insight into an organism’s physiological and biochemical state. Metabolomics has therefore emerged as a highly attractive approach for studying normal physiology and physiology altered by natural diversity, genetic mutations, and disease. A typical untargeted metabolomics workflow involves experimental design, sample collection, sample preparation, data collection and processing, and analysis. While improvements in bioinformatic approaches and available reference databases have pushed untargeted metabolomics to new heights, compound identification has remained the major bottleneck over the past decade, with only a small percentage (<2%) of detected mass spectral features confidently identified. Here, we showcase how optimizing the workflow steps upstream of compound identification is essential for creating a robust pipeline for identifying unknown metabolites, and how complementary analytical technologies, when bridged together, can address some of the key limitations in compound identification. First, we show how an augmented design coupled with meta-analysis can effectively handle technical obstacles such as non-linear batch variation, spectral feature instability, and statistical analysis challenges in large-scale metabolomics experiments. Second, we show how a Taguchi design of experiments (DoE) approach can be used to optimize a range of extraction parameters quickly and efficiently in a sequential non-polar and polar extraction using reversed-phase (RP) and hydrophilic interaction liquid chromatography (HILIC) coupled to mass spectrometry (MS). Lastly, we show how semi-preparative fractionation of metabolomics samples can improve compound identification by reducing spectral overlap.
Further, we demonstrate that fractionation can concentrate metabolites that were previously too dilute to be analyzed by both NMR and LC-MS/MS, and can thus bridge the sensitivity gap between these analytical platforms, which are often difficult to integrate yet essential for the structural elucidation of unknown metabolites. While metabolite identification remains the key bottleneck in metabolomics, the work presented here demonstrates the importance of each step of the metabolomics workflow, the need to optimize each one, and the strengths that complementary analytical techniques bring to compound identification.