Abstract
Aneuploidy, the gain or loss of chromosomes in a cell, has major effects in eukaryotes and is linked to cancer and genetic diseases in humans. Aneuploidy is common in the wine yeast, Saccharomyces cerevisiae, despite the fact that the number of chromosomes in this species has not changed in 100 million years. S. cerevisiae is a strong model system for studying aneuploidy because genome data are widely available from geographically, genetically, and environmentally diverse strains. My thesis focuses on disentangling the genetic and environmental factors that affect variation in aneuploidy frequency among individuals. To do that, I used yeast genome data to: benchmark the computational detection of aneuploidy from whole-genome sequencing data (chapter 2); quantify the variation in the frequency of aneuploidy, considering environmental and genomic factors (chapter 3); and identify genetic variants associated with chromosome gain in S. cerevisiae (chapter 4).
Two computational approaches are commonly used to detect copy number variation from short-read genomic data: one uses read depth and the other uses B-allele frequency. I tested six tools on a variety of genomes from S. cerevisiae and the pathogenic yeast Candida albicans. I showed that read-depth tools have high accuracy regardless of heterozygosity levels, but are sensitive to sequencing quality. In contrast, the accuracy of B-allele frequency tools is low in homozygous genomes, but remains high regardless of sequencing quality. A combination of both approaches works best for characterizing individual genomes, but read-depth approaches are better for systematic studies in species with homozygous individuals.
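As a rough illustration of the two signals described above, the following minimal Python sketch estimates per-chromosome copy number from read depth and computes B-allele frequencies from allele counts. It is not one of the six tools benchmarked in the thesis: the function names, the toy depth values, and the assumption that most chromosomes are at the baseline ploidy (so the median of chromosome medians reflects the euploid depth) are all hypothetical choices made for this example.

```python
from statistics import median


def copy_number_from_depth(chrom_depths, base_ploidy=2):
    """Estimate copy number per chromosome from windowed read depths.

    Assumes most chromosomes are at base_ploidy, so the median of the
    per-chromosome medians approximates the euploid depth.
    """
    chrom_medians = {c: median(d) for c, d in chrom_depths.items()}
    euploid_depth = median(chrom_medians.values())
    return {
        c: round(base_ploidy * m / euploid_depth)
        for c, m in chrom_medians.items()
    }


def b_allele_frequencies(allele_counts):
    """Convert (ref, alt) read counts at variable sites to alt-allele fractions.

    Sites with no coverage are skipped.
    """
    return [alt / (ref + alt) for ref, alt in allele_counts if ref + alt > 0]


# Toy data (made up): chrIII has about 1.5x the depth of the other chromosomes.
depths = {
    "chrI": [98, 102, 100, 101],
    "chrII": [100, 99, 103, 97],
    "chrIII": [151, 148, 150, 152],
}
copy_numbers = copy_number_from_depth(depths)

# Toy heterozygous sites on chrIII (made up), plus one uncovered site.
bafs = b_allele_frequencies([(20, 10), (10, 20), (0, 0)])
```

In a diploid genome, heterozygous sites on a disomic chromosome are expected to cluster near 1/2, while on a trisomic chromosome they cluster near 1/3 and 2/3. A homozygous genome has few or no heterozygous sites, so the B-allele signal is largely absent there, whereas the depth ratio depends on even coverage and is therefore more exposed to sequencing quality.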
Using publicly available whole-genome sequencing data from 1,000 S. cerevisiae strains, I tested for associations between genetic background, ecology, and aneuploidy prevalence. I used a combination of population genomics, phylogenomics, and regression models to disentangle the relationship between ecology, genetics, and aneuploidy. My results show that genetic lineage is a better predictor of chromosome gain than ecology. I also showed that the predicted probability of chromosome gain is higher in some human-associated lineages, and that this increase in aneuploidy prevalence arose independently multiple times in S. cerevisiae.
Finally, I employed genome-wide association studies to identify genetic variants (SNVs and indels) associated with chromosome gain in diploid S. cerevisiae. This analysis showed a correlation between genetic variants linked to aneuploidy and population structure, which is a common issue in genome-wide association studies. After excluding individuals in strongly structured lineages, I identified 25 SNVs associated with chromosome gain. These SNVs are in genes whose reported functions are seemingly unlinked to aneuploidy. Nonetheless, future studies can investigate whether and how these genes affect aneuploidy prevalence and tolerance in strains from specific genetic backgrounds.