Abstract
In livestock applications, genome-wide association studies and genomic selection are regularly conducted using purebred populations. Estimation, and often validation, of SNP effects is carried out primarily with purebred animals. This process has been successful when the estimated SNP effects are used to predict genomic breeding values of animals of a similar breed. However, it fails to varying degrees when these SNP estimates are used for genomic prediction in other breeds or in crossbred animals. Current approaches for dealing with admixed and crossbred populations in genomic selection rely on using different groups of pooled animals in the training and validation sets; they are therefore data dependent and often reduce accuracy for animals in the purebred populations. In an admixed population, or in the presence of crossbred animals, methods based on pooled data assume that SNP effects are the same across breeds or sub-populations. This assumption is inaccurate because several parameters, such as allele frequencies, strength of linkage disequilibrium, and linkage phase, change across sub-populations. To remedy the problem, we proposed a multi-compartment model in which the effect of an SNP can differ between breeds and is parameterized as a function of its effect in one of the breeds in the pooled population through a one-to-one mapping function. A simulation study showed that the proposed multi-compartment model is clearly superior to the pooled-breed approach because it accounts for the difference in SNP effects across divergent lines. Its superiority over the pooled-data approach ranged from approximately 17 to 47% and increased as the divergence between lines increased. However, the proposed multi-compartment model suffers from the high dimensionality of the unknown parameters: an extra parameter must be estimated per SNP and per component of the admixed population.
Although the model works well when the number of animals in each breed is reasonable, its performance degrades as the number of animals in some lines decreases, making the estimation of their corresponding SNP effects numerically unstable and, in extreme cases, statistically inefficient (severely biased). To overcome this problem, we proposed not to estimate a mapping parameter for each SNP but rather to model the mapping as a function of information already available in the genotype data via a hierarchical structural model. In this study, the genetic difference between lines was modeled as a function of the change in linkage disequilibrium and the potential change in linkage phase.
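The contrast between the pooled, multi-compartment, and hierarchical formulations can be sketched in notation. This is an illustrative sketch only and not necessarily the parameterization used in this work. All symbols are assumed for illustration: $y_{ik}$ is the phenotype of animal $i$ in line $k$, $x_{ikj}$ its genotype at SNP $j$, $\beta_{jk}$ the effect of SNP $j$ in line $k$, $f_{jk}$ the mapping function, $\delta_{jk}$ a mapping parameter, $r_{jk}$ a signed linkage disequilibrium measure, and $g$ with parameters $\theta_k$ the structural function. The multiplicative form of the mapping, in particular, is an assumption chosen for simplicity.

```latex
\begin{align}
  % Line-specific marker model for animal i in line k (k = 1, ..., K)
  y_{ik} &= \mu_k + \sum_{j=1}^{p} x_{ikj}\,\beta_{jk} + e_{ik} \\
  % Pooled-data assumption: one effect per SNP, shared by all lines
  \beta_{jk} &= \beta_{j} \quad \text{for all } k \\
  % Multi-compartment model: effect in line k is a one-to-one function
  % of the effect in a reference line (line 1). Assumed multiplicative
  % form, invertible as long as \delta_{jk} \neq 0
  \beta_{jk} &= f_{jk}(\beta_{j1}) = \delta_{jk}\,\beta_{j1},
      \qquad \delta_{j1} = 1 \\
  % Hierarchical structural model: the mapping parameter is not estimated
  % freely but predicted from genotype-derived quantities, here a signed
  % LD measure whose magnitude reflects LD strength and whose sign
  % reflects linkage phase
  \delta_{jk} &= g\!\left(r_{j1},\, r_{jk};\, \theta_k\right)
\end{align}
```

Under these assumptions, the multi-compartment form adds $p(K-1)$ free mapping parameters for $p$ SNPs and $K$ lines, whereas the hierarchical form replaces them with the much smaller set $\theta_k$, which is what makes estimation feasible when some lines contain few animals.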