A statistical learning approach to detect carriers of the HH1 haplotype in Italian Holstein Friesian cattle

S Biffani;S Chessa;A Stella;F Biscarini
2015

Abstract

The increasing availability of SNP (single nucleotide polymorphism) genotype data in livestock is stimulating the development of new data analysis strategies that can be applied in animal breeding. One possible application is the prediction of carriers of specific haplotypes, especially those that affect animal health. It is therefore convenient to have a practical and easy-to-implement statistical method for the accurate classification of individuals into carriers and non-carriers. In this paper, we present a procedure for the identification of carriers of the haplotype HH1 on BTA5 (Bos taurus autosome 5), which is known to be associated with reduced cow fertility in Holstein-Friesian cattle. A population of 1104 Holstein bulls genotyped with the 54K SNP chip was available for the analysis. There were 45 carriers (5.3%) and 1045 non-carriers (94.7%). Two complementary multivariate statistical techniques were used for the identification of haplotype carriers: Backward Stepwise Selection (BSS) to select the SNPs that best fit the model, and Linear Discriminant Analysis (LDA) to classify observations, based on the selected SNPs, into carriers and non-carriers. To explore the minimum-sized set of SNPs that correctly identifies haplotype carriers, different proportions of SNPs were tested: 2.5, 10, 15, 30, 50 and 100%. For each proportion, BSS and LDA were applied, and the classification error rate was estimated in a 10-fold cross-validation scheme. Data were split into 10 subsets. The first subset was treated as the validation set, while the model was fit on the remaining nine subsets (the training set). The overall error rate for the prediction of haplotype carriers was on average very low (~1%) in both the training and the validation datasets. The error rate was found to depend on the number of SNPs in the model and their density around the region of the haplotype on BTA5. The minimum set of SNPs needed to achieve accurate predictions was 8, with a total test error rate of 1.27%. This work describes a procedure to accurately identify haplotype carriers from SNP genotypes in cattle populations. Very few misclassifications were observed, which indicates that this is a very reliable approach for potential applications in cattle breeding.
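
The cross-validation scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' code. The function names (`k_fold_indices`, `cv_error_rate`, `fit_centroids`, `predict_centroid`), the toy genotype data and the 0/1 genotype coding are all assumptions made for illustration. A simple nearest-centroid rule stands in for the classifier; it is neither the BSS step nor the LDA used in the paper.

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cv_error_rate(X, y, fit, predict, k=10, seed=0):
    """Pooled validation error: misclassified animals over all folds / n."""
    folds = k_fold_indices(len(y), k, seed)
    errors = 0
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        # Fit on the k-1 training folds, score on the held-out fold.
        model = fit([X[i] for i in train], [y[i] for i in train])
        errors += sum(predict(model, X[i]) != y[i] for i in held_out)
    return errors / len(y)

# Placeholder classifier (assumption): nearest class centroid, NOT LDA.
def fit_centroids(X, y):
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict_centroid(centroids, x):
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda lab: dist2(centroids[lab]))

# Toy, perfectly separable data (hypothetical): 8 SNPs per animal,
# carriers coded 1 at every SNP, non-carriers coded 0.
X = [[1] * 8 if i % 5 == 0 else [0] * 8 for i in range(100)]
y = ["carrier" if i % 5 == 0 else "non-carrier" for i in range(100)]

folds = k_fold_indices(len(y))
err = cv_error_rate(X, y, fit_centroids, predict_centroid)
print(f"10-fold validation error rate: {err:.4f}")
```

Passing `fit` and `predict` as arguments keeps the fold logic independent of the classifier, so a variable-selection step followed by a discriminant classifier could be substituted without changing the cross-validation loop.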
BIOLOGIA E BIOTECNOLOGIA AGRARIA
cattle
fertility
HH1 haplotype
single nucleotide polymorphism
statistical learning
Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/296669