CNR Institutional Research Information System

In this paper we address the Pure Parsimony Haplotyping problem: Given a set of genotypes G, find a minimum cardinality set of haplotypes which explains G. A genotype g is an n-dimensional vector in {0, 1, 2} and a haplotype h is an n-dimensional binary vector. A set of haplotypes H is said to explain G if for every g 2 G there are h1, h2 2 H such that h1 h2 = g. The position-wise sum h1 h2 indicates the genotype which has a 2 in the positions where h1 and h2 disagree, and the same value as h1 and h2 where they agree. In this paper we show the APX-hardness of the problem even when the number of "2" symbols is at most 3 for every g 2 G. We then give two Integer Linear Programming formulations. From the first one (appeared also in [9]) we derive a 2k-1-approximation algorithm when the number of symbols "2" is at most k for every g 2 G. This formulation has an exponential number of variables and constraints. The second formulation is new, and has a polynomial number of variables and constraints. Finally, we give approximation algorithms not based on Linear Programming, such as a trivial p|G|-approximation algorithm for the general case, and an effective probabilistic algorithm with a performance guarantee of 2k+1blog |G|c (1 + dln |G|e) when the number of symbols "2" is at most k for every g 2 G. The expected running time of the algorithm turns out to be almost linear in the input size. 1 Introduction A Single Nucleotide Polymorphism (SNP) is a site of the human genome (i.e., the posi- tion of a specific nucleotide), whose content shows a statistically significant variability within a population. A position is considered a SNP if for the minority of the pop- ulation (as long as it consists of at least 5% of the population) a certain nucleotide

Haplotyping populations by pure parsinomy: complexity, exact and approximation algorithms

Lancia G;Pinotti M C;Rizzi R

2004

Abstract

In this paper we address the Pure Parsimony Haplotyping problem: Given a set of genotypes G, find a minimum cardinality set of haplotypes which explains G. A genotype g is an n-dimensional vector in {0, 1, 2} and a haplotype h is an n-dimensional binary vector. A set of haplotypes H is said to explain G if for every g 2 G there are h1, h2 2 H such that h1 h2 = g. The position-wise sum h1 h2 indicates the genotype which has a 2 in the positions where h1 and h2 disagree, and the same value as h1 and h2 where they agree. In this paper we show the APX-hardness of the problem even when the number of "2" symbols is at most 3 for every g 2 G. We then give two Integer Linear Programming formulations. From the first one (appeared also in [9]) we derive a 2k-1-approximation algorithm when the number of symbols "2" is at most k for every g 2 G. This formulation has an exponential number of variables and constraints. The second formulation is new, and has a polynomial number of variables and constraints. Finally, we give approximation algorithms not based on Linear Programming, such as a trivial p|G|-approximation algorithm for the general case, and an effective probabilistic algorithm with a performance guarantee of 2k+1blog |G|c (1 + dln |G|e) when the number of symbols "2" is at most k for every g 2 G. The expected running time of the algorithm turns out to be almost linear in the input size. 1 Introduction A Single Nucleotide Polymorphism (SNP) is a site of the human genome (i.e., the posi- tion of a specific nucleotide), whose content shows a statistically significant variability within a population. A position is considered a SNP if for the minority of the pop- ulation (as long as it consists of at least 5% of the population) a certain nucleotide

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				2004
			
	Strutture organizzative
	
				Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
			
	Parole chiave
	
				Distributed Systems
			
	Appare nelle tipologie:
	
				01.01 Articolo in rivista

File in questo prodotto:

File	Dimensione	Formato
prod_44124-doc_124914.pdf solo utenti autorizzati Descrizione: Haplotyping populations by pure parsinomy: complexity, exact and approximation algorithms Tipologia: Versione Editoriale (PDF) Dimensione 223.93 kB Formato Adobe PDF Visualizza/Apri Richiedi una copia	223.93 kB	Adobe PDF	Visualizza/Apri Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14243/39987

Citazioni

ND

ND

ND

social impact