CNR Institutional Research Information System

The Inverse Frequent itemset Mining (IFM) is the problem of computing a transaction database D satisfying specified support constraints on a given set S of itemsets, that are typically the frequent ones. Earlier studies focused on investigating computational and approximability properties of this problem, that is NP-hard. However, this classical formulation of IFM does not enforce any constraint on the other itemsets (i.e., the ones that are not listed in S); D may therefore happen to contain additional (and, perhaps, unsuspected or even undesired) frequent itemsets. A possibility for removing this anomaly is to introduce a more general formulation of IFM in which the supports of itemsets that do not belong to S are explicitly constrained by a given threshold in order not to eventually get unexpected frequent itemsets. This paper investigates this formulation, shows how it can be encoded as an integer linear program, and introduces a no-integer version of it solvable by a decomposition technique, that is a method designed to handle optimization problems with a huge number of variables by a using a limited memory space. As the decomposition technique requires at each step the solution of an auxiliary NP-hard integer linear program, a constructive heuristic for this auxiliary problem has also been defined, which enjoys very good scaling, thereby paving the way for its application over real-life scenarios.

Decomposition technique for the inverse frequent itemset mining problem

Guzzo Antonella;Moccia Luigi;Saccà D;Serra Edoardo

2011

Abstract

The Inverse Frequent itemset Mining (IFM) is the problem of computing a transaction database D satisfying specified support constraints on a given set S of itemsets, that are typically the frequent ones. Earlier studies focused on investigating computational and approximability properties of this problem, that is NP-hard. However, this classical formulation of IFM does not enforce any constraint on the other itemsets (i.e., the ones that are not listed in S); D may therefore happen to contain additional (and, perhaps, unsuspected or even undesired) frequent itemsets. A possibility for removing this anomaly is to introduce a more general formulation of IFM in which the supports of itemsets that do not belong to S are explicitly constrained by a given threshold in order not to eventually get unexpected frequent itemsets. This paper investigates this formulation, shows how it can be encoded as an integer linear program, and introduces a no-integer version of it solvable by a decomposition technique, that is a method designed to handle optimization problems with a huge number of variables by a using a limited memory space. As the decomposition technique requires at each step the solution of an auxiliary NP-hard integer linear program, a constructive heuristic for this auxiliary problem has also been defined, which enjoys very good scaling, thereby paving the way for its application over real-life scenarios.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				2011
			
	Strutture organizzative
	
				Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR
			
	Parole chiave
	
				Inverse Frequent itemset Mining
			
	Appare nelle tipologie:
	
				04.01 Contributo in Atti di convegno

File in questo prodotto:

Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14243/264521

Citazioni

ND

0

ND

social impact