CNR Institutional Research Information System

L-LEME: an Automatic Lexical Merger based on the LMF Standard

Riccardo Del Gratta; Francesca Frontini; Monica Monachini; Valeria Quochi; Francesco Rubino; Matteo Abrate; Angelica Lo Duca

2012

Abstract

This paper describes the LMF LExical MErger (L-LEME), an architecture for combining two lexicons to obtain one or more new resources. L-LEME relies on standards, exploiting the ISO Lexical Markup Framework (LMF) to ensure interoperability. It is designed to be dynamic and highly adaptable: users can configure it to meet their specific needs. The architecture consists of two main modules. The Mapper takes as input two lexicons, A and B, together with a set of user-defined rules and instructions that guide the mapping process (Directives D), and outputs all matching entries; it also calculates a cosine similarity score. The Builder takes as input the Mapper's results and a second set of Directives, D1, and produces a new LMF lexicon, C. The Directives let users define their own building rules and different merging scenarios. L-LEME is applied to a concrete task within the PANACEA project: merging two Italian SubCategorization Frame (SCF) lexicons. The experiment is of interest because A and B rest on different philosophies: A was built by human introspection, whereas B was automatically extracted. Ultimately, L-LEME has interesting repercussions for many language technology applications.
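The Mapper and Builder flow described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: entries are reduced to bags of hypothetical feature strings, the Directives D are stood in for by a single `threshold` parameter, the Directives D1 by a `keep_unmatched` flag, and the function names `map_entries` and `build` are invented here.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse feature-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def map_entries(lex_a, lex_b, threshold):
    """Hypothetical Mapper: pair entries whose score reaches the threshold.

    `threshold` is a stand-in for the user-defined Directives D.
    """
    matches = []
    for id_a, feats_a in lex_a.items():
        for id_b, feats_b in lex_b.items():
            score = cosine(Counter(feats_a), Counter(feats_b))
            if score >= threshold:
                matches.append((id_a, id_b, score))
    return matches


def build(lex_a, lex_b, matches, keep_unmatched=True):
    """Hypothetical Builder: merge matched entries into a new lexicon C.

    `keep_unmatched` is a stand-in for the Directives D1 (merging scenario).
    """
    merged = {}
    matched_a, matched_b = set(), set()
    for id_a, id_b, _score in matches:
        # A matched pair becomes one entry carrying the union of features.
        merged[f"{id_a}+{id_b}"] = sorted(set(lex_a[id_a]) | set(lex_b[id_b]))
        matched_a.add(id_a)
        matched_b.add(id_b)
    if keep_unmatched:
        # Union scenario: entries without a counterpart are copied over.
        for id_a, feats in lex_a.items():
            if id_a not in matched_a:
                merged[id_a] = sorted(set(feats))
        for id_b, feats in lex_b.items():
            if id_b not in matched_b:
                merged[id_b] = sorted(set(feats))
    return merged


# Toy lexicons with invented subcategorization features.
lex_a = {"A1": ["subj:NP", "obj:NP"], "A2": ["subj:NP", "comp:PP_a"]}
lex_b = {"B1": ["subj:NP", "obj:NP"], "B2": ["subj:NP"]}

matches = map_entries(lex_a, lex_b, threshold=0.9)
lex_c = build(lex_a, lex_b, matches)
```

With these toy inputs only A1 and B1 share an identical frame, so they are the single pair that clears the 0.9 threshold; setting `keep_unmatched=False` would instead model an intersection-style scenario in which only matched entries reach lexicon C.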

Year: 2012
Organizational units: Istituto di informatica e telematica - IIT; Istituto di linguistica computazionale "Antonio Zampolli" - ILC
Language(s): English
External supervisors and coordinators: Bel N., Gavrilidou M., Monachini M., Quochi V., Rimell L.
Volume title: Proceedings of the LREC 2012 Workshop on Language Resource Merging
Conference title: The Eighth International Conference on Language Resources and Evaluation (LREC) 2012
From page: 31
To page: 40
Number of pages: 10
ISBN: 978-2-9517408-7-7
Refereed: Yes, but type not specified
Conference date: 2012
Conference location: Istanbul, Turkey
Keywords: LMF; Lexicon mapping; similarity score
Other information: ID_PUMA; /cnr.iit/2012-A2-035; cnr.iit/2012-A2-020
Number of authors: 7
Fulltext: none
All authors: DEL GRATTA, Riccardo; Frontini, Francesca; Monachini, Monica; Quochi, Valeria; Rubino, Francesco; Abrate, Matteo; LO DUCA, Angelica
MIUR login type: 273
Type: info:eu-repo/semantics/conferenceObject
Type: 04 Conference contribution::04.01 Contribution in conference proceedings

Project identifier
Project title: Platform for Automatic, Normalized Annotation and Cost-Effective Acquisition of Language Resources for Human Language Technologies
Acronym: PANACEA
Funding: FP7
Contract no.: 248064

Appears in types: 04.01 Contribution in conference proceedings

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/117790
