CNR Institutional Research Information System

Machine learning tools to improve the quality of imperfect keywords

MT Artese; I Gagliardi

2022

Abstract

Keywords that describe the content of a text or document are essential for effective and efficient content-based retrieval. However, poor keyword quality, in the form of spelling variants, synonyms, near-synonyms, and spelling errors, makes them less effective. Here we present a set of tools we are developing for tag management. The tools are intended to improve the quality of textual features and to enhance traditional ways of searching and browsing data on the web. The approach integrates three kinds of methods: word embedding models, which capture the semantics of words and their context; clustering algorithms, which identify and group semantically related terms; and methods that compute the syntactic similarity between strings. The work is still under development, and the paper presents some preliminary qualitative results that demonstrate the feasibility of our approach.
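As a rough illustration of the syntactic-similarity component mentioned in the abstract, the sketch below groups tags that look like spelling variants of each other. It is not the authors' implementation: the use of `difflib.SequenceMatcher`, the greedy grouping strategy, the helper names `similarity` and `group_variants`, the 0.85 threshold, and the sample tags are all assumptions chosen for the example.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a ratio in [0, 1] for two tags, ignoring case and outer spaces."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def group_variants(tags: list[str], threshold: float = 0.85) -> list[list[str]]:
    """Greedy single-pass grouping (hypothetical helper, not from the paper).

    Each tag joins the first group whose representative (its first member)
    is at least `threshold` similar; otherwise it starts a new group.
    """
    groups: list[list[str]] = []
    for tag in tags:
        for group in groups:
            if similarity(tag, group[0]) >= threshold:
                group.append(tag)
                break
        else:
            # No existing group was close enough, so open a new one.
            groups.append([tag])
    return groups


# Sample tags invented for illustration only.
tags = [
    "clustering",
    "Clustering",
    "clustring",
    "word embedding",
    "word embeddings",
    "retrieval",
]
groups = group_variants(tags)
print(groups)
```

A purely syntactic measure of this kind can only catch spelling variants and typos. Synonyms and near-synonyms share few characters, which is presumably why the abstract pairs string similarity with word embedding models and clustering.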

Year: 2022
Organizational units: Istituto di Matematica Applicata e Tecnologie Informatiche - IMATI
Language(s): English
External supervisors and coordinators: Furferi, R., Governi, L., Volpe, Y., Seymour, K., Pelagotti, A., Gherardini, F.
Volume title: The Future of Heritage Science and Technologies: ICT and Digital Heritage. Florence Heri-Tech 2022.
Series: Communications in Computer and Information Science (print)
Conference title: 3rd Florence Heri-Tech International Conference, Florence Heri-Tech 2022
First page: 97
Last page: 111
ISBN: 978-3-031-20302-2
DOI: https://dx.doi.org/10.1007/978-3-031-20302-2_8
URL: https://link.springer.com/chapter/10.1007/978-3-031-20302-2_8
Publisher: Springer
Publisher city: Cham, Heidelberg, New York, Dordrecht, London
Publisher country: Switzerland
Peer review: Yes, but type not specified
Conference dates: 16-18 May 2022
Conference venue: Florence
Keywords: Clustering; Content based retrieval; Multilingual tags; Natural language processing; Quality of data; Semantic relatedness; Syntactic similarity; Word embedding models
Other information: Online: 30 October 2022
Scopus ID: 2-s2.0-85142750400
Number of authors: 2
Full text: restricted
All authors: Artese, MT; Gagliardi, I
MIUR login type: 273
Type: info:eu-repo/semantics/conferenceObject
Type: 04 Conference contribution::04.01 Contribution in conference proceedings
Appears in types: 04.01 Contribution in conference proceedings

Files in this product:

File	Size	Format
prod_474539-doc_194008.pdf (authorized users only; description: Machine Learning Tools to Improve the Quality of Imperfect Keywords; type: publisher's version, PDF)	1.9 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/420202
