Machine learning tools to improve the quality of imperfect keywords

MT Artese; I Gagliardi
2022

Abstract

Keywords that describe the content of a text or document are essential for effective and efficient content-based retrieval. However, poor keyword quality, in particular the presence of spelling variants, synonyms, near-synonyms, and spelling errors, makes their use less effective. Here we present a set of tools we are developing for the management of tags. These tools are intended to improve the quality of textual features and to enhance traditional ways of searching and browsing data on the web. The approach integrates three kinds of methods: word embedding models, which capture the semantics of words and their context; clustering algorithms, which identify and group semantically related terms; and methods that calculate the syntactic similarity between strings. The work is still under development, and the paper presents some preliminary qualitative results that demonstrate the feasibility of the approach.
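As an illustration of the syntactic-similarity component only, the following is a minimal sketch and not the authors' implementation. It assumes Python's standard `difflib.SequenceMatcher` as a stand-in string-similarity measure; the function names, the 0.8 threshold, and the sample tags are hypothetical. The embedding and clustering steps mentioned in the abstract are not covered here.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive character-level similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def group_variants(tags, threshold=0.8):
    """Greedy single-pass grouping (illustrative only).

    A tag joins the first existing group whose representative (its first
    member) is at least `threshold` similar; otherwise it starts a new group.
    """
    groups = []
    for tag in tags:
        for group in groups:
            if similarity(tag, group[0]) >= threshold:
                group.append(tag)
                break
        else:
            # No sufficiently similar group was found.
            groups.append([tag])
    return groups


# Hypothetical tags containing a spelling variant, a case variant, and a typo.
tags = ["carnival", "carnaval", "Carnival", "museum", "musuem", "mask"]
groups = group_variants(tags)
for group in groups:
    print(group)
```

A purely syntactic measure like this would not relate synonyms such as "mask" and "disguise"; that is the role the abstract assigns to word embeddings and clustering.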
Istituto di Matematica Applicata e Tecnologie Informatiche (IMATI)
English
Furferi, R., Governi, L., Volpe, Y., Seymour, K., Pelagotti, A., Gherardini, F.
The Future of Heritage Science and Technologies: ICT and Digital Heritage. Florence Heri-Tech 2022.
3rd Florence Heri-Tech International Conference, Florence Heri-Tech 2022
97
111
978-3-031-20302-2
https://link.springer.com/chapter/10.1007/978-3-031-20302-2_8
Springer
Cham, Heidelberg, New York, Dordrecht, London
Switzerland
Yes, but type not specified
16-18/05/2022
Florence
Clustering
Content based retrieval
Multilingual tags
Natural language processing
Quality of data
Semantic relatedness
Syntactic similarity
Word embedding models
Online: 30 October 2022
2
restricted
Artese, Mt; Gagliardi, I
273
info:eu-repo/semantics/conferenceObject
04 Conference contribution::04.01 Contribution in conference proceedings

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/420202