Integrated use of KOS and deep learning for data set annotation in tourism domain

Giovanna Aracri; Stefano Silvestri
2023

Abstract

Purpose: This paper proposes a methodology for enriching and tailoring a knowledge organization system (KOS) to support the information extraction (IE) task in the analysis of documents in the tourism domain. In particular, the KOS is used to develop a named entity recognition (NER) system.

Design/methodology/approach: First, a method is presented to improve and customize an available thesaurus by leveraging documents related to tourism in Italy. The resulting thesaurus is then used to create an annotated NER corpus, combining distant supervision, deep learning and light human supervision.

Findings: The study shows that a customized KOS can effectively support IE tasks when applied to documents belonging to the same domains and types used for its construction. Moreover, the KOS is very useful in supporting and easing the annotation task under the proposed methodology, allowing a corpus to be annotated with a fraction of the effort required for manual annotation.

Originality/value: The paper explores an alternative use of a KOS, proposing an innovative NER corpus annotation methodology. Moreover, the KOS and the annotated NER data set will be made publicly available.
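As a rough illustration of the distant supervision step only, the sketch below projects thesaurus terms onto tokenized text to produce BIO tags by longest-match lookup. It is not the authors' implementation: the function name `distant_bio_tags`, the toy thesaurus entries and the labels `LOC` and `ACCOMMODATION` are all assumptions made for this example, and the deep learning and human review steps mentioned in the abstract are not shown.

```python
from typing import Dict, List, Tuple


def distant_bio_tags(
    tokens: List[str], thesaurus: Dict[Tuple[str, ...], str]
) -> List[str]:
    """Assign BIO tags to tokens by longest-match lookup in a term dictionary."""
    lowered = [t.lower() for t in tokens]
    max_len = max((len(term) for term in thesaurus), default=0)
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first so multi-word terms win.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            label = thesaurus.get(tuple(lowered[i:i + n]))
            if label is not None:
                tags[i] = "B-" + label
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + label
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return tags


# Hypothetical thesaurus entries and labels, invented for illustration only.
toy_thesaurus = {
    ("amalfi", "coast"): "LOC",
    ("hotel",): "ACCOMMODATION",
}
tokens = "We booked a hotel on the Amalfi Coast".split()
tags = distant_bio_tags(tokens, toy_thesaurus)
for token, tag in zip(tokens, tags):
    print(token, tag)
```

Labels produced this way are noisy (ambiguous terms, missing variants), which is presumably why the abstract pairs distant supervision with a learned model and light human supervision.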
Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR
Istituto di informatica e telematica - IIT
Keywords: KOS; Named entity recognition; Annotation; Distant supervision; Information extraction; Active learning
Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/435534