
Towards the Automated Population of Thesauri Using BERT: A Use Case on the Cybersecurity Domain

E Cardillo; A Portaro; M Taverniti; R Guarasci
2024

Abstract

This work explores methods that use the widely adopted BERT model to support the population and enrichment of domain-oriented controlled vocabularies such as thesauri. Starting from BERT embeddings, we extracted information from a sample corpus of cybersecurity-related documents and present a novel pipeline, inspired by Natural Language Processing, that combines neural language models, knowledge graph extraction, and natural language inference. The pipeline identifies domain concepts and implicit relations between them (which can be mapped onto thesaural relationships) in order to populate a domain thesaurus. Preliminary results are promising: they indicate that the proposed methodology is effective and, more broadly, that LLMs, and BERT in particular, can be applied to enrich specialized controlled vocabularies with new knowledge.
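The abstract names the pipeline's ingredients but not their internals. As a rough illustration only, and not the authors' implementation, the sketch below shows how two of those ingredients could fit together: embedding similarity to shortlist candidate term pairs, and directional entailment scores mapped onto ISO 25964-style thesaural relationships (BT broader term, NT narrower term, RT related term, plus an equivalence candidate). Everything here is an assumption: the three-dimensional toy vectors stand in for BERT embeddings, the entailment probabilities stand in for the output of an NLI classifier on hypothetical templates such as "x is a ransomware" / "x is a malware", and the thresholds and function names (`candidate_pairs`, `thesaural_relation`) are illustrative.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical toy vectors standing in for BERT term embeddings
# (a real bert-base embedding has 768 dimensions).
EMBEDDINGS: Dict[str, List[float]] = {
    "malware":    [0.9, 0.1, 0.0],
    "ransomware": [0.8, 0.3, 0.1],
    "phishing":   [0.2, 0.9, 0.1],
    "firewall":   [0.0, 0.1, 0.9],
}


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def candidate_pairs(emb: Dict[str, List[float]],
                    threshold: float) -> List[Tuple[str, str, float]]:
    """Return unordered term pairs whose cosine similarity is >= threshold."""
    terms = sorted(emb)
    pairs = []
    for i, a in enumerate(terms):
        for b in terms[i + 1:]:
            score = cosine(emb[a], emb[b])
            if score >= threshold:
                pairs.append((a, b, score))
    return pairs


def thesaural_relation(p_ab: float, p_ba: float,
                       hi: float = 0.8, lo: float = 0.5) -> str:
    """Map directional entailment probabilities to a relation "a REL b".

    p_ab: assumed probability that "x is a <a>" entails "x is a <b>".
    p_ba: the same in the opposite direction.
    Thresholds hi/lo are hypothetical tuning parameters.
    """
    if p_ab >= hi and p_ba >= hi:
        return "EQ"  # mutual entailment: equivalence (USE/UF) candidate
    if p_ab >= hi and p_ba < lo:
        return "BT"  # b is a broader term of a
    if p_ba >= hi and p_ab < lo:
        return "NT"  # b is a narrower term of a
    return "RT"      # similar but not hierarchical: related term


if __name__ == "__main__":
    for a, b, score in candidate_pairs(EMBEDDINGS, threshold=0.8):
        print(f"{a} ~ {b}: cosine {score:.3f}")
    # Hypothetical NLI scores for a = "ransomware", b = "malware".
    print("ransomware", thesaural_relation(0.95, 0.2), "malware")
```

With these toy vectors only the pair (malware, ransomware) clears the 0.8 threshold, and the hypothetical entailment scores yield "ransomware BT malware", i.e. malware is recorded as the broader term. Filtering by similarity first keeps the number of (comparatively expensive) NLI calls small, which is one plausible reason to combine the two steps.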
2024
Istituto di informatica e telematica - IIT
English
Leonard Barolli
Advances in Internet, Data & Web Technologies
Contribution
The 12-th International Conference on Emerging Internet, Data & Web Technologies (EIDWT-2024)
193
100
109
10
978-3-031-53554-3
https://link.springer.com/chapter/10.1007/978-3-031-53555-0_10
Springer
Switzerland
Anonymous experts
21-23/02/2024
Naples, Italy
International
Thesauri, Domain-specific language modeling, Semantic analysis, Knowledge Extraction, LLMs
Electronic
5
open
Cardillo, E; Portaro, A; Taverniti, M; Lanza, C; Guarasci, R
273
info:eu-repo/semantics/conferenceObject
04 Conference contribution::04.01 Contribution in conference proceedings
   SEcurity and RIghts In the CyberSpace
   SERICS
   MUR National Recovery and Resilience Plan funded by the European Union - NextGenerationEU
Files in this item:
File Size Format
prod_492270-doc_205391.pdf

open access

Description: Towards the Automated Population of Thesauri Using BERT: A Use Case on the Cybersecurity Domain
Type: Post-print document
License: Creative Commons
Size: 143.05 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/450089
Citations
  • Scopus 3