A preliminary assessment of the article deduplication algorithm used for the OpenAIRE Research Graph

De Bonis M; Atzori C; Manghi P
2022

Abstract

In recent years, a large number of Scholarly Knowledge Graphs (SKGs) have been introduced in the literature. The communities behind these graphs strive to gather, clean, and integrate scholarly metadata from various sources to produce clean, easy-to-process knowledge graphs. A key task in these cleaning and integration workflows is deduplication. In this paper, we briefly describe the deduplication algorithm used for the OpenAIRE Research Graph and evaluate its accuracy. Our experiments show that the algorithm performs adequately, producing a small number of false positives and an even smaller number of false negatives.
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
English
Di Nunzio G.M., Portelli B., Redavid D., Silvello G.
IRCDL 2022 Italian Research Conference on Digital Libraries 2022
IRCDL 2022 - 18th Italian Research Conference on Digital Libraries
8
http://ceur-ws.org/Vol-3160/
24-25/02/2022
Padua, Italy
Deduplication
Open Science
Scholarly data
Knowledge graphs
8
open
Vichos, K; De Bonis, M; Kanellos, I; Chatzopoulos, S; Atzori, C; Manola, N; Manghi, P; Vergoulis, T
273
info:eu-repo/semantics/conferenceObject
04 Conference contribution::04.01 Contribution in conference proceedings
   OpenAIRE-Nexus Scholarly Communication Services for EOSC users
   OpenAIRE Nexus
   H2020
   101017452
Files in this item:
prod_468963-doc_190105.pdf

open access

Description: A preliminary assessment of the article deduplication algorithm used for the OpenAIRE Research Graph
Type: Publisher's version (PDF)
Size: 1.58 MB
Format: Adobe PDF


Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/416548
Citations
  • Scopus 1