A graph neural network approach for evaluating correctness of groups of duplicates

De Bonis M; Falchi F; Manghi P
2023

Abstract

Unlabeled entity deduplication is a well-studied task in the recent literature. Most methods follow the same workflow: an entity blocking phase, in-block pairwise comparisons between entities to establish similarity relations, closure of the resulting meshes to form groups of duplicate entities, and merging of the entities in each group to resolve the ambiguity. Such methods are effective but fall short whenever a very low false positive rate is required. In this paper, we present an approach for evaluating the correctness of "groups of duplicates", which can be used to measure a group's accuracy and hence the likelihood that it contains false positives. Our approach is based on a Graph Neural Network that combines Graph Attention with Long Short-Term Memory (LSTM). The accuracy of the proposed approach is verified in the context of Author Name Disambiguation, applied to a curated dataset obtained as a subset of the OpenAIRE Graph that includes PubMed publications with at least one ORCID identifier.
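The generic workflow named in the abstract (blocking, in-block pairwise comparison, closure into groups) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the records are invented, `blocking_key` and `similar` are hypothetical stand-ins for real blocking and comparison functions, closure is done with a simple union-find, and the final merging step is omitted.

```python
from collections import defaultdict
from itertools import combinations


def parse(record):
    # Split an invented "Surname, Given" string into normalised parts.
    surname, _, given = record["name"].partition(",")
    return surname.strip().lower(), given.strip().rstrip(".").lower()


def blocking_key(record):
    # Hypothetical blocking key: surname plus first initial of the given name.
    surname, given = parse(record)
    return (surname, given[:1])


def similar(a, b):
    # Hypothetical comparator: same surname, and one given name
    # is a prefix of the other (so "M." is compatible with "Maria").
    sa, ga = parse(a)
    sb, gb = parse(b)
    return sa == sb and (ga.startswith(gb) or gb.startswith(ga))


def find(parent, x):
    # Union-find root lookup with path halving.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def deduplicate(records):
    # 1. Blocking: only records sharing a key are ever compared.
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[blocking_key(rec)].append(idx)

    # 2. In-block pairwise comparisons; 3. closure via union-find.
    parent = list(range(len(records)))
    for members in blocks.values():
        for i, j in combinations(members, 2):
            if similar(records[i], records[j]):
                parent[find(parent, i)] = find(parent, j)

    # Collect connected components with more than one member.
    groups = defaultdict(list)
    for idx in range(len(records)):
        groups[find(parent, idx)].append(idx)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


# Invented example records (not from the OpenAIRE dataset).
records = [
    {"name": "Rossi, Maria"},
    {"name": "Rossi, M."},
    {"name": "Rossi, Marco"},
    {"name": "Bianchi, Luca"},
    {"name": "Bianchi, L."},
]
groups = deduplicate(records)
print(groups)
```

In this toy run, "Rossi, Maria" and "Rossi, Marco" are never matched directly, yet closure places them in the same group because both match "Rossi, M.". Groups of this kind are where a false positive can hide, which is the sort of case a group-level correctness check is meant to flag.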
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
English
Alonso O.; Cousijn H.; Silvello G.; Marrero M.; Teixeira Lopes C.; Marchesin S.
Linking Theory and Practice of Digital Libraries
TPDL 2023 - 27th International Conference on Theory and Practice of Digital Libraries
207
219
13
978-3-031-43848-6
https://link.springer.com/chapter/10.1007/978-3-031-43849-3_18
26-29/09/2023
Zadar, Croatia
Disambiguation
Graph neural network
Scholarly knowledge graphs
Electronic
4
open
De Bonis, M; Minutella, F; Falchi, F; Manghi, P
273
info:eu-repo/semantics/conferenceObject
04 Conference contribution::04.01 Contribution in conference proceedings
   OpenAIRE-Nexus Scholarly Communication Services for EOSC users
   OpenAIRE Nexus
   H2020
   101017452
Files in this record:
prod_490346-doc_204321.pdf

open access

Description: A graph neural network approach for evaluating correctness of groups of duplicates
Type: Publisher's version (PDF)
License: Creative Commons
Size: 771.95 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/452216
Citations
  • PubMed Central: ND
  • Scopus: 1
  • Web of Science: 1