De-duplicating the OpenAIRE scholarly communication big graph
Atzori C; Manghi P; Bardi A
2018
Abstract
The OpenAIRE infrastructure populates a scholarly communication big graph interlinking metadata objects of publications, datasets, software, organizations, funders, and projects. In order to de-duplicate this graph, OpenAIRE has developed GDup, an integrated, scalable, general-purpose system for entity deduplication over big information graphs. GDup offers functionalities to realize a fully-fledged entity deduplication workflow over a generic input graph, inclusive of Ground Truth support, end-user feedback, and strategies for identifying and merging duplicates to obtain an output disambiguated graph.

File | Description | Type | Access | Size | Format |
---|---|---|---|---|---|
prod_402402-doc_139936.pdf | De-duplicating the OpenAIRE Scholarly Communication Big Graph | Publisher's version (PDF) | Authorized users only | 113.65 kB | Adobe PDF |
prod_402402-doc_139937.pdf | De-duplicating the OpenAIRE Scholarly Communication Big Graph (poster presentation) | Publisher's version (PDF) | Open access | 449.16 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.