OpenAIRE OpenOrgs database
Artini M.; De Bonis M.; Manghi P.; Atzori C.; Bardi A.; Baglioni M.
2021
Abstract
In the OpenAIRE context, research organizations are aggregated from several datasources. This often leads to duplication, because the same organization can be provided by multiple datasources. Deduplication is a fundamental task to solve this problem. Deduplication in OpenAIRE follows three main stages:

1. clustering of entities;
2. pairwise comparison of entities in the same cluster to draw similarity relations;
3. identification of connected components to create representative entities that group all the duplicates of each organization.

Because the pairwise comparison stage is automatic, it can produce many false positives and false negatives. The software available in this release provides the OpenOrgs web application: a web interface for collecting user feedback on the deduplication of organizations. A user can edit an organization's metadata and approve or reject the similarity relations suggested by the deduplication algorithm. The algorithm then uses this feedback to increase the precision and recall of its results. The organizations resulting from the feedback-enhanced deduplication are indexed and subsequently exposed by the OpenAIRE portal. This application is distributed as part of the dnet-applications module, which contains web applications developed within the OpenAIRE-Connect and OpenAIRE-Advance projects.

| File | Size | Format |
|---|---|---|
| dnet-applications-3.1.8.zip | 33.3 MB | Zip File |

Access: open access
Description: https://openpolicyfinder.jisc.ac.uk/
Type: Other attached material
License: Creative Commons
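The three stages named in the abstract, together with the approve/reject feedback, can be illustrated with a minimal Python sketch. It is not the OpenAIRE implementation: the function names (`blocking_key`, `similar`, `deduplicate`), the three-letter prefix used for clustering, the `difflib` similarity ratio, and the 0.8 threshold are all assumptions chosen for illustration, and the sketch assumes every identifier in the feedback pairs also appears in `orgs`.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def blocking_key(name):
    # Hypothetical clustering key: first three alphanumeric characters, lowercased.
    return "".join(ch for ch in name.lower() if ch.isalnum())[:3]


def similar(a, b, threshold=0.8):
    # Hypothetical comparison: difflib similarity ratio on lowercased names.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def deduplicate(orgs, approved=frozenset(), rejected=frozenset()):
    """Group duplicate organizations.

    orgs maps an identifier to a name; approved and rejected are sets of
    frozenset({id1, id2}) pairs standing in for user feedback.
    """
    # Stage 1: clustering, so that only entities sharing a key are compared.
    clusters = defaultdict(list)
    for org_id, name in orgs.items():
        clusters[blocking_key(name)].append(org_id)

    # Stage 2: pairwise comparison inside each cluster yields similarity relations.
    relations = set()
    for members in clusters.values():
        for a, b in combinations(members, 2):
            if similar(orgs[a], orgs[b]):
                relations.add(frozenset((a, b)))

    # User feedback: approved pairs are added, rejected pairs are dropped.
    relations = (relations | set(approved)) - set(rejected)

    # Stage 3: connected components over the relations, via union-find.
    parent = {org_id: org_id for org_id in orgs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for pair in relations:
        a, b = tuple(pair)
        parent[find(a)] = find(b)

    groups = defaultdict(set)
    for org_id in orgs:
        groups[find(org_id)].add(org_id)
    return sorted(sorted(group) for group in groups.values())
```

In this sketch an approved pair can merge organizations that the automatic comparison never saw together (a false negative), while a rejected pair removes a suggested relation (a false positive). Each returned group corresponds to what the abstract calls a representative entity.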


