(Semi)automated disambiguation of scholarly repositories

Baglioni M; Mannocci A; Pavone G; De Bonis M; Manghi P
2023

Abstract

Fully exploiting scholarly repositories is pivotal in modern Open Science, and scholarly repository registries play a key role in enabling researchers and research infrastructures to list and search for suitable repositories. However, since multiple registries exist, repository managers tend to register the repositories they manage several times to maximise their traction and visibility across different research communities, disciplines, and applications. These multiple registrations lead to information fragmentation and redundancy on the one hand and, on the other, force registry users to juggle multiple registries, profiles and identifiers describing the same repository. Registries are aware of these problems and, whenever possible, claim equivalence between repository profiles by cross-referencing their identifiers across registries. However, as we will see, this "claim set" is far from complete; many duplicates therefore go undetected, possibly creating problems downstream. In this work, we combine such claims to create duplicate sets and extend them with the results of an automated clustering algorithm run over repository metadata descriptions. We then manually validate the results to produce an "as accurate as possible" de-duplicated dataset of scholarly repositories.
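The step of combining equivalence claims into duplicate sets can be illustrated with a minimal sketch. It assumes each claim is a pair of profile identifiers and groups them by transitive closure using a union-find structure. The identifiers, the function name `build_duplicate_sets`, and the idea of feeding clustering output in as extra pairs are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict


def build_duplicate_sets(pairs):
    """Group profile identifiers linked, directly or transitively, by pairs."""
    parent = {}

    def find(x):
        # Register unseen identifiers as their own root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups = defaultdict(set)
    for identifier in list(parent):
        groups[find(identifier)].add(identifier)
    # Only groups with more than one profile are duplicates.
    return [group for group in groups.values() if len(group) > 1]


# Hypothetical identifiers: cross-references declared by registries ...
registry_claims = [
    ("registryA:1", "registryB:7"),
    ("registryB:7", "registryC:3"),
]
# ... and extra candidate pairs, e.g. from a clustering step (assumed input).
clustering_candidates = [("registryA:2", "registryC:9")]

duplicate_sets = build_duplicate_sets(registry_claims + clustering_candidates)
```

Treating claims and clustering output uniformly as pairs keeps the merge step independent of how candidates were found; manual validation, as the abstract describes, would still be applied to the resulting sets.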
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
English
Bardi A., Falcon A., Ferilli S., Marchesin S., Redavid D.
IRCDL 2023 : Information and Research Science Connecting to Digital and Library Science 2023
IRCDL 2023 - 19th conference on Information and Research Science Connecting to Digital and Library Science
47
59
https://ceur-ws.org/Vol-3365/
Yes, but type not specified
23-24/02/2023
Bari, Italy
Scholarly registries
Scholarly repositories
De-duplication
Open Science
5
open
273
info:eu-repo/semantics/conferenceObject
04 Conference contribution::04.01 Contribution in conference proceedings
   OpenAIRE-Nexus Scholarly Communication Services for EOSC users
   OpenAIRE Nexus
   H2020
   101017452
Files in this item:
prod_481879-doc_198646.pdf
Open access
Description: (Semi)automated disambiguation of scholarly repositories
Type: Publisher's version (PDF)
Size: 1.19 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/460090