CNR Institutional Research Information System

From Tabular Data to the Semantic Web: Jupyter Notebooks for Converting Linguistic Resources to SKOS

Michele Mallia (first author; Writing – Original Draft Preparation)

2026

Abstract

This technical report presents a reusable, resource-agnostic methodology for converting tabular linguistic and terminological resources into SKOS-compliant RDF using Jupyter Notebooks. Developed within the H2IOSC Linguistic Linked Open Data Pilot and the CLARIN-IT ecosystem, the methodology covers source preparation, metadata normalization, intermediate JSON generation, declarative mapping with YARRRML and RML, RDF materialization, validation, post-processing, and publication through GraphDB and Skosmos. The report examines REALITER and AGRO-TERM as complementary use cases, distinguishing common methodological components from resource-specific requirements. It also discusses limitations concerning semantic modeling, identifiers, validation, reproducibility, maintainability, and publication. Although demonstrated through these two resources, the proposed workflow can be adapted to other linguistic and terminological datasets with different structures, domains, and publication requirements.
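As a rough illustration of the pipeline shape described above (tabular source, intermediate JSON, SKOS output), the following minimal sketch may help. It is not the report's implementation: the report performs the mapping declaratively with YARRRML and RML, whereas this sketch substitutes plain Python string templating for that step. The column names (`id`, `label_en`, `label_it`), the sample rows, the `https://example.org/terms/` namespace, and all function names are hypothetical and are not taken from REALITER or AGRO-TERM.

```python
import csv
import io
import json

# Hypothetical sample table; the column names are assumptions, not the
# actual REALITER or AGRO-TERM layout.
SAMPLE_CSV = """id,label_en,label_it
001,soil,suolo
002,"crop rotation",rotazione delle colture
"""

BASE_URI = "https://example.org/terms/"  # placeholder namespace (assumption)


def rows_to_records(csv_text):
    """Parse tabular text into a list of plain dicts (the intermediate JSON)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{k: (v or "").strip() for k, v in row.items()} for row in reader]


def escape_literal(text):
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def record_to_turtle(record):
    """Render one record as a skos:Concept with language-tagged labels."""
    labels = []
    for key, value in sorted(record.items()):
        if key.startswith("label_") and value:
            lang = key[len("label_"):]
            labels.append(f'    skos:prefLabel "{escape_literal(value)}"@{lang}')
    subject = [f"<{BASE_URI}{record['id']}> a skos:Concept"]
    return " ;\n".join(subject + labels) + " .\n"


def convert(csv_text):
    """Run the whole sketch: table -> records -> Turtle document."""
    records = rows_to_records(csv_text)
    header = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n\n"
    return header + "\n".join(record_to_turtle(r) for r in records)


if __name__ == "__main__":
    # Intermediate JSON, then the serialized SKOS concepts.
    print(json.dumps(rows_to_records(SAMPLE_CSV), indent=2))
    print(convert(SAMPLE_CSV))
```

Keeping the intermediate records as plain dictionaries mirrors the idea of an intermediate JSON stage: the same records could instead be written to disk and handed to a declarative mapping engine. The sketch performs no validation or post-processing, which the report treats as separate steps.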

Full record (DC)

DC field	Value	Language
dc.authority.orgunit	Istituto di linguistica computazionale "Antonio Zampolli" - ILC	en
dc.authority.people	Michele Mallia	en
dc.authority.project	IR0000029	en
dc.collection.id.s	95773a9f-8d06-4466-a951-5d4e15d70690	*
dc.collection.name	08.04 Rapporto tecnico	*
dc.contributor.appartenenza	Istituto di linguistica computazionale "Antonio Zampolli" - ILC	*
dc.contributor.appartenenza.mi	918	*
dc.contributor.area	Non assegn	*
dc.date.firstsubmission	2026/06/15 12:41:01	*
dc.date.issued	2026	-
dc.date.submission	2026/06/15 12:41:01	*
dc.description.abstracteng	This technical report presents a reusable, resource-agnostic methodology for converting tabular linguistic and terminological resources into SKOS-compliant RDF using Jupyter Notebooks. Developed within the H2IOSC Linguistic Linked Open Data Pilot and the CLARIN-IT ecosystem, the methodology covers source preparation, metadata normalization, intermediate JSON generation, declarative mapping with YARRRML and RML, RDF materialization, validation, post-processing, and publication through GraphDB and Skosmos. The report examines REALITER and AGRO-TERM as complementary use cases, distinguishing common methodological components from resource-specific requirements. It also discusses limitations concerning semantic modeling, identifiers, validation, reproducibility, maintainability, and publication. Although demonstrated through these two resources, the proposed workflow can be adapted to other linguistic and terminological datasets with different structures, domains, and publication requirements.	-
dc.description.allpeople	Mallia, Michele	-
dc.description.allpeopleoriginal	Michele Mallia	en
dc.description.fulltext	none	en
dc.description.numberofauthors	1	-
dc.identifier.doi	10.5281/zenodo.20700756	en
dc.identifier.source	datacite	*
dc.identifier.uri	https://hdl.handle.net/20.500.14243/587113	-
dc.language.iso	eng	en
dc.relation.projectAcronym	H2IOSC	en
dc.relation.projectAwardNumber	-	en
dc.relation.projectAwardTitle	H2IOSC Project - Humanities and cultural Heritage Italian Open Science Cloud	en
dc.relation.projectFunderName	European Union – NextGenerationEU – NRRP M4C2	en
dc.relation.projectFundingStream	B63C22000730005	en
dc.subject.keywordseng	Linguistic Linked Open Data	-
dc.subject.keywordseng	Semantic Web	-
dc.subject.keywordseng	Reusable Workflows	-
dc.subject.singlekeyword	Linguistic Linked Open Data	*
dc.subject.singlekeyword	Semantic Web	*
dc.subject.singlekeyword	Reusable Workflows	*
dc.title	From Tabular Data to the Semantic Web: Jupyter Notebooks for Converting Linguistic Resources to SKOS	en
dc.type.driver	info:eu-repo/semantics/other	-
dc.type.full	08 Report e Working Paper::08.04 Rapporto tecnico	it
dc.type.miur	298	-

Files in this item:

There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/587113

Warning: the data displayed have not been validated by the institution.
