CNR Institutional Research Information System

From Tabular Data to the Semantic Web: Jupyter Notebooks for Converting Linguistic Resources to SKOS

Michele Mallia (first author; Writing – Original Draft Preparation)

2026

Abstract

This technical report presents a reusable, resource-agnostic methodology for converting tabular linguistic and terminological resources into SKOS-compliant RDF using Jupyter Notebooks. Developed within the H2IOSC Linguistic Linked Open Data Pilot and the CLARIN-IT ecosystem, the methodology covers source preparation, metadata normalization, intermediate JSON generation, declarative mapping with YARRRML and RML, RDF materialization, validation, post-processing, and publication through GraphDB and Skosmos. The report examines REALITER and AGRO-TERM as complementary use cases, distinguishing common methodological components from resource-specific requirements. It also discusses limitations concerning semantic modeling, identifiers, validation, reproducibility, maintainability, and publication. Although demonstrated through these two resources, the proposed workflow can be adapted to other linguistic and terminological datasets with different structures, domains, and publication requirements.
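The pipeline summarized above (tabular source, intermediate JSON, SKOS-compliant RDF) can be illustrated with a minimal sketch. It is not the report's implementation: the report performs the mapping declaratively with YARRRML and RML, whereas this example substitutes plain Python string templating so that it runs with the standard library alone. The column names (`id`, `lang`, `term`), the sample rows, the `https://example.org/terms/` namespace, and all function names are assumptions made for illustration.

```python
import csv
import io
import json

# Hypothetical tabular source: one row per term, rows sharing an `id`
# describe the same concept. Column names are assumed, not from the report.
SAMPLE_CSV = """id,lang,term
c1,it,suolo
c1,en,soil
c2,it,seme
c2,en,seed
"""

BASE = "https://example.org/terms/"  # placeholder namespace (assumption)


def rows_to_concepts(csv_text):
    """Group rows by concept id into an intermediate, JSON-serializable list."""
    concepts = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        concept_id = row["id"].strip()
        entry = concepts.setdefault(concept_id, {"id": concept_id, "labels": []})
        entry["labels"].append(
            {"lang": row["lang"].strip(), "value": row["term"].strip()}
        )
    return list(concepts.values())


def escape_literal(value):
    """Escape backslashes and double quotes for a Turtle string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def concepts_to_turtle(concepts, base=BASE):
    """Serialize the intermediate structure as SKOS in Turtle syntax.

    Assumes at most one label per language per concept, since SKOS allows
    only one skos:prefLabel per language tag.
    """
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        "",
        f"<{base}scheme> a skos:ConceptScheme .",
    ]
    for concept in concepts:
        labels = ", ".join(
            f'"{escape_literal(label["value"])}"@{label["lang"]}'
            for label in concept["labels"]
        )
        lines.append("")
        lines.append(f"<{base}{concept['id']}> a skos:Concept ;")
        lines.append(f"    skos:inScheme <{base}scheme> ;")
        lines.append(f"    skos:prefLabel {labels} .")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    concepts = rows_to_concepts(SAMPLE_CSV)
    # Intermediate JSON, the stage a declarative mapping would consume.
    print(json.dumps(concepts, ensure_ascii=False, indent=2))
    # SKOS output, ready for validation and loading into a triple store.
    print(concepts_to_turtle(concepts))
```

Keeping the intermediate JSON as a separate stage mirrors the workflow's separation between source preparation and mapping: the grouping logic is resource-specific, while the serialization step could be replaced by an RML processor without changing the earlier stage.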

Year: 2026
Organizational units: Istituto di linguistica computazionale "Antonio Zampolli" - ILC
Keywords: Linguistic Linked Open Data; Semantic Web; Reusable Workflows
Appears in types: 08.04 Technical report

Files in this product:

File	Size	Format
from_tabular_data_to_the_semantic_web_jupyter_notebooks_for_converting_linguistic_resources_to_SKOS.pdf (open access; license: Creative Commons)	1.82 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/587113
