CNR Institutional Research Information System

A clustering based methodology to support the translation of medical specifications to software models

Francesco Gargiulo; Stefano Silvestri; Mario Ciampi

2018

Abstract

In this paper we propose a methodology to reduce the complexity of building a software validation model, starting from medical specifications written in Italian natural language text. To obtain an automatic validation system, the specification documents must be manually translated into software models. This task is long, tedious and error prone because of the manual effort involved. To speed up the process and reduce the errors that can occur, a significant boost can be obtained by grouping the conformance rules that belong to the same pattern. Clustering algorithms can accomplish this task, but they require the total number of clusters to be known a priori, which is not possible in this kind of problem. To this end, we propose two innovative automatic cluster selection methodologies that estimate the optimal number of clusters, based on an iterative evaluation of an internal cluster measure. These approaches consider three different Vector Space Models (VSMs), two different clustering algorithms and the impact of using the Principal Component Analysis technique. The experimental assessment was performed on four different datasets extracted from the HL7 CDA R2 Italian language conformance rules specification documents. Finally, to compare the results of all possible configurations, we carried out a non-parametric statistical analysis. The obtained results demonstrated the effectiveness of the proposed methodology for automatic cluster number selection.
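The general idea of choosing the number of clusters by iterating over candidate values and scoring each partition with an internal measure can be sketched as follows. This is only an illustrative sketch, not the authors' method: the abstract does not name the VSMs, the clustering algorithms or the internal measure, so the code assumes plain term-frequency vectors, a basic k-means and the silhouette coefficient, and it omits the PCA step. All function names and the sample rule texts are hypothetical.

```python
import math
import random
from collections import Counter


def tf_vectors(docs):
    """Term-frequency vectors over a sorted vocabulary (assumed stand-in VSM)."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    vectors = []
    for d in docs:
        counts = Counter(d.lower().split())
        vectors.append([float(counts[w]) for w in vocab])
    return vectors


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iters=50, seed=0):
    """Basic k-means with seeded random initial centers; returns one label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        # Update step: move each non-empty cluster's center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient; assumes at least two distinct labels."""
    scores = []
    for i, p in enumerate(points):
        same = [dist(p, q) for j, q in enumerate(points)
                if labels[j] == labels[i] and j != i]
        if not same:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = sum(same) / len(same)  # mean distance within the point's own cluster
        b = min(                   # mean distance to the nearest other cluster
            sum(dist(p, q) for q, lab in zip(points, labels) if lab == c)
            / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def select_k(points, k_min=2, k_max=None):
    """Try each k in [k_min, k_max] and keep the one with the best silhouette."""
    if k_max is None:
        k_max = len(points) - 1
    best_k, best_score = None, -1.0
    for k in range(k_min, k_max + 1):
        labels = kmeans(points, k)
        if len(set(labels)) < 2:
            continue  # silhouette is undefined for a single cluster
        score = silhouette(points, labels)
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score


# Hypothetical conformance-rule texts, for illustration only.
rules = [
    "element code must contain attribute codeSystem",
    "element id must contain attribute root",
    "section must contain one title element",
    "section must contain one text element",
]
print(select_k(tf_vectors(rules)))
```

The silhouette coefficient is one of several internal measures that could drive the loop, and a different measure or clustering algorithm can be swapped in without changing the structure of `select_k`.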

Year: 2018

Organizational units: Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR

Keywords: Clustering; Medical Specification Document; HL7 CDA R2; Validation; Natural Language Processing; Schematron

Appears in types: 01.01 Journal article

Files in this record:

File	Size	Format
Pubblicazione12.pdf (not available; license: not public, private/restricted access)	2.29 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/372288
