A clustering based methodology to support the translation of medical specifications to software models

Francesco Gargiulo;Stefano Silvestri;Mario Ciampi
2018

Abstract

In this paper we propose a methodology to reduce the complexity of building a software validation model, starting from medical specifications written in Italian natural language text. To obtain an automatic validation system, the specification documents must be manually translated into software models. This task is long, tedious and error-prone because of the manual effort it requires. The process can be sped up, and its errors reduced, by grouping the conformance rules that belong to the same pattern. Clustering algorithms can accomplish this task, but they need the total number of clusters to be known a priori, which is not possible in this kind of problem. To this end, we propose two innovative automatic cluster selection methodologies that estimate the optimal number of clusters through an iterative evaluation of internal cluster measures. These approaches consider three different Vector Space Models (VSMs), two different clustering algorithms and the impact of applying the Principal Component Analysis technique. The experimental assessment was performed on four different datasets extracted from the HL7 CDA R2 Italian-language conformance rule specification documents. Finally, to compare the results of all possible configurations, we carried out a non-parametric statistical analysis. The results obtained demonstrate the effectiveness of the proposed methodology for automatic cluster number selection.
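The idea of choosing the number of clusters by iterating over candidate values and scoring each partition with an internal measure can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a plain term-count vector space model, a basic k-means with deterministic farthest-point initialisation, and the mean silhouette coefficient as the internal measure; the abstract does not say which models, algorithms or measures were used. The PCA step is omitted, and the toy English rules and all function names are hypothetical.

```python
import math

def vectorize(texts):
    """Turn each text into a term-count vector (a very simple VSM)."""
    vocab = sorted({w for t in texts for w in t.split()})
    return [[t.split().count(w) for w in vocab] for t in texts]

def dist(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def kmeans(points, k, iters=100):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old centre if a cluster empties
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def mean_silhouette(points, labels):
    """Mean silhouette coefficient; singleton clusters contribute 0."""
    clusters = {}
    for i, l in enumerate(labels):
        clusters.setdefault(l, []).append(i)
    total = 0.0
    for i, l in enumerate(labels):
        own = [j for j in clusters[l] if j != i]
        if not own:
            continue
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        b = min(sum(dist(points[i], points[j]) for j in idx) / len(idx)
                for other, idx in clusters.items() if other != l)
        total += (b - a) / max(a, b)
    return total / len(points)

def select_k(points, k_values):
    """Score every candidate k and return the best one with all scores."""
    scores = {k: mean_silhouette(points, kmeans(points, k)) for k in k_values}
    return max(scores, key=scores.get), scores

# Hypothetical toy "conformance rules" following three wording patterns.
rules = [
    "id must contain root", "id must contain extension", "id must contain assigner",
    "code shall have system", "code shall have display", "code shall have version",
    "title may include text", "title may include language", "title may include caption",
]
vectors = vectorize(rules)
best_k, scores = select_k(vectors, range(2, 6))
print(best_k, scores)
```

On this toy input the three wording patterns share no vocabulary, so the silhouette score peaks at three clusters. Real specification text would need proper tokenisation and a richer vector space model before a scan of this kind becomes informative.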
Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR
Clustering
Medical Specification Document
HL7 CDA R2
Validation
Natural Language Processing
Schematron
Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/372288