A clustering based methodology to support the translation of medical specifications to software models
Francesco Gargiulo; Stefano Silvestri; Mario Ciampi
2018
Abstract
In this paper we propose a methodology to reduce the complexity of building a software validation model, starting from medical specifications written in Italian natural language text. To obtain an automatic validation system, the specification documents must be manually translated into software models. This task is long, tedious and error prone because of the manual effort it requires. The process can be sped up, and the errors that can occur reduced, by grouping the conformance rules that belong to the same pattern. Clustering algorithms can accomplish this task, but they require the total number of clusters to be known a priori, which is not possible in this kind of problem. To this end, we propose two innovative automatic cluster selection methodologies able to estimate the optimal number of clusters, based on an iterative evaluation of an internal cluster measure. These approaches consider three different Vector Space Models (VSMs), two different clustering algorithms and the impact of applying the Principal Component Analysis technique. The experimental assessment was performed on four different datasets extracted from the HL7 CDA R2 Italian-language conformance rule specification documents. Finally, to compare the results of all possible configurations, we carried out a non-parametric statistical analysis. The obtained results demonstrated the effectiveness of the proposed methodology for automatic cluster number selection.
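The idea of choosing the number of clusters by iteratively evaluating an internal cluster measure can be illustrated with a minimal sketch. The abstract does not name the measure, the clustering algorithms or the stopping rule, so everything below is an assumption made for illustration: the mean silhouette score as the internal measure, a plain k-means with deterministic farthest-first seeding as the clustering algorithm, and a small set of 2-D points standing in for rules already embedded in a vector space. The function names (`kmeans`, `silhouette`, `select_k`) are hypothetical and are not taken from the paper.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iters=20):
    """Plain k-means with farthest-first seeding (assumed here for determinism)."""
    centers = [points[0]]
    while len(centers) < k:
        # Next seed: the point farthest from its nearest existing seed.
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def silhouette(points, labels):
    """Mean silhouette score; points in singleton clusters score 0."""
    scores = []
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance to the point's own cluster
        b = min(                 # mean distance to the nearest other cluster
            sum(dist(p, q) for q, lab in zip(points, labels) if lab == other)
            / labels.count(other)
            for other in set(labels) if other != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

def select_k(points, k_max):
    """Try k = 2..k_max and keep the k with the highest mean silhouette."""
    best_k, best_score = None, -1.0
    for k in range(2, k_max + 1):
        labels = kmeans(points, k)
        if len(set(labels)) < 2:
            continue  # silhouette is undefined for a single cluster
        score = silhouette(points, labels)
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score

# Toy stand-in for vectorized conformance rules: three well-separated groups.
points = [[0, 0], [0, 1], [1, 0],
          [10, 10], [10, 11], [11, 10],
          [20, 0], [20, 1], [21, 0]]
best_k, best_score = select_k(points, k_max=5)
print(best_k)  # 3 for this toy data
```

In a real setting the points would come from one of the Vector Space Models, optionally reduced with Principal Component Analysis, and a different internal measure or clustering algorithm could be swapped in without changing the selection loop.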