Variable-constraint classification and quantification of radiology reports under the ACR Index
Esuli A; Sebastiani F
2013
Abstract
We apply hierarchical supervised learning technology to the problem of assigning codes from the well-known ACR Index (a "double-hierarchy" classification scheme from the American College of Radiology) to radiology reports. This task is actually two classification tasks in one: the first uses a hierarchy of codes describing anatomic locations, and the second uses a hierarchy of codes describing pathologies; the two hierarchies are closely intertwined. Each document must be placed in exactly one node of depth >= 2 of the "anatomic location" hierarchy and in exactly one node of depth >= 3 of the "pathology" hierarchy. This makes our task a (fairly uncommon) variable-constraint classification task: at the top levels of each hierarchy (the first 2 for anatomic location, the first 3 for pathology) we must apply the standard "exactly 1 class per document" constraint, while at the lower levels we must apply an "at most 1 class per document" constraint. We used a large dataset of about 250,000 radiology reports written in Italian and an adaptation of our TreeBoost.MH learning algorithm to variable-constraint classification. Despite the extreme difficulty of the task (the two codes had to be picked from pools of 719 codes for anatomic location and 5,269 codes for pathology, respectively), our system displayed good accuracy, indicating that it may be a viable tool for semi-automated classification of medical reports. We also analyzed the quantification accuracy of our system (i.e., its ability to correctly estimate the frequency of the individual codes), a concern of special interest in epidemiology. The results show that our system has excellent quantification accuracy, making it a valuable tool for the fully automated coding of radiology reports for epidemiological purposes.
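The variable-constraint rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' TreeBoost.MH adaptation: the toy hierarchy, the per-node scores, the zero threshold, and the names `assign_code` and `estimate_prevalence` are all assumptions made for illustration, and the codes shown are not real ACR Index codes. The sketch also assumes every branch of the hierarchy reaches the mandatory depth. For quantification it uses plain "classify and count" (the fraction of documents assigned to each code), which is only one simple way to estimate code frequencies and is not claimed to be the method used in the paper.

```python
from collections import Counter

# Hypothetical toy hierarchy (parent code -> child codes); NOT real ACR codes.
CHILDREN = {
    "": ["1", "2"],
    "1": ["11", "12"],
    "2": ["21"],
    "11": ["111", "112"],
}


def assign_code(scores, children, min_depth, threshold=0.0):
    """Descend from the root and return one code of depth >= min_depth.

    Down to min_depth the best-scoring child is always taken
    ("exactly 1 class per document"). Below min_depth the best child is
    taken only if its score exceeds the threshold ("at most 1 class per
    document"); otherwise the descent stops at the current node.
    """
    node, depth = "", 0
    while children.get(node):
        best = max(children[node], key=lambda c: scores.get(c, float("-inf")))
        optional_level = depth >= min_depth
        if optional_level and scores.get(best, float("-inf")) <= threshold:
            break  # no child is confident enough: stay at this node
        node, depth = best, depth + 1
    return node


def estimate_prevalence(predicted_codes):
    """Classify and count: fraction of documents assigned to each code."""
    counts = Counter(predicted_codes)
    total = len(predicted_codes)
    return {code: n / total for code, n in counts.items()}


# Hypothetical classifier scores for three reports (assumed values).
report_a = {"1": 0.9, "2": -0.3, "11": 0.4, "12": 0.1, "111": -0.2, "112": -0.5}
report_b = {"1": -0.8, "2": -0.1, "21": -0.6}
report_c = {"1": 0.5, "11": 0.6, "12": 0.1, "111": 0.7, "112": 0.2}

codes = [assign_code(r, CHILDREN, min_depth=2) for r in (report_a, report_b, report_c)]
prevalence = estimate_prevalence(codes)
```

In this sketch, `report_b` is still pushed down to a depth-2 code even though all of its scores are negative, because a choice is mandatory at the top levels; `report_a` stops at depth 2 because no deeper child clears the threshold. Setting `min_depth=3` would mimic the pathology hierarchy's constraint on a suitably deep hierarchy.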