
A framework of Explainable Artificial Intelligence for adult hearing screening

Orani V; Mongelli M; Paglialonga
2022

Abstract

Background: Recently, a novel speech-in-noise test for adult hearing screening, integrated with multivariate logistic regression and involving a number of features in addition to the speech recognition threshold (SRT, the usual output of a speech-in-noise test), has proven to be as accurate as comparable state-of-the-art hearing screening tools. Despite its prevalence, hearing loss is often neglected, especially by older adults who are reluctant to have their hearing tested. Moreover, these subjects may not fully trust the recommendations and predictions provided by a new hearing screening tool if its decision-making process is based on 'black box' models and thus lacks transparency. Therefore, integrating the speech-in-noise test with eXplainable Artificial Intelligence (XAI) models that supply a set of intelligible rules and numerical cut-offs may provide reliable advice about the subject's hearing condition.

Methods: A trustworthy screening platform based on a natively explainable algorithm, the Logic Learning Machine (LLM), was implemented. The classification task was performed and evaluated starting from nine features related to hearing screening data collected from 148 subjects. Generative Adversarial Networks (GANs) were used to create synthetic augmented datasets of 1000 records. The quality of two synthetic datasets with slightly different Maximum Mean Discrepancy (MMD) values was evaluated in terms of the generated rules (defined by cut-offs, number, covering, and error) and the related classification performance on real and augmented data. Specifically, the synthetic datasets were evaluated in terms of classification performance under three conditions in addition to the real-data performance (A): LLM trained and tested on synthetic data (B), LLM trained on synthetic data and tested on real data (C), and LLM trained on real data and tested on synthetic data (D).

Results: The real-data classification performance was evaluated on ten randomly shuffled versions of the training and test sets (average F-measure (A): 74%; average number of rules: 10; average number of conditions per rule: 2.5; average covering: 37.1%; average error: 3.22%). Across the ten iterations, the LLM produced stable single-condition rules of the form age <= ?age for the 'pass' class and #correct <= ?correct for the 'fail' class. More complex rules involving the same parameters were often coupled with additional conditions involving, for example, the SRT, the percentage of correct responses, and the average reaction time. Features such as the volume and the total test time showed lower relevance and were only sporadically involved as rule conditions. The F-measure was evaluated under the three conditions for both synthetic datasets: the high-MMD one (B: 50.3%; C: 67.8%; D: 56.54%) and the low-MMD one (B: 95.13%; C: 74.3%; D: 70.6%). The dataset with higher MMD (i.e., lower quality) yielded rules with fewer similarities to the real ones, as it was characterized by a higher average number of conditions and by rules that did not reflect the real relationships between input features and the output class. Conversely, the synthetic dataset with lower MMD (i.e., better quality) showed better classification performance on both synthetic and real data, thanks to a set of representative rules.

Conclusion: The use of speech-in-noise tests coupled with multivariate explainable techniques may allow trustworthy and accurate online hearing loss detection. Nevertheless, AI methods require a large amount of data to guarantee more than satisfactory performance. Data scarcity in the medical field can be tackled with data augmentation; however, synthetic data must faithfully reproduce the real phenomenon being observed. There appears to be some agreement between traditional metrics for data quality assessment and parameters derived from the classification performance on both the synthetic data itself and the real data. The use of XAI allows additional information to be extracted with respect to the validation of the extracted rules. Ongoing research is focusing on the creation of a synthetic data quality metric based on the combination of classification performance (i.e., covering, error, number of conditions per rule) and a measure of similarity between the rules derived from the synthetic dataset and those derived from the real dataset.
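The abstract does not specify which GAN implementation produced the 1000-record synthetic datasets described in the Methods. As a minimal, hypothetical sketch, a conditional tabular GAN such as CTGAN (open-source `ctgan` package) could be fitted on the nine-feature screening table; the file and column names below are illustrative placeholders, not the study's actual data.

```python
# Hypothetical sketch of tabular data augmentation with a GAN.
# The paper does not name its GAN implementation; CTGAN is used here only as an example.
import pandas as pd
from ctgan import CTGAN  # pip install ctgan

# Placeholder file and columns; the study used nine features from 148 subjects.
real_data = pd.read_csv("hearing_screening.csv")
discrete_columns = ["outcome"]  # e.g., the 'pass'/'fail' label

gan = CTGAN(epochs=300)  # number of epochs chosen arbitrarily for illustration
gan.fit(real_data, discrete_columns)

# Draw a 1000-record synthetic dataset, as in the study.
synthetic_data = gan.sample(1000)
synthetic_data.to_csv("hearing_screening_synthetic.csv", index=False)
```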
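Maximum Mean Discrepancy is the metric used above to rank the two synthetic datasets by quality. The abstract does not state the kernel or estimator; the following is a minimal sketch of the standard biased MMD^2 estimator with an RBF kernel, assuming real and synthetic features are available as NumPy arrays.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """RBF (Gaussian) kernel matrix between the rows of X and Y."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-gamma * sq_dists)

def mmd2(X, Y, gamma=1.0):
    """Biased estimate of squared Maximum Mean Discrepancy between samples X and Y.

    MMD^2 = E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)]
    A lower value means the synthetic sample Y is closer to the real sample X.
    """
    return (
        rbf_kernel(X, X, gamma).mean()
        + rbf_kernel(Y, Y, gamma).mean()
        - 2.0 * rbf_kernel(X, Y, gamma).mean()
    )

# Usage (hypothetical arrays of the nine standardized features):
# real = np.asarray(real_features, dtype=float)
# synth = np.asarray(synthetic_features, dtype=float)
# print(mmd2(real, synth))
```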
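The LLM rules reported in the Results (e.g., age <= ?age for 'pass') are scored by covering and error. The exact definitions used by the LLM are not given in the abstract; the sketch below uses one common convention (covering = fraction of same-class samples satisfying the rule, error = fraction of other-class samples satisfying it) with hypothetical column names and a placeholder cut-off.

```python
import pandas as pd

def rule_covering_and_error(df, conditions, target_class, label_col="outcome"):
    """Score a single if-then rule on a labelled dataset.

    conditions: list of boolean Series, one per condition (e.g. df["age"] <= 60).
    covering:   share of samples of `target_class` that satisfy all conditions.
    error:      share of samples of the other classes that satisfy all conditions.
    (One common convention; the LLM's exact definitions may differ.)
    """
    satisfied = pd.concat(conditions, axis=1).all(axis=1)
    in_class = df[label_col] == target_class
    covering = (satisfied & in_class).sum() / max(in_class.sum(), 1)
    error = (satisfied & ~in_class).sum() / max((~in_class).sum(), 1)
    return covering, error

# Hypothetical example mirroring the stable single-condition rules reported above
# (the cut-off 60 is a placeholder, not the study's value):
# covering, error = rule_covering_and_error(df, [df["age"] <= 60], target_class="pass")
```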
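The four evaluation conditions (A-D) simply combine real and synthetic data as training and test sets. Since the Logic Learning Machine is not available as a standard open-source library, the sketch below uses a scikit-learn decision tree as a stand-in interpretable classifier to illustrate the protocol only; it does not reproduce the reported F-measures, and the averaging of the F-measure is an assumption.

```python
from sklearn.metrics import f1_score
from sklearn.tree import DecisionTreeClassifier  # stand-in for the LLM

def evaluate_condition(X_train, y_train, X_test, y_test):
    """Fit an interpretable stand-in model and return the F-measure on the test set."""
    model = DecisionTreeClassifier(max_depth=3, random_state=0)
    model.fit(X_train, y_train)
    # Macro-averaged F-measure; the abstract does not specify the averaging used.
    return f1_score(y_test, model.predict(X_test), average="macro")

# Hypothetical splits: (Xr_*, yr_*) real train/test, (Xs_*, ys_*) synthetic train/test.
# A: train and test on real data
# f_A = evaluate_condition(Xr_train, yr_train, Xr_test, yr_test)
# B: train and test on synthetic data
# f_B = evaluate_condition(Xs_train, ys_train, Xs_test, ys_test)
# C: train on synthetic data, test on real data
# f_C = evaluate_condition(Xs_train, ys_train, Xr_test, yr_test)
# D: train on real data, test on synthetic data
# f_D = evaluate_condition(Xr_train, yr_train, Xs_test, ys_test)
```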
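The Conclusion mentions ongoing work on a quality metric that also compares rules extracted from synthetic data with those extracted from real data. Purely as a toy illustration of that idea (not the authors' metric), one could compare the cut-offs of rules that predict the same class from the same feature:

```python
def cutoff_similarity(real_rules, synth_rules):
    """Toy similarity between two rule sets; NOT the authors' metric.

    Each rule is a dict like {"class": "pass", "feature": "age", "cutoff": 60.0}.
    For every (class, feature) pair present in both sets, compare cut-offs with a
    normalized absolute difference and average the resulting scores.
    """
    scores = []
    for r in real_rules:
        matches = [
            s for s in synth_rules
            if s["class"] == r["class"] and s["feature"] == r["feature"]
        ]
        for s in matches:
            denom = max(abs(r["cutoff"]), abs(s["cutoff"]), 1e-9)
            scores.append(1.0 - min(abs(r["cutoff"] - s["cutoff"]) / denom, 1.0))
    return sum(scores) / len(scores) if scores else 0.0
```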
Istituto di Elettronica e di Ingegneria dell'Informazione e delle Telecomunicazioni - IEIIT
hearing screening
explainable AI
speech-in-noise test
speech-recognition threshold
mild hearing loss

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/446194