Efficient multi-task learning with instance selection for biomedical NLP

Bonfigli A.; Bacco L.; Pecchia L.; Merone M.; Dell'Orletta F.
2025

Abstract

Background: Biomedical natural language processing (NLP) increasingly relies on large language models and extensive datasets, which poses significant computational challenges. Methods: We propose Blue5, a multi-task model based on SciFive that incorporates instance selection (IS) to enable efficient multi-task learning (MTL) on biomedical data. We adapt the E2SC-IS framework to the biomedical domain, integrating a calibrated SVM classifier to reduce computational costs. Results: Our approach achieves an average data reduction of 26.6% across the tasks of the BLUE (Biomedical Language Understanding Evaluation) benchmark, while maintaining performance comparable to state-of-the-art models. The multi-task SVM configuration emerges as the most effective, demonstrating the benefit of combining IS with MTL for biomedical NLP. Thanks to its unified framework, Blue5 selects the most informative instances across tasks, preserving model generalization while efficiently handling multiple NLP tasks. Conclusion: Our work offers a practical way to address growing computational demands, enabling more scalable and accessible applications of advanced NLP techniques in biomedical research and healthcare.
DC Field Value Language
dc.authority.ancejournal COMPUTERS IN BIOLOGY AND MEDICINE en
dc.authority.orgunit Istituto di linguistica computazionale "Antonio Zampolli" - ILC en
dc.authority.people Bonfigli A. en
dc.authority.people Bacco L. en
dc.authority.people Pecchia L. en
dc.authority.people Merone M. en
dc.authority.people Dell'Orletta F. en
dc.collection.id.s b3f88f24-048a-4e43-8ab1-6697b90e068e *
dc.collection.name 01.01 Articolo in rivista *
dc.contributor.appartenenza Istituto di linguistica computazionale "Antonio Zampolli" - ILC *
dc.contributor.appartenenza.mi 918 *
dc.contributor.area Non assegn *
dc.date.accessioned 2026/03/03 14:57:20 -
dc.date.available 2026/03/03 14:57:20 -
dc.date.firstsubmission 2026/03/02 18:50:58 *
dc.date.issued 2025 -
dc.date.submission 2026/03/02 18:50:58 *
dc.description.allpeople Bonfigli, A.; Bacco, L.; Pecchia, L.; Merone, M.; Dell'Orletta, F. -
dc.description.allpeopleoriginal Bonfigli A.; Bacco L.; Pecchia L.; Merone M.; Dell'Orletta F. en
dc.description.fulltext open en
dc.description.international no en
dc.description.numberofauthors 5 -
dc.identifier.doi 10.1016/j.compbiomed.2025.110050 en
dc.identifier.scopus 2-s2.0-105001252768 en
dc.identifier.source scopus *
dc.identifier.uri https://hdl.handle.net/20.500.14243/570501 -
dc.language.iso eng en
dc.relation.volume 190 en
dc.subject.keywords Biomedical NLP -
dc.subject.keywords BLUE benchmark -
dc.subject.keywords Computational efficiency -
dc.subject.keywords Instance selection -
dc.subject.keywords Multi-task learning -
dc.subject.singlekeyword Biomedical NLP *
dc.subject.singlekeyword BLUE benchmark *
dc.subject.singlekeyword Computational efficiency *
dc.subject.singlekeyword Instance selection *
dc.subject.singlekeyword Multi-task learning *
dc.title Efficient multi-task learning with instance selection for biomedical NLP en
dc.type.driver info:eu-repo/semantics/article -
dc.type.full 01 Contributo su Rivista::01.01 Articolo in rivista it
dc.type.miur 262 -
iris.mediafilter.data 2026/03/04 02:52:12 *
iris.orcid.lastModifiedDate 2026/03/03 14:57:20 *
iris.orcid.lastModifiedMillisecond 1772546240668 *
iris.scopus.extIssued 2025 -
iris.scopus.extTitle Efficient multi-task learning with instance selection for biomedical NLP -
iris.sitodocente.maxattempts 1 -
iris.unpaywall.doi 10.1016/j.compbiomed.2025.110050 *
iris.unpaywall.isoa false *
iris.unpaywall.journalisindoaj false *
iris.unpaywall.metadataCallLastModified 04/03/2026 04:34:02 -
iris.unpaywall.metadataCallLastModifiedMillisecond 1772595242548 -
iris.unpaywall.oastatus closed *
scopus.authority.ancejournal COMPUTERS IN BIOLOGY AND MEDICINE###0010-4825 *
scopus.category 2718 *
scopus.category 1706 *
scopus.contributor.affiliation Università Campus Bio-Medico di Roma -
scopus.contributor.affiliation Università Campus Bio-Medico di Roma -
scopus.contributor.affiliation Fondazione Policlinico Universitario Campus Bio-Medico di Roma -
scopus.contributor.affiliation Università Campus Bio-Medico di Roma -
scopus.contributor.affiliation National Research Council -
scopus.contributor.afid 60005308 -
scopus.contributor.afid 60005308 -
scopus.contributor.afid 60276021 -
scopus.contributor.afid 60005308 -
scopus.contributor.afid 60021199 -
scopus.contributor.auid 58973576400 -
scopus.contributor.auid 57220927387 -
scopus.contributor.auid 35746897300 -
scopus.contributor.auid 56102657200 -
scopus.contributor.auid 57540567000 -
scopus.contributor.country Italy -
scopus.contributor.country Italy -
scopus.contributor.country Italy -
scopus.contributor.country Italy -
scopus.contributor.country Italy -
scopus.contributor.dptid 116307659 -
scopus.contributor.dptid 116307659 -
scopus.contributor.dptid -
scopus.contributor.dptid 116307659 -
scopus.contributor.dptid 121833164 -
scopus.contributor.name Agnese -
scopus.contributor.name Luca -
scopus.contributor.name Leandro -
scopus.contributor.name Mario -
scopus.contributor.name Felice -
scopus.contributor.subaffiliation Research Unit of Intelligent Technology for Health and Wellbeing;Department of Engineering; -
scopus.contributor.subaffiliation Research Unit of Computer Systems and Bioinformatics;Department of Engineering; -
scopus.contributor.subaffiliation -
scopus.contributor.subaffiliation Research Unit of Intelligent Technology for Health and Wellbeing;Department of Engineering; -
scopus.contributor.subaffiliation ItaliaNLP Lab;Institute of Computational Linguistics "Antonio Zampolli"; -
scopus.contributor.surname Bonfigli -
scopus.contributor.surname Bacco -
scopus.contributor.surname Pecchia -
scopus.contributor.surname Merone -
scopus.contributor.surname Dell'Orletta -
scopus.date.issued 2025 *
scopus.description.allpeopleoriginal Bonfigli A.; Bacco L.; Pecchia L.; Merone M.; Dell'Orletta F. *
scopus.differences scopus.subject.keywords *
scopus.document.type ar *
scopus.document.types ar *
scopus.identifier.doi 10.1016/j.compbiomed.2025.110050 *
scopus.identifier.eissn 1879-0534 *
scopus.identifier.pmid 40168806 *
scopus.identifier.pui 2038116232 *
scopus.identifier.scopus 2-s2.0-105001252768 *
scopus.journal.sourceid 17957 *
scopus.language.iso eng *
scopus.publisher.name Elsevier Ltd *
scopus.relation.article 110050 *
scopus.relation.volume 190 *
scopus.subject.keywords Biomedical NLP; BLUE benchmark; Computational efficiency; Instance selection; Multi-task learning; *
scopus.title Efficient multi-task learning with instance selection for biomedical NLP *
scopus.titleeng Efficient multi-task learning with instance selection for biomedical NLP *
Appears in collections: 01.01 Articolo in rivista
Files in this item:
File: 1-s2.0-S0010482525004019-main.pdf
Access: open access
License: Creative Commons
Size: 1.94 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/570501
Citations
  • PMC: N/A
  • Scopus: 2
  • Web of Science (ISI): N/A