The BioLexicon: a large-scale terminological resource for biomedical text mining
Simonetta Montemagni; Riccardo del Gratta; Simone Marchi; Monica Monachini; Valeria Quochi; Giulia Venturi
2011
Abstract
Background: Due to the rapidly expanding body of biomedical literature, biologists require increasingly sophisticated and efficient systems to help them to search for relevant information. Such systems should account for the multiple written variants used to represent biomedical concepts, and allow the user to search for specific pieces of knowledge (or events) involving these concepts, e.g., protein-protein interactions. Such functionality requires access to detailed information about words used in the biomedical literature. Existing databases and ontologies often have a specific focus and are oriented towards human use. Consequently, biological knowledge is dispersed amongst many resources, which often do not attempt to account for the large and frequently changing set of variants that appear in the literature. Additionally, such resources typically do not provide information about how terms relate to each other in texts to describe events.

Results: This article provides an overview of the design, construction and evaluation of a large-scale lexical and conceptual resource for the biomedical domain, the BioLexicon. The resource can be exploited by text mining tools at several levels, e.g., part-of-speech tagging, recognition of biomedical entities, and the extraction of events in which they are involved. As such, the BioLexicon must account for real usage of words in biomedical texts. In particular, the BioLexicon gathers together different types of terms from several existing data resources into a single, unified repository, and augments them with new term variants automatically extracted from biomedical literature. Extraction of events is facilitated through the inclusion of biologically pertinent verbs (around which events are typically organized) together with information about typical patterns of grammatical and semantic behaviour, which are acquired from domain-specific texts. In order to foster interoperability, the BioLexicon is modelled using the Lexical Markup Framework, an ISO standard.

Conclusions: The BioLexicon contains over 2.2 M lexical entries and over 1.8 M terminological variants, as well as over 3.3 M semantic relations, including over 2 M synonymy relations. Its exploitation can benefit both application developers and users. We demonstrate some such benefits by describing integration of the resource into a number of different tools, and evaluating improvements in performance that this can bring.

| DC field | Value | Language |
|---|---|---|
| dc.authority.ancejournal | BMC BIOINFORMATICS | - |
| dc.authority.orgunit | Istituto di linguistica computazionale "Antonio Zampolli" - ILC | - |
| dc.authority.people | Paul Thompson | it |
| dc.authority.people | John McNaught | it |
| dc.authority.people | Simonetta Montemagni | it |
| dc.authority.people | Nicoletta Calzolari | it |
| dc.authority.people | Riccardo del Gratta | it |
| dc.authority.people | Vivian Lee | it |
| dc.authority.people | Simone Marchi | it |
| dc.authority.people | Monica Monachini | it |
| dc.authority.people | Piotr Pezik | it |
| dc.authority.people | Valeria Quochi | it |
| dc.authority.people | CJ Rupp | it |
| dc.authority.people | Yutaka Sasaki | it |
| dc.authority.people | Giulia Venturi | it |
| dc.authority.people | Dietrich Rebholz-Schuhmann | it |
| dc.authority.people | Sophia Ananiadou | it |
| dc.collection.id.s | b3f88f24-048a-4e43-8ab1-6697b90e068e | * |
| dc.collection.name | 01.01 Journal article | * |
| dc.contributor.appartenenza | Istituto di linguistica computazionale "Antonio Zampolli" - ILC | * |
| dc.contributor.appartenenza.mi | 918 | * |
| dc.date.accessioned | 2024/02/21 05:50:47 | - |
| dc.date.available | 2024/02/21 05:50:47 | - |
| dc.date.issued | 2011 | - |
| dc.description.affiliations | School of Computer Science, University of Manchester; National Centre for Text Mining, Manchester Interdisciplinary Biocentre, University of Manchester; Manchester Interdisciplinary Biocentre, University of Manchester; Istituto di Linguistica Computazionale del CNR; European Bioinformatics Institute, Wellcome Trust Genome Campus; Toyota Technological Institute | - |
| dc.description.allpeople | Thompson, Paul; McNaught, John; Montemagni, Simonetta; Calzolari, Nicoletta; del Gratta, Riccardo; Lee, Vivian; Marchi, Simone; Monachini, Monica; Pezik, Piotr; Quochi, Valeria; Rupp, CJ; Sasaki, Yutaka; Venturi, Giulia; Rebholz-Schuhmann, Dietrich; Ananiadou, Sophia | - |
| dc.description.allpeopleoriginal | Paul Thompson, John McNaught, Simonetta Montemagni, Nicoletta Calzolari, Riccardo del Gratta, Vivian Lee, Simone Marchi, Monica Monachini, Piotr Pezik, Valeria Quochi, CJ Rupp, Yutaka Sasaki, Giulia Venturi, Dietrich Rebholz-Schuhmann, Sophia Ananiadou | - |
| dc.description.fulltext | none | en |
| dc.description.note | ID_PUMA: cnr.ilc/2011-A0-011 | - |
| dc.description.numberofauthors | 15 | - |
| dc.identifier.doi | 10.1186/1471-2105-12-397 | - |
| dc.identifier.isi | WOS:000297641800001 | - |
| dc.identifier.scopus | 2-s2.0-80053915290 | - |
| dc.identifier.uri | https://hdl.handle.net/20.500.14243/175344 | - |
| dc.identifier.url | http://www.biomedcentral.com/1471-2105/12/397 | - |
| dc.language.iso | en | - |
| dc.miur.last.status.update | 2024-10-10T13:46:27Z | * |
| dc.relation.firstpage | 1 | - |
| dc.relation.issue | 397 | - |
| dc.relation.lastpage | 29 | - |
| dc.relation.numberofpages | 29 | - |
| dc.relation.volume | 12 | - |
| dc.subject.keywords | Text Mining | - |
| dc.subject.keywords | Information Extraction | - |
| dc.subject.keywords | Computational Lexicon | - |
| dc.subject.singlekeyword | Text Mining | * |
| dc.subject.singlekeyword | Information Extraction | * |
| dc.subject.singlekeyword | Computational Lexicon | * |
| dc.title | The BioLexicon: a large-scale terminological resource for biomedical text mining | en |
| dc.type.driver | info:eu-repo/semantics/article | - |
| dc.type.full | 01 Journal contribution::01.01 Journal article | it |
| dc.type.miur | 262 | - |
| dc.type.referee | Yes, but type not specified | - |
| dc.ugov.descaux1 | 205232 | - |
| iris.isi.metadataErrorDescription | 0 | - |
| iris.isi.metadataErrorType | ERROR_NO_MATCH | - |
| iris.isi.metadataStatus | ERROR | - |
| iris.orcid.lastModifiedDate | 2024/04/04 17:36:55 | * |
| iris.orcid.lastModifiedMillisecond | 1712245015377 | * |
| iris.scopus.extIssued | 2011 | - |
| iris.scopus.extTitle | The BioLexicon: A large-scale terminological resource for biomedical text mining | - |
| iris.sitodocente.maxattempts | 1 | - |
| iris.unpaywall.bestoahost | publisher | * |
| iris.unpaywall.bestoaversion | publishedVersion | * |
| iris.unpaywall.doi | 10.1186/1471-2105-12-397 | * |
| iris.unpaywall.hosttype | publisher | * |
| iris.unpaywall.isoa | true | * |
| iris.unpaywall.journalisindoaj | true | * |
| iris.unpaywall.landingpage | https://doi.org/10.1186/1471-2105-12-397 | * |
| iris.unpaywall.license | cc-by | * |
| iris.unpaywall.metadataCallLastModified | 13/03/2025 05:50:20 | - |
| iris.unpaywall.metadataCallLastModifiedMillisecond | 1741841420980 | - |
| iris.unpaywall.oastatus | gold | * |
| iris.unpaywall.pdfurl | https://bmcbioinformatics.biomedcentral.com/counter/pdf/10.1186/1471-2105-12-397 | * |
| scopus.authority.ancejournal | BMC BIOINFORMATICS###1471-2105 | * |
| scopus.category | 1315 | * |
| scopus.category | 1303 | * |
| scopus.category | 1312 | * |
| scopus.category | 1706 | * |
| scopus.category | 2604 | * |
| scopus.contributor.affiliation | University of Manchester | - |
| scopus.contributor.affiliation | University of Manchester | - |
| scopus.contributor.affiliation | Istituto di Linguistica Computazionale del CNR | - |
| scopus.contributor.affiliation | Istituto di Linguistica Computazionale del CNR | - |
| scopus.contributor.affiliation | Istituto di Linguistica Computazionale del CNR | - |
| scopus.contributor.affiliation | Wellcome Trust Genome Campus | - |
| scopus.contributor.affiliation | Istituto di Linguistica Computazionale del CNR | - |
| scopus.contributor.affiliation | Istituto di Linguistica Computazionale del CNR | - |
| scopus.contributor.affiliation | Wellcome Trust Genome Campus | - |
| scopus.contributor.affiliation | Istituto di Linguistica Computazionale del CNR | - |
| scopus.contributor.affiliation | University of Manchester | - |
| scopus.contributor.affiliation | Toyota Technological Institute | - |
| scopus.contributor.affiliation | Istituto di Linguistica Computazionale del CNR | - |
| scopus.contributor.affiliation | Wellcome Trust Genome Campus | - |
| scopus.contributor.affiliation | University of Manchester | - |
| scopus.contributor.afid | 60003771 | - |
| scopus.contributor.afid | 60003771 | - |
| scopus.contributor.afid | 60008941 | - |
| scopus.contributor.afid | 60008941 | - |
| scopus.contributor.afid | 60008941 | - |
| scopus.contributor.afid | 60026124 | - |
| scopus.contributor.afid | 60008941 | - |
| scopus.contributor.afid | 60008941 | - |
| scopus.contributor.afid | 60026124 | - |
| scopus.contributor.afid | 60008941 | - |
| scopus.contributor.afid | 60003771 | - |
| scopus.contributor.afid | 60006081 | - |
| scopus.contributor.afid | 60008941 | - |
| scopus.contributor.afid | 60026124 | - |
| scopus.contributor.afid | 60003771 | - |
| scopus.contributor.auid | 57820641400 | - |
| scopus.contributor.auid | 22953888200 | - |
| scopus.contributor.auid | 15056781100 | - |
| scopus.contributor.auid | 8845912500 | - |
| scopus.contributor.auid | 34976432900 | - |
| scopus.contributor.auid | 36602778700 | - |
| scopus.contributor.auid | 27567818000 | - |
| scopus.contributor.auid | 23397766600 | - |
| scopus.contributor.auid | 24332242800 | - |
| scopus.contributor.auid | 34977412400 | - |
| scopus.contributor.auid | 37666044700 | - |
| scopus.contributor.auid | 35956948800 | - |
| scopus.contributor.auid | 27568199800 | - |
| scopus.contributor.auid | 6507852707 | - |
| scopus.contributor.auid | 6602788919 | - |
| scopus.contributor.country | United Kingdom | - |
| scopus.contributor.country | United Kingdom | - |
| scopus.contributor.country | Italy | - |
| scopus.contributor.country | Italy | - |
| scopus.contributor.country | Italy | - |
| scopus.contributor.country | United Kingdom | - |
| scopus.contributor.country | Italy | - |
| scopus.contributor.country | Italy | - |
| scopus.contributor.country | United Kingdom | - |
| scopus.contributor.country | Italy | - |
| scopus.contributor.country | United Kingdom | - |
| scopus.contributor.country | Japan | - |
| scopus.contributor.country | Italy | - |
| scopus.contributor.country | United Kingdom | - |
| scopus.contributor.country | United Kingdom | - |
| scopus.contributor.dptid | 103240669 | - |
| scopus.contributor.dptid | 103240669 | - |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | 103240669 | - |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | 103240669 | - |
| scopus.contributor.name | Paul | - |
| scopus.contributor.name | John | - |
| scopus.contributor.name | Simonetta | - |
| scopus.contributor.name | Nicoletta | - |
| scopus.contributor.name | Riccardo | - |
| scopus.contributor.name | Vivian | - |
| scopus.contributor.name | Simone | - |
| scopus.contributor.name | Monica | - |
| scopus.contributor.name | Piotr | - |
| scopus.contributor.name | Valeria | - |
| scopus.contributor.name | CJ | - |
| scopus.contributor.name | Yutaka | - |
| scopus.contributor.name | Giulia | - |
| scopus.contributor.name | Dietrich | - |
| scopus.contributor.name | Sophia | - |
| scopus.contributor.subaffiliation | Manchester Interdisciplinary Biocentre; | - |
| scopus.contributor.subaffiliation | Manchester Interdisciplinary Biocentre; | - |
| scopus.contributor.subaffiliation | - | |
| scopus.contributor.subaffiliation | - | |
| scopus.contributor.subaffiliation | - | |
| scopus.contributor.subaffiliation | European Bioinformatics Institute; | - |
| scopus.contributor.subaffiliation | - | |
| scopus.contributor.subaffiliation | - | |
| scopus.contributor.subaffiliation | European Bioinformatics Institute; | - |
| scopus.contributor.subaffiliation | - | |
| scopus.contributor.subaffiliation | Manchester Interdisciplinary Biocentre; | - |
| scopus.contributor.subaffiliation | - | |
| scopus.contributor.subaffiliation | - | |
| scopus.contributor.subaffiliation | European Bioinformatics Institute; | - |
| scopus.contributor.subaffiliation | Manchester Interdisciplinary Biocentre; | - |
| scopus.contributor.surname | Thompson | - |
| scopus.contributor.surname | McNaught | - |
| scopus.contributor.surname | Montemagni | - |
| scopus.contributor.surname | Calzolari | - |
| scopus.contributor.surname | del Gratta | - |
| scopus.contributor.surname | Lee | - |
| scopus.contributor.surname | Marchi | - |
| scopus.contributor.surname | Monachini | - |
| scopus.contributor.surname | Pezik | - |
| scopus.contributor.surname | Quochi | - |
| scopus.contributor.surname | Rupp | - |
| scopus.contributor.surname | Sasaki | - |
| scopus.contributor.surname | Venturi | - |
| scopus.contributor.surname | Rebholz-Schuhmann | - |
| scopus.contributor.surname | Ananiadou | - |
| scopus.date.issued | 2011 | * |
| scopus.description.abstracteng | Background: Due to the rapidly expanding body of biomedical literature, biologists require increasingly sophisticated and efficient systems to help them to search for relevant information. Such systems should account for the multiple written variants used to represent biomedical concepts, and allow the user to search for specific pieces of knowledge (or events) involving these concepts, e.g., protein-protein interactions. Such functionality requires access to detailed information about words used in the biomedical literature. Existing databases and ontologies often have a specific focus and are oriented towards human use. Consequently, biological knowledge is dispersed amongst many resources, which often do not attempt to account for the large and frequently changing set of variants that appear in the literature. Additionally, such resources typically do not provide information about how terms relate to each other in texts to describe events. Results: This article provides an overview of the design, construction and evaluation of a large-scale lexical and conceptual resource for the biomedical domain, the BioLexicon. The resource can be exploited by text mining tools at several levels, e.g., part-of-speech tagging, recognition of biomedical entities, and the extraction of events in which they are involved. As such, the BioLexicon must account for real usage of words in biomedical texts. In particular, the BioLexicon gathers together different types of terms from several existing data resources into a single, unified repository, and augments them with new term variants automatically extracted from biomedical literature. Extraction of events is facilitated through the inclusion of biologically pertinent verbs (around which events are typically organized) together with information about typical patterns of grammatical and semantic behaviour, which are acquired from domain-specific texts. In order to foster interoperability, the BioLexicon is modelled using the Lexical Markup Framework, an ISO standard. Conclusions: The BioLexicon contains over 2.2 M lexical entries and over 1.8 M terminological variants, as well as over 3.3 M semantic relations, including over 2 M synonymy relations. Its exploitation can benefit both application developers and users. We demonstrate some such benefits by describing integration of the resource into a number of different tools, and evaluating improvements in performance that this can bring. © 2011 Thompson et al; licensee BioMed Central Ltd. | * |
| scopus.description.allpeopleoriginal | Thompson P.; McNaught J.; Montemagni S.; Calzolari N.; del Gratta R.; Lee V.; Marchi S.; Monachini M.; Pezik P.; Quochi V.; Rupp C.J.; Sasaki Y.; Venturi G.; Rebholz-Schuhmann D.; Ananiadou S. | * |
| scopus.differences | scopus.description.allpeopleoriginal | * |
| scopus.differences | scopus.description.abstracteng | * |
| scopus.differences | scopus.language.iso | * |
| scopus.document.type | ar | * |
| scopus.document.types | ar | * |
| scopus.funding.funders | 501100000276 - Department of Health and Social Care; 501100000265 - Medical Research Council; 501100000272 - National Institute for Health Research; 100010269 - Wellcome Trust; 501100000289 - Cancer Research UK; 501100000274 - British Heart Foundation; 501100000589 - Chief Scientist Office; 100014013 - UK Research and Innovation; 501100000268 - Biotechnology and Biological Sciences Research Council; 501100000780 - European Commission; | * |
| scopus.funding.ids | BB/G013160/1; FP6-028099; | * |
| scopus.identifier.doi | 10.1186/1471-2105-12-397 | * |
| scopus.identifier.eissn | 1471-2105 | * |
| scopus.identifier.pmid | 21992002 | * |
| scopus.identifier.pui | 51667516 | * |
| scopus.identifier.scopus | 2-s2.0-80053915290 | * |
| scopus.journal.sourceid | 17929 | * |
| scopus.language.iso | eng | * |
| scopus.relation.article | 397 | * |
| scopus.relation.volume | 12 | * |
| scopus.title | The BioLexicon: A large-scale terminological resource for biomedical text mining | * |
| scopus.titleeng | The BioLexicon: A large-scale terminological resource for biomedical text mining | * |
| Appears in types: | 01.01 Journal article | |


