CNR Institutional Research Information System

The BioLexicon: a large-scale terminological resource for biomedical text mining

Paul Thompson; John McNaught; Simonetta Montemagni; Nicoletta Calzolari; Riccardo del Gratta; Vivian Lee; Simone Marchi; Monica Monachini; Piotr Pezik; Valeria Quochi; CJ Rupp; Yutaka Sasaki; Giulia Venturi; Dietrich Rebholz-Schuhmann; Sophia Ananiadou

2011

Abstract

Background: Due to the rapidly expanding body of biomedical literature, biologists require increasingly sophisticated and efficient systems to help them to search for relevant information. Such systems should account for the multiple written variants used to represent biomedical concepts, and allow the user to search for specific pieces of knowledge (or events) involving these concepts, e.g., protein-protein interactions. Such functionality requires access to detailed information about words used in the biomedical literature. Existing databases and ontologies often have a specific focus and are oriented towards human use. Consequently, biological knowledge is dispersed amongst many resources, which often do not attempt to account for the large and frequently changing set of variants that appear in the literature. Additionally, such resources typically do not provide information about how terms relate to each other in texts to describe events.

Results: This article provides an overview of the design, construction and evaluation of a large-scale lexical and conceptual resource for the biomedical domain, the BioLexicon. The resource can be exploited by text mining tools at several levels, e.g., part-of-speech tagging, recognition of biomedical entities, and the extraction of events in which they are involved. As such, the BioLexicon must account for real usage of words in biomedical texts. In particular, the BioLexicon gathers together different types of terms from several existing data resources into a single, unified repository, and augments them with new term variants automatically extracted from biomedical literature. Extraction of events is facilitated through the inclusion of biologically pertinent verbs (around which events are typically organized) together with information about typical patterns of grammatical and semantic behaviour, which are acquired from domain-specific texts. In order to foster interoperability, the BioLexicon is modelled using the Lexical Markup Framework, an ISO standard.

Conclusions: The BioLexicon contains over 2.2 M lexical entries and over 1.8 M terminological variants, as well as over 3.3 M semantic relations, including over 2 M synonymy relations. Its exploitation can benefit both application developers and users. We demonstrate some such benefits by describing integration of the resource into a number of different tools, and evaluating improvements in performance that this can bring.


DC field	Value	Language
dc.authority.ancejournal	BMC BIOINFORMATICS	-
dc.authority.orgunit	Istituto di linguistica computazionale "Antonio Zampolli" - ILC	-
dc.authority.people	Paul Thompson	it
dc.authority.people	John McNaught	it
dc.authority.people	Simonetta Montemagni	it
dc.authority.people	Nicoletta Calzolari	it
dc.authority.people	Riccardo del Gratta	it
dc.authority.people	Vivian Lee	it
dc.authority.people	Simone Marchi	it
dc.authority.people	Monica Monachini	it
dc.authority.people	Piotr Pezik	it
dc.authority.people	Valeria Quochi	it
dc.authority.people	CJ Rupp	it
dc.authority.people	Yutaka Sasaki	it
dc.authority.people	Giulia Venturi	it
dc.authority.people	Dietrich Rebholz-Schuhmann	it
dc.authority.people	Sophia Ananiadou	it
dc.collection.id.s	b3f88f24-048a-4e43-8ab1-6697b90e068e	*
dc.collection.name	01.01 Articolo in rivista	*
dc.contributor.appartenenza	Istituto di linguistica computazionale "Antonio Zampolli" - ILC	*
dc.contributor.appartenenza.mi	918	*
dc.date.accessioned	2024/02/21 05:50:47	-
dc.date.available	2024/02/21 05:50:47	-
dc.date.issued	2011	-
dc.description.abstracteng	Background Due to the rapidly expanding body of biomedical literature, biologists require increasingly sophisticated and efficient systems to help them to search for relevant information. Such systems should account for the multiple written variants used to represent biomedical concepts, and allow the user to search for specific pieces of knowledge (or events) involving these concepts, e.g., protein-protein interactions. Such functionality requires access to detailed information about words used in the biomedical literature. Existing databases and ontologies often have a specific focus and are oriented towards human use. Consequently, biological knowledge is dispersed amongst many resources, which often do not attempt to account for the large and frequently changing set of variants that appear in the literature. Additionally, such resources typically do not provide information about how terms relate to each other in texts to describe events. Results This article provides an overview of the design, construction and evaluation of a large-scale lexical and conceptual resource for the biomedical domain, the BioLexicon. The resource can be exploited by text mining tools at several levels, e.g., part-of-speech tagging, recognition of biomedical entities, and the extraction of events in which they are involved. As such, the BioLexicon must account for real usage of words in biomedical texts. In particular, the BioLexicon gathers together different types of terms from several existing data resources into a single, unified repository, and augments them with new term variants automatically extracted from biomedical literature. Extraction of events is facilitated through the inclusion of biologically pertinent verbs (around which events are typically organized) together with information about typical patterns of grammatical and semantic behaviour, which are acquired from domain-specific texts. In order to foster interoperability, the BioLexicon is modelled using the Lexical Markup Framework, an ISO standard. Conclusions The BioLexicon contains over 2.2 M lexical entries and over 1.8 M terminological variants, as well as over 3.3 M semantic relations, including over 2 M synonymy relations. Its exploitation can benefit both application developers and users. We demonstrate some such benefits by describing integration of the resource into a number of different tools, and evaluating improvements in performance that this can bring.	-
dc.description.affiliations	School of Computer Science, University of Manchester; National Centre for Text Mining, Manchester Interdisciplinary Biocentre, University of Manchester; Manchester Interdisciplinary Biocentre, University of Manchester; Istituto di Linguistica Computazionale del CNR; European Bioinformatics Institute, Wellcome Trust Genome Campus; Toyota Technological Institute	-
dc.description.allpeople	Thompson, Paul; McNaught, John; Montemagni, Simonetta; Calzolari, Nicoletta; del Gratta, Riccardo; Lee, Vivian; Marchi, Simone; Monachini, Monica; Pezik, Piotr; Quochi, Valeria; Rupp, CJ; Sasaki, Yutaka; Venturi, Giulia; Rebholz-Schuhmann, Dietrich; Ananiadou, Sophia	-
dc.description.allpeopleoriginal	Paul Thompson, John McNaught, Simonetta Montemagni, Nicoletta Calzolari, Riccardo del Gratta, Vivian Lee, Simone Marchi, Monica Monachini, Piotr Pezik, Valeria Quochi, CJ Rupp, Yutaka Sasaki, Giulia Venturi, Dietrich Rebholz-Schuhmann, Sophia Ananiadou	-
dc.description.fulltext	none	en
dc.description.note	ID_PUMA: cnr.ilc/2011-A0-011	-
dc.description.numberofauthors	15	-
dc.identifier.doi	10.1186/1471-2105-12-397	-
dc.identifier.isi	WOS:000297641800001	-
dc.identifier.scopus	2-s2.0-80053915290	-
dc.identifier.uri	https://hdl.handle.net/20.500.14243/175344	-
dc.identifier.url	http://www.biomedcentral.com/1471-2105/12/397	-
dc.language.iso	en	-
dc.miur.last.status.update	2024-10-10T13:46:27Z	*
dc.relation.firstpage	1	-
dc.relation.issue	397	-
dc.relation.lastpage	29	-
dc.relation.numberofpages	29	-
dc.relation.volume	12	-
dc.subject.keywords	Text Mining	-
dc.subject.keywords	Information Extraction	-
dc.subject.keywords	Computational Lexicon	-
dc.subject.singlekeyword	Text Mining	*
dc.subject.singlekeyword	Information Extraction	*
dc.subject.singlekeyword	Computational Lexicon	*
dc.title	The BioLexicon: a large-scale terminological resource for biomedical text mining	en
dc.type.driver	info:eu-repo/semantics/article	-
dc.type.full	01 Contributo su Rivista::01.01 Articolo in rivista	it
dc.type.miur	262	-
dc.type.referee	Yes, but type not specified	-
dc.ugov.descaux1	205232	-
iris.isi.extIssued	2011	-
iris.isi.extTitle	The BioLexicon: a large-scale terminological resource for biomedical text mining	-
iris.orcid.lastModifiedDate	2024/04/04 17:36:55	*
iris.orcid.lastModifiedMillisecond	1712245015377	*
iris.scopus.extIssued	2011	-
iris.scopus.extTitle	The BioLexicon: A large-scale terminological resource for biomedical text mining	-
iris.sitodocente.maxattempts	1	-
iris.unpaywall.bestoahost	publisher	*
iris.unpaywall.bestoaversion	publishedVersion	*
iris.unpaywall.doi	10.1186/1471-2105-12-397	*
iris.unpaywall.hosttype	publisher	*
iris.unpaywall.isoa	true	*
iris.unpaywall.journalisindoaj	true	*
iris.unpaywall.landingpage	https://doi.org/10.1186/1471-2105-12-397	*
iris.unpaywall.license	cc-by	*
iris.unpaywall.metadataCallLastModified	15/03/2026 04:52:10	-
iris.unpaywall.metadataCallLastModifiedMillisecond	1773546730604	-
iris.unpaywall.oastatus	gold	*
iris.unpaywall.pdfurl	https://bmcbioinformatics.biomedcentral.com/counter/pdf/10.1186/1471-2105-12-397	*
isi.authority.ancejournal	BMC BIOINFORMATICS###1471-2105	*
isi.authority.sdg	Goal 3: Good health and well-being###12083	*
isi.category	MC	*
isi.category	CO	*
isi.category	DB	*
isi.contributor.affiliation	University of Manchester	-
isi.contributor.affiliation	University of Manchester	-
isi.contributor.affiliation	Consiglio Nazionale delle Ricerche (CNR)	-
isi.contributor.affiliation	Consiglio Nazionale delle Ricerche (CNR)	-
isi.contributor.affiliation	Consiglio Nazionale delle Ricerche (CNR)	-
isi.contributor.affiliation	European Molecular Biology Laboratory (EMBL)	-
isi.contributor.affiliation	Consiglio Nazionale delle Ricerche (CNR)	-
isi.contributor.affiliation	Consiglio Nazionale delle Ricerche (CNR)	-
isi.contributor.affiliation	European Molecular Biology Laboratory (EMBL)	-
isi.contributor.affiliation	Consiglio Nazionale delle Ricerche (CNR)	-
isi.contributor.affiliation	University of Manchester	-
isi.contributor.affiliation	University of Manchester	-
isi.contributor.affiliation	Consiglio Nazionale delle Ricerche (CNR)	-
isi.contributor.affiliation	European Molecular Biology Laboratory (EMBL)	-
isi.contributor.affiliation	University of Manchester	-
isi.contributor.country	England	-
isi.contributor.country	England	-
isi.contributor.country	Italy	-
isi.contributor.country	Italy	-
isi.contributor.country	Italy	-
isi.contributor.country	England	-
isi.contributor.country	Italy	-
isi.contributor.country	Italy	-
isi.contributor.country	England	-
isi.contributor.country	Italy	-
isi.contributor.country	England	-
isi.contributor.country	England	-
isi.contributor.country	Italy	-
isi.contributor.country	England	-
isi.contributor.country	England	-
isi.contributor.name	Paul	-
isi.contributor.name	John	-
isi.contributor.name	Simonetta	-
isi.contributor.name	Nicoletta	-
isi.contributor.name	Riccardo	-
isi.contributor.name	Vivian	-
isi.contributor.name	Simone	-
isi.contributor.name	Monica	-
isi.contributor.name	Piotr	-
isi.contributor.name	Valeria	-
isi.contributor.name	C. J.	-
isi.contributor.name	Yutaka	-
isi.contributor.name	Giulia	-
isi.contributor.name	Dietrich	-
isi.contributor.name	Sophia	-
isi.contributor.researcherId	MLV-9755-2025	-
isi.contributor.researcherId	DHA-6073-2022	-
isi.contributor.researcherId	B-8000-2015	-
isi.contributor.researcherId	B-9275-2008	-
isi.contributor.researcherId	FXJ-7381-2022	-
isi.contributor.researcherId	MKF-0382-2025	-
isi.contributor.researcherId	A-4098-2016	-
isi.contributor.researcherId	F-3077-2015	-
isi.contributor.researcherId	DWG-0320-2022	-
isi.contributor.researcherId	E-7468-2011	-
isi.contributor.researcherId	HWL-1675-2023	-
isi.contributor.researcherId	FYT-2992-2022	-
isi.contributor.researcherId	AAY-3932-2020	-
isi.contributor.researcherId	DYC-4467-2022	-
isi.contributor.researcherId	GBF-3762-2022	-
isi.contributor.subaffiliation	Sch Comp Sci	-
isi.contributor.subaffiliation	Sch Comp Sci	-
isi.contributor.subaffiliation	Ist Linguist Computaz	-
isi.contributor.subaffiliation	Ist Linguist Computaz	-
isi.contributor.subaffiliation	Ist Linguist Computaz	-
isi.contributor.subaffiliation		-
isi.contributor.subaffiliation	Ist Linguist Computaz	-
isi.contributor.subaffiliation	Ist Linguist Computaz	-
isi.contributor.subaffiliation		-
isi.contributor.subaffiliation	Ist Linguist Computaz	-
isi.contributor.subaffiliation	Sch Comp Sci	-
isi.contributor.subaffiliation	Sch Comp Sci	-
isi.contributor.subaffiliation	Ist Linguist Computaz	-
isi.contributor.subaffiliation		-
isi.contributor.subaffiliation	Sch Comp Sci	-
isi.contributor.surname	Thompson	-
isi.contributor.surname	McNaught	-
isi.contributor.surname	Montemagni	-
isi.contributor.surname	Calzolari	-
isi.contributor.surname	del Gratta	-
isi.contributor.surname	Lee	-
isi.contributor.surname	Marchi	-
isi.contributor.surname	Monachini	-
isi.contributor.surname	Pezik	-
isi.contributor.surname	Quochi	-
isi.contributor.surname	Rupp	-
isi.contributor.surname	Sasaki	-
isi.contributor.surname	Venturi	-
isi.contributor.surname	Rebholz-Schuhmann	-
isi.contributor.surname	Ananiadou	-
isi.date.issued	2011	*
isi.description.abstracteng	Background: Due to the rapidly expanding body of biomedical literature, biologists require increasingly sophisticated and efficient systems to help them to search for relevant information. Such systems should account for the multiple written variants used to represent biomedical concepts, and allow the user to search for specific pieces of knowledge (or events) involving these concepts, e.g., protein-protein interactions. Such functionality requires access to detailed information about words used in the biomedical literature. Existing databases and ontologies often have a specific focus and are oriented towards human use. Consequently, biological knowledge is dispersed amongst many resources, which often do not attempt to account for the large and frequently changing set of variants that appear in the literature. Additionally, such resources typically do not provide information about how terms relate to each other in texts to describe events. Results: This article provides an overview of the design, construction and evaluation of a large-scale lexical and conceptual resource for the biomedical domain, the BioLexicon. The resource can be exploited by text mining tools at several levels, e.g., part-of-speech tagging, recognition of biomedical entities, and the extraction of events in which they are involved. As such, the BioLexicon must account for real usage of words in biomedical texts. In particular, the BioLexicon gathers together different types of terms from several existing data resources into a single, unified repository, and augments them with new term variants automatically extracted from biomedical literature. Extraction of events is facilitated through the inclusion of biologically pertinent verbs (around which events are typically organized) together with information about typical patterns of grammatical and semantic behaviour, which are acquired from domain-specific texts. In order to foster interoperability, the BioLexicon is modelled using the Lexical Markup Framework, an ISO standard. Conclusions: The BioLexicon contains over 2.2 M lexical entries and over 1.8 M terminological variants, as well as over 3.3 M semantic relations, including over 2 M synonymy relations. Its exploitation can benefit both application developers and users. We demonstrate some such benefits by describing integration of the resource into a number of different tools, and evaluating improvements in performance that this can bring.	*
isi.description.allpeopleoriginal	Thompson, P; McNaught, J; Montemagni, S; Calzolari, N; del Gratta, R; Lee, V; Marchi, S; Monachini, M; Pezik, P; Quochi, V; Rupp, CJ; Sasaki, Y; Venturi, G; Rebholz-Schuhmann, D; Ananiadou, S;	*
isi.document.sourcetype	WOS.SCI	*
isi.document.type	Article	*
isi.document.types	Article	*
isi.identifier.doi	10.1186/1471-2105-12-397	*
isi.identifier.isi	WOS:000297641800001	*
isi.journal.journaltitle	BMC BIOINFORMATICS	*
isi.journal.journaltitleabbrev	BMC BIOINFORMATICS	*
isi.language.original	English	*
isi.publisher.place	CAMPUS, 4 CRINAN ST, LONDON N1 9XW, ENGLAND	*
isi.relation.volume	12	*
isi.title	The BioLexicon: a large-scale terminological resource for biomedical text mining	*
scopus.authority.ancejournal	BMC BIOINFORMATICS###1471-2105	*
scopus.category	1315	*
scopus.category	1303	*
scopus.category	1312	*
scopus.category	1706	*
scopus.category	2604	*
scopus.contributor.affiliation	University of Manchester	-
scopus.contributor.affiliation	University of Manchester	-
scopus.contributor.affiliation	Istituto di Linguistica Computazionale del CNR	-
scopus.contributor.affiliation	Istituto di Linguistica Computazionale del CNR	-
scopus.contributor.affiliation	Istituto di Linguistica Computazionale del CNR	-
scopus.contributor.affiliation	Wellcome Trust Genome Campus	-
scopus.contributor.affiliation	Istituto di Linguistica Computazionale del CNR	-
scopus.contributor.affiliation	Istituto di Linguistica Computazionale del CNR	-
scopus.contributor.affiliation	Wellcome Trust Genome Campus	-
scopus.contributor.affiliation	Istituto di Linguistica Computazionale del CNR	-
scopus.contributor.affiliation	University of Manchester	-
scopus.contributor.affiliation	Toyota Technological Institute	-
scopus.contributor.affiliation	Istituto di Linguistica Computazionale del CNR	-
scopus.contributor.affiliation	Wellcome Trust Genome Campus	-
scopus.contributor.affiliation	University of Manchester	-
scopus.contributor.afid	60003771	-
scopus.contributor.afid	60003771	-
scopus.contributor.afid	60008941	-
scopus.contributor.afid	60008941	-
scopus.contributor.afid	60008941	-
scopus.contributor.afid	60026124	-
scopus.contributor.afid	60008941	-
scopus.contributor.afid	60008941	-
scopus.contributor.afid	60026124	-
scopus.contributor.afid	60008941	-
scopus.contributor.afid	60003771	-
scopus.contributor.afid	60006081	-
scopus.contributor.afid	60008941	-
scopus.contributor.afid	60026124	-
scopus.contributor.afid	60003771	-
scopus.contributor.auid	57820641400	-
scopus.contributor.auid	22953888200	-
scopus.contributor.auid	15056781100	-
scopus.contributor.auid	8845912500	-
scopus.contributor.auid	34976432900	-
scopus.contributor.auid	36602778700	-
scopus.contributor.auid	27567818000	-
scopus.contributor.auid	23397766600	-
scopus.contributor.auid	24332242800	-
scopus.contributor.auid	34977412400	-
scopus.contributor.auid	37666044700	-
scopus.contributor.auid	35956948800	-
scopus.contributor.auid	27568199800	-
scopus.contributor.auid	6507852707	-
scopus.contributor.auid	6602788919	-
scopus.contributor.country	United Kingdom	-
scopus.contributor.country	United Kingdom	-
scopus.contributor.country	Italy	-
scopus.contributor.country	Italy	-
scopus.contributor.country	Italy	-
scopus.contributor.country	United Kingdom	-
scopus.contributor.country	Italy	-
scopus.contributor.country	Italy	-
scopus.contributor.country	United Kingdom	-
scopus.contributor.country	Italy	-
scopus.contributor.country	United Kingdom	-
scopus.contributor.country	Japan	-
scopus.contributor.country	Italy	-
scopus.contributor.country	United Kingdom	-
scopus.contributor.country	United Kingdom	-
scopus.contributor.dptid	103240669	-
scopus.contributor.dptid	103240669	-
scopus.contributor.dptid		-
scopus.contributor.dptid		-
scopus.contributor.dptid		-
scopus.contributor.dptid		-
scopus.contributor.dptid		-
scopus.contributor.dptid		-
scopus.contributor.dptid		-
scopus.contributor.dptid		-
scopus.contributor.dptid	103240669	-
scopus.contributor.dptid		-
scopus.contributor.dptid		-
scopus.contributor.dptid		-
scopus.contributor.dptid	103240669	-
scopus.contributor.name	Paul	-
scopus.contributor.name	John	-
scopus.contributor.name	Simonetta	-
scopus.contributor.name	Nicoletta	-
scopus.contributor.name	Riccardo	-
scopus.contributor.name	Vivian	-
scopus.contributor.name	Simone	-
scopus.contributor.name	Monica	-
scopus.contributor.name	Piotr	-
scopus.contributor.name	Valeria	-
scopus.contributor.name	CJ	-
scopus.contributor.name	Yutaka	-
scopus.contributor.name	Giulia	-
scopus.contributor.name	Dietrich	-
scopus.contributor.name	Sophia	-
scopus.contributor.subaffiliation	Manchester Interdisciplinary Biocentre;	-
scopus.contributor.subaffiliation	Manchester Interdisciplinary Biocentre;	-
scopus.contributor.subaffiliation		-
scopus.contributor.subaffiliation		-
scopus.contributor.subaffiliation		-
scopus.contributor.subaffiliation	European Bioinformatics Institute;	-
scopus.contributor.subaffiliation		-
scopus.contributor.subaffiliation		-
scopus.contributor.subaffiliation	European Bioinformatics Institute;	-
scopus.contributor.subaffiliation		-
scopus.contributor.subaffiliation	Manchester Interdisciplinary Biocentre;	-
scopus.contributor.subaffiliation		-
scopus.contributor.subaffiliation		-
scopus.contributor.subaffiliation	European Bioinformatics Institute;	-
scopus.contributor.subaffiliation	Manchester Interdisciplinary Biocentre;	-
scopus.contributor.surname	Thompson	-
scopus.contributor.surname	McNaught	-
scopus.contributor.surname	Montemagni	-
scopus.contributor.surname	Calzolari	-
scopus.contributor.surname	del Gratta	-
scopus.contributor.surname	Lee	-
scopus.contributor.surname	Marchi	-
scopus.contributor.surname	Monachini	-
scopus.contributor.surname	Pezik	-
scopus.contributor.surname	Quochi	-
scopus.contributor.surname	Rupp	-
scopus.contributor.surname	Sasaki	-
scopus.contributor.surname	Venturi	-
scopus.contributor.surname	Rebholz-Schuhmann	-
scopus.contributor.surname	Ananiadou	-
scopus.date.issued	2011	*
scopus.description.abstracteng	Background: Due to the rapidly expanding body of biomedical literature, biologists require increasingly sophisticated and efficient systems to help them to search for relevant information. Such systems should account for the multiple written variants used to represent biomedical concepts, and allow the user to search for specific pieces of knowledge (or events) involving these concepts, e.g., protein-protein interactions. Such functionality requires access to detailed information about words used in the biomedical literature. Existing databases and ontologies often have a specific focus and are oriented towards human use. Consequently, biological knowledge is dispersed amongst many resources, which often do not attempt to account for the large and frequently changing set of variants that appear in the literature. Additionally, such resources typically do not provide information about how terms relate to each other in texts to describe events. Results: This article provides an overview of the design, construction and evaluation of a large-scale lexical and conceptual resource for the biomedical domain, the BioLexicon. The resource can be exploited by text mining tools at several levels, e.g., part-of-speech tagging, recognition of biomedical entities, and the extraction of events in which they are involved. As such, the BioLexicon must account for real usage of words in biomedical texts. In particular, the BioLexicon gathers together different types of terms from several existing data resources into a single, unified repository, and augments them with new term variants automatically extracted from biomedical literature. Extraction of events is facilitated through the inclusion of biologically pertinent verbs (around which events are typically organized) together with information about typical patterns of grammatical and semantic behaviour, which are acquired from domain-specific texts. In order to foster interoperability, the BioLexicon is modelled using the Lexical Markup Framework, an ISO standard. Conclusions: The BioLexicon contains over 2.2 M lexical entries and over 1.8 M terminological variants, as well as over 3.3 M semantic relations, including over 2 M synonymy relations. Its exploitation can benefit both application developers and users. We demonstrate some such benefits by describing integration of the resource into a number of different tools, and evaluating improvements in performance that this can bring. © 2011 Thompson et al; licensee BioMed Central Ltd.	*
scopus.description.allpeopleoriginal	Thompson P.; McNaught J.; Montemagni S.; Calzolari N.; del Gratta R.; Lee V.; Marchi S.; Monachini M.; Pezik P.; Quochi V.; Rupp C.J.; Sasaki Y.; Venturi G.; Rebholz-Schuhmann D.; Ananiadou S.	*
scopus.differences	scopus.description.allpeopleoriginal	*
scopus.differences	scopus.description.abstracteng	*
scopus.differences	scopus.language.iso	*
scopus.document.type	ar	*
scopus.document.types	ar	*
scopus.funding.funders	501100000276 - Department of Health and Social Care; 501100000265 - Medical Research Council; 501100000272 - National Institute for Health Research; 100010269 - Wellcome Trust; 501100000289 - Cancer Research UK; 501100000274 - British Heart Foundation; 501100000589 - Chief Scientist Office; 100014013 - UK Research and Innovation; 501100000268 - Biotechnology and Biological Sciences Research Council; 501100000268 - Biotechnology and Biological Sciences Research Council; 501100000780 - European Commission; 501100000780 - European Commission;	*
scopus.funding.ids	BB/G013160/1; FP6-028099;	*
scopus.identifier.doi	10.1186/1471-2105-12-397	*
scopus.identifier.eissn	1471-2105	*
scopus.identifier.pmid	21992002	*
scopus.identifier.pui	51667516	*
scopus.identifier.scopus	2-s2.0-80053915290	*
scopus.journal.sourceid	17929	*
scopus.language.iso	eng	*
scopus.relation.article	397	*
scopus.relation.volume	12	*
scopus.title	The BioLexicon: A large-scale terminological resource for biomedical text mining	*
scopus.titleeng	The BioLexicon: A large-scale terminological resource for biomedical text mining	*
Appears in types:	01.01 Articolo in rivista (journal article)

Files in this item:

There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/175344
