CNR Institutional Research Information System

A lexicon for biology and bioinformatics: the BOOTStrep experience

Quochi V; Monachini M; Del Gratta R; Calzolari N

2008

Abstract

This paper describes the design, implementation and population of a lexical resource for biology and bioinformatics (the BioLexicon) developed within an ongoing European project. The aim of this project is text-based knowledge harvesting for support to information extraction and text mining in the biomedical domain. The BioLexicon is a large-scale lexical-terminological resource encoding different information types in one single integrated resource. In the design of the resource we follow the ISO/DIS 24613 "Lexical Mark-up Framework" standard, which ensures reusability of the information encoded and easy exchange of both data and architecture. The design of the resource also takes into account the needs of our text mining partners who automatically extract syntactic and semantic information from texts and feed it to the lexicon. The present contribution first describes in detail the model of the BioLexicon along its three main layers: morphology, syntax and semantics; then, it briefly describes the database implementation of the model and the population strategy followed within the project, together with an example. The BioLexicon database in fact comes equipped with automatic uploading procedures based on a common exchange XML format, which guarantees that the lexicon can be properly populated with data coming from different sources.

Full record (DC)

DC field	Value	Language
dc.authority.orgunit	Istituto di linguistica computazionale "Antonio Zampolli" - ILC	en
dc.authority.people	Quochi V	en
dc.authority.people	Monachini M	en
dc.authority.people	Del Gratta R	en
dc.authority.people	Calzolari N	en
dc.collection.id.s	71c7200a-7c5f-4e83-8d57-d3d2ba88f40d	*
dc.collection.name	04.01 Contributo in Atti di convegno	*
dc.contributor.appartenenza	Istituto di linguistica computazionale "Antonio Zampolli" - ILC	*
dc.contributor.appartenenza.mi	918	*
dc.date.accessioned	2024/02/19 19:38:11	-
dc.date.available	2024/02/19 19:38:11	-
dc.date.firstsubmission	2024/10/02 15:55:20	*
dc.date.issued	2008	-
dc.date.submission	2024/12/06 16:43:49	*
dc.description.abstract	This paper describes the design, implementation and population of a lexical resource for biology and bioinformatics (the BioLexicon) developed within an ongoing European project. The aim of this project is text-based knowledge harvesting for support to information extraction and text mining in the biomedical domain. The BioLexicon is a large-scale lexical-terminological resource encoding different information types in one single integrated resource. In the design of the resource we follow the ISO/DIS 24613 "Lexical Mark-up Framework" standard, which ensures reusability of the information encoded and easy exchange of both data and architecture. The design of the resource also takes into account the needs of our text mining partners who automatically extract syntactic and semantic information from texts and feed it to the lexicon. The present contribution first describes in detail the model of the BioLexicon along its three main layers: morphology, syntax and semantics; then, it briefly describes the database implementation of the model and the population strategy followed within the project, together with an example. The BioLexicon database in fact comes equipped with automatic uploading procedures based on a common exchange XML format, which guarantees that the lexicon can be properly populated with data coming from different sources.	-
dc.description.affiliations	Istituto di Linguistica Computazionale "A. Zampolli"	-
dc.description.allpeople	Quochi, V; Monachini, M; Del Gratta, R; Calzolari, N	-
dc.description.allpeopleoriginal	Quochi V.; Monachini M.; Del Gratta R.; Calzolari N.	en
dc.description.fulltext	open	en
dc.description.numberofauthors	4	-
dc.identifier.isbn	2-9517408-4-0	en
dc.identifier.isi	WOS:000324028902062	en
dc.identifier.scopus	2-s2.0-84874250555	en
dc.identifier.uri	https://hdl.handle.net/20.500.14243/65076	-
dc.identifier.url	http://www.lrec-conf.org/proceedings/lrec2008/pdf/576_paper.pdf	en
dc.language.iso	eng	en
dc.miur.last.status.update	2024-10-02T13:51:20Z	*
dc.publisher.country	FRA	en
dc.publisher.name	European Language Resources Association ELRA	en
dc.publisher.place	Paris	en
dc.relation.conferencedate	26-05/1-06-2008	en
dc.relation.conferencename	LREC 2008, Sixth International Conference on Language Resources and Evaluation	en
dc.relation.conferenceplace	Marrakech, Morocco	en
dc.relation.firstpage	2285	en
dc.relation.ispartofbook	LREC 2008, Sixth International Conference on Language Resources and Evaluation	en
dc.relation.lastpage	2292	en
dc.relation.numberofpages	8	en
dc.subject.keywordseng	Lexicon	-
dc.subject.keywordseng	Ontologies	-
dc.subject.keywordseng	Lexical database	-
dc.subject.singlekeyword	Lexicon	*
dc.subject.singlekeyword	Ontologies	*
dc.subject.singlekeyword	Lexical database	*
dc.title	A lexicon for biology and bioinformatics: the BOOTStrep experience	en
dc.type.driver	info:eu-repo/semantics/conferenceObject	-
dc.type.full	04 Contributo in convegno::04.01 Contributo in Atti di convegno	it
dc.type.miur	273	-
dc.type.referee	Yes, but type not specified	en
dc.ugov.descaux1	84700	-
iris.isi.extIssued	2008	-
iris.isi.extTitle	A Lexicon for Biology and Bioinformatics: The BOOTStrep Experience	-
iris.mediafilter.data	2025/04/02 00:20:29	*
iris.orcid.lastModifiedDate	2024/12/16 17:20:51	*
iris.orcid.lastModifiedMillisecond	1734366051218	*
iris.scopus.extIssued	2008	-
iris.scopus.extTitle	A Lexicon for biology and bioinformatics: The BOOTStrep experience	-
iris.scopus.ideLinkStatusDate	2024/04/10 09:22:16	*
iris.scopus.ideLinkStatusMillisecond	1712733736255	*
iris.sitodocente.maxattempts	1	-
isi.category	OT	*
isi.contributor.affiliation	Consiglio Nazionale delle Ricerche (CNR)	-
isi.contributor.affiliation	Consiglio Nazionale delle Ricerche (CNR)	-
isi.contributor.affiliation	Consiglio Nazionale delle Ricerche (CNR)	-
isi.contributor.affiliation	Consiglio Nazionale delle Ricerche (CNR)	-
isi.contributor.country	Italy	-
isi.contributor.country	Italy	-
isi.contributor.country	Italy	-
isi.contributor.country	Italy	-
isi.contributor.name	Valeria	-
isi.contributor.name	Monica	-
isi.contributor.name	Riccardo	-
isi.contributor.name	Nicoletta	-
isi.contributor.researcherId	E-7468-2011	-
isi.contributor.researcherId	F-3077-2015	-
isi.contributor.researcherId	FXJ-7381-2022	-
isi.contributor.researcherId	B-9275-2008	-
isi.contributor.subaffiliation	Ist Linguist Computaz	-
isi.contributor.subaffiliation	Ist Linguist Computaz	-
isi.contributor.subaffiliation	Ist Linguist Computaz	-
isi.contributor.subaffiliation	Ist Linguist Computaz	-
isi.contributor.surname	Quochi	-
isi.contributor.surname	Monachini	-
isi.contributor.surname	Del Gratta	-
isi.contributor.surname	Calzolari	-
isi.date.issued	2008	*
isi.description.allpeopleoriginal	Quochi, V; Monachini, M; Del Gratta, R; Calzolari, N;	*
isi.document.sourcetype	WOS.ISSHP	*
isi.document.type	Proceedings Paper	*
isi.document.types	Proceedings Paper	*
isi.identifier.isi	WOS:000324028902062	*
isi.journal.journaltitle	SIXTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, LREC 2008	*
isi.language.original	English	*
isi.publisher.place	55-57, RUE BRILLAT-SAVARIN, PARIS, 75013, FRANCE	*
isi.relation.firstpage	2285	*
isi.relation.lastpage	2292	*
isi.title	A Lexicon for Biology and Bioinformatics: The BOOTStrep Experience	*
scopus.category	1203	*
scopus.category	3304	*
scopus.category	3310	*
scopus.category	3309	*
scopus.contributor.affiliation	CNR	-
scopus.contributor.affiliation	CNR	-
scopus.contributor.affiliation	CNR	-
scopus.contributor.affiliation	CNR	-
scopus.contributor.afid	60008941	-
scopus.contributor.afid	60008941	-
scopus.contributor.afid	60008941	-
scopus.contributor.afid	60008941	-
scopus.contributor.auid	34977412400	-
scopus.contributor.auid	23397766600	-
scopus.contributor.auid	34976432900	-
scopus.contributor.auid	8845912500	-
scopus.contributor.country	Italy	-
scopus.contributor.country	Italy	-
scopus.contributor.country	Italy	-
scopus.contributor.country	Italy	-
scopus.contributor.dptid		-
scopus.contributor.dptid		-
scopus.contributor.dptid		-
scopus.contributor.dptid		-
scopus.contributor.name	Valeria	-
scopus.contributor.name	Monica	-
scopus.contributor.name	Riccardo	-
scopus.contributor.name	Nicoletta	-
scopus.contributor.subaffiliation	Istituto di Linguistica Computazionale;	-
scopus.contributor.subaffiliation	Istituto di Linguistica Computazionale;	-
scopus.contributor.subaffiliation	Istituto di Linguistica Computazionale;	-
scopus.contributor.subaffiliation	Istituto di Linguistica Computazionale;	-
scopus.contributor.surname	Quochi	-
scopus.contributor.surname	Monachini	-
scopus.contributor.surname	Del Gratta	-
scopus.contributor.surname	Calzolari	-
scopus.date.issued	2008	*
scopus.description.abstract	This paper describes the design, implementation and population of a lexical resource for biology and bioinformatics (the BioLexicon) developed within an ongoing European project. The aim of this project is text-based knowledge harvesting for support to information extraction and text mining in the biomedical domain. The BioLexicon is a large-scale lexical-terminological resource encoding different information types in one single integrated resource. In the design of the resource we follow the ISO/DIS 24613 "Lexical Mark-up Framework" standard, which ensures reusability of the information encoded and easy exchange of both data and architecture. The design of the resource also takes into account the needs of our text mining partners who automatically extract syntactic and semantic information from texts and feed it to the lexicon. The present contribution first describes in detail the model of the BioLexicon along its three main layers: morphology, syntax and semantics; then, it briefly describes the database implementation of the model and the population strategy followed within the project, together with an example. The BioLexicon database in fact comes equipped with automatic uploading procedures based on a common exchange XML format, which guarantees that the lexicon can be properly populated with data coming from different sources.	*
scopus.description.allpeopleoriginal	Quochi V.; Monachini M.; Del Gratta R.; Calzolari N.	*
scopus.differences	scopus.relation.conferencename	*
scopus.differences	scopus.publisher.name	*
scopus.differences	scopus.relation.conferencedate	*
scopus.differences	scopus.identifier.isbn	*
scopus.differences	scopus.relation.conferenceplace	*
scopus.document.type	cp	*
scopus.document.types	cp	*
scopus.funding.funders	501100000780 - European Commission;	*
scopus.funding.ids	FP6-028099;	*
scopus.identifier.isbn	9782951740846	*
scopus.identifier.pui	619617295	*
scopus.identifier.scopus	2-s2.0-84874250555	*
scopus.journal.sourceid	21100842264	*
scopus.language.iso	eng	*
scopus.publisher.name	European Language Resources Association (ELRA)	*
scopus.relation.conferencedate	2008	*
scopus.relation.conferencename	6th International Conference on Language Resources and Evaluation, LREC 2008	*
scopus.relation.conferenceplace	Palais des Congres Mansour Eddahbi, mar	*
scopus.relation.firstpage	2285	*
scopus.relation.lastpage	2292	*
scopus.title	A Lexicon for biology and bioinformatics: The BOOTStrep experience	*
scopus.titleeng	A Lexicon for biology and bioinformatics: The BOOTStrep experience	*
Appears in types:	04.01 Contributo in Atti di convegno

Files in this item:

File	Description	Access	License	Size	Format
prod_84700-doc_85050.pdf	A lexicon for biology and bioinformatics: the BOOTStrep experience	Open access	Creative Commons	485.13 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/65076
