A lexicon for biology and bioinformatics: the BOOTStrep experience

Quochi V; Monachini M; Del Gratta R; Calzolari N
2008

Abstract

This paper describes the design, implementation and population of a lexical resource for biology and bioinformatics (the BioLexicon) developed within an ongoing European project. The aim of this project is text-based knowledge harvesting in support of information extraction and text mining in the biomedical domain. The BioLexicon is a large-scale lexical-terminological resource that encodes different information types in a single integrated resource. The design follows the ISO/DIS 24613 "Lexical Mark-up Framework" standard, which ensures reusability of the encoded information and easy exchange of both data and architecture. It also takes into account the needs of our text mining partners, who automatically extract syntactic and semantic information from texts and feed it to the lexicon. The present contribution first describes in detail the model of the BioLexicon along its three main layers (morphology, syntax and semantics); it then briefly describes the database implementation of the model and the population strategy followed within the project, together with an example. The BioLexicon database comes equipped with automatic uploading procedures based on a common XML exchange format, which ensures that the lexicon can be properly populated with data coming from different sources.
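As a rough illustration of the population idea in the abstract, the sketch below reads a small XML fragment and groups each entry's information under the three layers (morphology, syntax, semantics). The element and attribute names are only loosely modelled on Lexical Markup Framework class names and are assumptions made for this example; the record does not give the actual BioLexicon exchange schema, and the sample entry, `feats` and `load_entries` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical LMF-style fragment. Element names, attribute names and the
# sample entry are illustrative assumptions, not the BioLexicon format.
SAMPLE = """
<Lexicon>
  <LexicalEntry id="V_activate">
    <feat att="partOfSpeech" val="verb"/>
    <Lemma><feat att="writtenForm" val="activate"/></Lemma>
    <WordForm><feat att="writtenForm" val="activates"/></WordForm>
    <SyntacticBehaviour subcatFrame="SUBJ_OBJ"/>
    <Sense id="S_activate_1" semanticPredicate="PRED_activation"/>
  </LexicalEntry>
</Lexicon>
"""


def feats(elem):
    """Collect the direct <feat att=... val=...> children of elem as a dict."""
    return {f.get("att"): f.get("val") for f in elem.findall("feat")}


def load_entries(xml_text):
    """Parse the fragment and group each entry's data by layer."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.findall("LexicalEntry"):
        lemma = entry.find("Lemma")
        entries.append({
            "id": entry.get("id"),
            "morphology": {
                "pos": feats(entry).get("partOfSpeech"),
                "lemma": feats(lemma).get("writtenForm") if lemma is not None else None,
                "forms": [feats(w).get("writtenForm") for w in entry.findall("WordForm")],
            },
            "syntax": [sb.get("subcatFrame") for sb in entry.findall("SyntacticBehaviour")],
            "semantics": [s.get("semanticPredicate") for s in entry.findall("Sense")],
        })
    return entries


if __name__ == "__main__":
    for e in load_entries(SAMPLE):
        print(e["id"], e["morphology"], e["syntax"], e["semantics"])
```

A shared interchange format of this kind lets each data provider emit the same structure, so a single loader can feed the database regardless of which extraction tool produced the data.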
DC Field Value Language
dc.authority.orgunit Istituto di linguistica computazionale "Antonio Zampolli" - ILC en
dc.authority.people Quochi V en
dc.authority.people Monachini M en
dc.authority.people Del Gratta R en
dc.authority.people Calzolari N en
dc.collection.id.s 71c7200a-7c5f-4e83-8d57-d3d2ba88f40d *
dc.collection.name 04.01 Contributo in Atti di convegno *
dc.contributor.appartenenza Istituto di linguistica computazionale "Antonio Zampolli" - ILC *
dc.contributor.appartenenza.mi 918 *
dc.date.accessioned 2024/02/19 19:38:11 -
dc.date.available 2024/02/19 19:38:11 -
dc.date.firstsubmission 2024/10/02 15:55:20 *
dc.date.issued 2008 -
dc.date.submission 2024/12/06 16:43:49 *
dc.description.affiliations Istituto di Linguistica Computazionale "A. Zampolli" -
dc.description.allpeople Quochi, V; Monachini, M; Del Gratta, R; Calzolari, N -
dc.description.allpeopleoriginal Quochi V.; Monachini M.; Del Gratta R.; Calzolari N. en
dc.description.fulltext open en
dc.description.numberofauthors 4 -
dc.identifier.isbn 2-9517408-4-0 en
dc.identifier.isi WOS:000324028902062 en
dc.identifier.scopus 2-s2.0-84874250555 en
dc.identifier.uri https://hdl.handle.net/20.500.14243/65076 -
dc.identifier.url http://www.lrec-conf.org/proceedings/lrec2008/pdf/576_paper.pdf en
dc.language.iso eng en
dc.miur.last.status.update 2024-10-02T13:51:20Z *
dc.publisher.country FRA en
dc.publisher.name European Language Resources Association ELRA en
dc.publisher.place Paris en
dc.relation.conferencedate 26-05/1-06-2008 en
dc.relation.conferencename LREC 2008, Sixth International Conference on Language Resources and Evaluation en
dc.relation.conferenceplace Marrakech, Morocco en
dc.relation.firstpage 2285 en
dc.relation.ispartofbook LREC 2008, Sixth International Conference on Language Resources and Evaluation en
dc.relation.lastpage 2292 en
dc.relation.numberofpages 8 en
dc.subject.keywordseng Lexicon -
dc.subject.keywordseng Ontologies -
dc.subject.keywordseng Lexical database -
dc.subject.singlekeyword Lexicon *
dc.subject.singlekeyword Ontologies *
dc.subject.singlekeyword Lexical database *
dc.title A lexicon for biology and bioinformatics: the BOOTStrep experience en
dc.type.driver info:eu-repo/semantics/conferenceObject -
dc.type.full 04 Contributo in convegno::04.01 Contributo in Atti di convegno it
dc.type.miur 273 -
dc.type.referee Yes, but type not specified en
dc.ugov.descaux1 84700 -
iris.isi.metadataErrorDescription 0 -
iris.isi.metadataErrorType ERROR_NO_MATCH -
iris.isi.metadataStatus ERROR -
iris.mediafilter.data 2025/04/02 00:20:29 *
iris.orcid.lastModifiedDate 2024/12/16 17:20:51 *
iris.orcid.lastModifiedMillisecond 1734366051218 *
iris.scopus.extIssued 2008 -
iris.scopus.extTitle A Lexicon for biology and bioinformatics: The BOOTStrep experience -
iris.scopus.ideLinkStatusDate 2024/04/10 09:22:16 *
iris.scopus.ideLinkStatusMillisecond 1712733736255 *
iris.sitodocente.maxattempts 1 -
scopus.category 1203 *
scopus.category 3304 *
scopus.category 3310 *
scopus.category 3309 *
scopus.contributor.affiliation CNR -
scopus.contributor.affiliation CNR -
scopus.contributor.affiliation CNR -
scopus.contributor.affiliation CNR -
scopus.contributor.afid 60008941 -
scopus.contributor.afid 60008941 -
scopus.contributor.afid 60008941 -
scopus.contributor.afid 60008941 -
scopus.contributor.auid 34977412400 -
scopus.contributor.auid 23397766600 -
scopus.contributor.auid 34976432900 -
scopus.contributor.auid 8845912500 -
scopus.contributor.country Italy -
scopus.contributor.country Italy -
scopus.contributor.country Italy -
scopus.contributor.country Italy -
scopus.contributor.dptid -
scopus.contributor.dptid -
scopus.contributor.dptid -
scopus.contributor.dptid -
scopus.contributor.name Valeria -
scopus.contributor.name Monica -
scopus.contributor.name Riccardo -
scopus.contributor.name Nicoletta -
scopus.contributor.subaffiliation Istituto di Linguistica Computazionale; -
scopus.contributor.subaffiliation Istituto di Linguistica Computazionale; -
scopus.contributor.subaffiliation Istituto di Linguistica Computazionale; -
scopus.contributor.subaffiliation Istituto di Linguistica Computazionale; -
scopus.contributor.surname Quochi -
scopus.contributor.surname Monachini -
scopus.contributor.surname Del Gratta -
scopus.contributor.surname Calzolari -
scopus.date.issued 2008 *
scopus.description.allpeopleoriginal Quochi V.; Monachini M.; Del Gratta R.; Calzolari N. *
scopus.differences scopus.relation.conferencename *
scopus.differences scopus.publisher.name *
scopus.differences scopus.relation.conferencedate *
scopus.differences scopus.identifier.isbn *
scopus.differences scopus.relation.conferenceplace *
scopus.document.type cp *
scopus.document.types cp *
scopus.funding.funders 501100000780 - European Commission; *
scopus.funding.ids FP6-028099; *
scopus.identifier.isbn 9782951740846 *
scopus.identifier.pui 619617295 *
scopus.identifier.scopus 2-s2.0-84874250555 *
scopus.journal.sourceid 21100842264 *
scopus.language.iso eng *
scopus.publisher.name European Language Resources Association (ELRA) *
scopus.relation.conferencedate 2008 *
scopus.relation.conferencename 6th International Conference on Language Resources and Evaluation, LREC 2008 *
scopus.relation.conferenceplace Palais des Congres Mansour Eddahbi, mar *
scopus.relation.firstpage 2285 *
scopus.relation.lastpage 2292 *
scopus.title A Lexicon for biology and bioinformatics: The BOOTStrep experience *
scopus.titleeng A Lexicon for biology and bioinformatics: The BOOTStrep experience *
Appears in types: 04.01 Contributo in Atti di convegno (conference proceedings contribution)
Files in this item:
File Size Format
prod_84700-doc_85050.pdf

open access

Description: A lexicon for biology and bioinformatics: the BOOTStrep experience
License: Creative Commons
Size: 485.13 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/65076
Citations
  • PMC n/a
  • Scopus 11
  • ISI 3