Web 2.0, Language Resources and standards to automatically build a multilingual Named Entity Lexicon

Monachini Monica
2012

Abstract

This paper proposes to advance the current state of the art of automatic Language Resource (LR) building by taking into consideration three elements: (1) the knowledge available in existing LRs, (2) the vast amount of information available from the collaborative paradigm that has emerged from Web 2.0 and (3) the use of standards to improve interoperability. We present a case study in which a set of LRs for different languages (WordNet for English and Spanish and Parole-Simple-Clips for Italian) is extended with Named Entities (NEs) by exploiting Wikipedia and the aforementioned LRs. The practical result is a multilingual NE lexicon connected to these LRs and to two ontologies: SUMO and SIMPLE. Furthermore, the paper addresses interoperability, an important problem currently affecting Computational Linguistics, by using the ISO LMF standard to encode this lexicon. The different steps of the procedure (mapping, disambiguation, extraction, NE identification and postprocessing) are comprehensively explained and evaluated. The resulting resource contains 974,567, 137,583 and 125,806 NEs for English, Spanish and Italian respectively. Finally, in order to check the usefulness of the constructed resource, we apply it to a state-of-the-art Question Answering system and evaluate its impact; the NE lexicon improves the system's accuracy by 28.1%. Compared to previous approaches to building NE repositories, the current proposal represents a step forward in terms of automation, language independence, amount of NEs acquired and richness of the information represented.
DC Field Value Language
dc.authority.ancejournal LANGUAGE RESOURCES AND EVALUATION -
dc.authority.orgunit Istituto di linguistica computazionale "Antonio Zampolli" - ILC -
dc.authority.people Toral Antonio it
dc.authority.people Ferrández Sergio it
dc.authority.people Monachini Monica it
dc.authority.people Munoz Rafael it
dc.collection.id.s b3f88f24-048a-4e43-8ab1-6697b90e068e *
dc.collection.name 01.01 Articolo in rivista *
dc.contributor.appartenenza Istituto di linguistica computazionale "Antonio Zampolli" - ILC *
dc.contributor.appartenenza.mi 918 *
dc.date.accessioned 2024/02/16 01:39:29 -
dc.date.available 2024/02/16 01:39:29 -
dc.date.issued 2012 -
dc.description.abstracteng This paper proposes to advance in the current state-of-the-art of automatic Language Resource (LR) building by taking into consideration three elements: (1) the knowledge available in existing LRs, (2) the vast amount of information available from the collaborative paradigm that has emerged from the Web 2.0 and (3) the use of standards to improve interoperability. We present a case study in which a set of LRs for different languages (WordNet for English and Spanish and Parole-Simple-Clips for Italian) are extended with Named Entities (NE) by exploiting Wikipedia and the aforementioned LRs. The practical result is a multilingual NE lexicon connected to these LRs and to two ontologies: SUMO and SIMPLE. Furthermore, the paper addresses an important problem which affects the Computational Linguistics area in the present, interoperability, by making use of the ISO LMF standard to encode this lexicon. The different steps of the procedure (mapping, disambiguation, extraction, NE identification and postprocessing) are comprehensively explained and evaluated. The resulting resource contains 974,567, 137,583 and 125,806 NEs for English, Spanish and Italian respectively. Finally, in order to check the usefulness of the constructed resource, we apply it into a state-of-the-art Question Answering system and evaluate its impact; the NE lexicon improves the system's accuracy by 28.1%. Compared to previous approaches to build NE repositories, the current proposal represents a step forward in terms of automation, language independence, amount of NEs acquired and richness of the information represented. -
dc.description.affiliations [1] CNR-ILC, Pisa; [2] Dublin City University, Ireland; [3] University of Alicante, Spain -
dc.description.allpeople Toral, Antonio; Ferrández, Sergio; Monachini, Monica; Munoz, Rafael -
dc.description.allpeopleoriginal Toral, Antonio [2]; Ferrández, Sergio [3]; Monachini, Monica [1]; Munoz, Rafael [3] -
dc.description.fulltext none en
dc.description.note ID_PUMA: /cnr.ilc/2012-A0-017 -
dc.description.numberofauthors 4 -
dc.identifier.doi 10.1007/s10579-011-9148-x -
dc.identifier.isi WOS:000310164600002 -
dc.identifier.scopus 2-s2.0-84867866200 -
dc.identifier.uri https://hdl.handle.net/20.500.14243/4454 -
dc.identifier.url http://link.springer.com/content/pdf/10.1007%2Fs10579-011-9148-x.pdf -
dc.language.iso eng -
dc.relation.firstpage 383 -
dc.relation.issue 3 -
dc.relation.lastpage 419 -
dc.relation.volume 46 -
dc.subject.keywords Language Resources; Named Entities; Web 2.0; Standards -
dc.subject.singlekeyword Language Resources *
dc.subject.singlekeyword Named Entities *
dc.subject.singlekeyword Web 2.0 *
dc.subject.singlekeyword Standards *
dc.title Web 2.0, Language Resources and standards to automatically build a multilingual Named Entity Lexicon en
dc.type.driver info:eu-repo/semantics/article -
dc.type.full 01 Contributo su Rivista::01.01 Articolo in rivista it
dc.type.miur 262 -
dc.type.referee Sì, ma tipo non specificato -
dc.ugov.descaux1 218786 -
iris.isi.extIssued 2012 -
iris.isi.extTitle Web 2.0, Language Resources and standards to automatically build a multilingual Named Entity Lexicon -
iris.orcid.lastModifiedDate 2024/04/04 13:08:40 *
iris.orcid.lastModifiedMillisecond 1712228920819 *
iris.scopus.extIssued 2012 -
iris.scopus.extTitle Web 2.0, Language Resources and standards to automatically build a multilingual Named Entity Lexicon -
iris.sitodocente.maxattempts 4 -
iris.unpaywall.doi 10.1007/s10579-011-9148-x *
iris.unpaywall.isoa false *
iris.unpaywall.journalisindoaj false *
iris.unpaywall.metadataCallLastModified 03/04/2026 04:36:20 -
iris.unpaywall.metadataCallLastModifiedMillisecond 1775183780743 -
iris.unpaywall.oastatus closed *
isi.authority.ancejournal LANGUAGE RESOURCES AND EVALUATION###1574-020X *
isi.category EV *
isi.contributor.affiliation Dublin City University -
isi.contributor.affiliation Universitat d'Alacant -
isi.contributor.affiliation Consiglio Nazionale delle Ricerche (CNR) -
isi.contributor.affiliation Universitat d'Alacant -
isi.contributor.country Ireland -
isi.contributor.country Spain -
isi.contributor.country Italy -
isi.contributor.country Spain -
isi.contributor.name Antonio -
isi.contributor.name Sergio -
isi.contributor.name Monica -
isi.contributor.name Rafael -
isi.contributor.researcherId OQJ-6695-2025 -
isi.contributor.researcherId GER-7675-2022 -
isi.contributor.researcherId F-3077-2015 -
isi.contributor.researcherId H-3101-2015 -
isi.contributor.subaffiliation NCLT -
isi.contributor.subaffiliation Nat Language Proc & Informat Syst Grp -
isi.contributor.subaffiliation Ist Linguist Computaz -
isi.contributor.subaffiliation Nat Language Proc & Informat Syst Grp -
isi.contributor.surname Toral -
isi.contributor.surname Ferrandez -
isi.contributor.surname Monachini -
isi.contributor.surname Munoz -
isi.date.issued 2012 *
isi.description.abstracteng This paper proposes to advance in the current state-of-the-art of automatic Language Resource (LR) building by taking into consideration three elements: (1) the knowledge available in existing LRs, (2) the vast amount of information available from the collaborative paradigm that has emerged from the Web 2.0 and (3) the use of standards to improve interoperability. We present a case study in which a set of LRs for different languages (WordNet for English and Spanish and Parole-Simple-Clips for Italian) are extended with Named Entities (NE) by exploiting Wikipedia and the aforementioned LRs. The practical result is a multilingual NE lexicon connected to these LRs and to two ontologies: SUMO and SIMPLE. Furthermore, the paper addresses an important problem which affects the Computational Linguistics area in the present, interoperability, by making use of the ISO LMF standard to encode this lexicon. The different steps of the procedure (mapping, disambiguation, extraction, NE identification and postprocessing) are comprehensively explained and evaluated. The resulting resource contains 974,567, 137,583 and 125,806 NEs for English, Spanish and Italian respectively. Finally, in order to check the usefulness of the constructed resource, we apply it into a state-of-the-art Question Answering system and evaluate its impact; the NE lexicon improves the system's accuracy by 28.1%. Compared to previous approaches to build NE repositories, the current proposal represents a step forward in terms of automation, language independence, amount of NEs acquired and richness of the information represented. *
isi.description.allpeopleoriginal Toral, A; Ferrández, S; Monachini, M; Muñoz, R; *
isi.document.sourcetype WOS.SCI *
isi.document.type Article *
isi.document.types Article *
isi.identifier.doi 10.1007/s10579-011-9148-x *
isi.identifier.eissn 1574-0218 *
isi.identifier.isi WOS:000310164600002 *
isi.journal.journaltitle LANGUAGE RESOURCES AND EVALUATION *
isi.journal.journaltitleabbrev LANG RESOUR EVAL *
isi.language.original English *
isi.publisher.place VAN GODEWIJCKSTRAAT 30, 3311 GZ DORDRECHT, NETHERLANDS *
isi.relation.firstpage 383 *
isi.relation.issue 3 *
isi.relation.lastpage 419 *
isi.relation.volume 46 *
isi.title Web 2.0, Language Resources and standards to automatically build a multilingual Named Entity Lexicon *
scopus.authority.ancejournal LANGUAGE RESOURCES AND EVALUATION###1574-020X *
scopus.category 1203 *
scopus.category 3304 *
scopus.category 3310 *
scopus.category 3309 *
scopus.contributor.affiliation Dublin City University -
scopus.contributor.affiliation University of Alicante -
scopus.contributor.affiliation Consiglio Nazionale delle Ricerche -
scopus.contributor.affiliation University of Alicante -
scopus.contributor.afid 60025059 -
scopus.contributor.afid 60010844 -
scopus.contributor.afid 60008941 -
scopus.contributor.afid 60010844 -
scopus.contributor.auid 8839393900 -
scopus.contributor.auid 23090852100 -
scopus.contributor.auid 23397766600 -
scopus.contributor.auid 7202035977 -
scopus.contributor.country Ireland -
scopus.contributor.country Spain -
scopus.contributor.country Italy -
scopus.contributor.country Spain -
scopus.contributor.dptid 113135934 -
scopus.contributor.dptid 103585731 -
scopus.contributor.dptid -
scopus.contributor.dptid 103585731 -
scopus.contributor.name Antonio -
scopus.contributor.name Sergio -
scopus.contributor.name Monica -
scopus.contributor.name Rafael -
scopus.contributor.subaffiliation NCLT, School of Computing; -
scopus.contributor.subaffiliation Natural Language Processing and Information Systems Group;Department of Computing Languages and Systems; -
scopus.contributor.subaffiliation Istituto di Linguistica Computazionale; -
scopus.contributor.subaffiliation Natural Language Processing and Information Systems Group;Department of Computing Languages and Systems; -
scopus.contributor.surname Toral -
scopus.contributor.surname Ferrández -
scopus.contributor.surname Monachini -
scopus.contributor.surname Muñoz -
scopus.date.issued 2012 *
scopus.description.abstracteng This paper proposes to advance in the current state-of-the-art of automatic Language Resource (LR) building by taking into consideration three elements: (1) the knowledge available in existing LRs, (2) the vast amount of information available from the collaborative paradigm that has emerged from the Web 2.0 and (3) the use of standards to improve interoperability. We present a case study in which a set of LRs for different languages (WordNet for English and Spanish and Parole-Simple-Clips for Italian) are extended with Named Entities (NE) by exploiting Wikipedia and the aforementioned LRs. The practical result is a multilingual NE lexicon connected to these LRs and to two ontologies: SUMO and SIMPLE. Furthermore, the paper addresses an important problem which affects the Computational Linguistics area in the present, interoperability, by making use of the ISO LMF standard to encode this lexicon. The different steps of the procedure (mapping, disambiguation, extraction, NE identification and postprocessing) are comprehensively explained and evaluated. The resulting resource contains 974,567, 137,583 and 125,806 NEs for English, Spanish and Italian respectively. Finally, in order to check the usefulness of the constructed resource, we apply it into a state-of-the-art Question Answering system and evaluate its impact; the NE lexicon improves the system's accuracy by 28.1%. Compared to previous approaches to build NE repositories, the current proposal represents a step forward in terms of automation, language independence, amount of NEs acquired and richness of the information represented. © 2011 Springer Science+Business Media B.V. *
scopus.description.allpeopleoriginal Toral A.; Ferrandez S.; Monachini M.; Munoz R. *
scopus.differences scopus.subject.keywords *
scopus.differences scopus.description.allpeopleoriginal *
scopus.differences scopus.description.abstracteng *
scopus.document.type ar *
scopus.document.types ar *
scopus.funding.funders 501100000780 - European Commission; *
scopus.identifier.doi 10.1007/s10579-011-9148-x *
scopus.identifier.eissn 1572-8412 *
scopus.identifier.pui 51482810 *
scopus.identifier.scopus 2-s2.0-84867866200 *
scopus.journal.sourceid 145663 *
scopus.language.iso eng *
scopus.relation.firstpage 383 *
scopus.relation.issue 3 *
scopus.relation.lastpage 419 *
scopus.relation.volume 46 *
scopus.subject.keywords Language Resources; Named Entities; Standards; Web 2.0; *
scopus.title Web 2.0, Language Resources and standards to automatically build a multilingual Named Entity Lexicon *
scopus.titleeng Web 2.0, Language Resources and standards to automatically build a multilingual Named Entity Lexicon *
Appears in the categories: 01.01 Articolo in rivista
Files in this item:
There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/4454
Citations
  • PMC: N/A
  • Scopus: 3
  • Web of Science (ISI): 4