CNR Institutional Research Information System

Web 2.0, Language Resources and standards to automatically build a multilingual Named Entity Lexicon

Toral Antonio;Ferrández Sergio;Monachini Monica;Munoz Rafael

2012

Abstract

This paper aims to advance the state of the art in automatic Language Resource (LR) building by taking three elements into consideration: (1) the knowledge available in existing LRs, (2) the vast amount of information made available by the collaborative paradigm that has emerged from Web 2.0, and (3) the use of standards to improve interoperability. We present a case study in which a set of LRs for different languages (WordNet for English and Spanish, and Parole-Simple-Clips for Italian) is extended with Named Entities (NEs) by exploiting Wikipedia and the aforementioned LRs. The practical result is a multilingual NE lexicon connected to these LRs and to two ontologies, SUMO and SIMPLE. Furthermore, the paper addresses interoperability, an important problem currently affecting Computational Linguistics, by using the ISO LMF standard to encode this lexicon. The steps of the procedure (mapping, disambiguation, extraction, NE identification and postprocessing) are explained and evaluated in detail. The resulting resource contains 974,567, 137,583 and 125,806 NEs for English, Spanish and Italian, respectively. Finally, to check the usefulness of the constructed resource, we apply it to a state-of-the-art Question Answering system and evaluate its impact; the NE lexicon improves the system's accuracy by 28.1%. Compared with previous approaches to building NE repositories, the current proposal represents a step forward in terms of automation, language independence, number of NEs acquired and richness of the information represented.

Full record (DC)

DC field	Value	Language
dc.authority.ancejournal	LANGUAGE RESOURCES AND EVALUATION	-
dc.authority.orgunit	Istituto di linguistica computazionale "Antonio Zampolli" - ILC	-
dc.authority.people	Toral Antonio	it
dc.authority.people	Ferrández Sergio	it
dc.authority.people	Monachini Monica	it
dc.authority.people	Munoz Rafael	it
dc.collection.id.s	b3f88f24-048a-4e43-8ab1-6697b90e068e	*
dc.collection.name	01.01 Articolo in rivista	*
dc.contributor.appartenenza	Istituto di linguistica computazionale "Antonio Zampolli" - ILC	*
dc.contributor.appartenenza.mi	918	*
dc.date.accessioned	2024/02/16 01:39:29	-
dc.date.available	2024/02/16 01:39:29	-
dc.date.issued	2012	-
dc.description.abstracteng	This paper proposes to advance in the current state-of-the-art of automatic Language Resource (LR) building by taking into consideration three elements: (1) the knowledge available in existing LRs, (2) the vast amount of information available from the collaborative paradigm that has emerged from the Web 2.0 and (3) the use of standards to improve interoperability. We present a case study in which a set of LRs for different languages (WordNet for English and Spanish and Parole-Simple-Clips for Italian) are extended with Named Entities (NE) by exploiting Wikipedia and the aforementioned LRs. The practical result is a multilingual NE lexicon connected to these LRs and to two ontologies: SUMO and SIMPLE. Furthermore, the paper addresses an important problem which affects the Computational Linguistics area in the present, interoperability, by making use of the ISO LMF standard to encode this lexicon. The different steps of the procedure (mapping, disambiguation, extraction, NE identification and postprocessing) are comprehensively explained and evaluated. The resulting resource contains 974,567, 137,583 and 125,806 NEs for English, Spanish and Italian respectively. Finally, in order to check the usefulness of the constructed resource, we apply it into a state-of-the-art Question Answering system and evaluate its impact; the NE lexicon improves the system's accuracy by 28.1%. Compared to previous approaches to build NE repositories, the current proposal represents a step forward in terms of automation, language independence, amount of NEs acquired and richness of the information represented.	-
dc.description.affiliations	[1] CNR-ILC, Pisa; [2] Dublin City University, Ireland; [3] University of Alicante, Spain	-
dc.description.allpeople	Toral, Antonio; Ferrández, Sergio; Monachini, Monica; Munoz, Rafael	-
dc.description.allpeopleoriginal	Toral, Antonio [2]; Ferrández, Sergio [3]; Monachini, Monica [1]; Munoz, Rafael [3]	-
dc.description.fulltext	none	en
dc.description.note	ID_PUMA: /cnr.ilc/2012-A0-017	-
dc.description.numberofauthors	4	-
dc.identifier.doi	10.1007/s10579-011-9148-x	-
dc.identifier.isi	WOS:000310164600002	-
dc.identifier.scopus	2-s2.0-84867866200	-
dc.identifier.uri	https://hdl.handle.net/20.500.14243/4454	-
dc.identifier.url	http://link.springer.com/content/pdf/10.1007%2Fs10579-011-9148-x.pdf	-
dc.language.iso	eng	-
dc.relation.firstpage	383	-
dc.relation.issue	3	-
dc.relation.lastpage	419	-
dc.relation.volume	46	-
dc.subject.keywords	Language Resources; Named Entities; Web 2.0; Standards	-
dc.subject.singlekeyword	Language Resources	*
dc.subject.singlekeyword	Named Entities	*
dc.subject.singlekeyword	Web 2.0	*
dc.subject.singlekeyword	Standards	*
dc.title	Web 2.0, Language Resources and standards to automatically build a multilingual Named Entity Lexicon	en
dc.type.driver	info:eu-repo/semantics/article	-
dc.type.full	01 Contributo su Rivista::01.01 Articolo in rivista	it
dc.type.miur	262	-
dc.type.referee	Yes, but type not specified	-
dc.ugov.descaux1	218786	-
iris.isi.extIssued	2012	-
iris.isi.extTitle	Web 2.0, Language Resources and standards to automatically build a multilingual Named Entity Lexicon	-
iris.orcid.lastModifiedDate	2024/04/04 13:08:40	*
iris.orcid.lastModifiedMillisecond	1712228920819	*
iris.scopus.extIssued	2012	-
iris.scopus.extTitle	Web 2.0, Language Resources and standards to automatically build a multilingual Named Entity Lexicon	-
iris.sitodocente.maxattempts	4	-
iris.unpaywall.doi	10.1007/s10579-011-9148-x	*
iris.unpaywall.isoa	false	*
iris.unpaywall.journalisindoaj	false	*
iris.unpaywall.metadataCallLastModified	03/04/2026 04:36:20	-
iris.unpaywall.metadataCallLastModifiedMillisecond	1775183780743	-
iris.unpaywall.oastatus	closed	*
isi.authority.ancejournal	LANGUAGE RESOURCES AND EVALUATION###1574-020X	*
isi.category	EV	*
isi.contributor.affiliation	Dublin City University	-
isi.contributor.affiliation	Universitat d'Alacant	-
isi.contributor.affiliation	Consiglio Nazionale delle Ricerche (CNR)	-
isi.contributor.affiliation	Universitat d'Alacant	-
isi.contributor.country	Ireland	-
isi.contributor.country	Spain	-
isi.contributor.country	Italy	-
isi.contributor.country	Spain	-
isi.contributor.name	Antonio	-
isi.contributor.name	Sergio	-
isi.contributor.name	Monica	-
isi.contributor.name	Rafael	-
isi.contributor.researcherId	OQJ-6695-2025	-
isi.contributor.researcherId	GER-7675-2022	-
isi.contributor.researcherId	F-3077-2015	-
isi.contributor.researcherId	H-3101-2015	-
isi.contributor.subaffiliation	NCLT	-
isi.contributor.subaffiliation	Nat Language Proc & Informat Syst Grp	-
isi.contributor.subaffiliation	Ist Linguist Computaz	-
isi.contributor.subaffiliation	Nat Language Proc & Informat Syst Grp	-
isi.contributor.surname	Toral	-
isi.contributor.surname	Ferrandez	-
isi.contributor.surname	Monachini	-
isi.contributor.surname	Munoz	-
isi.date.issued	2012	*
isi.description.abstracteng	This paper proposes to advance in the current state-of-the-art of automatic Language Resource (LR) building by taking into consideration three elements: (1) the knowledge available in existing LRs, (2) the vast amount of information available from the collaborative paradigm that has emerged from the Web 2.0 and (3) the use of standards to improve interoperability. We present a case study in which a set of LRs for different languages (WordNet for English and Spanish and Parole-Simple-Clips for Italian) are extended with Named Entities (NE) by exploiting Wikipedia and the aforementioned LRs. The practical result is a multilingual NE lexicon connected to these LRs and to two ontologies: SUMO and SIMPLE. Furthermore, the paper addresses an important problem which affects the Computational Linguistics area in the present, interoperability, by making use of the ISO LMF standard to encode this lexicon. The different steps of the procedure (mapping, disambiguation, extraction, NE identification and postprocessing) are comprehensively explained and evaluated. The resulting resource contains 974,567, 137,583 and 125,806 NEs for English, Spanish and Italian respectively. Finally, in order to check the usefulness of the constructed resource, we apply it into a state-of-the-art Question Answering system and evaluate its impact; the NE lexicon improves the system's accuracy by 28.1%. Compared to previous approaches to build NE repositories, the current proposal represents a step forward in terms of automation, language independence, amount of NEs acquired and richness of the information represented.	*
isi.description.allpeopleoriginal	Toral, A; Ferrández, S; Monachini, M; Muñoz, R;	*
isi.document.sourcetype	WOS.SCI	*
isi.document.type	Article	*
isi.document.types	Article	*
isi.identifier.doi	10.1007/s10579-011-9148-x	*
isi.identifier.eissn	1574-0218	*
isi.identifier.isi	WOS:000310164600002	*
isi.journal.journaltitle	LANGUAGE RESOURCES AND EVALUATION	*
isi.journal.journaltitleabbrev	LANG RESOUR EVAL	*
isi.language.original	English	*
isi.publisher.place	VAN GODEWIJCKSTRAAT 30, 3311 GZ DORDRECHT, NETHERLANDS	*
isi.relation.firstpage	383	*
isi.relation.issue	3	*
isi.relation.lastpage	419	*
isi.relation.volume	46	*
isi.title	Web 2.0, Language Resources and standards to automatically build a multilingual Named Entity Lexicon	*
scopus.authority.ancejournal	LANGUAGE RESOURCES AND EVALUATION###1574-020X	*
scopus.category	1203	*
scopus.category	3304	*
scopus.category	3310	*
scopus.category	3309	*
scopus.contributor.affiliation	Dublin City University	-
scopus.contributor.affiliation	University of Alicante	-
scopus.contributor.affiliation	Consiglio Nazionale delle Ricerche	-
scopus.contributor.affiliation	University of Alicante	-
scopus.contributor.afid	60025059	-
scopus.contributor.afid	60010844	-
scopus.contributor.afid	60008941	-
scopus.contributor.afid	60010844	-
scopus.contributor.auid	8839393900	-
scopus.contributor.auid	23090852100	-
scopus.contributor.auid	23397766600	-
scopus.contributor.auid	7202035977	-
scopus.contributor.country	Ireland	-
scopus.contributor.country	Spain	-
scopus.contributor.country	Italy	-
scopus.contributor.country	Spain	-
scopus.contributor.dptid	113135934	-
scopus.contributor.dptid	103585731	-
scopus.contributor.dptid		-
scopus.contributor.dptid	103585731	-
scopus.contributor.name	Antonio	-
scopus.contributor.name	Sergio	-
scopus.contributor.name	Monica	-
scopus.contributor.name	Rafael	-
scopus.contributor.subaffiliation	NCLT, School of Computing;	-
scopus.contributor.subaffiliation	Natural Language Processing and Information Systems Group;Department of Computing Languages and Systems;	-
scopus.contributor.subaffiliation	Istituto di Linguistica Computazionale;	-
scopus.contributor.subaffiliation	Natural Language Processing and Information Systems Group;Department of Computing Languages and Systems;	-
scopus.contributor.surname	Toral	-
scopus.contributor.surname	Ferrández	-
scopus.contributor.surname	Monachini	-
scopus.contributor.surname	Muñoz	-
scopus.date.issued	2012	*
scopus.description.abstracteng	This paper proposes to advance in the current state-of-the-art of automatic Language Resource (LR) building by taking into consideration three elements: (1) the knowledge available in existing LRs, (2) the vast amount of information available from the collaborative paradigm that has emerged from the Web 2.0 and (3) the use of standards to improve interoperability. We present a case study in which a set of LRs for different languages (WordNet for English and Spanish and Parole-Simple-Clips for Italian) are extended with Named Entities (NE) by exploiting Wikipedia and the aforementioned LRs. The practical result is a multilingual NE lexicon connected to these LRs and to two ontologies: SUMO and SIMPLE. Furthermore, the paper addresses an important problem which affects the Computational Linguistics area in the present, interoperability, by making use of the ISO LMF standard to encode this lexicon. The different steps of the procedure (mapping, disambiguation, extraction, NE identification and postprocessing) are comprehensively explained and evaluated. The resulting resource contains 974,567, 137,583 and 125,806 NEs for English, Spanish and Italian respectively. Finally, in order to check the usefulness of the constructed resource, we apply it into a state-of-the-art Question Answering system and evaluate its impact; the NE lexicon improves the system's accuracy by 28.1%. Compared to previous approaches to build NE repositories, the current proposal represents a step forward in terms of automation, language independence, amount of NEs acquired and richness of the information represented. © 2011 Springer Science+Business Media B.V.	*
scopus.description.allpeopleoriginal	Toral A.; Ferrandez S.; Monachini M.; Munoz R.	*
scopus.differences	scopus.subject.keywords	*
scopus.differences	scopus.description.allpeopleoriginal	*
scopus.differences	scopus.description.abstracteng	*
scopus.document.type	ar	*
scopus.document.types	ar	*
scopus.funding.funders	501100000780 - European Commission;	*
scopus.identifier.doi	10.1007/s10579-011-9148-x	*
scopus.identifier.eissn	1572-8412	*
scopus.identifier.pui	51482810	*
scopus.identifier.scopus	2-s2.0-84867866200	*
scopus.journal.sourceid	145663	*
scopus.language.iso	eng	*
scopus.relation.firstpage	383	*
scopus.relation.issue	3	*
scopus.relation.lastpage	419	*
scopus.relation.volume	46	*
scopus.subject.keywords	Language Resources; Named Entities; Standards; Web 2.0;	*
scopus.title	Web 2.0, Language Resources and standards to automatically build a multilingual Named Entity Lexicon	*
scopus.titleeng	Web 2.0, Language Resources and standards to automatically build a multilingual Named Entity Lexicon	*
Appears in types:	01.01 Articolo in rivista

Files in this item:

There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/4454
