CNR Institutional Research Information System

Grey Literature for Natural Language Processing: a Terminological and Statistical Approach.

Cignoni L; Pardelli G; Sassi M

2009

Abstract

This paper presents the results of a study on grey literature (GL) in the field of Natural Language Processing (NLP). Our data were collected in a corpus of ca. 13,000 records corresponding to the titles of papers presented at international conferences from 1950 to June 2008. A statistical representation of the most significant terms relating to GL in NLP and other interrelated disciplines associates old and new words, highlighting the terminological changes that have taken place over time. The aim of our study is to contribute to the creation of language resources for the extraction of GL from the Web, in order to help prevent the disappearance of documents containing NLP words that have undergone rapid development over the last decades. The paper is organised as follows: after a general introduction to our work, section 2 provides a historical overview of NLP; sections 3 and 4 give an account of the most relevant terms used by specialists in different periods, which are indicative of the changes that have taken place; section 5 describes the methodology we used and also contains information on our GL database and a graphical representation of the data. Finally, the conclusions stress the need to integrate pre-existing or obsolete words and expressions, creating NLP synonym relations.

Full record (DC)

DC field	Value	Language
dc.authority.ancejournal	THE GL-CONFERENCE SERIES. CONFERENCE PROCEEDINGS	-
dc.authority.orgunit	Istituto di linguistica computazionale "Antonio Zampolli" - ILC	-
dc.authority.people	Cignoni L	it
dc.authority.people	Pardelli G	it
dc.authority.people	Sassi M	it
dc.collection.id.s	71c7200a-7c5f-4e83-8d57-d3d2ba88f40d	*
dc.collection.name	04.01 Contributo in Atti di convegno	*
dc.contributor.appartenenza	Istituto di linguistica computazionale "Antonio Zampolli" - ILC	*
dc.contributor.appartenenza.mi	918	*
dc.date.accessioned	2024/02/19 19:51:17	-
dc.date.available	2024/02/19 19:51:17	-
dc.date.issued	2009	-
dc.description.abstracteng	This paper presents the results of a study on grey literature (GL) in the field of Natural Language Processing (NLP). Our data were collected in a corpus of ca. 13,000 records corresponding to the titles of papers presented at international conferences from 1950 to June 2008. A statistical representation of the most significant terms relating to GL in NLP and other interrelated disciplines associates old and new words, highlighting the terminological changes that have taken place over time. The aim of our study is to contribute to the creation of language resources for the extraction of GL from the Web, in order to help prevent the disappearance of documents containing NLP words that have undergone rapid development over the last decades. The paper is organised as follows: after a general introduction to our work, section 2 provides a historical overview of NLP; sections 3 and 4 give an account of the most relevant terms used by specialists in different periods, which are indicative of the changes that have taken place; section 5 describes the methodology we used and also contains information on our GL database and a graphical representation of the data. Finally, the conclusions stress the need to integrate pre-existing or obsolete words and expressions, creating NLP synonym relations.	-
dc.description.affiliations	Istituto di Linguistica Computazionale "Antonio Zampolli", CNR-ILC, Pisa, Italy	-
dc.description.allpeople	Cignoni, L; Pardelli, G; Sassi, M	-
dc.description.allpeopleoriginal	Cignoni L.; Pardelli G.; Sassi M.	-
dc.description.fulltext	none	en
dc.description.note	ISI Web of Science (WOS) (Code: 000264705400010) Google Scholar (Code: http://scholar.google.it/scholar?cites=7898910591977305850&as_sdt=2005&sciodt=0,5&hl=it) PuMa (Code: cnr.ilc/2009-A2-003) OpenGrey (Code: http://hdl.handle.net/10068/697993) Grey literature (GL) produced in the field of natural language processing over half a century of computational-linguistic research is investigated through dedicated open access repositories. In this study, old and new terms are compared; the change of words over time is traced through indexing performed by the DBT (Data Base Testuale) system (CNR-ILC patent). From the adjective [...] of the early 1960s, one arrives at the creation of MWEs such as [...] and [...]. The terms of the discipline are no longer niche terms, as they were in the recent past, but have become anchors for information retrieval. Today they circulate in the documentary sources of the Web, enriching the knowledge of ordinary citizens and offering specialists further resources to investigate and compare in further research.	-
dc.description.numberofauthors	3	-
dc.identifier.isbn	978-90-77484-11-1	-
dc.identifier.isi	WOS:000264705400010	-
dc.identifier.uri	https://hdl.handle.net/20.500.14243/65114	-
dc.language.iso	eng	-
dc.publisher.country	NLD	-
dc.publisher.name	TextRelease	-
dc.publisher.place	Amsterdam	-
dc.relation.alleditors	Dominic J. Farace; Jerry Frantzen; GreyNet, Grey Literature Network Service	-
dc.relation.conferencedate	DEC 08-09, 2008	-
dc.relation.conferencename	Tenth International Conference on Grey Literature: Designing the Grey Grid for Information Society	-
dc.relation.conferenceplace	Amsterdam	-
dc.relation.firstpage	93	-
dc.relation.ispartofbook	Designing the Grey Grid for Information Society	-
dc.relation.lastpage	100	-
dc.relation.numberofpages	8	-
dc.subject.keywords	Computational Linguistics	-
dc.subject.keywords	Terminology	-
dc.subject.keywords	Grey Literature	-
dc.subject.singlekeyword	Computational Linguistics	*
dc.subject.singlekeyword	Terminology	*
dc.subject.singlekeyword	Grey Literature	*
dc.title	Grey Literature for Natural Language Processing: a Terminological and Statistical Approach.	en
dc.type.driver	info:eu-repo/semantics/conferenceObject	-
dc.type.full	04 Contributo in convegno::04.01 Contributo in Atti di convegno	it
dc.type.miur	273	-
dc.type.referee	Yes, but type not specified	-
dc.ugov.descaux1	84740	-
iris.isi.metadataErrorDescription	0	-
iris.isi.metadataErrorType	ERROR_NO_MATCH	-
iris.isi.metadataStatus	ERROR	-
iris.orcid.lastModifiedDate	2024/04/04 17:39:56	*
iris.orcid.lastModifiedMillisecond	1712245196397	*
iris.sitodocente.maxattempts	2	-
Appears in types:	04.01 Contributo in Atti di convegno

Files in this item:

There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/65114
