"When we read the articles or papers of a particular domain, we can recognize some lexical items in the texts as technical terms. In a domain where new knowledge is generated, new terms are constantly created to fulfil the needs of the domain, while others become obsolete. In addition, existing terms may undergo changes of meaning..." (Kageura K.,1998/1999). According to Kaugera, our aim with this work is to make a "journey" in the Grey Literature (GL) domain in order to offer an overall vision on the terms used and the links" "between them. Moreover, by performing a terminological comparison over a given period of time it could be possible to trace the presence of obsolete words as well as of neologisms in the most recent research fields.Within this scenario, the work analyzes a corpus constituted of the entire amount of full" "research papers published in the GL conference series over a time span of more than one decade (2003-2014) with the aim of creating a terminological map of relevant words. "... corpora used to extract terminological units can be further investigated to find semantic and conceptual information on terms or to represent conceptual relationships between terms. (Bourigault D. et al., 2001). Another interesting inquiry is the terminology used in the GL conferences for describing the types of documents (Pej?ová P. et al., 2012). The work is split up in four sections: creation of the corpus by acquiring the digital papers of GL conference proceedings (GL5 - GL16)1; data cleaning; data processing; terminological" "analysis and comparison. The corpus - made up of 231 research papers (for a total amount of 785.042 tokens) - was processed using a Natural Language Processing (NLP) tool for term extraction developed at the Institute of Computational Linguistics "Antonio Zampolli" of CNR (Goggi et al. 2015; 2016). 
This tool is what is called a "pipeline" (that is, a sequence of different tools) which extracts lexical knowledge from texts: in short, this is a rule system tool for knowledge extraction and document indexing that combines NLP technologies for term extraction and techniques to measure the associative strength of multi-words. This tool extracts a list of single (monograms) and multi-word terms (bigrams and trigrams) ordered by frequency with respect to the context. The pipeline - used as semantic engine within the MAPS project - has been customized for the extraction of terms from our corpus. This survey on the results of the information extraction process performed by the described NLP tool has been a sort of linguistic path in the past and present of terminology used in GL proceedings. By means of samplings, it has been possible to obtain the terminological flow in GL domain and to determine if and how the lexicon was evolving over these twelve years and investigate on its dynamic nature.

A terminological "journey" in the Grey Literature domain

Bartolini R; Pardelli G; Goggi S; Giannini S; Biagioni S
2016

Abstract

"When we read the articles or papers of a particular domain, we can recognize some lexical items in the texts as technical terms. In a domain where new knowledge is generated, new terms are constantly created to fulfil the needs of the domain, while others become obsolete. In addition, existing terms may undergo changes of meaning..." (Kageura K.,1998/1999). According to Kaugera, our aim with this work is to make a "journey" in the Grey Literature (GL) domain in order to offer an overall vision on the terms used and the links" "between them. Moreover, by performing a terminological comparison over a given period of time it could be possible to trace the presence of obsolete words as well as of neologisms in the most recent research fields.Within this scenario, the work analyzes a corpus constituted of the entire amount of full" "research papers published in the GL conference series over a time span of more than one decade (2003-2014) with the aim of creating a terminological map of relevant words. "... corpora used to extract terminological units can be further investigated to find semantic and conceptual information on terms or to represent conceptual relationships between terms. (Bourigault D. et al., 2001). Another interesting inquiry is the terminology used in the GL conferences for describing the types of documents (Pej?ová P. et al., 2012). The work is split up in four sections: creation of the corpus by acquiring the digital papers of GL conference proceedings (GL5 - GL16)1; data cleaning; data processing; terminological" "analysis and comparison. The corpus - made up of 231 research papers (for a total amount of 785.042 tokens) - was processed using a Natural Language Processing (NLP) tool for term extraction developed at the Institute of Computational Linguistics "Antonio Zampolli" of CNR (Goggi et al. 2015; 2016). 
This tool is what is called a "pipeline" (that is, a sequence of different tools) which extracts lexical knowledge from texts: in short, this is a rule system tool for knowledge extraction and document indexing that combines NLP technologies for term extraction and techniques to measure the associative strength of multi-words. This tool extracts a list of single (monograms) and multi-word terms (bigrams and trigrams) ordered by frequency with respect to the context. The pipeline - used as semantic engine within the MAPS project - has been customized for the extraction of terms from our corpus. This survey on the results of the information extraction process performed by the described NLP tool has been a sort of linguistic path in the past and present of terminology used in GL proceedings. By means of samplings, it has been possible to obtain the terminological flow in GL domain and to determine if and how the lexicon was evolving over these twelve years and investigate on its dynamic nature.
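The core operation the abstract describes — listing monograms, bigrams and trigrams ordered by frequency — can be sketched in a few lines. This is only an illustrative simplification: the actual CNR-ILC pipeline adds linguistic rules and association-strength measures for multi-words that are not reproduced here, and the function and variable names below are hypothetical.

```python
from collections import Counter

def extract_terms(tokens, max_n=3):
    """Count candidate terms of length 1..max_n (monograms, bigrams, trigrams).

    A minimal frequency-based sketch; real term extraction would also
    filter candidates by part-of-speech patterns and association scores.
    """
    counts = {n: Counter() for n in range(1, max_n + 1)}
    for n in range(1, max_n + 1):
        # slide an n-token window over the text and count each n-gram
        for i in range(len(tokens) - n + 1):
            counts[n][" ".join(tokens[i:i + n])] += 1
    return counts

tokens = "grey literature repositories preserve grey literature".split()
terms = extract_terms(tokens)
print(terms[2].most_common(1))  # → [('grey literature', 2)]
```

Sorting each `Counter` by frequency (as `most_common` does) yields the ranked term lists on which the diachronic comparison in the paper is based.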
DC field Value Language
dc.authority.orgunit Istituto di linguistica computazionale "Antonio Zampolli" - ILC -
dc.authority.orgunit Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI -
dc.authority.people Bartolini R it
dc.authority.people Pardelli G it
dc.authority.people Goggi S it
dc.authority.people Giannini S it
dc.authority.people Biagioni S it
dc.collection.id.s 69aaa6b3-f0f0-47c1-b9a1-040bae867ec3 *
dc.collection.name 04.02 Abstract in Atti di convegno *
dc.contributor.appartenenza Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI *
dc.contributor.appartenenza Istituto di linguistica computazionale "Antonio Zampolli" - ILC *
dc.contributor.appartenenza.mi 918 *
dc.contributor.appartenenza.mi 973 *
dc.date.accessioned 2024/02/21 09:40:51 -
dc.date.available 2024/02/21 09:40:51 -
dc.date.issued 2016 -
dc.description.affiliations CNR-ILC, Pisa, Italy; CNR-ILC, Pisa, Italy; CNR-ILC, Pisa, Italy; CNR-ISTI, Pisa, Italy; CNR-ISTI, Pisa, Italy -
dc.description.allpeople Bartolini, R; Pardelli, G; Goggi, S; Giannini, S; Biagioni, S -
dc.description.allpeopleoriginal Bartolini R.; Pardelli G.; Goggi S.; Giannini S.; Biagioni S. -
dc.description.fulltext open en
dc.description.numberofauthors 5 -
dc.identifier.isbn 978-90-77484-29-6 -
dc.identifier.uri https://hdl.handle.net/20.500.14243/320039 -
dc.language.iso eng -
dc.relation.alleditors Dominic Farace, Jerry Frantzen -
dc.relation.conferencedate 28-29 November 2016 -
dc.relation.conferencename GL18 - Eighteenth International Conference on Grey Literature: Leveraging Diversity in Grey Literature -
dc.relation.conferenceplace New York, US -
dc.relation.firstpage 79 -
dc.relation.ispartofbook Leveraging Diversity in Grey Literature -
dc.relation.lastpage 84 -
dc.subject.keywords Grey Literature -
dc.subject.keywords Digital Repositories -
dc.subject.keywords Open Access -
dc.subject.singlekeyword Grey Literature *
dc.subject.singlekeyword Digital Repositories *
dc.subject.singlekeyword Open Access *
dc.title A terminological "journey" in the Grey Literature domain en
dc.type.driver info:eu-repo/semantics/conferenceObject -
dc.type.full 04 Contributo in convegno::04.02 Abstract in Atti di convegno it
dc.type.miur 274 -
dc.type.referee Yes, but referee type not specified -
dc.ugov.descaux1 362848 -
iris.mediafilter.data 2025/04/23 04:30:35 *
iris.orcid.lastModifiedDate 2024/04/04 14:01:28 *
iris.orcid.lastModifiedMillisecond 1712232088198 *
iris.sitodocente.maxattempts 1 -
Appears in collections: 04.02 Abstract in Atti di convegno
Files in this product:

prod_362848-doc_181313.pdf (open access)
Description: Presentation - A terminological "journey" in the Grey Literature domain
Type: Versione Editoriale (PDF)
Size: 915.28 kB, Adobe PDF

prod_362848-doc_119547.pdf (open access)
Description: A terminological "journey" in the Grey Literature domain
Type: Versione Editoriale (PDF)
Size: 1.47 MB, Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/320039