"When we read the articles or papers of a particular domain, we can recognize some lexical items in the texts as technical terms. In a domain where new knowledge is generated, new terms are constantly created to fulfil the needs of the domain, while others become obsolete. In addition, existing terms may undergo changes of meaning..." (Kageura K.,1998/1999). According to Kaugera, our aim with this work is to make a "journey" in the Grey Literature (GL) domain in order to offer an overall vision on the terms used and the links" "between them. Moreover, by performing a terminological comparison over a given period of time it could be possible to trace the presence of obsolete words as well as of neologisms in the most recent research fields.Within this scenario, the work analyzes a corpus constituted of the entire amount of full" "research papers published in the GL conference series over a time span of more than one decade (2003-2014) with the aim of creating a terminological map of relevant words. "... corpora used to extract terminological units can be further investigated to find semantic and conceptual information on terms or to represent conceptual relationships between terms. (Bourigault D. et al., 2001). Another interesting inquiry is the terminology used in the GL conferences for describing the types of documents (Pej?ová P. et al., 2012). The work is split up in four sections: creation of the corpus by acquiring the digital papers of GL conference proceedings (GL5 - GL16)1; data cleaning; data processing; terminological" "analysis and comparison. The corpus - made up of 231 research papers (for a total amount of 785.042 tokens) - was processed using a Natural Language Processing (NLP) tool for term extraction developed at the Institute of Computational Linguistics "Antonio Zampolli" of CNR (Goggi et al. 2015; 2016). 
This tool is what is called a "pipeline" (that is, a sequence of different tools) which extracts lexical knowledge from texts: in short, this is a rule system tool for knowledge extraction and document indexing that combines NLP technologies for term extraction and techniques to measure the associative strength of multi-words. This tool extracts a list of single (monograms) and multi-word terms (bigrams and trigrams) ordered by frequency with respect to the context. The pipeline - used as semantic engine within the MAPS project - has been customized for the extraction of terms from our corpus. This survey on the results of the information extraction process performed by the described NLP tool has been a sort of linguistic path in the past and present of terminology used in GL proceedings. By means of samplings, it has been possible to obtain the terminological flow in GL domain and to determine if and how the lexicon was evolving over these twelve years and investigate on its dynamic nature.

A terminological "journey" in the Grey Literature domain

Bartolini R; Pardelli G; Goggi S; Giannini S; Biagioni S
2016

Abstract

"When we read the articles or papers of a particular domain, we can recognize some lexical items in the texts as technical terms. In a domain where new knowledge is generated, new terms are constantly created to fulfil the needs of the domain, while others become obsolete. In addition, existing terms may undergo changes of meaning..." (Kageura K.,1998/1999). According to Kaugera, our aim with this work is to make a "journey" in the Grey Literature (GL) domain in order to offer an overall vision on the terms used and the links" "between them. Moreover, by performing a terminological comparison over a given period of time it could be possible to trace the presence of obsolete words as well as of neologisms in the most recent research fields.Within this scenario, the work analyzes a corpus constituted of the entire amount of full" "research papers published in the GL conference series over a time span of more than one decade (2003-2014) with the aim of creating a terminological map of relevant words. "... corpora used to extract terminological units can be further investigated to find semantic and conceptual information on terms or to represent conceptual relationships between terms. (Bourigault D. et al., 2001). Another interesting inquiry is the terminology used in the GL conferences for describing the types of documents (Pej?ová P. et al., 2012). The work is split up in four sections: creation of the corpus by acquiring the digital papers of GL conference proceedings (GL5 - GL16)1; data cleaning; data processing; terminological" "analysis and comparison. The corpus - made up of 231 research papers (for a total amount of 785.042 tokens) - was processed using a Natural Language Processing (NLP) tool for term extraction developed at the Institute of Computational Linguistics "Antonio Zampolli" of CNR (Goggi et al. 2015; 2016). 
This tool is what is called a "pipeline" (that is, a sequence of different tools) which extracts lexical knowledge from texts: in short, this is a rule system tool for knowledge extraction and document indexing that combines NLP technologies for term extraction and techniques to measure the associative strength of multi-words. This tool extracts a list of single (monograms) and multi-word terms (bigrams and trigrams) ordered by frequency with respect to the context. The pipeline - used as semantic engine within the MAPS project - has been customized for the extraction of terms from our corpus. This survey on the results of the information extraction process performed by the described NLP tool has been a sort of linguistic path in the past and present of terminology used in GL proceedings. By means of samplings, it has been possible to obtain the terminological flow in GL domain and to determine if and how the lexicon was evolving over these twelve years and investigate on its dynamic nature.
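The core operation the abstract describes — listing monograms, bigrams and trigrams ordered by frequency — can be sketched in a few lines. This is only an illustrative simplification: the actual CNR-ILC pipeline adds linguistic rules and association-strength measures for multi-words that are not reproduced here, and the function and variable names below are hypothetical.

```python
from collections import Counter

def extract_terms(tokens, max_n=3):
    """Count candidate terms of length 1..max_n (monograms, bigrams, trigrams).

    A minimal frequency-based sketch; real term extraction would also
    filter candidates by part-of-speech patterns and association scores.
    """
    counts = {n: Counter() for n in range(1, max_n + 1)}
    for n in range(1, max_n + 1):
        # slide an n-token window over the text and count each n-gram
        for i in range(len(tokens) - n + 1):
            counts[n][" ".join(tokens[i:i + n])] += 1
    return counts

tokens = "grey literature repositories preserve grey literature".split()
terms = extract_terms(tokens)
print(terms[2].most_common(1))  # → [('grey literature', 2)]
```

Sorting each `Counter` by frequency (as `most_common` does) yields the ranked term lists on which the diachronic comparison in the paper is based.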
DC field Value Language
dc.authority.orgunit Istituto di linguistica computazionale "Antonio Zampolli" - ILC -
dc.authority.orgunit Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI -
dc.authority.people Bartolini R it
dc.authority.people Pardelli G it
dc.authority.people Goggi S it
dc.authority.people Giannini S it
dc.authority.people Biagioni S it
dc.collection.id.s 69aaa6b3-f0f0-47c1-b9a1-040bae867ec3 *
dc.collection.name 04.02 Abstract in Atti di convegno *
dc.contributor.appartenenza Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI *
dc.contributor.appartenenza Istituto di linguistica computazionale "Antonio Zampolli" - ILC *
dc.contributor.appartenenza.mi 918 *
dc.contributor.appartenenza.mi 973 *
dc.date.accessioned 2024/02/21 09:40:51 -
dc.date.available 2024/02/21 09:40:51 -
dc.date.issued 2016 -
dc.description.affiliations CNR-ILC, Pisa, Italy; CNR-ILC, Pisa, Italy; CNR-ILC, Pisa, Italy; CNR-ISTI, Pisa, Italy; CNR-ISTI, Pisa, Italy -
dc.description.allpeople Bartolini, R; Pardelli, G; Goggi, S; Giannini, S; Biagioni, S -
dc.description.allpeopleoriginal Bartolini R.; Pardelli G.; Goggi S.; Giannini S.; Biagioni S. -
dc.description.fulltext open en
dc.description.numberofauthors 5 -
dc.identifier.isbn 978-90-77484-29-6 -
dc.identifier.uri https://hdl.handle.net/20.500.14243/320039 -
dc.language.iso eng -
dc.relation.alleditors Dominic Farace, Jerry Frantzen -
dc.relation.conferencedate 28-29 November 2016 -
dc.relation.conferencename GL18 - Eighteenth International Conference on Grey Literature: Leveraging Diversity in Grey Literature -
dc.relation.conferenceplace New York, US -
dc.relation.firstpage 79 -
dc.relation.ispartofbook Leveraging Diversity in Grey Literature -
dc.relation.lastpage 84 -
dc.subject.keywords Grey Literature -
dc.subject.keywords Digital Repositories -
dc.subject.keywords Open Access -
dc.subject.singlekeyword Grey Literature *
dc.subject.singlekeyword Digital Repositories *
dc.subject.singlekeyword Open Access *
dc.title A terminological "journey" in the Grey Literature domain en
dc.type.driver info:eu-repo/semantics/conferenceObject -
dc.type.full 04 Contributo in convegno::04.02 Abstract in Atti di convegno it
dc.type.miur 274 -
dc.type.referee Yes, but referee type not specified -
dc.ugov.descaux1 362848 -
iris.mediafilter.data 2025/04/23 04:30:35 *
iris.orcid.lastModifiedDate 2024/04/04 14:01:28 *
iris.orcid.lastModifiedMillisecond 1712232088198 *
iris.sitodocente.maxattempts 1 -
Appears in collections: 04.02 Abstract in Atti di convegno
Files in this product:

prod_362848-doc_181313.pdf (open access)
Description: Presentation - A terminological "journey" in the Grey Literature domain
Type: Versione Editoriale (PDF)
Size: 915.28 kB, Adobe PDF

prod_362848-doc_119547.pdf (open access)
Description: A terminological "journey" in the Grey Literature domain
Type: Versione Editoriale (PDF)
Size: 1.47 MB, Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/320039