Bibliometric Map of Keywords on the topic of Digital Scholarship - Open Dataset

Raffaghelli, Juliana Elisa; E, J; Manganello, F; Persico, Donatella Giovanna

The concept of Digital Scholarship (DS) (Borgman, 2007; Pearce, Weller, Scanlon, & Kinsley, 2012; Weller, 2011) describes new forms of academics' professional practice linked to the changing cultural, social and working context of the digital age. However, empirical research on this construct appears to be emerging within a rather chaotic conceptual and methodological landscape, to which several disciplines contribute. To address this problem, the authors formulated the following operational hypothesis: as a multidisciplinary research topic, DS is at a very early stage, with highly dispersed and fragmented conceptual bases for both further theoretical elaboration and empirical research. To test this hypothesis, the authors carried out a systematic literature review of 45 journal articles drawn from four relevant scientific information databases.

The present dataset contains the data used to build a cross-citation bibliometric map, which served as a method complementary to the systematic literature review. Bibliometric maps are representations of scientific networks, used in scientometrics as a means of "mapping science", that is, of understanding the connections between researchers and their research. They rest on three main elements: statistical analysis of written publications (often including text and data mining); methods of visualization (distance-based, graph-based, timeline-based); and digital tools supporting analysis and visualization. A bibliometric map consists of nodes and edges: nodes may represent publications, journals, researchers or keywords, while edges represent relationships between the nodes. The type of node chosen determines the focus of the analysis and the type of map that emerges.
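The node-and-edge structure of a bibliometric map can be sketched as a small weighted, undirected graph. This is a minimal illustration, not the data model used by VOSviewer or by this dataset; the class `BibliometricMap`, its methods and the term labels are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class BibliometricMap:
    """Undirected weighted graph: nodes are linked by edges carrying a strength."""
    node_type: str  # e.g. "publication", "journal", "researcher" or "keyword"
    edges: dict[frozenset[str], float] = field(default_factory=dict)

    def add_relation(self, a: str, b: str, weight: float = 1.0) -> None:
        """Record (or reinforce) a relationship such as a citation, co-authorship or co-occurrence."""
        if a == b:
            return  # self-links carry no information in this sketch
        key = frozenset((a, b))
        self.edges[key] = self.edges.get(key, 0.0) + weight

    def nodes(self) -> set[str]:
        """Every node that takes part in at least one relationship."""
        return {n for pair in self.edges for n in pair}

    def link_strength(self, node: str) -> float:
        """Sum of the weights of all edges touching `node`."""
        return sum(w for pair, w in self.edges.items() if node in pair)


# Hypothetical keyword map with placeholder terms
kw_map = BibliometricMap(node_type="keyword")
kw_map.add_relation("term A", "term B", 2)
kw_map.add_relation("term B", "term C")
print(sorted(kw_map.nodes()))          # ['term A', 'term B', 'term C']
print(kw_map.link_strength("term B"))  # 3.0
```

Storing each edge under a `frozenset` key makes the relationship symmetric, which suits co-authorship and co-occurrence; a citation map would instead need directed edges (ordered pairs).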
The most frequent types of relationship studied through bibliometric maps are citation relations, co-authorship relations and keyword co-occurrence (Van Eck & Waltman, op. cit., pp. 2-4). Within a sample of publications, normally representative of a specific area or field of research, the first type explores the relationships between publications, the second the connections within a network of researchers, and the third the distribution of topics. Visualizations show not only a current, static relationship but also possible groups (clusters) of nodes that are "closer" within the relationship, as well as its evolution when a timeline is taken into account. Bibliometric maps are not traditionally a method of educational research; however, new techniques for processing scientific information about a field (metadata tracking and modelling across large scientific databases such as WOS or SCOPUS, data mining over large volumes of data, and software offering advanced ways of representing and visualizing data) enable researchers to explore the literature in diverse ways and to confirm or discard assumptions about research trends.

The keyword co-occurrence map represents how often keywords appear together within a corpus extracted from the titles, keywords and abstracts of all the articles in the sample. The main scientific databases, WOS and SCOPUS, perform this extraction automatically, but the software adopted, VOSviewer, does not allow data from the two sources to be merged, and merging them manually is impractical given the complexity of the metadata. Therefore, the keyword analysis was applied only to the 34 articles indexed in SCOPUS, which were also present in WOS. The 10 articles that could not be analyzed were indexed only by Google Scholar.
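A minimal sketch of keyword co-occurrence counting follows, assuming each article is reduced to one string (title, keywords and abstract concatenated) and that the list of terms is already known. Terms are matched by case-insensitive substring search, a deliberate simplification of VOSviewer's noun-phrase extraction, and each pair is counted at most once per article. The `association_strength` function shows one normalization proposed by Van Eck and Waltman; the text says only that counting was "normalized", so using this formula is an assumption. The function names, documents and terms are invented for illustration.

```python
from __future__ import annotations

from collections import Counter
from itertools import combinations


def count_terms(documents: list[str], vocabulary: list[str]) -> tuple[Counter, Counter]:
    """Return per-term document counts and per-pair co-occurrence counts."""
    occurrences: Counter = Counter()
    cooccurrences: Counter = Counter()
    for doc in documents:
        text = doc.lower()
        # Terms found in this document; sorting gives each pair a stable key order
        present = sorted({t for t in vocabulary if t.lower() in text})
        occurrences.update(present)
        # Each pair is counted at most once per document
        for a, b in combinations(present, 2):
            cooccurrences[(a, b)] += 1
    return occurrences, cooccurrences


def association_strength(pair: tuple[str, str], occurrences: Counter, cooccurrences: Counter) -> float:
    """Co-occurrences of the pair divided by the product of the two terms' occurrences."""
    a, b = pair
    return cooccurrences[pair] / (occurrences[a] * occurrences[b])


# Invented toy corpus: one string per article (title + keywords + abstract)
docs = [
    "Digital scholarship and open access among academics",
    "Open access practices among academics",
    "Digital scholarship and social media",
]
vocab = ["academics", "digital scholarship", "open access", "social media"]
occ, cooc = count_terms(docs, vocab)
print(cooc[("academics", "open access")])                             # 2
print(association_strength(("academics", "open access"), occ, cooc))  # 0.5
```

Dividing by the product of occurrences keeps very frequent terms from dominating the map simply because they appear everywhere, which is the sense in which normalization "relativizes" term weights.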
On the basis of the extracted corpus, the co-occurrence analysis was carried out with text mining techniques in VOSviewer. In the keyword map, every node is a keyword, and a relationship between two terms is given by their co-occurrence, i.e., at least one co-occurrence within the corpus (or within lines of the corpus; this value can be adjusted). The size of a node is determined by the number of co-occurrences of that term relative to all the terms in the corpus under analysis. Terms are further grouped into "clusters" according to their co-occurrence patterns, and hence their closeness. Following this approach, VOSviewer extracts all the noun phrases from the corpus (titles, keywords and abstracts of the 34 articles as exported from SCOPUS), so that the terms are organized into topics generated automatically by the software.

In this case, the software extracted 1001 relevant terms from the original corpus. The co-occurrence threshold, normally 10 for large samples, was set to 3 (a relationship between terms exists only if they co-occur at least 3 times within the corpus) given the small size of the sample (5093 words); the counting method was "normalized", as recommended by the software's authors, to relativize the weight of terms. A total of 81 nodes emerged; however, the software retains only 60% of these terms (roughly 49) for representation, taking into account their effective relationships (weight measured as number of co-occurrences). Four terms containing names (of authors or institutions) were then deleted manually, so the final visualization used in the research report was based on 45 terms. Please refer to the full report to contextualize this dataset: https://www.researchgate.net/publication/288994707
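The term-selection steps reported above (a minimum count of 3, keeping 60% of the resulting terms, then manually removing name terms) can be sketched as follows. The function `select_terms`, the ranking scores and the rounding rule are hypothetical; VOSviewer computes its own relevance scores, which are not reproduced here. The usage example relies on placeholder terms only to check the arithmetic: 60% of 81 rounds to 49, and removing 4 name terms leaves 45.

```python
from __future__ import annotations


def select_terms(term_counts: dict[str, int],
                 scores: dict[str, float],
                 min_count: int = 3,
                 keep_fraction: float = 0.6,
                 excluded: set[str] | None = None) -> list[str]:
    """Hypothetical reconstruction of the selection steps described in the text."""
    excluded = excluded or set()
    # Step 1: threshold, keeping terms whose count reaches min_count (3 in this study)
    candidates = [t for t, n in term_counts.items() if n >= min_count]
    # Step 2: keep the top keep_fraction of candidates by score (rounding is an assumption)
    n_keep = round(keep_fraction * len(candidates))
    ranked = sorted(candidates, key=lambda t: scores.get(t, 0.0), reverse=True)
    retained = ranked[:n_keep]
    # Step 3: manual cleaning, dropping terms such as author or institution names
    return [t for t in retained if t not in excluded]


# Placeholder data that only reproduces the reported counts (81 -> 49 -> 45)
counts = {f"term{i}": 3 + i for i in range(81)}    # 81 terms, all at or above the threshold
scores = {t: float(n) for t, n in counts.items()}  # stand-in relevance scores
names = {"term80", "term79", "term78", "term77"}   # stand-ins for the 4 name terms
final_terms = select_terms(counts, scores, excluded=names)
print(len(final_terms))  # 45
```

Applying the manual exclusion last mirrors the order described in the text: the software's automatic selection happens first, and the researchers then prune the retained terms.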