A NLP-based stylometric approach for tracking the evolution of L1 written language competence

Miaschi, Alessio; Brunato, Dominique; Dell'Orletta, Felice
2021

Abstract

In this study we present a Natural Language Processing (NLP)-based stylometric approach for tracking the evolution of written language competence in Italian L1 learners. The approach relies on a wide set of linguistically motivated features that capture stylistic aspects of a text. These features were extracted from students' essays contained in CItA (Corpus Italiano di Apprendenti L1), the first longitudinal corpus of texts written by Italian L1 learners enrolled in the first and second year of lower secondary school. We frame the modeling of written language development as a supervised classification task: predicting the chronological order of essays written by the same student at different temporal spans. The promising results obtained in several classification scenarios allow us to conclude that it is possible to automatically model the highly relevant changes affecting written language evolution across time, and to identify which features are most predictive of this process. In the last part of the article, we focus on the possible influence of background variables on language learning and present preliminary results of a pilot study aimed at understanding how the observed developmental patterns are affected by information about the student's school environment.
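The classification setup described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the three toy features (`toy_features`) are simple stand-ins, not the article's feature set, and the hand-written logistic regression is used only to keep the example self-contained, since the abstract does not name a classifier. Each training instance is a pair of essays by the same student, represented by the difference of their feature vectors and labeled 1 if the first essay was written earlier.

```python
import math
from typing import Dict, List, Tuple

PUNCT = ".,;:!?"


def toy_features(text: str) -> List[float]:
    """Hypothetical stand-ins for linguistically motivated features."""
    normalized = text.replace("!", ".").replace("?", ".")
    sentences = [s for s in normalized.split(".") if s.strip()]
    tokens = text.split()
    n_tok = max(len(tokens), 1)
    avg_sentence_len = len(tokens) / max(len(sentences), 1)
    avg_word_len = sum(len(t.strip(PUNCT)) for t in tokens) / n_tok
    type_token_ratio = len({t.lower().strip(PUNCT) for t in tokens}) / n_tok
    return [avg_sentence_len, avg_word_len, type_token_ratio]


def pair_vector(first: str, second: str) -> List[float]:
    """Represent an essay pair as the feature difference (second - first)."""
    return [b - a for a, b in zip(toy_features(first), toy_features(second))]


def make_pairs(essays_by_student: Dict[str, List[str]]) -> Tuple[List[List[float]], List[int]]:
    """Build labeled pairs; each student's essays are assumed to be in chronological order."""
    X, y = [], []
    for essays in essays_by_student.values():
        for i in range(len(essays)):
            for j in range(i + 1, len(essays)):
                X.append(pair_vector(essays[i], essays[j]))  # correct order
                y.append(1)
                X.append(pair_vector(essays[j], essays[i]))  # reversed order
                y.append(0)
    return X, y


def train_logistic(X: List[List[float]], y: List[int],
                   epochs: int = 200, lr: float = 0.1) -> Tuple[List[float], float]:
    """Plain stochastic gradient descent on the logistic loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, target in zip(X, y):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
            grad = 1.0 / (1.0 + math.exp(-z)) - target
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b


def predict_in_order(w: List[float], b: float, first: str, second: str) -> bool:
    """True if the model predicts that `first` was written before `second`."""
    x = pair_vector(first, second)
    return sum(wi * xi for wi, xi in zip(w, x)) + b > 0


if __name__ == "__main__":
    # Invented toy essays, not taken from CItA.
    corpus = {"s1": ["Io gioco.",
                     "Io gioco al parco con i miei amici quando finisce la scuola."]}
    X, y = make_pairs(corpus)
    w, b = train_logistic(X, y)
    print(predict_in_order(w, b, corpus["s1"][0], corpus["s1"][1]))
```

Using the feature difference makes the representation antisymmetric: swapping the two essays flips the sign of the vector, which matches the binary "which came first" framing. In a linear model of this kind, the learned weights also give a rough indication of which features are most predictive.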
DC Field Value Language
dc.authority.ancejournal JOURNAL OF WRITING RESEARCH en
dc.authority.orgunit Istituto di linguistica computazionale "Antonio Zampolli" - ILC en
dc.authority.people Miaschi en
dc.authority.people Alessio en
dc.authority.people Brunato en
dc.authority.people Dominique en
dc.authority.people Dell'Orletta en
dc.authority.people Felice en
dc.collection.id.s b3f88f24-048a-4e43-8ab1-6697b90e068e *
dc.collection.name 01.01 Articolo in rivista *
dc.contributor.appartenenza Istituto di linguistica computazionale "Antonio Zampolli" - ILC *
dc.contributor.appartenenza.mi 918 *
dc.date.accessioned 2024/02/20 17:36:36 -
dc.date.available 2024/02/20 17:36:36 -
dc.date.firstsubmission 2025/01/24 11:02:41 *
dc.date.issued 2021 -
dc.date.submission 2025/01/29 09:52:41 *
dc.description.affiliations Università di Pisa; Istituto di Linguistica Computazionale (ILC-CNR) -
dc.description.allpeople Miaschi, Alessio; Miaschi, Alessio; Brunato, DOMINIQUE PIERINA; Brunato, DOMINIQUE PIERINA; Dell'Orletta, Felice; Dell'Orletta, Felice -
dc.description.allpeopleoriginal Miaschi, Alessio and Brunato, Dominique and Dell'Orletta, Felice en
dc.description.fulltext open en
dc.description.note Query date: 2021-06-09 en
dc.description.numberofauthors 6 -
dc.identifier.doi 10.17239/jowr-2021.13.01.03 en
dc.identifier.isi WOS:000659987400003 -
dc.identifier.scopus 2-s2.0-85108566169 en
dc.identifier.uri https://hdl.handle.net/20.500.14243/402654 -
dc.identifier.url https://www.jowr.org/abstracts/vol13_1/Miaschi_et_al_2021_13_1_abstract.html en
dc.language.iso eng en
dc.miur.last.status.update 2025-01-24T14:40:15Z *
dc.relation.firstpage 71 en
dc.relation.lastpage 105 en
dc.relation.medium ELETTRONICO en
dc.relation.numberofpages 35 en
dc.relation.volume vol. 13 en
dc.subject.keywordseng stylometry -
dc.subject.keywordseng computational linguistics -
dc.subject.keywordseng language competence -
dc.subject.singlekeyword stylometry *
dc.subject.singlekeyword computational linguistics *
dc.subject.singlekeyword language competence *
dc.title A NLP-based stylometric approach for tracking the evolution of L1 written language competence en
dc.type.circulation Internazionale en
dc.type.driver info:eu-repo/semantics/article -
dc.type.full 01 Contributo su Rivista::01.01 Articolo in rivista it
dc.type.impactfactor si en
dc.type.miur 262 -
dc.ugov.descaux1 454570 -
iris.isi.extIssued 2021 -
iris.isi.extTitle A NLP-based stylometric approach for tracking the evolution of L1 written language competence -
iris.mediafilter.data 2025/04/06 03:20:42 *
iris.orcid.lastModifiedDate 2025/02/25 07:11:25 *
iris.orcid.lastModifiedMillisecond 1740463885780 *
iris.scopus.extIssued 2021 -
iris.scopus.extTitle A NLP-based stylometric approach for tracking the evolution of L1 written language competence -
iris.sitodocente.maxattempts 1 -
iris.unpaywall.bestoahost publisher *
iris.unpaywall.bestoaversion publishedVersion *
iris.unpaywall.doi 10.17239/jowr-2021.13.01.03 *
iris.unpaywall.hosttype publisher *
iris.unpaywall.isoa true *
iris.unpaywall.journalisindoaj true *
iris.unpaywall.landingpage https://doi.org/10.17239/jowr-2021.13.01.03 *
iris.unpaywall.license cc-by-nc-nd *
iris.unpaywall.metadataCallLastModified 26/04/2026 07:32:12 -
iris.unpaywall.metadataCallLastModifiedMillisecond 1777181532735 -
iris.unpaywall.oastatus gold *
isi.authority.ancejournal JOURNAL OF WRITING RESEARCH###2030-1006 *
isi.authority.sdg Goal 4: Quality education###12084 *
isi.category HA *
isi.contributor.affiliation University of Pisa -
isi.contributor.affiliation Ist Linguist Computaz A Zampolli ILC CNR -
isi.contributor.affiliation Ist Linguist Computaz A Zampolli ILC CNR -
isi.contributor.country Italy -
isi.contributor.country Italy -
isi.contributor.country Italy -
isi.contributor.name Alessio -
isi.contributor.name Dominique -
isi.contributor.name Felice -
isi.contributor.researcherId GCD-5321-2022 -
isi.contributor.researcherId MCK-5206-2025 -
isi.contributor.researcherId AAX-1864-2020 -
isi.contributor.subaffiliation Dipartimento Informat -
isi.contributor.subaffiliation ItaliaNLP Lab -
isi.contributor.subaffiliation ItaliaNLP Lab -
isi.contributor.surname Miaschi -
isi.contributor.surname Brunato -
isi.contributor.surname Dell'Orletta -
isi.date.issued 2021 *
isi.description.allpeopleoriginal Miaschi, A; Brunato, D; Dell'Orletta, F; *
isi.document.sourcetype WOS.ESCI *
isi.document.type Article *
isi.document.types Article *
isi.identifier.doi 10.17239/jowr-2021.13.01.03 *
isi.identifier.eissn 2294-3307 *
isi.identifier.isi WOS:000659987400003 *
isi.journal.journaltitle JOURNAL OF WRITING RESEARCH *
isi.journal.journaltitleabbrev J WRIT RES *
isi.language.original English *
isi.publisher.place CAMPUS GROENENBORGER, 171 GROENENBORGERLAAN, ANTWERP, 2020, BELGIUM *
isi.relation.firstpage 71 *
isi.relation.issue 1 *
isi.relation.lastpage 105 *
isi.relation.volume 13 *
isi.title A NLP-based stylometric approach for tracking the evolution of L1 written language competence *
scopus.authority.ancejournal JOURNAL OF WRITING RESEARCH###2030-1006 *
scopus.category 1203 *
scopus.category 3304 *
scopus.category 3310 *
scopus.category 1208 *
scopus.contributor.affiliation ItaliaNLP Lab -
scopus.contributor.affiliation Universita di Pisa -
scopus.contributor.affiliation Universita di Pisa -
scopus.contributor.afid 60008941 -
scopus.contributor.afid 60028868 -
scopus.contributor.afid 60028868 -
scopus.contributor.auid 57211678681 -
scopus.contributor.auid 55237740200 -
scopus.contributor.auid 57540567000 -
scopus.contributor.country Italy -
scopus.contributor.country -
scopus.contributor.country -
scopus.contributor.dptid 114087935 -
scopus.contributor.dptid -
scopus.contributor.dptid -
scopus.contributor.name Alessio -
scopus.contributor.name Dominique -
scopus.contributor.name Felice -
scopus.contributor.subaffiliation Istituto di Linguistica Computazionale “A. Zampolli” (ILC-CNR); -
scopus.contributor.subaffiliation Dipartimento di Informatica; -
scopus.contributor.subaffiliation Dipartimento di Informatica; -
scopus.contributor.surname Miaschi -
scopus.contributor.surname Brunato -
scopus.contributor.surname Dell'Orletta -
scopus.date.issued 2021 *
scopus.description.allpeopleoriginal Miaschi A.; Brunato D.; Dell'Orletta F. *
scopus.differences scopus.subject.keywords *
scopus.differences scopus.description.allpeopleoriginal *
scopus.differences scopus.relation.issue *
scopus.differences scopus.relation.volume *
scopus.document.type ar *
scopus.document.types ar *
scopus.identifier.doi 10.17239/JOWR-2021.13.01.03 *
scopus.identifier.eissn 2294-3307 *
scopus.identifier.pui 2012939152 *
scopus.identifier.scopus 2-s2.0-85108566169 *
scopus.journal.sourceid 21100217021 *
scopus.language.iso eng *
scopus.publisher.name University of Antwerp *
scopus.relation.firstpage 71 *
scopus.relation.issue 1 *
scopus.relation.lastpage 105 *
scopus.relation.volume 13 *
scopus.subject.keywords Diachronic Evolution of Written Language Competence; Italian Learner Corpus; Learners' errors; Machine Learning; Natural Language Processing; Stylometry; *
scopus.title A NLP-based stylometric approach for tracking the evolution of L1 written language competence *
scopus.titleeng A NLP-based stylometric approach for tracking the evolution of L1 written language competence *
Appears in types: 01.01 Articolo in rivista
Files in this item:
File: JoWR_2021_vol13_nr1_Miaschi_et_al (7).pdf
Access: open access
Type: Publisher's version (PDF)
License: Creative Commons
Size: 364.16 kB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/402654
Citations
  • PMC: ND
  • Scopus: 11
  • Web of Science (ISI): 5