On Robustness and Sensitivity of a Neural Language Model: A Case Study on Italian L1 Learner Errors
Alessio Miaschi; Dominique Brunato; Felice Dell'Orletta; Giulia Venturi
2022
Abstract
In this paper, we propose a comprehensive linguistic study aimed at assessing the implicit behavior of one of the most prominent Neural Language Models (NLM) based on Transformer architectures, BERT (Devlin et al., 2019), when dealing with a particular source of noisy data, namely essays written by L1 Italian learners containing a variety of errors targeting grammar, orthography and lexicon. Differently from previous works, we focus on the pre-training stage and we devise two complementary evaluation tasks aimed at assessing the impact of errors on sentence-level inner representations in terms of semantic robustness and linguistic sensitivity. While the first evaluation perspective is meant to probe the model's ability to encode the semantic similarity between sentences also in the presence of errors, the second type of probing task evaluates the influence of errors on BERT's implicit knowledge of a set of raw and morpho-syntactic properties of a sentence. Our experiments show that BERT's ability to compute sentence similarity and to correctly encode multi-leveled linguistic information of a sentence are differently modulated by the category of errors and that the error hierarchies in terms of robustness and sensitivity change across layer-wise representations.

| DC Field | Value | Language |
|---|---|---|
| dc.authority.ancejournal | IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING | en |
| dc.authority.orgunit | Istituto di linguistica computazionale "Antonio Zampolli" - ILC | en |
| dc.authority.people | Miaschi | en |
| dc.authority.people | Alessio | en |
| dc.authority.people | Brunato | en |
| dc.authority.people | Dominique | en |
| dc.authority.people | Dell'Orletta | en |
| dc.authority.people | Felice | en |
| dc.authority.people | Venturi | en |
| dc.authority.people | Giulia | en |
| dc.collection.id.s | b3f88f24-048a-4e43-8ab1-6697b90e068e | * |
| dc.collection.name | 01.01 Articolo in rivista | * |
| dc.contributor.appartenenza | Istituto di linguistica computazionale "Antonio Zampolli" - ILC | * |
| dc.contributor.appartenenza.mi | 918 | * |
| dc.contributor.area | Non assegn | * |
| dc.contributor.area | Non assegn | * |
| dc.contributor.area | Non assegn | * |
| dc.contributor.area | Non assegn | * |
| dc.date.accessioned | 2024/02/21 08:58:23 | - |
| dc.date.available | 2024/02/21 08:58:23 | - |
| dc.date.firstsubmission | 2025/01/24 16:14:33 | * |
| dc.date.issued | 2022 | - |
| dc.date.submission | 2025/01/24 16:16:47 | * |
| dc.description.abstracteng | In this paper, we propose a comprehensive linguistic study aimed at assessing the implicit behavior of one of the most prominent Neural Language Models (NLM) based on Transformer architectures, BERT (Devlin et al., 2019), when dealing with a particular source of noisy data, namely essays written by L1 Italian learners containing a variety of errors targeting grammar, orthography and lexicon. Differently from previous works, we focus on the pre-training stage and we devise two complementary evaluation tasks aimed at assessing the impact of errors on sentence-level inner representations in terms of semantic robustness and linguistic sensitivity. While the first evaluation perspective is meant to probe the model's ability to encode the semantic similarity between sentences also in the presence of errors, the second type of probing task evaluates the influence of errors on BERT's implicit knowledge of a set of raw and morpho-syntactic properties of a sentence. Our experiments show that BERT's ability to compute sentence similarity and to correctly encode multi-leveled linguistic information of a sentence are differently modulated by the category of errors and that the error hierarchies in terms of robustness and sensitivity change across layer-wise representations. | - |
| dc.description.affiliations | Istituto di Linguistica Computazionale "A. Zampolli" (ILC-CNR), ItaliaNLP Lab, Pisa | - |
| dc.description.allpeople | Miaschi, Alessio; Alessio, ; Brunato, DOMINIQUE PIERINA; Dominique, ; Dell'Orletta, Felice; Felice, ; Venturi, Giulia; Giulia, | - |
| dc.description.allpeopleoriginal | Miaschi, Alessio and Brunato, Dominique and Dell'Orletta, Felice and Venturi, Giulia | en |
| dc.description.fulltext | open | en |
| dc.description.numberofauthors | 8 | - |
| dc.identifier.doi | 10.1109/TASLP.2022.3226333 | en |
| dc.identifier.isi | WOS:000896638000001 | - |
| dc.identifier.scopus | 2-s2.0-85144049924 | - |
| dc.identifier.uri | https://hdl.handle.net/20.500.14243/417257 | - |
| dc.identifier.url | https://doi.org/10.1109/TASLP.2022.3226333 | en |
| dc.language.iso | eng | en |
| dc.miur.last.status.update | 2024-12-20T12:25:06Z | * |
| dc.relation.firstpage | 426 | en |
| dc.relation.lastpage | 438 | en |
| dc.relation.numberofpages | 13 | en |
| dc.relation.volume | 31 | en |
| dc.subject.keywordseng | Natural Language Processing | - |
| dc.subject.keywordseng | Neural Language Model | - |
| dc.subject.keywordseng | Interpretability | - |
| dc.subject.singlekeyword | Natural Language Processing | * |
| dc.subject.singlekeyword | Neural Language Model | * |
| dc.subject.singlekeyword | Interpretability | * |
| dc.title | On Robustness and Sensitivity of a Neural Language Model: A Case Study on Italian L1 Learner Errors | en |
| dc.type.driver | info:eu-repo/semantics/article | - |
| dc.type.full | 01 Contributo su Rivista::01.01 Articolo in rivista | it |
| dc.type.miur | 262 | - |
| dc.ugov.descaux1 | 475015 | - |
| iris.isi.extIssued | 2023 | - |
| iris.isi.extTitle | On Robustness and Sensitivity of a Neural Language Model: A Case Study on Italian L1 Learner Errors | - |
| iris.mediafilter.data | 2025/04/04 04:31:43 | * |
| iris.orcid.lastModifiedDate | 2025/05/07 01:10:11 | * |
| iris.orcid.lastModifiedMillisecond | 1746573011575 | * |
| iris.scopus.extIssued | 2023 | - |
| iris.scopus.extTitle | On Robustness and Sensitivity of a Neural Language Model: A Case Study on Italian L1 Learner Errors | - |
| iris.sitodocente.maxattempts | 1 | - |
| iris.unpaywall.bestoahost | repository | * |
| iris.unpaywall.bestoaversion | publishedVersion | * |
| iris.unpaywall.doi | 10.1109/taslp.2022.3226333 | * |
| iris.unpaywall.hosttype | repository | * |
| iris.unpaywall.isoa | true | * |
| iris.unpaywall.journalisindoaj | false | * |
| iris.unpaywall.landingpage | https://zenodo.org/record/8092059 | * |
| iris.unpaywall.license | cc-by | * |
| iris.unpaywall.metadataCallLastModified | 10/05/2025 04:56:02 | - |
| iris.unpaywall.metadataCallLastModifiedMillisecond | 1746845762302 | - |
| iris.unpaywall.oastatus | green | * |
| iris.unpaywall.pdfurl | https://zenodo.org/records/8092059/files/Information_Journal_Probing-pre-print.pdf | * |
| isi.authority.ancejournal | IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING###2329-9290 | * |
| isi.category | AA | * |
| isi.category | IQ | * |
| isi.contributor.affiliation | Inst Computat Linguist A Zampolli ILC CNR | - |
| isi.contributor.affiliation | Inst Computat Linguist A Zampolli ILC CNR | - |
| isi.contributor.affiliation | Inst Computat Linguist A Zampolli ILC CNR | - |
| isi.contributor.affiliation | Inst Computat Linguist A Zampolli ILC CNR | - |
| isi.contributor.country | Italy | - |
| isi.contributor.country | Italy | - |
| isi.contributor.country | Italy | - |
| isi.contributor.country | Italy | - |
| isi.contributor.name | Alessio | - |
| isi.contributor.name | Dominique | - |
| isi.contributor.name | Felice | - |
| isi.contributor.name | Giulia | - |
| isi.contributor.researcherId | GCD-5321-2022 | - |
| isi.contributor.researcherId | MCK-5206-2025 | - |
| isi.contributor.researcherId | AAX-1864-2020 | - |
| isi.contributor.researcherId | AAY-3932-2020 | - |
| isi.contributor.subaffiliation | ItaliaNLP Lab | - |
| isi.contributor.subaffiliation | ItaliaNLP Lab | - |
| isi.contributor.subaffiliation | ItaliaNLP Lab | - |
| isi.contributor.subaffiliation | ItaliaNLP Lab | - |
| isi.contributor.surname | Miaschi | - |
| isi.contributor.surname | Brunato | - |
| isi.contributor.surname | Dell'Orletta | - |
| isi.contributor.surname | Venturi | - |
| isi.date.issued | 2023 | * |
| isi.description.abstracteng | In this paper, we propose a comprehensive linguistic study aimed at assessing the implicit behavior of one of the most prominent Neural Language Models (NLM) based on Transformer architectures, BERT Devlin et al., when dealing with a particular source of noisy data, namely essays written by L1 Italian learners containing a variety of errors targeting grammar, orthography and lexicon. Differently from previous works, we focus on the pre-training stage and we devise two complementary evaluation tasks aimed at assessing the impact of errors on sentence-level inner representations in terms of semantic robustness and linguistic sensitivity. While the first evaluation perspective is meant to probe the model's ability to encode the semantic similarity between sentences also in the presence of errors, the second type of probing task evaluates the influence of errors on BERT's implicit knowledge of a set of raw and morpho-syntactic properties of a sentence. Our experiments show that BERT's ability to compute sentence similarity and to correctly encode multi-leveled linguistic information of a sentence are differently modulated by the category of errors and that the error hierarchies in terms of robustness and sensitivity change across layer-wise representations. | * |
| isi.description.allpeopleoriginal | Miaschi, A; Brunato, D; Dell'Orletta, F; Venturi, G; | * |
| isi.document.sourcetype | WOS.SCI | * |
| isi.document.type | Article | * |
| isi.document.types | Article | * |
| isi.identifier.doi | 10.1109/TASLP.2022.3226333 | * |
| isi.identifier.eissn | 2329-9304 | * |
| isi.identifier.isi | WOS:000896638000001 | * |
| isi.journal.journaltitle | IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | * |
| isi.journal.journaltitleabbrev | IEEE-ACM T AUDIO SPE | * |
| isi.language.original | English | * |
| isi.publisher.place | 445 HOES LANE, PISCATAWAY, NJ 08855-4141 USA | * |
| isi.relation.firstpage | 426 | * |
| isi.relation.lastpage | 438 | * |
| isi.relation.volume | 31 | * |
| isi.title | On Robustness and Sensitivity of a Neural Language Model: A Case Study on Italian L1 Learner Errors | * |
| scopus.authority.ancejournal | IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING###2329-9290 | * |
| scopus.category | 1701 | * |
| scopus.category | 3102 | * |
| scopus.category | 2605 | * |
| scopus.category | 2208 | * |
| scopus.contributor.affiliation | ItaliaNLP Lab | - |
| scopus.contributor.affiliation | ItaliaNLP Lab | - |
| scopus.contributor.affiliation | ItaliaNLP Lab | - |
| scopus.contributor.affiliation | ItaliaNLP Lab | - |
| scopus.contributor.afid | 60021199 | - |
| scopus.contributor.afid | 60021199 | - |
| scopus.contributor.afid | 60021199 | - |
| scopus.contributor.afid | 60021199 | - |
| scopus.contributor.auid | 57211678681 | - |
| scopus.contributor.auid | 55237740200 | - |
| scopus.contributor.auid | 57540567000 | - |
| scopus.contributor.auid | 27568199800 | - |
| scopus.contributor.country | Italy | - |
| scopus.contributor.country | Italy | - |
| scopus.contributor.country | Italy | - |
| scopus.contributor.country | Italy | - |
| scopus.contributor.dptid | 121833164 | - |
| scopus.contributor.dptid | 121833164 | - |
| scopus.contributor.dptid | 121833164 | - |
| scopus.contributor.dptid | 121833164 | - |
| scopus.contributor.name | Alessio | - |
| scopus.contributor.name | Dominique | - |
| scopus.contributor.name | Felice | - |
| scopus.contributor.name | Giulia | - |
| scopus.contributor.subaffiliation | Institute for Computational Linguistics 'A. Zampolli' (ILC-CNR); | - |
| scopus.contributor.subaffiliation | Institute for Computational Linguistics 'A. Zampolli' (ILC-CNR); | - |
| scopus.contributor.subaffiliation | Institute for Computational Linguistics 'A. Zampolli' (ILC-CNR); | - |
| scopus.contributor.subaffiliation | Institute for Computational Linguistics 'A. Zampolli' (ILC-CNR); | - |
| scopus.contributor.surname | Miaschi | - |
| scopus.contributor.surname | Brunato | - |
| scopus.contributor.surname | Dell'orletta | - |
| scopus.contributor.surname | Venturi | - |
| scopus.date.issued | 2023 | * |
| scopus.description.abstracteng | In this paper, we propose a comprehensive linguistic study aimed at assessing the implicit behavior of one of the most prominent Neural Language Models (NLM) based on Transformer architectures, BERT Devlin et al., when dealing with a particular source of noisy data, namely essays written by L1 Italian learners containing a variety of errors targeting grammar, orthography and lexicon. Differently from previous works, we focus on the pre-training stage and we devise two complementary evaluation tasks aimed at assessing the impact of errors on sentence-level inner representations in terms of semantic robustness and linguistic sensitivity. While the first evaluation perspective is meant to probe the model's ability to encode the semantic similarity between sentences also in the presence of errors, the second type of probing task evaluates the influence of errors on BERT's implicit knowledge of a set of raw and morpho-syntactic properties of a sentence. Our experiments show that BERT's ability to compute sentence similarity and to correctly encode multi-leveled linguistic information of a sentence are differently modulated by the category of errors and that the error hierarchies in terms of robustness and sensitivity change across layer-wise representations. | * |
| scopus.description.allpeopleoriginal | Miaschi A.; Brunato D.; Dell'orletta F.; Venturi G. | * |
| scopus.differences | scopus.subject.keywords | * |
| scopus.differences | scopus.date.issued | * |
| scopus.differences | scopus.description.allpeopleoriginal | * |
| scopus.differences | scopus.description.abstracteng | * |
| scopus.document.type | ar | * |
| scopus.document.types | ar | * |
| scopus.identifier.doi | 10.1109/TASLP.2022.3226333 | * |
| scopus.identifier.eissn | 2329-9304 | * |
| scopus.identifier.pui | 2021785938 | * |
| scopus.identifier.scopus | 2-s2.0-85144049924 | * |
| scopus.journal.sourceid | 21100368801 | * |
| scopus.language.iso | eng | * |
| scopus.publisher.name | Institute of Electrical and Electronics Engineers Inc. | * |
| scopus.relation.firstpage | 426 | * |
| scopus.relation.lastpage | 438 | * |
| scopus.relation.volume | 31 | * |
| scopus.subject.keywords | interpretability; learner errors; NLP; transformers; | * |
| scopus.title | On Robustness and Sensitivity of a Neural Language Model: A Case Study on Italian L1 Learner Errors | * |
| scopus.titleeng | On Robustness and Sensitivity of a Neural Language Model: A Case Study on Italian L1 Learner Errors | * |
Appears in collections: 01.01 Articolo in rivista

| File | Size | Format |
|---|---|---|
| prod_475015-doc_193995.pdf — open access. Description: On_Robustness__and_Sensitivity_of_a_Neural_Language_Model. Type: publisher's version (PDF). License: Creative Commons | 1.93 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.