Linguistically-Based Comparison of Different Approaches to Building Corpora for Text Simplification: A Case Study on Italian

Dominique Brunato; Felice Dell'Orletta; Giulia Venturi
2022

Abstract

In this paper, we present an overview of existing parallel corpora for Automatic Text Simplification (ATS) in different languages, focusing on the approach adopted for their construction. We draw the main distinction between manual and (semi-)automatic approaches in order to investigate in what respects complex and simple texts differ, and whether and how the observed modifications depend on the underlying approach. To this end, we perform a two-level comparison on Italian corpora, since Italian is the only language, apart from English, for which large parallel resources derived through both approaches are available. The first level of comparison accounts for the main types of sentence transformations occurring in the simplification process; the second examines the results of a linguistic profiling analysis, based on Natural Language Processing techniques, carried out on the original and the simplified versions of the same texts. For both levels of analysis, we focus our discussion mostly on sentence transformations and linguistic characteristics that pertain to the morpho-syntactic and syntactic structure of the sentence.
DC Field Value Language
dc.authority.ancejournal FRONTIERS IN PSYCHOLOGY en
dc.authority.orgunit Istituto di linguistica computazionale "Antonio Zampolli" - ILC en
dc.authority.people Dominique Brunato en
dc.authority.people Felice Dell'Orletta en
dc.authority.people Giulia Venturi en
dc.collection.id.s b3f88f24-048a-4e43-8ab1-6697b90e068e *
dc.collection.name 01.01 Articolo in rivista *
dc.contributor.appartenenza Istituto di linguistica computazionale "Antonio Zampolli" - ILC *
dc.contributor.appartenenza.mi 918 *
dc.contributor.area Non assegn *
dc.contributor.area Non assegn *
dc.contributor.area Non assegn *
dc.date.accessioned 2024/02/20 13:08:27 -
dc.date.available 2024/02/20 13:08:27 -
dc.date.firstsubmission 2025/01/24 11:02:17 *
dc.date.issued 2022 -
dc.date.submission 2025/01/24 12:35:14 *
dc.description.affiliations Istituto di Linguistica Computazionale "A. Zampolli" (ILC-CNR) -
dc.description.allpeople Brunato, Dominique; Dell'Orletta, Felice; Venturi, Giulia -
dc.description.allpeopleoriginal Dominique Brunato, Felice Dell'Orletta, Giulia Venturi en
dc.description.fulltext open en
dc.description.numberofauthors 3 -
dc.identifier.doi 10.3389/fpsyg.2022.707630 en
dc.identifier.isi WOS:000780469600001 en
dc.identifier.scopus 2-s2.0-85127314427 -
dc.identifier.uri https://hdl.handle.net/20.500.14243/440157 -
dc.identifier.url https://www.frontiersin.org/articles/10.3389/fpsyg.2022.707630/full en
dc.language.iso eng en
dc.miur.last.status.update 2024-07-11T11:03:02Z *
dc.relation.firstpage 1 en
dc.relation.lastpage 19 en
dc.relation.numberofpages 19 en
dc.relation.volume 13 en
dc.subject.keywords linguistic complexity -
dc.subject.keywords corpus construction -
dc.subject.keywords text simplification -
dc.subject.singlekeyword linguistic complexity *
dc.subject.singlekeyword corpus construction *
dc.subject.singlekeyword text simplification *
dc.title Linguistically-Based Comparison of Different Approaches to Building Corpora for Text Simplification: A Case Study on Italian en
dc.type.driver info:eu-repo/semantics/article -
dc.type.full 01 Contributo su Rivista::01.01 Articolo in rivista it
dc.type.miur 262 -
dc.type.referee Yes, but type not specified en
dc.ugov.descaux1 464954 -
dc.ugov.descaux2 Open Access -
iris.isi.extIssued 2022 -
iris.isi.extTitle Linguistically-Based Comparison of Different Approaches to Building Corpora for Text Simplification: A Case Study on Italian -
iris.isi.metadataErrorDescription 0 -
iris.isi.metadataErrorType ERROR_NO_MATCH -
iris.isi.metadataStatus ERROR -
iris.mediafilter.data 2025/04/05 13:32:34 *
iris.orcid.lastModifiedDate 2025/02/05 10:14:13 *
iris.orcid.lastModifiedMillisecond 1738746853268 *
iris.scopus.extIssued 2022 -
iris.scopus.extTitle Linguistically-Based Comparison of Different Approaches to Building Corpora for Text Simplification: A Case Study on Italian -
iris.sitodocente.maxattempts 1 -
iris.unpaywall.bestoahost publisher *
iris.unpaywall.bestoaversion publishedVersion *
iris.unpaywall.doi 10.3389/fpsyg.2022.707630 *
iris.unpaywall.hosttype publisher *
iris.unpaywall.isoa true *
iris.unpaywall.journalisindoaj true *
iris.unpaywall.landingpage https://doi.org/10.3389/fpsyg.2022.707630 *
iris.unpaywall.license cc-by *
iris.unpaywall.metadataCallLastModified 26/04/2025 05:29:08 -
iris.unpaywall.metadataCallLastModifiedMillisecond 1745638148297 -
iris.unpaywall.oastatus gold *
iris.unpaywall.pdfurl https://www.frontiersin.org/articles/10.3389/fpsyg.2022.707630/pdf *
isi.category VJ *
isi.contributor.affiliation ItaliaNLP Lab -
isi.contributor.affiliation ItaliaNLP Lab -
isi.contributor.affiliation ItaliaNLP Lab -
isi.contributor.country Italy -
isi.contributor.country Italy -
isi.contributor.country Italy -
isi.contributor.name Dominique -
isi.contributor.name Felice -
isi.contributor.name Giulia -
isi.contributor.researcherId FZE-1113-2022 -
isi.contributor.researcherId AAX-1864-2020 -
isi.contributor.researcherId AAY-3932-2020 -
isi.contributor.subaffiliation Inst Computat Linguist A Zampolli ILC CNR -
isi.contributor.subaffiliation Inst Computat Linguist A Zampolli ILC CNR -
isi.contributor.subaffiliation Inst Computat Linguist A Zampolli ILC CNR -
isi.contributor.surname Brunato -
isi.contributor.surname Dell'Orletta -
isi.contributor.surname Venturi -
isi.date.issued 2022 *
isi.description.allpeopleoriginal Brunato, D; Dell'Orletta, F; Venturi, G; *
isi.document.type Article *
isi.identifier.doi 10.3389/fpsyg.2022.707630 *
isi.identifier.isi WOS:000780469600001 *
isi.journal.journaltitle FRONTIERS IN PSYCHOLOGY *
isi.journal.journaltitleabbrev FRONT PSYCHOL *
isi.language.original English *
isi.publisher.place AVENUE DU TRIBUNAL FEDERAL 34, LAUSANNE, CH-1015, SWITZERLAND *
scopus.authority.ancejournal FRONTIERS IN PSYCHOLOGY###1664-1078 *
scopus.category 3200 *
scopus.contributor.affiliation Institute for Computational Linguistics “A. Zampolli” (ILC-CNR) -
scopus.contributor.affiliation Institute for Computational Linguistics “A. Zampolli” (ILC-CNR) -
scopus.contributor.affiliation Institute for Computational Linguistics “A. Zampolli” (ILC-CNR) -
scopus.contributor.afid 60021199 -
scopus.contributor.afid 60021199 -
scopus.contributor.afid 60021199 -
scopus.contributor.auid 55237740200 -
scopus.contributor.auid 57540567000 -
scopus.contributor.auid 27568199800 -
scopus.contributor.country Italy -
scopus.contributor.country Italy -
scopus.contributor.country Italy -
scopus.contributor.dptid 121833164 -
scopus.contributor.dptid 121833164 -
scopus.contributor.dptid 121833164 -
scopus.contributor.name Dominique -
scopus.contributor.name Felice -
scopus.contributor.name Giulia -
scopus.contributor.subaffiliation ItaliaNLP Lab; -
scopus.contributor.subaffiliation ItaliaNLP Lab; -
scopus.contributor.subaffiliation ItaliaNLP Lab; -
scopus.contributor.surname Brunato -
scopus.contributor.surname Dell'Orletta -
scopus.contributor.surname Venturi -
scopus.date.issued 2022 *
scopus.description.allpeopleoriginal Brunato D.; Dell'Orletta F.; Venturi G. *
scopus.differences scopus.subject.keywords *
scopus.differences scopus.description.allpeopleoriginal *
scopus.differences scopus.description.abstracteng *
scopus.document.type ar *
scopus.document.types ar *
scopus.funding.funders 501100000780 - European Commission; 100011102 - Seventh Framework Programme; 100011102 - Seventh Framework Programme; *
scopus.funding.ids 257410; *
scopus.identifier.doi 10.3389/fpsyg.2022.707630 *
scopus.identifier.eissn 1664-1078 *
scopus.identifier.pui 2015416102 *
scopus.identifier.scopus 2-s2.0-85127314427 *
scopus.journal.sourceid 21100216571 *
scopus.language.iso eng *
scopus.publisher.name Frontiers Media S.A. *
scopus.relation.article 707630 *
scopus.relation.volume 13 *
scopus.subject.keywords aligned corpora; corpus construction; Italian language; linguistic complexity; text simplification; *
scopus.title Linguistically-Based Comparison of Different Approaches to Building Corpora for Text Simplification: A Case Study on Italian *
scopus.titleeng Linguistically-Based Comparison of Different Approaches to Building Corpora for Text Simplification: A Case Study on Italian *
Appears in types: 01.01 Articolo in rivista (journal article)
Files in this record:
File: fpsyg-13-707630.pdf
Access: open access
Type: Publisher's version (PDF)
License: Creative Commons
Size: 638.27 kB
Format: Adobe PDF
