Linguistically-Based Comparison of Different Approaches to Building Corpora for Text Simplification: A Case Study on Italian
Dominique Brunato; Felice Dell'Orletta; Giulia Venturi
2022
Abstract
In this paper, we present an overview of existing parallel corpora for Automatic Text Simplification (ATS) in different languages, focusing on the approach adopted for their construction. We draw the main distinction between manual and (semi-)automatic approaches in order to investigate in which respects complex and simple texts differ, and whether and how the observed modifications may depend on the underlying approach. To this end, we perform a two-level comparison on Italian corpora, since Italian is the only language besides English for which large parallel resources have been built with both approaches. The first level of comparison accounts for the main types of sentence transformations occurring in the simplification process; the second examines the results of a linguistic profiling analysis, based on Natural Language Processing techniques, carried out on the original and the simplified versions of the same texts. For both levels of analysis, we chose to focus our discussion mostly on sentence transformations and linguistic characteristics that pertain to the morpho-syntactic and syntactic structure of the sentence.

| DC field | Value | Language |
|---|---|---|
| dc.authority.ancejournal | FRONTIERS IN PSYCHOLOGY | en |
| dc.authority.orgunit | Istituto di linguistica computazionale "Antonio Zampolli" - ILC | en |
| dc.authority.people | Dominique Brunato | en |
| dc.authority.people | Felice Dell'Orletta | en |
| dc.authority.people | Giulia Venturi | en |
| dc.collection.id.s | b3f88f24-048a-4e43-8ab1-6697b90e068e | * |
| dc.collection.name | 01.01 Journal article | * |
| dc.contributor.appartenenza | Istituto di linguistica computazionale "Antonio Zampolli" - ILC | * |
| dc.contributor.appartenenza.mi | 918 | * |
| dc.contributor.area | Not assigned | * |
| dc.contributor.area | Not assigned | * |
| dc.contributor.area | Not assigned | * |
| dc.date.accessioned | 2024/02/20 13:08:27 | - |
| dc.date.available | 2024/02/20 13:08:27 | - |
| dc.date.firstsubmission | 2025/01/24 11:02:17 | * |
| dc.date.issued | 2022 | - |
| dc.date.submission | 2025/01/24 12:35:14 | * |
| dc.description.abstracteng | In this paper, we present an overview of existing parallel corpora for Automatic Text Simplification (ATS) in different languages focusing on the approach adopted for their construction. We make the main distinction between manual and (semi)-automatic approaches in order to investigate in which respect complex and simple texts vary and whether and how the observed modifications may depend on the underlying approach. To this end, we perform a two-level comparison on Italian corpora, since this is the only language, with the exception of English, for which there are large parallel resources derived through the two approaches considered. The first level of comparison accounts for the main types of sentence transformations occurring in the simplification process, the second one examines the results of a linguistic profiling analysis based on Natural Language Processing techniques and carried out on the original and the simple version of the same texts. For both levels of analysis, we chose to focus our discussion mostly on sentence transformations and linguistic characteristics that pertain to the morpho-syntactic and syntactic structure of the sentence. | - |
| dc.description.affiliations | Istituto di Linguistica Computazionale "A. Zampolli" (ILC-CNR) | - |
| dc.description.allpeople | Brunato, Dominique; Dell'Orletta, Felice; Venturi, Giulia | - |
| dc.description.allpeopleoriginal | Dominique Brunato, Felice Dell'Orletta, Giulia Venturi | en |
| dc.description.fulltext | open | en |
| dc.description.numberofauthors | 3 | - |
| dc.identifier.doi | 10.3389/fpsyg.2022.707630 | en |
| dc.identifier.isi | WOS:000780469600001 | en |
| dc.identifier.scopus | 2-s2.0-85127314427 | - |
| dc.identifier.uri | https://hdl.handle.net/20.500.14243/440157 | - |
| dc.identifier.url | https://www.frontiersin.org/articles/10.3389/fpsyg.2022.707630/full | en |
| dc.language.iso | eng | en |
| dc.miur.last.status.update | 2024-07-11T11:03:02Z | * |
| dc.relation.firstpage | 1 | en |
| dc.relation.lastpage | 19 | en |
| dc.relation.numberofpages | 19 | en |
| dc.relation.volume | 13 | en |
| dc.subject.keywords | linguistic complexity | - |
| dc.subject.keywords | corpus construction | - |
| dc.subject.keywords | text simplification | - |
| dc.subject.singlekeyword | linguistic complexity | * |
| dc.subject.singlekeyword | corpus construction | * |
| dc.subject.singlekeyword | text simplification | * |
| dc.title | Linguistically-Based Comparison of Different Approaches to Building Corpora for Text Simplification: A Case Study on Italian | en |
| dc.type.driver | info:eu-repo/semantics/article | - |
| dc.type.full | 01 Journal contribution::01.01 Journal article | it |
| dc.type.miur | 262 | - |
| dc.type.referee | Yes, but type not specified | en |
| dc.ugov.descaux1 | 464954 | - |
| dc.ugov.descaux2 | Open Access | - |
| iris.isi.extIssued | 2022 | - |
| iris.isi.extTitle | Linguistically-Based Comparison of Different Approaches to Building Corpora for Text Simplification: A Case Study on Italian | - |
| iris.isi.metadataErrorDescription | 0 | - |
| iris.isi.metadataErrorType | ERROR_NO_MATCH | - |
| iris.isi.metadataStatus | ERROR | - |
| iris.mediafilter.data | 2025/04/05 13:32:34 | * |
| iris.orcid.lastModifiedDate | 2025/02/05 10:14:13 | * |
| iris.orcid.lastModifiedMillisecond | 1738746853268 | * |
| iris.scopus.extIssued | 2022 | - |
| iris.scopus.extTitle | Linguistically-Based Comparison of Different Approaches to Building Corpora for Text Simplification: A Case Study on Italian | - |
| iris.sitodocente.maxattempts | 1 | - |
| iris.unpaywall.bestoahost | publisher | * |
| iris.unpaywall.bestoaversion | publishedVersion | * |
| iris.unpaywall.doi | 10.3389/fpsyg.2022.707630 | * |
| iris.unpaywall.hosttype | publisher | * |
| iris.unpaywall.isoa | true | * |
| iris.unpaywall.journalisindoaj | true | * |
| iris.unpaywall.landingpage | https://doi.org/10.3389/fpsyg.2022.707630 | * |
| iris.unpaywall.license | cc-by | * |
| iris.unpaywall.metadataCallLastModified | 26/04/2025 05:29:08 | - |
| iris.unpaywall.metadataCallLastModifiedMillisecond | 1745638148297 | - |
| iris.unpaywall.oastatus | gold | * |
| iris.unpaywall.pdfurl | https://www.frontiersin.org/articles/10.3389/fpsyg.2022.707630/pdf | * |
| isi.category | VJ | * |
| isi.contributor.affiliation | ItaliaNLP Lab | - |
| isi.contributor.affiliation | ItaliaNLP Lab | - |
| isi.contributor.affiliation | ItaliaNLP Lab | - |
| isi.contributor.country | Italy | - |
| isi.contributor.country | Italy | - |
| isi.contributor.country | Italy | - |
| isi.contributor.name | Dominique | - |
| isi.contributor.name | Felice | - |
| isi.contributor.name | Giulia | - |
| isi.contributor.researcherId | FZE-1113-2022 | - |
| isi.contributor.researcherId | AAX-1864-2020 | - |
| isi.contributor.researcherId | AAY-3932-2020 | - |
| isi.contributor.subaffiliation | Inst Computat Linguist A Zampolli ILC CNR | - |
| isi.contributor.subaffiliation | Inst Computat Linguist A Zampolli ILC CNR | - |
| isi.contributor.subaffiliation | Inst Computat Linguist A Zampolli ILC CNR | - |
| isi.contributor.surname | Brunato | - |
| isi.contributor.surname | Dell'Orletta | - |
| isi.contributor.surname | Venturi | - |
| isi.date.issued | 2022 | * |
| isi.description.allpeopleoriginal | Brunato, D; Dell'Orletta, F; Venturi, G; | * |
| isi.document.type | Article | * |
| isi.identifier.doi | 10.3389/fpsyg.2022.707630 | * |
| isi.identifier.isi | WOS:000780469600001 | * |
| isi.journal.journaltitle | FRONTIERS IN PSYCHOLOGY | * |
| isi.journal.journaltitleabbrev | FRONT PSYCHOL | * |
| isi.language.original | English | * |
| isi.publisher.place | AVENUE DU TRIBUNAL FEDERAL 34, LAUSANNE, CH-1015, SWITZERLAND | * |
| scopus.authority.ancejournal | FRONTIERS IN PSYCHOLOGY###1664-1078 | * |
| scopus.category | 3200 | * |
| scopus.contributor.affiliation | Institute for Computational Linguistics “A. Zampolli” (ILC-CNR) | - |
| scopus.contributor.affiliation | Institute for Computational Linguistics “A. Zampolli” (ILC-CNR) | - |
| scopus.contributor.affiliation | Institute for Computational Linguistics “A. Zampolli” (ILC-CNR) | - |
| scopus.contributor.afid | 60021199 | - |
| scopus.contributor.afid | 60021199 | - |
| scopus.contributor.afid | 60021199 | - |
| scopus.contributor.auid | 55237740200 | - |
| scopus.contributor.auid | 57540567000 | - |
| scopus.contributor.auid | 27568199800 | - |
| scopus.contributor.country | Italy | - |
| scopus.contributor.country | Italy | - |
| scopus.contributor.country | Italy | - |
| scopus.contributor.dptid | 121833164 | - |
| scopus.contributor.dptid | 121833164 | - |
| scopus.contributor.dptid | 121833164 | - |
| scopus.contributor.name | Dominique | - |
| scopus.contributor.name | Felice | - |
| scopus.contributor.name | Giulia | - |
| scopus.contributor.subaffiliation | ItaliaNLP Lab; | - |
| scopus.contributor.subaffiliation | ItaliaNLP Lab; | - |
| scopus.contributor.subaffiliation | ItaliaNLP Lab; | - |
| scopus.contributor.surname | Brunato | - |
| scopus.contributor.surname | Dell'Orletta | - |
| scopus.contributor.surname | Venturi | - |
| scopus.date.issued | 2022 | * |
| scopus.description.abstracteng | In this paper, we present an overview of existing parallel corpora for Automatic Text Simplification (ATS) in different languages focusing on the approach adopted for their construction. We make the main distinction between manual and (semi)–automatic approaches in order to investigate in which respect complex and simple texts vary and whether and how the observed modifications may depend on the underlying approach. To this end, we perform a two-level comparison on Italian corpora, since this is the only language, with the exception of English, for which there are large parallel resources derived through the two approaches considered. The first level of comparison accounts for the main types of sentence transformations occurring in the simplification process, the second one examines the results of a linguistic profiling analysis based on Natural Language Processing techniques and carried out on the original and the simple version of the same texts. For both levels of analysis, we chose to focus our discussion mostly on sentence transformations and linguistic characteristics that pertain to the morpho-syntactic and syntactic structure of the sentence. | * |
| scopus.description.allpeopleoriginal | Brunato D.; Dell'Orletta F.; Venturi G. | * |
| scopus.differences | scopus.subject.keywords | * |
| scopus.differences | scopus.description.allpeopleoriginal | * |
| scopus.differences | scopus.description.abstracteng | * |
| scopus.document.type | ar | * |
| scopus.document.types | ar | * |
| scopus.funding.funders | 501100000780 - European Commission; 100011102 - Seventh Framework Programme; 100011102 - Seventh Framework Programme; | * |
| scopus.funding.ids | 257410; | * |
| scopus.identifier.doi | 10.3389/fpsyg.2022.707630 | * |
| scopus.identifier.eissn | 1664-1078 | * |
| scopus.identifier.pui | 2015416102 | * |
| scopus.identifier.scopus | 2-s2.0-85127314427 | * |
| scopus.journal.sourceid | 21100216571 | * |
| scopus.language.iso | eng | * |
| scopus.publisher.name | Frontiers Media S.A. | * |
| scopus.relation.article | 707630 | * |
| scopus.relation.volume | 13 | * |
| scopus.subject.keywords | aligned corpora; corpus construction; Italian language; linguistic complexity; text simplification; | * |
| scopus.title | Linguistically-Based Comparison of Different Approaches to Building Corpora for Text Simplification: A Case Study on Italian | * |
| scopus.titleeng | Linguistically-Based Comparison of Different Approaches to Building Corpora for Text Simplification: A Case Study on Italian | * |
Appears in types: 01.01 Journal article

| File | Size | Format |
|---|---|---|
| fpsyg-13-707630.pdf (open access; type: publisher's version (PDF); license: Creative Commons) | 638.27 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


