Stress-testing machine generated text detection: shifting language models writing style to fool detectors
Pedrotti A.; Papucci M.; Ciaccio C.; Miaschi A.; Puccetti G.; Dell'Orletta F.; Esuli A.
2025
Abstract
Recent advancements in Generative AI and Large Language Models (LLMs) have enabled the creation of highly realistic synthetic content, raising concerns about the potential for malicious use, such as misinformation and manipulation. Moreover, detecting Machine-Generated Text (MGT) remains challenging due to the lack of robust benchmarks that assess generalization to real-world scenarios. In this work, we evaluate the resilience of state-of-the-art MGT detectors (e.g., Mage, Radar, LLM-DetectAIve) to linguistically informed adversarial attacks. We develop a pipeline that fine-tunes language models using Direct Preference Optimization (DPO) to shift the MGT style toward human-written text (HWT), obtaining generations that are more challenging for current models to detect. Additionally, we analyze the linguistic shifts induced by the alignment and how detectors rely on "linguistic shortcuts" to detect texts. Our results show that detectors can be easily fooled with relatively few examples, resulting in a significant drop in detection performance. This highlights the importance of improving detection methods and making them robust to unseen in-domain texts. We release code, models, and data to support future research on more robust MGT detection benchmarks.

| DC Field | Value | Language |
|---|---|---|
| dc.authority.orgunit | Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI | en |
| dc.authority.orgunit | Istituto di linguistica computazionale "Antonio Zampolli" - ILC | en |
| dc.authority.people | Pedrotti A. | en |
| dc.authority.people | Papucci M. | en |
| dc.authority.people | Ciaccio C. | en |
| dc.authority.people | Miaschi A. | en |
| dc.authority.people | Puccetti G. | en |
| dc.authority.people | Dell'Orletta F. | en |
| dc.authority.people | Esuli A. | en |
| dc.authority.project | corda__h2020::e7f5e7755409fc74eea9d168ab795634 | en |
| dc.collection.id.s | 71c7200a-7c5f-4e83-8d57-d3d2ba88f40d | * |
| dc.collection.name | 04.01 Contributo in Atti di convegno | * |
| dc.contributor.appartenenza | Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI | * |
| dc.contributor.appartenenza | Istituto di linguistica computazionale "Antonio Zampolli" - ILC | * |
| dc.contributor.appartenenza.mi | 918 | * |
| dc.contributor.appartenenza.mi | 973 | * |
| dc.contributor.area | Non assegn | * |
| dc.contributor.area | Non assegn | * |
| dc.contributor.area | Non assegn | * |
| dc.contributor.area | Non assegn | * |
| dc.contributor.area | Non assegn | * |
| dc.contributor.area | Non assegn | * |
| dc.contributor.area | Non assegn | * |
| dc.date.accessioned | 2025/09/29 15:27:29 | - |
| dc.date.available | 2025/09/29 15:27:29 | - |
| dc.date.firstsubmission | 2025/09/29 15:26:49 | * |
| dc.date.issued | 2025 | - |
| dc.date.submission | 2025/09/29 15:26:49 | * |
| dc.description.abstracteng | Recent advancements in Generative AI and Large Language Models (LLMs) have enabled the creation of highly realistic synthetic content, raising concerns about the potential for malicious use, such as misinformation and manipulation. Moreover, detecting Machine-Generated Text (MGT) remains challenging due to the lack of robust benchmarks that assess generalization to real-world scenarios. In this work, we evaluate the resilience of state-of-the-art MGT detectors (e.g., Mage, Radar, LLM-DetectAIve) to linguistically informed adversarial attacks. We develop a pipeline that fine-tunes language models using Direct Preference Optimization (DPO) to shift the MGT style toward human-written text (HWT), obtaining generations more challenging to detect by current models. Additionally, we analyze the linguistic shifts induced by the alignment and how detectors rely on “linguistic shortcuts” to detect texts. Our results show that detectors can be easily fooled with relatively few examples, resulting in a significant drop in detecting performances. This highlights the importance of improving detection methods and making them robust to unseen in-domain texts. We release code, models, and data to support future research on more robust MGT detection benchmarks. | - |
| dc.description.allpeople | Pedrotti, A.; Papucci, M.; Ciaccio, C.; Miaschi, A.; Puccetti, G.; Dell'Orletta, F.; Esuli, A. | - |
| dc.description.allpeopleoriginal | Pedrotti A.; Papucci M.; Ciaccio C.; Miaschi A.; Puccetti G.; Dell'Orletta F.; Esuli A. | en |
| dc.description.fulltext | open | en |
| dc.description.numberofauthors | 7 | - |
| dc.identifier.doi | 10.18653/v1/2025.findings-acl.156 | en |
| dc.identifier.isbn | 979-8-89176-256-5 | en |
| dc.identifier.scopus | 2-s2.0-105028618911 | - |
| dc.identifier.source | crossref | * |
| dc.identifier.uri | https://hdl.handle.net/20.500.14243/554367 | - |
| dc.identifier.url | https://aclanthology.org/2025.findings-acl.156/ | en |
| dc.language.iso | eng | en |
| dc.publisher.name | Association for Computational Linguistics | en |
| dc.relation.allauthors | Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar (eds.) | en |
| dc.relation.conferencedate | 27/07-01/08/2025 | en |
| dc.relation.conferencename | ACL 2025 - 63rd Annual Meeting of the Association for Computational Linguistics. Findings | en |
| dc.relation.conferenceplace | Vienna, Austria | en |
| dc.relation.firstpage | 3010 | en |
| dc.relation.ispartofbook | Findings of the Association for Computational Linguistics: ACL 2025 | en |
| dc.relation.lastpage | 3031 | en |
| dc.relation.medium | ELETTRONICO | en |
| dc.relation.numberofpages | 22 | en |
| dc.relation.projectAcronym | SoBigData | en |
| dc.relation.projectAwardNumber | 654024 | en |
| dc.relation.projectAwardTitle | SoBigData Research Infrastructure | en |
| dc.relation.projectFunderName | European Commission | en |
| dc.relation.projectFundingStream | Horizon 2020 Framework Programme | en |
| dc.subject.keywordseng | machine-generated text detection, synthetic content detection | - |
| dc.subject.singlekeyword | machine-generated text detection | * |
| dc.subject.singlekeyword | synthetic content detection | * |
| dc.title | Stress-testing machine generated text detection: shifting language models writing style to fool detectors | en |
| dc.type.driver | info:eu-repo/semantics/conferenceObject | - |
| dc.type.full | 04 Contributo in convegno::04.01 Contributo in Atti di convegno | it |
| dc.type.miur | 273 | - |
| iris.mediafilter.data | 2025/09/30 03:37:07 | * |
| iris.orcid.lastModifiedDate | 2026/04/20 15:05:00 | * |
| iris.orcid.lastModifiedMillisecond | 1776690300512 | * |
| iris.scopus.extIssued | 2025 | - |
| iris.scopus.extTitle | Stress-testing Machine Generated Text Detection: Shifting Language Models Writing Style to Fool Detectors | - |
| iris.scopus.ideLinkStatusDate | 2026/04/20 15:05:00 | * |
| iris.scopus.ideLinkStatusMillisecond | 1776690300555 | * |
| iris.sitodocente.maxattempts | 1 | - |
| iris.unpaywall.bestoaversion | publishedVersion | * |
| iris.unpaywall.doi | 10.18653/v1/2025.findings-acl.156 | * |
| iris.unpaywall.isoa | true | * |
| iris.unpaywall.journalisindoaj | false | * |
| iris.unpaywall.landingpage | https://doi.org/10.18653/v1/2025.findings-acl.156 | * |
| iris.unpaywall.license | cc-by | * |
| iris.unpaywall.metadataCallLastModified | 28/04/2026 05:03:35 | - |
| iris.unpaywall.metadataCallLastModifiedMillisecond | 1777345415827 | - |
| iris.unpaywall.oastatus | gold | * |
| iris.unpaywall.pdfurl | https://aclanthology.org/2025.findings-acl.156.pdf | * |
| scopus.authority.anceserie | PROCEEDINGS OF THE CONFERENCE - ASSOCIATION FOR COMPUTATIONAL LINGUISTICS. MEETING###0736-587X | * |
| scopus.category | 1203 | * |
| scopus.category | 3310 | * |
| scopus.category | 1706 | * |
| scopus.contributor.affiliation | Istituto di Scienza e Tecnologie dell'Informazione “A. Faedo” (CNR-ISTI) | - |
| scopus.contributor.affiliation | Istituto di Linguistica Computazionale “Antonio Zampolli” (CNR-ILC) | - |
| scopus.contributor.affiliation | Istituto di Linguistica Computazionale “Antonio Zampolli” (CNR-ILC) | - |
| scopus.contributor.affiliation | Istituto di Linguistica Computazionale “Antonio Zampolli” (CNR-ILC) | - |
| scopus.contributor.affiliation | Istituto di Scienza e Tecnologie dell'Informazione “A. Faedo” (CNR-ISTI) | - |
| scopus.contributor.affiliation | Istituto di Linguistica Computazionale “Antonio Zampolli” (CNR-ILC) | - |
| scopus.contributor.affiliation | Istituto di Scienza e Tecnologie dell'Informazione “A. Faedo” (CNR-ISTI) | - |
| scopus.contributor.afid | 60085207 | - |
| scopus.contributor.afid | 60008941 | - |
| scopus.contributor.afid | 60008941 | - |
| scopus.contributor.afid | 60008941 | - |
| scopus.contributor.afid | 60085207 | - |
| scopus.contributor.afid | 60008941 | - |
| scopus.contributor.afid | 60085207 | - |
| scopus.contributor.auid | 57223141523 | - |
| scopus.contributor.auid | 57991631200 | - |
| scopus.contributor.auid | 59504212000 | - |
| scopus.contributor.auid | 57211678681 | - |
| scopus.contributor.auid | 57220748419 | - |
| scopus.contributor.auid | 57540567000 | - |
| scopus.contributor.auid | 15044356100 | - |
| scopus.contributor.country | Italy | - |
| scopus.contributor.country | Italy | - |
| scopus.contributor.country | Italy | - |
| scopus.contributor.country | Italy | - |
| scopus.contributor.country | Italy | - |
| scopus.contributor.country | Italy | - |
| scopus.contributor.country | Italy | - |
| scopus.contributor.dptid | | - |
| scopus.contributor.dptid | 114087935 | - |
| scopus.contributor.dptid | 114087935 | - |
| scopus.contributor.dptid | 114087935 | - |
| scopus.contributor.dptid | | - |
| scopus.contributor.dptid | 114087935 | - |
| scopus.contributor.dptid | | - |
| scopus.contributor.name | Andrea | - |
| scopus.contributor.name | Michele | - |
| scopus.contributor.name | Cristiano | - |
| scopus.contributor.name | Alessio | - |
| scopus.contributor.name | Giovanni | - |
| scopus.contributor.name | Felice | - |
| scopus.contributor.name | Andrea | - |
| scopus.contributor.subaffiliation | | - |
| scopus.contributor.subaffiliation | ItaliaNLP Lab | - |
| scopus.contributor.subaffiliation | ItaliaNLP Lab | - |
| scopus.contributor.subaffiliation | ItaliaNLP Lab | - |
| scopus.contributor.subaffiliation | | - |
| scopus.contributor.subaffiliation | ItaliaNLP Lab | - |
| scopus.contributor.subaffiliation | | - |
| scopus.contributor.surname | Pedrotti | - |
| scopus.contributor.surname | Papucci | - |
| scopus.contributor.surname | Ciaccio | - |
| scopus.contributor.surname | Miaschi | - |
| scopus.contributor.surname | Puccetti | - |
| scopus.contributor.surname | Dell'Orletta | - |
| scopus.contributor.surname | Esuli | - |
| scopus.date.issued | 2025 | * |
| scopus.description.abstracteng | Recent advancements in Generative AI and Large Language Models (LLMs) have enabled the creation of highly realistic synthetic content, raising concerns about the potential for malicious use, such as misinformation and manipulation. Moreover, detecting Machine-Generated Text (MGT) remains challenging due to the lack of robust benchmarks that assess generalization to real-world scenarios. In this work, we present a pipeline to test the resilience of state-of-the-art MGT detectors (e.g., Mage, Radar, LLM-DetectAIve) to linguistically informed adversarial attacks. To challenge the detectors, we fine-tune language models using Direct Preference Optimization (DPO) to shift the MGT style toward human-written text (HWT). This exploits the detectors' reliance on stylistic clues, making new generations more challenging to detect. Additionally, we analyze the linguistic shifts induced by the alignment and which features are used by detectors to detect MGT texts. Our results show that detectors can be easily fooled with relatively few examples, resulting in a significant drop in detection performance. This highlights the importance of improving detection methods and making them robust to unseen in-domain texts. We release code, models, and data to support future research on more robust MGT detection benchmarks. | * |
| scopus.description.allpeopleoriginal | Pedrotti A.; Papucci M.; Ciaccio C.; Miaschi A.; Puccetti G.; Dell'Orletta F.; Esuli A. | * |
| scopus.differences | scopus.authority.anceserie | * |
| scopus.differences | scopus.publisher.name | * |
| scopus.differences | scopus.relation.conferencedate | * |
| scopus.differences | scopus.description.abstracteng | * |
| scopus.differences | scopus.relation.conferencename | * |
| scopus.differences | scopus.identifier.isbn | * |
| scopus.differences | scopus.relation.conferenceplace | * |
| scopus.document.type | cp | * |
| scopus.document.types | cp | * |
| scopus.funding.funders | 501100021856 - Ministero dell'Università e della Ricerca; 501100021856 - Ministero dell'Università e della Ricerca; 501100000780 - European Commission; 501100000780 - European Commission; 100031478 - NextGenerationEU; 100031478 - NextGenerationEU; | * |
| scopus.funding.ids | CUP B53C22001770006; XAI-CARE-PNRR-MAD-2022-12376692; CUP B53D23013050006; CUP B53C22001760006; PE0000013-FAIR; | * |
| scopus.identifier.doi | 10.18653/v1/2025.findings-acl.156 | * |
| scopus.identifier.isbn | 9798891762565 | * |
| scopus.identifier.pui | 650042695 | * |
| scopus.identifier.scopus | 2-s2.0-105028618911 | * |
| scopus.journal.sourceid | 21101138302 | * |
| scopus.language.iso | eng | * |
| scopus.publisher.name | Association for Computational Linguistics (ACL) | * |
| scopus.relation.conferencedate | 2025 | * |
| scopus.relation.conferencename | 63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025 | * |
| scopus.relation.conferenceplace | aut | * |
| scopus.relation.firstpage | 3010 | * |
| scopus.relation.lastpage | 3031 | * |
| scopus.title | Stress-testing Machine Generated Text Detection: Shifting Language Models Writing Style to Fool Detectors | * |
| scopus.titleeng | Stress-testing Machine Generated Text Detection: Shifting Language Models Writing Style to Fool Detectors | * |
| Appears in collections: | 04.01 Contributo in Atti di convegno |
| File | Size | Format | |
|---|---|---|---|
| Pedrotti et al_ACL Findings-2025.pdf (open access; Description: Stress-testing Machine Generated Text Detection: Shifting Language Models Writing Style to Fool Detectors; Type: publisher's version (PDF); License: other license type) | 798.04 kB | Adobe PDF | View/Open |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.