Investigating topic-agnostic features for authorship tasks in Spanish political speeches

Moreo A
2022

Abstract

Authorship Identification is the branch of authorship analysis concerned with uncovering the author of a written document. Methods devised for Authorship Identification typically rely on stylometry (the analysis of the unconscious traits that authors exhibit while writing) and are expected not to base their inferences on the topics the authors usually write about (as reflected in their past production). In this paper, we present a series of experiments evaluating feature sets based on rhythmic and psycholinguistic patterns for Authorship Verification and Authorship Attribution in Spanish political language, applying different text-distortion approaches to actively mask the underlying topic. We feed these feature sets to an SVM learner and show that they yield results comparable to those obtained by the BETO transformer when the latter is trained on the original (undistorted) text, i.e., when it can potentially learn from topical information.
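The abstract does not say which text-distortion methods are used. Purely as an illustration of the general idea, the sketch below implements one widely cited scheme, the "multiple asterisks" variant of Stamatatos's text distortion (DV-MA), which keeps a list of frequent words and masks every other word character by character. The effect is that topical content words disappear while function words, punctuation and word lengths survive. The keep-list, the function names `distort_dv_ma` and `char_ngrams`, and the use of character n-gram counts as SVM input are hypothetical choices for this sketch, not the paper's documented setup.

```python
import re
from collections import Counter

# Hypothetical keep-list: in practice this would be the k most frequent
# words of a reference corpus (mostly Spanish function words).
KEEP_WORDS = {"el", "la", "los", "las", "de", "que", "y", "en",
              "a", "un", "una", "por", "con", "no", "se"}


def distort_dv_ma(text, keep_words=KEEP_WORDS):
    """Sketch of DV-MA text distortion (after Stamatatos, 2017).

    Every word not in keep_words has each letter replaced by '*' and each
    digit replaced by '#'. Kept words, punctuation and spacing are left
    unchanged, so word lengths and punctuation patterns survive while
    topic-bearing words are masked.
    """
    def mask(match):
        word = match.group(0)
        if word.lower() in keep_words:
            return word
        word = re.sub(r"[^\W\d_]", "*", word)  # letters -> '*'
        return re.sub(r"\d", "#", word)        # digits  -> '#'

    return re.sub(r"\w+", mask, text)


def char_ngrams(text, n=3):
    """Count overlapping character n-grams (one possible SVM input)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


if __name__ == "__main__":
    sentence = "El gobierno de España aprobó la ley en 2022."
    masked = distort_dv_ma(sentence)
    print(masked)
    print(char_ngrams(masked).most_common(3))
```

On the sample sentence, content words such as *gobierno* and *ley* become runs of asterisks of the same length, the year becomes `####`, and *El*, *de*, *la* and *en* are kept. A classifier trained on such masked text (or on features derived from it) therefore has much less topical vocabulary to exploit.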
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
ISBN: 978-3-031-08473-7
Keywords: Authorship identification; Text distortion; Political speech

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/443591
Citations: PMC: n/a; Scopus: 0; ISI: 1