CNR Institutional Research Information System

From a few-shot learning perspective, we propose a strategy to enrich the latent semantic of the text provided in the dataset provided for the Profiling Cryptocurrency Influencers with Few-shot Learning, the task hosted at PAN@CLEF2023. Our approach is based on data augmentation using the backtranslation forth and back to and from Japanese language. We translate samples in the original training dataset to a target language (i.e. Japanese). Then we translate it back to English. The original sample and the backtranslated one are then merged. Then we fine-tuned two state-of-the-art Transformer models on this augmented version of the training dataset. We evaluate the performance of the two fine-tuned models using the Macro and Micro F1 accordingly to the official metric used for the task. After the fine-tuning phase, ELECTRA and XLNet obtained a Macro F1 of 0.7694 and 0.7872 respectively on the original training set. Our best submission obtained a Macro F1 equal to 0.3851 on the official test set provided.

Text Enrichment with Japanese Language to Profile Cryptocurrency Influencers

F Lomonaco;M Siino;M Tesconi

2023

Abstract

From a few-shot learning perspective, we propose a strategy to enrich the latent semantic of the text provided in the dataset provided for the Profiling Cryptocurrency Influencers with Few-shot Learning, the task hosted at PAN@CLEF2023. Our approach is based on data augmentation using the backtranslation forth and back to and from Japanese language. We translate samples in the original training dataset to a target language (i.e. Japanese). Then we translate it back to English. The original sample and the backtranslated one are then merged. Then we fine-tuned two state-of-the-art Transformer models on this augmented version of the training dataset. We evaluate the performance of the two fine-tuned models using the Macro and Micro F1 accordingly to the official metric used for the task. After the fine-tuning phase, ELECTRA and XLNet obtained a Macro F1 of 0.7694 and 0.7872 respectively on the original training set. Our best submission obtained a Macro F1 equal to 0.3851 on the official test set provided.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				2023
			
	Strutture organizzative
	
				Istituto di informatica e telematica - IIT
			
	Parole chiave
	
				cryptocurrency influencers
data augmentation
author profiling
text classification
Twitter
text enrichment
			
	Appare nelle tipologie:
	
				04.01 Contributo in Atti di convegno

File in questo prodotto:

File	Dimensione	Formato
prod_490140-doc_204181.pdf accesso aperto Descrizione: Text Enrichment with Japanese Language to Profile Cryptocurrency Influencers Tipologia: Versione Editoriale (PDF) Licenza: Creative commons Dimensione 1.04 MB Formato Adobe PDF Visualizza/Apri	1.04 MB	Adobe PDF	Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14243/452010

Citazioni

ND

ND

ND

social impact