AIMH at SemEval-2021 - Task 6: multimodal classification using an ensemble of transformer models

Messina N;Falchi F;Gennaro C;Amato G
2021

Abstract

This paper describes the system the AIMH Team used for SemEval-2021 Task 6. We propose a transformer-based architecture that processes the multimodal content (text and images) of memes. The architecture, called DVTT (Double Visual Textual Transformer), treats Subtasks 1 and 3 of Task 6 as multi-label classification problems: it processes the text and/or image of a meme and returns the probability that each possible persuasion technique is present. DVTT uses two complete transformer networks that operate on text and images and are mutually conditioned. One of the two modalities acts as the main one while the other enriches it, which yields two distinct modes of operation. The outputs of the two transformers are merged by averaging the inferred probabilities for each label, and the overall network is trained end-to-end with a binary cross-entropy loss.
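
The fusion step described in the abstract (averaging per-label probabilities from two branches, then scoring them with binary cross-entropy) can be illustrated with a minimal sketch. This is not the authors' implementation: the logit values, the function names, the 0.5 decision threshold, and the choice to apply the loss to the averaged probabilities are all assumptions made for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw score (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def average_probabilities(logits_a, logits_b):
    """Late fusion: per-label sigmoid on each branch, then arithmetic mean."""
    return [(sigmoid(a) + sigmoid(b)) / 2.0 for a, b in zip(logits_a, logits_b)]


def binary_cross_entropy(probs, targets, eps=1e-12):
    """Mean binary cross-entropy over labels (multi-label setting)."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(probs)


# Hypothetical per-label logits from the two branches (values are made up):
# one branch with text as the main modality, one with the image as the main modality.
text_main_logits = [2.0, -1.0, 0.0]
image_main_logits = [0.0, -3.0, 0.0]
targets = [1, 0, 0]  # hypothetical ground truth: only the first technique is present

probs = average_probabilities(text_main_logits, image_main_logits)
loss = binary_cross_entropy(probs, targets)

# Assumed decision rule: a label is predicted when its averaged probability is >= 0.5.
predicted = [i for i, p in enumerate(probs) if p >= 0.5]
```

Because each label gets its own independent probability, several persuasion techniques can be predicted for the same meme, which is what distinguishes the multi-label setting from ordinary single-label classification.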
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
English
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)
SemEval-2021 - 15th International Workshop on Semantic Evaluation
1020
1026
978-1-954085-70-1
https://aclanthology.org/2021.semeval-1.140
Yes, but type not specified
5-6 August 2021
Bangkok, Thailand
Deep learning
Social network
Persuasion detection
Computer vision
NLP
Multi-modal
4
open
Messina N.; Falchi F.; Gennaro C.; Amato G.
273
info:eu-repo/semantics/conferenceObject
04 Conference contribution::04.01 Contribution in conference proceedings
   A European AI On Demand Platform and Ecosystem
   AI4EU
   H2020
   825619

   A European Excellence Centre for Media, Society and Democracy
   AI4Media
   H2020
   951911
Files in this item:
File Size Format
prod_457536-doc_177562.pdf

open access

Description: Postprint - AIMH at SemEval-2021 - Task 6: multimodal classification using an ensemble of transformer models
Type: Publisher's version (PDF)
Size 665.48 kB
Format Adobe PDF
prod_457536-doc_177595.pdf

open access

Description: AIMH at SemEval-2021 - Task 6: multimodal classification using an ensemble of transformer models
Type: Publisher's version (PDF)
Size 725.98 kB
Format Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/395773
Citations
  • PMC N/A
  • Scopus 7
  • ISI 5