Providing rich and accurate metadata for indexing media content is a crucial problem for all the companies offering streaming entertainment services. These metadata are commonly employed to enhance search engine results and feed recommendation algorithms to improve the matching with user interests. However, the problem of labeling multimedia content with informative tags is challenging as the labeling procedure, manually performed by domain experts, is time-consuming and prone to error. Recently, the adoption of AI-based methods has been demonstrated to be an effective approach for automating this complex process. However, developing an effective solution requires coping with different challenging issues, such as data noise and the scarcity of labeled examples during the training phase. In this work, we address these challenges by introducing a Transformer-based framework for multi-modal multi-label classification enriched with model prediction explanation capabilities. These explanations can help the domain expert to understand the system's predictions. Experimentation conducted on two real test cases demonstrates its effectiveness.

Movie tag prediction: An extreme multi-label multi-modal transformer-based solution with explanation

Guarascio Massimo;Minici Marco;Pisani Francesco Sergio;
2024

Abstract

Providing rich and accurate metadata for indexing media content is a crucial problem for all the companies offering streaming entertainment services. These metadata are commonly employed to enhance search engine results and feed recommendation algorithms to improve the matching with user interests. However, the problem of labeling multimedia content with informative tags is challenging as the labeling procedure, manually performed by domain experts, is time-consuming and prone to error. Recently, the adoption of AI-based methods has been demonstrated to be an effective approach for automating this complex process. However, developing an effective solution requires coping with different challenging issues, such as data noise and the scarcity of labeled examples during the training phase. In this work, we address these challenges by introducing a Transformer-based framework for multi-modal multi-label classification enriched with model prediction explanation capabilities. These explanations can help the domain expert to understand the system's predictions. Experimentation conducted on two real test cases demonstrates its effectiveness.
2024
Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR
Deep learning
Explainable artificial intelligence
Extreme multi-label classification
Movie genre classification
Multi-modal learning
File in questo prodotto:
File Dimensione Formato  
prod_491521-doc_204999.pdf

solo utenti autorizzati

Descrizione: paper
Tipologia: Versione Editoriale (PDF)
Dimensione 2.3 MB
Formato Adobe PDF
2.3 MB Adobe PDF   Visualizza/Apri   Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14243/452347
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 1
  • ???jsp.display-item.citation.isi??? ND
social impact