In this paper, we introduce a deep learning solution for video activity recognition that leverages an innovative combination of convolutional layers with a linear-complexity attention mechanism. Moreover, we introduce a novel quantization mechanism to further improve the efficiency of our model during both training and inference. Our model maintains a reduced computational cost, while preserving robust learning and generalization capabilities. Our approach addresses the issues related to the high computing requirements of current models, with the goal of achieving competitive accuracy on consumer and edge devices, enabling smart home and smart healthcare applications where efficiency and privacy issues are of concern. We experimentally validate our model on different established and publicly available video activity recognition benchmarks, improving accuracy over alternative models at a competitive computing cost.

CA3D: Convolutional-Attentional 3D nets for efficient video activity recognition on the edge

Lagani G.;Falchi F.;Gennaro C.;Amato G.
2025

Abstract

In this paper, we introduce a deep learning solution for video activity recognition that leverages an innovative combination of convolutional layers with a linear-complexity attention mechanism. Moreover, we introduce a novel quantization mechanism to further improve the efficiency of our model during both training and inference. Our model maintains a reduced computational cost, while preserving robust learning and generalization capabilities. Our approach addresses the issues related to the high computing requirements of current models, with the goal of achieving competitive accuracy on consumer and edge devices, enabling smart home and smart healthcare applications where efficiency and privacy issues are of concern. We experimentally validate our model on different established and publicly available video activity recognition benchmarks, improving accuracy over alternative models at a competitive computing cost.
2025
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
978-3-031-91978-7
Attention
Computer Vision
Convolution
Deep Learning
Neural Networks
Video Activity Recognition
File in questo prodotto:
File Dimensione Formato  
2505.19928v1.pdf

accesso aperto

Descrizione: CA3D: Convolutional-Attentional 3D Nets for Efficient Video Activity Recognition on the Edge
Tipologia: Documento in Pre-print
Licenza: Altro tipo di licenza
Dimensione 1.13 MB
Formato Adobe PDF
1.13 MB Adobe PDF Visualizza/Apri
Lagani-Falchi-Gennaro-Amato_LNCS 2025.pdf

solo utenti autorizzati

Descrizione: CA3D: Convolutional-Attentional 3D Nets for Efficient Video Activity Recognition on the Edge
Tipologia: Versione Editoriale (PDF)
Licenza: NON PUBBLICO - Accesso privato/ristretto
Dimensione 1.25 MB
Formato Adobe PDF
1.25 MB Adobe PDF   Visualizza/Apri   Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14243/552088
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 0
  • ???jsp.display-item.citation.isi??? 0
social impact