
Cost-Effective Video Summarization Using Deep CNN With Hierarchical Weighted Fusion for IoT Surveillance Networks

Sannino, Giovanna
2019

Abstract

Video summarization (VS) has recently attracted intense attention due to its numerous applications in computer vision domains such as video retrieval, indexing, and browsing. Traditional VS research mostly targets the effectiveness of VS algorithms, introducing high-quality features and clustering methods for selecting representative visual elements. With the increasing density of vision sensor networks, there is a tradeoff between the processing time of VS methods and the representative quality of the generated summaries. Generating a video summary of significant importance while meeting the resource constraints of Internet of Things (IoT) surveillance networks is a challenging task. This article addresses this problem by proposing a computationally efficient deep CNN framework with hierarchical weighted fusion for summarizing surveillance videos captured in IoT settings. In the first stage, our framework extracts discriminative, rich features from deep CNNs for shot segmentation. Second, we employ image memorability predicted by a fine-tuned CNN model, along with aesthetic and entropy features, to maintain the interestingness and diversity of the summary. Third, a hierarchical weighted fusion mechanism aggregates the extracted features into a single score. Finally, an attention curve constructed from the aggregated scores is used to select salient keyframes for the final video summary. Experiments on benchmark datasets validate the effectiveness of our framework, which outperforms state-of-the-art schemes.
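The scoring pipeline outlined in the abstract (per-frame entropy, memorability, and aesthetic scores; hierarchical weighted fusion into an aggregated score; keyframe selection from an attention curve) can be illustrated with a minimal Python sketch. The fusion weights, smoothing window, and the random stand-ins for the CNN memorability and aesthetic predictors below are illustrative assumptions, not values or models from the paper; frame_entropy, fuse_scores, attention_curve, and select_keyframes are hypothetical helper names.

# Minimal sketch of the scoring-and-selection pipeline described in the
# abstract. Weights, window size, and the stand-in memorability/aesthetic
# scores are illustrative assumptions, not the paper's actual values.
import numpy as np

def frame_entropy(gray_frame, bins=256):
    # Shannon entropy of the frame's intensity histogram, in bits.
    hist, _ = np.histogram(gray_frame, bins=bins, range=(0, 256))
    p = hist[hist > 0] / hist.sum()
    return float(-np.sum(p * np.log2(p)))

def fuse_scores(memorability, aesthetics, entropy,
                w_mem=0.5, w_aes=0.3, w_ent=0.2):
    # Hierarchical weighted fusion: first blend the two interestingness
    # cues, then blend the result with the diversity cue (entropy).
    # All inputs are 1-D per-frame arrays normalized to [0, 1].
    interest = (w_mem * memorability + w_aes * aesthetics) / (w_mem + w_aes)
    return (1.0 - w_ent) * interest + w_ent * entropy

def attention_curve(scores, window=9):
    # Moving-average smoothing turns noisy per-frame scores into a curve.
    kernel = np.ones(window) / window
    return np.convolve(scores, kernel, mode="same")

def select_keyframes(curve, num_keyframes=5):
    # Local maxima of the attention curve, ranked by height.
    mid = curve[1:-1]
    peaks = np.where((mid > curve[:-2]) & (mid >= curve[2:]))[0] + 1
    ranked = peaks[np.argsort(curve[peaks])[::-1]]
    return np.sort(ranked[:num_keyframes])

# Demo on synthetic frames (random scores stand in for CNN predictions).
rng = np.random.default_rng(0)
n_frames = 300
memorability = rng.random(n_frames)   # stand-in for fine-tuned CNN output
aesthetics = rng.random(n_frames)     # stand-in for an aesthetic predictor
entropy = np.array([frame_entropy(rng.integers(0, 256, (64, 64)))
                    for _ in range(n_frames)]) / 8.0  # log2(256) = 8 bits max
curve = attention_curve(fuse_scores(memorability, aesthetics, entropy))
print("keyframe indices:", select_keyframes(curve))

Keeping the fusion as two small weighted sums mirrors the hierarchical structure described in the abstract and keeps the per-frame cost to a few arithmetic operations, which matters for the resource-constrained IoT deployments the paper targets.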
Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR
Keywords

artificial intelligence; biological system modeling; computer vision; data science; energy efficiency; entropy; feature extraction; fusion; Internet of Things (IoT); resource-constrained devices; streaming media; surveillance; task analysis; video analysis; video summarization (VS)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this record: https://hdl.handle.net/20.500.14243/381061
Citations
  • PMC: not available
  • Scopus: 87
  • Web of Science (ISI): 80