
Collaborating Authors: Boukerche, Azzedine


A Survey on Video Analytics in Cloud-Edge-Terminal Collaborative Systems

arXiv.org Artificial Intelligence

The explosive growth of video data has driven the development of distributed video analytics in cloud-edge-terminal collaborative (CETC) systems, enabling efficient video processing, real-time inference, and privacy-preserving analysis. Among their many advantages, CETC systems distribute video processing tasks and enable adaptive analytics across cloud, edge, and terminal devices, leading to breakthroughs in video surveillance, autonomous driving, and smart cities. In this survey, we first analyze fundamental architectural components, including hierarchical, distributed, and hybrid frameworks, alongside edge computing platforms and resource management mechanisms. Building on these foundations, we examine edge-centric approaches, which emphasize on-device processing, edge-assisted offloading, and edge intelligence, and cloud-centric methods, which leverage powerful computational capabilities for complex video understanding and model training. Our investigation also covers hybrid video analytics that incorporate adaptive task offloading and resource-aware scheduling techniques to optimize performance across the entire system. Beyond these conventional approaches, recent advances in large language models and multimodal integration reveal both opportunities and challenges in platform scalability, data protection, and system reliability. Future directions encompass explainable systems, efficient processing mechanisms, and advanced video analytics, offering valuable insights for researchers and practitioners in this dynamic field.
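
The survey discusses adaptive task offloading and resource-aware scheduling only at the architectural level. As an illustration of the underlying idea, the sketch below picks the cheapest tier (terminal, edge, or cloud) that meets a task's latency budget; all class names, capacities, and thresholds are hypothetical and invented for this example, not taken from the paper.

```python
from dataclasses import dataclass

# Hypothetical sketch of resource-aware task placement in a
# cloud-edge-terminal (CETC) pipeline. All names, capacities, and
# numbers below are invented for illustration, not from the survey.

@dataclass
class Task:
    flops: float          # compute demand of the analytics task (GFLOPs)
    input_mb: float       # size of the video chunk to transfer (MB)
    deadline_ms: float    # latency budget for real-time inference

@dataclass
class Tier:
    name: str
    gflops_per_s: float   # available compute at this tier
    uplink_mbps: float    # bandwidth to reach this tier (0 = on-device)

def estimated_latency_ms(task: Task, tier: Tier) -> float:
    """Transfer time plus compute time, in milliseconds."""
    transfer_s = 0.0 if tier.uplink_mbps == 0 else (task.input_mb * 8) / tier.uplink_mbps
    compute_s = task.flops / tier.gflops_per_s
    return (transfer_s + compute_s) * 1000

def place(task: Task, tiers: list[Tier]) -> Tier:
    """Pick the cheapest tier meeting the deadline; fall back to the fastest."""
    feasible = [t for t in tiers if estimated_latency_ms(task, t) <= task.deadline_ms]
    pool = feasible or tiers
    return min(pool, key=lambda t: estimated_latency_ms(task, t))

if __name__ == "__main__":
    tiers = [
        Tier("terminal", gflops_per_s=50, uplink_mbps=0),
        Tier("edge", gflops_per_s=500, uplink_mbps=100),
        Tier("cloud", gflops_per_s=5000, uplink_mbps=20),
    ]
    task = Task(flops=100, input_mb=2, deadline_ms=500)
    chosen = place(task, tiers)
    print(f"offload to: {chosen.name} "
          f"({estimated_latency_ms(task, chosen):.0f} ms estimated)")
```

With these assumed numbers the task lands on the edge tier: the terminal lacks compute and the cloud path is bandwidth-bound. Real CETC schedulers additionally account for energy budgets, queueing delay, and accuracy-latency trade-offs.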


Networking Systems for Video Anomaly Detection: A Tutorial and Survey

arXiv.org Artificial Intelligence

With the widespread use of surveillance cameras in smart cities [104] and the boom of online video applications powered by 4G/5G communication technologies, traditional human inspection can no longer accurately monitor the video data generated around the clock: it is time-consuming and labor-intensive, and it poses the risk of leaking important information (e.g., biometrics and sensitive speech). In contrast, IoVT applications empowered by Video Anomaly Detection (VAD) [54], such as Intelligent Video Surveillance Systems (IVSS) and automated content analysis platforms, can process massive video streams online and detect events of interest in real time, forwarding only the noteworthy anomalous segments for human review. This significantly reduces data storage and communication costs and helps ease public concerns about data security and privacy. As a result, VAD has gained widespread attention in academia and industry over the last decade and has been applied in emerging fields such as information forensics [154] and industrial manufacturing [71] in smart cities, as well as online content analysis in mobile video applications [153].

VAD extends the data scope of conventional Anomaly Detection (AD) from time series, images, and graphs to video, which requires coping not only with the endogenous complexity of the data but also with the computational and communication costs on resource-limited devices [55]. Specifically, the inherently high-dimensional structure of video data, its high information density and redundancy, the heterogeneity of temporal and spatial patterns, and the feature entanglement between foreground targets and background scenes make VAD more challenging than traditional AD tasks at the levels of both representation learning and anomaly discrimination [89]. Existing studies [4, 60, 69, 76] have shown that high-performance VAD models must explicitly model appearance and motion information, i.e., the differences between regular events and anomalous examples in both the spatial and temporal dimensions. In contrast to time-series AD, which mainly measures periodic temporal patterns of variables, and image AD, which focuses only on spatial contextual deviations, VAD needs to extract discriminative spatial and temporal features from a large amount of redundant information (e.g., repetitive temporal contexts and label-independent data distributions) and to learn the differences between normal and anomalous events in terms of their local appearances and global motions [100]. However, video anomalies are ambiguous and subjective [48].
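
As a concrete illustration of the appearance-plus-motion principle described above, the toy sketch below scores each frame by how far its spatial content and its temporal change deviate from statistics fitted on normal video. It is a deliberately simplified stand-in, not the paper's method: modern VAD models replace these hand-crafted statistics with learned spatiotemporal representations, and every weight and the synthetic data here are assumptions made for illustration.

```python
import numpy as np

# Toy illustration (not the paper's method) of appearance-plus-motion
# anomaly scoring: z-score each frame's spatial deviation from a normal
# template and its temporal-gradient magnitude, then combine the two.
# All shapes, weights, and the synthetic data are assumptions.

def fit_normal_stats(frames: np.ndarray) -> dict:
    """frames: (T, H, W) grayscale video containing only normal events."""
    template = frames.mean(axis=0)                           # spatial template
    app = np.abs(frames - template).mean(axis=(1, 2))        # appearance deviation
    mot = np.abs(np.diff(frames, axis=0)).mean(axis=(1, 2))  # motion magnitude
    return {
        "template": template,
        "app_mu": app.mean(), "app_sd": app.std() + 1e-8,
        "mot_mu": mot.mean(), "mot_sd": mot.std() + 1e-8,
    }

def anomaly_scores(frames: np.ndarray, s: dict, w: float = 0.5) -> np.ndarray:
    """Per-frame score: weighted z-scores of appearance and motion deviation."""
    app = np.abs(frames - s["template"]).mean(axis=(1, 2))
    mot = np.abs(np.diff(frames, axis=0, prepend=frames[:1])).mean(axis=(1, 2))
    return (w * (app - s["app_mu"]) / s["app_sd"]
            + (1 - w) * (mot - s["mot_mu"]) / s["mot_sd"])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    normal = rng.normal(0.5, 0.05, size=(200, 32, 32))  # calm synthetic scene
    test = normal[:50].copy()
    test[25:30] += 0.8                                  # inject a sudden bright object
    stats = fit_normal_stats(normal)
    scores = anomaly_scores(test, stats)
    print("peak score at frame", int(scores.argmax()))  # onset of anomaly (~frame 25)
```

The score peaks at the onset of the injected anomaly, where both the appearance deviation (a bright foreground object) and the motion deviation (an abrupt temporal change) are large, mirroring the spatial-plus-temporal discrimination the abstract attributes to high-performance VAD models.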