Camera Movement Classification in Historical Footage: A Comparative Study of Deep Video Models

Lin, Tingyu, Dadras, Armin, Kleber, Florian, Sablatnig, Robert

arXiv.org Artificial Intelligence

Camera movement conveys spatial and narrative information essential for understanding video content. While recent camera movement classification (CMC) methods perform well on modern datasets, their generalization to historical footage remains unexplored. This paper presents the first systematic evaluation of deep video CMC models on archival film material. We summarize representative methods and datasets, highlighting differences in model design and label definitions. Five standard video classification models are assessed on the HISTORIAN dataset, which includes expert-annotated World War II footage. The best-performing model, Video Swin Transformer, achieves 80.25% accuracy, showing strong convergence despite limited training data. Our findings highlight the challenges and potential of adapting existing models to low-quality video and motivate future work combining diverse input modalities and temporal architectures.



Enhancing Video-Based Robot Failure Detection Using Task Knowledge

Thoduka, Santosh, Houben, Sebastian, Gall, Juergen, Plöger, Paul G.

arXiv.org Artificial Intelligence

Robust robotic task execution hinges on the reliable detection of execution failures in order to trigger safe operation modes, recovery strategies, or task replanning. However, many failure detection methods struggle to provide meaningful performance when applied to a variety of real-world scenarios. In this paper, we propose a video-based failure detection approach that uses spatio-temporal knowledge in the form of the actions the robot performs and task-relevant objects within the field of view. Both pieces of information are available in most robotic scenarios and can thus be readily obtained. We demonstrate the effectiveness of our approach on three datasets, which we amend in part with additional annotations of the aforementioned task-relevant knowledge. In light of the results, we also propose a data augmentation method that improves performance by applying variable frame rates to different parts of the video. We observe an improvement from 77.9 to 80.0 in F1 score on the ARMBench dataset without additional computational expense, and a further increase to 81.4 with test-time augmentation. The results emphasize the importance of spatio-temporal information during failure detection and suggest further investigation of suitable heuristics in future implementations. Code and annotations are available.
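The augmentation the abstract describes, applying variable frame rates to different parts of a video, can be sketched as temporal resampling per segment. This is a minimal illustration, not the authors' implementation; the function name, segment boundaries, and rate factors are assumptions:

```python
import numpy as np

def variable_frame_rate(video, boundaries, rates, out_len=None):
    """Resample different temporal parts of a video at different rates.

    video: (T, H, W, C) array of frames.
    boundaries: indices splitting the video into segments.
    rates: speed factor per segment (>1 = faster playback, fewer frames).
    out_len: optionally resample the result back to a fixed frame count.
    """
    segments = np.split(video, boundaries)
    resampled = []
    for seg, r in zip(segments, rates):
        n = max(1, int(round(len(seg) / r)))
        # Pick n frame indices spread evenly over the segment.
        idx = np.linspace(0, len(seg) - 1, n).round().astype(int)
        resampled.append(seg[idx])
    out = np.concatenate(resampled)
    if out_len is not None:
        idx = np.linspace(0, len(out) - 1, out_len).round().astype(int)
        out = out[idx]
    return out

# Example: speed up the first half of a 16-frame clip by 2x,
# leave the second half untouched, then pad back to 16 frames.
clip = np.zeros((16, 32, 32, 3))
augmented = variable_frame_rate(clip, boundaries=[8], rates=[2.0, 1.0], out_len=16)
```

In training, the boundaries would plausibly align with the annotated robot actions, so that some actions appear sped up relative to others.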


Adversarial Attacks on Black Box Video Classifiers: Leveraging the Power of Geometric Transformations

Neural Information Processing Systems

Compared to image classification models, black-box adversarial attacks against video classification models remain largely understudied. This is likely because, with video, the temporal dimension poses significant additional challenges for gradient estimation. Query-efficient black-box attacks rely on effectively estimated gradients to maximize the probability of misclassifying the target video. In this work, we demonstrate that such effective gradients can be searched for by parameterizing the temporal structure of the search space with geometric transformations. GEO-TRAP employs standard geometric transformation operations to reduce the search for effective gradients to a search over the small group of parameters that define these operations.
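The core idea, estimating gradients over a handful of transformation parameters rather than over every pixel of every frame, can be sketched with NES-style finite differences. This is a toy illustration under stated assumptions, not the GEO-TRAP algorithm: the base pattern, the two-parameter per-frame shift, and all function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_perturbation(theta, shape):
    """Build a (T, H, W) perturbation whose temporal structure is
    controlled by just two parameters theta = (dx, dy): a fixed base
    pattern shifted progressively across frames (hypothetical choice)."""
    T, H, W = shape
    base = np.sin(np.linspace(0, 2 * np.pi, H))[:, None] * np.ones((H, W))
    dx, dy = theta
    out = np.empty(shape)
    for t in range(T):
        out[t] = np.roll(np.roll(base, int(round(dx * t)), axis=0),
                         int(round(dy * t)), axis=1)
    return out

def estimate_gradient(loss_fn, theta, shape, sigma=0.1, n=10):
    """Antithetic finite-difference gradient estimate over the
    low-dimensional parameters; loss_fn is the black-box query."""
    grad = np.zeros_like(theta)
    for _ in range(n):
        u = rng.standard_normal(theta.shape)
        lp = loss_fn(make_perturbation(theta + sigma * u, shape))
        lm = loss_fn(make_perturbation(theta - sigma * u, shape))
        grad += (lp - lm) / (2 * sigma) * u
    return grad / n
```

The point of the reduction: each query to the victim model still evaluates a full video-sized perturbation, but the search happens in a 2-dimensional parameter space instead of a T x H x W one, which is what makes the attack query-efficient.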


Towards Gradient-based Time-Series Explanations through a SpatioTemporal Attention Network

Lee, Min Hun

arXiv.org Artificial Intelligence

However, it is not desirable to apply AI fully autonomously, as wrong outcomes of AI models in high-stakes domains could have serious impacts on people. Regardless of the performance of an AI model, end-users desire to understand the evidence behind its outcomes [35]. A growing body of research investigates how to generate explanations of an AI model and augment users' decision-making tasks [2, 18, 25]. Researchers have explored various techniques to make AI interpretable and explainable [15]. These explainable AI techniques can be broadly categorized into inherently interpretable models (e.g.


MASTAF: A Model-Agnostic Spatio-Temporal Attention Fusion Network for Few-shot Video Classification

Liu, Rex, Zhang, Huanle, Pirsiavash, Hamed, Liu, Xin

arXiv.org Artificial Intelligence

We propose MASTAF, a Model-Agnostic Spatio-Temporal Attention Fusion network for few-shot video classification. MASTAF takes as input a general video spatial and temporal representation, e.g., from a 2D CNN, 3D CNN, or Video Transformer. Then, to make the most of such representations, we use self- and cross-attention models to highlight the critical spatio-temporal regions, increasing inter-class variation and decreasing intra-class variation. Last, MASTAF applies a lightweight fusion network and a nearest neighbor classifier to classify each query video. We demonstrate that MASTAF improves the state-of-the-art performance on three few-shot video classification benchmarks (UCF101, HMDB51, and Something-Something-V2), achieving 91.6%, 69.5%, and 60.7% for five-way one-shot video classification, respectively.
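The pipeline the abstract outlines, attention over spatio-temporal features followed by nearest-neighbor classification against class prototypes, can be sketched in a few lines of NumPy. This is a simplified sketch, not the MASTAF architecture: it keeps only self-attention pooling and cosine-similarity matching, and all names are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attend(feats):
    """Pool per-video features (T, D) into one embedding (D,) via
    scaled dot-product self-attention followed by mean pooling."""
    attn = softmax(feats @ feats.T / np.sqrt(feats.shape[1]))
    return (attn @ feats).mean(axis=0)

def nn_classify(query, support, labels):
    """Classify a query video by cosine similarity of its pooled
    embedding to the mean embedding (prototype) of each support class."""
    q = self_attend(query)
    protos = {}
    for f, y in zip(support, labels):
        protos.setdefault(y, []).append(self_attend(f))
    best, best_sim = None, -np.inf
    for y, vs in protos.items():
        p = np.mean(vs, axis=0)
        sim = q @ p / (np.linalg.norm(q) * np.linalg.norm(p) + 1e-8)
        if sim > best_sim:
            best, best_sim = y, sim
    return best
```

A full implementation would add cross-attention between query and support features and a learned fusion network before the nearest-neighbor step; the sketch only shows why no per-episode classifier training is needed.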