Synthetic-to-Real Pose Estimation with Geometric Reconstruction — Qiuxia Lin, Kerui Gu, Linlin Yang, Angela Yao

Neural Information Processing Systems

The warping estimation module W is an hourglass with five conv3×3-bn-relu-pool2×2 blocks in the encoder and five upsample2×2-conv3×3-bn-relu blocks in the decoder. For G, we use the Johnson architecture [3] with two down-sampling blocks, six residual blocks, and two up-sampling blocks; the design follows [7]. The inputs are the base image, the displacement field, and the inpainting map. G downsamples by 4× and then upsamples by 4× to produce the output, i.e., the reconstructed image.
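The hourglass structure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the channel widths, the nearest-neighbor upsampling mode, and the two-channel displacement-field output are assumptions filled in for the sketch.

```python
import torch
import torch.nn as nn

def enc_block(c_in, c_out):
    # One encoder stage: conv3x3 - bn - relu - pool2x2.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )

def dec_block(c_in, c_out):
    # One decoder stage: upsample2x2 - conv3x3 - bn - relu.
    return nn.Sequential(
        nn.Upsample(scale_factor=2, mode="nearest"),
        nn.Conv2d(c_in, c_out, 3, padding=1),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )

class WarpingHourglass(nn.Module):
    """Sketch of W: five encoder blocks, five decoder blocks."""

    def __init__(self, in_ch=3, out_ch=2):  # out_ch=2: assumed (dx, dy) field
        super().__init__()
        widths = [64, 128, 256, 256, 256]   # assumed channel widths
        self.encoder = nn.Sequential(*[
            enc_block(ci, co)
            for ci, co in zip([in_ch] + widths[:-1], widths)
        ])
        rev = widths[::-1]
        self.decoder = nn.Sequential(*[
            dec_block(ci, co)
            for ci, co in zip(rev, rev[1:] + [64])
        ])
        self.head = nn.Conv2d(64, out_ch, 3, padding=1)

    def forward(self, x):
        return self.head(self.decoder(self.encoder(x)))

x = torch.randn(1, 3, 256, 256)
y = WarpingHourglass()(x)
print(tuple(y.shape))  # (1, 2, 256, 256): full-resolution displacement field
```

Five pool/upsample stages change resolution by 2^5 in each direction, so the decoder restores the input resolution exactly.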


Synthetic-to-Real Pose Estimation with Geometric Reconstruction — Qiuxia Lin, Kerui Gu, Linlin Yang, Angela Yao

Neural Information Processing Systems

Pose estimation is remarkably successful under supervised learning, but obtaining annotations, especially for new deployments, is costly and time-consuming. This work tackles adapting models trained on synthetic data to real-world target domains with only unlabelled data. A common approach is model fine-tuning with pseudo-labels from the target domain, yet many pseudo-labelling strategies cannot provide sufficient high-quality pose labels. This work proposes a reconstruction-based strategy as a complement to pseudo-labelling for synthetic-to-real domain adaptation. We generate the driving image by geometrically transforming a base image according to the predicted keypoints and enforce a reconstruction loss to refine the predictions. This provides a novel solution for effectively correcting confident yet inaccurate keypoint locations through image reconstruction in domain adaptation. Our approach outperforms the previous state of the art by 8% in PCK on four large-scale hand and human real-world datasets. In particular, we excel on endpoints such as fingertips and the head, with 7.2% and 29.9% improvements in PCK.
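The reconstruction objective described above can be illustrated with a small sketch: backward-warp a base image with a displacement field and penalize the difference to the target image. This is a hedged illustration only; in the paper the displacement field comes from a learned warping module driven by the predicted keypoints, whereas here the field is simply given as an input tensor.

```python
import torch
import torch.nn.functional as F

def warp(image, flow):
    """Backward-warp `image` (N,C,H,W) by a pixel-space flow (N,2,H,W)."""
    n, _, h, w = image.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    # Absolute sampling positions = identity grid + flow, in pixel units.
    grid = torch.stack((xs, ys), dim=0).float().unsqueeze(0) + flow
    # Normalize to [-1, 1] as expected by grid_sample (x, y order).
    gx = 2 * grid[:, 0] / (w - 1) - 1
    gy = 2 * grid[:, 1] / (h - 1) - 1
    return F.grid_sample(image, torch.stack((gx, gy), dim=-1),
                         align_corners=True)

def reconstruction_loss(base_img, target_img, flow):
    # L1 photometric loss between the warped base image and the target.
    return F.l1_loss(warp(base_img, flow), target_img)

# Sanity check: zero flow is the identity warp, so the loss vanishes
# when the target equals the base image.
base = torch.rand(1, 3, 64, 64)
loss = reconstruction_loss(base, base, torch.zeros(1, 2, 64, 64))
print(loss.item())
```

Minimizing such a loss with respect to the predicted keypoints (through the learned warping module) is what pushes confident but inaccurate keypoints toward locations that explain the target image.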






Blink Video Doorbell (2nd Gen) review: Impressive features, great price

PCWorld

When you purchase through links in our articles, we may earn a small commission. Amazon's entry-level video doorbell delivers essential features at a bargain price, though local storage options are limited (the included Sync Module Core doesn't support USB storage). The Blink Video Doorbell (2nd Gen) delivers clear video, wide coverage, reliable alerts, and long battery life at a remarkably low price. If you don't need advanced features like ultra-sharp resolution or full-duplex audio, this doorbell is a true bargain. Blink is Amazon's budget line of smart home products.


What video doorbells see (and what they don't): Here's what you can expect

PCWorld

Don't assume these gadgets will capture everything that happens on your porch; understand these critical specs and you'll avoid a disappointing purchase. With a camera at the front door and an app on their phone, many buyers jump to the conclusion that they'll capture faces on the sidewalk, license plates at the curb, and anybody cutting across the lawn. Most doorbell cameras deliver far more modest real-world performance. They have a tight field of view that sees what's directly in front of the lens; they're built to frame a visitor's face standing in front of the door, not the entire space around the door.


EgoSim: An Egocentric Multi-view Simulator and Real Dataset for Body-worn Cameras during Motion and Activity

Neural Information Processing Systems

Research on egocentric tasks in computer vision has mostly focused on head-mounted cameras, such as fisheye cameras or embedded cameras inside immersive headsets. We argue that the increasing miniaturization of optical sensors will lead to the prolific integration of cameras into many more body-worn devices at various locations. This will bring fresh perspectives to established tasks in computer vision and benefit key areas such as human motion tracking, body pose estimation, or action recognition, particularly for the lower body, which is typically occluded. In this paper, we introduce EgoSim, a novel simulator of body-worn cameras that generates realistic egocentric renderings from multiple perspectives across a wearer's body. A key feature of EgoSim is its use of real motion capture data to render motion artifacts, which are especially noticeable with arm- or leg-worn cameras. In addition, we introduce MultiEgoView, a dataset of egocentric footage from six body-worn cameras and ground-truth full-body 3D poses during several activities: 119 hours of data are derived from AMASS motion sequences in four high-fidelity virtual environments, which we augment with 5 hours of real-world motion data from 13 participants using six GoPro cameras and 3D body pose references from an Xsens motion capture suit. We demonstrate EgoSim's effectiveness by training an end-to-end video-only 3D pose estimation network. Analyzing its domain gap, we show that our dataset and simulator substantially aid training for inference on real-world data. EgoSim code & MultiEgoView dataset: https://siplab.org/projects/EgoSim