AITopics | temporal consistency

Collaborating Authors

temporal consistency

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

One-Step Diffusion for Detail-Rich and Temporally Consistent Video Super-Resolution

Neural Information Processing SystemsJun-23-2026, 04:08:52 GMT

It is a challenging problem to reproduce rich spatial details while maintaining temporal consistency in real-world video super-resolution (Real-VSR), especially when we leverage pre-trained generative models such as stable diffusion (SD) for realistic details synthesis.

artificial intelligence, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Add feedback

Instant Video Models: Universal Adapters for Stabilizing Image-Based Networks

Neural Information Processing SystemsJun-20-2026, 22:22:12 GMT

When applied sequentially to video, frame-based networks often exhibit temporal inconsistency--for example, outputs that flicker between frames. This problem is amplified when the network inputs contain time-varying corruptions. In this work, we introduce a general approach for adapting frame-based models for stable and robust inference on video. We describe a class of stability adapters that can be inserted into virtually any architecture and a resource-efficient training process that can be performed with a frozen base network. We introduce a unified conceptual framework for describing temporal stability and corruption robustness, centered on a proposed accuracy-stability-robustness loss. By analyzing the theoretical properties of this loss, we identify the conditions where it produces well-behaved stabilizer training.

machine learning, natural language, stabilizer, (18 more...)

Neural Information Processing Systems

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry: Information Technology (0.93)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Add feedback

PPMStereo: Pick-and-Play Memory Construction for Consistent Dynamic Stereo Matching

Neural Information Processing SystemsJun-19-2026, 13:12:44 GMT

Temporally consistent depth estimation from stereo video is critical for real-world applications such as augmented reality, where inconsistent depth estimation disrupts the immersion of users. Despite its importance, this task remains challenging due to the difficulty in modeling long-term temporal consistency in a computationally efficient manner. Previous methods attempt to address this by aggregating spatio-temporal information but face a fundamental trade-off: limited temporal modeling provides only modest gains, whereas capturing long-range dependencies significantly increases computational cost. To address this limitation, we introduce a memory buffer for modeling long-range spatio-temporal consistency while achieving efficient dynamic stereo matching. Inspired by the two-stage decision-making process in humans, we propose a Pick-and-Play Memory (PPM) construction module for dynamic Stereo matching, dubbed as PPMStereo. PPM consists of a'pick' process that identifies the most relevant frames and a'play' process that weights the selected frames adaptively for spatio-temporal aggregation. This two-stage collaborative process maintains a compact yet highly informative memory buffer while achieving temporally consistent information aggregation.

artificial intelligence, computer vision, machine learning, (14 more...)

Neural Information Processing Systems

Country: Asia > China (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry: Banking & Finance (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

VividFace: ARobost and High-Fidelity Video Face Swapping Framework

Neural Information Processing SystemsJun-19-2026, 08:11:37 GMT

Video face swapping has seen increasing adoption in diverse applications, yet existing methods primarily trained on static images struggle to address temporal consistency and complex real-world scenarios. To overcome these limitations, we propose the first video face swapping framework, VividFace, a robust and high-fidelity diffusion-based framework. VividFace employs a novel hybrid training strategy that leverages abundant static image data alongside temporal video sequences, enabling it to effectively model temporal coherence and identity consistency in videos. Central to our approach is a carefully designed diffusion model integrated with a specialized VAE, capable of processing image-video hybrid data efficiently. To further enhance identity and pose disentanglement, we introduce and release the Attribute-Identity Disentanglement Triplet (AIDT) dataset, comprising a large-scale collection of triplets where each set contains three face images--two sharing the same pose and two sharing the same identity. Augmented comprehensively with occlusion scenarios, AIDT significantly boosts the robustness of VividFace against occlusions.

artificial intelligence, machine learning, natural language, (14 more...)

Neural Information Processing Systems

Country: Europe (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry:

Media (0.68)
Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.94)

Add feedback

Video Depth Estimation ModelCover FigureMerge360!imageto video

Neural Information Processing SystemsJun-16-2026, 22:41:30 GMT

To mitigate the distortions brought by equirectangular projection, existing methods typically divide 360 images into distortion-less perspective patches. However, since these patches are processed independently, depth inconsistencies are often introduced due to scale drift among patches. Recently, video depth estimation (VDE) models have leveraged temporal consistency for stable depth predictions across frames. Inspired by this, we propose to represent a 360 image as a sequence of perspective frames, mimicking the viewpoint adjustments users make when exploring a 360 scenario in virtual reality. Thus, the spatial consistency among perspective depth patches can be enhanced by exploiting the temporal consistency inherent in VDE models. To this end, we introduce a training-free pipeline for 360 monocular depth estimation, called ST2360D.

artificial intelligence, depth estimation, image understanding, (16 more...)

Neural Information Processing Systems

Country: Asia > China (0.46)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision > Image Understanding (0.84)
Information Technology > Human Computer Interaction > Interfaces > Virtual Reality (0.66)

Add feedback

More effort is needed to protect pedestrian privacy in the era of AI

Neural Information Processing SystemsJun-15-2026, 09:51:07 GMT

In the era of artificial intelligence (AI), pedestrian privacy is increasingly at risk. In research areas such as autonomous driving, computer vision, and surveillance, large datasets are often collected in public spaces, capturing pedestrians without consent or anonymization. These datasets are used to train systems that can identify, track, and analyze individuals, often without their knowledge. Although various technical methods and regional regulations have been proposed to address this issue, existing solutions are either insufficient to protect privacy or compromise data utility, thereby limiting their effectiveness for research. In this paper, we argue that more effort is needed to protect pedestrian privacy in the era of AI while maintaining data utility. We call on the AI and computer vision communities to take pedestrian privacy seriously and to rethink how pedestrian data are collected and anonymized. Collaboration with experts in law and ethics will also be essential for the responsible development of AI. Without stronger action, it will become increasingly difficult for individuals to protect their privacy, and public trust in AI may decline.

artificial intelligence, deep learning, machine learning, (19 more...)

Neural Information Processing Systems

Country: Europe > United Kingdom (0.28)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Transportation > Ground > Road (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.66)

Add feedback

Diffusion4D: Fast Spatial-temporal Consistent 4D generation via Video Diffusion Models

Neural Information Processing SystemsMar-22-2026, 11:34:54 GMT

The availability of large-scale multimodal datasets and advancements in diffusion models have significantly accelerated progress in 4D content generation. Most prior approaches rely on multiple images or video diffusion models, utilizing score distillation sampling for optimization or generating pseudo novel views for direct supervision. However, these methods are hindered by slow optimization speeds and multi-view inconsistency issues. Spatial and temporal consistency in 4D geometry has been extensively explored respectively in 3D-aware diffusion models and traditional monocular video diffusion models. Building on this foundation, we propose a strategy to migrate the temporal consistency in video diffusion models to the spatial-temporal consistency required for 4D generation.

artificial intelligence, diffusion model, machine learning, (11 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback