AITopics | depth estimation

Collaborating Authors

depth estimation

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

ff887781480973bd3cb6026feb378d1e-Paper-Conference.pdf

Neural Information Processing SystemsJun-23-2026, 12:40:59 GMT

This based paper on pix presents el-space Pixel-P diffusion erfect generation Depth that, a monocular produces high-quality depth estimation, flying-pix model elfree point clouds from estimated depth maps. Current generative depth estimation models they require fine-tune a VAE Stable to compre Diffusion ss depth and maps achiev into e impressi the latent ve performance.

arxiv preprint arxiv, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

RPG360: Robust 360 Depth Estimation with Perspective Foundation Models and Graph Optimization

Neural Information Processing SystemsJun-23-2026, 03:58:01 GMT

The increasing use of 360 images across various domains has emphasized the need for robust depth estimation techniques tailored for omnidirectional images. However, obtaining large-scale labeled datasets for 360 depth estimation remains a significant challenge. In this paper, we propose RPG360, a training-free robust 360 monocular depth estimation method that leverages perspective foundation models and graph optimization. Our approach converts 360 images into sixface cubemap representations, where a perspective foundation model is employed to estimate depth and surface normals. To address depth scale inconsistencies across different faces of the cubemap, we introduce a novel depth scale alignment technique using graph-based optimization, which parameterizes the predicted depth and normal maps while incorporating an additional per-face scale parameter. This optimization ensures depth scale consistency across the six-face cubemap while preserving 3D structural integrity. Furthermore, as foundation models exhibit inherent robustness in zero-shot settings, our method achieves superior performance across diverse datasets, including Matterport3D, Stanford2D3D, and 360Loc. We also demonstrate the versatility of our depth estimation approach by validating its benefits in downstream tasks such as feature matching 3.2 5.4% and Structure from Motion 0.2 9.7% in AUC@5 .

artificial intelligence, depth estimation, image understanding, (15 more...)

Neural Information Processing Systems

Country: North America (0.28)

Genre: Research Report > Experimental Study (1.00)

Technology: Information Technology > Artificial Intelligence > Vision > Image Understanding (1.00)

Add feedback

In VG VG VG VG In DU DU CU In CU MAOu MAOu MA MA In In ppppprrStStTTSSSSGGGGssuuuuutttt33tttttTTTT3R3R3R3R33RR+++RR+++OOOuuuOurrss

Neural Information Processing SystemsJun-22-2026, 23:33:22 GMT

The family of feed-forward reconstruction model regresses pointmap of all input images to a reference frame coordinate system, along with other auxiliary outputs, in a single forward pass. However, we find that current models struggle with fine geometry and robustness due to (i) the scarcity of high-fidelity depth and pose supervision and (ii) the inherent geometric misalignment from multi-view pointmap regression. Fin3R jointly tackles two issues with an extra lightweight fine-tuning step. We freeze the decoder, which handles view matching, and fine-tune only the image encoder--the component dedicated to feature extraction. The encoder is enriched with fine geometric details distilled from a strong monocular teacher model on large, unlabeled datasets, using a custom, lightweight LoRA adapter.

artificial intelligence, estimation, machine learning, (16 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Event-Driven Dynamic Scene Depth Completion

Neural Information Processing SystemsJun-22-2026, 20:56:42 GMT

Depth completion in dynamic scenes poses significant challenges due to rapid ego-motion and object motion, which can severely degrade the quality of input modalities such as RGB images and LiDAR measurements. Conventional RGB-D sensors often struggle to align precisely and capture reliable depth under such conditions. In contrast, event cameras with their high temporal resolution and sensitivity to motion at the pixel level provide complementary cues that are beneficial in dynamic environments. To this end, we propose EventDC, the first event-driven depth completion framework. It consists of two key components: Event-Modulated Alignment (EMA) and Local Depth Filtering (LDF). Both modules adaptively learn the two fundamental components of convolution operations: offsets and weights conditioned on motion-sensitive event streams. In the encoder, EMA leverages events to modulate the sampling positions of RGB-D features to achieve pixel redistribution for improved alignment and fusion.

depth completion, machine learning, natural language, (16 more...)

Neural Information Processing Systems

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Sensing and Signal Processing (0.93)
Information Technology > Artificial Intelligence > Natural Language (0.68)

Add feedback

Distil-E2D: Distilling Image-to-Depth Priors for Event-Based Monocular Depth Estimation

Neural Information Processing SystemsJun-22-2026, 05:28:03 GMT

Event cameras are neuromorphic vision sensors that asynchronously capture pixellevel intensity changes with high temporal resolution and dynamic range. These make them well suited for monocular depth estimation under challenging lighting conditions. However, progress in event-based monocular depth estimation remains constrained by the quality of supervision: LiDAR-based depth labels are inherently sparse, spatially incomplete, and prone to artifacts. Consequently, these signals are suboptimal for learning dense depth from sparse events. To address this problem, we propose Distil-E2D, a framework that distills depth priors from the image domain into the event domain by generating dense synthetic pseudolabels from co-recorded APS or RGB frames using foundational depth models. These pseudolabels complement sparse LiDAR depths with dense semantically rich supervision informed by large-scale image-depth datasets. To reconcile discrepancies between synthetic and real depths, we introduce a Confidence-Guided Calibrated Depth Loss that learns nonlinear depth alignment and adaptively weights supervision by alignment confidence. Additionally, our architecture integrates past predictions via a Context Transformer and employs a Dual-Decoder Training scheme that enhances encoder representations by jointly learning metric and relative depth abstractions. Experiments on benchmark datasets show that Distil-E2D achieves state-of-the-art performance in event-based monocular depth estimation across both event-only and event+APS settings.

artificial intelligence, image understanding, machine learning, (19 more...)

Neural Information Processing Systems

Genre:

Research Report > Experimental Study (1.00)
Overview (0.67)

Industry: Information Technology (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision > Image Understanding (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

More Than Generation Unifying Generation and Depth Estimation via Text to Image Diffusion Models

Neural Information Processing SystemsJun-21-2026, 09:47:30 GMT

Generative depth estimation methods leverage the rich visual priors stored in pre-trained text-to-image diffusion models, demonstrating astonishing zero-shot capability.

large language model, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

PolypSense3D: AMulti-Source Benchmark Dataset for Depth-Aware Polyp Size Measurement in Endoscopy

Neural Information Processing SystemsJun-18-2026, 12:27:55 GMT

Accurate polyp sizing during endoscopy is crucial for cancer risk assessment but is hindered by subjective methods and inadequate datasets lacking integrated 2D appearance, 3D structure, and real-world size information. We introduce PolypSense3D, the first multi-source benchmark dataset specifically targeting depth-aware polyp size measurement. It uniquely integrates over 43,000 frames from virtual simulations, physical phantoms, and clinical sequences, providing synchronized RGB, dense/sparse depth, segmentation masks, camera parameters, and millimeter-scale size labels derived via a novel forceps-assisted in-vivo annotation technique. To establish its value, we benchmark state-of-the-art segmentation and depth estimation models. Results quantify significant domain gaps between simulated/phantom and clinical data and reveal substantial error propagation from perception stages to final size estimation, with the best fully automated pipelines achieving an average Mean Absolute Error (MAE) of 0.95 mm on the clinical data subset. Publicly released under CCBY-SA 4.0 with code and evaluation protocols, PolypSense3D offers a standardized platform to accelerate research in robust, clinically relevant quantitative endoscopic vision.

large language model, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country:

Europe (1.00)
Asia > China (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Gastroenterology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Add feedback

Jasmine: Harnessing Diffusion Prior for Self-Supervised Depth Estimation

Neural Information Processing SystemsJun-18-2026, 11:20:57 GMT

In this paper, we propose Jasmine, the first Stable Diffusion (SD)-based selfsupervised framework for monocular depth estimation, which effectively harnesses SD's visual priors to enhance the sharpness and generalization of unsupervised prediction. Previous SD-based methods are all supervised since adapting diffusion models for dense prediction requires high-precision supervision. In contrast, selfsupervised reprojection suffers from inherent challenges (e.g., occlusions, textureless regions, illumination variance), and the predictions exhibit blurs and artifacts that severely compromise SD's latent priors. To resolve this, we construct a novel surrogate task of mix-batch image reconstruction. Without any additional supervision, it preserves the detail priors of SD models by reconstructing the images themselves while preventing depth estimation from degradation. Furthermore, to address the inherent misalignment between SD's scale and shift invariant estimation and self-supervised scale-invariant depth estimation, we build the Scale-Shift GRU. It not only bridges this distribution gap but also isolates the fine-grained texture of SD output against the interference of reprojection loss. Extensive experiments demonstrate that Jasmine achieves SoTA performance on the KITTI benchmark and exhibits superior zero-shot generalization across multiple datasets. Project page and code are available at here.

artificial intelligence, image understanding, machine learning, (19 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision > Image Understanding (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Add feedback

EAG3R: Event-Augmented 3DGeometry Estimation for Dynamic and Extreme-Lighting Scenes

Neural Information Processing SystemsJun-18-2026, 02:57:10 GMT

Robust 3D geometry estimation from videos is critical for applications such as autonomous navigation, SLAM, and 3D scene reconstruction. Recent methods like DUSt3R demonstrate that regressing dense pointmaps from image pairs enables accurate and efficient pose-free reconstruction.

computer vision, large language model, machine learning, (17 more...)

Neural Information Processing Systems

Country: Asia (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Media (0.46)
Information Technology (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
(3 more...)

Add feedback

Video Depth Estimation ModelCover FigureMerge360!imageto video

Neural Information Processing SystemsJun-16-2026, 22:41:30 GMT

To mitigate the distortions brought by equirectangular projection, existing methods typically divide 360 images into distortion-less perspective patches. However, since these patches are processed independently, depth inconsistencies are often introduced due to scale drift among patches. Recently, video depth estimation (VDE) models have leveraged temporal consistency for stable depth predictions across frames. Inspired by this, we propose to represent a 360 image as a sequence of perspective frames, mimicking the viewpoint adjustments users make when exploring a 360 scenario in virtual reality. Thus, the spatial consistency among perspective depth patches can be enhanced by exploiting the temporal consistency inherent in VDE models. To this end, we introduce a training-free pipeline for 360 monocular depth estimation, called ST2360D.

artificial intelligence, depth estimation, image understanding, (16 more...)

Neural Information Processing Systems

Country: Asia > China (0.46)

Genre: