 feature point



SAPA: Similarity-Aware Point Affiliation for Feature Upsampling

Neural Information Processing Systems

We introduce point affiliation into feature upsampling, a notion that describes the affiliation of each upsampled point to a semantic cluster formed by local decoder feature points with semantic similarity. By rethinking point affiliation, we present a generic formulation for generating upsampling kernels. The kernels encourage not only semantic smoothness but also boundary sharpness in the upsampled feature maps. Such properties are particularly useful for some dense prediction tasks such as semantic segmentation. The key idea of our formulation is to generate similarity-aware kernels by comparing the similarity between each encoder feature point and the spatially associated local region of decoder features. In this way, the encoder feature point can function as a cue to inform the semantic cluster of upsampled feature points. To embody the formulation, we further instantiate a lightweight upsampling operator, termed Similarity-Aware Point Affiliation (SAPA), and investigate its variants. SAPA invites consistent performance improvements on a number of dense prediction tasks, including semantic segmentation, object detection, depth estimation, and image matting. Code is available at: https://github.com/poppinace/sapa
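The key idea in the abstract above (kernels generated by comparing each encoder feature point with its spatially associated local decoder region) can be illustrated with a minimal sketch. It is not the authors' implementation: the inner-product similarity, the softmax normalization, the 3x3 window, the integer-division mapping from high-resolution to low-resolution coordinates, and the function name `similarity_aware_upsample` are all assumptions made for illustration, and encoder and decoder features are assumed to share the same channel count.

```python
import math

def similarity_aware_upsample(encoder, decoder, scale=2, k=3):
    """Hypothetical sketch of similarity-aware upsampling.

    encoder: nested lists of shape (H*scale) x (W*scale) x C (high resolution)
    decoder: nested lists of shape H x W x C (low resolution)
    Returns nested lists of shape (H*scale) x (W*scale) x C.
    """
    H, W = len(decoder), len(decoder[0])
    C = len(decoder[0][0])
    r = k // 2
    out = []
    for i in range(H * scale):
        row = []
        for j in range(W * scale):
            q = encoder[i][j]
            # Assumed mapping to the spatially associated decoder location.
            ci, cj = i // scale, j // scale
            # Decoder feature points in the k x k window (clipped at borders).
            neighbors = [decoder[y][x]
                         for y in range(max(0, ci - r), min(H, ci + r + 1))
                         for x in range(max(0, cj - r), min(W, cj + r + 1))]
            # Assumed similarity: inner product between encoder and decoder points.
            scores = [sum(a * b for a, b in zip(q, d)) for d in neighbors]
            # Softmax over the window yields the upsampling kernel.
            m = max(scores)
            exps = [math.exp(s - m) for s in scores]
            total = sum(exps)
            weights = [e / total for e in exps]
            # Upsampled feature: kernel-weighted sum of the decoder neighbors.
            row.append([sum(w * d[c] for w, d in zip(weights, neighbors))
                        for c in range(C)])
        out.append(row)
    return out
```

Under these assumptions, an encoder point that strongly resembles one decoder neighbor concentrates the kernel weight on that neighbor (a sharp boundary), whereas an uninformative encoder point yields near-uniform weights (smoothing within the window).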


ADA-DPM: A Neural Descriptors-based Adaptive Noise Filtering Strategy for SLAM

Shao, Yongxin, Tan, Aihong, Wang, Binrui, Jin, Yinlian, Guan, Licong, Liao, Peng

arXiv.org Artificial Intelligence

Lidar SLAM plays a significant role in mobile robot navigation and high-definition map construction. However, existing methods often face a trade-off between localization accuracy and system robustness in scenarios with a high proportion of dynamic objects, point cloud distortion, and unstructured environments. To address this issue, we propose a neural descriptors-based adaptive noise filtering strategy for SLAM, named ADA-DPM, which improves the performance of localization and mapping tasks through three key technical innovations. Firstly, to tackle dynamic object interference, we design the Dynamic Segmentation Head to predict and filter out dynamic feature points, eliminating the ego-motion interference caused by dynamic objects. Secondly, to mitigate the impact of noise and unstructured feature points, we propose the Global Importance Scoring Head that adaptively selects high-contribution feature points while suppressing the influence of noise and unstructured feature points. Moreover, we introduce the Cross-Layer Graph Convolution Module (GLI-GCN) to construct multi-scale neighborhood graphs, fusing local structural information across different scales and improving the discriminative power of overlapping features. Finally, experimental validations on multiple public datasets confirm the effectiveness of ADA-DPM.


Robust Pose Estimation in Crowded Scenes with Direct Pose-Level Inference Supplementary Materials

Neural Information Processing Systems

In Sec. 3 we present the overall objective of PINet; here we provide more details about the groundtruth. Here we also follow [2] to take the maximum of confidence maps. Here we give the details of how to sample poses for PR refinement. We visualize some pose estimation results for crowded scenes on both OCHuman and CrowdPose, as shown in Figure 1 and Figure 1.


Towards Fine-Grained Interpretability: Counterfactual Explanations for Misclassification with Saliency Partition

Zhang, Lintong, Yin, Kang, Lee, Seong-Whan

arXiv.org Artificial Intelligence

Attribution-based explanation techniques capture key patterns to enhance visual interpretability; however, these patterns often lack the granularity needed for insight in fine-grained tasks, particularly in cases of model misclassification, where explanations may be insufficiently detailed. To address this limitation, we propose a fine-grained counterfactual explanation framework that generates both object-level and part-level interpretability, addressing two fundamental questions: (1) which fine-grained features contribute to model misclassification, and (2) where dominant local features influence counterfactual adjustments. Our approach yields explainable counterfactuals in a non-generative manner by quantifying similarity and weighting component contributions within regions of interest between correctly classified and misclassified samples. Furthermore, we introduce a saliency partition module grounded in Shapley value contributions, isolating features with region-specific relevance. Extensive experiments demonstrate the superiority of our approach in capturing more granular, intuitively meaningful regions, surpassing fine-grained methods.


Supplementary Material

Neural Information Processing Systems

Figure 1: Five large-scale scenes rendered in real-time using UE4-NeRF. Each scene can be rendered in real-time using UE4-NeRF. In Figure 2, we have provided additional qualitative comparisons with MVS. MVS utilizes sparse reconstruction to extract feature points, which are then expanded based on morphological and color differences to generate a dense point cloud. This dense point cloud is further used for surface reconstruction, resulting in triangulated meshes.


PL-VIWO2: A Lightweight, Fast and Robust Visual-Inertial-Wheel Odometry Using Points and Lines

Zhang, Zhixin, Zhao, Liang, Ladosz, Pawel

arXiv.org Artificial Intelligence

Vision-based odometry has been widely adopted in autonomous driving owing to its low cost and lightweight setup; however, its performance often degrades in complex outdoor urban environments. To address these challenges, we propose PL-VIWO2, a filter-based visual-inertial-wheel odometry system that integrates an IMU, wheel encoder, and camera (supporting both monocular and stereo) for long-term robust state estimation. The main contributions are: (i) a novel line feature processing framework that exploits the geometric relationship between 2D feature points and lines, enabling fast and robust line tracking and triangulation while ensuring real-time performance; (ii) an SE(2)-constrained SE(3) wheel pre-integration method that leverages the planar motion characteristics of ground vehicles for accurate wheel updates; and (iii) an efficient motion consistency check (MCC) that filters out dynamic features by jointly using IMU and wheel measurements. Extensive experiments on Monte Carlo simulations and public autonomous driving datasets demonstrate that PL-VIWO2 outperforms state-of-the-art methods in terms of accuracy, efficiency, and robustness.