Goto

Collaborating Authors

 correspondence


Structured Analytic Coherent Point Drift for Non-Rigid Point Set Registration

arXiv.org Machine Learning

Coherent Point Drift (CPD) is a representative probabilistic framework for unsupervised non-rigid point set registration. Its standard non-rigid M-step, however, relies on a point-indexed Gaussian-kernel system whose size grows with the number of moving points, making deformation estimation computationally heavy for large point sets and difficult to control in complexity during registration. To address these limitations, we propose Analytic-CPD, a new unsupervised non-rigid registration framework that gives CPD a structured analytic reformulation. Analytic-CPD preserves the CPD posterior correspondence layer, but lifts the M-step from point-indexed kernel displacement estimation to structured analytic mapping estimation. By coupling the Gaussian-mixture posterior mechanism of CPD with Structured Analytic Mappings (SAM), the method obtains a deformation model whose coefficient dimension is governed by the ambient dimension and analytic order rather than by the number of moving points. More importantly, deformation estimation is organized over an interpretable hierarchy of analytic function spaces, so the analytic order can be increased progressively as posterior correspondences become more reliable. We implement this idea through an increasing-degree continuation strategy with decreasing stage lengths: low-order analytic maps first stabilize the posterior correspondence structure, while higher-order modes later refine nonlinear residual deformation. Experiments on controlled model-matched, smooth model-mismatch, and registered human-shape data demonstrate the effectiveness and favorable accuracy--efficiency performance of Analytic-CPD.


Towards Lower Bounds on the Depth of ReLU Neural Networks

Neural Information Processing Systems

We contribute to a better understanding of the class of functions that is represented by a neural network with ReLU activations and a given architecture. Using techniques from mixed-integer optimization, polyhedral theory, and tropical geometry, we provide a mathematical counterbalance to the universal approximation theorems which suggest that a single hidden layer is sufficient for learning tasks. In particular, we investigate whether the class of exactly representable functions strictly increases by adding more layers (with no restrictions on size). This problem has potential impact on algorithmic and statistical aspects because of the insight it provides into the class of functions represented by neural hypothesis classes. However, to the best of our knowledge, this question has not been investigated in the neural network literature. We also present upper bounds on the sizes of neural networks required to represent functions in these neural hypothesis classes.


NAVI: Category-Agnostic Image Collections with High-Quality 3DShape and Pose Annotations

Neural Information Processing Systems

Recent advances in neural reconstruction enable high-quality 3D object reconstruction from casually captured image collections. Current techniques mostly analyze their progress on relatively simple image collections where Structurefrom-Motion (SfM) techniques can provide ground-truth (GT) camera poses. We note that SfM techniques tend to fail on in-the-wild image collections such as image search results with varying backgrounds and illuminations. To enable systematic research progress on 3D reconstruction from casual image captures, we propose'NAVI': a new dataset of category-agnostic image collections of objects with high-quality 3D scans along with per-image 2D-3D alignments providing near-perfect GT camera parameters. These 2D-3D alignments allow us to extract accurate derivative annotations such as dense pixel correspondences, depth and segmentation maps. We demonstrate the use of NAVI image collections on different problem settings and show that NAVI enables more thorough evaluations that were not possible with existing datasets. We believe NAVI is beneficial for systematic research progress on 3D reconstruction and correspondence estimation.



Shape Non-rigid Kinematics (SNK): AZero-Shot Method for Non-Rigid Shape Matching via Unsupervised Functional Map Regularized Reconstruction

Neural Information Processing Systems

We present Shape Non-rigid Kinematics (SNK), a novel zero-shot method for non-rigid shape matching that eliminates the need for extensive training or ground truth data. SNK operates on a single pair of shapes, and employs a reconstructionbased strategy using an encoder-decoder architecture, which deforms the source shape to closely match the target shape. During the process, an unsupervised functional map is predicted and converted into a point-to-point map, serving as a supervisory mechanism for the reconstruction. To aid in training, we have designed a new decoder architecture that generates smooth, realistic deformations. SNK demonstrates competitive results on traditional benchmarks, simplifying the shapematching process without compromising accuracy. Our code can be found online: https://github.com/pvnieo/SNK.



Modality-Agnostic Topology Aware Localization

Neural Information Processing Systems

This work presents a data-driven approach for the indoor localization of an observer on a 2D topological map of the environment. State-of-the-art techniques may yield accurate estimates only when they are tailor-made for a specific data modality like camera-based system that prevents their applicability to broader domains. Here, we establish a modality-agnostic framework (called OT-Isomap) and formulate the localization problem in the context of parametric manifold learning while leveraging optimal transportation. This framework allows jointly learning a lowdimensional embedding as well as correspondences with a topological map. We examine the generalizability of the proposed algorithm by applying it to data from diverse modalities such as image sequences and radio frequency signals. The experimental results demonstrate decimeter-level accuracy for localization using different sensory inputs.



4b6538a44a1dfdc2b83477cd76dee98e-Supplemental.pdf

Neural Information Processing Systems

In this document, we provide more implementation details of CATs and more results on SPair71k [16], PF-PASCAL [4], and PF-WILLOW [3]. Given resized input images Is,It R256 256 3, we conducted experiments using different feature backbone networks, including DeiT-B [22], DINO [2] and ResNet-101 [5]. For the ResNet-101multi in the paper, we use the best layer subset [15] of (0,8,20,21,26,28,29,30) for SPair-71k, and (2,17,21,22,25,26,28) for PF-PASCAL and PF-WILLOW. We resized the spatial resolution of extracted feature maps to 16 16. The extracted features undergo l-2 normalization and the correlation maps are constructed using dot products.


CATs: Cost Aggregation Transformers for Visual Correspondence

Neural Information Processing Systems

We propose a novel cost aggregation network, called Cost Aggregation Transformers (CATs), to find dense correspondences between semantically similar images with additional challenges posed by large intra-class appearance and geometric variations. Cost aggregation is a highly important process in matching tasks, which the matching accuracy depends on the quality of its output. Compared to handcrafted or CNN-based methods addressing the cost aggregation, in that either lacks robustness to severe deformations or inherit the limitation of CNNs that fail to discriminate incorrect matches due to limited receptive fields, CATs explore global consensus among initial correlation map with the help of some architectural designs that allow us to fully leverage self-attention mechanism. Specifically, we include appearance affinity modeling to aid the cost aggregation process in order to disambiguate the noisy initial correlation maps and propose multi-level aggregation to efficiently capture different semantics from hierarchical feature representations. We then combine with swapping self-attention technique and residual connections not only to enforce consistent matching, but also to ease the learning process, which we find that these result in an apparent performance boost. We conduct experiments to demonstrate the effectiveness of the proposed model over the latest methods and provide extensive ablation studies.