Goto

Collaborating Authors

 coordinate system





Generalised Implicit Neural Representations

Neural Information Processing Systems

We consider the problem of learning implicit neural representations (INRs) for signals on non-Euclidean domains. In the Euclidean case, INRs are trained on a discrete sampling of a signal over a regular lattice. Here, we assume that the continuous signal exists on some unknown topological space from which we sample a discrete graph.In the absence of a coordinate system to identify the sampled nodes, we propose approximating their location with a spectral embedding of the graph. This allows us to train INRs without knowing the underlying continuous domain, which is the case for most graph signals in nature, while also making the INRs independent of any choice of coordinate system. We show experiments with our method on various real-world signals on non-Euclidean domains.


Gauge Equivariant Transformer

Neural Information Processing Systems

Attention mechanism has shown great performance and efficiency in a lot of deep learning models, in which relative position encoding plays a crucial role. However, when introducing attention to manifolds, there is no canonical local coordinate system to parameterize neighborhoods. To address this issue, we propose an equivariant transformer to make our model agnostic to the orientation of local coordinate systems (\textit{i.e.}, gauge equivariant), which employs multi-head self-attention to jointly incorporate both position-based and content-based information. To enhance expressive ability, we adopt regular field of cyclic groups as feature fields in intermediate layers, and propose a novel method to parallel transport the feature vectors in these fields. In addition, we project the position vector of each point onto its local coordinate system to disentangle the orientation of the coordinate system in ambient space (\textit{i.e.}, global coordinate system), achieving rotation invariance. To the best of our knowledge, we are the first to introduce gauge equivariance to self-attention, thus name our model Gauge Equivariant Transformer (GET), which can be efficiently implemented on triangle meshes. Extensive experiments show that GET achieves state-of-the-art performance on two common recognition tasks.


PolarStream: Streaming Object Detection and Segmentation with Polar Pillars

Neural Information Processing Systems

Recent works recognized lidars as an inherently streaming data source and showed that the end-to-end latency of lidar perception models can be reduced significantly by operating on wedge-shaped point cloud sectors rather then the full point cloud. However, due to use of cartesian coordinate systems these methods represent the sectors as rectangular regions, wasting memory and compute. In this work we propose using a polar coordinate system and make two key improvements on this design. First, we increase the spatial context by using multi-scale padding from neighboring sectors: preceding sector from the current scan and/or the following sector from the past scan. Second, we improve the core polar convolutional architecture by introducing feature undistortion and range stratified convolutions. Experimental results on the nuScenes dataset show significant improvements over other streaming based methods. We also achieve comparable results to existing non-streaming methods but with lower latencies.


Neural Symplectic Form: Learning Hamiltonian Equations on General Coordinate Systems

Neural Information Processing Systems

In recent years, substantial research on the methods for learning Hamiltonian equations has been conducted. Although these approaches are very promising, the commonly used representation of the Hamilton equation uses the generalized momenta, which are generally unknown. Therefore, the training data must be represented in this unknown coordinate system, and this causes difficulty in applying the model to real data. Meanwhile, Hamiltonian equations also have a coordinate-free expression that is expressed by using the symplectic 2-form. In this study, we propose a model that learns the symplectic form from data using neural networks, thereby providing a method for learning Hamiltonian equations from data represented in general coordinate systems, which are not limited to the generalized coordinates and the generalized momenta. Consequently, the proposed method is capable not only of modeling target equations of both Hamiltonian and Lagrangian formalisms but also of extracting unknown Hamiltonian structures hidden in the data. For example, many polynomial ordinary differential equations such as the Lotka-Volterra equation are known to admit non-trivial Hamiltonian structures, and our numerical experiments show that such structures can be certainly learned from data. Technically, each symplectic 2-form is associated with a skew-symmetric matrix, but not all skew-symmetric matrices define the symplectic 2-form. In the proposed method, using the fact that symplectic 2-forms are derived as the exterior derivative of certain differential 1-forms, we model the differential 1-form by neural networks, thereby improving the efficiency of learning.


Roto-translated Local Coordinate Frames For Interacting Dynamical Systems

Neural Information Processing Systems

Modelling interactions is critical in learning complex dynamical systems, namely systems of interacting objects with highly non-linear and time-dependent behaviour. A large class of such systems can be formalized as $\textit{geometric graphs}$, $\textit{i.e.}$ graphs with nodes positioned in the Euclidean space given an $\textit{arbitrarily}$ chosen global coordinate system, for instance vehicles in a traffic scene. Notwithstanding the arbitrary global coordinate system, the governing dynamics of the respective dynamical systems are invariant to rotations and translations, also known as $\textit{Galilean invariance}$. As ignoring these invariances leads to worse generalization, in this work we propose local coordinate systems per node-object to induce roto-translation invariance to the geometric graph of the interacting dynamical system. Further, the local coordinate systems allow for a natural definition of anisotropic filtering in graph neural networks. Experiments in traffic scenes, 3D motion capture, and colliding particles demonstrate the proposed approach comfortably outperforms the recent state-of-the-art.


Minimization of Functions on Dually Flat Spaces Using Geodesic Descent Based on Dual Connections

Omiya, Gaku, Komaki, Fumiyasu

arXiv.org Machine Learning

We propose geodesic-based optimization methods on dually flat spaces, where the geometric structure of the parameter manifold is closely related to the form of the objective function. A primary application is maximum likelihood estimation in statistical models, especially exponential families, whose model manifolds are dually flat. We show that an m-geodesic update, which directly optimizes the log-likelihood, can theoretically reach the maximum likelihood estimator in a single step. In contrast, an e-geodesic update has a practical advantage in cases where the parameter space is geodesically complete, allowing optimization without explicitly handling parameter constraints. We establish the theoretical properties of the proposed methods and validate their effectiveness through numerical experiments.


Two-Stage Camera Calibration Method for Multi-Camera Systems Using Scene Geometry

Abramov, Aleksandr

arXiv.org Artificial Intelligence

Calibration of multi-camera systems is a key task for accurate object tracking. However, it remains a challenging problem in real-world conditions, where traditional methods are not applicable due to the lack of accurate floor plans, physical access to place calibration patterns, or synchronized video streams. This paper presents a novel two-stage calibration method that overcomes these limitations. In the first stage, partial calibration of individual cameras is performed based on an operator's annotation of natural geometric primitives (parallel, perpendicular, and vertical lines, or line segments of equal length). This allows estimating key parameters (roll, pitch, focal length) and projecting the camera's Effective Field of View (EFOV) onto the horizontal plane in a base 3D coordinate system. In the second stage, precise system calibration is achieved through interactive manipulation of the projected EFOV polygons. The operator adjusts their position, scale, and rotation to align them with the floor plan or, in its absence, using virtual calibration elements projected onto all cameras in the system. This determines the remaining extrinsic parameters (camera position and yaw). Calibration requires only a static image from each camera, eliminating the need for physical access or synchronized video. The method is implemented as a practical web service. Comparative analysis and demonstration videos confirm the method's applicability, accuracy, and flexibility, enabling the deployment of precise multi-camera tracking systems in scenarios previously considered infeasible.