 rotation


PLOT: Progressive Localization via Optimal Transport in Neural Causal Abstraction

arXiv.org Machine Learning

Causal abstraction offers a principled framework for mechanistic interpretability, aligning a high-level causal model with the low-level computation realized by a neural network through counterfactual intervention analysis. Existing methods such as distributed alignment search (DAS) learn expressive subspace interventions, but the relevant neural site is unknown a priori, so finding a handle requires a computationally burdensome search over candidate sites. We introduce PLOT (Progressive Localization via Optimal Transport), a transport-based framework that localizes causal variables from the output effect geometry of abstract and neural interventions. PLOT fits an optimal transport coupling between abstract variables and candidate neural sites, yielding a global soft correspondence that can be calibrated into intervention handles. In simple settings, a single coupling over individual neurons suffices. In larger models, PLOT is applied progressively, moving from coarse sites such as tokens, timesteps, or layers to finer supports such as coordinate groups or PCA spans, and optionally guiding DAS based on the localized signal. Across experiments of increasing complexity, transport-only PLOT handles are exceedingly fast and competitive on accuracy, while PLOT-guided DAS reaches DAS-level accuracy at a fraction of full DAS runtime, providing an efficient localization engine for causal abstraction research at scale.


BOOOM: Loss-Function-Agnostic Black-Box Optimization over Orthonormal Manifolds for Machine Learning and Statistical Inference

arXiv.org Machine Learning

Optimization over the Stiefel manifold $\mathrm{St}(p,d)$, the set of $p \times d$ column-orthonormal matrices, is fundamental in statistics, machine learning, and scientific computing, yet remains challenging in the presence of non-convex, non-smooth, or black-box objectives. Existing methods largely rely on either convex relaxations or gradient-based Riemannian optimization, limiting applicability in derivative-free and highly multimodal settings. We propose \textsc{BOOOM} (Black-box Optimization Over Orthonormal Manifolds), a general-purpose framework for loss-function-agnostic optimization on $\mathrm{St}(p,d)$. The key idea is a global Givens rotation-based parametrization that maps the manifold to an unconstrained Euclidean angle space while preserving feasibility exactly. Building on this representation, BOOOM employs a structured, parallelizable, derivative-free search based on Recursive Modified Pattern Search, enabling systematic exploration through plane-wise rotations without requiring gradient information and facilitating escape from poor local optima. We establish a unified theoretical framework showing equivalence between angle-space and manifold optimization, transfer of stationarity, and global convergence in probability under mild conditions. Empirical results across diverse problems, including heterogeneous quadratic optimization, low-rank and sparse matrix decomposition, independent component analysis, and orthogonal joint diagonalization, among other widely studied settings, demonstrate strong performance relative to state-of-the-art methods, particularly in non-smooth and highly multimodal regimes. We further illustrate its practical utility through a novel supervised PCA formulation applied to metabolomics data in colorectal cancer.
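The idea of mapping unconstrained angles to a feasible point on $\mathrm{St}(p,d)$ can be illustrated with a minimal sketch. It is not taken from the paper: the ordering of rotation planes, the helper names `givens` and `angles_to_stiefel`, and the choice to keep the first $d$ columns of the product are all assumptions made for illustration. It only shows that any angle vector yields a column-orthonormal matrix, which is what allows a derivative-free search to move freely in angle space.

```python
import numpy as np

def givens(p, i, j, theta):
    """p x p Givens rotation acting in the (i, j) coordinate plane."""
    G = np.eye(p)
    c, s = np.cos(theta), np.sin(theta)
    G[i, i] = c
    G[j, j] = c
    G[i, j] = -s
    G[j, i] = s
    return G

def angles_to_stiefel(angles, p, d):
    """Map an unconstrained angle vector to a p x d column-orthonormal matrix.

    Assumed plane ordering (hypothetical, for illustration): all pairs (i, j)
    with i < d and i < j < p, which gives p*d - d*(d+1)/2 angles.
    """
    pairs = [(i, j) for i in range(d) for j in range(i + 1, p)]
    if len(angles) != len(pairs):
        raise ValueError(f"expected {len(pairs)} angles, got {len(angles)}")
    Q = np.eye(p)
    for (i, j), theta in zip(pairs, angles):
        Q = Q @ givens(p, i, j, theta)  # a product of rotations stays orthogonal
    return Q[:, :d]                     # the first d columns are orthonormal

# Any real-valued angle vector gives a feasible point.
p, d = 5, 2
rng = np.random.default_rng(0)
angles = rng.uniform(-10.0, 10.0, size=p * d - d * (d + 1) // 2)
X = angles_to_stiefel(angles, p, d)
```

Because feasibility holds for every angle vector, a black-box optimizer can perturb one angle (one rotation plane) at a time without any projection or retraction step.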


f8e55d98b0c2569bd0aa25b076e6b3f8-Supplemental-Conference.pdf

Neural Information Processing Systems

Motion Compensation. We compare our method to the traditional motion-compensated coding approach that forms the core of inter-picture coding in well-established compression standards such as MPEG. Block matching is an essential component of these standards, allowing the compression of video content by up to three orders of magnitude with moderate loss of information. For each block in a frame, typical coders search for the most similar spatially displaced block in the previous frame (typically measured with MSE), and communicate the displacement coordinates to allow prediction of frame content by translating blocks of the (already transmitted) previous frame. We implemented a "diamond search" algorithm [29] operating on blocks of 8 × 8 pixels, with a maximal search distance of 8 pixels, which balances accuracy of motion estimates and speed of estimation (the search step is computationally intensive). We use the estimated displacements to perform causal motion compensation (cMC), using displacement vectors estimated from the previous two observed frames ($x_{t-1}$ and $x_t$) to predict the next frame ($x_{t+1}$) rather than the current one (as in MPEG).
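The block-matching and causal-prediction steps described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it uses an exhaustive search over all displacements instead of the diamond search of [29], assumes grayscale frames stored as 2D arrays, and extrapolates each block's motion by reusing its displacement vector unchanged for the next frame. The function names `block_match` and `causal_predict` are hypothetical.

```python
import numpy as np

def block_match(prev, curr, block=8, max_disp=8):
    """For each block of `curr`, find the displacement (dy, dx) such that the
    block at that offset in `prev` has the lowest MSE (exhaustive search)."""
    prev = prev.astype(float)
    curr = curr.astype(float)
    H, W = curr.shape
    vectors = np.zeros((H // block, W // block, 2), dtype=int)
    for by in range(H // block):
        for bx in range(W // block):
            y0, x0 = by * block, bx * block
            target = curr[y0:y0 + block, x0:x0 + block]
            best = np.inf
            for dy in range(-max_disp, max_disp + 1):
                for dx in range(-max_disp, max_disp + 1):
                    y, x = y0 + dy, x0 + dx
                    if y < 0 or x < 0 or y + block > H or x + block > W:
                        continue  # candidate block falls outside the frame
                    mse = np.mean((target - prev[y:y + block, x:x + block]) ** 2)
                    if mse < best:
                        best = mse
                        vectors[by, bx] = (dy, dx)
    return vectors

def causal_predict(curr, vectors, block=8):
    """Predict the next frame from `curr`, assuming each block keeps moving
    with the displacement estimated between the two previous frames."""
    H, W = curr.shape
    pred = np.zeros_like(curr)  # pixels outside whole blocks stay zero
    for by in range(H // block):
        for bx in range(W // block):
            dy, dx = vectors[by, bx]
            y0, x0 = by * block, bx * block
            # Clamp the source block so it stays inside the frame.
            y = int(np.clip(y0 + dy, 0, H - block))
            x = int(np.clip(x0 + dx, 0, W - block))
            pred[y0:y0 + block, x0:x0 + block] = curr[y:y + block, x:x + block]
    return pred
```

A typical call would be `causal_predict(x_t, block_match(x_prev, x_t))`, where `x_prev` and `x_t` are the two most recently observed frames. The exhaustive search evaluates up to 289 candidates per block, which is the cost a diamond search is designed to reduce.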




A Roto-translation invariance

Neural Information Processing Systems

A.1 Rotations in 2 dimensions

In 2-dimensional settings, there exists a single scalar angular position, the yaw angle $\theta$. In order to perform the transformation, we have to express the angular positions in a format suitable for linear transformations; we do so by transforming them to rotation matrices, performing a matrix multiplication, and then transforming the angular positions back to angle format. In 2 dimensions, we use eq. After the rotation, we can convert them back to angle format using the 2-argument arc-tangent function:

$$\theta = \operatorname{atan2}(\sin\theta, \cos\theta) \qquad (14)$$

Simplified rotations

In 2 dimensions, the computations can be simplified since rotations commute. First, we show that chained rotations result in angle addition/subtraction, that is:

$$
\begin{aligned}
Q(\theta_i)\,Q(\theta_j)
&= \begin{pmatrix} \cos\theta_i & -\sin\theta_i \\ \sin\theta_i & \cos\theta_i \end{pmatrix}
   \begin{pmatrix} \cos\theta_j & -\sin\theta_j \\ \sin\theta_j & \cos\theta_j \end{pmatrix} && (15) \\
&= \begin{pmatrix}
   \cos\theta_i\cos\theta_j - \sin\theta_i\sin\theta_j & -\cos\theta_i\sin\theta_j - \sin\theta_i\cos\theta_j \\
   \sin\theta_i\cos\theta_j + \cos\theta_i\sin\theta_j & -\sin\theta_i\sin\theta_j + \cos\theta_i\cos\theta_j
   \end{pmatrix} && (16) \\
&= \begin{pmatrix} \cos(\theta_i + \theta_j) & -\sin(\theta_i + \theta_j) \\ \sin(\theta_i + \theta_j) & \cos(\theta_i + \theta_j) \end{pmatrix} && (17) \\
&= Q(\theta_i + \theta_j) && (18)
\end{aligned}
$$

Following the same approach, we compute the inverse rotation:

$$Q^{\top}(\theta_i)\,Q(\theta_j) = Q(-\theta_i)\,Q(\theta_j) = Q(\theta_j - \theta_i) \qquad (19)$$

Thus, instead of rotating the angular positions (expressed in rotation matrix form) using the rotation matrix $Q$, in practice we perform the transformation directly to the angles via addition/subtraction, and replace the matrix $Q$ with the identity matrix $I_{1 \times 1}$.
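The angle-addition shortcut can be checked numerically. The sketch below assumes the standard counter-clockwise convention for the 2D rotation matrix; the helper names `rot`, `to_angle`, and `wrap` are hypothetical and chosen for illustration.

```python
import math
import numpy as np

def rot(theta):
    """2D rotation matrix Q(theta), counter-clockwise convention (assumed)."""
    c, s = math.cos(theta), math.sin(theta)
    return np.array([[c, -s], [s, c]])

def to_angle(Q):
    """Recover the angle from a rotation matrix via atan2(sin, cos)."""
    return math.atan2(Q[1, 0], Q[0, 0])

def wrap(theta):
    """Wrap an angle to the range [-pi, pi]."""
    return math.atan2(math.sin(theta), math.cos(theta))

theta_i, theta_j = 0.7, 2.9

# Matrix route: multiply rotation matrices, then convert back to an angle.
composed = to_angle(rot(theta_i) @ rot(theta_j))
relative = to_angle(rot(theta_i).T @ rot(theta_j))

# Shortcut: add or subtract the angles directly, then wrap.
composed_direct = wrap(theta_i + theta_j)
relative_direct = wrap(theta_j - theta_i)
```

Both routes agree up to floating-point error, so the matrix multiplication can be skipped in the 2D case. The wrap step matters because the sum of two angles can leave the principal range, as it does here (0.7 + 2.9 exceeds $\pi$).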


Roto-translated Local Coordinate Frames For Interacting Dynamical Systems

Neural Information Processing Systems

Modelling interactions is critical in learning complex dynamical systems, namely systems of interacting objects with highly non-linear and time-dependent behaviour. A large class of such systems can be formalized as geometric graphs, i.e., graphs with nodes positioned in the Euclidean space given an arbitrarily chosen global coordinate system, for instance vehicles in a traffic scene. Notwithstanding the arbitrary global coordinate system, the governing dynamics of the respective dynamical systems are invariant to rotations and translations, also known as Galilean invariance. As ignoring these invariances leads to worse generalization, in this work we propose local coordinate frames per node-object to induce roto-translation invariance to the geometric graph of the interacting dynamical system. Further, the local coordinate frames allow for a natural definition of anisotropic filtering in graph neural networks. Experiments in traffic scenes, 3D motion capture, and colliding particles demonstrate that the proposed approach comfortably outperforms the recent state-of-the-art.
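A small 2D sketch can make the invariance claim concrete. It assumes each node carries a position and a yaw angle, and that a node's local frame is obtained by translating to its position and rotating by its yaw; this is an illustrative simplification, not the paper's full model, and the function names `to_local` and `global_transform` are hypothetical.

```python
import numpy as np

def rot(theta):
    """2D rotation matrix, counter-clockwise convention (assumed)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def to_local(pos, yaw, i):
    """Express every node in the local frame of node i.

    pos: (N, 2) array of positions, yaw: (N,) array of angles.
    Row-vector form of Q(yaw_i)^T (p_j - p_i) is (p_j - p_i) @ Q(yaw_i).
    """
    rel_pos = (pos - pos[i]) @ rot(yaw[i])
    diff = yaw - yaw[i]
    rel_yaw = np.arctan2(np.sin(diff), np.cos(diff))  # wrapped relative angle
    return rel_pos, rel_yaw

def global_transform(pos, yaw, phi, shift):
    """Rotate the whole scene by phi and translate it by shift."""
    return pos @ rot(phi).T + shift, yaw + phi

rng = np.random.default_rng(0)
pos = rng.normal(size=(4, 2))
yaw = rng.uniform(-np.pi, np.pi, size=4)

# The same scene seen from an arbitrarily different global coordinate system.
pos2, yaw2 = global_transform(pos, yaw, phi=1.3, shift=np.array([5.0, -2.0]))

local_a = to_local(pos, yaw, i=0)
local_b = to_local(pos2, yaw2, i=0)
```

The local representations `local_a` and `local_b` coincide up to floating-point error, so any network that consumes only these local quantities is unaffected by the choice of global coordinate system.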


Hyperbolic Procrustes Analysis Using Riemannian Geometry

Neural Information Processing Systems

Label-free alignment between datasets collected at different times, locations, or by different instruments is a fundamental scientific task. Hyperbolic spaces have recently provided a fruitful foundation for the development of informative representations of hierarchical data. Here, we take a purely geometric approach for label-free alignment of hierarchical datasets and introduce hyperbolic Procrustes analysis (HPA). HPA consists of new implementations of the three prototypical Procrustes analysis components: translation, scaling, and rotation, based on the Riemannian geometry of the Lorentz model of hyperbolic space. We analyze the proposed components, highlighting their useful properties for alignment. The efficacy of HPA, its theoretical properties, stability and computational efficiency are demonstrated in simulations.