uniqueness
Quantifying Modeling Interactions An Information Decomposition Framework
The recent explosion of interest in multimodal applications has resulted in a wide selection of datasets and methods for representing and integrating information from different modalities. Despite these empirical advances, there remain fundamental research questions: How can we quantify the interactions that are necessary to solve a multimodal task? Subsequently, what are the most suitable multimodal models to capture these interactions? To answer these questions, we propose an information-theoretic approach to quantify the degree of redundancy, uniqueness, and synergy relating input modalities with an output task. We term these three measures as the PID statistics of a multimodal distribution (or PID for short), and introduce two new estimators for these PID statistics that scale to high-dimensional distributions. To validate PID estimation, we conduct extensive experiments on both synthetic datasets where the PID is known and on large-scale multimodal benchmarks where PID estimations are compared with human annotations. Finally, we demonstrate their usefulness in (1) quantifying interactions within multimodal datasets, (2) quantifying interactions captured by multimodal models, (3) principled approaches for model selection, and (4) three real-world case studies engaging with domain experts in pathology, mood prediction, and robotic perception where our framework helps to recommend strong multimodal models for each application.
Phase transitions in Doi-Onsager, Noisy Transformer, and other multimodal models
Mun, Kyunghoo, Rosenzweig, Matthew
We study phase transitions for repulsive-attractive mean-field free energies on the circle. For a $\frac{1}{n+1}$-periodic interaction whose Fourier coefficients satisfy a certain decay condition, we prove that the critical coupling strength $K_c$ coincides with the linear stability threshold $K_\#$ of the uniform distribution and that the phase transition is continuous, in the sense that the uniform distribution is the unique global minimizer at criticality. The proof is based on a sharp coercivity estimate for the free energy obtained from the constrained Lebedev--Milin inequality. We apply this result to three motivating models for which the exact value of the phase transition and its (dis)continuity in terms of the model parameters was not fully known. For the two-dimensional Doi--Onsager model $W(ฮธ)=-|\sin(2ฯฮธ)|$, we prove that the phase transition is continuous at $K_c=K_\#=3ฯ/4$. For the noisy transformer model $W_ฮฒ(ฮธ)=(e^{ฮฒ\cos(2ฯฮธ)}-1)/ฮฒ$, we identify the sharp threshold $ฮฒ_*$ such that $K_c(ฮฒ) = K_\#(ฮฒ)$ and the phase transition is continuous for $ฮฒ\leq ฮฒ_*$, while $K_c(ฮฒ)
Unfolding with a Wasserstein Loss
Craig, Katy, Faktor, Benjamin, Nachman, Benjamin
Data unfolding -- the removal of noise or artifacts from measurements -- is a fundamental task across the experimental sciences. Of particular interest in the present work are applications of data unfolding in physics, in which context the dominant approach is RichardsonLucy (RL) deconvolution. The classical RL approach aims to find denoised data that, once passed through the noise model, is as close as possible to the measured data, in terms of Kullback-Leibler (KL) divergence. Fundamental to this approach is the hypothesis that the support of the measured data overlaps with the output of the noise model, so that the KL divergence correctly captures their similarity. In practice, this hypothesis is typically enforced by binning the measured data and noise model, introducing numerical error into the unfolding process. As a counterpoint to classical binned methods for unfolding, the present work studies an alternative formulation of the unfolding problem, using a Wasserstein loss instead of the KL divergence to quantify the similarity between the measured data and the output of the noise model. We establish sharp conditions for existence and uniqueness of optimizers; as a consequence we answer open questions of Li, et al. [23], regarding necessary conditions for existence and uniqueness in the case of transport map noise models. Following these theoretical results, we then develop a provably convergent generalized Sinkhorn algorithm to compute approximate optimizers. Our algorithm requires only empirical observations of the noise model and measured data and scales with the size of the data, rather than the ambient dimension.
The Exponentially Weighted Signature
Bloch, Alexandre, Cohen, Samuel N., Lyons, Terry, Mouterde, Joรซl, Walker, Benjamin
The signature is a canonical representation of a multidimensional path over an interval. However, it treats all historical information uniformly, offering no intrinsic mechanism for contextualising the relevance of the past. To address this, we introduce the Exponentially Weighted Signature (EWS), generalising the Exponentially Fading Memory (EFM) signature from diagonal to general bounded linear operators. These operators enable cross-channel coupling at the level of temporal weighting together with richer memory dynamics including oscillatory, growth, and regime-dependent behaviour, while preserving the algebraic strengths of the classical signature. We show that the EWS is the unique solution to a linear controlled differential equation on the tensor algebra, and that it generalises both state-space models and the Laplace and Fourier transforms of the path. The group-like structure of the EWS enables efficient computation and makes the framework amenable to gradient-based learning, with the full semigroup action parametrised by and learned through its generator. We use this framework to empirically demonstrate the expressivity gap between the EWS and both the signature and EFM on two SDE-based regression tasks.