supplemental material
Bridging Maximum Likelihood and Optimal Transport for Efficient Inference and Model Selection in Stochastic Block Models
Queric, Simon, Vincent-Cuaz, Cédric, Bouveyron, Charles, Corneli, Marco
We study inference in stochastic block models (SBMs) through the lens of optimal transport (OT). We first establish that maximum likelihood variational inference (MLVI) can be interpreted as a semi-relaxed Gromov-Wasserstein (srGW) projection with entropic regularization. While this formulation yields accurate clustering, the entropic regularization prevents transport plans from being sparse, hindering intrinsic model selection. Consequently, we investigate unregularized srGW estimators and prove that they consistently recover both the SBM connectivity matrix and the latent cluster assignments in the asymptotic regime. However, this asymptotic property does not translate into reliable model selection in finite samples, and calls for additional mechanisms to promote sparsity in the inferred cluster proportions. We empirically show that such a regularized formulation yields estimators that simultaneously recover model parameters and select the number of clusters in a single optimization problem, thereby avoiding costly grid search or heuristic model selection procedures.
Posterior Contraction Rates for Sparse Kolmogorov-Arnold Networks in Anisotropic Besov Spaces
Oh, Jeunghun, Lee, Kyeongwon, Lee, Jaeyong, Lin, Lizhen
We study posterior contraction rates for sparse Bayesian Kolmogorov-Arnold networks (KANs) over anisotropic Besov spaces, providing a statistical foundation for KANs from a Bayesian point of view. We show that sparse Bayesian KANs equipped with spike-and-slab-type sparsity priors attain near-minimax posterior contraction rates. In particular, the contraction rate depends on the intrinsic anisotropic smoothness of the underlying function. Moreover, by placing a hyperprior on a single model-size parameter, the resulting posterior adapts to unknown anisotropic smoothness and still achieves the corresponding near-minimax rate. A distinctive feature of our results, compared with those for standard sparse MLP-based models, is that the KAN depth can be kept fixed: owing to the flexibility of learnable spline edge functions, the required approximation complexity is controlled through the network width, spline-grid range and size, and parameter sparsity. Our analysis develops theoretical tools tailored to sparse spline-edge architectures, including approximation and complexity bounds for Bayesian KANs. We then extend the analysis to compositional Besov spaces and show that the contraction rates depend on the layerwise smoothness and effective dimension of the underlying compositional structure, thereby avoiding the curse of dimensionality. Together, the developed tools and findings advance the theoretical understanding of Bayesian neural networks and provide rigorous statistical foundations for KANs.
Ensemble Distributionally Robust Bayesian Optimisation
Ramazyan, Tigran, Derkach, Denis
We study zeroth-order optimisation under context distributional uncertainty, a setting commonly tackled using Bayesian optimisation (BO). A prevailing strategy to make BO more robust to the complex and noisy nature of data is to employ an ensemble as the surrogate model, thereby mitigating the weaknesses of any single model. In this study, we propose a novel algorithm for Ensemble Distributionally Robust Bayesian Optimisation that remains computationally tractable while managing continuous context. We obtain theoretical sublinear regret bounds, improving current state-of-the-art results. We show that our method's empirical behaviour aligns with its theoretical guarantees.
Heterogeneous Ordinal Structure Learning with Bayesian Nonparametric Complexity Discovery
Public attitudes toward artificial intelligence are heterogeneous, ordinally measured, and poorly captured by any single dependency graph. Existing ordinal structure learners assume a shared directed acyclic graph (DAG) across all respondents; recent heterogeneous ordinal graphical-model approaches focus on subgroup discovery rather than confirmatory cluster-specific DAG estimation; and latent profile analyses discard dependency structure entirely. We introduce a heterogeneous ordinal structure-learning framework combining monotone Gaussian score embedding, Bayesian nonparametric (BNP) complexity discovery via a truncated stick-breaking prior, and confirmatory fixed-K estimation with cluster-specific sparse DAG learning. The key methodological insight is a discovery-to-confirmation workflow: the nonparametric stage calibrates plausible archetype complexity, while inner-validated confirmatory refitting yields stable, interpretable structural estimates. On the 2024 Pew American Trends Panel AI attitudes survey, Wave 152 (W152; N = 4,788, 8 ordinal items), the confirmatory K*=5 model reduces holdout transformed-score mean squared error (MSE) by 25.8% over a single-graph baseline and by 4.6% over mixture-only clustering. A controlled tiered semi-synthetic benchmark calibrated to W152 structure validates recovery across difficulty regimes and transparently reveals failure modes under stress conditions.
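The truncated stick-breaking prior mentioned above can be illustrated with a minimal sketch. It assumes the standard construction with Beta(1, alpha) stick fractions and a final fraction fixed to 1 so the weights sum to one; the function name, the concentration value, and the truncation level are illustrative choices, not taken from the paper.

```python
import random


def truncated_stick_breaking(alpha, truncation, rng=None):
    """Draw mixture weights from a truncated stick-breaking construction.

    Assumption: v_k ~ Beta(1, alpha) for k < truncation, and the last
    fraction is set to 1 so that the weights sum to one.
    """
    if rng is None:
        rng = random.Random()
    weights = []
    remaining = 1.0  # length of the stick not yet assigned
    for k in range(truncation):
        is_last = k == truncation - 1
        v = 1.0 if is_last else rng.betavariate(1.0, alpha)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights


# Hypothetical settings: concentration 1.0, at most 10 archetypes.
weights = truncated_stick_breaking(alpha=1.0, truncation=10, rng=random.Random(0))
print([round(w, 3) for w in weights])
```

Smaller values of alpha tend to concentrate mass on the first few components, which is how such a prior can suggest a plausible number of clusters before a fixed-K refit.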
Arboretum: A Large Multimodal Dataset Enabling AI for Biodiversity (Supplemental Material)
Arboretum is a 134.6M sample dataset designed to advance AI for biodiversity applications by providing a large-scale, accurately annotated multimodal dataset that includes images and corresponding textual descriptions for a diverse set of species. Arboretum aims to facilitate the development of AI models for species identification, ecological monitoring, and agricultural research. Additionally, we introduce three new benchmark datasets: Arboretum-Unseen, Arboretum-LifeStages, and Arboretum-Balanced. As the authors of this submission, we affirm that we bear all responsibility in case of any rights violations or ethical issues associated with this work. We confirm that the submitted work is original, and if it includes third-party content, it is used with proper permissions and attributions.
Modality-Agnostic Topology Aware Localization - Supplemental Material
Farhad G. Zanjani, Ilia Karmanov, Hanno Ackermann, Daniel Dijkman, Simone Merlin, Max Welling, Fatih Porikli (Qualcomm AI Research)
Triplet sampling was implemented based on the temporal vicinity of samples. Since the input is sequential, for each sample in the sequence (called the anchor), we consider a small and a large temporal window with predefined fixed widths. Both windows are centered at the timestamp of the anchor. Any sample inside the small window can be considered a positive sample, and any sample outside the small window but inside the large window can be considered a negative sample. The widths of the temporal windows roughly depend on the speed of the observer in the environment.
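The windowing rule above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the uniform random choice among candidates, and the convention that each window's width is split evenly around the anchor are all assumptions.

```python
import random


def sample_triplet(timestamps, anchor_idx, small_width, large_width, rng=None):
    """Return (anchor, positive, negative) indices, or None if no triplet exists.

    Assumption: both windows are centered on the anchor timestamp, so a
    window of width w covers [t_anchor - w / 2, t_anchor + w / 2].
    """
    if rng is None:
        rng = random.Random()
    t_anchor = timestamps[anchor_idx]
    positives, negatives = [], []
    for i, t in enumerate(timestamps):
        if i == anchor_idx:
            continue
        distance = abs(t - t_anchor)
        if distance <= small_width / 2.0:
            positives.append(i)  # inside the small window
        elif distance <= large_width / 2.0:
            negatives.append(i)  # outside the small, inside the large window
    if not positives or not negatives:
        return None
    return anchor_idx, rng.choice(positives), rng.choice(negatives)


# Hypothetical sequence sampled once per second.
timestamps = [float(i) for i in range(20)]
print(sample_triplet(timestamps, anchor_idx=10, small_width=4.0, large_width=12.0))
```

In practice the two widths would be tuned to the observer's speed, as the text notes: a faster observer covers more ground per unit time, so narrower windows would be appropriate.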
2e6d9c6052e99fcdfa61d9b9da273ca2-Supplemental.pdf
As a "warm-up", and because it is of independent interest, we will first study an adaptation algorithm which picks the single best kernel from the meta tasks. Definition 7 (Adaptation by choosing one best kernel). With the set of base kernels $\{k_1, \dots, k_N\}$, $\hat{k} = \arg\max_{i} \hat{J}_{\lambda}^{n_e}(S^{\mathrm{tr}}_P, S^{\mathrm{tr}}_Q; k_i)$ is said to be the best kernel adaptation. Proposition 3 shows uniform convergence of $\hat{J}_{\lambda}$ for direct adaptation of a kernel class, whether a deep kernel or multiple kernel learning. For our analysis of choosing the best single kernel, however, we only need uniform convergence over a finite set, where we can obtain a slightly better rate. Let $\{k_i\}$ be a set of base kernels, whose power criteria on the corresponding distributions are $J_i = J(P, Q; k_i)$, and let $s_0 = \min_{i \in [N]} \sigma^2_{H_1}(P, Q; k_i)$.
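The choose-one-best rule in Definition 7 amounts to an arg max over a finite list of kernels. The sketch below illustrates only that selection step. As a loudly stated assumption, it scores each kernel with a plain biased MMD² estimate as a stand-in; it does not implement the regularized power criterion from the text, and the Gaussian bandwidths and toy samples are hypothetical.

```python
import math


def gaussian_kernel(bandwidth):
    """Return a 1-D Gaussian kernel with the given bandwidth."""
    def k(x, y):
        return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))
    return k


def mmd2_biased(xs, ys, k):
    """Biased MMD^2 estimate (a stand-in score, NOT the paper's criterion)."""
    kxx = sum(k(a, b) for a in xs for b in xs) / (len(xs) ** 2)
    kyy = sum(k(a, b) for a in ys for b in ys) / (len(ys) ** 2)
    kxy = sum(k(a, b) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2.0 * kxy


def choose_best_kernel(kernels, criterion, sample_p, sample_q):
    """Return the index of the kernel maximizing the criterion, plus all scores."""
    scores = [criterion(sample_p, sample_q, k) for k in kernels]
    best = max(range(len(kernels)), key=scores.__getitem__)
    return best, scores


# Hypothetical base kernels and toy training samples.
bandwidths = [0.1, 1.0, 10.0]
kernels = [gaussian_kernel(b) for b in bandwidths]
sample_p = [0.0, 0.2, 0.4, 0.6, 0.8]
sample_q = [1.0, 1.2, 1.4, 1.6, 1.8]
best, scores = choose_best_kernel(kernels, mmd2_biased, sample_p, sample_q)
print(bandwidths[best], scores)
```

Because the search runs over a finite set, any scoring function with the same call signature can be passed as `criterion` without changing the selection logic.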
Implicit Neural Representations with Levels-of-Experts
Coordinate-based networks, usually in the form of MLPs, have been successfully applied to the task of predicting high-frequency but low-dimensional signals using coordinate inputs. To scale them to model large-scale signals, previous works resort to hybrid representations, combining a coordinate-based network with a grid-based representation, such as sparse voxels. However, such approaches lack a compact global latent representation in their grids, making it difficult to model a distribution of signals, which is important for generalization tasks. To address this limitation, we propose the Levels-of-Experts (LoE) framework, which is a novel coordinate-based representation consisting of an MLP with periodic, position-dependent weights arranged hierarchically. For each linear layer of the MLP, multiple candidate values of its weight matrix are tiled and replicated across the input space, with different layers replicating at different frequencies. Based on the input, only one of the weight matrices is chosen for each layer. This greatly increases the model capacity without incurring extra computation or compromising generalization capability. We show that the new representation is an efficient and competitive drop-in replacement for a wide range of tasks, including signal fitting, novel view synthesis, and generative modeling.
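The idea of tiling candidate weight matrices across the input space can be sketched in one dimension. This is a simplified illustration under stated assumptions, not the LoE architecture itself: the scalar coordinate, the floor-and-modulo tiling rule, the ReLU activation, the absence of biases, and all names and sizes are hypothetical choices.

```python
import math
import random


def tile_index(x, frequency, num_experts):
    """Pick which candidate weight matrix applies at coordinate x.

    Assumption: candidates repeat periodically, one per cell of width
    1 / frequency, cycling through num_experts candidates.
    """
    return int(math.floor(x * frequency)) % num_experts


def make_layer(num_experts, in_dim, out_dim, rng):
    """Create num_experts random candidate weight matrices (out_dim x in_dim)."""
    return [
        [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]
        for _ in range(num_experts)
    ]


def loe_forward(x, layers, frequencies):
    """Evaluate the network at scalar coordinate x.

    Each layer selects exactly one weight matrix based on x, so the cost
    per query matches that of a plain MLP of the same width.
    """
    h = [x]
    for depth, (experts, freq) in enumerate(zip(layers, frequencies)):
        weights = experts[tile_index(x, freq, len(experts))]
        h = [sum(w * v for w, v in zip(row, h)) for row in weights]
        if depth < len(layers) - 1:
            h = [max(0.0, v) for v in h]  # ReLU, an illustrative choice
    return h


rng = random.Random(0)
# Two layers tiled at different (hypothetical) frequencies.
layers = [make_layer(2, 1, 4, rng), make_layer(4, 4, 1, rng)]
frequencies = [2.0, 8.0]
print(loe_forward(0.3, layers, frequencies))
```

With 2 candidates in the first layer and 4 in the second, the network can realize up to 8 distinct weight combinations across the input space while storing only 6 matrices, which illustrates how capacity can grow without extra per-query computation.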