Goto

Collaborating Authors

 kernel function


CALM-PDE: Continuous and Adaptive Convolutions for Latent Space Modeling of Time-dependent PDEs

Neural Information Processing Systems

Solving time-dependent Partial Differential Equations (PDEs) using a densely discretized spatial domain is a fundamental problem in various scientific and engineering disciplines, including modeling climate phenomena and fluid dynamics.


Multi-View Oriented GPLVM: Expressiveness and Efficiency

Neural Information Processing Systems

The multi-view Gaussian process latent variable model (MV-GPLVM) aims to learn a unified representation from multi-view data but is hindered by challenges such as limited kernel expressiveness and low computational efficiency. To overcome these issues, we first introduce a new duality between the spectral density and the kernel function. By modeling the spectral density with a bivariate Gaussian mixture, we then derive a generic and expressive kernel termed Next-Gen Spectral Mixture (NG-SM) for MV-GPLVMs. To address the inherent computational inefficiency of the NG-SM kernel, we design a new form of random Fourier feature approximation. Combined with a tailored reparameterization trick, this approximation enables scalable variational inference for both the model and the unified latent representations. Numerical evaluations across a diverse range of multi-view datasets demonstrate that our proposed method consistently outperforms state-of-the-art models in learning meaningful latent representations.


When Kernels Multiply, Clusters Unify: Fusing Embeddings with the Kronecker Product

Neural Information Processing Systems

State-of-the-art embeddings often capture distinct yet complementary discriminative features: For instance, one image embedding model may excel at distinguishing fine-grained textures, while another focuses on object-level structure. Motivated by this observation, we propose a principled approach to fuse such complementary representations through kernel multiplication. Multiplying the kernel similarity functions of two embeddings allows their discriminative structures to interact, producing a fused representation whose kernel encodes the union of the clusters identified by each parent embedding. This formulation also provides a natural way to construct joint kernels for paired multi-modal data (e.g., image-text tuples), where the product of modality-specific kernels inherits structure from both domains. We highlight that this kernel product is mathematically realized via the Kronecker product of the embedding feature maps, yielding our proposed KrossFuse framework for embedding fusion. To address the computational cost of the resulting high-dimensional Kronecker space, we further develop RP KrossFuse, a scalable variant that leverages random projections for efficient approximation. As a key application, we use this framework to bridge the performance gap between cross-modal embeddings (e.g., CLIP, BLIP) and unimodal experts (e.g., DINOv2, E5). Experiments show that RP KrossFuse effectively integrates these models, enhancing modality-specific performance while preserving cross-modal alignment.


Iterative Missing Data Imputation with Model Form Adaptation and Non-Missing Feature Supervision

Neural Information Processing Systems

Iterative imputation is a prevalent method for missing data imputation, where each feature is imputed iteratively by treating it as a target variable estimated from all other features. However, iterative imputation method suffers from two principal limitations: it imposes a single parametric model form to impute all features, neglecting the potential for optimal models to vary among features, which risks model misspecification; and it assumes every feature contains missing values, overlooking the potential presence of non-missing features, termed as oracle features, which are informative for imputation. To address these limitations, we propose kernel point imputation (KPI), a bi-level optimization framework for iterative missing data imputation. At the inner level, KPI adaptively learns the optimal model form for each feature within a reproducing kernel Hilbert space, addressing limitation . At the outer level, KPI utilizes oracle features as supervisory signals to iteratively refine the imputations, addressing limitation . Experiments demonstrate that KPI outperforms competitive imputation methods. Code is available at https://github.com/FMLYD/kpi.git.


Mode-Shape Expansion Using Physics-Constrained Gaussian Process Regression

arXiv.org Machine Learning

This paper addresses the challenge of reconstructing full-field structural mode shapes from sparse sensor data. While Gaussian Process Regression (GPR) offers a robust non-parametric framework for spatial interpolation and uncertainty quantification, standard formulations often yield physically inconsistent mode-shape reconstructions under sparse sensing conditions. A Physics-Constrained Single-Output Gaussian Process (CONS-SOGP) framework is derived that utilizes independent modal kernels while coupling the optimization via a mass-orthogonality penalty. The paper presents derivations for the marginal likelihood, hyperparameter gradients, and penalty coupling. Numerical verification on a multi-degree-of-freedom structure demonstrates that the proposed method overcomes existing limitations in GP-based prediction, providing more accurate and reliable expanded mode shapes.


An Efficient Spatial Branch-and-Bound Algorithm for Global Optimization of Gaussian Process Posterior Mean Functions

arXiv.org Machine Learning

We study the deterministic global optimization of trained Gaussian process posterior mean functions over hyperrectangular domains. Although the posterior mean function has a compact closed-form representation, its global optimization is challenging because it remains nonlinear and nonconvex. Existing exact deterministic approaches become increasingly difficult to scale as the number of training data points grows, leading to approximation-based methods that improve tractability by optimizing a modified (inexact) objective. In this work, we propose PALM-Mean, a piecewise-analytic lower-bounding framework embedded in reduced-space spatial branch-and-bound. At each node, kernel terms that are locally important are replaced by a sign-aware piecewise-linear relaxation in an appropriate scalar distance variable, while the remaining terms are bounded analytically in closed form. We show this hybrid approach yields a valid lower bound for the posterior mean, while limiting the size of the branch-and-bound subproblems. We establish validity of the node lower bounds and $\varepsilon$-global convergence of the resulting algorithm. Computational results on synthetic benchmarks and real-world application problems show that PALM-Mean improves scalability relative to representative general-purpose deterministic global solvers, particularly as the number of training data points increases.


Supplementary Materials

Neural Information Processing Systems

We provide the supplements of "Contextual Gaussian Process Bandits with Neural Networks" here. Specifically, we discuss alternative acquisition functions that can be incorporated with the neural network-accompanied Gaussian process (NN-AGP) model in Section 6. In Section 7, we discuss the bandit algorithm with NN-AGP, where the neural network approximation error is considered. In Section 8, we provide the detailed proof of theorems. We provide the experimental details and include additional numerical experiments in Section 9. Last we discuss the limitations of NN-AGP and propose the potential approaches to addressing the limitations for future work, including sparse NN-AGP for alleviating computational burdens and transfer learning with NN-AGP to address cold-start issue; see Section 10. In the main text, we employ the upper confidence bound function as the acquisition function in the contextual Bayesian optimization approach. Here, we provide two alternative choices: Thompson sampling (TS) and knowledge gradient (KG). We describe the two procedures of the contextual GP bandit problems with NN-AGP, where the acquisition function is replaced by TS or KG. It chooses the action that maximizes the expected reward with respect to a random belief that is drawn for a posterior distribution. Besides the multi-armed bandit problems, TS has also achieved both theoretical and practical success in BO and Gaussian process regression. For more detailed discussions on TS, we refer to [87, 88]. Specifically, we propose a neural network-accompanied Gaussian process Thompson sampling (NNAGP-TS) approach to address contextual GP bandits. The approach works as follows. In each iteration, NN-AGP-TS first fits an NN-AGP model with the historic data. Then, given the current contextual variable, a realization of the Gaussian process with respect to x X is sampled from the posterior distribution conditional on the historic data1.


Kernel Identification Through Transformers

Neural Information Processing Systems

Kernel selection plays a central role in determining the performance of Gaussian Process (GP) models, as the chosen kernel determines both the inductive biases and prior support of functions under the GP prior. This work addresses the challenge of constructing custom kernel functions for high-dimensional GP regression models. Drawing inspiration from recent progress in deep learning, we introduce a novel approach named KITT: Kernel Identification Through Transformers. KITT exploits a transformer-based architecture to generate kernel recommendations in under 0.1 seconds, which is several orders of magnitude faster than conventional kernel search algorithms. We train our model using synthetic data generated from priors over a vocabulary of known kernels. By exploiting the nature of the selfattention mechanism, KITT is able to process datasets with inputs of arbitrary dimension. We demonstrate that kernels chosen by KITT yield strong performance over a diverse collection of regression benchmarks.


Supplementary Material

Neural Information Processing Systems

We say a real-valued random variable X is -sub-Gaussian if it its mean is zero and for all " 2 R we have E[exp("X)] exp Such assumptions on the noise variables are frequently used in bandit optimization. Typically, in kernelized bandits, we assume that unknown f 2F k(D;B)= {f 2H k(D): kfkk B}, where Hk(D) is the reproducing kernel Hilbert space of functions associated with the given positive-definite kernel function. Typically, the learner knows Fk(D;B), meaning that both k(,) and B are considered as input to the learner's algorithm. We outline some commonly used kernel functions k: D D! R, that we also consider: Linear kernel: klin(x,x0)= xTx0, Squared exponential kernel: kSE(x,x0)=exp kx x0k2 2l2, Matérn kernel: kMat(x,x0)= 2 Maximum information gain is a kernel-dependent quantity that measures the complexity of the given function class. It has first been introduced in [40], and since then it has been used in numerous works on Gaussian process bandits.


Kernel functions based on triplet comparisons

Neural Information Processing Systems

Given only information in the form of similarity triplets Object A is more similar to object B than to object C about a data set, we propose two ways of defining a kernel function on the data set. While previous approaches construct a low-dimensional Euclidean embedding of the data set that reflects the given similarity triplets, we aim at defining kernel functions that correspond to high-dimensional embeddings. These kernel functions can subsequently be used to apply any kernel method to the data set.