Learning Nonparametric Volterra Kernels with Gaussian Processes

Neural Information Processing Systems

This paper introduces a method for the nonparametric Bayesian learning of nonlinear operators, through the use of the Volterra series with kernels represented using Gaussian processes (GPs), which we term the nonparametric Volterra kernels model (NVKM). When the input function to the operator is unobserved and has a GP prior, the NVKM constitutes a powerful method for both single and multiple output regression, and can be viewed as a nonlinear and nonparametric latent force model. When the input function is observed, the NVKM can be used to perform Bayesian system identification. We use recent advances in efficient sampling of explicit functions from GPs to map process realisations through the Volterra series without resorting to numerical integration, allowing scalability through doubly stochastic variational inference, and avoiding the need for Gaussian approximations of the output processes. We demonstrate the performance of the model for both multiple output regression and system identification using standard benchmarks.
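For context, the Volterra series referenced above expands a nonlinear operator as a sum of multidimensional convolutions of the input with kernels of increasing order; the NVKM places GP priors on these kernels. A rough sketch with illustrative notation not taken from the paper (truncation order C, kernels G_c, input u, output y):

% Illustrative only: a truncated Volterra series with GP-distributed kernels.
y(t) = G_0 + \sum_{c=1}^{C} \int \!\cdots\! \int G_c(\tau_1,\dots,\tau_c)\,
       \prod_{j=1}^{c} u(t-\tau_j)\, d\tau_1 \cdots d\tau_c,
\qquad G_c \sim \mathcal{GP}(0, k_c).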


Spectral Editing of Activations for Large Language Model Alignment

Neural Information Processing Systems

Large language models (LLMs) often exhibit undesirable behaviours, such as generating untruthful or biased content. Editing their internal representations has been shown to be effective in mitigating such behaviours on top of the existing alignment methods. We propose a novel inference-time editing method, namely spectral editing of activations (SEA), to project the input representations into directions with maximal covariance with the positive demonstrations (e.g., truthful) while minimising covariance with the negative demonstrations (e.g., hallucinated). We also extend our method to non-linear editing using feature functions. We run extensive experiments on benchmarks concerning truthfulness and bias with six open-source LLMs of different sizes and model families. The results demonstrate the superiority of SEA in effectiveness, generalisation to similar tasks, as well as computation and data efficiency. We also show that SEA editing only has a limited negative impact on other model capabilities.
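As a concrete illustration of the editing idea (maximise covariance with positive demonstrations, minimise covariance with negative ones), here is a minimal NumPy sketch; the function name, subspace sizes, and the exact use of the SVD are assumptions made for illustration, not the paper's procedure.

import numpy as np

def spectral_edit(h, H, H_pos, H_neg, k_keep=64, k_remove=8):
    """Illustrative sketch (not SEA's exact algorithm): keep directions of
    large cross-covariance with positive demonstrations and project out the
    top directions of cross-covariance with negative demonstrations.
    H, H_pos, H_neg: (n, d) centred activation matrices; h: (d,) activation."""
    cov_pos = H.T @ H_pos / len(H)          # cross-covariance with positive demos
    cov_neg = H.T @ H_neg / len(H)          # cross-covariance with negative demos
    U_pos, _, _ = np.linalg.svd(cov_pos)    # editing directions in activation space
    U_neg, _, _ = np.linalg.svd(cov_neg)
    keep = U_pos[:, :k_keep]
    remove = U_neg[:, :k_remove]
    h_edit = keep @ (keep.T @ h)                     # project onto "positive" subspace
    return h_edit - remove @ (remove.T @ h_edit)     # suppress "negative" directions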


Contrasting with Symile: Simple Model-Agnostic Representation Learning for Unlimited Modalities

Neural Information Processing Systems

Contrastive learning methods, such as CLIP, leverage naturally paired data--for example, images and their corresponding text captions--to learn general representations that transfer efficiently to downstream tasks. While such approaches are generally applied to two modalities, domains such as robotics, healthcare, and video need to support many types of data at once. We show that the pairwise application of CLIP fails to capture joint information between modalities, thereby limiting the quality of the learned representations. To address this issue, we present Symile, a simple contrastive learning approach that captures higher-order information between any number of modalities. Symile provides a flexible, architecture-agnostic objective for learning modality-specific representations. To develop Symile's objective, we derive a lower bound on total correlation, and show that Symile representations for any set of modalities form a sufficient statistic for predicting the remaining modalities. Symile outperforms pairwise CLIP, even with modalities missing in the data, on cross-modal classification and retrieval across several experiments, including on an original multilingual dataset of 33M image, text and audio samples and a clinical dataset of chest X-rays, electrocardiograms, and laboratory measurements.
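To make the higher-order idea concrete, here is a minimal PyTorch sketch of a three-modality contrastive loss whose score is a multilinear inner product, a natural generalisation of CLIP's dot product; the function name, temperature, and negative-sampling scheme are illustrative assumptions, not Symile's exact objective.

import torch
import torch.nn.functional as F

def three_modality_contrastive_loss(za, zb, zc, temperature=0.07):
    """Illustrative sketch: for each anchor pair (zb_i, zc_i), the matched
    za_i should outscore every other za_j in the batch. The score of a triple
    is the multilinear inner product sum_d za[j,d] * zb[i,d] * zc[i,d].
    za, zb, zc: (n, d) L2-normalised embeddings of aligned triples."""
    n = za.shape[0]
    # logits[i, j]: candidate za_j scored against the anchor pair (zb_i, zc_i).
    logits = torch.einsum("jd,id,id->ij", za, zb, zc) / temperature
    targets = torch.arange(n, device=za.device)  # the matched triple is the positive
    return F.cross_entropy(logits, targets)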



Efficiently Learning One Hidden Layer Neural Networks From Queries

Neural Information Processing Systems

Model extraction attacks have renewed interest in the classic problem of learning neural networks from queries. This work gives the first polynomial-time algorithm for learning one hidden layer neural networks provided black-box access to the network.


A Measure-Theoretic Approach to Kernel Conditional Mean Embeddings

Neural Information Processing Systems

We present an operator-free, measure-theoretic approach to the conditional mean embedding (CME) as a random variable taking values in a reproducing kernel Hilbert space. While the kernel mean embedding of unconditional distributions has been defined rigorously, the existing operator-based approach of the conditional version depends on stringent assumptions that hinder its analysis. We overcome this limitation via a measure-theoretic treatment of CMEs. We derive a natural regression interpretation to obtain empirical estimates, and provide a thorough theoretical analysis thereof, including universal consistency. As natural by-products, we obtain the conditional analogues of the maximum mean discrepancy and Hilbert-Schmidt independence criterion, and demonstrate their behaviour via simulations.
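For reference, the regression interpretation mentioned above recovers the standard empirical CME estimator, which takes the form of a (vector-valued) kernel ridge regression; the notation here is illustrative rather than the paper's:

% Given samples (x_i, y_i)_{i=1}^n, kernels k_X, k_Y and regularisation \lambda > 0:
\hat{\mu}_{Y \mid X = x} = \sum_{i=1}^{n} \beta_i(x)\, k_Y(y_i, \cdot),
\qquad
\beta(x) = (K_X + n\lambda I)^{-1} \mathbf{k}_X(x),
% where K_X = (k_X(x_i, x_j))_{ij} and \mathbf{k}_X(x) = (k_X(x_1, x), \dots, k_X(x_n, x))^\top.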




Sharper Generalization Bounds for Pairwise Learning: Supplementary Material

Neural Information Processing Systems

To prove Theorem 1, we first introduce several lemmas. The following lemma, attributed to [7], provides moment bounds for a sum of weakly dependent, mean-zero random functions whose increments are bounded under a change of any single coordinate. Bounds on the moments of a random variable can in turn be used to establish concentration inequalities, as shown in the next lemma [4, 16].

Lemma A.2. Let $a, b \in \mathbb{R}$

The following lemma controls how much the output of a stable algorithm can change when two examples of the training dataset are perturbed. With these lemmas in place, we give the proof of Theorem 1, which establishes high-probability bounds on the generalization gap.
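As a generic illustration (not the exact statement of Lemma A.2, which is truncated above) of how such moment bounds yield high-probability bounds: if a random variable $Z$ satisfies $(\mathbb{E}|Z|^p)^{1/p} \le \sqrt{p}\,a + p\,b$ for all $p \ge 2$ with $a, b \ge 0$, then Markov's inequality applied at $p = t$ gives, for any $t \ge 2$,

\Pr\big(|Z| \ge e(\sqrt{t}\,a + t\,b)\big)
  \le \frac{\mathbb{E}|Z|^{t}}{e^{t}(\sqrt{t}\,a + t\,b)^{t}}
  \le e^{-t},

i.e. $|Z| \le e\big(a\sqrt{\log(1/\delta)} + b\log(1/\delta)\big)$ with probability at least $1-\delta$, whenever $\log(1/\delta) \ge 2$.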