Goto

Collaborating Authors

 Banff


A Diffusion Model Framework for Unsupervised Neural Combinatorial Optimization

arXiv.org Machine Learning

Learning to sample from intractable distributions over discrete sets without relying on corresponding training data is a central problem in a wide range of fields, including Combinatorial Optimization. Currently, popular deep learning-based approaches rely primarily on generative models that yield exact sample likelihoods. This work introduces a method that lifts this restriction and opens the possibility to employ highly expressive latent variable models like diffusion models. Our approach is conceptually based on a loss that upper bounds the reverse Kullback-Leibler divergence and evades the requirement of exact sample likelihoods. We experimentally validate our approach in data-free Combinatorial Optimization and demonstrate that our method achieves a new state-of-the-art on a wide range of benchmark problems.


Demystifying SGD with Doubly Stochastic Gradients

arXiv.org Machine Learning

Optimization objectives in the form of a sum of intractable expectations are rising in importance (e.g., diffusion models, variational autoencoders, and many more), a setting also known as "finite sum with infinite data." For these problems, a popular strategy is to employ SGD with doubly stochastic gradients (doubly SGD): the expectations are estimated using the gradient estimator of each component, while the sum is estimated by subsampling over these estimators. Despite its popularity, little is known about the convergence properties of doubly SGD, except under strong assumptions such as bounded variance. In this work, we establish the convergence of doubly SGD with independent minibatching and random reshuffling under general conditions, which encompasses dependent component gradient estimators. In particular, for dependent estimators, our analysis allows fined-grained analysis of the effect correlations. As a result, under a per-iteration computational budget of $b \times m$, where $b$ is the minibatch size and $m$ is the number of Monte Carlo samples, our analysis suggests where one should invest most of the budget in general. Furthermore, we prove that random reshuffling (RR) improves the complexity dependence on the subsampling noise.


MEMoE: Enhancing Model Editing with Mixture of Experts Adaptors

arXiv.org Artificial Intelligence

Model editing aims to efficiently alter the behavior of Large Language Models (LLMs) within a desired scope, while ensuring no adverse impact on other inputs. Recent years have witnessed various model editing methods been proposed. However, these methods either exhibit poor overall performance or struggle to strike a balance between generalization and locality. We propose MEMoE, a model editing adapter utilizing a Mixture of Experts (MoE) architecture with a knowledge anchor routing strategy. MEMoE updates knowledge using a bypass MoE structure, keeping the original parameters unchanged to preserve the general ability of LLMs. And, the knowledge anchor routing ensures that inputs requiring similar knowledge are routed to the same expert, thereby enhancing the generalization of the updated knowledge. Experimental results show the superiority of our approach over both batch editing and sequential batch editing tasks, exhibiting exceptional overall performance alongside outstanding balance between generalization and locality. Our code will be available.


Representation and De-interleaving of Mixtures of Hidden Markov Processes

arXiv.org Machine Learning

De-interleaving of the mixtures of Hidden Markov Processes (HMPs) generally depends on its representation model. Existing representation models consider Markov chain mixtures rather than hidden Markov, resulting in the lack of robustness to non-ideal situations such as observation noise or missing observations. Besides, de-interleaving methods utilize a search-based strategy, which is time-consuming. To address these issues, this paper proposes a novel representation model and corresponding de-interleaving methods for the mixtures of HMPs. At first, a generative model for representing the mixtures of HMPs is designed. Subsequently, the de-interleaving process is formulated as a posterior inference for the generative model. Secondly, an exact inference method is developed to maximize the likelihood of the complete data, and two approximate inference methods are developed to maximize the evidence lower bound by creating tractable structures. Then, a theoretical error probability lower bound is derived using the likelihood ratio test, and the algorithms are shown to get reasonably close to the bound. Finally, simulation results demonstrate that the proposed methods are highly effective and robust for non-ideal situations, outperforming baseline methods on simulated and real-life data.


ImplicitSLIM and How it Improves Embedding-based Collaborative Filtering

arXiv.org Artificial Intelligence

Sparse linear methods (SLIM) and their variations show outstanding performance, but they are memory-intensive and hard to scale. ImplicitSLIM improves embedding-based models by extracting embeddings from SLIM-like models in a computationally cheap and memory-efficient way, without explicit learning of heavy SLIM-like models. We show that ImplicitSLIM improves performance and speeds up convergence for both state of the art and classical collaborative filtering methods. Learnable embeddings are a core part of many collaborative filtering (CF) models. In this work, we propose an approach able to improve a wide variety of collaborative filtering models with learnable embeddings. Item-item methods, including kNN-based approaches (Sarwar et al., 2001) and sparse linear methods (SLIM) (Ning & Karypis, 2011), are making predictions based on item-item similarity. Previous research shows that the item-item weight matrix learned by SLIM-like models can become a part of other collaborative filtering models; e.g., RecWalk uses it as a transition probability matrix (Nikolakopoulos & Karypis, 2019). In this work, we reuse the item-item weight matrix in order to enrich embedding-based models with information on item-item interactions. Another motivation for our approach stems from nonlinear dimensionality reduction methods (e.g., VAEs) applied to collaborative filtering (Shenbin et al., 2020). We consider a group of manifold learning methods that aim to preserve the structure of data in the embedding space, that is, they force embeddings of similar objects to be similar.


Large Brain Model for Learning Generic Representations with Tremendous EEG Data in BCI

arXiv.org Artificial Intelligence

The current electroencephalogram (EEG) based deep learning models are typically designed for specific datasets and applications in brain-computer interaction (BCI), limiting the scale of the models and thus diminishing their perceptual capabilities and generalizability. Recently, Large Language Models (LLMs) have achieved unprecedented success in text processing, prompting us to explore the capabilities of Large EEG Models (LEMs). We hope that LEMs can break through the limitations of different task types of EEG datasets, and obtain universal perceptual capabilities of EEG signals through unsupervised pre-training. Then the models can be fine-tuned for different downstream tasks. However, compared to text data, the volume of EEG datasets is generally small and the format varies widely. For example, there can be mismatched numbers of electrodes, unequal length data samples, varied task designs, and low signal-to-noise ratio. To overcome these challenges, we propose a unified foundation model for EEG called Large Brain Model (LaBraM). LaBraM enables cross-dataset learning by segmenting the EEG signals into EEG channel patches. Vector-quantized neural spectrum prediction is used to train a semantically rich neural tokenizer that encodes continuous raw EEG channel patches into compact neural codes. We then pre-train neural Transformers by predicting the original neural codes for the masked EEG channel patches. The LaBraMs were pre-trained on about 2,500 hours of various types of EEG signals from around 20 datasets and validated on multiple different types of downstream tasks. Experiments on abnormal detection, event type classification, emotion recognition, and gait prediction show that our LaBraM outperforms all compared SOTA methods in their respective fields. Electroencephalography (EEG) is a method to record an electrogram of the spontaneous electrical activity of the brain.


NPGA: Neural Parametric Gaussian Avatars

arXiv.org Artificial Intelligence

The creation of high-fidelity, digital versions of human heads is an important stepping stone in the process of further integrating virtual components into our everyday lives. Constructing such avatars is a challenging research problem, due to a high demand for photo-realism and real-time rendering performance. In this work, we propose Neural Parametric Gaussian Avatars (NPGA), a data-driven approach to create high-fidelity, controllable avatars from multi-view video recordings. We build our method around 3D Gaussian Splatting for its highly efficient rendering and to inherit the topological flexibility of point clouds. In contrast to previous work, we condition our avatars' dynamics on the rich expression space of neural parametric head models (NPHM), instead of mesh-based 3DMMs. To this end, we distill the backward deformation field of our underlying NPHM into forward deformations which are compatible with rasterization-based rendering. All remaining fine-scale, expression-dependent details are learned from the multi-view videos. To increase the representational capacity of our avatars, we augment the canonical Gaussian point cloud using per-primitive latent features which govern its dynamic behavior. To regularize this increased dynamic expressivity, we propose Laplacian terms on the latent features and predicted dynamics. We evaluate our method on the public NeRSemble dataset, demonstrating that NPGA significantly outperforms the previous state-of-the-art avatars on the self-reenactment task by 2.6 PSNR. Furthermore, we demonstrate accurate animation capabilities from real-world monocular videos.


SEMF: Supervised Expectation-Maximization Framework for Predicting Intervals

arXiv.org Machine Learning

This work introduces the Supervised Expectation-Maximization Framework (SEMF), a versatile and model-agnostic framework that generates prediction intervals for datasets with complete or missing data. SEMF extends the Expectation-Maximization (EM) algorithm, traditionally used in unsupervised learning, to a supervised context, enabling it to extract latent representations for uncertainty estimation. The framework demonstrates robustness through extensive empirical evaluation across 11 tabular datasets, achieving$\unicode{x2013}$in some cases$\unicode{x2013}$narrower normalized prediction intervals and higher coverage than traditional quantile regression methods. Furthermore, SEMF integrates seamlessly with existing machine learning algorithms, such as gradient-boosted trees and neural networks, exemplifying its usefulness for real-world applications. The experimental results highlight SEMF's potential to advance state-of-the-art techniques in uncertainty quantification.


"It depends": Configuring AI to Improve Clinical Usefulness Across Contexts

arXiv.org Artificial Intelligence

Artificial Intelligence (AI) repeatedly match or outperform radiologists in lab experiments. However, real-world implementations of radiological AI-based systems are found to provide little to no clinical value. This paper explores how to design AI for clinical usefulness in different contexts. We conducted 19 design sessions and design interventions with 13 radiologists from 7 clinical sites in Denmark and Kenya, based on three iterations of a functional AI-based prototype. Ten sociotechnical dependencies were identified as crucial for the design of AI in radiology. We conceptualised four technical dimensions that must be configured to the intended clinical context of use: AI functionality, AI medical focus, AI decision threshold, and AI Explainability. We present four design recommendations on how to address dependencies pertaining to the medical knowledge, clinic type, user expertise level, patient context, and user situation that condition the configuration of these technical dimensions.


Uncertainty Management in the Construction of Knowledge Graphs: a Survey

arXiv.org Artificial Intelligence

Knowledge Graphs (KGs) are a major asset for companies thanks to their great flexibility in data representation and their numerous applications, e.g., vocabulary sharing, Q/A or recommendation systems. To build a KG it is a common practice to rely on automatic methods for extracting knowledge from various heterogeneous sources. But in a noisy and uncertain world, knowledge may not be reliable and conflicts between data sources may occur. Integrating unreliable data would directly impact the use of the KG, therefore such conflicts must be resolved. This could be done manually by selecting the best data to integrate. This first approach is highly accurate, but costly and time-consuming. That is why recent efforts focus on automatic approaches, which represents a challenging task since it requires handling the uncertainty of extracted knowledge throughout its integration into the KG. We survey state-of-the-art approaches in this direction and present constructions of both open and enterprise KGs and how their quality is maintained. We then describe different knowledge extraction methods, introducing additional uncertainty. We also discuss downstream tasks after knowledge acquisition, including KG completion using embedding models, knowledge alignment, and knowledge fusion in order to address the problem of knowledge uncertainty in KG construction. We conclude with a discussion on the remaining challenges and perspectives when constructing a KG taking into account uncertainty.