Goto

Collaborating Authors

 Bayesian Learning


Decoding-based Regression

arXiv.org Machine Learning

Language models have recently been shown capable of performing regression tasks wherein numeric predictions are represented as decoded strings. In this work, we provide theoretical grounds for this capability and furthermore investigate the utility of causal auto-regressive sequence models when they are applied to any feature representation. We find that, despite being trained in the usual way - for next-token prediction via cross-entropy loss - decoding-based regression is as performant as traditional approaches for tabular regression tasks, while being flexible enough to capture arbitrary distributions, such as in the task of density estimation.


Combining physics-based and data-driven models: advancing the frontiers of research with Scientific Machine Learning

arXiv.org Artificial Intelligence

Scientific Machine Learning (SciML) is a recently emerged research field which combines physics-based and data-driven models for the numerical approximation of differential problems. Physics-based models rely on the physical understanding of the problem at hand, subsequent mathematical formulation, and numerical approximation. Data-driven models instead aim to extract relations between input and output data without arguing any causality principle underlining the available data distribution. In recent years, data-driven models have been rapidly developed and popularized. Such a diffusion has been triggered by a huge availability of data (the so-called big data), an increasingly cheap computing power, and the development of powerful machine learning algorithms. SciML leverages the physical awareness of physics-based models and, at the same time, the efficiency of data-driven algorithms. With SciML, we can inject physics and mathematical knowledge into machine learning algorithms. Yet, we can rely on data-driven algorithms' capability to discover complex and non-linear patterns from data and improve the descriptive capacity of physics-based models. After recalling the mathematical foundations of digital modelling and machine learning algorithms, and presenting the most popular machine learning architectures, we discuss the great potential of a broad variety of SciML strategies in solving complex problems governed by partial differential equations. Finally, we illustrate the successful application of SciML to the simulation of the human cardiac function, a field of significant socio-economic importance that poses numerous challenges on both the mathematical and computational fronts. The corresponding mathematical model is a complex system of non-linear ordinary and partial differential equations describing the electromechanics, valve dynamics, blood circulation, perfusion in the coronary tree, and torso potential. Despite the robustness and accuracy of physics-based models, certain aspects, such as unveiling constitutive laws for cardiac cells and myocardial material properties, as well as devising efficient reduced order models to dominate the extraordinary computational complexity, have been successfully tackled by leveraging data-driven models.


BARNN: A Bayesian Autoregressive and Recurrent Neural Network

arXiv.org Artificial Intelligence

Autoregressive and recurrent networks have achieved remarkable progress across various fields, from weather forecasting to molecular generation and Large Language Models. Despite their strong predictive capabilities, these models lack a rigorous framework for addressing uncertainty, which is key in scientific applications such as PDE solving, molecular generation and Machine Learning Force Fields. To address this shortcoming we present BARNN: a variational Bayesian Autoregressive and Recurrent Neural Network. BARNNs aim to provide a principled way to turn any autoregressive or recurrent model into its Bayesian version. BARNN is based on the variational dropout method, allowing to apply it to large recurrent neural networks as well. We also introduce a temporal version of the "Variational Mixtures of Posteriors" prior (tVAMP-prior) to make Bayesian inference efficient and well-calibrated. Extensive experiments on PDE modelling and molecular generation demonstrate that BARNN not only achieves comparable or superior accuracy compared to existing methods, but also excels in uncertainty quantification and modelling long-range dependencies.


Joint Optimization of Prompt Security and System Performance in Edge-Cloud LLM Systems

arXiv.org Artificial Intelligence

Large language models (LLMs) have significantly facilitated human life, and prompt engineering has improved the efficiency of these models. However, recent years have witnessed a rise in prompt engineering-empowered attacks, leading to issues such as privacy leaks, increased latency, and system resource wastage. Though safety fine-tuning based methods with Reinforcement Learning from Human Feedback (RLHF) are proposed to align the LLMs, existing security mechanisms fail to cope with fickle prompt attacks, highlighting the necessity of performing security detection on prompts. In this paper, we jointly consider prompt security, service latency, and system resource optimization in Edge-Cloud LLM (EC-LLM) systems under various prompt attacks. To enhance prompt security, a vector-database-enabled lightweight attack detector is proposed. We formalize the problem of joint prompt detection, latency, and resource optimization into a multi-stage dynamic Bayesian game model. The equilibrium strategy is determined by predicting the number of malicious tasks and updating beliefs at each stage through Bayesian updates. The proposed scheme is evaluated on a real implemented EC-LLM system, and the results demonstrate that our approach offers enhanced security, reduces the service latency for benign users, and decreases system resource consumption compared to state-of-the-art algorithms.


A Comprehensive Analysis on Machine Learning based Methods for Lung Cancer Level Classification

arXiv.org Artificial Intelligence

Lung cancer is a major issue in worldwide public health, requiring early diagnosis using stable techniques. This work begins a thorough investigation of the use of machine learning (ML) methods for precise classification of lung cancer stages. A cautious analysis is performed to overcome overfitting issues in model performance, taking into account minimum child weight and learning rate. A set of machine learning (ML) models including XGBoost (XGB), LGBM, Adaboost, Logistic Regression (LR), Decision Tree (DT), Random Forest (RF), CatBoost, and k-Nearest Neighbor (k-NN) are run methodically and contrasted. Furthermore, the correlation between features and targets is examined using the deep neural network (DNN) model and thus their capability in detecting complex patternsis established. It is argued that several ML models can be capable of classifying lung cancer stages with great accuracy. In spite of the complexity of DNN architectures, traditional ML models like XGBoost, LGBM, and Logistic Regression excel with superior performance. The models perform better than the others in lung cancer prediction on the complete set of comparative metrics like accuracy, precision, recall, and F-1 score


Reducing Aleatoric and Epistemic Uncertainty through Multi-modal Data Acquisition

arXiv.org Artificial Intelligence

To generate accurate and reliable predictions, modern AI systems need to combine data from multiple modalities, such as text, images, audio, spreadsheets, and time series. Multi-modal data introduces new opportunities and challenges for disentangling uncertainty: it is commonly assumed in the machine learning community that epistemic uncertainty can be reduced by collecting more data, while aleatoric uncertainty is irreducible. However, this assumption is challenged in modern AI systems when information is obtained from different modalities. This paper introduces an innovative data acquisition framework where uncertainty disentanglement leads to actionable decisions, allowing sampling in two directions: sample size and data modality. The main hypothesis is that aleatoric uncertainty decreases as the number of modalities increases, while epistemic uncertainty decreases by collecting more observations. We provide proof-of-concept implementations on two multi-modal datasets to showcase our data acquisition framework, which combines ideas from active learning, active feature acquisition and uncertainty quantification.


Bayesian Optimization with Preference Exploration by Monotonic Neural Network Ensemble

arXiv.org Machine Learning

In MOO, there is usually not a single optimal solution, but a range of so-called Pareto optimal or non-dominated Many real-world black-box optimization problems solutions with different trade-offs. A widely adopted approach have multiple conflicting objectives. Rather aims to search for a good representation of these than attempting to approximate the entire set of Pareto-optimal solutions by maximizing their hypervolume. Pareto-optimal solutions, interactive preference Two prominent methods stand out in this regard: ParEGO learning, i.e., optimization with a decision maker (Knowles, 2006), which employs random augmented Chebyshev in the loop, allows to focus the search on the scalarizations for optimization in each iteration, and most relevant subset. However, few previous studies expected hypervolume maximization (Yang et al., 2019; have exploited the fact that utility functions Daulton et al., 2020), which directly maximizes the hypervolume are usually monotonic.


Joint Learning of Energy-based Models and their Partition Function

arXiv.org Machine Learning

Energy-based models (EBMs) offer a flexible framework for parameterizing probability distributions using neural networks. However, learning EBMs by exact maximum likelihood estimation (MLE) is generally intractable, due to the need to compute the partition function (normalization constant). In this paper, we propose a novel formulation for approximately learning probabilistic EBMs in combinatorially-large discrete spaces, such as sets or permutations. Our key idea is to jointly learn both an energy model and its log-partition, both parameterized as a neural network. Our approach not only provides a novel tractable objective criterion to learn EBMs by stochastic gradient descent (without relying on MCMC), but also a novel means to estimate the log-partition function on unseen data points. On the theoretical side, we show that our approach recovers the optimal MLE solution when optimizing in the space of continuous functions. Furthermore, we show that our approach naturally extends to the broader family of Fenchel-Young losses, allowing us to obtain the first tractable method for optimizing the sparsemax loss in combinatorially-large spaces. We demonstrate our approach on multilabel classification and label ranking.


Optimal generalisation and learning transition in extensive-width shallow neural networks near interpolation

arXiv.org Machine Learning

We consider a teacher-student model of supervised learning with a fully-trained 2-layer neural network whose width $k$ and input dimension $d$ are large and proportional. We compute the Bayes-optimal generalisation error of the network for any activation function in the regime where the number of training data $n$ scales quadratically with the input dimension, i.e., around the interpolation threshold where the number of trainable parameters $kd+k$ and of data points $n$ are comparable. Our analysis tackles generic weight distributions. Focusing on binary weights, we uncover a discontinuous phase transition separating a "universal" phase from a "specialisation" phase. In the first, the generalisation error is independent of the weight distribution and decays slowly with the sampling rate $n/d^2$, with the student learning only some non-linear combinations of the teacher weights. In the latter, the error is weight distribution-dependent and decays faster due to the alignment of the student towards the teacher network. We thus unveil the existence of a highly predictive solution near interpolation, which is however potentially hard to find.


Belief Roadmaps with Uncertain Landmark Evanescence

arXiv.org Artificial Intelligence

We would like a robot to navigate to a goal location while minimizing state uncertainty. To aid the robot in this endeavor, maps provide a prior belief over the location of objects and regions of interest. To localize itself within the map, a robot identifies mapped landmarks using its sensors. However, as the time between map creation and robot deployment increases, portions of the map can become stale, and landmarks, once believed to be permanent, may disappear. We refer to the propensity of a landmark to disappear as landmark evanescence. Reasoning about landmark evanescence during path planning, and the associated impact on localization accuracy, requires analyzing the presence or absence of each landmark, leading to an exponential number of possible outcomes of a given motion plan. To address this complexity, we develop BRULE, an extension of the Belief Roadmap. During planning, we replace the belief over future robot poses with a Gaussian mixture which is able to capture the effects of landmark evanescence. Furthermore, we show that belief updates can be made efficient, and that maintaining a random subset of mixture components is sufficient to find high quality solutions. We demonstrate performance in simulated and real-world experiments. Software is available at https://bit.ly/BRULE.