Bayesian Learning
Improving Trust Estimation in Human-Robot Collaboration Using Beta Reputation at Fine-grained Timescales
Dagdanov, Resul, Andrejevic, Milan, Liu, Dikai, Lin, Chin-Teng
When interacting with each other, humans adjust their behavior based on perceived trust. However, to achieve similar adaptability, robots must accurately estimate human trust at sufficiently granular timescales during the human-robot collaboration task. A beta reputation is a popular way to formalize a mathematical estimation of human trust. However, it relies on binary performance, which updates trust estimations only after each task concludes. Additionally, manually crafting a reward function is the usual method of building a performance indicator, which is labor-intensive and time-consuming. These limitations prevent efficiently capturing continuous changes in trust at more granular timescales throughout the collaboration task. Therefore, this paper presents a new framework for the estimation of human trust using a beta reputation at fine-grained timescales. To achieve granularity in beta reputation, we utilize continuous reward values to update trust estimations at each timestep of a task. We construct a continuous reward function using maximum entropy optimization to eliminate the need for the laborious specification of a performance indicator. The proposed framework improves trust estimations by increasing accuracy, eliminating the need for manually crafting a reward function, and advancing toward developing more intelligent robots. The source code is publicly available. https://github.com/resuldagdanov/robot-learning-human-trust
Soft Condorcet Optimization for Ranking of General Agents
Lanctot, Marc, Larson, Kate, Kaisers, Michael, Berthet, Quentin, Gemp, Ian, Diaz, Manfred, Maura-Rivero, Roberto-Rafael, Bachrach, Yoram, Koop, Anna, Precup, Doina
A common way to drive progress of AI models and agents is to compare their performance on standardized benchmarks. Comparing the performance of general agents requires aggregating their individual performances across a potentially wide variety of different tasks. In this paper, we describe a novel ranking scheme inspired by social choice frameworks, called Soft Condorcet Optimization (SCO), to compute the optimal ranking of agents: the one that makes the fewest mistakes in predicting the agent comparisons in the evaluation data. This optimal ranking is the maximum likelihood estimate when evaluation data (which we view as votes) are interpreted as noisy samples from a ground truth ranking, a solution to Condorcet's original voting system criteria. SCO ratings are maximal for Condorcet winners when they exist, which we show is not necessarily true for the classical rating system Elo. We propose three optimization algorithms to compute SCO ratings and evaluate their empirical performance. When serving as an approximation to the Kemeny-Young voting method, SCO rankings are on average 0 to 0.043 away from the optimal ranking in normalized Kendall-tau distance across 865 preference profiles from the PrefLib open ranking archive. In a simulated noisy tournament setting, SCO achieves accurate approximations to the ground truth ranking and the best among several baselines when 59\% or more of the preference data is missing. Finally, SCO ranking provides the best approximation to the optimal ranking, measured on held-out test sets, in a problem containing 52,958 human players across 31,049 games of the classic seven-player game of Diplomacy.
Elliptical Wishart distributions: information geometry, maximum likelihood estimator, performance analysis and statistical learning
Ayadi, Imen, Bouchard, Florent, Pascal, Frédéric
This paper deals with Elliptical Wishart distributions - which generalize the Wishart distribution - in the context of signal processing and machine learning. Two algorithms to compute the maximum likelihood estimator (MLE) are proposed: a fixed point algorithm and a Riemannian optimization method based on the derived information geometry of Elliptical Wishart distributions. The existence and uniqueness of the MLE are characterized as well as the convergence of both estimation algorithms. Statistical properties of the MLE are also investigated such as consistency, asymptotic normality and an intrinsic version of Fisher efficiency. On the statistical learning side, novel classification and clustering methods are designed. For the $t$-Wishart distribution, the performance of the MLE and statistical learning algorithms are evaluated on both simulated and real EEG and hyperspectral data, showcasing the interest of our proposed methods.
Differentiability and Approximation of Probability Functions under Gaussian Mixture Models: A Bayesian Approach
Contador, Gonzalo, Pérez-Aros, Pedro, Vilches, Emilio
In this work, we study probability functions associated with Gaussian mixture models. Our primary focus is on extending the use of spherical radial decomposition for multivariate Gaussian random vectors to the context of Gaussian mixture models, which are not inherently spherical but only conditionally so. Specifically, the conditional probability distribution, given a random parameter of the random vector, follows a Gaussian distribution, allowing us to apply Bayesian analysis tools to the probability function. This assumption, together with spherical radial decomposition for Gaussian random vectors, enables us to represent the probability function as an integral over the Euclidean sphere. Using this representation, we establish sufficient conditions to ensure the differentiability of the probability function and provide and integral representation of its gradient. Furthermore, leveraging the Bayesian decomposition, we approximate the probability function using random sampling over the parameter space and the Euclidean sphere. Finally, we present numerical examples that illustrate the advantages of this approach over classical approximations based on random vector sampling.
Point processes with event time uncertainty
Cheng, Xiuyuan, Gong, Tingnan, Xie, Yao
Point processes are widely used statistical models for uncovering the temporal patterns in dependent event data. In many applications, the event time cannot be observed exactly, calling for the incorporation of time uncertainty into the modeling of point process data. In this work, we introduce a framework to model time-uncertain point processes possibly on a network. We start by deriving the formulation in the continuous-time setting under a few assumptions motivated by application scenarios. After imposing a time grid, we obtain a discrete-time model that facilitates inference and can be computed by first-order optimization methods such as Gradient Descent or Variation inequality (VI) using batch-based Stochastic Gradient Descent (SGD). The parameter recovery guarantee is proved for VI inference at an $O(1/k)$ convergence rate using $k$ SGD steps. Our framework handles non-stationary processes by modeling the inference kernel as a matrix (or tensor on a network) and it covers the stationary process, such as the classical Hawkes process, as a special case. We experimentally show that the proposed approach outperforms previous General Linear model (GLM) baselines on simulated and real data and reveals meaningful causal relations on a Sepsis-associated Derangements dataset.
Recursive Learning of Asymptotic Variational Objectives
Mastrototaro, Alessandro, Müller, Mathias, Olsson, Jimmy
General state-space models (SSMs) are widely used in statistical machine learning and are among the most classical generative models for sequential time-series data. SSMs, comprising latent Markovian states, can be subjected to variational inference (VI), but standard VI methods like the importance-weighted autoencoder (IWAE) lack functionality for streaming data. To enable online VI in SSMs when the observations are received in real time, we propose maximising an IWAE-type variational lower bound on the asymptotic contrast function, rather than the standard IWAE ELBO, using stochastic approximation. Unlike the recursive maximum likelihood method, which directly maximises the asymptotic contrast, our approach, called online sequential IWAE (OSIWAE), allows for online learning of both model parameters and a Markovian recognition model for inferring latent states. By approximating filter state posteriors and their derivatives using sequential Monte Carlo (SMC) methods, we create a particle-based framework for online VI in SSMs. This approach is more theoretically well-founded than recently proposed online variational SMC methods. We provide rigorous theoretical results on the learning objective and a numerical study demonstrating the method's efficiency in learning model parameters and particle proposal kernels.
Interacting Large Language Model Agents. Interpretable Models and Social Learning
Jain, Adit, Krishnamurthy, Vikram
This paper develops theory and algorithms for interacting large language model agents (LLMAs) using methods from statistical signal processing and microeconomics. While both fields are mature, their application to decision-making by interacting LLMAs remains unexplored. Motivated by Bayesian sentiment analysis on online platforms, we construct interpretable models and stochastic control algorithms that enable LLMAs to interact and perform Bayesian inference. Because interacting LLMAs learn from prior decisions and external inputs, they exhibit bias and herding behavior. Thus, developing interpretable models and stochastic control algorithms is essential to understand and mitigate these behaviors. This paper has three main results. First, we show using Bayesian revealed preferences from microeconomics that an individual LLMA satisfies the sufficient conditions for rationally inattentive (bounded rationality) utility maximization and, given an observation, the LLMA chooses an action that maximizes a regularized utility. Second, we utilize Bayesian social learning to construct interpretable models for LLMAs that interact sequentially with each other and the environment while performing Bayesian inference. Our models capture the herding behavior exhibited by interacting LLMAs. Third, we propose a stochastic control framework to delay herding and improve state estimation accuracy under two settings: (a) centrally controlled LLMAs and (b) autonomous LLMAs with incentives. Throughout the paper, we demonstrate the efficacy of our methods on real datasets for hate speech classification and product quality assessment, using open-source models like Mistral and closed-source models like ChatGPT. The main takeaway of this paper, based on substantial empirical analysis and mathematical formalism, is that LLMAs act as rationally bounded Bayesian agents that exhibit social learning when interacting.
XNB: Explainable Class-Specific NaIve-Bayes Classifier
Aguilar-Ruiz, Jesus S., Romero, Cayetano, Cicconardi, Andrea
In today's data-intensive landscape, where high-dimensional datasets are increasingly common, reducing the number of input features is essential to prevent overfitting and improve model accuracy. Despite numerous efforts to tackle dimensionality reduction, most approaches apply a universal set of features across all classes, potentially missing the unique characteristics of individual classes. This paper presents the Explainable Class-Specific Naive Bayes (XNB) classifier, which introduces two critical innovations: 1) the use of Kernel Density Estimation to calculate posterior probabilities, allowing for a more accurate and flexible estimation process, and 2) the selection of class-specific feature subsets, ensuring that only the most relevant variables for each class are utilized. Extensive empirical analysis on high-dimensional genomic datasets shows that XNB matches the classification performance of traditional Naive Bayes while drastically improving model interpretability. By isolating the most relevant features for each class, XNB not only reduces the feature set to a minimal, distinct subset for each class but also provides deeper insights into how the model makes predictions. This approach offers significant advantages in fields where both precision and explainability are critical.
Bayesian scaling laws for in-context learning
Arora, Aryaman, Jurafsky, Dan, Potts, Christopher, Goodman, Noah D.
In-context learning (ICL) is a powerful technique for getting language models to perform complex tasks with no training updates. Prior work has established strong correlations between the number of in-context examples provided and the accuracy of the model's predictions. In this paper, we seek to explain this correlation by showing that ICL approximates a Bayesian learner. This perspective gives rise to a family of novel Bayesian scaling laws for ICL. In experiments with \mbox{GPT-2} models of different sizes, our scaling laws exceed or match existing scaling laws in accuracy while also offering interpretable terms for task priors, learning efficiency, and per-example probabilities. To illustrate the analytic power that such interpretable scaling laws provide, we report on controlled synthetic dataset experiments designed to inform real-world studies of safety alignment. In our experimental protocol, we use SFT to suppress an unwanted existing model capability and then use ICL to try to bring that capability back (many-shot jailbreaking). We then experiment on real-world instruction-tuned LLMs using capabilities benchmarks as well as a new many-shot jailbreaking dataset. In all cases, Bayesian scaling laws accurately predict the conditions under which ICL will cause the suppressed behavior to reemerge, which sheds light on the ineffectiveness of post-training at increasing LLM safety.
Nonparametric estimation of Hawkes processes with RKHSs
Bonnet, Anna, Sangnier, Maxime
Hawkes processes are a class of past-dependent point processes, widely used in many applications such as seismology [Ogata, 1988], criminology [Olinde and Short, 2020] and neuroscience [Reynaud-Bouret et al., 2013] for their ability to capture complex dependence structures. In their multidimensional version [Ogata, 1988], Hawkes processes can model pairwise interactions between different types of events, allowing to recover a connectivity graph between different features. Originally developed by Hawkes [1971] in order to model self-exciting phenomena, where each event increases the probability of a new event occurring, many extensions have been proposed ever since. In particular, nonlinear Hawkes processes have been introduced notably to detect inhibiting interactions, when an event can decrease the probability of another one appearing. Hawkes processes with inhibition are notoriously more complicated to handle due to the loss of many properties of linear Hawkes processes such as the cluster representation and the branching structure of the process [Hawkes and Oakes, 1974]. Since the first article on nonlinear Hawkes processes [Brémaud and Massoulié, 1996] proving in particular their existence, many works have focused on inhibition in the past few years. Among them, limit theorems have been established in [Costa et al., 2020] while Duval et al. [2022] obtained mean-field results on the behaviour of two neuronal populations. Regarding statistical inference, in the frequentist setting we can mention the exact maximum likelihood procedure of Bonnet et al. [2023], the least-squares approach by Bacry et al. [2020] and the nonparametric approach based on Bernstein-type polynomials by Lemonnier and Vayatis [2014]. While the first one proposes an exact inference procedure, it is restricted to exponential kernels.