Bayesian Learning
Robust agents learn causal world models
Richens, Jonathan, Everitt, Tom
It has long been hypothesised that causal reasoning plays a fundamental role in robust and general intelligence. However, it is not known if agents must learn causal models in order to generalise to new domains, or if other inductive biases are sufficient. We answer this question, showing that any agent capable of satisfying a regret bound under a large set of distributional shifts must have learned an approximate causal model of the data generating process, which converges to the true causal model for optimal agents. We discuss the implications of this result for several research areas including transfer learning and causal inference.
Statistical learning for constrained functional parameters in infinite-dimensional models with applications in fair machine learning
Nabi, Razieh, Hejazi, Nima S., van der Laan, Mark J., Benkeser, David
Constrained learning has become increasingly important, especially in the realm of algorithmic fairness and machine learning. In these settings, predictive models are developed specifically to satisfy pre-defined notions of fairness. Here, we study the general problem of constrained statistical machine learning through a statistical functional lens. We consider learning a function-valued parameter of interest under the constraint that one or several pre-specified real-valued functional parameters equal zero or are otherwise bounded. We characterize the constrained functional parameter as the minimizer of a penalized risk criterion using a Lagrange multiplier formulation. We show that closed-form solutions for the optimal constrained parameter are often available, providing insight into mechanisms that drive fairness in predictive models. Our results also suggest natural estimators of the constrained parameter that can be constructed by combining estimates of unconstrained parameters of the data generating distribution. Thus, our estimation procedure for constructing fair machine learning algorithms can be applied in conjunction with any statistical learning approach and off-the-shelf software. We demonstrate the generality of our method by explicitly considering a number of examples of statistical fairness constraints and implementing the approach using several popular learning approaches.
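The Lagrange-multiplier idea in this abstract can be illustrated with a small worked case. The sketch below is not the authors' estimator. It assumes a squared-error risk and a single linear constraint of the form mean(f(X) * g(X)) = 0, where `g` is a hypothetical constraint direction (here, the centered group indicator). Under those assumptions the penalized risk is minimized pointwise by f = psi0 - (lambda/2) * g, and the multiplier follows from plugging this back into the constraint. All data values are invented for illustration.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)

def constrain_predictions(psi0, g):
    """Adjust unconstrained predictions psi0 so that mean(f * g) == 0.

    Assumes squared-error risk and one linear constraint; the closed form is
    f = psi0 - (lambda / 2) * g with lambda / 2 = mean(psi0 * g) / mean(g * g).
    """
    denom = mean([gi * gi for gi in g])
    if denom == 0:
        return list(psi0)  # constraint direction is zero: nothing to correct
    half_lambda = mean([p * gi for p, gi in zip(psi0, g)]) / denom
    return [p - half_lambda * gi for p, gi in zip(psi0, g)]

# Hypothetical data: group indicator a and unconstrained predictions psi0.
a = [0, 0, 1, 1, 1, 0]
psi0 = [0.2, 0.4, 0.9, 0.7, 0.8, 0.3]

a_bar = mean(a)
g = [ai - a_bar for ai in a]  # centered indicator: constraint is Cov(f, a) = 0

f = constrain_predictions(psi0, g)
constraint_value = mean([fi * gi for fi, gi in zip(f, g)])
```

In this toy case the constrained predictions are built only from unconstrained quantities (`psi0` and `g`), which mirrors the abstract's point that such estimators can be layered on top of any off-the-shelf learner.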
All-in-one simulation-based inference
Gloeckler, Manuel, Deistler, Michael, Weilbach, Christian, Wood, Frank, Macke, Jakob H.
Amortized Bayesian inference trains neural networks to solve stochastic inference problems using model simulations, thereby making it possible to rapidly perform Bayesian inference for any newly observed data. However, current simulation-based amortized inference methods are simulation-hungry and inflexible: They require the specification of a fixed parametric prior, simulator, and inference tasks ahead of time. Here, we present a new amortized inference method -- the Simformer -- which overcomes these limitations. By training a probabilistic diffusion model with transformer architectures, the Simformer outperforms current state-of-the-art amortized inference approaches on benchmark tasks and is substantially more flexible: It can be applied to models with function-valued parameters, it can handle inference scenarios with missing or unstructured data, and it can sample arbitrary conditionals of the joint distribution of parameters and data, including both posterior and likelihood. We showcase the performance and flexibility of the Simformer on simulators from ecology, epidemiology, and neuroscience, and demonstrate that it opens up new possibilities and application domains for amortized Bayesian inference on simulation-based models.
Enhancing Predictive Accuracy in Pharmaceutical Sales Through An Ensemble Kernel Gaussian Process Regression Approach
Mirshekari, Shahin, Moradi, Mohammadreza, Jafari, Hossein, Jafari, Mehdi, Ensaf, Mohammad
This research employs Gaussian Process Regression (GPR) with an ensemble kernel, integrating Exponential Squared, Revised Matérn, and Rational Quadratic kernels to analyze pharmaceutical sales data. Bayesian optimization was used to identify optimal kernel weights: 0.76 for Exponential Squared, 0.21 for Revised Matérn, and 0.13 for Rational Quadratic. The ensemble kernel demonstrated superior performance in predictive accuracy, achieving an R² score near 1.0, and significantly lower values in Mean Squared Error (MSE), Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE). These findings highlight the efficacy of ensemble kernels in GPR for predictive analytics in complex pharmaceutical sales datasets.
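A weighted-sum ensemble kernel of the kind described above might look like the following sketch. The weights are the ones reported in the abstract, used as stated (they are not renormalised here). Everything else is an assumption: the abstract does not define its "Revised Matérn" kernel, so a standard Matérn kernel with smoothness 3/2 stands in for it, and the length scales and `alpha` are placeholder values.

```python
import math

def squared_exponential(r, length_scale=1.0):
    """Squared-exponential (RBF) kernel as a function of the distance r."""
    return math.exp(-(r * r) / (2.0 * length_scale ** 2))

def matern32(r, length_scale=1.0):
    """Standard Matern kernel with nu = 3/2 (stand-in for the paper's variant)."""
    s = math.sqrt(3.0) * abs(r) / length_scale
    return (1.0 + s) * math.exp(-s)

def rational_quadratic(r, length_scale=1.0, alpha=1.0):
    """Rational quadratic kernel; alpha controls the mix of length scales."""
    return (1.0 + (r * r) / (2.0 * alpha * length_scale ** 2)) ** (-alpha)

# Kernel weights as reported in the abstract.
WEIGHTS = {"se": 0.76, "matern": 0.21, "rq": 0.13}

def ensemble_kernel(x1, x2, weights=WEIGHTS):
    """Weighted sum of the three base kernels for scalar inputs x1, x2."""
    r = x1 - x2
    return (
        weights["se"] * squared_exponential(r)
        + weights["matern"] * matern32(r)
        + weights["rq"] * rational_quadratic(r)
    )

def gram_matrix(xs, weights=WEIGHTS):
    """Covariance (Gram) matrix of the ensemble kernel over a list of inputs."""
    return [[ensemble_kernel(a, b, weights) for b in xs] for a in xs]

K = gram_matrix([0.0, 0.5, 2.0])
```

Because each base kernel is positive semi-definite and the weights are non-negative, the weighted sum is itself a valid covariance function, so `K` can be used wherever a single-kernel Gram matrix would be.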
Fast Fishing: Approximating BAIT for Efficient and Scalable Deep Active Image Classification
Huseljic, Denis, Hahn, Paul, Herde, Marek, Rauch, Lukas, Sick, Bernhard
Deep active learning (AL) seeks to minimize the annotation costs for training deep neural networks. Bait, a recently proposed AL strategy based on the Fisher Information, has demonstrated impressive performance across various datasets. However, Bait's high computational and memory requirements hinder its applicability to large-scale classification tasks, leading current research to neglect Bait in evaluations. This paper introduces two methods to enhance Bait's computational efficiency and scalability. Notably, we significantly reduce its time complexity by approximating the Fisher Information. In particular, we adapt the original formulation by i) taking the expectation over the most probable classes, and ii) constructing a binary classification task, leading to an alternative likelihood for gradient computations. Consequently, this allows the efficient use of Bait on large-scale datasets, including ImageNet. Our unified and comprehensive evaluation across a variety of datasets demonstrates that our approximations achieve strong performance with considerably reduced time complexity.
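To make approximation i) concrete, here is a toy sketch of a Fisher Information matrix whose expectation is restricted to the `k` most probable classes. It is an illustration under simplifying assumptions, not the paper's implementation: the Fisher Information is taken with respect to the logits of a softmax classifier (where the gradient of log p(y | x) is the one-hot vector for y minus the probability vector), and the function name and inputs are hypothetical.

```python
def fisher_logits_topk(probs, k):
    """Fisher Information w.r.t. softmax logits, summing only over the top-k classes.

    With k == len(probs) this is the exact matrix diag(p) - p p^T;
    smaller k drops the low-probability classes from the expectation.
    """
    c = len(probs)
    top = sorted(range(c), key=lambda j: probs[j], reverse=True)[:k]
    fisher = [[0.0] * c for _ in range(c)]
    for y in top:
        # Gradient of log p(y | x) w.r.t. the logits: one_hot(y) - probs.
        grad = [(1.0 if j == y else 0.0) - probs[j] for j in range(c)]
        for i in range(c):
            for j in range(c):
                fisher[i][j] += probs[y] * grad[i] * grad[j]
    return fisher

probs = [0.7, 0.2, 0.1]          # hypothetical predicted class probabilities
full = fisher_logits_topk(probs, k=3)
approx = fisher_logits_topk(probs, k=2)
```

The outer loop runs over `k` classes instead of all of them, which is where the saving comes from when the number of classes is large.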
ALICE: Combining Feature Selection and Inter-Rater Agreeability for Machine Learning Insights
Anasashvili, Bachana, Jeleskovic, Vahidin
The use of machine learning (ML) models for decision-making has become the norm not only in tech but in almost every business field, covering tasks such as search engine recommendations, customer churn prediction, credit risk scoring, energy load forecasting, and the deployment of personalized AI assistants. This comes at a time when developing ML models has become increasingly easy, both with the rise of free, open-source, and user-friendly Python libraries such as Keras, scikit-learn, and PyTorch, and with the rapid evolution of generative AI chatbots such as ChatGPT, Gemini, and Claude, which can provide coding assistance, if not ready-made modeling code. Such developments once again raise the question of interpretability in machine learning, which has been formulated in various ways in the literature and has attracted several proposed solutions, such as exploring causality (see Section 2.1), explainability (see Section 2.2), or abandoning black-box ML models altogether. From a philosophical standpoint, however, it is hard to see the benefit of highly model-specific or domain-specific, post-hoc, or complex solutions for gaining insight into the inner workings of ML models when the modeling task itself is becoming ever more accessible to laypeople. The common view on categorizing ML models in this regard holds that parametric models descending from statistics and econometrics, such as linear or logistic regression, are by nature more interpretable than their data-driven, non-parametric counterparts, such as tree-based models or neural networks.
Concentration properties of fractional posterior in 1-bit matrix completion
The problem of estimating a matrix from a set of its observed entries is commonly referred to as the matrix completion problem. In this work, we specifically address the scenario of binary observations, often termed 1-bit matrix completion. While numerous studies have explored Bayesian and frequentist methods for real-valued matrix completion, there has been a lack of theoretical work on Bayesian approaches to 1-bit matrix completion. We address this gap by considering a general, non-uniform sampling scheme and providing theoretical guarantees on the efficacy of the fractional posterior. Our contributions include obtaining concentration results for the fractional posterior and demonstrating its effectiveness in recovering the underlying parameter matrix. We accomplish this using two distinct types of prior distributions: low-rank factorization priors and a spectral scaled Student prior, with the latter requiring fewer assumptions. Importantly, our results are adaptive in that they do not require prior knowledge of the rank of the parameter matrix. Our findings are comparable to those found in the frequentist literature, yet require fewer restrictive assumptions.
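For readers unfamiliar with the term, the fractional posterior can be written as follows. This is the standard definition rather than a formula quoted from the paper, and the notation is assumed for illustration: M is the parameter matrix, \Omega the set of observed entries, Y_{ij} \in \{0, 1\} the binary observations, f a link function (for example the logistic function), and \pi the prior.

```latex
% Notation assumed for illustration; not taken from the paper itself.
\begin{align*}
L_n(M) &= \prod_{(i,j) \in \Omega} f(M_{ij})^{Y_{ij}} \bigl(1 - f(M_{ij})\bigr)^{1 - Y_{ij}}, \\
\pi_{n,\alpha}(\mathrm{d}M) &\propto L_n(M)^{\alpha} \, \pi(\mathrm{d}M), \qquad \alpha \in (0, 1).
\end{align*}
```

Setting \alpha = 1 would recover the usual posterior; tempering the likelihood with \alpha < 1 is what makes the posterior "fractional".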
Enhancing Fairness and Performance in Machine Learning Models: A Multi-Task Learning Approach with Monte-Carlo Dropout and Pareto Optimality
The term bias was first introduced in the machine learning domain by Tom Mitchell in his 1980 paper titled "The need for biases in learning generalizations" Mitchell [1980]. The concept of bias refers to giving importance to particular features to improve generalization. This general idea of bias in machine learning is positive and necessary for models to perform, eliminating the risk of hyper-focusing on specific samples over others. Conversely, bias can also be negative in machine learning. Negative bias can be defined as an inaccurate assumption made by a machine learning algorithm that is systematically or historically prejudiced against certain groups of people Zanna et al. [2022]. Decisions made by these biased algorithms could cause adverse effects on particular social groups, for example, those defined by sex, race, age, marital status, handicaps, etc., when used to make autonomous decisions in life-changing cases such as health, hiring, education, criminal sentencing, etc. Negative bias can be introduced into the machine learning pipeline in two main ways: through the data or through the algorithm itself Blanzeisky and Cunningham [2021]. Bias due to data, also known as a negative legacy Cunningham and Delany [2021], Kamishima et al. [2012], can be caused by an imbalance in the representation of different population categories
The Impact of Variable Ordering on Bayesian Network Structure Learning
Kitson, Neville K, Constantinou, Anthony C
Causal Bayesian Networks provide an important tool for reasoning under uncertainty with potential application to many complex causal systems. Structure learning algorithms that can tell us something about the causal structure of these systems are becoming increasingly important. In the literature, the validity of these algorithms is often tested for sensitivity over varying sample sizes, hyper-parameters, and occasionally objective functions. In this paper, we show that the order in which the variables are read from data can have a much greater impact on the accuracy of the algorithm than these factors. Because the variable ordering is arbitrary, any significant effect it has on learnt graph accuracy is concerning, and this raises questions about the validity of the results produced by algorithms that are sensitive to, but have not been assessed against, different variable orderings.
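One plausible way variable ordering can leak into a learnt graph is through tie-breaking between equally scored edges. The toy sketch below illustrates that mechanism only; it is not taken from the abstract and does not reproduce the paper's algorithms or experiments. It scores the two-node networks A -> B and B -> A with a BIC score, which is identical for both directions, and uses a hypothetical greedy step that keeps the first best edge it encounters. The data are invented.

```python
import math
from collections import Counter

def log_lik_edge(parent, child):
    """Maximised log-likelihood of the network parent -> child on discrete data."""
    n = len(parent)
    parent_counts = Counter(parent)
    joint_counts = Counter(zip(parent, child))
    ll = sum(c * math.log(c / n) for c in parent_counts.values())
    ll += sum(c * math.log(c / parent_counts[p]) for (p, _), c in joint_counts.items())
    return ll

def bic_edge(parent, child):
    """BIC score: log-likelihood minus a penalty on the number of free parameters."""
    n = len(parent)
    r_p, r_c = len(set(parent)), len(set(child))
    n_params = (r_p - 1) + r_p * (r_c - 1)
    return log_lik_edge(parent, child) - 0.5 * n_params * math.log(n)

def first_best_edge(data, order):
    """Scan directed edges in the given variable order; ties keep the earlier edge."""
    best, best_score = None, -math.inf
    for x in order:
        for y in order:
            if x == y:
                continue
            score = bic_edge(data[x], data[y])
            if score > best_score + 1e-9:  # strict improvement only
                best, best_score = (x, y), score
    return best

# Hypothetical binary data for two variables.
data = {"A": [0, 0, 1, 1, 1, 0, 1, 0], "B": [0, 0, 1, 1, 0, 0, 1, 1]}

edge_ab_order = first_best_edge(data, ["A", "B"])
edge_ba_order = first_best_edge(data, ["B", "A"])
```

With identical data, the chosen edge direction flips when the variable order flips, because both directions receive the same score and the earlier candidate wins the tie.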
Adversarial Patterns: Building Robust Android Malware Classifiers
Bhusal, Dipkamal, Rastogi, Nidhi
Machine learning models are increasingly being adopted across various fields, such as medicine, business, autonomous vehicles, and cybersecurity, to analyze vast amounts of data, detect patterns, and make predictions or recommendations. In the field of cybersecurity, these models have made significant improvements in malware detection. However, despite their ability to understand complex patterns from unstructured data, these models are susceptible to adversarial attacks that make slight modifications to malware samples, leading to misclassification from malicious to benign. Numerous defense approaches have been proposed to either detect such adversarial attacks or improve model robustness. These approaches have resulted in a multitude of attack and defense techniques and the emergence of a field known as 'adversarial machine learning'. In this survey paper, we provide a comprehensive review of adversarial machine learning in the context of Android malware classifiers. Android is the most widely used operating system globally and is an easy target for malicious agents. The paper first presents an extensive background on Android malware classifiers, followed by an examination of the latest advancements in adversarial attacks and defenses. Finally, the paper provides guidelines for designing robust malware classifiers and outlines research directions for the future.