Are we Forgetting about Compositional Optimisers in Bayesian Optimisation? Machine Learning

Bayesian optimisation presents a sample-efficient methodology for global optimisation. Within this framework, a crucial performance-determining subroutine is the maximisation of the acquisition function, a task complicated by the fact that acquisition functions tend to be non-convex and thus nontrivial to optimise. In this paper, we undertake a comprehensive empirical study of approaches to maximise the acquisition function. Additionally, by deriving novel, yet mathematically equivalent, compositional forms for popular acquisition functions, we recast the maximisation task as a compositional optimisation problem, allowing us to benefit from the extensive literature in this field. We highlight the empirical advantages of the compositional approach to acquisition function maximisation across 3958 individual experiments comprising synthetic optimisation tasks as well as tasks from Bayesmark. Given the generality of the acquisition function maximisation subroutine, we posit that the adoption of compositional optimisers has the potential to yield performance improvements across all domains in which Bayesian optimisation is currently being applied.

Explaining by Removing: A Unified Framework for Model Explanation Machine Learning

Researchers have proposed a wide variety of model explanation approaches, but it remains unclear how most methods are related or when one method is preferable to another. We establish a new class of methods, removal-based explanations, that are based on the principle of simulating feature removal to quantify each feature's influence. These methods vary in several respects, so we develop a framework that characterizes each method along three dimensions: 1) how the method removes features, 2) what model behavior the method explains, and 3) how the method summarizes each feature's influence. Our framework unifies 25 existing methods, including several of the most widely used approaches (SHAP, LIME, Meaningful Perturbations, permutation tests). This new class of explanation methods has rich connections that we examine using tools that have been largely overlooked by the explainability literature. To anchor removal-based explanations in cognitive psychology, we show that feature removal is a simple application of subtractive counterfactual reasoning. Ideas from cooperative game theory shed light on the relationships and trade-offs among different methods, and we derive conditions under which all removal-based explanations have information-theoretic interpretations. Through this analysis, we develop a unified framework that helps practitioners better understand model explanation tools, and that offers a strong theoretical foundation upon which future explainability research can build.
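The three dimensions named in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: it assumes features are removed by replacing them with a fixed baseline value, that the behavior being explained is the model's prediction for one input, and that each feature's influence is summarized by removing it alone. The names `leave_one_out_attributions`, `toy_model`, and `baseline` are hypothetical.

```python
def leave_one_out_attributions(model, x, baseline):
    """Score each feature by the prediction change when it alone is removed.

    Assumptions (illustrative, not the paper's prescription):
      1) removal  = replace the feature with its baseline value,
      2) behavior = the model's prediction for the single input x,
      3) summary  = difference between full and one-feature-removed predictions.
    """
    full_prediction = model(x)
    scores = []
    for i in range(len(x)):
        x_removed = list(x)          # copy so the original input is untouched
        x_removed[i] = baseline[i]   # simulate removing feature i
        scores.append(full_prediction - model(x_removed))
    return scores


def toy_model(features):
    """Hypothetical linear model used only for the example."""
    return 2.0 * features[0] + 3.0 * features[1]


scores = leave_one_out_attributions(toy_model, [1.0, 1.0], [0.0, 0.0])
```

For this linear toy model the scores equal the coefficients, since each feature moves from 0 to 1. Other removal-based methods would differ in at least one of the three choices, for example by averaging over many feature subsets instead of removing one feature at a time.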

Towards Metaheuristics "In the Large" Artificial Intelligence

Following decades of sustained improvement, metaheuristics are one of the great success stories of optimization research. However, in order for research in metaheuristics to avoid fragmentation and a lack of reproducibility, there is a pressing need for stronger scientific and computational infrastructure to support the development, analysis and comparison of new approaches. We argue that, via principled choice of infrastructure support, the field can pursue a higher level of scientific enquiry. We describe our vision and report on progress, showing how the adoption of common protocols for all metaheuristics can help liberate the potential of the field, easing the exploration of the design space of metaheuristics.

A Survey on the Explainability of Supervised Machine Learning Machine Learning

Predictions obtained by, e.g., artificial neural networks are highly accurate, but humans often perceive the models as black boxes. Insights into their decision making are mostly opaque to humans. Understanding the decision making is of paramount importance, particularly in highly sensitive areas such as healthcare or finance. The decision making behind the black boxes needs to become more transparent, accountable, and understandable for humans. This survey paper provides essential definitions and an overview of the different principles and methodologies of explainable Supervised Machine Learning (SML). We conduct a state-of-the-art survey that reviews past and recent explainable SML approaches and classifies them according to the introduced definitions. Finally, we illustrate principles by means of an explanatory case study and discuss important future directions.

A Nonconvex Framework for Structured Dynamic Covariance Recovery Machine Learning

Dynamic covariance models appear prominently in the analysis of time-series data and can provide fundamental insights into complex systems. They are important for applications ranging from computational finance and economics (Engle et al., 2019) to epidemiology (Fox and Dunson, 2015) and neuroscience (Foti and Fox, 2019). Our work is motivated by the study of dynamic functional brain network connectivity, defined as the time-indexed covariance of brain activity. Improving understanding of brain function is timely, as neurological and neuropsychiatric disorders ranging from Schizophrenia, Autism, and Alzheimer's to depression create burdens on affected individuals. To this end, understanding the variation of brain connectivity between individuals is believed to be a crucial step towards uncovering the mechanisms of neural information processing (Sakoğlu et al., 2010; Chang et al., 2016), with potentially transformative applications to understanding and treating neurological and neuropsychiatric disorders (Fox and Raichle, 2007; Calhoun et al., 2014). There are a variety of approaches for estimating dynamic covariances in the neuroscience literature. Discrete-state hidden Markov models construct simple explanations of brain connectivity in terms of recurring connectivity patterns (Vidaurre et al., 2017), and yet they fail to capture the smooth nature of brain dynamics (Shine et al., 2016a,b).

A contribution to Optimal Transport on incomparable spaces Machine Learning

Optimal Transport is a theory that makes it possible to define geometrical notions of distance between probability distributions and to find correspondences, or relationships, between sets of points. Many machine learning applications are derived from this theory, at the frontier between mathematics and optimization. This thesis studies the complex scenario in which the different data belong to incomparable spaces. In particular, we address the following questions: how can Optimal Transport be defined and applied between graphs, or between structured data? How can it be adapted when the data are varied and not embedded in the same metric space? This thesis proposes a set of Optimal Transport tools for these different cases. An important part is devoted to the study of the Gromov-Wasserstein distance, whose properties make it possible to define interesting transport problems on incomparable spaces. More broadly, we analyze the mathematical properties of the various proposed tools, we establish algorithmic solutions to compute them, and we study their applicability in numerous machine learning scenarios which cover, in particular, classification, simplification, and partitioning of structured data, as well as heterogeneous domain adaptation.

Mitigating Bias in Set Selection with Noisy Protected Attributes Machine Learning

Subset selection algorithms are ubiquitous in AI-driven applications, including online recruiting portals and image search engines, so it is imperative that these tools are not discriminatory on the basis of protected attributes such as gender or race. Currently, fair subset selection algorithms assume that the protected attributes are known as part of the dataset. However, attributes may be noisy due to errors during data collection or if they are imputed (as is often the case in real-world settings). While a wide body of work addresses the effect of noise on the performance of machine learning algorithms, its effect on fairness remains largely unexamined. We find that in the presence of noisy protected attributes, attempting to increase fairness without considering noise can, in fact, decrease the fairness of the result. To address this, we consider an existing noise model in which there is probabilistic information about the protected attributes (e.g., [19, 32, 56, 44]), and ask whether fair selection is possible under noisy conditions. We formulate a "denoised" selection problem which functions for a large class of fairness metrics; given the desired fairness goal, the solution to the denoised problem violates the goal by at most a small multiplicative amount with high probability. Although the denoised problem turns out to be NP-hard, we give a linear-programming based approximation algorithm for it. We empirically evaluate our approach on both synthetic and real-world datasets. Our empirical results show that this approach can produce subsets which significantly improve the fairness metrics despite the presence of noisy protected attributes, and, compared to prior noise-oblivious approaches, has better Pareto-tradeoffs between utility and fairness.

Reliable Off-policy Evaluation for Reinforcement Learning Machine Learning

Reinforcement learning (RL) has achieved phenomenal success in games and robotics in the past decade, which has also stimulated enthusiasm for extending these techniques to other areas including healthcare, education, autonomous driving, recommendation systems, etc. One of the major challenges in applying RL to these real-world applications, especially those that involve high-stakes environments, is the problem of off-policy evaluation (OPE): how one can evaluate a new policy before deployment, using only historical data collected from a different policy, known as the behavior policy. Indeed, for many practical applications, one may not have a faithful simulator of the domain from which a sufficient amount of data can be exploited to train the RL system, and it may not always be feasible to try out a new policy without causing unintended harm. For example, consider the problem of finding the best treatment plan for a patient, or testing the performance of an automated driving system, or suggesting a personalized curriculum for a student. In those tasks, conducting experimentation involves interactions with real people, so it can be costly to collect data and, even worse, a bad policy can be risky or unethical and may result in severe consequences. Therefore, it is important for the RL system to have the ability to predict how well a new policy would perform without having to deploy it first. While most existing works on OPE aim to provide accurate point estimates for short-horizon problems as well as long- or infinite-horizon problems, it is equally important to quantify the uncertainty of the OPE point estimates for both safe exploration and optimistic planning.

Regularized spectral methods for clustering signed networks Machine Learning

We study the problem of $k$-way clustering in signed graphs. Considerable attention in recent years has been devoted to analyzing and modeling signed graphs, where the affinity measure between nodes takes either positive or negative values. Recently, Cucuringu et al. [CDGT 2019] proposed a spectral method, namely SPONGE (Signed Positive over Negative Generalized Eigenproblem), which casts the clustering task as a generalized eigenvalue problem optimizing a suitably defined objective function. This approach is motivated by social balance theory, where the clustering task aims to decompose a given network into disjoint groups, such that individuals within the same group are connected by as many positive edges as possible, while individuals from different groups are mainly connected by negative edges. Through extensive numerical simulations, SPONGE was shown to achieve state-of-the-art empirical performance. On the theoretical front, [CDGT 2019] analyzed SPONGE and the popular Signed Laplacian method under the setting of a Signed Stochastic Block Model (SSBM), for $k=2$ equal-sized clusters, in the regime where the graph is moderately dense. In this work, we build on the results in [CDGT 2019] on two fronts for the normalized versions of SPONGE and the Signed Laplacian. Firstly, for both algorithms, we extend the theoretical analysis in [CDGT 2019] to the general setting of $k \geq 2$ unequal-sized clusters in the moderately dense regime. Secondly, we introduce regularized versions of both methods to handle sparse graphs -- a regime where standard spectral methods underperform -- and provide theoretical guarantees under the same SSBM model. To the best of our knowledge, regularized spectral methods have so far not been considered in the setting of clustering signed graphs. We complement our theoretical results with an extensive set of numerical experiments on synthetic data.
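As a small illustration of the Signed Laplacian mentioned above, the sketch below builds the unnormalized signed Laplacian, taken here to be the diagonal matrix of absolute degrees minus the signed adjacency matrix, and evaluates its quadratic form. This is a simplification: the abstract concerns the normalized and regularized variants, and no eigenvector computation or SPONGE step is attempted. The four-node graph and the helper names are hypothetical.

```python
def signed_laplacian(adj):
    """Unnormalized signed Laplacian: diag(sum_j |a_ij|) - A.

    Assumes adj is a symmetric signed adjacency matrix (list of lists)
    with a zero diagonal.
    """
    n = len(adj)
    lap = [[-adj[i][j] for j in range(n)] for i in range(n)]
    for i in range(n):
        lap[i][i] = sum(abs(adj[i][j]) for j in range(n) if j != i)
    return lap


def quadratic_form(mat, x):
    """Compute x^T M x for a square matrix given as a list of lists."""
    n = len(x)
    return sum(x[i] * mat[i][j] * x[j] for i in range(n) for j in range(n))


# Toy graph: clusters {0, 1} and {2, 3}; positive edges inside clusters,
# negative edges between them.
adjacency = [
    [0, 1, -1, 0],
    [1, 0, 0, -1],
    [-1, 0, 0, 1],
    [0, -1, 1, 0],
]
laplacian = signed_laplacian(adjacency)
aligned = quadratic_form(laplacian, [1, 1, -1, -1])     # respects the clusters
misaligned = quadratic_form(laplacian, [1, -1, 1, -1])  # cuts every positive edge
```

The quadratic form is zero for the labelling that agrees with every edge sign and grows when positive edges are cut or negative edges are kept inside a group, which is why small eigenvectors of this matrix carry cluster information.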

Bayesian Optimization of Risk Measures Machine Learning

We consider Bayesian optimization of objective functions of the form $\rho[ F(x, W) ]$, where $F$ is a black-box expensive-to-evaluate function and $\rho$ denotes either the VaR or CVaR risk measure, computed with respect to the randomness induced by the environmental random variable $W$. Such problems arise in decision making under uncertainty, such as in portfolio optimization and robust systems design. We propose a family of novel Bayesian optimization algorithms that exploit the structure of the objective function to substantially improve sampling efficiency. Instead of modeling the objective function directly as is typical in Bayesian optimization, these algorithms model $F$ as a Gaussian process, and use the implied posterior on the objective function to decide which points to evaluate. We demonstrate the effectiveness of our approach in a variety of numerical experiments.
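To make the objective $\rho[F(x, W)]$ concrete, the following sketch estimates VaR and CVaR from a finite sample of $W$. It does not implement the proposed algorithms, which model $F$ with a Gaussian process. It assumes that larger values of $F$ are worse, that VaR at level alpha is the empirical alpha-quantile, and that CVaR is the mean of the samples at or above VaR. The function `toy_f` is a hypothetical stand-in for the black-box $F$.

```python
import math


def empirical_var_cvar(losses, alpha):
    """Return (VaR, CVaR) at level alpha from a finite sample of losses."""
    if not losses:
        raise ValueError("losses must be non-empty")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    ordered = sorted(losses)
    # VaR: smallest sample value v such that at least a fraction alpha
    # of the samples are <= v (the empirical alpha-quantile).
    k = max(math.ceil(alpha * len(ordered)) - 1, 0)
    var = ordered[k]
    # CVaR (assumed convention): mean of the samples at or above VaR.
    tail = ordered[k:]
    return var, sum(tail) / len(tail)


def toy_f(x, w):
    """Hypothetical stand-in for the expensive black-box F(x, W)."""
    return (x - w) ** 2


def risk_objective(x, w_samples, alpha):
    """Sample-based estimate of rho[F(x, W)] for rho = VaR and rho = CVaR."""
    return empirical_var_cvar([toy_f(x, w) for w in w_samples], alpha)


var, cvar = empirical_var_cvar([4.0, 1.0, 3.0, 2.0], alpha=0.5)
```

With the four losses above and alpha = 0.5, the sorted sample is 1, 2, 3, 4, so VaR is 2 and CVaR is the mean of 2, 3, and 4. In a Bayesian optimization setting, `risk_objective` would be the quantity to minimize over `x`, with each call to the true $F$ being expensive.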