Goto

Collaborating Authors

 Uncertainty


Idealised Bayesian Neural Networks Cannot Have Adversarial Examples: Theoretical and Empirical Study

arXiv.org Machine Learning

We prove that idealised discriminative Bayesian neural networks, capturing perfect epistemic uncertainty, cannot have adversarial examples: Techniques for crafting adversarial examples will necessarily fail to generate perturbed images which fool the classifier. This suggests why MC dropout-based techniques have been observed to be fairly effective against adversarial examples. We support our claims mathematically and empirically. We experiment with HMC on synthetic data derived from MNIST for which we know the ground truth image density, showing that near-perfect epistemic uncertainty correlates to density under image manifold, and that adversarial images lie off the manifold. Using our new-found insights we suggest a new attack for MC dropout-based models by looking for imperfections in uncertainty estimation, and also suggest a mitigation. Lastly, we demonstrate our mitigation on a cats-vs-dogs image classification task with a VGG13 variant.


How Bayesian Networks are pioneering the 'smart data' revolution

#artificialintelligence

The era of'big data' offers enormous opportunities for societal improvements. There is an expectation โ€“ and even excitement โ€“ that, by simply applying sophisticated machine learning algorithms to'big data' sets, we may automatically find solutions to problems that were previously either unsolvable or would incur prohibitive economic costs. Yet, the clever algorithms needed to process big data cannot (and will never) solve most of the critical risk analysis problems that we face. Big data, even when carefully collected is typically unstructured and noisy; even the'biggest data' typically lack crucial, often hidden, information about key causal or explanatory variables that generate or influence the data we observe. For example, the world's leading economists failed to predict the 2008โ€“2010 international financial crisis because they relied on models based on historical statistical data that could not adapt to new circumstances, even when those circumstances were foreseeable by contrarian experts.


On the performance of multi-objective estimation of distribution algorithms for combinatorial problems

arXiv.org Artificial Intelligence

Fitness landscape analysis investigates features with a high influence on the performance of optimization algorithms, aiming to take advantage of the addressed problem characteristics. In this work, a fitness landscape analysis using problem features is performed for a Multi-objective Bayesian Optimization Algorithm (mBOA) on instances of MNK-landscape problem for 2, 3, 5 and 8 objectives. We also compare the results of mBOA with those provided by NSGA-III through the analysis of their estimated runtime necessary to identify an approximation of the Pareto front. Moreover, in order to scrutinize the probabilistic graphic model obtained by mBOA, the Pareto front is examined according to a probabilistic view. The fitness landscape study shows that mBOA is moderately or loosely influenced by some problem features, according to a simple and a multiple linear regression model, which is being proposed to predict the algorithms performance in terms of the estimated runtime. Besides, we conclude that the analysis of the probabilistic graphic model produced at the end of evolution can be useful to understand the convergence and diversity performances of the proposed approach.


A Desirability-Based Axiomatisation for Coherent Choice Functions

arXiv.org Artificial Intelligence

Choice functions constitute a simple, direct and very general mathematical framework for modelling choice under uncertainty. In particular, they are able to represent the set-valued choices that typically arise from applying decision rules to imprecise-probabilistic uncertainty models. We provide them with a clear interpretation in terms of attitudes towards gambling, borrowing ideas from the theory of sets of desirable gambles, and we use this interpretation to derive a set of basic axioms. We show that these axioms lead to a full-fledged theory of coherent choice functions, which includes a representation in terms of sets of desirable gambles, and a conservative inference method.


Distributed Learning from Interactions in Social Networks

arXiv.org Machine Learning

We consider a network scenario in which agents can evaluate each other according to a score graph that models some interactions. The goal is to design a distributed protocol, run by the agents, that allows them to learn their unknown state among a finite set of possible values. We propose a Bayesian framework in which scores and states are associated to probabilistic events with unknown parameters and hyperparameters, respectively. We show that each agent can learn its state by means of a local Bayesian classifier and a (centralized) Maximum-Likelihood (ML) estimator of parameter-hyperparameter that combines plain ML and Empirical Bayes approaches. By using tools from graphical models, which allow us to gain insight on conditional dependencies of scores and states, we provide a relaxed probabilistic model that ultimately leads to a parameter-hyperparameter estimator amenable to distributed computation. To highlight the appropriateness of the proposed relaxation, we demonstrate the distributed estimators on a social interaction setup for user profiling. A common feature of online social networks (OSNs) is the possibility of individuals to continuously interact among themselves, by sharing contents and expressing opinions or ratings on different topics [1], [2].


Efficiency of adaptive importance sampling

arXiv.org Machine Learning

The \textit{sampling policy} of stage $t$, formally expressed as a probability density function $q_t$, stands for the distribution of the sample $(x_{t,1},\ldots, x_{t,n_t})$ generated at $t$. From the past samples, some information depending on some \textit{objective} is derived leading eventually to update the sampling policy to $q_{t+1}$. This generic approach characterizes \textit{adaptive importance sampling} (AIS) schemes. Each stage $t$ is formed with two steps : (i) to explore the space with $n_t$ points according to $q_t$ and (ii) to exploit the current amount of information to update the sampling policy. The very fundamental question raised in the paper concerns the behavior of empirical sums based on AIS. Without making any assumption on the \textit{allocation policy} $n_t$, the theory developed involves no restriction on the split of computational resources between the explore (i) and the exploit (ii) step. It is shown that AIS is efficient : the asymptotic behavior of AIS is the same as some "oracle" strategy that knows the optimal sampling policy from the beginning. From a practical perspective, weighted AIS is introduced, a new method that allows to forget poor samples from early stages.


Learning Graphs from Data: A Signal Representation Perspective

arXiv.org Machine Learning

The construction of a meaningful graph topology plays a crucial role in the effective representation, processing, analysis and visualization of structured data. When a natural choice of the graph is not readily available from the datasets, it is thus desirable to infer or learn a graph topology from the data. In this tutorial overview, we survey solutions to the problem of graph learning, including classical viewpoints from statistics and physics, and more recent approaches that adopt a graph signal processing (GSP) perspective. We further emphasize the conceptual similarities and differences between classical and GSP graph inference methods and highlight the potential advantage of the latter in a number of theoretical and practical scenarios. We conclude with several open issues and challenges that are keys to the design of future signal processing and machine learning algorithms for learning graphs from data.


A Possibility Distribution Based Multi-Criteria Decision Algorithm for Resilient Supplier Selection Problems

arXiv.org Artificial Intelligence

Resilient supplier selection problem is a key decision problem for an organization to gain competitive advantage. In the presence of multiple conflicting evaluation criteria, contradicting decision makers, and imprecise information sources, this problem becomes even more difficult to solve with the classical optimization approaches. Multi-Criteria Decision Analysis (MCDA) is a viable alternative approach for handling the imprecise information associated with the evaluation proffered by the decision makers. In this work, we present a comprehensive algorithm for ranking a set of suppliers based on aggregated information obtained from crisp numerical assessments and reliability adjusted linguistic appraisals from a group of decision makers. We adapted two popular tools - Single Valued Neutrosophic Sets (SVNS) and Interval-valued fuzzy sets (IVFS) and extended them to incorporate both crisp and linguistic evaluations from the decision makers to obtain aggregated SVNS and IVFS. This information is then used to rank the suppliers by using TOPSIS method. We present a case study to illustrate the mechanism of the proposed algorithm and show sensitivity of the supplier ranking with respect to the priorities of evaluation criteria.


Structural Learning of Multivariate Regression Chain Graphs via Decomposition

arXiv.org Artificial Intelligence

We extend the decomposition approach for learning Bayesian networks (BN) proposed by (Xie et al., 2006) to learning multivariate regression chain graphs (MVR CGs), which include BNs as a special case. The same advantages of this decomposition approach hold in the more general setting: reduces complexity and increased power of computational independence tests. Moreover, latent (hidden) variables can be represented in MVR CGs by using bidirected edges, and our algorithm correctly recovers any independence structure that is faithful to a MVR CG, thus greatly extending the range of applications of decomposition-based model selection techniques. While our new algorithm has the same complexity as the one in (Xie et al., 2006) for BNs, it requires larger components for general MVR CGs, to insure that sufficient data is present to estimate parameters.


Sparse Linear Discriminant Analysis under the Neyman-Pearson Paradigm

arXiv.org Machine Learning

In classification applications such as severe disease diagnosis and fraud detection, people have clear priorities over the two types of classification errors. For instance, diagnosing a patient with cancer to be healthy may lead to loss of life, which incurs a much higher cost than the other way around. The classical binary classification paradigm does not take into account such priorities, as it aims to minimize the overall classification error. In contrast, the Neyman-Pearson (NP) paradigm seeks classifiers with a minimal type II error while having the prioritized type I error constrained under a user-specified level, addressing asymmetric type I/II error priorities in the previously mentioned scenarios. Despite recent advances in the NP classification literature, two essential issues pose challenges: i) current theoretical framework assumes bounded feature support, which does not admit parametric settings; ii) in practice, existing NP classifiers involve splitting class 0 samples into two parts using a pre-fixed split proportion. To address the first challenge, we present NP-sLDA that adapts the popular sparse linear discriminant analysis (sLDA, Mai et al. (2012)) to the NP paradigm. On the theoretical front, this is the first theoretically justified NP classifier that takes parametric assumptions and unbounded feature support. We formulate a new conditional margin assumption and a new conditional detection condition to accommodate unbounded feature support and show that NP-sLDA satisfies the NP oracle inequalities. Numerical results show that NP-sLDA is a valuable addition to the existing NP classifiers. To address the second challenge, we construct a general data-adaptive sample splitting scheme that improves the classification performance upon the default half-half class 0 split used in Tong et al. (2018).