Bayesian Inference
Distributed Learning from Interactions in Social Networks
Sasso, Francesco, Coluccia, Angelo, Notarstefano, Giuseppe
We consider a network scenario in which agents can evaluate each other according to a score graph that models some interactions. The goal is to design a distributed protocol, run by the agents, that allows them to learn their unknown state among a finite set of possible values. We propose a Bayesian framework in which scores and states are associated to probabilistic events with unknown parameters and hyperparameters, respectively. We show that each agent can learn its state by means of a local Bayesian classifier and a (centralized) Maximum-Likelihood (ML) estimator of parameter-hyperparameter that combines plain ML and Empirical Bayes approaches. By using tools from graphical models, which allow us to gain insight on conditional dependencies of scores and states, we provide a relaxed probabilistic model that ultimately leads to a parameter-hyperparameter estimator amenable to distributed computation. To highlight the appropriateness of the proposed relaxation, we demonstrate the distributed estimators on a social interaction setup for user profiling. A common feature of online social networks (OSNs) is the possibility of individuals to continuously interact among themselves, by sharing contents and expressing opinions or ratings on different topics [1], [2].
Efficiency of adaptive importance sampling
Delyon, Bernard, Portier, Franรงois
The \textit{sampling policy} of stage $t$, formally expressed as a probability density function $q_t$, stands for the distribution of the sample $(x_{t,1},\ldots, x_{t,n_t})$ generated at $t$. From the past samples, some information depending on some \textit{objective} is derived leading eventually to update the sampling policy to $q_{t+1}$. This generic approach characterizes \textit{adaptive importance sampling} (AIS) schemes. Each stage $t$ is formed with two steps : (i) to explore the space with $n_t$ points according to $q_t$ and (ii) to exploit the current amount of information to update the sampling policy. The very fundamental question raised in the paper concerns the behavior of empirical sums based on AIS. Without making any assumption on the \textit{allocation policy} $n_t$, the theory developed involves no restriction on the split of computational resources between the explore (i) and the exploit (ii) step. It is shown that AIS is efficient : the asymptotic behavior of AIS is the same as some "oracle" strategy that knows the optimal sampling policy from the beginning. From a practical perspective, weighted AIS is introduced, a new method that allows to forget poor samples from early stages.
Learning Graphs from Data: A Signal Representation Perspective
Dong, Xiaowen, Thanou, Dorina, Rabbat, Michael, Frossard, Pascal
The construction of a meaningful graph topology plays a crucial role in the effective representation, processing, analysis and visualization of structured data. When a natural choice of the graph is not readily available from the datasets, it is thus desirable to infer or learn a graph topology from the data. In this tutorial overview, we survey solutions to the problem of graph learning, including classical viewpoints from statistics and physics, and more recent approaches that adopt a graph signal processing (GSP) perspective. We further emphasize the conceptual similarities and differences between classical and GSP graph inference methods and highlight the potential advantage of the latter in a number of theoretical and practical scenarios. We conclude with several open issues and challenges that are keys to the design of future signal processing and machine learning algorithms for learning graphs from data.
Sparse Linear Discriminant Analysis under the Neyman-Pearson Paradigm
Tong, Xin, Xia, Lucy, Wang, Jiacheng, Feng, Yang
In classification applications such as severe disease diagnosis and fraud detection, people have clear priorities over the two types of classification errors. For instance, diagnosing a patient with cancer to be healthy may lead to loss of life, which incurs a much higher cost than the other way around. The classical binary classification paradigm does not take into account such priorities, as it aims to minimize the overall classification error. In contrast, the Neyman-Pearson (NP) paradigm seeks classifiers with a minimal type II error while having the prioritized type I error constrained under a user-specified level, addressing asymmetric type I/II error priorities in the previously mentioned scenarios. Despite recent advances in the NP classification literature, two essential issues pose challenges: i) current theoretical framework assumes bounded feature support, which does not admit parametric settings; ii) in practice, existing NP classifiers involve splitting class 0 samples into two parts using a pre-fixed split proportion. To address the first challenge, we present NP-sLDA that adapts the popular sparse linear discriminant analysis (sLDA, Mai et al. (2012)) to the NP paradigm. On the theoretical front, this is the first theoretically justified NP classifier that takes parametric assumptions and unbounded feature support. We formulate a new conditional margin assumption and a new conditional detection condition to accommodate unbounded feature support and show that NP-sLDA satisfies the NP oracle inequalities. Numerical results show that NP-sLDA is a valuable addition to the existing NP classifiers. To address the second challenge, we construct a general data-adaptive sample splitting scheme that improves the classification performance upon the default half-half class 0 split used in Tong et al. (2018).
Data-Free/Data-Sparse Softmax Parameter Estimation with Structured Class Geometries
This note considers softmax parameter estimation when little/no labeled training data is available, but a priori information about the relative geometry of class label log-odds boundaries is available. It is shown that `data-free' softmax model synthesis corresponds to solving a linear system of parameter equations, wherein desired dominant class log-odds boundaries are encoded via convex polytopes that decompose the input feature space. When solvable, the linear equations yield closed-form softmax parameter solution families using class boundary polytope specifications only. This allows softmax parameter learning to be implemented without expensive brute force data sampling and numerical optimization. The linear equations can also be adapted to constrained maximum likelihood estimation in data-sparse settings. Since solutions may also fail to exist for the linear parameter equations derived from certain polytope specifications, it is thus also shown that there exist probabilistic classification problems over m convexly separable classes for which the log-odds boundaries cannot be learned using an m-class softmax model.
Optimal Clustering under Uncertainty
Dalton, Lori A., Benalcรกzar, Marco E., Dougherty, Edward R.
Classical clustering algorithms typically either lack an underlying probability framework to make them predictive or focus on parameter estimation rather than defining and minimizing a notion of error. Recent work addresses these issues by developing a probabilistic framework based on the theory of random labeled point processes and characterizing a Bayes clusterer that minimizes the number of misclustered points. The Bayes clusterer is analogous to the Bayes classifier. Whereas determining a Bayes classifier requires full knowledge of the feature-label distribution, deriving a Bayes clusterer requires full knowledge of the point process. When uncertain of the point process, one would like to find a robust clusterer that is optimal over the uncertainty, just as one may find optimal robust classifiers with uncertain feature-label distributions. Herein, we derive an optimal robust clusterer by first finding an effective random point process that incorporates all randomness within its own probabilistic structure and from which a Bayes clusterer can be derived that provides an optimal robust clusterer relative to the uncertainty. This is analogous to the use of effective class-conditional distributions in robust classification. After evaluating the performance of robust clusterers in synthetic mixtures of Gaussians models, we apply the framework to granular imaging, where we make use of the asymptotic granulometric moment theory for granular images to relate robust clustering theory to the application.
Binary Classification with Karmic, Threshold-Quasi-Concave Metrics
Yan, Bowei, Koyejo, Oluwasanmi, Zhong, Kai, Ravikumar, Pradeep
Complex performance measures, beyond the popular measure of accuracy, are increasingly being used in the context of binary classification. These complex performance measures are typically not even decomposable, that is, the loss evaluated on a batch of samples cannot typically be expressed as a sum or average of losses evaluated at individual samples, which in turn requires new theoretical and methodological developments beyond standard treatments of supervised learning. In this paper, we advance this understanding of binary classification for complex performance measures by identifying two key properties: a so-called Karmic property, and a more technical threshold-quasi-concavity property, which we show is milder than existing structural assumptions imposed on performance measures. Under these properties, we show that the Bayes optimal classifier is a threshold function of the conditional probability of positive class. We then leverage this result to come up with a computationally practical plug-in classifier, via a novel threshold estimator, and further, provide a novel statistical analysis of classification error with respect to complex performance measures.
Bayesian approach to model-based extrapolation of nuclear observables
Neufcourt, Lรฉo, Cao, Yuchen, Nazarewicz, Witold, Viens, Frederi
The mass, or binding energy, is the basis property of the atomic nucleus. It determines its stability, and reaction and decay rates. Quantifying the nuclear binding is important for understanding the origin of elements in the universe. The astrophysical processes responsible for the nucleosynthesis in stars often take place far from the valley of stability, where experimental masses are not known. In such cases, missing nuclear information must be provided by theoretical predictions using extreme extrapolations. Bayesian machine learning techniques can be applied to improve predictions by taking full advantage of the information contained in the deviations between experimental and calculated masses. We consider 10 global models based on nuclear Density Functional Theory as well as two more phenomenological mass models. The emulators of S2n residuals and credibility intervals defining theoretical error bars are constructed using Bayesian Gaussian processes and Bayesian neural networks. We consider a large training dataset pertaining to nuclei whose masses were measured before 2003. For the testing datasets, we considered those exotic nuclei whose masses have been determined after 2003. We then carried out extrapolations towards the 2n dripline. While both Gaussian processes and Bayesian neural networks reduce the rms deviation from experiment significantly, GP offers a better and much more stable performance. The increase in the predictive power is quite astonishing: the resulting rms deviations from experiment on the testing dataset are similar to those of more phenomenological models. The empirical coverage probability curves we obtain match very well the reference values which is highly desirable to ensure honesty of uncertainty quantification, and the estimated credibility intervals on predictions make it possible to evaluate predictive power of individual models.
A Fast and Scalable Joint Estimator for Integrating Additional Knowledge in Learning Multiple Related Sparse Gaussian Graphical Models
Wang, Beilun, Sekhon, Arshdeep, Qi, Yanjun
We consider the problem of including additional knowledge in estimating sparse Gaussian graphical models (sGGMs) from aggregated samples, arising often in bioinformatics and neuroimaging applications. Previous joint sGGM estimators either fail to use existing knowledge or cannot scale-up to many tasks (large $K$) under a high-dimensional (large $p$) situation. In this paper, we propose a novel \underline{J}oint \underline{E}lementary \underline{E}stimator incorporating additional \underline{K}nowledge (JEEK) to infer multiple related sparse Gaussian Graphical models from large-scale heterogeneous data. Using domain knowledge as weights, we design a novel hybrid norm as the minimization objective to enforce the superposition of two weighted sparsity constraints, one on the shared interactions and the other on the task-specific structural patterns. This enables JEEK to elegantly consider various forms of existing knowledge based on the domain at hand and avoid the need to design knowledge-specific optimization. JEEK is solved through a fast and entry-wise parallelizable solution that largely improves the computational efficiency of the state-of-the-art $O(p^5K^4)$ to $O(p^2K^4)$. We conduct a rigorous statistical analysis showing that JEEK achieves the same convergence rate $O(\log(Kp)/n_{tot})$ as the state-of-the-art estimators that are much harder to compute. Empirically, on multiple synthetic datasets and two real-world data, JEEK outperforms the speed of the state-of-arts significantly while achieving the same level of prediction accuracy.
Adversarial quantum circuit learning for pure state approximation
Benedetti, Marcello, Grant, Edward, Wossnig, Leonard, Severini, Simone
Adversarial learning is one of the most successful approaches to modelling high-dimensional probability distributions from data. The quantum computing community has recently begun to generalize this idea and to look for potential applications. In this work, we derive an adversarial algorithm for the problem of approximating an unknown quantum pure state. Although this could be done on error-corrected quantum computers, the adversarial formulation enables us to execute the algorithm on near-term quantum computers. Two ansatz circuits are optimized in tandem: One tries to approximate the target state, the other tries to distinguish between target and approximated state. Supported by numerical simulations, we show that resilient backpropagation algorithms perform remarkably well in optimizing the two circuits. We use the bipartite entanglement entropy to design an efficient heuristic for the stopping criteria. Our approach may find application in quantum state tomography.