Bayesian Learning
On the performance of multi-objective estimation of distribution algorithms for combinatorial problems
Martins, Marcella S. R., Yafrani, Mohamed El, Santana, Roberto, Delgado, Myriam, Lüders, Ricardo, Ahiod, Belaïd
Fitness landscape analysis investigates features with a high influence on the performance of optimization algorithms, aiming to take advantage of the addressed problem characteristics. In this work, a fitness landscape analysis using problem features is performed for a Multi-objective Bayesian Optimization Algorithm (mBOA) on instances of MNK-landscape problem for 2, 3, 5 and 8 objectives. We also compare the results of mBOA with those provided by NSGA-III through the analysis of their estimated runtime necessary to identify an approximation of the Pareto front. Moreover, in order to scrutinize the probabilistic graphic model obtained by mBOA, the Pareto front is examined according to a probabilistic view. The fitness landscape study shows that mBOA is moderately or loosely influenced by some problem features, according to a simple and a multiple linear regression model, which is being proposed to predict the algorithms performance in terms of the estimated runtime. Besides, we conclude that the analysis of the probabilistic graphic model produced at the end of evolution can be useful to understand the convergence and diversity performances of the proposed approach.
EigenNetworks
Mei, Jonathan, Moura, José M. F.
In many applications, the interdependencies among a set of $N$ time series $\{ x_{nk}, k>0 \}_{n=1}^{N}$ are well captured by a graph or network $G$. The network itself may change over time as well (i.e., as $G_k$). We expect the network changes to be at a much slower rate than that of the time series. This paper introduces eigennetworks, networks that are building blocks to compose the actual networks $G_k$ capturing the dependencies among the time series. These eigennetworks can be estimated by first learning the time series of graphs $G_k$ from the data, followed by a Principal Network Analysis procedure. Algorithms for learning both the original time series of graphs and the eigennetworks are presented and discussed. Experiments on simulated and real time series data demonstrate the performance of the learning and the interpretation of the eigennetworks.
Distributed Learning from Interactions in Social Networks
Sasso, Francesco, Coluccia, Angelo, Notarstefano, Giuseppe
We consider a network scenario in which agents can evaluate each other according to a score graph that models some interactions. The goal is to design a distributed protocol, run by the agents, that allows them to learn their unknown state among a finite set of possible values. We propose a Bayesian framework in which scores and states are associated to probabilistic events with unknown parameters and hyperparameters, respectively. We show that each agent can learn its state by means of a local Bayesian classifier and a (centralized) Maximum-Likelihood (ML) estimator of parameter-hyperparameter that combines plain ML and Empirical Bayes approaches. By using tools from graphical models, which allow us to gain insight on conditional dependencies of scores and states, we provide a relaxed probabilistic model that ultimately leads to a parameter-hyperparameter estimator amenable to distributed computation. To highlight the appropriateness of the proposed relaxation, we demonstrate the distributed estimators on a social interaction setup for user profiling. A common feature of online social networks (OSNs) is the possibility of individuals to continuously interact among themselves, by sharing contents and expressing opinions or ratings on different topics [1], [2].
Efficiency of adaptive importance sampling
Delyon, Bernard, Portier, François
The \textit{sampling policy} of stage $t$, formally expressed as a probability density function $q_t$, stands for the distribution of the sample $(x_{t,1},\ldots, x_{t,n_t})$ generated at $t$. From the past samples, some information depending on some \textit{objective} is derived leading eventually to update the sampling policy to $q_{t+1}$. This generic approach characterizes \textit{adaptive importance sampling} (AIS) schemes. Each stage $t$ is formed with two steps : (i) to explore the space with $n_t$ points according to $q_t$ and (ii) to exploit the current amount of information to update the sampling policy. The very fundamental question raised in the paper concerns the behavior of empirical sums based on AIS. Without making any assumption on the \textit{allocation policy} $n_t$, the theory developed involves no restriction on the split of computational resources between the explore (i) and the exploit (ii) step. It is shown that AIS is efficient : the asymptotic behavior of AIS is the same as some "oracle" strategy that knows the optimal sampling policy from the beginning. From a practical perspective, weighted AIS is introduced, a new method that allows to forget poor samples from early stages.
Learning Graphs from Data: A Signal Representation Perspective
Dong, Xiaowen, Thanou, Dorina, Rabbat, Michael, Frossard, Pascal
The construction of a meaningful graph topology plays a crucial role in the effective representation, processing, analysis and visualization of structured data. When a natural choice of the graph is not readily available from the datasets, it is thus desirable to infer or learn a graph topology from the data. In this tutorial overview, we survey solutions to the problem of graph learning, including classical viewpoints from statistics and physics, and more recent approaches that adopt a graph signal processing (GSP) perspective. We further emphasize the conceptual similarities and differences between classical and GSP graph inference methods and highlight the potential advantage of the latter in a number of theoretical and practical scenarios. We conclude with several open issues and challenges that are keys to the design of future signal processing and machine learning algorithms for learning graphs from data.
Structural Learning of Multivariate Regression Chain Graphs via Decomposition
Javidian, Mohammad Ali, Valtorta, Marco
We extend the decomposition approach for learning Bayesian networks (BN) proposed by (Xie et al., 2006) to learning multivariate regression chain graphs (MVR CGs), which include BNs as a special case. The same advantages of this decomposition approach hold in the more general setting: reduces complexity and increased power of computational independence tests. Moreover, latent (hidden) variables can be represented in MVR CGs by using bidirected edges, and our algorithm correctly recovers any independence structure that is faithful to a MVR CG, thus greatly extending the range of applications of decomposition-based model selection techniques. While our new algorithm has the same complexity as the one in (Xie et al., 2006) for BNs, it requires larger components for general MVR CGs, to insure that sufficient data is present to estimate parameters.
Sparse Linear Discriminant Analysis under the Neyman-Pearson Paradigm
Tong, Xin, Xia, Lucy, Wang, Jiacheng, Feng, Yang
In classification applications such as severe disease diagnosis and fraud detection, people have clear priorities over the two types of classification errors. For instance, diagnosing a patient with cancer to be healthy may lead to loss of life, which incurs a much higher cost than the other way around. The classical binary classification paradigm does not take into account such priorities, as it aims to minimize the overall classification error. In contrast, the Neyman-Pearson (NP) paradigm seeks classifiers with a minimal type II error while having the prioritized type I error constrained under a user-specified level, addressing asymmetric type I/II error priorities in the previously mentioned scenarios. Despite recent advances in the NP classification literature, two essential issues pose challenges: i) current theoretical framework assumes bounded feature support, which does not admit parametric settings; ii) in practice, existing NP classifiers involve splitting class 0 samples into two parts using a pre-fixed split proportion. To address the first challenge, we present NP-sLDA that adapts the popular sparse linear discriminant analysis (sLDA, Mai et al. (2012)) to the NP paradigm. On the theoretical front, this is the first theoretically justified NP classifier that takes parametric assumptions and unbounded feature support. We formulate a new conditional margin assumption and a new conditional detection condition to accommodate unbounded feature support and show that NP-sLDA satisfies the NP oracle inequalities. Numerical results show that NP-sLDA is a valuable addition to the existing NP classifiers. To address the second challenge, we construct a general data-adaptive sample splitting scheme that improves the classification performance upon the default half-half class 0 split used in Tong et al. (2018).
The Logistic Regression Algorithm
Logistic Regression is one of the most used Machine Learning algorithms for binary classification. It is a simple Algorithm that you can use as a performance baseline, it is easy to implement and it will do well enough in many tasks. Therefore every Machine Learning engineer should be familiar with its concepts. The building block concepts of Logistic Regression can also be helpful in deep learning while building neural networks. In this post, you will learn what Logistic Regression is, how it works, what are advantages and disadvantages and much more.
Data-Free/Data-Sparse Softmax Parameter Estimation with Structured Class Geometries
This note considers softmax parameter estimation when little/no labeled training data is available, but a priori information about the relative geometry of class label log-odds boundaries is available. It is shown that `data-free' softmax model synthesis corresponds to solving a linear system of parameter equations, wherein desired dominant class log-odds boundaries are encoded via convex polytopes that decompose the input feature space. When solvable, the linear equations yield closed-form softmax parameter solution families using class boundary polytope specifications only. This allows softmax parameter learning to be implemented without expensive brute force data sampling and numerical optimization. The linear equations can also be adapted to constrained maximum likelihood estimation in data-sparse settings. Since solutions may also fail to exist for the linear parameter equations derived from certain polytope specifications, it is thus also shown that there exist probabilistic classification problems over m convexly separable classes for which the log-odds boundaries cannot be learned using an m-class softmax model.
Optimal Clustering under Uncertainty
Dalton, Lori A., Benalcázar, Marco E., Dougherty, Edward R.
Classical clustering algorithms typically either lack an underlying probability framework to make them predictive or focus on parameter estimation rather than defining and minimizing a notion of error. Recent work addresses these issues by developing a probabilistic framework based on the theory of random labeled point processes and characterizing a Bayes clusterer that minimizes the number of misclustered points. The Bayes clusterer is analogous to the Bayes classifier. Whereas determining a Bayes classifier requires full knowledge of the feature-label distribution, deriving a Bayes clusterer requires full knowledge of the point process. When uncertain of the point process, one would like to find a robust clusterer that is optimal over the uncertainty, just as one may find optimal robust classifiers with uncertain feature-label distributions. Herein, we derive an optimal robust clusterer by first finding an effective random point process that incorporates all randomness within its own probabilistic structure and from which a Bayes clusterer can be derived that provides an optimal robust clusterer relative to the uncertainty. This is analogous to the use of effective class-conditional distributions in robust classification. After evaluating the performance of robust clusterers in synthetic mixtures of Gaussians models, we apply the framework to granular imaging, where we make use of the asymptotic granulometric moment theory for granular images to relate robust clustering theory to the application.