On the Consistency of Optimal Bayesian Feature Selection in the Presence of Correlations
Foroughi pour, Ali, Dalton, Lori A.
Optimal Bayesian feature selection (OBFS) is a multivariate supervised screening method designed from the ground up for biomarker discovery. In this work, we prove that Gaussian OBFS is strongly consistent under mild conditions, and provide rates of convergence for key posteriors in the framework. These results are of enormous importance, since they identify precisely what features are selected by OBFS asymptotically, characterize the relative rates of convergence for posteriors on different types of features, provide conditions that guarantee convergence, justify the use of OBFS when its internal assumptions are invalid, and set the stage for understanding the asymptotic behavior of other algorithms based on the OBFS framework.
Generative Modeling with Denoising Auto-Encoders and Langevin Sampling
Block, Adam, Mroueh, Youssef, Rakhlin, Alexander
We study convergence of a generative modeling method that first estimates the score function of the distribution using Denoising Auto-Encoders (DAE) or Denoising Score Matching (DSM) and then employs Langevin diffusion for sampling. We show that both DAE and DSM provide estimates of the score of the Gaussian smoothed population density, allowing us to apply the machinery of Empirical Processes. We overcome the challenge of relying only on $L^2$ bounds on the score estimation error and provide finite-sample bounds in the Wasserstein distance between the law of the population distribution and the law of this sampling scheme. We then apply our results to the homotopy method of arXiv:1907.05600 and provide theoretical justification for its empirical success.
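As a rough illustration of the sampling stage described in this abstract, the sketch below runs unadjusted Langevin dynamics driven by a score function. It is not the authors' code: the score used here is the exact score of a 1-D Gaussian rather than one estimated by a DAE or DSM, and the names `gaussian_score` and `langevin_sample`, the step size, and the number of steps are all illustrative assumptions.

```python
import numpy as np

def gaussian_score(x, mean=0.0, std=1.0):
    """Score (gradient of the log-density) of a 1-D Gaussian N(mean, std^2)."""
    return -(x - mean) / std**2

def langevin_sample(score, n_samples=5000, n_steps=500, step=0.01, seed=0):
    """Unadjusted Langevin dynamics.

    Each iteration applies x <- x + step * score(x) + sqrt(2 * step) * noise.
    In the setting of the abstract, `score` would be a learned estimate;
    here it is an exact, hand-written score (an assumption for illustration).
    """
    rng = np.random.default_rng(seed)
    x = 3.0 * rng.normal(size=n_samples)  # arbitrary wide initialization
    for _ in range(n_steps):
        noise = rng.normal(size=n_samples)
        x = x + step * score(x) + np.sqrt(2.0 * step) * noise
    return x

# Draw approximate samples from N(2, 0.5^2) using only its score.
samples = langevin_sample(lambda x: gaussian_score(x, mean=2.0, std=0.5))
print(samples.mean(), samples.std())
```

With a small step size the empirical mean and standard deviation should land close to the target values; the discretization introduces a small bias that shrinks as `step` decreases.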
Edge-based sequential graph generation with recurrent neural networks
Bacciu, Davide, Micheli, Alessio, Podda, Marco
Graph generation with Machine Learning is an open problem with applications in various research fields. In this work, we propose to cast the generative process of a graph into a sequential one, relying on a node ordering procedure. We use this sequential process to design a novel generative model composed of two recurrent neural networks that learn to predict the edges of graphs: the first network generates one endpoint of each edge, while the second network generates the other endpoint conditioned on the state of the first. We test our approach extensively on five different datasets, comparing with two well-known baselines from the graph literature and two recurrent approaches, one of which achieves state-of-the-art performance. Evaluation considers both quantitative and qualitative characteristics of the generated samples. Results show that our approach is able to yield novel and unique graphs originating from very different distributions, while retaining structural properties very similar to those in the training sample. Under the proposed evaluation framework, our approach reaches performance comparable to the current state of the art on the graph generation task.
Keywords: graph generation; recurrent neural networks; auto-regressive models; deep learning
1. Introduction
Graphs are well-known data structures that allow relational data to be stored and accessed efficiently. Their use to represent information is ubiquitous, especially in domains such as Biology [1], Chemistry [2] and Natural Language Processing [3]. In all these fields, as well as many others, data do not exist in isolation, but are connected among themselves by complex relationships. Hence, graphs are usually preferred to "flat" vectorial data whenever there is the need to encode both relational knowledge and numerical information in a concise and compact way.
This trend has been increasing especially since the advent of Graph Neural Networks [4] and contextual Neural Networks for Graphs [5], which paved the road for modern graph-based Deep Learning [6] models. As of today, Graph Neural Networks are used with success for predictive tasks such as semi-supervised classification [7], link prediction [8], and text classification [9]. Besides being able to predict outcomes using graphs, one open and less studied problem in Machine Learning is how to instruct learning models to generate graphs from arbitrary distributions. This implies that to learn a graph distribution, one cannot aim to explore the entire graph space, except for trivial instances. Moreover, graph distributions of interest usually cover only a tiny portion of this large space.
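The idea of casting a graph as a sequence of edges under a node ordering, as described in the abstract of this entry, can be sketched as follows. This is only an illustration: the text says "a node ordering procedure" without specifying it, so the breadth-first ordering used here is an assumption, as are the helper names `bfs_order` and `edge_sequence`. The two lists produced at the end correspond to the targets that the first and second recurrent networks would be trained to predict.

```python
from collections import deque

def bfs_order(adj, start):
    """Return nodes in breadth-first order from `start`.

    BFS is an assumed choice of ordering procedure, used for illustration.
    Neighbors are visited in sorted order so the result is deterministic.
    """
    order, seen, queue = [], {start}, deque([start])
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in sorted(adj[u]):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return order

def edge_sequence(adj, start):
    """Relabel nodes by their BFS rank and return the sorted list of edges."""
    rank = {node: i for i, node in enumerate(bfs_order(adj, start))}
    edges = set()
    for u, neighbors in adj.items():
        for v in neighbors:
            if u in rank and v in rank:
                # Store each undirected edge once, smaller rank first.
                edges.add(tuple(sorted((rank[u], rank[v]))))
    return sorted(edges)

# Small undirected graph given as an adjacency dictionary.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
seq = edge_sequence(adj, "a")
firsts = [u for u, _ in seq]   # sequence of first endpoints
seconds = [v for _, v in seq]  # sequence of second endpoints
print(seq)
```

Sorting the relabeled edges gives one canonical sequence per ordering, which is what makes an auto-regressive treatment of the graph possible.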
Scalable bundling via dense product embeddings
Kumar, Madhav, Eckles, Dean, Aral, Sinan
Bundling, the practice of jointly selling two or more products at a discount, is a widely used strategy in industry and a well examined concept in academia. Historically, the focus has been on theoretical studies in the context of monopolistic firms and assumed product relationships, e.g., complementarity in usage. We develop a new machine-learning-driven methodology for designing bundles in a large-scale, cross-category retail setting. We leverage historical purchases and consideration sets created from clickstream data to generate dense continuous representations of products called embeddings. We then put minimal structure on these embeddings and develop heuristics for complementarity and substitutability among products. Subsequently, we use the heuristics to create multiple bundles for each product and test their performance using a field experiment with a large retailer. We combine the results from the experiment with product embeddings using a hierarchical model that maps bundle features to their purchase likelihood, as measured by the add-to-cart rate. We find that our embeddings-based heuristics are strong predictors of bundle success, robust across product categories, and generalize well to the retailer's entire assortment.
Regret Minimization in Partially Observable Linear Quadratic Control
Lale, Sahin, Azizzadenesheli, Kamyar, Hassibi, Babak, Anandkumar, Anima
Controlling unknown discrete-time systems is a fundamental problem in adaptive control and reinforcement learning. In this problem, an agent interacts with an environment with unknown dynamics and aims to minimize the overall average regulating costs. To achieve this goal, the agent is required to explore the environment to gain a better understanding of the environment dynamics, which is often called system identification. The agent then utilizes this understanding to design a set of improved controllers that simultaneously reduce the possible future costs and enable the agent to explore the important and unknown aspects of the system. In recent decades, this challenging problem has been extensively studied, resulting in a set of foundational steps to study the stability and asymptotic convergence to optimal controllers [Lai et al., 1982, Lai and Wei, 1987]. While asymptotic analyses set the ground for the design of optimal control, understanding the finite-time behavior of adaptive algorithms is critical for real-world applications. In practice, one might prefer an algorithm that guarantees better performance on a much shorter horizon. Recent developments in the fields of statistics and machine learning, along with control theory [Van Der Vaart and Wellner, 1996, Peña et al., 2009, Lai et al., 1982], empower us not only to advance the study of the asymptotic efficiency of algorithms but also to analyze their finite-time behavior [Fiechter, 1997, Abbasi-Yadkori and Szepesvári, 2011]. In partially observable linear quadratic control, if the agent, a priori, is handed the system dynamics, the optimal control/policy has a closed form in the presence of Gaussian disturbances.
Boosting Algorithms for Estimating Optimal Individualized Treatment Rules
Wang, Duzhe, Fu, Haoda, Loh, Po-Ling
The proposed algorithms are based on the XGBoost algorithm, which is known as one of the most powerful algorithms in the machine learning literature. Our main idea is to model the conditional mean of clinical outcome or the decision rule via additive regression trees, and use the boosting technique to estimate each single tree iteratively. Our approaches overcome the challenge of correct model specification, which is required in current parametric methods. The major contribution of our proposed algorithms is providing efficient and accurate estimation of the highly nonlinear and complex optimal individualized treatment rules that often arise in practice. Finally, we illustrate the superior performance of our algorithms by extensive simulation studies and conclude with an application to the real data from a diabetes Phase III trial.
1 Introduction
Precision medicine, as an emerging medical approach for disease treatment and prevention, has received more and more attention among government, healthcare industry and academia in recent years. It is a well-known fact that there exists a significant heterogeneity for patients in response to treatments. For example, as demonstrated in [9], for patients who are infected with human immunodeficiency virus and tuberculosis, their optimal timing of antiretroviral therapy (ART) varies significantly.
Data Augmentation for Histopathological Images Based on Gaussian-Laplacian Pyramid Blending
Ataky, Steve Tsham Mpinda, de Matos, Jonathan, Britto, Alceu de S. Jr., Oliveira, Luiz E. S., Koerich, Alessandro L.
Data imbalance is a major problem that affects several machine learning algorithms. Such problems are troublesome because most learning algorithms attempt to optimize a loss function based on error measures that do not take the data imbalance into account. Accordingly, the learning algorithm simply generates a trivial model that is biased toward predicting the most frequent class in the training data. Data augmentation techniques have been used to mitigate the data imbalance problem. However, in the case of histopathologic images (HIs), low-level as well as high-level data augmentation techniques still present performance issues when applied in the presence of inter-patient variability, whence the model tends to learn color representations that are in fact related to the staining process. In this paper, we propose an approach capable of not only augmenting an HI database but also distributing the inter-patient variability by means of image blending using the Gaussian-Laplacian pyramid. The proposed approach consists of finding the Gaussian pyramids of two images of different patients and finding the Laplacian pyramids thereof. Afterwards, the left half of one image and the right half of another are joined at each level of the Laplacian pyramid, and from the joint pyramids, the original image is reconstructed. This composition, resulting from the blending process, combines the stain variation of two patients, preventing color from misleading the learning process. Experimental results on the BreakHis dataset have shown promising gains vis-à-vis the majority of traditional techniques presented in the literature.
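The blending steps described in this abstract (build Gaussian pyramids, derive Laplacian pyramids, join left and right halves at each level, reconstruct) can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the authors' implementation: images are taken to be 2-D grayscale arrays whose sides are divisible by 2 to the power of `levels`, the smoothing filter is a 5-tap binomial kernel, and all function names are hypothetical.

```python
import numpy as np

# 5-tap binomial kernel, an assumed approximation of a Gaussian filter.
KERNEL = np.array([1, 4, 6, 4, 1], dtype=float) / 16.0

def blur(img):
    """Separable blur with edge replication; output has the same shape as input."""
    h, w = img.shape
    padded = np.pad(img, 2, mode="edge")
    rows = sum(k * padded[:, i:i + w] for i, k in enumerate(KERNEL))
    return sum(k * rows[i:i + h, :] for i, k in enumerate(KERNEL))

def downsample(img):
    """Blur, then keep every second row and column."""
    return blur(img)[::2, ::2]

def upsample(img):
    """Double the resolution by pixel repetition, then blur."""
    return blur(img.repeat(2, axis=0).repeat(2, axis=1))

def laplacian_pyramid(img, levels):
    """Detail images from fine to coarse, followed by the coarsest Gaussian level."""
    gauss = [img]
    for _ in range(levels):
        gauss.append(downsample(gauss[-1]))
    lap = [gauss[i] - upsample(gauss[i + 1]) for i in range(levels)]
    lap.append(gauss[-1])
    return lap

def reconstruct(lap):
    """Invert laplacian_pyramid by upsampling and adding details back."""
    img = lap[-1]
    for detail in reversed(lap[:-1]):
        img = upsample(img) + detail
    return img

def blend_halves(img_a, img_b, levels=3):
    """Join the left half of img_a and the right half of img_b at every level."""
    joined = []
    for a, b in zip(laplacian_pyramid(img_a, levels), laplacian_pyramid(img_b, levels)):
        mid = a.shape[1] // 2
        joined.append(np.hstack([a[:, :mid], b[:, mid:]]))
    return reconstruct(joined)

# Blend a dark and a bright synthetic "image" of size 16x16.
dark, bright = np.zeros((16, 16)), np.ones((16, 16))
blended = blend_halves(dark, bright)
print(blended[0].round(2))
```

Because the seam is introduced at every pyramid level, the coarse levels spread the transition over a wide band while fine details stay localized, which is why the result changes gradually from one source image to the other instead of showing a hard edge.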
Simultaneous Skull Conductivity and Focal Source Imaging from EEG Recordings with the help of Bayesian Uncertainty Modelling
Koulouri, Alexandra, Rimpilainen, Ville
The electroencephalography (EEG) source imaging problem is very sensitive to the electrical modelling of the skull of the patient under examination. Unfortunately, the currently available EEG devices and their embedded software do not take this into account; instead, it is common to use a literature-based skull conductivity parameter. In this paper, we propose a statistical method based on the Bayesian approximation error approach to compensate for source imaging errors due to the unknown skull conductivity and, simultaneously, to compute a low-order estimate for the actual skull conductivity value. By using simulated EEG data that corresponds to focal source activity, we demonstrate the potential of the method to reconstruct the underlying focal sources and low-order errors induced by the unknown skull conductivity. Subsequently, the estimated errors are used to approximate the skull conductivity. The results indicate clear improvements in the source localization accuracy and feasible skull conductivity estimates.
Evolving Loss Functions With Multivariate Taylor Polynomial Parameterizations
Gonzalez, Santiago, Miikkulainen, Risto
Loss function optimization for neural networks has recently emerged as a new direction for metalearning, with Genetic Loss Optimization (GLO) providing a general approach for the discovery and optimization of such functions. GLO represents loss functions as trees that are evolved and further optimized using evolutionary strategies. However, searching in this space is difficult because most candidates are not valid loss functions. In this paper, a new technique, Multivariate Taylor expansion-based genetic loss-function optimization (TaylorGLO), is introduced to solve this problem. It represents functions using a novel parameterization based on Taylor expansions, making the search more effective. TaylorGLO is able to find new loss functions that outperform those found by GLO in many fewer generations, demonstrating that loss function optimization is a productive avenue for metalearning.
Improving the Detection of Burnt Areas in Remote Sensing using Hyper-features Evolved by M3GP
One problem found when working with satellite images is the radiometric variation within an image and across different images. Intending to improve remote sensing models for the classification of burnt areas, we set two objectives. The first is to understand the relationship between feature spaces and the predictive ability of the models, allowing us to explain the differences between learning and generalization when training and testing on different datasets. We find that training on datasets built from more than one image provides models that generalize better. These results are explained by visualizing the dispersion of values in the feature space. The second objective is to evolve hyper-features that improve the performance of different classifiers on a variety of test sets. We find the hyper-features to be beneficial, and obtain the best models with XGBoost, even if the hyper-features are optimized for a different method.
Deforestation has serious implications for biodiversity, for rural communities that depend on forests for survival, and for greenhouse gas emissions that drive the global climate. The machine learning (ML) community can help by providing predictive models that, after learning from a small sample of an image, can automatically classify the whole image. Although previous ML work in forest monitoring has shown good results, the predictive models are often applied to the same location where they were learnt, i.e., the models are trained and tested on samples from the same dataset (e.g., [1]) or time series from the same area (e.g., [2]).