AITopics

Bayesian inference is used extensively to quantify the uncertainty in an inferred field given the measurement of a related field when the two are linked by a mathematical model. Despite its many applications, Bayesian inference faces challenges when inferring fields that have discrete representations of large dimension, and/or have prior distributions that are difficult to characterize mathematically. In this work we demonstrate how the approximate distribution learned by a deep generative adversarial network (GAN) may be used as a prior in a Bayesian update to address both these challenges. We demonstrate the efficacy of this approach on two distinct, and remarkably broad, classes of problems. The first class leads to supervised learning algorithms for image classification with superior out of distribution detection and accuracy, and for image inpainting with built-in variance estimation. The second class leads to unsupervised learning algorithms for image denoising and for solving physics-driven inverse problems.

inference, inverse problem, variance, (16 more...)

doi: 10.13140/RG.2.2.28806.32322

2003.12597

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.65)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.88)

word2vec, node2vec, graph2vec, X2vec: Towards a Theory of Vector Embeddings of Structured Data

Grohe, Martin

Vector representations of graphs and relational structures, whether hand-crafted feature vectors or learned representations, enable us to apply standard data analysis and machine learning techniques to the structures. A wide range of methods for generating such embeddings have been studied in the machine learning and knowledge representation literature. However, vector embeddings have received relatively little attention from a theoretical point of view. Starting with a survey of embedding techniques that have been used in practice, in this paper we propose two theoretical approaches that we see as central for understanding the foundations of vector embeddings. We draw connections between the various approaches and suggest directions for future research.

graph, proceedings, vector, (15 more...)

2003.1259

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)
(2 more...)

Genre:

Overview (0.87)
Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Incorporating Expert Prior in Bayesian Optimisation via Space Warping

Ramachandran, Anil, Gupta, Sunil, Rana, Santu, Li, Cheng, Venkatesh, Svetha

Bayesian optimisation is a well-known sample-efficient method for the optimisation of expensive black-box functions. However when dealing with big search spaces the algorithm goes through several low function value regions before reaching the optimum of the function. Since the function evaluations are expensive in terms of both money and time, it may be desirable to alleviate this problem. One approach to subside this cold start phase is to use prior knowledge that can accelerate the optimisation. In its standard form, Bayesian optimisation assumes the likelihood of any point in the search space being the optimum is equal. Therefore any prior knowledge that can provide information about the optimum of the function would elevate the optimisation performance. In this paper, we represent the prior knowledge about the function optimum through a prior distribution. The prior distribution is then used to warp the search space in such a way that space gets expanded around the high probability region of function optimum and shrinks around low probability region of optimum. We incorporate this prior directly in function model (Gaussian process), by redefining the kernel matrix, which allows this method to work with any acquisition function, i.e. acquisition agnostic approach. We show the superiority of our method over standard Bayesian optimisation method through optimisation of several benchmark functions and hyperparameter tuning of two algorithms: Support Vector Machine (SVM) and Random forest.

acquisition function, bayesian optimisation, optimisation, (14 more...)

doi: 10.1016/j.knosys.2020.105663

2003.1225

Country:

Oceania > Australia (0.14)
North America > United States > North Carolina (0.04)
Asia (0.04)

Genre: Research Report (0.64)

Industry: Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.77)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.55)

Mathur, Akhil, Isopoussu, Anton, Kawsar, Fahim, Berthouze, Nadia, Lane, Nicholas D.

Mic2Mic: Using Cycle-Consistent Generative Adversarial Networks to Overcome Microphone Variability in Speech Systems

Mobile and embedded devices are increasingly using microphones and audio-based computational models to infer user context. A major challenge in building systems that combine audio models with commodity microphones is to guarantee their accuracy and robustness in the real-world. Besides many environmental dynamics, a primary factor that impacts the robustness of audio models is microphone variability. In this work, we propose Mic2Mic -- a machine-learned system component -- which resides in the inference pipeline of audio models and at real-time reduces the variability in audio data caused by microphone-specific factors. Two key considerations for the design of Mic2Mic were: a) to decouple the problem of microphone variability from the audio task, and b) put a minimal burden on end-users to provide training data. With these in mind, we apply the principles of cycle-consistent generative adversarial networks (CycleGANs) to learn Mic2Mic using unlabeled and unpaired data collected from different microphones. Our experiments show that Mic2Mic can recover between 66% to 89% of the accuracy lost due to microphone variability for two common audio tasks.

mic2mic, microphone, variability, (15 more...)

doi: 10.1145/3302506.3310398

2003.12425

Country: Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Omidvar, Naeimeh, Maddah-Ali, Mohammad Ali, Mahdavi, Hamed

A Hybrid-Order Distributed SGD Method for Non-Convex Optimization to Balance Communication Overhead, Computational Complexity, and Convergence Rate

In this paper, we propose a method of distributed stochastic gradient descent (SGD), with low communication load and computational complexity, and still fast convergence. To reduce the communication load, at each iteration of the algorithm, the worker nodes calculate and communicate some scalers, that are the directional derivatives of the sample functions in some \emph{pre-shared directions}. However, to maintain accuracy, after every specific number of iterations, they communicate the vectors of stochastic gradients. To reduce the computational complexity in each iteration, the worker nodes approximate the directional derivatives with zeroth-order stochastic gradient estimation, by performing just two function evaluations rather than computing a first-order gradient vector. The proposed method highly improves the convergence rate of the zeroth-order methods, guaranteeing order-wise faster convergence. Moreover, compared to the famous communication-efficient methods of model averaging (that perform local model updates and periodic communication of the gradients to synchronize the local models), we prove that for the general class of non-convex stochastic problems and with reasonable choice of parameters, the proposed method guarantees the same orders of communication load and convergence rate, while having order-wise less computational complexity. Experimental results on various learning problems in neural networks applications demonstrate the effectiveness of the proposed approach compared to various state-of-the-art distributed SGD methods.

algorithm, balance communication overhead, iteration, (9 more...)

2003.12423

Country:

South America > Paraguay > Asunción > Asunción (0.04)
Asia > Middle East > Iran > Tehran Province > Tehran (0.04)

Genre: Research Report (0.82)

Industry:

Education (0.48)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Ravichandran, Naresh Balaji, Lansner, Anders, Herman, Pawel

Learning representations in Bayesian Confidence Propagation neural networks

Unsupervised learning of hierarchical representations has been one of the most vibrant research directions in deep learning during recent years. In this work we study biologically inspired unsupervised strategies in neural networks based on local Hebbian learning. We propose new mechanisms to extend the Bayesian Confidence Propagating Neural Network (BCPNN) architecture, and demonstrate their capability for unsupervised learning of salient hidden representations when tested on the MNIST dataset.

entropy, mcs, representation, (17 more...)

2003.12415

Country:

Europe > Sweden > Stockholm > Stockholm (0.05)
North America > United States (0.04)

Genre: Research Report (0.40)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.94)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Kallus, Nathan, Mao, Xiaojie

On the role of surrogates in the efficient estimation of treatment effects with limited outcome data

We study the problem of estimating treatment effects when the outcome of primary interest (e.g., long-term health status) is only seldom observed but abundant surrogate observations (e.g., short-term health outcomes) are available. To investigate the role of surrogates in this setting, we derive the semiparametric efficiency lower bounds of average treatment effect (ATE) both with and without presence of surrogates, as well as several intermediary settings. These bounds characterize the best-possible precision of ATE estimation in each case, and their difference quantifies the efficiency gains from optimally leveraging the surrogates in terms of key problem characteristics when only limited outcome data are available. We show these results apply in two important regimes: when the number of surrogate observations is comparable to primary-outcome observations and when the former dominates the latter. Importantly, we take a missing-data approach that circumvents strong surrogate conditions which are commonly assumed in previous literature but almost always fail in practice. To show how to leverage the efficiency gains of surrogate observations, we propose ATE estimators and inferential methods based on flexible machine learning methods to estimate nuisance parameters that appear in the influence functions. We show our estimators enjoy efficiency and robustness guarantees under weak conditions.

efficiency lower bound, estimator, primary outcome, (15 more...)

2003.12408

Country:

North America > United States (0.92)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area (0.92)
Government > Regional Government > North America Government > United States Government > FDA (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)

Kubo, Yusuke, Komori, Yuto, Okuyama, Toyonobu, Tokieda, Hiroshi

A copula-based visualization technique for a neural network

Interpretability of machine learning is defined as the extent to which humans can comprehend the reason of a decision. However, a neural network is not considered interpretable due to the ambiguity in its decision-making process. Therefore, in this study, we propose a new algorithm that reveals which feature values the trained neural network considers important and which paths are mainly traced in the process of decision-making. In the proposed algorithm, the score estimated by the correlation coefficients between the neural network layers that can be calculated by applying the concept of a pair copula was defined. We compared the estimated score with the feature importance values of Random Forest, which is sometimes regarded as a highly interpretable algorithm, in the experiment and confirmed that the results were consistent with each other. This algorithm suggests an approach for compressing a neural network and its parameter tuning because the algorithm identifies the paths that contribute to the classification or prediction results.

algorithm, correlation coefficient, neural network, (15 more...)

2003.12317

Country:

Europe (0.14)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Optimization of genomic classifiers for clinical deployment: evaluation of Bayesian optimization for identification of predictive models of acute infection and in-hospital mortality

Mayhew, Michael B., Tran, Elizabeth, Choi, Kirindi, Midic, Uros, Luethy, Roland, Damaraju, Nandita, Buturovic, Ljubomir

Acute infection, if not rapidly and accurately detected, can lead to sepsis, organ failure and even death. Currently, detection of acute infection as well as assessment of a patient's severity of illness are based on imperfect (and often superficial) measures of patient physiology. Characterization of a patient's immune response by quantifying expression levels of key genes from blood represents a potentially more timely and precise means of accomplishing both tasks. Machine learning methods provide a platform for development of deployment-ready classification models robust to the smaller, more heterogeneous datasets typical of healthcare. Identification of promising classifiers is dependent, in part, on hyperparameter optimization (HO), for which a number of approaches including grid search, random sampling and Bayesian optimization have been shown to be effective. In this analysis, we compare HO approaches for the development of diagnostic classifiers of acute infection and in-hospital mortality from gene expression of 29 diagnostic markers. Our comprehensive analysis of a multi-study patient cohort evaluates HO for three different classifier types and over a range of different optimization settings. Consistent with previous research, we find that Bayesian optimization is more efficient than grid search or random sampling-based methods, identifying promising classifiers with fewer evaluated hyperparameter configurations. However, we also find evidence of a lack of correspondence between internal and external validation performance of selected classifiers that complicates model selection for deployment as well as stymies development of clear-cut, practical guidelines for HO application in healthcare. We highlight the need for additional considerations about patient heterogeneity, dataset partitioning and optimization setup when applying HO methods in the healthcare context.

classifier, configuration, optimization, (12 more...)

2003.1231

Country:

North America > United States > California > San Mateo County > Burlingame (0.05)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Florida > Broward County > Fort Lauderdale (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.94)

Industry: Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

LIMP: Learning Latent Shape Representations with Metric Preservation Priors

Cosmo, Luca, Norelli, Antonio, Halimi, Oshri, Kimmel, Ron, Rodolà, Emanuele

In this paper, we advocate the adoption of metric preservation as a powerful prior for learning latent representations of deformable 3D shapes. Key to our construction is the introduction of a geometric distortion criterion, defined directly on the decoded shapes, translating the preservation of the metric on the decoding to the formation of linear paths in the underlying latent space. Our rationale lies in the observation that training samples alone are often insufficient to endow generative models with high fidelity, motivating the need for large training datasets. In contrast, metric preservation provides a rigorous way to control the amount of geometric distortion incurring in the construction of the latent space, leading in turn to synthetic samples of higher quality. We further demonstrate, for the first time, the adoption of differentiable intrinsic distances in the backpropagation of a geodesic loss. Our geometric priors are particularly relevant in the presence of scarce training data, where learning any meaningful latent structure can be especially challenging. The effectiveness and potential of our generative model is showcased in applications of style transfer, content generation, and shape completion.

deformation, generative model, latent space, (17 more...)

2003.12283

Country:

Asia > Middle East > Israel (0.04)
North America > United States > New Jersey > Middlesex County > Piscataway (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)