AITopics

1911.02982

Country:

Europe > Germany > Bremen > Bremen (0.28)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
(2 more...)

Genre: Research Report > Experimental Study (0.87)

Industry:

Health & Medicine > Diagnostic Medicine (1.00)
Health & Medicine > Therapeutic Area > Oncology (0.65)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

arXiv.org Machine Learning Nov-6-2019

Uncertainty relations and fluctuation theorems for Bayes nets

Wolpert, David H.

The pioneering paper [Ito and Sagawa, 2013] analyzed the non-equilibrium statistical physics of a set of multiple interacting systems, S, whose joint discrete-time evolution is specified by a Bayesian network. The major result of [Ito and Sagawa, 2013] was an integral fluctuation theorem (IFT) governing the sum of two quantities: the entropy production (EP) of an arbitrary single v in S, and the transfer entropy from v to the other systems. Here I extend the analysis in [Ito and Sagawa, 2013]. I derive several detailed fluctuation theorems (DFTs), concerning arbitrary subsets of all the systems (including the full set). I also derive several associated IFTs, concerning an arbitrary subset of the systems, thereby extending the IFT in [Ito and Sagawa, 2013]. In addition I derive "conditional" DFTs and IFTs, involving conditional probability distributions rather than (as in conventional fluctuation theorems) unconditioned distributions. I then derive thermodynamic uncertainty relations relating the total EP of the Bayes net to the set of all the precisions of probability currents within the individual systems. I end with an example of that uncertainty relation.
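For readers unfamiliar with the terminology, the generic textbook forms of a detailed and an integral fluctuation theorem are sketched below. This is not the specific result of the paper or of [Ito and Sagawa, 2013]: the symbol \sigma (total entropy production along a trajectory, in units where k_B = 1) and the reverse-process distribution \tilde{P} are notation assumed here for illustration only.

```latex
% Generic forms only; \sigma is the trajectory-level entropy production (k_B = 1).
% Detailed fluctuation theorem (DFT): forward vs. reverse-process probabilities.
\frac{P(\sigma)}{\tilde{P}(-\sigma)} = e^{\sigma}

% Integral fluctuation theorem (IFT): multiply the DFT by e^{-\sigma} P(\sigma)
% and integrate, using the normalization of \tilde{P}.
\left\langle e^{-\sigma} \right\rangle
  = \int P(\sigma)\, e^{-\sigma}\, d\sigma
  = \int \tilde{P}(-\sigma)\, d\sigma
  = 1

% Jensen's inequality applied to the IFT gives the second law.
\langle \sigma \rangle \ge 0
```

The conditional variants mentioned in the abstract replace these unconditioned distributions with conditional ones; their exact form is given in the paper and is not reproduced here.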

fluctuation theorem, solitary process, subsystem, (15 more...)

1911.027

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > New Mexico > Santa Fe County > Santa Fe (0.04)
North America > United States > Arizona > Maricopa County > Tempe (0.04)
Europe > Austria > Vienna (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

arXiv.org Artificial Intelligence Nov-6-2019

Probabilistic Similarity Networks

Heckerman, David

Normative expert systems have not become commonplace because they have been difficult to build and use. Over the past decade, however, researchers have developed the influence diagram, a graphical representation of a decision maker's beliefs, alternatives, and preferences that serves as the knowledge base of a normative expert system. Most people who have seen the representation find it intuitive and easy to use. Consequently, the influence diagram has significantly overcome the barriers to constructing normative expert systems. Nevertheless, building influence diagrams is not practical for extremely large and complex domains. In this book, I address the difficulties associated with the construction of the probabilistic portion of an influence diagram, called a knowledge map, belief network, or Bayesian network. I introduce two representations that facilitate the generation of large knowledge maps. In particular, I introduce the similarity network, a tool for building the network structure of a knowledge map, and the partition, a tool for assessing the probabilities associated with a knowledge map. I then use these representations to build Pathfinder, a large normative expert system for the diagnosis of lymph-node diseases (the domain contains over 60 diseases and over 100 disease findings). In an early version of the system, I encoded the knowledge of the expert using the erroneous assumption that all disease findings were independent, given each disease. When the expert and I attempted to build a more accurate knowledge map for the domain that would capture the dependencies among the disease findings, we failed. Using a similarity network, however, we built the knowledge-map structure for the entire domain in approximately 40 hours. Furthermore, the partition representation reduced the number of probability assessments required from the expert from 75,000 to 14,000.
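The independence assumption used in the early version of the system (all findings independent given the disease) corresponds to what is now usually called a naive Bayes model. A minimal sketch of that calculation follows; it is not Pathfinder's code, and the function name, disease labels, finding name, and probabilities are all hypothetical values chosen for illustration.

```python
def naive_bayes_posterior(prior, likelihoods, findings):
    """Posterior over diseases, assuming findings are independent given the disease.

    prior:       dict disease -> P(disease)
    likelihoods: dict disease -> dict finding -> P(finding present | disease)
    findings:    dict finding -> bool (True if the finding was observed present)
    """
    scores = {}
    for disease, p_disease in prior.items():
        score = p_disease
        for finding, present in findings.items():
            p_present = likelihoods[disease][finding]
            # Multiply in each finding separately: this is the independence assumption.
            score *= p_present if present else (1.0 - p_present)
        scores[disease] = score
    total = sum(scores.values())
    return {disease: score / total for disease, score in scores.items()}


# Hypothetical two-disease, one-finding example (numbers are made up).
prior = {"A": 0.5, "B": 0.5}
likelihoods = {"A": {"f1": 0.8}, "B": {"f1": 0.2}}
posterior = naive_bayes_posterior(prior, likelihoods, {"f1": True})
print(posterior)
```

When findings are in fact dependent, multiplying them in separately double-counts evidence, which is the inaccuracy the similarity-network approach is meant to address.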

medical computer science group, comprehensive similarity network, ordinary global knowledge map, (12 more...)

arXiv.org Artificial Intelligence

1911.06263

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > California > Santa Clara County > Stanford (0.04)
North America > United States > Minnesota (0.04)
(12 more...)

Genre: Research Report (0.81)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Diagnostic Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

arXiv.org Artificial Intelligence Nov-6-2019

Coverage-based Outlier Explanation

Wu, Yue, Akoglu, Leman, Davidson, Ian

Outlier detection is a core task in data mining, with a plethora of algorithms that have enjoyed wide-scale usage. Existing algorithms are primarily focused on detection, that is, the identification of outliers in a given dataset. In this paper we explore the relatively under-studied outlier explanation problem. Our goal is, given a dataset that is already divided into outliers and normal instances, to explain what characterizes the outliers. We explore the novel direction of a semantic explanation that a domain expert or policy maker is able to understand. We formulate this as an optimization problem to find explanations that are both interpretable and pure. Through experiments on real-world data sets, we quantitatively show that our method can efficiently generate better explanations than rule-based learners.

algorithm, outlier, outlier group, (12 more...)

arXiv.org Artificial Intelligence

1911.02617

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > California > Yolo County > Davis (0.04)

Genre: Research Report (0.82)

Industry:

Health & Medicine (0.69)
Automobiles & Trucks (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.66)
Information Technology > Data Science > Data Mining > Anomaly Detection (0.49)
(2 more...)

Computational Separations between Sampling and Optimization

Talwar, Kunal

Two commonly arising computational tasks in Bayesian learning are Optimization (Maximum A Posteriori estimation) and Sampling (from the posterior distribution). In the convex case these two problems are efficiently reducible to each other. Recent work (Ma et al. 2019) shows that in the non-convex case, sampling can sometimes be provably faster. We present a simpler and stronger separation. We then compare sampling and optimization in more detail and show that they are provably incomparable: there are families of continuous functions for which optimization is easy but sampling is NP-hard, and vice versa. Further, we show function families that exhibit a sharp phase transition in the computational complexity of sampling, as one varies the natural temperature parameter. Our results draw on a connection to analogous separations in the discrete setting which are well-studied.
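To make the two tasks concrete, the sketch below contrasts them on a one-dimensional grid: optimization returns the minimizer of a function f, while sampling draws from the density proportional to exp(-f(x)/T) at temperature T. The function f, the grid, and the temperatures are hypothetical choices for illustration; brute-force enumeration like this says nothing about the hardness results in the paper, which concern settings where such enumeration is infeasible.

```python
import numpy as np


def f(x):
    # Hypothetical non-convex objective: a tilted double well.
    return (x**2 - 1.0) ** 2 + 0.3 * x


def gibbs_probs(vals, temperature):
    """Discrete distribution proportional to exp(-f / T) over grid values."""
    logits = -(vals - vals.min()) / temperature  # shift for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()


grid = np.linspace(-2.0, 2.0, 2001)
vals = f(grid)

# Optimization (MAP estimation): the single best grid point.
x_map = grid[np.argmin(vals)]

# Sampling: draws from the Gibbs distribution at two temperatures.
rng = np.random.default_rng(0)
p_cold = gibbs_probs(vals, temperature=0.01)
p_hot = gibbs_probs(vals, temperature=10.0)
samples_cold = rng.choice(grid, size=1000, p=p_cold)

print("MAP estimate:", x_map)
print("mean of low-temperature samples:", samples_cold.mean())
```

At low temperature the samples concentrate around the minimizer, whereas at high temperature the distribution spreads across the whole grid; the temperature parameter is the quantity the abstract refers to when describing the phase transition.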

algorithm, exp, optimization, (15 more...)

1911.02074

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Nevada > Clark County > Las Vegas (0.04)
North America > United States > District of Columbia > Washington (0.04)
(3 more...)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)

Sulimov, Pavel, Sukmanova, Elena, Chereshnev, Roman, Kertesz-Farkas, Attila

Guided Layer-wise Learning for Deep Models using Side Information

Training of deep models for classification tasks is hindered by local minima and vanishing gradients, while unsupervised layer-wise pretraining does not exploit information from class labels. Here, we propose a new regularization technique, called diversifying regularization (DR), which applies a penalty on hidden units at any layer if they obtain similar features for different types of data. For generative models, DR is defined as a divergence over the variational posterior distributions and included in the maximum likelihood estimation as a prior. Thus, DR includes class label information for greedy pretraining of deep belief networks, which results in a better weight initialization for fine-tuning methods. On the other hand, for discriminative training of deep neural networks, DR is defined as a distance over the features and included in the learning objective. With our experimental tests, we show that DR can help backpropagation cope with vanishing gradient problems and provide faster convergence and smaller generalization errors.

generative model, information, regularization, (15 more...)

1911.02048

Country:

Asia > Middle East > Jordan (0.05)
Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
Asia > Russia (0.04)
Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Berkovich, Pavel, Perim, Eric, Bruinsma, Wessel

GP-ALPS: Automatic Latent Process Selection for Multi-Output Gaussian Process Models

A principled approach to prediction tasks is to choose a statistical model that explains the data. The choice of the model class is crucial and has to observe the bias-variance tradeoff, which motivates the need for principled approaches to selecting the best model class from a set of options. Whilst model selection can be done manually by trial and error, the process tends to consume considerable time and resources and to be prone to human biases. Bayesian model selection (MacKay, 1992; Rasmussen and Ghahramani, 2001) treats the model class as a random variable and computes its posterior distribution. It offers a built-in complexity regulariser, commonly known as the Bayesian Occam's razor, which penalises models whose complexity is excessive or too modest.
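The posterior over model classes described above follows from Bayes' rule: it is proportional to the prior over classes times each class's marginal likelihood (evidence). A minimal sketch of that normalization step follows, assuming the log evidences have already been computed elsewhere; the function name and the example values are hypothetical and are not taken from GP-ALPS.

```python
import numpy as np


def model_posterior(log_evidences, log_prior=None):
    """Posterior over model classes: p(M | data) proportional to p(data | M) p(M).

    log_evidences: log marginal likelihood log p(data | M) for each class.
    log_prior:     log p(M) for each class; defaults to a uniform prior.
    """
    log_ev = np.asarray(log_evidences, dtype=float)
    if log_prior is None:
        log_prior = np.full(log_ev.shape, -np.log(log_ev.size))
    log_post = log_ev + np.asarray(log_prior, dtype=float)
    log_post -= log_post.max()  # subtract the max before exponentiating for stability
    post = np.exp(log_post)
    return post / post.sum()


# Hypothetical example: the second class has three times the evidence of the first.
posterior = model_posterior([0.0, np.log(3.0)])
print(posterior)
```

The Occam's razor effect enters through the evidence itself: an overly flexible class spreads its probability mass over many possible datasets, so its evidence for the observed data is lower.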

automatic latent process selection, gp-alp, posterior, (11 more...)

1911.01929

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (0.40)

Industry: Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.49)

Morales-Álvarez, Pablo, Ruiz, Pablo, Coughlin, Scott, Molina, Rafael, Katsaggelos, Aggelos K.

Scalable Variational Gaussian Processes for Crowdsourcing: Glitch Detection in LIGO

In recent years, crowdsourcing has been transforming the way classification training sets are obtained. Instead of relying on a single expert annotator, crowdsourcing shares the labelling effort among a large number of collaborators. For instance, this is being applied to the data acquired by the laureate Laser Interferometer Gravitational-Wave Observatory (LIGO), in order to detect glitches which might hinder the identification of true gravitational waves. The crowdsourcing scenario poses new challenges, as it deals with different opinions from a heterogeneous group of annotators with unknown degrees of expertise. Probabilistic methods, such as Gaussian Processes (GP), have proven successful in modeling this setting. However, GPs do not scale well to large data sets, which hampers their broad adoption in practice (in particular at LIGO). This has led to the recent introduction of deep learning based crowdsourcing methods, which have become the state of the art. However, the accurate uncertainty quantification of GPs has been partially sacrificed. This is an important aspect for astrophysicists at LIGO, since a glitch detection system should provide very accurate probability distributions of its predictions. In this work, we leverage the most popular sparse GP approximation to develop a novel GP based crowdsourcing method that factorizes into mini-batches. This makes it able to cope with previously prohibitive data sets. The approach, which we refer to as Scalable Variational Gaussian Processes for Crowdsourcing (SVGPCR), brings GP-based methods back to the state of the art, and excels at uncertainty quantification. SVGPCR is shown to outperform deep learning based methods and previous probabilistic approaches when applied to the LIGO data. Moreover, its behavior and main properties are carefully analyzed in a controlled experiment based on the MNIST data set.

annotator, svgpcr, university, (17 more...)

1911.01915

Country:

North America > United States (0.46)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)
Europe > Spain > Andalusia > Granada Province > Granada (0.04)
(3 more...)

Genre:

Research Report > Experimental Study (0.86)
Research Report > Strength High (0.68)

Industry:

Energy (0.68)
Government > Regional Government (0.46)
Education > Educational Setting (0.46)

Technology:

Information Technology > Communications > Social Media > Crowdsourcing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)

Moustakides, George V., Basioti, Kalliopi

Training Neural Networks for Likelihood/Density Ratio Estimation

Various problems in Engineering and Statistics require the computation of the likelihood ratio function of two probability densities. In classical approaches the two densities are assumed known or to belong to some known parametric family. In a data-driven version we replace this requirement with the availability of data sampled from the densities of interest. For most well-known problems in Detection and Hypothesis testing we develop solutions by providing neural network based estimates of the likelihood ratio or its transformations. This task necessitates the definition of proper optimizations which can be used for the training of the network. The main purpose of this work is to offer a simple and unified methodology for defining such optimization problems with guarantees that the solution is indeed the desired function. Our results are extended to cover estimates for likelihood ratios of conditional densities and estimates for statistics encountered in local approaches.
The likelihood ratio of two probability densities is a function that appears in a variety of problems in Engineering and Statistics. Characteristic examples [1], [2] include Hypothesis testing, Signal detection, Sequential hypothesis testing, Sequential detection of changes, etc. Many of these problems also use the likelihood ratio in a transformed form, the most frequent example being the log-likelihood ratio. In all these problems the main assumption is that the corresponding probability densities are available in some functional form. Our aim in this work is to replace this requirement with the availability of data sampled from each of the densities of interest. As we mentioned, the computation of the likelihood ratio function relies on knowledge of the probability densities, which, for the majority of applications, is an unrealistic assumption.
One can instead propose parametric families of densities and, with the help of available data, estimate the parameters and form the likelihood ratio function. However, with the advent of Data Science and Deep Learning there is a phenomenal increase in the need to process data coming from images, videos, etc. For most of these cases it is very difficult to propose any meaningful parametric family of densities that could reliably describe their statistical behavior. Therefore, these techniques tend to be unsuitable for most of these datasets. If parametric families cannot be employed, one can always resort to nonparametric density estimation [3] and then form the likelihood ratio. These approaches are purely data-driven but require two different approximations, namely one for each density.
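One well-known way to estimate a likelihood ratio directly from samples, without estimating each density separately, is to train a classifier with the cross-entropy loss to distinguish the two sample sets; with equal sample sizes, the logit of the optimal classifier equals the log-likelihood ratio. The sketch below illustrates this with a one-parameter logistic model in place of a neural network. It is not the authors' method or their choice of optimization problem: the Gaussian densities, the function name, and the training settings are all assumptions made for illustration.

```python
import numpy as np


def fit_logistic(x, y, lr=0.5, steps=3000):
    """Fit logit(x) = w * x + b by gradient descent on the mean cross-entropy loss."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(w * x + b)))
        w -= lr * np.mean((p - y) * x)  # gradient of the loss with respect to w
        b -= lr * np.mean(p - y)        # gradient of the loss with respect to b
    return w, b


rng = np.random.default_rng(0)
n = 5000
x1 = rng.normal(1.0, 1.0, n)  # samples from f1 = N(1, 1), labelled 1
x0 = rng.normal(0.0, 1.0, n)  # samples from f0 = N(0, 1), labelled 0
x = np.concatenate([x1, x0])
y = np.concatenate([np.ones(n), np.zeros(n)])

w, b = fit_logistic(x, y)

# For these two Gaussians the exact log-likelihood ratio is log f1(x)/f0(x) = x - 0.5,
# so the fitted logit w * x + b should have w close to 1 and b close to -0.5.
print("estimated log-ratio: %.3f * x + %.3f" % (w, b))
```

Only one function is approximated here, in contrast to the two separate density estimates required by the nonparametric route described above.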

estimation, likelihood ratio, neural network, (15 more...)

1911.00405

Country:

North America > United States > New Jersey > Middlesex County > Piscataway (0.04)
Europe > Russia (0.04)
Europe > Portugal > Braga > Braga (0.04)
(4 more...)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.69)

arXiv.org Machine Learning Nov-4-2019

Scalable Deep Generative Relational Models with High-Order Node Dependence

Fan, Xuhui, Li, Bin, Sisson, Scott Anthony, Li, Caoyuan, Chen, Ling

We propose a probabilistic framework for modelling and exploring the latent structure of relational data. Given feature information for the nodes in a network, the scalable deep generative relational model (SDREM) builds a deep network architecture that can approximate potential nonlinear mappings between nodes' feature information and the nodes' latent representations. Our contribution is two-fold: (1) We incorporate high-order neighbourhood structure information to generate the latent representations at each node, which vary smoothly over the network. (2) Due to the Dirichlet random variable structure of the latent representations, we introduce a novel data augmentation trick which permits efficient Gibbs sampling. The SDREM can be used for large sparse networks as its computational cost scales with the number of positive links. We demonstrate its competitive performance through improved link prediction performance on a range of real-world datasets.

artificial intelligence, bayesian inference, machine learning, (16 more...)

1911.01535

Country:

Asia > China > Shanghai > Shanghai (0.04)
Oceania > Australia > New South Wales (0.04)
North America > Canada (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Communications > Networks (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)