AITopics | Directed Networks

Collaborating Authors

Directed Networks

News Overviews Instructional Materials AI-Alerts Classics

[Discussion] How Gaussian naïve Bayes forms a non-linear decision boundary? • /r/MachineLearning

@machinelearnbotOct-24-2016, 03:15:23 GMT

Discussion[Discussion] How Gaussian naïve Bayes forms a non-linear decision boundary? Also, please explain decision boundary for decision trees. If the two gaussians are non isotropic you can derive that the bound is quadratic/elliptic curve.

artificial intelligence, machine learning, non-linear decision boundary, (3 more...)

@machinelearnbot

Industry: Media > News (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.80)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.80)

Add feedback

Fast $\epsilon$-free Inference of Simulation Models with Bayesian Conditional Density Estimation

Papamakarios, George, Murray, Iain

arXiv.org Machine LearningOct-24-2016

Many statistical models can be simulated forwards but have intractable likelihoods. Approximate Bayesian Computation (ABC) methods are used to infer properties of these models from data. Traditionally these methods approximate the posterior over parameters by conditioning on data being inside an $\epsilon$-ball around the observed data, which is only correct in the limit $\epsilon\!\rightarrow\!0$. Monte Carlo methods can then draw samples from the approximate posterior to approximate predictions or error bars on parameters. These algorithms critically slow down as $\epsilon\!\rightarrow\!0$, and in practice draw samples from a broader distribution than the posterior. We propose a new approach to likelihood-free inference based on Bayesian conditional density estimation. Preliminary inferences based on limited simulation data are used to guide later simulations. In some cases, learning an accurate parametric representation of the entire true posterior distribution requires fewer model simulations than Monte Carlo ABC methods need to produce a single sample from an approximate posterior.

artificial intelligence, machine learning, posterior, (15 more...)

arXiv.org Machine Learning

1605.06376

Genre: Research Report (0.64)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)

Add feedback

A Bayesian Ensemble for Unsupervised Anomaly Detection

Yu, Edward, Parekh, Parth

arXiv.org Machine LearningOct-24-2016

Methods for unsupervised anomaly detection suffer from the fact that the data is unlabeled, making it difficult to assess the optimality of detection algorithms. Ensemble learning has shown exceptional results in classification and clustering problems, but has not seen as much research in the context of outlier detection. Existing methods focus on combining output scores of individual detectors, but this leads to outputs that are not easily interpretable. In this paper, we introduce a theoretical foundation for combining individual detectors with Bayesian classifier combination. Not only are posterior distributions easily interpreted as the probability distribution of anomalies, but bias, variance, and individual error rates of detectors are all easily obtained. Performance on real-world datasets shows high accuracy across varied types of time series data.

data mining, detector, machine learning, (19 more...)

arXiv.org Machine Learning

1610.07677

Genre: Research Report (0.50)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
(2 more...)

Add feedback

Simpler PAC-Bayesian Bounds for Hostile Data

Alquier, Pierre, Guedj, Benjamin

arXiv.org Machine LearningOct-23-2016

Learning theory can be traced back to the late 60s and has attracted a great attention since. We refer to the monographs Devroye et al. (1996) and Vapnik (2000) for a survey. Most of the literature addresses the simplified case of i.i.d observations coupled with bounded loss functions. Many bounds on the excess risk holding with large probability were provided - these bounds are refered to as PAC learning bounds since Valiant (1984). In the late 90s, the PAC-Bayesian approach has been pioneered by Shawe-Taylor and Williamson (1997) and McAllester (1998, 1999). It consists in producing PAC bounds for a specific class of Bayesian-flavored estimators. Similarly to classical PAC results, most PAC-Bayesian bounds have been obtained with bounded loss functions (see Catoni, 2007, for some of the most accurate results). Note that Catoni (2004) provides bounds for unbouded loss, but still under very strong exponential moments assumptions. These assumptions were essentially not improved in the most recent works Guedj and Alquier (2013) and Bégin et al. (2016).

artificial intelligence, assumption, machine learning, (18 more...)

arXiv.org Machine Learning

1610.07193

Country: North America > United States (0.19)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)

Add feedback

Boltzmann-Machine Learning of Prior Distributions of Binarized Natural Images

Obuchi, Tomoyuki, Koma, Hirokazu, Yasuda, Muneki

arXiv.org Machine LearningOct-23-2016

Prior distributions of binarized natural images are learned by using a Boltzmann machine. According the results of this study, there emerges a structure with two sublattices in the interactions, and the nearest-neighbor and next-nearest-neighbor interactions correspondingly take two discriminative values, which reflects the individual characteristics of the three sets of pictures that we process. Meanwhile, in a longer spatial scale, a longer-range, although still rapidly decaying, ferromagnetic interaction commonly appears in all cases. The characteristic length scale of the interactions is universally up to approximately four lattice spacings $\xi \approx 4$. These results are derived by using the mean-field method, which effectively reduces the computational time required in a Boltzmann machine. An improved mean-field method called the Bethe approximation also gives the same results, as well as the Monte Carlo method does for small size images. These reinforce the validity of our analysis and findings. Relations to criticality, frustration, and simple-cell receptive fields are also discussed.

artificial intelligence, interaction, machine learning, (16 more...)

arXiv.org Machine Learning

doi: 10.7566/JPSJ.85.114803

1412.7012

Country: Asia > Japan (0.28)

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)

Add feedback

Formulas for Counting the Sizes of Markov Equivalence Classes of Directed Acyclic Graphs

He, Yangbo, Yu, Bin

arXiv.org Machine LearningOct-23-2016

The sizes of Markov equivalence classes of directed acyclic graphs play important roles in measuring the uncertainty and complexity in causal learning. A Markov equivalence class can be represented by an essential graph and its undirected subgraphs determine the size of the class. In this paper, we develop a method to derive the formulas for counting the sizes of Markov equivalence classes. We first introduce a new concept of core graph. The size of a Markov equivalence class of interest is a polynomial of the number of vertices given its core graph. Then, we discuss the recursive and explicit formula of the polynomial, and provide an algorithm to derive the size formula via symbolic computation for any given core graph. The proposed size formula derivation sheds light on the relationships between the size of a Markov equivalence class and its representation graph, and makes size counting efficient, even when the essential graphs contain non-sparse undirected subgraphs.

artificial intelligence, graph, machine learning, (14 more...)

arXiv.org Machine Learning

1610.07921

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)

Add feedback

Stochastic inference with spiking neurons in the high-conductance state

Petrovici, Mihai A., Bill, Johannes, Bytschok, Ilja, Schemmel, Johannes, Meier, Karlheinz

arXiv.org Machine LearningOct-23-2016

The highly variable dynamics of neocortical circuits observed in vivo have been hypothesized to represent a signature of ongoing stochastic inference but stand in apparent contrast to the deterministic response of neurons measured in vitro. Based on a propagation of the membrane autocorrelation across spike bursts, we provide an analytical derivation of the neural activation function that holds for a large parameter space, including the high-conductance state. On this basis, we show how an ensemble of leaky integrate-and-fire neurons with conductance-based synapses embedded in a spiking environment can attain the correct firing statistics for sampling from a well-defined target distribution. For recurrent networks, we examine convergence toward stationarity in computer simulations and demonstrate sample-based Bayesian inference in a mixed graphical model. This points to a new computational role of high-conductance states and establishes a rigorous link between deterministic neuron models and functional stochastic dynamics on the network level.

artificial intelligence, bayesian inference, machine learning, (18 more...)

arXiv.org Machine Learning

doi: 10.1103/PhysRevE.94.042312

1610.07161

Country: Europe (0.28)

Genre: Research Report (0.64)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

Dictionary Learning Strategies for Compressed Fiber Sensing Using a Probabilistic Sparse Model

Weiss, Christian, Zoubir, Abdelhak M.

arXiv.org Machine LearningOct-21-2016

We present a sparse estimation and dictionary learning framework for compressed fiber sensing based on a probabilistic hierarchical sparse model. To handle severe dictionary coherence, selective shrinkage is achieved using a Weibull prior, which can be related to non-convex optimization with $p$-norm constraints for $0 < p < 1$. In addition, we leverage the specific dictionary structure to promote collective shrinkage based on a local similarity model. This is incorporated in form of a kernel function in the joint prior density of the sparse coefficients, thereby establishing a Markov random field-relation. Approximate inference is accomplished using a hybrid technique that combines Hamilton Monte Carlo and Gibbs sampling. To estimate the dictionary parameter, we pursue two strategies, relying on either a deterministic or a probabilistic model for the dictionary parameter. In the first strategy, the parameter is estimated based on alternating estimation. In the second strategy, it is jointly estimated along with the sparse coefficients. The performance is evaluated in comparison to an existing method in various scenarios using simulations and experimental data.

artificial intelligence, bayesian inference, machine learning, (15 more...)

arXiv.org Machine Learning

1610.06902

Country:

Asia > Japan (0.28)
Europe > Germany (0.28)
North America > United States (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Robust training on approximated minimal-entropy set

Xie, Tianpei, Narabadi, Nasser. M., Hero, Alfred O.

arXiv.org Machine LearningOct-21-2016

Large margin classifiers, such as the support vector machine (SVM) [1] and the maximum entropy discrimination (MED) classifier [2], have enjoyed great popularity in the signal processing and machine learning communities due to their broad applicability, robust performance, and the availability of fast software implementations. When the training data is representative of the test data, the performance of MED/SVM has theoretical guarantees that have been validated in practice [1], [3], [4]. Moreover, since the decision boundary of the MED/SVM is solely defined by a few support vectors, the algorithm can tolerate random feature distortions and perturbations. However, in many real applications, anomalous measurements are inherent to the data set due to strong environmental noise or possible sensor failures. Such anomalies arise in industrial process monitoring, video surveillance, tactical multimodal sensing, robust spectrum sensing [5], [6], and, more generally, any application that involves unattended sensors in difficult environments (Figure 1).

artificial intelligence, gemmed, machine learning, (18 more...)

arXiv.org Machine Learning

1610.06806

Country: North America > United States > Michigan (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

On the Convergence of Stochastic Gradient MCMC Algorithms with High-Order Integrators

Chen, Changyou, Ding, Nan, Carin, Lawrence

arXiv.org Machine LearningOct-21-2016

Recent advances in Bayesian learning with large-scale data have witnessed emergence of stochastic gradient MCMC algorithms (SG-MCMC), such as stochastic gradient Langevin dynamics (SGLD), stochastic gradient Hamiltonian MCMC (SGHMC), and the stochastic gradient thermostat. While finite-time convergence properties of the SGLD with a 1st-order Euler integrator have recently been studied, corresponding theory for general SG-MCMCs has not been explored. In this paper we consider general SG-MCMCs with high-order integrators, and develop theory to analyze finite-time convergence properties and their asymptotic invariant measures. Our theoretical results show faster convergence rates and more accurate invariant measures for SG-MCMCs with higher-order integrators. For example, with the proposed efficient 2nd-order symmetric splitting integrator, the {\em mean square error} (MSE) of the posterior average for the SGHMC achieves an optimal convergence rate of $L^{-4/5}$ at $L$ iterations, compared to $L^{-2/3}$ for the SGHMC and SGLD with 1st-order Euler integrators. Furthermore, convergence results of decreasing-step-size SG-MCMCs are also developed, with the same convergence rates as their fixed-step-size counterparts for a specific decreasing sequence. Experiments on both synthetic and real datasets verify our theory, and show advantages of the proposed method in two large-scale real applications.

artificial intelligence, integrator, machine learning, (17 more...)

arXiv.org Machine Learning

1610.06665

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Add feedback