AITopics | Directed Networks

Collaborating Authors

Directed Networks

News Overviews Instructional Materials AI-Alerts Classics

Coresets for Dependency Networks

Molina, Alejandro, Munteanu, Alexander, Kersting, Kristian

arXiv.org Machine LearningOct-16-2017

But how can we train graphical models on a massive data set? In this paper, we show how to construct coresets--compressed data sets which can be used as proxy for the original data and have provably bounded worst case error--for Gaussian dependency networks (DNs), i.e., cyclic directed graphical models over Gaussians, where the parents of each variable are its Markov blanket. Specifically, we prove that Gaussian DNs admit coresets of size independent of the size of the data set. Unfortunately, this does not extend to DNs over members of the exponential family in general. As we will prove, Poisson DNs do not admit small coresets. Despite this worst-case result, we will provide an argument why our coreset construction for DNs can still work well in practice on count data. To corroborate our theoretical results, we empirically evaluated the resulting Core DNs on real data sets. The results demonstrate significant gains over no or naive sub-sampling, even in the case of count data.

artificial intelligence, belief revision, machine learning, (14 more...)

arXiv.org Machine Learning

1710.03285

Country: Europe > Germany (0.47)

Genre: Research Report (0.84)

Industry:

Health & Medicine (0.46)
Leisure & Entertainment > Sports (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.69)
(2 more...)

Add feedback

Causal Inference on Multivariate and Mixed-Type Data

Marx, Alexander, Vreeken, Jilles

arXiv.org Machine LearningOct-16-2017

Given data over the joint distribution of two random variables $X$ and $Y$, we consider the problem of inferring the most likely causal direction between $X$ and $Y$. In particular, we consider the general case where both $X$ and $Y$ may be univariate or multivariate, and of the same or mixed data types. We take an information theoretic approach, based on Kolmogorov complexity, from which it follows that first describing the data over cause and then that of effect given cause is shorter than the reverse direction. The ideal score is not computable, but can be approximated through the Minimum Description Length (MDL) principle. Based on MDL, we propose two scores, one for when both $X$ and $Y$ are of the same single data type, and one for when they are mixed-type. We model dependencies between $X$ and $Y$ using classification and regression trees. As inferring the optimal model is NP-hard, we propose Crack, a fast greedy algorithm to determine the most likely causal direction directly from the data. Empirical evaluation on a wide range of data shows that Crack reliably, and with high accuracy, infers the correct causal direction on both univariate and multivariate cause-effect pairs over both single and mixed-type data.

artificial intelligence, dependency, machine learning, (16 more...)

arXiv.org Machine Learning

1702.06385

Country: Europe > Germany (0.68)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

Sequence Tutor: Conservative Fine-Tuning of Sequence Generation Models with KL-control

Jaques, Natasha, Gu, Shixiang, Bahdanau, Dzmitry, Hernández-Lobato, José Miguel, Turner, Richard E., Eck, Douglas

arXiv.org Artificial IntelligenceOct-16-2017

This paper proposes a general method for improving the structure and quality of sequences generated by a recurrent neural network (RNN), while maintaining information originally learned from data, as well as sample diversity. An RNN is first pre-trained on data using maximum likelihood estimation (MLE), and the probability distribution over the next token in the sequence learned by this model is treated as a prior policy. Another RNN is then trained using reinforcement learning (RL) to generate higher-quality outputs that account for domain-specific incentives while retaining proximity to the prior policy of the MLE RNN. To formalize this objective, we derive novel off-policy RL methods for RNNs from KL-control. The effectiveness of the approach is demonstrated on two applications; 1) generating novel musical melodies, and 2) computational molecular generation. For both problems, we show that the proposed method improves the desired properties and structure of the generated sequences, while maintaining information learned from data.

machine learning, reinforcement learning, sequence tutor, (16 more...)

arXiv.org Artificial Intelligence

1611.02796

Country: North America > United States (0.93)

Genre: Research Report > New Finding (0.68)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.54)

Add feedback

Bayesian Decision Theory Made Ridiculously Simple · Statistics @ Home

@machinelearnbotOct-15-2017, 14:30:08 GMT

So how do we determine the "best" decision? This requires that we first define some notion of what we want (what are we trying to do?). The formal object that we use to do this goes by many names depending on the field: I will refer to it as a Loss function ($\mathcal{L}$) but the same general concept may be alternatively called a cost function, a utility function, an acquisition function, or any number of different things. The crucial idea is that this is a function that allows us to quantify how bad/good a given decision ($a$) is given some information ($\theta$). What does it mean to quantify?

artificial intelligence, decision support system, machine learning, (10 more...)

@machinelearnbot

Technology:

Information Technology > Game Theory (0.77)
Information Technology > Decision Support Systems (0.77)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.40)

Add feedback

Emergence of Invariance and Disentangling in Deep Representations

Achille, Alessandro, Soatto, Stefano

arXiv.org Machine LearningOct-15-2017

Using established principles from Information Theory and Statistics, we show that in a deep neural network invariance to nuisance factors is equivalent to information minimality of the learned representation, and that stacking layers and injecting noise during training naturally bias the network towards learning invariant representations. We then show that, in order to avoid memorization, we need to limit the quantity of information stored in the weights, which leads to a novel usage of the Information Bottleneck Lagrangian on the weights as a learning criterion. This also has an alternative interpretation as minimizing a PAC-Bayesian bound on the test error. Finally, we exploit a duality between weights and activations induced by the architecture, to show that the information in the weights bounds the minimality and Total Correlation of the layers, therefore showing that regularizing the weights explicitly or implicitly, using SGD, not only helps avoid overfitting, but also fosters invariance and disentangling of the learned representation. The theory also enables predicting sharp phase transitions between underfitting and overfitting random labels at precise information values, and sheds light on the relation between the geometry of the loss function, in particular so-called "flat minima," and generalization.

artificial intelligence, machine learning, representation, (16 more...)

arXiv.org Machine Learning

1706.0135

Country: North America > United States > California > Los Angeles County > Los Angeles (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

CarsonScott/CALM

#artificialintelligenceOct-14-2017, 03:45:35 GMT

The following is an unsupervised machine-learning algorithm that recognizes and predicts input patterns in real-time. The algorithm is essentially a system of information-processing inspired by the cognitive functions of the mind. The system has a collection of interacting memory systems akin to the spychological concepts of short-term and long-term memory, which allows it to learn by observation and adapt based on the frequency of patterns and their relationships through time. An association value is a representation of conditional probability which is calculated between the current observation and the rest of the elements in the buffer, i.e. the previous N observations, then stored in the association matrix. Each elements current place in the buffer determines the strength of the delta applied to its association value, so more recent observation are more strongly associated with the current observations than those further in the buffer.

artificial intelligence, association value, machine learning, (9 more...)

#artificialintelligence

Industry: Health & Medicine > Consumer Health (0.38)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.38)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.38)

Add feedback

Learning Infinite RBMs with Frank-Wolfe

Ping, Wei, Liu, Qiang, Ihler, Alexander

arXiv.org Machine LearningOct-14-2017

In this work, we propose an infinite restricted Boltzmann machine (RBM), whose maximum likelihood estimation (MLE) corresponds to a constrained convex optimization. We consider the Frank-Wolfe algorithm to solve the program, which provides a sparse solution that can be interpreted as inserting a hidden unit at each iteration, so that the optimization process takes the form of a sequence of finite models of increasing complexity. As a side benefit, this can be used to easily and efficiently identify an appropriate number of hidden units during the optimization. The resulting model can also be used as an initialization for typical state-of-the-art RBM training algorithms such as contrastive divergence, leading to models with consistently higher test likelihood than random initialization.

artificial intelligence, bayesian inference, machine learning, (20 more...)

arXiv.org Machine Learning

1710.0527

Country: North America > United States (0.28)

Genre: Research Report (0.50)

Industry: Government > Military (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.86)

Add feedback

Chapter 1 : Supervised Learning and Naive Bayes Classification -- Part 1 (Theory)

@machinelearnbotOct-13-2017, 03:41:54 GMT

Well if you guessed it to be Alice you are correct. Perhaps your reasoning would be the content has words love, great and wonderful that are used by Alice. Now let's add a combination and probability in the data we have.Suppose Alice and Bob uses following words with probabilities as show below. Now, can you guess who is the sender for the content: "Wonderful Love." Now what do you think?

artificial intelligence, machine learning, probability, (8 more...)

@machinelearnbot

Genre: Personal > Interview (0.58)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Burn-In Demonstrations for Multi-Modal Imitation Learning

Kuefler, Alex, Kochenderfer, Mykel J.

arXiv.org Machine LearningOct-13-2017

Recent work on imitation learning has generated policies that reproduce expert behavior from multi-modal data. However, past approaches have focused only on recreating a small number of distinct, expert maneuvers, or have relied on supervised learning techniques that produce unstable policies. This work extends InfoGAIL, an algorithm for multi-modal imitation learning, to reproduce behavior over an extended period of time. Our approach involves reformulating the typical imitation learning setting to include "burn-in demonstrations" upon which policies are conditioned at test time. We demonstrate that our approach outperforms standard InfoGAIL in maximizing the mutual information between predicted and unseen style labels in road scene simulations, and we show that our method leads to policies that imitate expert autonomous driving systems over long time horizons.

artificial intelligence, demonstration, machine learning, (13 more...)

arXiv.org Machine Learning

1710.0509

Country: North America > United States > California > Santa Clara County (0.14)

Genre: Research Report (0.40)

Industry:

Automobiles & Trucks (1.00)
Transportation > Ground > Road (0.49)
Information Technology > Robotics & Automation (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.69)

Add feedback

Automated Scalable Bayesian Inference via Hilbert Coresets

Campbell, Trevor, Broderick, Tamara

arXiv.org Machine LearningOct-13-2017

The automation of posterior inference in Bayesian data analysis has enabled experts and nonexperts alike to use more sophisticated models, engage in faster exploratory modeling and analysis, and ensure experimental reproducibility. However, standard automated posterior inference algorithms are not tractable at the scale of massive modern datasets, and modifications to make them so are typically model-specific, require expert tuning, and can break theoretical guarantees on inferential quality. Building on the Bayesian coresets framework, this work instead takes advantage of data redundancy to shrink the dataset itself as a preprocessing step, providing fully-automated, scalable Bayesian inference with theoretical guarantees. We begin with an intuitive reformulation of Bayesian coreset construction as sparse vector sum approximation, and demonstrate that its automation and performance-based shortcomings arise from the use of the supremum norm. To address these shortcomings we develop Hilbert coresets, i.e., Bayesian coresets constructed under a norm induced by an inner-product on the log-likelihood function space. We propose two Hilbert coreset construction algorithms---one based on importance sampling, and one based on the Frank-Wolfe algorithm---along with theoretical guarantees on approximation quality as a function of coreset size. Since the exact computation of the proposed inner-products is model-specific, we automate the construction with a random finite-dimensional projection of the log-likelihood functions. The resulting automated coreset construction algorithm is simple to implement, and experiments on a variety of models with real and synthetic datasets show that it provides high-quality posterior approximations and a significant reduction in the computational cost of inference.

artificial intelligence, bayesian inference, machine learning, (18 more...)

arXiv.org Machine Learning

1710.05053

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)

Add feedback