AITopics

1801.01649

Country: North America > United States > California (0.28)

Genre: Research Report > New Finding (0.48)

Industry:

Energy (1.00)
Government > Regional Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.65)

Leblond, Rémi, Alayrac, Jean-Baptiste, Osokin, Anton, Lacoste-Julien, Simon

SEARNN: Training RNNs with Global-Local Losses

arXiv.org Machine LearningMar-4-2018

We propose SEARNN, a novel training algorithm for recurrent neural networks (RNNs) inspired by the "learning to search" (L2S) approach to structured prediction. RNNs have been widely successful in structured prediction applications such as machine translation or parsing, and are commonly trained using maximum likelihood estimation (MLE). Unfortunately, this training loss is not always an appropriate surrogate for the test error: by only maximizing the ground truth probability, it fails to exploit the wealth of information offered by structured losses. Further, it introduces discrepancies between training and predicting (such as exposure bias) that may hurt test performance. Instead, SEARNN leverages test-alike search space exploration to introduce global-local losses that are closer to the test error. We first demonstrate improved performance over MLE on two different tasks: OCR and spelling correction. Then, we propose a subsampling strategy to enable SEARNN to scale to large vocabulary sizes. This allows us to validate the benefits of our approach on a machine translation task.

artificial intelligence, ground truth, machine learning, (21 more...)

1706.04499

Country: Europe (0.46)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

Wang, Fulton, McCormick, Tyler H., Rudin, Cynthia, Gore, John

Modeling Recovery Curves With Application to Prostatectomy

arXiv.org Machine LearningMar-4-2018

In the medical community, there is a pressing need for personalized predictions of how a disruptive event, such as a treatment or disease, will impact particular bodily function levels. Of particular interest is the extent to which the function is initially perturbed by the event and the ensuing pattern of recovery. In many contexts, such as mental acuity following a stroke or sexual function following prostatectomy, the post-event trajectory generally exhibits what we call a recovery curve shape, characterized by an initial instantaneous drop followed by a monotonic rise towards an asymptotic level not exceeding the original function level. Here, we propose a Bayesian model that can be used to predict a patient's expected recovery curve, given information about the patient that is available before the event. This paper presents a decision aid for patients considering a medical treatment who want to know what adverse side effect the treatment would have on a particular bodily function. In particular, our model will be used to display to the patient a distribution over post-treatment function trajectories, conveying the uncertainty in predictions that should be considered in decision-making.

artificial intelligence, function value, machine learning, (19 more...)

1504.06964

Country: North America > United States (0.68)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Health Care Providers & Services (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)

Dziugaite, Gintare Karolina, Roy, Daniel M.

Entropy-SGD optimizes the prior of a PAC-Bayes bound: Generalization properties of Entropy-SGD and data-dependent priors

arXiv.org Machine LearningMar-3-2018

We show that Entropy-SGD (Chaudhari et al., 2017), when viewed as a learning algorithm, optimizes a PAC-Bayes bound on the risk of a Gibbs (posterior) classifier, i.e., a randomized classifier obtained by a risk-sensitive perturbation of the weights of a learned classifier. Entropy-SGD works by optimizing the bound's prior, violating the hypothesis of the PAC-Bayes theorem that the prior is chosen independently of the data. Indeed, available implementations of Entropy-SGD rapidly obtain zero training error on random labels and the same holds of the Gibbs posterior. In order to obtain a valid generalization bound, we rely on a result showing that data-dependent priors obtained by stochastic gradient Langevin dynamics (SGLD) yield valid PAC-Bayes bounds provided the target distribution of SGLD is $\epsilon$-differentially private. We observe that test error on MNIST and CIFAR10 falls within the (empirically nonvacuous) risk bounds computed under the assumption that SGLD reaches stationarity. In particular, Entropy-SGLD can be configured to yield relatively tight generalization bounds and still fit real labels, although these same settings do not obtain state-of-the-art performance.

artificial intelligence, machine learning, pac-bayes, (16 more...)

1712.09376

Country:

Europe (0.67)
North America > United States (0.46)
North America > Canada > Ontario (0.28)

Genre: Research Report > New Finding (0.48)

Industry: Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

#artificialintelligenceMar-2-2018, 13:42:39 GMT

What is the difference between Markov chain approximation and variational approximation?

PageRank and RBMs are not Markov chain approximations, rather they use Markov chains in their implementation. Similarly, LDA (Latent Dirichlet Allocation) is a generative probabilistic model (aka Bayesian hierarchical model) and not a variational approximation. LDA may use variational approximation methods for inference. Let me take the LDA model as an example. In LDA, a complicated generative model is constructed to learn the topic allocation probabilities of different documents.

approximation, artificial intelligence, machine learning, (13 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.90)

Lee, Jaehoon, Bahri, Yasaman, Novak, Roman, Schoenholz, Samuel S., Pennington, Jeffrey, Sohl-Dickstein, Jascha

Deep Neural Networks as Gaussian Processes

arXiv.org Machine LearningMar-2-2018

It has long been known that a single-layer fully-connected neural network with an i.i.d. prior over its parameters is equivalent to a Gaussian process (GP), in the limit of infinite network width. This correspondence enables exact Bayesian inference for infinite width neural networks on regression tasks by means of evaluating the corresponding GP. Recently, kernel functions which mimic multi-layer random neural networks have been developed, but only outside of a Bayesian framework. As such, previous work has not identified that these kernels can be used as covariance functions for GPs and allow fully Bayesian prediction with a deep neural network. In this work, we derive the exact equivalence between infinitely wide deep networks and GPs. We further develop a computationally efficient pipeline to compute the covariance function for these GPs. We then use the resulting GPs to perform Bayesian inference for wide deep neural networks on MNIST and CIFAR-10. We observe that trained neural network accuracy approaches that of the corresponding GP with increasing layer width, and that the GP uncertainty is strongly correlated with trained network prediction error. We further find that test performance increases as finite-width trained networks are made wider and more similar to a GP, and thus that GP predictions typically outperform those of finite-width networks. Finally we connect the performance of these GPs to the recent theory of signal propagation in random neural networks.

kernel, neural network, nonlinearity, (16 more...)

1711.00165

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)

Peyré, Gabriel, Cuturi, Marco

Computational Optimal Transport

Optimal Transport (OT) is a mathematical gem at the interface between probability, analysis and optimization. The goal of that theory is to define geometric tools that are useful to compare probability distributions. Earlier contributions originated from Monge's work in the 18th century, to be later rediscovered under a different formalism by Tolstoi in the 1920's, Kantorovich, Hitchcock and Koopmans in the 1940's. The problem was solved numerically by Dantzig in 1949 and others in the 1950's within the framework of linear programming, paving the way for major industrial applications in the second half of the 20th century. OT was later rediscovered under a different light by analysts in the 90's, following important work by Brenier and others, as well as in the computer vision/graphics fields under the name of earth mover's distances. Recent years have witnessed yet another revolution in the spread of OT, thanks to the emergence of approximate solvers that can scale to sizes and dimensions that are relevant to data sciences. Thanks to this newfound scalability, OT is being increasingly used to unlock various problems in imaging sciences (such as color or texture processing), computer vision and graphics (for shape manipulation) or machine learning (for regression,classification and density fitting). This short book reviews OT with a bias toward numerical methods and their applications in data sciences, and sheds lights on the theoretical properties of OT that make it particularly useful for some of these applications.

deep learning, kullback-leibler divergence, upstream oil & gas, (23 more...)

1803.00567

Country:

North America > United States (1.00)
Asia (0.27)
Europe > United Kingdom (0.14)

Genre: Research Report (1.00)

Industry:

Energy > Oil & Gas > Upstream (1.00)
Health & Medicine (0.87)
Government (0.67)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Mathematics of Computing (1.00)
Information Technology > Data Science (1.00)
(7 more...)

Sharma, Srinagesh, Cutler, James W.

Kernel Embedding Approaches to Orbit Determination of Spacecraft Clusters

This paper presents a novel formulation and solution of orbit determination over finite time horizons as a learning problem. We present an approach to orbit determination under very broad conditions that are satisfied for n-body problems. These weak conditions allow us to perform orbit determination with noisy and highly non-linear observations such as those presented by range-rate only (Doppler only) observations. We show that domain generalization and distribution regression techniques can learn to estimate orbits of a group of satellites and identify individual satellites especially with prior understanding of correlations between orbits and provide asymptotic convergence conditions. The approach presented requires only visibility and observability of the underlying state from observations and is particularly useful for autonomous spacecraft operations using low-cost ground stations or sensors. We validate the orbit determination approach using observations of two spacecraft (GRIFEX and MCubed-2) along with synthetic datasets of multiple spacecraft deployments and lunar orbits. We also provide a comparison with the standard techniques (EKF) under highly noisy conditions.

artificial intelligence, bayesian inference, machine learning, (19 more...)

1803.0065

Country: North America > United States (1.00)

Genre: Research Report (1.00)

Industry:

Education (0.65)
Aerospace & Defense (0.46)
Government > Regional Government (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

Šošić, Adrian, Rueckert, Elmar, Peters, Jan, Zoubir, Abdelhak M., Koeppl, Heinz

Inverse Reinforcement Learning via Nonparametric Spatio-Temporal Subgoal Modeling

Recent advances in the field of inverse reinforcement learning (IRL) have yielded sophisticated frameworks which relax the original modeling assumption that the behavior of an observed agent reflects only a single intention. Instead, the demonstration data is typically divided into parts, to account for the fact that different trajectories may correspond to different intentions, e.g., because they were generated by different domain experts. In this work, we go one step further: using the intuitive concept of subgoals, we build upon the premise that even a single trajectory can be explained more efficiently locally within a certain context than globally, enabling a more compact representation of the observed behavior. Based on this assumption, we build an implicit intentional model of the agent's goals to forecast its behavior in unobserved situations. The result is an integrated Bayesian prediction framework which provides smooth policy estimates that are consistent with the expert's plan and significantly outperform existing IRL solutions. Most notably, our framework naturally handles situations where the intentions of the agent change with time and classical IRL algorithms fail. In addition, due to its probabilistic nature, the model can be straightforwardly applied in an active learning setting to guide the demonstration process of the expert.

demonstration, machine learning, reinforcement learning, (16 more...)

1803.00444

Country: Europe > Germany (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.93)

Ghasemian, Amir, Hosseinmardi, Homa, Clauset, Aaron

Evaluating Overfit and Underfit in Models of Network Community Structure

A common data mining task on networks is community detection, which seeks an unsupervised decomposition of a network into structural groups based on statistical regularities in the network's connectivity. Although many methods exist, the No Free Lunch theorem for community detection implies that each makes some kind of tradeoff, and no algorithm can be optimal on all inputs. Thus, different algorithms will over or underfit on different inputs, finding more, fewer, or just different communities than is optimal, and evaluation methods that use a metadata partition as a ground truth will produce misleading conclusions about general accuracy. Here, we present a broad evaluation of over and underfitting in community detection, comparing the behavior of 16 state-of-the-art community detection algorithms on a novel and structurally diverse corpus of 406 real-world networks. We find that (i) algorithms vary widely both in the number of communities they find and in their corresponding composition, given the same input, (ii) algorithms can be clustered into distinct high-level groups based on similarities of their outputs on real-world networks, and (iii) these differences induce wide variation in accuracy on link prediction and link description tasks. We introduce a new diagnostic for evaluating overfitting and underfitting in practice, and use it to roughly divide community detection methods into general and specialized learning algorithms. Across methods and inputs, Bayesian techniques based on the stochastic block model and a minimum description length approach to regularization represent the best general learning approach, but can be outperformed under specific circumstances. These results introduce both a theoretically principled approach to evaluate over and underfitting in models of network community structure and a realistic benchmark by which new methods may be evaluated and compared.

artificial intelligence, data mining, machine learning, (17 more...)

1802.10582

Country:

North America > United States > Colorado (0.28)
North America > United States > California (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)