AITopics

Probability estimation is one of the fundamental tasks in statistics and machine learning. However, standard methods for probability estimation on discrete objects do not handle object structure in a satisfactory manner. In this paper, we derive a general Bayesian network formulation for probability estimation on leaf-labeled trees that enables flexible approximations which can generalize beyond observations. We show that efficient algorithms for learning Bayesian networks can be easily extended to probability estimation on this challenging structured space. Experiments on both synthetic and real data show that our methods greatly outperform the current practice of using the empirical distribution, as well as a previous effort for probability estimation on trees.

artificial intelligence, bayesian inference, machine learning, (18 more...)

Country: North America (0.46)

Industry:

Health & Medicine > Therapeutic Area (0.95)
Health & Medicine > Pharmaceuticals & Biotechnology (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Shah, Harshil, Barber, David

Generative Neural Machine Translation

We introduce Generative Neural Machine Translation (GNMT), a latent variable architecture which is designed to model the semantics of the source and target sentences. We modify an encoder-decoder translation model by adding a latent variable as a language agnostic representation which is encouraged to learn the meaning of the sentence. GNMT achieves competitive BLEU scores on pure translation tasks, and is superior when there are missing words in the source sentence. We augment the model to facilitate multilingual translation and semi-supervised learning without adding parameters. This framework significantly reduces overfitting when there is limited paired data available, and is effective for translating between pairs of languages not seen during training.

machine learning, natural language, source sentence, (20 more...)

Country: Asia (0.15)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)

Figurnov, Mikhail, Mohamed, Shakir, Mnih, Andriy

Implicit Reparameterization Gradients

By providing a simple and efficient way of computing low-variance gradients of continuous random variables, the reparameterization trick has become the technique of choice for training a variety of latent variable models. However, it is not applicable to a number of important continuous distributions. We introduce an alternative approach to computing reparameterization gradients based on implicit differentiation and demonstrate its broader applicability by applying it to Gamma, Beta, Dirichlet, and von Mises distributions, which cannot be used with the classic reparameterization trick. Our experiments show that the proposed approach is faster and more accurate than the existing gradient estimators for these distributions.

artificial intelligence, machine learning, natural language, (15 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)
(2 more...)

Knoblauch, Jeremias, Jewson, Jack E., Damoulas, Theodoros

Doubly Robust Bayesian Inference for Non-Stationary Streaming Data with $\beta$-Divergences

We present the very first robust Bayesian Online Changepoint Detection algorithm through General Bayesian Inference (GBI) with $\beta$-divergences. The resulting inference procedure is doubly robust for both the predictive and the changepoint (CP) posterior, with linear time and constant space complexity. We provide a construction for exponential models and demonstrate it on the Bayesian Linear Regression model. In so doing, we make two additional contributions: Firstly, we make GBI scalable using Structural Variational approximations that are exact as $\beta \to 0$. Secondly, we give a principled way of choosing the divergence parameter $\beta$ by minimizing expected predictive loss on-line. Reducing False Discovery Rates of \CPs from up to 99\% to 0\% on real world data, this offers the state of the art.

artificial intelligence, bayesian inference, machine learning, (13 more...)

Country: North America (0.28)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

Chen, Tianle, Keng, Brian, Moreno, Javier

Multivariate Arrival Times with Recurrent Neural Networks for Personalized Demand Forecasting

arXiv.org Machine LearningDec-29-2018

Access to a large variety of data across a massive population has made it possible to predict customer purchase patterns and responses to marketing campaigns. In particular, accurate demand forecasts for popular products with frequent repeat purchases are essential since these products are one of the main drivers of profits. However, buyer purchase patterns are extremely diverse and sparse on a per-product level due to population heterogeneity as well as dependence in purchase patterns across product categories. Traditional methods in survival analysis have proven effective in dealing with censored data by assuming parametric distributions on inter-arrival times. Distributional parameters are then fitted, typically in a regression framework. On the other hand, neural-network based models take a non-parametric approach to learn relations from a larger functional class. However, the lack of distributional assumptions make it difficult to model partially observed data. In this paper, we model directly the inter-arrival times as well as the partially observed information at each time step in a survival-based approach using Recurrent Neural Networks (RNN) to model purchase times jointly over several products. Instead of predicting a point estimate for inter-arrival times, the RNN outputs parameters that define a distributional estimate. The loss function is the negative log-likelihood of these parameters given partially observed data. This approach allows one to leverage both fully observed data as well as partial information. By externalizing the censoring problem through a log-likelihood loss function, we show that substantial improvements over state-of-the-art machine learning methods can be achieved. We present experimental results based on two open datasets as well as a study on a real dataset from a large retailer.

customer, dataset, inter-arrival time, (13 more...)

1812.11444

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > New York (0.04)
Europe > United Kingdom (0.04)

Genre: Research Report (1.00)

Industry: Consumer Products & Services > Travel (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)

arXiv.org Machine LearningDec-28-2018

Classification of Functioning, Disability, and Health: ICF-CY Self Care (SCADI Dataset) Using Predictive Analytics

Choudhury, Avishek

The International Classification of Functioning, Disability, and Health for Children and Youth (ICF-CY) is a scaffold for designating and systematizing data on functioning and disability. It offers a standard semantic and a theoretical foundation for the demarcation and extent of wellbeing and infirmity. The multidimensional layout of ICF-CY comprehends a plethora of information with about 1400 categories making it difficult to analyze. Our research proposes a predictive model that classify self-care problems on Self-Care Activities Dataset based on the ICF- CY. The data used in this study resides 206 attributes of 70 children with motor and physical disability. Our study implements, compare and analyze Random Forest, Support vector machine, Naive Bayes, Hoeffding tree, and Lazy locally weighted learning using two-tailed T-test at 95% confidence interval. Boruta algorithm involved in the study minimizes the data dimensionality to advocate the minimal-optimal set of predictors. Random forest gave the best classification accuracy of 84.75%; root mean squared error of 0.18 and receiver operating characteristic of 0.99. Predictive analytics can simplify the usage of ICF-CY by automating the classification process of disability, functioning, and health.

caring, classification, icf-cy, (13 more...)

1901.00756

Country:

North America > United States > New Jersey > Hudson County > Hoboken (0.04)
North America > Canada > Ontario > Toronto (0.04)
Europe > Switzerland (0.04)
(2 more...)

Genre:

Research Report > New Finding (0.69)
Research Report > Experimental Study (0.68)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Consumer Health (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Pakman, Ari, Paninski, Liam

Discrete Neural Processes

arXiv.org Machine LearningDec-28-2018

Many data generating processes involve latent random variables over discrete combinatorial spaces whose size grows factorially with the dataset. In these settings, existing posterior inference methods can be inaccurate and/or very slow. In this work we develop methods for efficient amortized approximate Bayesian inference over discrete combinatorial spaces, with applications to random permutations, probabilistic clustering (such as Dirichlet process mixture models) and random communities (such as stochastic block models). The approach is based on mapping distributed, symmetry-invariant representations of discrete arrangements into conditional probabilities. The resulting algorithms parallelize easily, yield iid samples from the approximate posteriors, and can easily be applied to both conjugate and non-conjugate models, as training only requires samples from the generative model.

generative model, permutation, symmetry, (13 more...)

1901.00409

Country:

Asia > Middle East > Lebanon (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)

McAllister, Rowan, Kahn, Gregory, Clune, Jeff, Levine, Sergey

Robustness to Out-of-Distribution Inputs via Task-Aware Generative Uncertainty

arXiv.org Machine LearningDec-27-2018

Deep learning provides a powerful tool for machine perception when the observations resemble the training data. However, real-world robotic systems must react intelligently to their observations even in unexpected circumstances. This requires a system to reason about its own uncertainty given unfamiliar, out-of-distribution observations. Approximate Bayesian approaches are commonly used to estimate uncertainty for neural network predictions, but can struggle with out-of-distribution observations. Generative models can in principle detect out-of-distribution observations as those with a low estimated density. However, the mere presence of an out-of-distribution input does not by itself indicate an unsafe situation. In this paper, we present a method for uncertainty-aware robotic perception that combines generative modeling and model uncertainty to cope with uncertainty stemming from out-of-distribution states. Our method estimates an uncertainty measure about the model's prediction, taking into account an explicit (generative) model of the observation distribution to handle out-of-distribution inputs. This is accomplished by probabilistically projecting observations onto the training distribution, such that out-of-distribution inputs map to uncertain in-distribution observations, which in turn produce uncertain task-related predictions, but only if task-relevant parts of the image change. We evaluate our method on an action-conditioned collision prediction task with both simulated and real data, and demonstrate that our method of projecting out-of-distribution observations improves the performance of four standard Bayesian and non-Bayesian neural network approaches, offering more favorable trade-offs between the proportion of time a robot can remain autonomous and the proportion of impending crashes successfully avoided.

deep learning, neural network, out-of-distribution input, (21 more...)

1812.10687

Country: North America > United States > California (0.14)

Genre: Research Report (0.82)

Industry:

Materials > Chemicals > Industrial Gases > Liquified Gas (0.46)
Materials > Chemicals > Commodity Chemicals > Petrochemicals > LNG (0.46)
Energy > Oil & Gas > Midstream (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.94)
(2 more...)

Bakirov, Rashid, Gabrys, Bogdan, Fay, Damien

Generic adaptation strategies for automated machine learning

arXiv.org Machine LearningDec-27-2018

Automation of machine learning model development is increasingly becoming an established research area. While automated model selection and automated data pre-processing have been studied in depth, there is, however, a gap concerning automated model adaptation strategies when multiple strategies are available. Manually developing an adaptation strategy, including estimation of relevant parameters can be time consuming and costly. In this paper we address this issue by proposing generic adaptation strategies based on approaches from earlier works. Experimental results after using the proposed strategies with three adaptive algorithms on 36 datasets confirm their viability. These strategies often achieve better or comparable performance with custom adaptation strategies and naive methods such as repeatedly using only one adaptive mechanism.

adaptation strategy, artificial intelligence, machine learning, (16 more...)

1812.10793

Country:

North America > United States > New York (0.14)
Europe > United Kingdom > England > Dorset (0.14)
Oceania > Australia (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Education (1.00)
Materials > Chemicals (0.94)
Energy > Oil & Gas (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.93)

Miguel, Antonio, Llombart, Jorge, Ortega, Alfonso, Lleida, Eduardo

Tied Hidden Factors in Neural Networks for End-to-End Speaker Recognition

arXiv.org Machine LearningDec-27-2018

In this paper we propose a method to model speaker and session variability and able to generate likelihood ratios using neural networks in an end-to-end phrase dependent speaker verification system. As in Joint Factor Analysis, the model uses tied hidden variables to model speaker and session variability and a MAP adaptation of some of the parameters of the model. In the training procedure our method jointly estimates the network parameters and the values of the speaker and channel hidden variables. This is done in a two-step backpropagation algorithm, first the network weights and factor loading matrices are updated and then the hidden variables, whose gradients are calculated by aggregating the corresponding speaker or session frames, since these hidden variables are tied. The last layer of the network is defined as a linear regression probabilistic model whose inputs are the previous layer outputs. This choice has the advantage that it produces likelihoods and additionally it can be adapted during the enrolment using MAP without the need of a gradient optimization. The decisions are made based on the ratio of the output likelihoods of two neural network models, speaker adapted and universal background model. The method was evaluated on the RSR2015 database.

factor analysis, neural network, transaction, (11 more...)

doi: 10.21437/Interspeech.2017-1314

1812.11946

Country:

North America > Canada > Ontario > Toronto (0.04)
North America > Canada > Quebec > Montreal (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
(6 more...)

Genre: Research Report (0.64)

Industry: Information Technology (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)