AITopics | Directed Networks

Collaborating Authors

Directed Networks

News Overviews Instructional Materials AI-Alerts Classics

XGBoostLSS -- An extension of XGBoost to probabilistic forecasting

arXiv.org Artificial IntelligenceJul-11-2019

We propose a new framework of XGBoost that predicts the entire conditional distribution of a univariate response variable. In particular, XGBoostLSS models all moments of a parametric distribution, i.e., mean, location, scale and shape (LSS), instead of the conditional mean only. Choosing from a wide range of continuous, discrete and mixed discrete-continuous distribution, modelling and predicting the entire conditional distribution greatly enhances the flexibility of XGBoost, as it allows to gain additional insight into the data generating process, as well as to create probabilistic forecasts from which prediction intervals and quantiles of interest can be derived. We present both a simulation study and real world examples that demonstrate the benefits of our approach.

artificial intelligence, machine learning, xgboostlss, (17 more...)

arXiv.org Artificial Intelligence

1907.03178

Country: Europe > Germany (0.29)

Genre: Research Report > Experimental Study (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

Add feedback

The Design of Mutual Information

Carrara, Nicholas

arXiv.org Machine LearningJul-10-2019

We derive the functional form of mutual information (MI) from a set of design criteria and a principle of maximal sufficiency. The (MI) between two sets of propositions is a global quantifier of correlations and is implemented as a tool for ranking joint probability distributions with respect to said correlations. The derivation parallels the derivations of relative entropy with an emphasis on the behavior of independent variables. By constraining the functional $I$ according to special cases, we arrive at its general functional form and hence establish a clear meaning behind its definition. We also discuss the notion of sufficiency and offer a new definition which broadens its applicability.

artificial intelligence, correlation, machine learning, (18 more...)

arXiv.org Machine Learning

1907.06992

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

Time series cluster kernels to exploit informative missingness and incomplete label information

Mikalsen, Karl Øyvind, Soguero-Ruiz, Cristina, Bianchi, Filippo Maria, Revhaug, Arthur, Jenssen, Robert

arXiv.org Machine LearningJul-10-2019

The time series cluster kernel (TCK) provides a powerful tool for analysing multivariate time series subject to missing data. TCK is designed using an ensemble learning approach in which Bayesian mixture models form the base models. Because of the Bayesian approach, TCK can naturally deal with missing values without resorting to imputation and the ensemble strategy ensures robustness to hyperparameters, making it particularly well suited for unsupervised learning. However, TCK assumes missing at random and that the underlying missingness mechanism is ignorable, i.e. uninformative, an assumption that does not hold in many real-world applications, such as e.g. medicine. To overcome this limitation, we present a kernel capable of exploiting the potentially rich information in the missing values and patterns, as well as the information from the observed data. In our approach, we create a representation of the missing pattern, which is incorporated into mixed mode mixture models in such a way that the information provided by the missing patterns is effectively exploited. Moreover, we also propose a semi-supervised kernel, capable of taking advantage of incomplete label information to learn more accurate similarities. Experiments on benchmark data, as well as a real-world case study of patients described by longitudinal electronic health record data who potentially suffer from hospital-acquired infections, demonstrate the effectiveness of the proposed methods.

artificial intelligence, bayesian inference, machine learning, (19 more...)

arXiv.org Machine Learning

1907.05251

Country:

North America (0.46)
Europe > Norway (0.15)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Surgery (1.00)
Health & Medicine > Therapeutic Area > Gastroenterology (0.93)
Health & Medicine > Health Care Technology (0.69)
Health & Medicine > Therapeutic Area > Oncology (0.68)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
(2 more...)

Add feedback

Variational Autoencoders and Nonlinear ICA: A Unifying Framework

Khemakhem, Ilyes, Kingma, Diederik P., Hyvärinen, Aapo

arXiv.org Machine LearningJul-10-2019

The framework of variational autoencoders allows us to efficiently learn deep latent-variable models, such that the model's marginal distribution over observed variables fits the data. Often, we're interested in going a step further, and want to approximate the true joint distribution over observed and latent variables, including the true prior and posterior distributions over latent variables. This is known to be generally impossible due to unidentifiability of the model. We address this issue by showing that for a broad family of deep latent-variable models, identification of the true joint distribution over observed and latent variables is actually possible up to a simple transformation, thus achieving a principled and powerful form of disentanglement. Our result requires a factorized prior distribution over the latent variables that is conditioned on an additionally observed variable, such as a class label or almost any other observation. We build on recent developments in nonlinear ICA, which we extend to the case with noisy, undercomplete or discrete observations, integrated in a maximum likelihood framework. The result also trivially contains identifiable flow-based generative models as a special case.

artificial intelligence, bayesian inference, machine learning, (17 more...)

arXiv.org Machine Learning

1907.04809

Country: Asia (0.28)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Add feedback

Exploiting Causality for Selective Belief Filtering in Dynamic Bayesian Networks (Extended Abstract)

Albrecht, Stefano V., Ramamoorthy, Subramanian

arXiv.org Artificial IntelligenceJul-10-2019

Dynamic Bayesian networks (DBNs) are a general model for stochastic processes with partially observed states. Belief filtering in DBNs is the task of inferring the belief state (i.e. the probability distribution over process states) based on incomplete and uncertain observations. In this article, we explore the idea of accelerating the filtering task by automatically exploiting causality in the process. We consider a specific type of causal relation, called passivity, which pertains to how state variables cause changes in other variables. We present the Passivity-based Selective Belief Filtering (PSBF) method, which maintains a factored belief representation and exploits passivity to perform selective updates over the belief factors. PSBF is evaluated in both synthetic processes and a simulated multi-robot warehouse, where it outperformed alternative filtering methods by exploiting passivity.

artificial intelligence, machine learning, passivity, (12 more...)

arXiv.org Artificial Intelligence

1907.0585

Country: North America > United States > Texas (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Near-optimal Bayesian Solution For Unknown Discrete Markov Decision Process

Tossou, Aristide, Dimitrakakis, Christos, Basu, Debabrota

arXiv.org Artificial IntelligenceJul-9-2019

We tackle the problem of acting in an unknown finite and discrete Markov Decision Process (MDP) for which the expected shortest path from any state to any other state is bounded by a finite number $D$. An MDP consists of $S$ states and $A$ possible actions per state. Upon choosing an action $a_t$ at state $s_t$, one receives a real value reward $r_t$, then one transits to a next state $s_{t+1}$. The reward $r_t$ is generated from a fixed reward distribution depending only on $(s_t, a_t)$ and similarly, the next state $s_{t+1}$ is generated from a fixed transition distribution depending only on $(s_t, a_t)$. The objective is to maximize the accumulated rewards after $T$ interactions. In this paper, we consider the case where the reward distributions, the transitions, $T$ and $D$ are all unknown. We derive the first polynomial time Bayesian algorithm, BUCRL{} that achieves up to logarithm factors, a regret (i.e the difference between the accumulated rewards of the optimal policy and our algorithm) of the optimal order $\tilde{\mathcal{O}}(\sqrt{DSAT})$. Importantly, our result holds with high probability for the worst-case (frequentist) regret and not the weaker notion of Bayesian regret. We perform experiments in a variety of environments that demonstrate the superiority of our algorithm over previous techniques. Our work also illustrates several results that will be of independent interest. In particular, we derive a sharper upper bound for the KL-divergence of Bernoulli random variables. We also derive sharper upper and lower bounds for Beta and Binomial quantiles. All the bound are very simple and only use elementary functions.

artificial intelligence, machine learning, quantile, (16 more...)

arXiv.org Artificial Intelligence

1906.09114

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > Sweden > Vaestra Goetaland > Gothenburg (0.04)

Genre: Research Report (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Goal Recognition Design in Deterministic Environments

Keren, Sarah, Gal, Avigdor, Karpas, Erez

Journal of Artificial Intelligence ResearchJul-9-2019

Goal recognition design (GRD) facilitates understanding the goals of acting agents through the analysis and redesign of goal recognition models, thus offering a solution for assessing and minimizing the maximal progress of any agent in the model before goal recognition is guaranteed. In a nutshell, given a model of a domain and a set of possible goals, a solution to a GRD problem determines (1) the extent to which actions performed by an agent within the model reveal the agent’s objective; and (2) how best to modify the model so that the objective of an agent can be detected as early as possible. This approach is relevant to any domain in which rapid goal recognition is essential and the model design can be controlled. Applications include intrusion detection, assisted cognition, computer games, and human-robot collaboration. A GRD problem has two components: the analyzed goal recognition setting, and a design model specifying the possible ways the environment in which agents act can be modified so as to facilitate recognition. This work formulates a general framework for GRD in deterministic and partially observable environments, and offers a toolbox of solutions for evaluating and optimizing model quality for various settings. For the purpose of evaluation we suggest the worst case distinctiveness (WCD) measure, which represents the maximal cost of a path an agent may follow before its goal can be inferred by a goal recognition system. We offer novel compilations to classical planning for calculating WCD in settings where agents are bounded-suboptimal. We then suggest methods for minimizing WCD by searching for an optimal redesign strategy within the space of possible modifications, and using pruning to increase efficiency. We support our approach with an empirical evaluation that measures WCD in a variety of GRD settings and tests the efficiency of our compilation-based methods for computing it. We also examine the effectiveness of reducing WCD via redesign and the performance gain brought about by our proposed pruning strategy.

agent, modification, sequence, (14 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.1.11551

AI Access Foundation

11551

Journal of Artificial Intelligence Research

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > Vietnam > Long An Province (0.04)
Asia > Middle East > Israel > Haifa District > Haifa (0.04)

Genre:

Overview (0.67)
Research Report > Promising Solution (0.48)

Industry:

Information Technology (1.00)
Leisure & Entertainment > Games > Computer Games (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Belief Revision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling > Plan Recognition (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.45)

Add feedback

Convergence Rates for Gaussian Mixtures of Experts

Ho, Nhat, Yang, Chiao-Yu, Jordan, Michael I.

arXiv.org Machine LearningJul-9-2019

We provide a theoretical treatment of over-specified Gaussian mixtures of experts with covariate-free gating networks. We establish the convergence rates of the maximum likelihood estimation (MLE) for these models. Our proof technique is based on a novel notion of \emph{algebraic independence} of the expert functions. Drawing on optimal transport theory, we establish a connection between the algebraic independence and a certain class of partial differential equations (PDEs). Exploiting this connection allows us to derive convergence rates and minimax lower bounds for parameter estimation.

artificial intelligence, bayesian inference, machine learning, (18 more...)

arXiv.org Machine Learning

1907.04377

Country: North America (0.27)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.54)

Add feedback

Bayesian deep learning with hierarchical prior: Predictions from limited and noisy data

Luo, Xihaier, Kareem, Ahsan

arXiv.org Machine LearningJul-8-2019

Datasets in engineering applications are often limited and contaminated, mainly due to unavoidable measurement noise and signal distortion. Thus, using conventional data-driven approaches to build a reliable discriminative model, and further applying this identified surrogate to uncertainty analysis remains to be very challenging. A deep learning approach is presented to provide predictions based on limited and noisy data. To address noise perturbation, the Bayesian learning method that naturally facilitates an automatic updating mechanism is considered to quantify and propagate model uncertainties into predictive quantities. Specifically, hierarchical Bayesian modeling (HBM) is first adopted to describe model uncertainties, which allows the prior assumption to be less subjective, while also makes the proposed surrogate more robust. Next, the Bayesian inference is seamlessly integrated into the DL framework, which in turn supports probabilistic programming by yielding a probability distribution of the quantities of interest rather than their point estimates. Variational inference (VI) is implemented for the posterior distribution analysis where the intractable marginalization of the likelihood function over parameter space is framed in an optimization format, and stochastic gradient descent method is applied to solve this optimization problem. Finally, Monte Carlo simulation is used to obtain an unbiased estimator in the predictive phase of Bayesian inference, where the proposed Bayesian deep learning (BDL) scheme is able to offer confidence bounds for the output estimation by analyzing propagated uncertainties. The effectiveness of Bayesian shrinkage is demonstrated in improving predictive performance using contaminated data, and various examples are provided to illustrate concepts, methodologies, and algorithms of this proposed BDL modeling technique.

deep learning, posterior distribution, upstream oil & gas, (18 more...)

arXiv.org Machine Learning

1907.0424

Country:

North America > United States (0.67)
Asia (0.14)

Genre: Research Report (1.00)

Industry: Energy > Oil & Gas > Upstream (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Asymptotic Bayes risk for Gaussian mixture in a semi-supervised setting

Lelarge, Marc, Miolane, Leo

arXiv.org Machine LearningJul-8-2019

Semi-supervised learning (SSL) uses unlabeled data for training and has been shown to greatly improve performances when compared to a supervised approach on the labeled data available. This claim depends both on the amount of labeled data available and on the algorithm used. In this paper, we compute analytically the gap between the best fully-supervised approach on labeled data and the best semi-supervised approach using both labeled and unlabeled data. We quantify the best possible increase in performance obtained thanks to the unlabeled data, i.e. we compute the accuracy increase due to the information contained in the unlabeled data. Our work deals with a simple high-dimensional Gaussian mixture model for the data in a Bayesian setting. Our rigorous analysis builds on recent theoretical breakthroughs in high-dimensional inference and a large body of mathematical tools from statistical physics initially developed for spin glasses.

artificial intelligence, bayesian inference, machine learning, (17 more...)

arXiv.org Machine Learning

1907.03792

Country: North America > United States (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback