 inference and learning


Exponential Quantum Communication Advantage in Distributed Inference and Learning

Neural Information Processing Systems

Training and inference with large machine learning models that far exceed the memory capacity of individual devices necessitate the design of distributed architectures, forcing one to contend with communication constraints. We present a framework for distributed computation over a quantum network in which data is encoded into specialized quantum states. We prove that for models within this framework, inference and training using gradient descent can be performed with exponentially less communication compared to their classical analogs, and with relatively modest overhead relative to standard gradient-based methods. We show that certain graph neural networks are particularly amenable to implementation within this framework, and moreover present empirical evidence that they perform well on standard benchmarks. To our knowledge, this is the first example of exponential quantum advantage for a generic class of machine learning problems that holds regardless of the data encoding cost. Moreover, we show that models in this class can encode highly nonlinear features of their inputs, and their expressivity increases exponentially with model depth. We also delineate the space of models for which exponential communication advantages hold by showing that they cannot hold for linear classification. Communication of quantum states that potentially limit the amount of information that can be extracted from them about the data and model parameters may also lead to improved privacy guarantees for distributed computation. Taken as a whole, these findings form a promising foundation for distributed machine learning over quantum networks.



Learning Dynamics from Input-Output Data with Hamiltonian Gaussian Processes

arXiv.org Artificial Intelligence

Embedding non-restrictive prior knowledge, such as energy conservation laws, in learning-based approaches is a key motive to construct physically consistent models from limited data, relevant for, e.g., model-based control. Recent work incorporates Hamiltonian dynamics into Gaussian Process (GP) regression to obtain uncertainty-quantifying models that adhere to the underlying physical principles. However, these works rely on velocity or momentum data, which is rarely available in practice. In this paper, we consider dynamics learning with non-conservative Hamiltonian GPs, and address the more realistic problem setting of learning from input-output data. We provide a fully Bayesian scheme for estimating probability densities of unknown hidden states, of GP hyperparameters, as well as of structural hyperparameters, such as damping coefficients. Considering the computational complexity of GPs, we take advantage of a reduced-rank GP approximation and leverage its properties for computationally efficient prediction and training. The proposed method is evaluated in a nonlinear simulation case study and compared to a state-of-the-art approach that relies on momentum measurements.


Neurons as Monte Carlo Samplers: Bayesian Inference and Learning in Spiking Networks

Neural Information Processing Systems

We propose a two-layer spiking network capable of performing approximate inference and learning for a hidden Markov model. The lower layer sensory neurons detect noisy measurements of hidden world states. The higher layer neurons with recurrent connections infer a posterior distribution over world states from spike trains generated by sensory neurons. We show how such a neuronal network with synaptic plasticity can implement a form of Bayesian inference similar to Monte Carlo methods such as particle filtering. Each spike in the population of inference neurons represents a sample of a particular hidden world state.
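The particle-filtering analogy can be made concrete with a small sketch. The code below is a standard bootstrap particle filter for a discrete hidden Markov model, not the spiking network described in the abstract; the function name `particle_filter`, the two-state model, and all probabilities are illustrative assumptions. Each particle plays the role the abstract assigns to a spike: one sample of a particular hidden world state.

```python
import random

def particle_filter(observations, transition, emission, prior,
                    n_particles=2000, seed=0):
    """Bootstrap particle filter for a discrete HMM (illustrative sketch).

    transition[i][j] = P(next state j | current state i)
    emission[i][o]   = P(observation o | state i)
    prior[i]         = P(initial state i)
    Returns one estimated posterior over states per observation.
    """
    rng = random.Random(seed)
    states = list(range(len(prior)))
    # Draw the initial particle population from the prior.
    particles = rng.choices(states, weights=prior, k=n_particles)
    posteriors = []
    for t, obs in enumerate(observations):
        if t > 0:
            # Propagate every particle through the transition model.
            particles = [rng.choices(states, weights=transition[s])[0]
                         for s in particles]
        # Weight each particle by how well it explains the observation.
        weights = [emission[s][obs] for s in particles]
        # Resample in proportion to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
        counts = [0] * len(states)
        for s in particles:
            counts[s] += 1
        posteriors.append([c / n_particles for c in counts])
    return posteriors

# Hypothetical two-state world with "sticky" dynamics and noisy sensors.
transition = [[0.9, 0.1], [0.1, 0.9]]
emission = [[0.9, 0.1], [0.1, 0.9]]
prior = [0.5, 0.5]
posteriors = particle_filter([0, 0, 0, 1, 1, 1], transition, emission, prior)
```

The fraction of particles sitting in each state approximates the posterior over world states at that time step, which is the quantity the higher-layer neurons are said to infer.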


Hybrid Probabilistic Logic Programming: Inference and Learning

arXiv.org Artificial Intelligence

This thesis focuses on advancing probabilistic logic programming (PLP), which combines probability theory for uncertainty and logic programming for relations. The thesis aims to extend PLP to support both discrete and continuous random variables, which is necessary for applications with numeric data. The first contribution is the introduction of context-specific likelihood weighting (CS-LW), a new sampling algorithm that exploits context-specific independencies for computational gains. Next, a new hybrid PLP, DC#, is introduced, which integrates the syntax of Distributional Clauses with Bayesian logic programs and represents three types of independencies: i) conditional independencies (CIs) modeled in Bayesian networks; ii) context-specific independencies (CSIs) represented by logical rules, and iii) independencies amongst attributes of related objects in relational models expressed by combining rules. The scalable inference algorithm FO-CS-LW is introduced for DC#. Finally, the thesis addresses the lack of approaches for learning hybrid PLP from relational data with missing values and (probabilistic) background knowledge with the introduction of DiceML, which learns the structure and parameters of hybrid PLP and tackles the relational autocompletion problem. The conclusion discusses future directions and open challenges for hybrid PLP.
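For readers unfamiliar with the baseline that CS-LW builds on, here is a minimal sketch of plain likelihood weighting on a hybrid model with one discrete and one continuous variable. It does not exploit context-specific independencies and is not the thesis's algorithm; the model (a Boolean `hot` variable and a Gaussian temperature reading), its parameters, and the function names are all assumptions made for illustration.

```python
import math
import random

def normal_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def likelihood_weighting(observed_temp, n_samples=20000, seed=0):
    """Estimate P(hot | temperature = observed_temp) by likelihood weighting.

    Assumed toy model: hot ~ Bernoulli(0.3);
    temperature ~ Normal(80, 5) if hot, else Normal(60, 5).
    """
    rng = random.Random(seed)
    weight_hot = 0.0
    weight_total = 0.0
    for _ in range(n_samples):
        # Sample the unobserved discrete variable from its prior.
        hot = rng.random() < 0.3
        # Do not sample the evidence; weight by its density instead.
        mean = 80.0 if hot else 60.0
        w = normal_pdf(observed_temp, mean, 5.0)
        weight_total += w
        if hot:
            weight_hot += w
    return weight_hot / weight_total

estimate = likelihood_weighting(72.0)
```

The continuous evidence enters only through its density, which is why the same scheme handles discrete and continuous random variables together.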


Inference and Learning for Probabilistic Description Logics

AAAI Conferences

Recent years have seen an exponential increase in interest in methods for combining probability with Description Logics (DLs). These methods are very useful for modeling real-world domains, where incompleteness and uncertainty are common. This combination has become a fundamental component of the Semantic Web. Our work started with the development of a probabilistic semantics for DLs, called DISPONTE, that applies the distribution semantics to DLs. Under DISPONTE we annotate the axioms of a theory with a probability, which can be interpreted as the degree of our belief in the corresponding axiom, and we assume that each axiom is independent of the others. Several algorithms have been proposed for supporting the development of the Semantic Web. Efficient DL reasoners, such as Pellet, are able to extract implicit information from the modeled ontologies. Despite the availability of many DL reasoners, the number of probabilistic reasoners is quite small. We developed BUNDLE, a reasoner based on Pellet that computes the probability of queries. BUNDLE, like most DL reasoners, implements its reasoning algorithm in an imperative language. Reasoning algorithms, however, usually rely on non-deterministic operators for inference. One of the most widely used approaches to reasoning is the tableau algorithm, which applies a set of consistency-preserving expansion rules to an ABox, but some of these rules are non-deterministic. In order to manage this non-determinism, we developed the system TRILL, which performs inference over DISPONTE DLs. It implements the tableau algorithm in the declarative language Prolog, whose search strategy is exploited to handle the non-determinism of the reasoning process. Moreover, we developed a second version of TRILL, called TRILL^P, which implements some optimizations to reduce the running time. The parameters of probabilistic KBs are difficult to set. It is thus necessary to develop systems that automatically learn these parameters from the information available in the KB. We presented EDGE, which learns the parameters of a DISPONTE KB, and LEAP, which learns the structure together with the parameters of a DISPONTE KB. The main objective is to apply the developed algorithms to Big Data. The size of the data, however, requires algorithms able to handle it. It is thus necessary to exploit approaches based on parallelization and cloud computing. We are currently working to parallelize EDGE and LEAP.
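The independence assumption over axioms can be illustrated with a short sketch. It assumes that the explanations of a query (sets of probabilistic axioms that together entail it) have already been found by a reasoner, and it computes the query probability by enumerating every world, that is, every subset of axioms. The axiom names, probabilities, and the function `query_probability` are hypothetical, and the brute-force enumeration is for illustration only; it is not meant to reflect how BUNDLE or TRILL compute probabilities.

```python
from itertools import product

def query_probability(axiom_probs, explanations):
    """Probability that at least one explanation is fully included in a world.

    axiom_probs:  dict mapping axiom name -> probability of being included
    explanations: list of sets of axiom names, each sufficient for the query
    """
    axioms = sorted(axiom_probs)
    total = 0.0
    # A world selects each probabilistic axiom independently.
    for choice in product([False, True], repeat=len(axioms)):
        world = {a for a, included in zip(axioms, choice) if included}
        p_world = 1.0
        for a, included in zip(axioms, choice):
            p_world *= axiom_probs[a] if included else 1.0 - axiom_probs[a]
        # The query holds in the world if some explanation is contained in it.
        if any(expl <= world for expl in explanations):
            total += p_world
    return total

# Hypothetical KB: the query follows from {a1, a2} together, or from a3 alone.
axiom_probs = {"a1": 0.4, "a2": 0.3, "a3": 0.6}
explanations = [{"a1", "a2"}, {"a3"}]
p_query = query_probability(axiom_probs, explanations)
```

Enumeration is exponential in the number of probabilistic axioms, so it only serves to show what quantity is being computed.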


Inference and learning in probabilistic logic programs using weighted Boolean formulas

arXiv.org Artificial Intelligence

Probabilistic logic programs are logic programs in which some of the facts are annotated with probabilities. This paper investigates how classical inference and learning tasks known from the graphical model community can be tackled for probabilistic logic programs. Several such tasks, such as computing the marginals given evidence and learning from (partial) interpretations, have not really been addressed for probabilistic logic programs before. The first contribution of this paper is a suite of efficient algorithms for various inference tasks. It is based on a conversion of the program, the queries, and the evidence to a weighted Boolean formula. This allows us to reduce the inference tasks to well-studied tasks such as weighted model counting, which can be solved using state-of-the-art methods known from the graphical model and knowledge compilation literature. The second contribution is an algorithm for parameter estimation in the learning from interpretations setting. The algorithm employs Expectation Maximization and is built on top of the developed inference algorithms. The proposed approach is experimentally evaluated. The results show that the inference algorithms improve upon the state of the art in probabilistic logic programming and that it is indeed possible to learn the parameters of a probabilistic logic program from interpretations.
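The reduction to weighted model counting can be sketched on a toy program. The example assumes two probabilistic facts (`burglary` with probability 0.1, `earthquake` with probability 0.2) and the rules `alarm :- burglary` and `alarm :- earthquake`, encoded by hand as the formula `alarm <-> (burglary or earthquake)`. The counter below enumerates all assignments rather than using knowledge compilation, so it is a stand-in for the paper's algorithms, and every name in it is an assumption.

```python
from itertools import product

def wmc(variables, weights, formula):
    """Weighted model count: sum over satisfying assignments of the
    product of the weights of the literals they contain."""
    total = 0.0
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if formula(assignment):
            w = 1.0
            for v in variables:
                w *= weights[v][assignment[v]]
            total += w
    return total

variables = ["burglary", "earthquake", "alarm"]
weights = {
    "burglary": {True: 0.1, False: 0.9},    # probabilistic fact
    "earthquake": {True: 0.2, False: 0.8},  # probabilistic fact
    "alarm": {True: 1.0, False: 1.0},       # derived atom: neutral weights
}

def theory(a):
    # Hand-written encoding of the two rules for alarm.
    return a["alarm"] == (a["burglary"] or a["earthquake"])

# Marginal of the query: WMC(theory and query) / WMC(theory).
p_alarm = wmc(variables, weights, lambda a: theory(a) and a["alarm"]) / \
    wmc(variables, weights, theory)

# Marginal given evidence: condition on alarm being true.
p_burglary_given_alarm = (
    wmc(variables, weights, lambda a: theory(a) and a["alarm"] and a["burglary"])
    / wmc(variables, weights, lambda a: theory(a) and a["alarm"])
)
```

Both queries are ratios of weighted model counts over the same formula, which is what lets a single counting procedure serve marginal and conditional inference.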


Fisher Scoring and a Mixture of Modes Approach for Approximate Inference and Learning in Nonlinear State Space Models

Neural Information Processing Systems

The difficulties lie in the Monte Carlo E-step, which consists of sampling from the posterior distribution of the hidden variables given the observations. The new idea presented in this paper is to generate samples from a Gaussian approximation to the true posterior, from which it is easy to obtain independent samples. The parameters of the Gaussian approximation are derived either from the extended Kalman filter (EKF) or from the Fisher scoring algorithm. In case the posterior density is multimodal, we propose to approximate the posterior by a sum of Gaussians (mixture of modes approach). We show that sampling from the approximate posterior densities obtained by the above algorithms leads to better models than using point estimates for the hidden states. In our experiment, the Fisher scoring algorithm obtained a better approximation of the posterior mode than the EKF. For a multimodal distribution, the mixture of modes approach gave superior results.

1 INTRODUCTION

Nonlinear state space models (NSSM) are a general framework for representing nonlinear time series. In particular, any NARMAX model (nonlinear auto-regressive moving average model with external inputs) can be translated into an equivalent NSSM.
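The mixture of modes idea from the abstract relies on the fact that independent samples are easy to draw from a sum of Gaussians. The sketch below shows only that sampling step for a one-dimensional case; how the modes are located (EKF or Fisher scoring) is not shown, and the mode weights, means, and standard deviations are made-up values, as is the function name `sample_mixture`.

```python
import random

def sample_mixture(modes, n_samples, seed=0):
    """Draw independent samples from a 1-D Gaussian mixture.

    modes: list of (weight, mean, std) tuples, one per posterior mode.
    """
    rng = random.Random(seed)
    weights = [w for w, _, _ in modes]
    samples = []
    for _ in range(n_samples):
        # Pick a mode in proportion to its weight, then sample from it.
        _, mean, std = rng.choices(modes, weights=weights)[0]
        samples.append(rng.gauss(mean, std))
    return samples

# Hypothetical bimodal approximation of a hidden-state posterior.
modes = [(0.3, -2.0, 0.5), (0.7, 3.0, 1.0)]
samples = sample_mixture(modes, 20000)
sample_mean = sum(samples) / len(samples)
```

Samples drawn this way could then be averaged inside a Monte Carlo E-step in place of a single point estimate of the hidden state.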