Goto

Collaborating Authors

 tresp


Robust Regression with Twinned Gaussian Processes

Neural Information Processing Systems

We propose a Gaussian process (GP) framework for robust inference in which a GP prior on the mixing weights of a two-component noise model augments the standard process over latent function values. This approach is a generalization of the mixture likelihood used in traditional robust GP regression, and a specialization of the GP mixture models suggested by Tresp (2000) and Rasmussen and Ghahramani (2002). The value of this restriction is in its tractable expectation propagation updates, which allow for faster inference and model selection, and better convergence than the standard mixture. An additional benefit over the latter method lies in our ability to incorporate knowledge of the noise domain to influence predictions, and to recover with the predictive distribution information about the outlier distribution via the gating process. The model has asymptotic complexity equal to that of conventional robust methods, but yields more confident predictions on benchmark problems than classical heavy-tailed models and exhibits improved stability for data with clustered corruptions, for which they fail altogether.


Robust Regression with Twinned Gaussian Processes

Neural Information Processing Systems

We propose a Gaussian process (GP) framework for robust inference in which a GP prior on the mixing weights of a two-component noise model augments the standard process over latent function values. This approach is a generalization of the mixture likelihood used in traditional robust GP regression, and a specialization of the GP mixture models suggested by Tresp (2000) and Rasmussen and Ghahramani (2002). The value of this restriction is in its tractable expectation propagation updates, which allow for faster inference and model selection, and better convergence than the standard mixture. An additional benefit over the latter method lies in our ability to incorporate knowledge of the noise domain to influence predictions, and to recover with the predictive distribution information about the outlier distribution via the gating process. The model has asymptotic complexity equal to that of conventional robust methods, but yields more confident predictions on benchmark problems than classical heavy-tailed models and exhibits improved stability for data with clustered corruptions, for which they fail altogether.


The Tensor Brain: Semantic Decoding for Perception and Memory

arXiv.org Artificial Intelligence

We analyse perception and memory using mathematical models for knowledge graphs and tensors to gain insights in the corresponding functionalities of the human mind. Our discussion is based on the concept of propositional sentences consisting of \textit{subject-predicate-object} (SPO) triples for expressing elementary facts. SPO sentences are the basis for most natural languages but might also be important for explicit perception and declarative memories, as well as intra-brain communication and the ability to argue and reason. A set of SPO sentences can be described as a knowledge graph, which can be transformed into an adjacency tensor. We introduce tensor models, where concepts have dual representations as indices and associated embeddings, two constructs we believe are essential for the understanding of implicit and explicit perception and memory in the brain. We argue that a biological realization of perception and memory imposes constraints on information processing. In particular, we propose that explicit perception and declarative memories require a semantic decoder, which, in a simple realization, is based on four layers: First, a sensory memory layer, as a buffer for sensory input, second, an index layer representing concepts, third, a memoryless representation layer for the broadcasting of information and fourth, a working memory layer as a processing center and data buffer. In a Bayesian brain interpretation, semantic memory defines the prior for triple statements. We propose that, in evolution and during development, semantic memory, episodic memory and natural language evolved as emergent properties in the agents' process to gain deeper understanding of sensory information. We present a concrete model realization and validate some aspects of our proposed model on benchmark data where we demonstrate state-of-the-art performance.


Improving Visual Relation Detection using Depth Maps

arXiv.org Artificial Intelligence

State of the art visual relation detection methods have been relying on features extracted from RGB images including objects' 2D positions. In this paper, we argue that the 3D positions of objects in space can provide additional valuable information about object relations. This information helps not only to detect spatial relations, such as "standing behind", but also non-spatial relations, such as "holding". Since 3D information of a scene is not easily accessible, we propose incorporating a pre-trained RGB-to-Depth model within visual relation detection frameworks. We discuss different feature extraction strategies from depth maps and show their critical role in relation detection. Our experiments confirm that the performance of state-of-the-art visual relation detection approaches can significantly be improved by utilizing depth map information.


Mixture Approximations to Bayesian Networks

arXiv.org Artificial Intelligence

Structure and parameters in a Bayesian network uniquely specify the probability distribution of the modeled domain. The locality of both structure and probabilistic information are the great benefits of Bayesian networks and require the modeler to only specify local information. On the other hand this locality of information might prevent the modeler - and even more any other person - from obtaining a general overview of the important relationships within the domain. The goal of the work presented in this paper is to provide an "alternative" view on the knowledge encoded in a Bayesian network which might sometimes be very helpful for providing insights into the underlying domain. The basic idea is to calculate a mixture approximation to the probability distribution represented by the Bayesian network. The mixture component densities can be thought of as representing typical scenarios implied by the Bayesian model, providing intuition about the basic relationships. As an additional benefit, performing inference in the approximate model is very simple and intuitive and can provide additional insights. The computational complexity for the calculation of the mixture approximations criticaly depends on the measure which defines the distance between the probability distribution represented by the Bayesian network and the approximate distribution. Both the KL-divergence and the backward KL-divergence lead to inefficient algorithms. Incidentally, the latter is used in recent work on mixtures of mean field solutions to which the work presented here is closely related. We show, however, that using a mean squared error cost function leads to update equations which can be solved using the junction tree algorithm. We conclude that the mean squared error cost function can be used for Bayesian networks in which inference based on the junction tree is tractable. For large networks, however, one may have to rely on mean field approximations.


Adaptive Transfer Learning

AAAI Conferences

Transfer learning aims at reusing the knowledge in some source tasks to improve the learning of a target task. Many transfer learning methods assume that the source tasks and the target task be related, even though many tasks are not related in reality. However, when two tasks are unrelated, the knowledge extracted from a source task may not help, and even hurt, the performance of a target task. Thus, how to avoid negative transfer and then ensure a "safe transfer" of knowledge is crucial in transfer learning. In this paper, we propose an Adaptive Transfer learning algorithm based on Gaussian Processes (AT-GP), which can be used to adapt the transfer learning schemes by automatically estimating the similarity between a source and a target task. The main contribution of our work is that we propose a new semi-parametric transfer kernel for transfer learning from a Bayesian perspective, and propose to learn the model with respect to the target task, rather than all tasks as in multi-task learning. We can formulate the transfer learning problem as a unified Gaussian Process (GP) model. The adaptive transfer ability of our approach is verified on both synthetic and real-world datasets.


Combining Estimators Using Non-Constant Weighting Functions

Neural Information Processing Systems

This paper discusses the linearly weighted combination of estimators in which the weighting functions are dependent on the input. We show that the weighting functions can be derived either by evaluating the input dependent variance of each estimator or by estimating how likely it is that a given estimator has seen data in the region of the input space close to the input pattern. The latter solution is closely related to the mixture of experts approach and we show how learning rules for the mixture of experts can be derived from the theory about learning with missing features. The presented approaches are modular since the weighting functions can easily be modified (no retraining) if more estimators are added. Furthermore, it is easy to incorporate estimators which were not derived from data such as expert systems or algorithms.


Efficient Methods for Dealing with Missing Data in Supervised Learning

Neural Information Processing Systems

In many applications it is important to know how to react if the available information is incomplete, if sensors fail or if sources of information become A.t the time of the research for this paper, a visiting researcher at the Center for Biological and Computational Learning, MIT.


Efficient Methods for Dealing with Missing Data in Supervised Learning

Neural Information Processing Systems

In many applications it is important to know how to react if the available information is incomplete, if sensors fail or if sources of information become A.t the time of the research for this paper, a visiting researcher at the Center for Biological and Computational Learning, MIT.


Combining Estimators Using Non-Constant Weighting Functions

Neural Information Processing Systems

This paper discusses the linearly weighted combination of estimators in which the weighting functions are dependent on the input. We show that the weighting functions can be derived either by evaluating the input dependent variance of each estimator or by estimating how likely it is that a given estimator has seen data in the region of the input space close to the input pattern. The latter solution is closely related to the mixture of experts approach and we show how learning rules for the mixture of experts can be derived from the theory about learning with missing features. The presented approaches are modular since the weighting functions can easily be modified (no retraining) if more estimators are added. Furthermore, it is easy to incorporate estimators which were not derived from data such as expert systems or algorithms.