Boosting Density Estimation
Several authors have suggested viewing boosting as a gradient descent search for a good fit in function space. We apply gradient-based boosting methodology to the unsupervised learning problem of density estimation. We show convergence properties of the algorithm and prove that a strength of weak learnability property applies to this problem as well. We illustrate the potential of this approach through experiments with boosting Bayesian networks to learn density models.
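As a rough illustration of the gradient-boosting view of density estimation, the sketch below greedily grows a mixture by fitting each new weak density to the points the current model explains poorly and line-searching its mixing weight. The Gaussian weak learner, the 1/density weighting, and the grid line search are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def weighted_gaussian(x, w):
    """Weak learner: a single Gaussian fit by weighted maximum likelihood."""
    w = w / w.sum()
    mu = np.sum(w * x)
    var = np.sum(w * (x - mu) ** 2) + 1e-6
    return lambda t: np.exp(-(t - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def boost_density(x, rounds=10):
    """Greedily build a mixture: each round fits a weak density to points the
    current model explains poorly (weights ~ 1 / current density, which maximizes
    the first-order increase in log-likelihood when the new component is mixed in)."""
    model = weighted_gaussian(x, np.ones_like(x))
    for _ in range(rounds):
        dens = np.maximum(model(x), 1e-300)
        g = weighted_gaussian(x, 1.0 / dens)
        gx = g(x)
        # line search over the mixing weight of the new component
        best_alpha, best_ll = 0.0, np.log(dens).sum()
        for alpha in np.linspace(0.01, 0.5, 50):
            ll = np.log((1 - alpha) * dens + alpha * gx).sum()
            if ll > best_ll:
                best_alpha, best_ll = alpha, ll
        if best_alpha == 0.0:
            break  # no mixing weight improves the fit any further
        model = (lambda m, c, a: lambda t: (1 - a) * m(t) + a * c(t))(model, g, best_alpha)
    return model

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = np.concatenate([rng.normal(-2, 0.5, 300), rng.normal(3, 1.0, 700)])
    density = boost_density(data)
    print(density(np.array([-2.0, 0.0, 3.0])))
```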
Real-Time Monitoring of Complex Industrial Processes with Particle Filters
Morales-Menéndez, Rubén, Freitas, Nando de, Poole, David
We consider two ubiquitous processes: an industrial dryer and a level tank. For these applications, we compared three particle filtering variants: standard particle filtering, Rao-Blackwellised particle filtering and a version of Rao-Blackwellised particle filtering that does one-step look-ahead to select good sampling regions. We show that the overhead of the extra processing per particle of the more sophisticated methods is more than compensated by the decrease in error and variance.
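For readers unfamiliar with the baseline, here is a minimal sketch of a standard (bootstrap) particle filter on a toy nonlinear model. The model, noise levels, and multinomial resampling are assumptions for illustration; the Rao-Blackwellised variants compared in the paper additionally marginalise part of the state analytically (e.g., with a Kalman filter), which this sketch omits.

```python
import numpy as np

def bootstrap_particle_filter(ys, n_particles, transition, likelihood, init, rng=None):
    """Standard (bootstrap) particle filter: propagate particles through the
    dynamics, weight them by the likelihood of the new observation, resample."""
    rng = rng or np.random.default_rng(0)
    particles = init(n_particles, rng)
    means = []
    for y in ys:
        particles = transition(particles, rng)                  # propagate
        weights = likelihood(y, particles)                      # weight by p(y_t | x_t)
        weights = weights / weights.sum()
        idx = rng.choice(n_particles, n_particles, p=weights)   # multinomial resampling
        particles = particles[idx]
        means.append(particles.mean())
    return np.array(means)

if __name__ == "__main__":
    # toy nonlinear state-space model standing in for the industrial processes
    rng = np.random.default_rng(1)
    x, xs, ys = 0.0, [], []
    for _ in range(50):
        x = 0.9 * x + 0.5 * np.sin(x) + rng.normal(0, 0.3)
        xs.append(x)
        ys.append(x + rng.normal(0, 0.5))
    est = bootstrap_particle_filter(
        np.array(ys), 500,
        transition=lambda p, r: 0.9 * p + 0.5 * np.sin(p) + r.normal(0, 0.3, p.shape),
        likelihood=lambda y, p: np.exp(-0.5 * ((y - p) / 0.5) ** 2),
        init=lambda n, r: r.normal(0.0, 1.0, n))
    print("mean squared tracking error:", np.mean((est - np.array(xs)) ** 2))
```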
Bayesian Monte Carlo
Ghahramani, Zoubin, Rasmussen, Carl E.
We investigate Bayesian alternatives to classical Monte Carlo methods for evaluating integrals. Bayesian Monte Carlo (BMC) allows the incorporation of prior knowledge, such as smoothness of the integrand, into the estimation. In a simple problem we show that this outperforms any classical importance sampling method. We also attempt more challenging multidimensional integrals involved in computing marginal likelihoods of statistical models (a.k.a.
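A minimal one-dimensional sketch of the Bayesian Monte Carlo idea, under assumed choices (squared-exponential kernel with unit signal variance, standard normal integration measure): the integrand gets a GP prior, and the posterior mean of the integral is a weighted sum of the observed function values, where the weights come from integrating the kernel against the measure in closed form.

```python
import numpy as np

def bayesian_monte_carlo(f, xs, lengthscale=1.0, jitter=1e-8):
    """Minimal 1-D Bayesian Monte Carlo sketch: place a GP prior (squared-
    exponential kernel, unit signal variance) on the integrand f and estimate
    Z = integral of f(x) N(x; 0, 1) dx by the posterior mean z^T K^{-1} f(xs),
    where z_i = integral of k(x, x_i) N(x; 0, 1) dx has a closed form for this
    kernel / measure pair."""
    xs = np.asarray(xs, dtype=float)
    fx = np.array([f(x) for x in xs])
    l2 = lengthscale ** 2
    K = np.exp(-0.5 * (xs[:, None] - xs[None, :]) ** 2 / l2) + jitter * np.eye(len(xs))
    z = np.sqrt(l2 / (l2 + 1.0)) * np.exp(-0.5 * xs ** 2 / (l2 + 1.0))
    return z @ np.linalg.solve(K, fx)

if __name__ == "__main__":
    # E[x^2] under N(0, 1) is exactly 1; compare BMC with plain Monte Carlo
    xs = np.random.default_rng(0).normal(0.0, 1.0, 15)
    print("BMC estimate:", bayesian_monte_carlo(lambda x: x ** 2, xs))
    print("plain MC    :", np.mean(xs ** 2))
```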
Combining Features for BCI
Dornhege, Guido, Blankertz, Benjamin, Curio, Gabriel, Müller, Klaus-Robert
Recently, interest has been growing in developing an effective communication interface connecting the human brain to a computer, the 'Brain-Computer Interface' (BCI). One motivation of BCI research is to provide a new communication channel that substitutes for normal motor output in patients with severe neuromuscular disabilities. In the last decade, various neurophysiological cortical processes, such as slow potential shifts, movement-related potentials (MRPs) or event-related desynchronization (ERD) of spontaneous EEG rhythms, were shown to be suitable for BCI, and, consequently, different independent approaches to extracting BCI-relevant EEG features for single-trial analysis are under investigation. Here, we present and systematically compare several concepts for combining such EEG features to improve single-trial classification. Feature combinations are evaluated on movement imagination experiments with 3 subjects where EEG features are based on either MRPs or ERD, or both. Those combination methods that incorporate the assumption that the single EEG features are physiologically mutually independent outperform the plain method of 'adding' evidence, where the single-feature vectors are simply concatenated. These results strengthen the hypothesis that MRP and ERD reflect at least partially independent aspects of cortical processes and open a new perspective for boosting BCI effectiveness.
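The contrast between concatenating feature vectors and combining classifiers under a conditional-independence assumption can be sketched as follows on synthetic data (scikit-learn LDA classifiers and Gaussian toy features standing in for MRP- and ERD-based features; all of this is illustrative, not the paper's exact combination schemes).

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def combine_independent(prob_a, prob_b, prior):
    """Combine two per-feature posteriors under the assumption that the feature
    blocks are conditionally independent given the class:
    p(c | a, b) is proportional to p(c | a) * p(c | b) / p(c)."""
    post = prob_a * prob_b / prior
    return post / post.sum(axis=1, keepdims=True)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 400
    y = rng.integers(0, 2, n)
    # two synthetic "feature blocks" standing in for MRP- and ERD-based features
    feat_mrp = rng.normal(0, 1, (n, 5)) + y[:, None] * 0.8
    feat_erd = rng.normal(0, 1, (n, 8)) + y[:, None] * 0.6
    tr, te = slice(0, 300), slice(300, None)

    # (a) plain concatenation of the feature vectors
    concat = LinearDiscriminantAnalysis().fit(np.hstack([feat_mrp, feat_erd])[tr], y[tr])
    acc_concat = (concat.predict(np.hstack([feat_mrp, feat_erd])[te]) == y[te]).mean()

    # (b) separate classifiers combined under the independence assumption
    clf_a = LinearDiscriminantAnalysis().fit(feat_mrp[tr], y[tr])
    clf_b = LinearDiscriminantAnalysis().fit(feat_erd[tr], y[tr])
    prior = np.bincount(y[tr]) / y[tr].size
    post = combine_independent(clf_a.predict_proba(feat_mrp[te]),
                               clf_b.predict_proba(feat_erd[te]), prior)
    acc_comb = (post.argmax(axis=1) == y[te]).mean()
    print("concatenation:", acc_concat, " independence combination:", acc_comb)
```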
Feature Selection in Mixture-Based Clustering
Law, Martin H., Jain, Anil K., Figueiredo, Mário
There exist many approaches to clustering, but the important issue of feature selection, i.e., selecting the data attributes that are relevant for clustering, is rarely addressed. Feature selection for clustering is difficult due to the absence of class labels. We propose two approaches to feature selection in the context of Gaussian mixture-based clustering. In the first one, instead of making hard selections, we estimate feature saliencies. An expectation-maximization (EM) algorithm is derived for this task. The second approach extends Koller and Sahami's mutual-information-based feature relevance criterion to the unsupervised case. Feature selection is then carried out by a backward search scheme. This scheme can be classified as a "wrapper", since it wraps mixture estimation in an outer layer that performs feature selection. Experimental results on synthetic and real data show that both methods have promising performance.
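A simplified, maximum-likelihood sketch of the first approach (feature saliencies estimated by EM) is given below. The assumed model: each feature j is drawn from a component-specific Gaussian with probability rho_j and from a common "background" Gaussian otherwise. The updates follow that simplified model and omit the regularization used in the paper, so treat this as an illustration rather than the authors' algorithm.

```python
import numpy as np
from scipy.stats import norm

def feature_saliency_em(X, n_components, n_iter=100, seed=0):
    """EM for a Gaussian mixture with per-feature saliencies rho_j
    (maximum-likelihood version of the feature-saliency idea)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    alpha = np.full(n_components, 1.0 / n_components)
    mu = X[rng.choice(n, n_components, replace=False)]          # component means (m, d)
    var = np.tile(X.var(axis=0), (n_components, 1)) + 1e-3      # component variances (m, d)
    mu0, var0 = X.mean(axis=0), X.var(axis=0) + 1e-3            # common "irrelevant" density
    rho = np.full(d, 0.5)                                       # feature saliencies

    for _ in range(n_iter):
        # E-step: responsibilities and per-feature relevance weights
        comp = norm.pdf(X[:, None, :], mu[None], np.sqrt(var)[None])   # (n, m, d)
        common = norm.pdf(X[:, None, :], mu0, np.sqrt(var0))           # (n, 1, d)
        mix = rho * comp + (1 - rho) * common
        log_r = np.log(alpha)[None] + np.log(mix + 1e-300).sum(axis=2)
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r); r /= r.sum(axis=1, keepdims=True)
        u = r[:, :, None] * (rho * comp) / (mix + 1e-300)   # "feature relevant" weight
        v = r[:, :, None] - u                               # "feature irrelevant" weight
        # M-step
        alpha = r.mean(axis=0)
        su = u.sum(axis=0) + 1e-12
        mu = (u * X[:, None, :]).sum(axis=0) / su
        var = (u * (X[:, None, :] - mu) ** 2).sum(axis=0) / su + 1e-6
        sv = v.sum(axis=(0, 1)) + 1e-12
        mu0 = (v * X[:, None, :]).sum(axis=(0, 1)) / sv
        var0 = (v * (X[:, None, :] - mu0) ** 2).sum(axis=(0, 1)) / sv + 1e-6
        rho = u.sum(axis=(0, 1)) / n
    return rho, r

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # two clusters separated only in the first feature; the second feature is noise
    X = np.vstack([rng.normal([0, 0], 1, (200, 2)), rng.normal([4, 0], 1, (200, 2))])
    rho, _ = feature_saliency_em(X, n_components=2)
    print("estimated saliencies:", rho)
```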
The Effect of Singularities in a Learning Machine when the True Parameters Do Not Lie on such Singularities
Watanabe, Sumio, Amari, Shun-ichi
Many learning machines with hidden variables used in information science have singularities in their parameter spaces. At singularities, the Fisher information matrix becomes degenerate, so the learning theory of regular statistical models does not hold. Recently, it was proven that, if the true parameter is contained in the singularities, then the coefficient of the Bayes generalization error is equal to the pole of the zeta function of the Kullback information.
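For context, the objects referred to in the abstract can be written as follows (notation assumed here, not taken from the paper): with K(w) the Kullback information from the true distribution to the model at parameter w and \varphi(w) the prior,

```latex
\zeta(z) \;=\; \int K(w)^{z}\,\varphi(w)\,dw , \qquad z \in \mathbb{C},
```

and if the largest real pole of \zeta(z) is at z = -\lambda, the leading term of the Bayes generalization error is of order \lambda / n in the sample size n. This is a hedged summary of the standard singular-learning result that the paper builds on.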
Convergent Combinations of Reinforcement Learning with Linear Function Approximation
Schoknecht, Ralf, Merke, Artur
Convergence of iterative reinforcement learning algorithms like TD(0) depends on the sampling strategy for the transitions. However, in practical applications it is convenient to be able to take transition data from arbitrary sources without losing convergence. In this paper we investigate the problem of repeated synchronous updates based on a fixed set of transitions. This makes it possible to analyse whether a certain reinforcement learning algorithm and a certain function approximator are compatible. For the combination of the residual gradient algorithm with grid-based linear interpolation we show that there exists a universal constant learning rate such that the iteration converges independently of the concrete transition data. The strongest convergence guarantees for reinforcement learning (RL) algorithms are available for the tabular case, where temporal difference algorithms for both policy evaluation and the general control problem converge with probability one independently of the concrete sampling strategy, as long as all states are sampled infinitely often and the learning rate is decreased appropriately [2].
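A minimal sketch of synchronous residual-gradient updates on a fixed transition set with a linear value function is shown below; the toy chain, tabular features (a special case of grid-based linear interpolation), and learning rate are assumptions for illustration, not the paper's construction.

```python
import numpy as np

def residual_gradient(transitions, phi, n_features, gamma=0.9, lr=0.1, n_iter=2000):
    """Synchronous residual-gradient policy evaluation on a fixed set of
    transitions (s, r, s') with a linear value function V(s) = phi(s)^T theta.
    Each sweep follows the gradient of the summed squared Bellman residual."""
    theta = np.zeros(n_features)
    for _ in range(n_iter):
        grad = np.zeros(n_features)
        for s, r, s_next in transitions:
            delta = r + gamma * phi(s_next) @ theta - phi(s) @ theta   # Bellman residual
            grad += delta * (gamma * phi(s_next) - phi(s))             # residual gradient
        theta -= lr * grad / len(transitions)
    return theta

if __name__ == "__main__":
    # 3-state chain with reward 1 on reaching the absorbing state; tabular features
    phi = lambda s: np.eye(3)[s]
    transitions = [(0, 0.0, 1), (1, 1.0, 2), (2, 0.0, 2)]
    print(residual_gradient(transitions, phi, n_features=3))  # approx. [0.9, 1.0, 0.0]
```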
Extracting Relevant Structures with Side Information
The problem of extracting the relevant aspects of data, in the face of multiple conflicting structures, is inherent to the modeling of complex data. Extracting the structure in one random variable that is relevant for another variable has recently been addressed principally via the information bottleneck method [15]. However, such auxiliary variables often contain more information than is actually required, due to structures that are irrelevant for the task. In many other cases it is in fact easier to specify what is irrelevant for the task at hand than what is relevant. Identifying the relevant structures can thus be considerably improved by also minimizing the information about another, irrelevant, variable. In this paper we give a general formulation of this problem and derive its formal, as well as algorithmic, solution. Its operation is demonstrated in a synthetic example and in two real-world problems in the context of text categorization and face images. While the original information bottleneck problem is related to rate distortion theory, with the distortion measure replaced by the relevant information, extracting relevant features while removing irrelevant ones is related to rate distortion with side information.
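A hedged sketch of the kind of trade-off involved (notation assumed, not quoted from the paper): a compressed representation T of X should retain information about the relevant variable Y^{+} while discarding information about the irrelevant variable Y^{-}, e.g. by minimizing a functional of the form

```latex
\mathcal{L}\bigl[p(t \mid x)\bigr]
  \;=\; I(X;T) \;-\; \beta\Bigl( I(T;Y^{+}) \;-\; \gamma\, I(T;Y^{-}) \Bigr),
```

where \beta and \gamma weight the relevance and irrelevance terms against the compression term I(X;T).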
A Probabilistic Approach to Single Channel Blind Signal Separation
We present a new technique for achieving source separation when given only a single-channel recording. The main idea is to exploit the inherent time structure of sound sources by learning a priori sets of time-domain basis filters that encode the sources in a statistically efficient manner. We derive a learning algorithm using a maximum likelihood approach, given the observed single-channel data and the sets of basis filters.
Knowledge-Based Support Vector Machine Classifiers
Fung, Glenn M., Mangasarian, Olvi L., Shavlik, Jude W.
Prior knowledge in the form of multiple polyhedral sets, each belonging to one of two categories, is introduced into a reformulation of a linear support vector machine classifier. The resulting formulation leads to a linear program that can be solved efficiently. Real-world examples, from DNA sequencing and breast cancer prognosis, demonstrate the effectiveness of the proposed method. Numerical results show improvement in test set accuracy after the incorporation of prior knowledge into ordinary, data-based linear support vector machine classifiers. One experiment also shows that a linear classifier, based solely on prior knowledge, far outperforms the direct application of prior knowledge rules to classify data.
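To make concrete how a polyhedral knowledge set can enter a linear program (a sketch under assumed notation, not a quotation of the paper's formulation): suppose prior knowledge asserts that every point of the nonempty polyhedron \{x : Bx \le b\} lies in class +1, i.e. Bx \le b implies w^{\top}x \ge \gamma + 1. By linear-programming duality this implication is equivalent to the existence of a multiplier u with

```latex
B^{\top}u + w = 0, \qquad b^{\top}u + \gamma + 1 \le 0, \qquad u \ge 0,
```

which are linear constraints in (w, \gamma, u) and can therefore be appended to a linear-programming form of the support vector machine.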