AITopics

In this work, we study an information filtering model where the relevance labels associated with a sequence of feature vectors are realizations of an unknown probabilistic linear function. Building on the analysis of a restricted version of our model, we derive a general filtering rule based on the margin of a ridge regression estimator. While our rule may observe the label of a vector only by classifying the vector as relevant, experiments on a real-world document filtering problem show that the performance of our rule is close to that of the online classifier which is allowed to observe all labels. These empirical results are complemented by a theoretical analysis where we consider a randomized variant of our rule and prove that its expected number of mistakes is never much larger than that of the optimal filtering rule which knows the hidden linear model.
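The filtering setting described above can be illustrated with a minimal sketch. It is not the authors' rule: the class name `MarginFilter`, the `slack` parameter, the labels in {-1, +1}, and the choice to forward ties are all assumptions made for illustration. It only shows the key constraint that a label is seen, and the ridge regression estimator updated, when the vector is classified as relevant.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


class MarginFilter:
    """Hypothetical margin-based filter built on a ridge regression estimator."""

    def __init__(self, dim, slack=0.0):
        # A_inv is the inverse of (I + sum of x x^T over the observed examples).
        self.A_inv = [[1.0 if i == j else 0.0 for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim  # sum of y * x over the observed examples
        self.slack = slack    # assumed tolerance; 0.0 forwards only margins >= 0

    def margin(self, x):
        # Ridge regression weights w = A^{-1} b, then the margin w . x
        w = [dot(row, self.b) for row in self.A_inv]
        return dot(w, x)

    def forward(self, x):
        # Classify x as relevant when the (slackened) margin is non-negative.
        return self.margin(x) + self.slack >= 0.0

    def observe(self, x, y):
        # Sherman-Morrison rank-one update of A^{-1} after adding x x^T.
        Ax = [dot(row, x) for row in self.A_inv]
        denom = 1.0 + dot(x, Ax)
        n = len(x)
        self.A_inv = [[self.A_inv[i][j] - Ax[i] * Ax[j] / denom
                       for j in range(n)] for i in range(n)]
        self.b = [bi + y * xi for bi, xi in zip(self.b, x)]


# Toy stream: the first coordinate signals relevance (+1), the second irrelevance (-1).
stream = [([1.0, 0.0], 1), ([0.0, 1.0], -1), ([1.0, 0.0], 1), ([0.0, 1.0], -1)]
f = MarginFilter(dim=2)
decisions = []
for x, y in stream:
    relevant = f.forward(x)
    decisions.append(relevant)
    if relevant:
        f.observe(x, y)  # the label is revealed only for forwarded vectors
```

On this toy stream the filter forwards the first three vectors and rejects the fourth, whose label is therefore never observed.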

algorithm, fil, probability, (14 more...)

Country:

Europe > Italy (0.05)
North America > United States > New York (0.04)

Industry: Education > Educational Setting > Online (0.87)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.51)

Effective Dimension and Generalization of Kernel Learning

Zhang, Tong

We investigate the generalization performance of some learning problems in Hilbert function spaces. We introduce a concept of scale-sensitive effective data dimension, and show that it characterizes the convergence rate of the underlying learning problem. Using this concept, we can naturally extend results for parametric estimation problems in finite dimensional spaces to nonparametric kernel learning methods. We derive upper bounds on the generalization performance and show that the resulting convergence rates are optimal under various circumstances.
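The abstract does not state the formula for the effective dimension. As an illustration only, the sketch below uses one definition that is common in the kernel learning literature, the sum of mu_i / (mu_i + scale) over the eigenvalues mu_i of the kernel operator; treating this as the paper's definition is an assumption, and the function name is hypothetical.

```python
def effective_dimension(eigenvalues, scale):
    """Assumed definition: sum of mu / (mu + scale) over kernel eigenvalues.

    Eigenvalues well above `scale` each contribute close to 1, and
    eigenvalues well below it contribute close to 0, so the result counts
    the directions that matter at the given scale.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    return sum(mu / (mu + scale) for mu in eigenvalues)


spectrum = [1.0, 1.0, 0.0]
d_coarse = effective_dimension(spectrum, scale=1.0)   # 0.5 + 0.5 + 0.0
d_fine = effective_dimension(spectrum, scale=1e-9)    # approaches 2
```

Shrinking the scale lets more of the spectrum count, which is the sense in which the quantity is scale-sensitive.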

dimension, hilbert space, inequality, (15 more...)

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Industry: Education > Focused Education > Special Education (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Wiegerinck, Wim, Heskes, Tom

Fractional Belief Propagation

We consider loopy belief propagation for approximate inference in probabilistic graphical models. A limitation of the standard algorithm is that clique marginals are computed as if there were no loops in the graph. To overcome this limitation, we introduce fractional belief propagation. Fractional belief propagation is formulated in terms of a family of approximate free energies, which includes the Bethe free energy and the naive mean-field free energy as special cases. Using the linear response correction of the clique marginals, the scale parameters can be tuned. Simulation results illustrate the potential merits of the approach.

belief propagation, free energy, propagation, (15 more...)

Country:

Europe > Netherlands > Gelderland > Nijmegen (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > San Mateo County > Redwood City (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Belief Revision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.89)

Bousquet, Olivier, Herrmann, Daniel

On the Complexity of Learning the Kernel Matrix

We investigate data-based procedures for selecting the kernel when learning with Support Vector Machines. We provide generalization error bounds by estimating the Rademacher complexities of the corresponding function classes. In particular we obtain a complexity bound for function classes induced by kernels with given eigenvectors, i.e., we allow the spectrum to vary while keeping the eigenvectors fixed. This bound is only a logarithmic factor bigger than the complexity of the function class induced by a single kernel. However, optimizing the margin over such classes leads to overfitting. We thus propose a suitable way of constraining the class. We use an efficient algorithm to solve the resulting optimization problem, present preliminary experimental results, and compare them to an alignment-based approach.

complexity, kernel, kernel matrix, (15 more...)

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
North America > United States > New York (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.55)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.49)

Scaling of Probability-Based Optimization Algorithms

Shapiro, J. L.

Population-based Incremental Learning is shown to require very sensitive scaling of its learning rate. The learning rate must scale with the system size in a problem-dependent way. This is shown in two problems: the needle-in-a-haystack, in which the learning rate must vanish exponentially in the system size, and a smooth function, in which the learning rate must vanish like the square root of the system size. Two methods are proposed for removing this sensitivity. A learning dynamics which obeys detailed balance is shown to give consistent performance over the entire range of learning rates. An analog of mutation is shown to require a learning rate which scales as the inverse system size, but is problem independent.
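For readers unfamiliar with Population-based Incremental Learning, the following is a minimal sketch of the standard algorithm, showing where the learning rate enters. It does not implement the detailed-balance or mutation variants proposed in the paper. The OneMax fitness function, the population size, and the step count are assumptions chosen for illustration.

```python
import random


def pbil(fitness, n_bits, learning_rate, pop_size=20, steps=200, seed=0):
    """Basic PBIL: move a probability vector toward the best sampled bit string."""
    rng = random.Random(seed)
    p = [0.5] * n_bits  # probability that each bit is 1
    for _ in range(steps):
        # Sample a population of bit strings from the current probabilities.
        population = [[1 if rng.random() < pi else 0 for pi in p]
                      for _ in range(pop_size)]
        best = max(population, key=fitness)
        # The learning rate sets how far p moves toward the best sample.
        p = [(1 - learning_rate) * pi + learning_rate * bi
             for pi, bi in zip(p, best)]
    return p


def onemax(bits):
    # Assumed smooth test function: the number of ones in the string.
    return sum(bits)


probs = pbil(onemax, n_bits=16, learning_rate=0.1)
frozen = pbil(onemax, n_bits=8, learning_rate=0.0)  # no learning: p never moves
```

With a learning rate of zero the probability vector stays at its initial value, while large rates let a single sample dominate the update; the sensitivity studied in the abstract lies between these extremes.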

algorithm, pbil, probability, (14 more...)

Country:

North America > United States > Illinois > Champaign County > Champaign (0.04)
Europe > United Kingdom (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.64)
Information Technology > Artificial Intelligence > Representation & Reasoning > Model-Based Reasoning (0.52)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.47)

Scott, Clayton, Nowak, Robert

Dyadic Classification Trees via Structural Risk Minimization

Classification trees are one of the most popular types of classifiers, with ease of implementation and interpretation being among their attractive features. Despite the widespread use of classification trees, theoretical analysis of their performance is scarce. In this paper, we show that a new family of classification trees, called dyadic classification trees (DCTs), are near optimal (in a minimax sense) for a very broad range of classification problems. This demonstrates that other schemes (e.g., neural networks, support vector machines) cannot perform significantly better than DCTs in many cases. We also show that this near optimal performance is attained with linear (in the number of training data) complexity growing and pruning algorithms. Moreover, the performance of DCTs on benchmark datasets compares favorably to that of standard CART, which is generally more computationally intensive and which does not possess similar near optimality properties. Our analysis stems from theoretical results on structural risk minimization, on which the pruning rule for DCTs is based.

classification tree, classifier, dyadic classification tree, (13 more...)

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > New York (0.05)
North America > United States > Wisconsin (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.31)

Malzahn, Dörthe, Opper, Manfred

A Statistical Mechanics Approach to Approximate Analytical Bootstrap Averages

We apply the replica method of Statistical Physics combined with a variational method to the approximate analytical computation of bootstrap averages for estimating the generalization error. We demonstrate our approach on regression with Gaussian processes and compare our results with averages obtained by Monte-Carlo sampling.

approximation, bootstrap sample, generalization error, (14 more...)

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Denmark > Capital Region > Kongens Lyngby (0.04)
Europe > Denmark > Capital Region > Copenhagen (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Sahani, Maneesh, Linden, Jennifer F.

Evidence Optimization Techniques for Estimating Stimulus-Response Functions

An essential step in understanding the function of sensory nervous systems is to characterize as accurately as possible the stimulus-response function (SRF) of the neurons that relay and process sensory information. One increasingly common experimental approach is to present a rapidly varying complex stimulus to the animal while recording the responses of one or more neurons, and then to directly estimate a functional transformation of the input that accounts for the neuronal firing. The estimation techniques usually employed, such as Wiener filtering or other correlation-based estimation of the Wiener or Volterra kernels, are equivalent to maximum likelihood estimation in a Gaussian-output-noise regression model. We explore the use of Bayesian evidence-optimization techniques to condition these estimates. We show that by learning hyperparameters that control the smoothness and sparsity of the transfer function it is possible to improve dramatically the quality of SRF estimates, as measured by their success in predicting responses to novel input.

predictive power, signal power, stimulus, (14 more...)

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York (0.04)
North America > United States > California > Los Angeles County > Pasadena (0.04)
Europe > United Kingdom (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.85)

An Estimation-Theoretic Framework for the Presentation of Multiple Stimuli

Eurich, Christian W.

A framework is introduced for assessing the encoding accuracy and the discriminational ability of a population of neurons upon simultaneous presentation of multiple stimuli. Minimal square estimation errors are obtained from a Fisher information analysis in an abstract compound space comprising the features of all stimuli. Even for the simplest case of linear superposition of responses and Gaussian tuning, the symmetries in the compound space are very different from those in the case of a single stimulus. The analysis allows for a quantitative description of attentional effects and can be extended to include neural nonlinearities such as nonclassical receptive fields.

compound space, presentation, stimuli, (16 more...)

Country: Europe > Germany > Bremen > Bremen (0.28)

Industry: Health & Medicine (0.71)

Technology: Information Technology > Artificial Intelligence (1.00)

Hoyer, Patrik O., Hyvärinen, Aapo

Interpreting Neural Response Variability as Monte Carlo Sampling of the Posterior

The responses of cortical sensory neurons are notoriously variable, with the number of spikes evoked by identical stimuli varying significantly from trial to trial. This variability is most often interpreted as 'noise', purely detrimental to the sensory system. In this paper, we propose an alternative view in which the variability is related to the uncertainty, about world parameters, which is inherent in the sensory stimulus. Specifically, the responses of a population of neurons are interpreted as stochastic samples from the posterior distribution in a latent variable model. In addition to giving theoretical arguments supporting such a representational scheme, we provide simulations suggesting how some aspects of response variability might be understood in this framework.

neuron, posterior distribution, variability, (13 more...)

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)