Technology
Asymptotic Universality for Learning Curves of Support Vector Machines
Opper, Manfred, Urbanczik, Robert
Using methods of Statistical Physics, we investigate the role of model complexity in learning with support vector machines (SVMs). We show the advantages of using SVMs with kernels of infinite complexity on noisy target rules, which, in contrast to common theoretical beliefs, are found to achieve optimal generalization error although the training error does not converge to the generalization error. Moreover, we find a universal asymptotics of the learning curves which depends only on the target rule and not on the SVM kernel.
1 Introduction
Powerful systems for data inference, like neural networks, implement complex input-output relations by learning from example data. The price one has to pay for the flexibility of these models is the need to choose the proper model complexity for a given task, i.e. the system architecture which gives good generalization ability for novel data. This has also become an important problem for support vector machines [1].
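As a rough illustration of the setting described above (our own toy experiment, not the authors' analysis), the sketch below estimates the learning curve of an RBF-kernel SVM, a kernel of effectively unlimited complexity, on a noisy linear target rule; the teacher, noise level, and sample sizes are arbitrary choices.

```python
# Minimal sketch (not the paper's setup): empirical learning curve of an
# RBF-kernel SVM on a noisy linearly separable target rule.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
d = 20                                   # input dimension (arbitrary)
w_star = rng.standard_normal(d)          # teacher weight vector
noise = 0.1                              # label-flip probability (noisy target rule)

def sample(n):
    X = rng.standard_normal((n, d))
    y = np.sign(X @ w_star)
    flip = rng.random(n) < noise
    y[flip] *= -1
    return X, y

X_test, y_test = sample(5000)
for n in [50, 100, 200, 400, 800]:
    X, y = sample(n)
    clf = SVC(kernel="rbf", C=1.0).fit(X, y)
    eps = np.mean(clf.predict(X_test) != y_test)          # generalization error estimate
    print(f"n={n:4d}  training error={1 - clf.score(X, y):.3f}  test error={eps:.3f}")
```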
Categorization by Learning and Combining Object Parts
Heisele, Bernd, Serre, Thomas, Pontil, Massimiliano, Vetter, Thomas, Poggio, Tomaso
We describe an algorithm for automatically learning discriminative components of objects with SVM classifiers. It is based on growing image parts by minimizing theoretical bounds on the error probability of an SVM. Component-based face classifiers are then combined in a second stage to yield a hierarchical SVM classifier. Experimental results in face classification show considerable robustness against rotations in depth and suggest performance at a level significantly better than that of other face detection systems. Novel aspects of our approach are: a) an algorithm to learn component-based classification experts and their combination, b) the use of 3-D morphable models for training, and c) a maximum operation on the output of each component classifier, which may be relevant for biological models of visual recognition.
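A minimal sketch of the two-stage idea, with made-up patch sizes, placeholder training data, and scikit-learn linear SVMs standing in for the component experts: each component classifier is scanned over a search region, its maximum output is kept, and a second-stage classifier combines the resulting maxima.

```python
# Illustrative sketch (not the authors' code) of the two-stage architecture:
# component SVMs scanned over a region, maxima collected, second-stage combiner.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(1)
n_components, patch_dim = 4, 64    # e.g. eyes, nose, mouth patches (hypothetical sizes)

# Stage 1: one linear SVM per component, trained on labelled patches (placeholder data).
component_svms = []
for _ in range(n_components):
    Xp = rng.standard_normal((200, patch_dim))
    yp = rng.integers(0, 2, 200)                  # placeholder component labels
    component_svms.append(LinearSVC(dual=False).fit(Xp, yp))

def component_features(candidate_patches):
    """candidate_patches: one (n_positions, patch_dim) array per component.
    Returns the maximum decision value of each component SVM over its search region."""
    return np.array([svm.decision_function(p).max()
                     for svm, p in zip(component_svms, candidate_patches)])

# Stage 2: combination classifier on the vectors of component maxima.
X2 = np.vstack([component_features([rng.standard_normal((10, patch_dim))
                                    for _ in range(n_components)]) for _ in range(300)])
y2 = rng.integers(0, 2, 300)                      # placeholder face / non-face labels
combiner = LinearSVC(dual=False).fit(X2, y2)
```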
Activity Driven Adaptive Stochastic Resonance
Wenning, Gregor, Obermayer, Klaus
Cortical neurons might be considered as threshold elements that integrate many excitatory and inhibitory inputs in parallel. Given the apparent variability of cortical spike trains, this integration yields a strongly fluctuating membrane potential, so that threshold crossings are highly irregular. Here we study how a neuron could maximize its sensitivity with respect to a relatively small subset of its excitatory inputs. Detecting weak signals embedded in fluctuations is the natural realm of stochastic resonance. The neuron's response is described in a hazard-function approximation applied to an Ornstein-Uhlenbeck process.
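A toy simulation of the stochastic-resonance effect described above, with arbitrary parameter values and a simple reset rule rather than the hazard-function treatment used in the paper: the threshold crossings of a noisy Ornstein-Uhlenbeck membrane potential track a weak subthreshold signal best at an intermediate noise level.

```python
# Minimal simulation sketch (assumed parameter values): an Ornstein-Uhlenbeck
# membrane potential driven by a weak periodic signal plus background noise.
import numpy as np

rng = np.random.default_rng(2)
dt, T, tau, theta = 1e-3, 20.0, 0.02, 0.5        # time step, duration, membrane tau, threshold
t = np.arange(0, T, dt)
signal = 0.4 * np.sin(2 * np.pi * 2.0 * t)       # weak subthreshold signal

for sigma in [0.1, 0.5, 1.0, 2.0]:               # background noise amplitude
    v, spikes = 0.0, np.zeros_like(t)
    for i, s in enumerate(signal):
        v += (-v + s) / tau * dt + sigma * np.sqrt(dt / tau) * rng.standard_normal()
        if v >= theta:                            # threshold crossing -> spike, reset
            spikes[i], v = 1.0, 0.0
    # correlation between the smoothed spike train and the hidden signal
    rate = np.convolve(spikes, np.ones(200) / 200, mode="same")
    c = np.corrcoef(rate, signal)[0, 1] if spikes.any() else float("nan")
    print(f"sigma={sigma:.1f}  spikes={int(spikes.sum()):4d}  corr(rate, signal)={c:.3f}")
```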
Generalizable Relational Binding from Coarse-coded Distributed Representations
O'Reilly, Randall C., Busby, R. S.
We present a model of binding of relationship information in a spatial domain (e.g., square above triangle) that uses low-order coarse-coded conjunctive representations instead of more popular temporal synchrony mechanisms. Supporters of temporal synchrony argue that conjunctive representations lack both efficiency (i.e., combinatorial numbers of units are required) and systematicity (i.e., the resulting representations are overly specific and thus do not support generalization to novel exemplars). To counter these claims, we show that our model: a) uses far fewer hidden units than the number of conjunctions represented, by using coarse-coded, distributed representations where each unit has a broad tuning curve through high-dimensional conjunction space, and b) is capable of considerable generalization to novel inputs.
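The following toy construction (ours, not the paper's network) illustrates the efficiency claim: a small number of units with broad random tuning curves over a shape-by-location conjunction space still allows the individual conjunctions to be decoded.

```python
# Toy sketch of coarse-coded conjunctive units, each with a broad random
# "tuning curve" over shape x location conjunctions; far fewer units than conjunctions.
import numpy as np

rng = np.random.default_rng(3)
n_shapes, n_locations = 10, 10           # 100 possible conjunctions
n_units = 25                             # far fewer coarse-coded units

# Each unit responds to a broad random subset of conjunctions.
tuning = rng.random((n_units, n_shapes, n_locations)) < 0.3

def encode(shape, location):
    """Distributed code for the conjunction (shape at location)."""
    return tuning[:, shape, location].astype(float)

def decode(code):
    """Nearest-conjunction readout: which (shape, location) best matches the code."""
    overlaps = np.einsum("usl,u->sl", tuning.astype(float), code)
    return np.unravel_index(np.argmax(overlaps), (n_shapes, n_locations))

hits = sum(decode(encode(s, l)) == (s, l)
           for s in range(n_shapes) for l in range(n_locations))
print(f"correctly decoded {hits}/100 conjunctions with {n_units} units")
```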
Efficiency versus Convergence of Boolean Kernels for On-Line Learning Algorithms
Khardon, Roni, Roth, Dan, Servedio, Rocco A.
We study online learning in Boolean domains using kernels which capture feature expansions equivalent to using conjunctions over basic features. We demonstrate a tradeoff between the computational efficiency with which these kernels can be computed and the generalization ability of the resulting classifier. We first describe several kernel functions which capture either limited forms of conjunctions or all conjunctions. We show that these kernels can be used to efficiently run the Perceptron algorithm over an exponential number of conjunctions; however we also prove that using such kernels the Perceptron algorithm can make an exponential number of mistakes even when learning simple functions. We also consider an analogous use of kernel functions to run the multiplicative-update Winnow algorithm over an expanded feature space of exponentially many conjunctions. While known upper bounds imply that Winnow can learn DNF formulae with a polynomial mistake bound in this setting, we prove that it is computationally hard to simulate Winnow's behavior for learning DNF over such a feature set, and thus that such kernel functions for Winnow are not efficiently computable.
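As a sketch of the efficiency side of this tradeoff, the snippet below runs a kernel Perceptron with a Boolean kernel whose implicit feature expansion enumerates conjunctions of literals, evaluated in time linear in the number of basic features; the target function and data are invented for illustration.

```python
# Sketch of the kernel-Perceptron idea: a Boolean kernel over conjunctions of
# literals, computed in O(n) time per pair of examples.
import numpy as np

def conj_kernel(x, y):
    """K(x, y) = 2^{same(x, y)}: one implicit feature per conjunction of literals,
    since a conjunction satisfied by both x and y can only use bits on which they agree."""
    return 2.0 ** np.sum(x == y)

def kernel_perceptron(X, y, epochs=3):
    alphas = []                                   # stored (coefficient, example) pairs
    mistakes = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            f = sum(c * conj_kernel(xj, xi) for c, xj in alphas)
            if yi * f <= 0:                       # mistake: add the example to the expansion
                alphas.append((yi, xi))
                mistakes += 1
    return alphas, mistakes

rng = np.random.default_rng(4)
X = rng.integers(0, 2, (200, 10))
y = np.where(X[:, 0] & X[:, 1], 1, -1)            # invented target: the conjunction x1 AND x2
alphas, mistakes = kernel_perceptron(X, y)
print("mistakes during training:", mistakes)
```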
Discriminative Direction for Kernel Classifiers
In many scientific and engineering applications, detecting and understanding differences between two groups of examples can be reduced to a classical problem of training a classifier for labeling new examples while making as few mistakes as possible. In the traditional classification setting, the resulting classifier is rarely analyzed in terms of the properties of the input data captured by the discriminative model. However, such analysis is crucial if we want to understand and visualize the detected differences. We propose an approach to interpretation of the statistical model in the original feature space that allows us to argue about the model in terms of the relevant changes to the input vectors. For each point in the input space, we define a discriminative direction to be the direction that moves the point towards the other class while introducing as little irrelevant change as possible with respect to the classifier function. We derive the discriminative direction for kernel-based classifiers, demonstrate the technique on several examples and briefly discuss its use in the statistical shape analysis, an application that originally motivated this work.
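A hedged sketch of one natural instance of this idea: for an RBF-kernel SVM, the gradient of the decision function at a point gives the local direction that moves it fastest towards the other class; the data and kernel width below are arbitrary, and this is not claimed to reproduce the paper's exact derivation.

```python
# Sketch: gradient of an RBF-SVM decision function as a discriminative direction.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(-1, 1, (100, 2)), rng.normal(+1, 1, (100, 2))])
y = np.r_[np.full(100, -1), np.full(100, +1)]
gamma = 0.5
clf = SVC(kernel="rbf", gamma=gamma).fit(X, y)

def decision_gradient(x):
    """Gradient of f(x) = sum_i a_i y_i exp(-gamma ||x - x_i||^2) + b at x."""
    sv, coef = clf.support_vectors_, clf.dual_coef_[0]     # coef already includes the labels
    diff = x - sv                                           # (n_sv, 2)
    k = np.exp(-gamma * np.sum(diff ** 2, axis=1))          # kernel values
    return (coef * k) @ (-2.0 * gamma * diff)

x0 = np.array([0.0, 0.0])
g = decision_gradient(x0)
print("discriminative direction at x0 (towards the +1 class):", g / np.linalg.norm(g))
```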
Classifying Single Trial EEG: Towards Brain Computer Interfacing
Blankertz, Benjamin, Curio, Gabriel, Müller, Klaus-Robert
Driven by the progress in the field of single-trial analysis of EEG, there is a growing interest in brain computer interfaces (BCIs), i.e., systems that enable human subjects to control a computer only by means of their brain signals. In a pseudo-online simulation, our BCI detects upcoming finger movements in a natural keyboard typing condition and predicts their laterality. This can be done on average 100-230 ms before the respective key is actually pressed, i.e., long before the onset of EMG. Our approach is appealing for its short response time and high classification accuracy ( 96%) in a binary decision where no human training is involved. We compare discriminative classifiers like Support Vector Machines (SVMs) and different variants of Fisher Discriminant that possess favorable regularization properties for dealing with high-noise cases (inter-trial variability).
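As an illustration of the kind of regularized Fisher discriminant referred to above (not the authors' pipeline), the sketch below shrinks the pooled covariance towards a scaled identity before solving for the discriminant; the feature dimension and data are synthetic placeholders.

```python
# Minimal sketch: Fisher discriminant with a shrinkage-regularised covariance,
# the style of regularisation that helps in noisy, high-dimensional single-trial settings.
import numpy as np

def regularized_fisher(X_left, X_right, lam=0.1):
    """Returns weight vector w and bias b of a regularised Fisher discriminant."""
    mu_l, mu_r = X_left.mean(0), X_right.mean(0)
    S = np.cov(np.vstack([X_left - mu_l, X_right - mu_r]).T)    # pooled covariance
    S_reg = (1 - lam) * S + lam * np.trace(S) / S.shape[0] * np.eye(S.shape[0])
    w = np.linalg.solve(S_reg, mu_r - mu_l)
    b = -w @ (mu_l + mu_r) / 2.0
    return w, b

rng = np.random.default_rng(6)
d = 50                                              # feature dimension (e.g. channels x time bins)
X_left = rng.normal(0.0, 1.0, (80, d))              # synthetic "left" trials
X_right = rng.normal(0.3, 1.0, (80, d))             # synthetic "right" trials, shifted mean
w, b = regularized_fisher(X_left, X_right)
print("accuracy on the right-hand class:", ((X_right @ w + b) > 0).mean())
```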
The Infinite Hidden Markov Model
Beal, Matthew J., Ghahramani, Zoubin, Rasmussen, Carl E.
We show that it is possible to extend hidden Markov models to have a countably infinite number of hidden states. By using the theory of Dirichlet processes we can implicitly integrate out the infinitely many transition parameters, leaving only three hyperparameters which can be learned from data. These three hyperparameters define a hierarchical Dirichlet process capable of capturing a rich set of transition dynamics. The three hyperparameters control the time scale of the dynamics, the sparsity of the underlying state-transition matrix, and the expected number of distinct hidden states in a finite sequence. In this framework it is also natural to allow the alphabet of emitted symbols to be infinite: consider, for example, symbols being possible words appearing in English text.
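A rough generative sketch of this construction, under a simplified reading of the two-level Chinese-restaurant process and with arbitrary hyperparameter values (named alpha, beta, gamma to follow the text), samples a hidden state sequence whose number of distinct states is unbounded.

```python
# Simplified sketch of an infinite-HMM state-sequence prior: transitions back off
# to a shared "oracle", which can always create a new hidden state.
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(7)
alpha, beta, gamma = 2.0, 1.0, 2.0                 # self-transition bias, oracle weight, new-state rate
counts = defaultdict(lambda: defaultdict(float))   # counts[i][j]: observed transitions i -> j
oracle = defaultdict(float)                        # oracle[j]: times the oracle chose state j
n_states, state, sequence = 1, 0, [0]

for _ in range(200):
    row = dict(counts[state])
    row[state] = row.get(state, 0.0) + alpha       # extra mass on the self-transition
    row["oracle"] = beta                           # mass for backing off to the oracle
    keys = list(row)
    p = np.array([row[k] for k in keys]); p /= p.sum()
    nxt = keys[rng.choice(len(keys), p=p)]
    if nxt == "oracle":                            # back off to the shared oracle process
        okeys = list(range(n_states)) + ["new"]
        op = np.array([oracle[j] for j in range(n_states)] + [gamma]); op /= op.sum()
        nxt = okeys[rng.choice(len(okeys), p=op)]
        if nxt == "new":                           # instantiate a brand-new hidden state
            nxt = n_states
            n_states += 1
        oracle[nxt] += 1.0
    counts[state][nxt] += 1.0
    state = nxt
    sequence.append(state)

print("distinct hidden states used in a length-200 sequence:", n_states)
```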
Gaussian Process Regression with Mismatched Models
The behaviour is much richer than for the matched case, and could guide the choice of (student) priors in real-world applications of GP regression; RBF students, for example, run the risk of very slow logarithmic decay of the learning curve if the target (teacher) is less smooth than assumed. An important issue for future work, some of which is in progress, is to analyse to what extent hyperparameter tuning (e.g. via evidence maximization) can make GP learning robust against some forms of model mismatch, e.g. a misspecified functional form of the covariance function. One would like to know, for example, whether a data-dependent adjustment of the lengthscale of an RBF covariance function would be sufficient to avoid the logarithmically slow learning of rough target functions.
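As a small illustration of such a mismatch (our own toy setup, not the paper's analysis), the sketch below fits an RBF-covariance GP student to data drawn from a rougher Ornstein-Uhlenbeck teacher and evaluates the log evidence over a grid of lengthscales, a crude stand-in for evidence maximization.

```python
# Toy sketch: RBF "student" covariance, rough OU "teacher" data, log evidence vs lengthscale.
import numpy as np

rng = np.random.default_rng(8)
x = np.sort(rng.uniform(0, 1, 60))
sigma_n = 0.1                                                    # observation noise level
K_teacher = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.2)       # OU (rough) teacher covariance
y = rng.multivariate_normal(np.zeros_like(x), K_teacher) + sigma_n * rng.standard_normal(x.size)

def log_evidence(length_scale):
    K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / length_scale ** 2)   # RBF student covariance
    C = K + sigma_n ** 2 * np.eye(x.size)
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (y @ np.linalg.solve(C, y) + logdet + x.size * np.log(2 * np.pi))

for ell in [0.02, 0.05, 0.1, 0.2, 0.5]:
    print(f"lengthscale={ell:.2f}  log evidence={log_evidence(ell):.1f}")
```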