AITopics

We give results about the learnability and required complexity of logical formulae to solve classification problems. These results are obtained by linking propositional logic with kernel machines. In particular we show that decision trees and disjunctive normal forms (DNF) can be represented by the help of a special kernel, linking regularized risk to separation margin. Subsequently we derive a number of lower bounds on the required complexity of logic formulae using properties of algorithms for generation of linear estimators, such as perceptron and maximal perceptron learning.

algorithm, decision tree, perceptron, (14 more...)

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
Oceania > Australia > Australian Capital Territory > Canberra (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.73)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.61)

Khardon, Roni, Roth, Dan, Servedio, Rocco A.

Efficiency versus Convergence of Boolean Kernels for On-Line Learning Algorithms

We study online learning in Boolean domains using kernels which capture feature expansions equivalent to using conjunctions over basic features. We demonstrate a tradeoff between the computational efficiency with which these kernels can be computed and the generalization ability of the resulting classifier. We first describe several kernel functions which capture either limited forms of conjunctions or all conjunctions. We show that these kernels can be used to efficiently run the Perceptron algorithm over an exponential number of conjunctions; however we also prove that using such kernels the Perceptron algorithm can make an exponential number of mistakes even when learning simple functions. We also consider an analogous use of kernel functions to run the multiplicative-update Winnow algorithm over an expanded feature space of exponentially many conjunctions. While known upper bounds imply that Winnow can learn DNF formulae with a polynomial mistake bound in this setting, we prove that it is computationally hard to simulate Winnow's behavior for learning DNF over such a feature set, and thus that such kernel functions for Winnow are not efficiently computable.

algorithm, conjunction, perceptron algorithm, (15 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Medford (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Illinois > Champaign County > Urbana (0.04)

Genre: Instructional Material > Online (0.40)

Industry: Education > Educational Setting > Online (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Kernel Methods (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.72)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.70)

Amari, Shun-ichi, Park, Hyeyoung, Ozeki, Tomoko

Geometrical Singularities in the Neuromanifold of Multilayer Perceptrons

Singularities are ubiquitous in the parameter space of hierarchical models such as multilayer perceptrons. At singularities, the Fisher information matrix degenerates, and the Cramer-Rao paradigm does no more hold, implying that the classical model selection theory such as AIC and MDL cannot be applied. It is important to study the relation between the generalization error and the training error at singularities. The present paper demonstrates a method of analyzing these errors both for the maximum likelihood estimator and the Bayesian predictive distribution in terms of Gaussian random fields, by using simple models. 1 Introduction A neural network is specified by a number of parameters which are synaptic weights and biases. Learning takes place by modifying these parameters from observed input-output examples.

estimator, predictive distribution, singularity, (13 more...)

Country: Asia > Japan > Honshū > Kantō > Saitama Prefecture > Saitama (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.36)

Collins, Michael, Duffy, Nigel

Convolution Kernels for Natural Language

We describe the application of kernel methods to Natural Language Processing (NLP)problems. In many NLP tasks the objects being modeled are strings, trees, graphs or other discrete structures which require some mechanism to convert them into feature vectors. We describe kernels for various natural language structures, allowing rich, high dimensional representations ofthese structures. We show how a kernel over trees can be applied to parsing using the voted perceptron algorithm, and we give experimental results on the ATIS corpus of parse trees.

algorithm, convolution kernel, kernel, (12 more...)

Country:

North America > United States > California > San Francisco County > San Francisco (0.05)
North America > United States > New Jersey (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

Amari, Shun-ichi, Park, Hyeyoung, Ozeki, Tomoko

Geometrical Singularities in the Neuromanifold of Multilayer Perceptrons

Singularities are ubiquitous in the parameter space of hierarchical models such as multilayer perceptrons. At singularities, the Fisher information matrix degenerates, and the Cramer-Rao paradigm does no more hold, implying that the classical model selection theory suchas AIC and MDL cannot be applied. It is important to study the relation between the generalization error and the training error at singularities. The present paper demonstrates a method of analyzing these errors both for the maximum likelihood estimator andthe Bayesian predictive distribution in terms of Gaussian random fields, by using simple models. 1 Introduction A neural network is specified by a number of parameters which are synaptic weights and biases. Learning takes place by modifying these parameters from observed input-output examples.

artificial intelligence, machine learning, predictive distribution, (15 more...)

Country: Asia > Japan (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.36)

Warmuth, Manfred K., Rätsch, Gunnar, Mathieson, Michael, Liao, Jun, Lemmen, Christian

Active Learning in the Drug Discovery Process

We investigate the following data mining problem from Computational Chemistry: From a large data set of compounds, find those that bind to a target molecule in as few iterations of biological testing as possible. In each iteration a comparatively small batch of compounds is screened for binding to the target. We apply active learning techniques for selecting the successive batches. One selection strategy picks unlabeled examples closest to the maximum margin hyperplane. Another produces many weight vectors by running perceptrons over multiple permutations of the data.

artificial intelligence, fraction, machine learning, (18 more...)

Country: North America > United States > California > San Francisco County > San Francisco (0.14)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.30)

Kowalczyk, Adam, Smola, Alex J., Williamson, Robert C.

Kernel Machines and Boolean Functions

We give results about the learnability and required complexity of logical formulae to solve classification problems. These results are obtained by linking propositional logic with kernel machines. In particular we show that decision trees and disjunctive normal forms (DNF) can be represented bythe help of a special kernel, linking regularized risk to separation margin. Subsequently we derive a number of lower bounds on the required complexity of logic formulae using properties of algorithms for generation of linear estimators, such as perceptron and maximal perceptron learning.

artificial intelligence, logic & formal reasoning, machine learning, (17 more...)

Country: North America > United States > Massachusetts > Middlesex County (0.15)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.73)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.62)

Khardon, Roni, Roth, Dan, Servedio, Rocco A.

Efficiency versus Convergence of Boolean Kernels for On-Line Learning Algorithms

We study online learning in Boolean domains using kernels which capture featureexpansions equivalent to using conjunctions over basic features. Wedemonstrate a tradeoff between the computational efficiency with which these kernels can be computed and the generalization ability ofthe resulting classifier. We first describe several kernel functions which capture either limited forms of conjunctions or all conjunctions. We show that these kernels can be used to efficiently run the Perceptron algorithmover an exponential number of conjunctions; however we also prove that using such kernels the Perceptron algorithm can make an exponential number of mistakes even when learning simple functions. Wealso consider an analogous use of kernel functions to run the multiplicative-update Winnow algorithm over an expanded feature space of exponentially many conjunctions. While known upper bounds imply that Winnow can learn DNF formulae with a polynomial mistake bound in this setting, we prove that it is computationally hard to simulate Winnow's behaviorfor learning DNF over such a feature set, and thus that such kernel functions for Winnow are not efficiently computable.

artificial intelligence, conjunction, machine learning, (18 more...)

Country: North America > United States > Illinois (0.28)

Genre: Instructional Material > Online (0.40)

Industry: Education > Educational Setting > Online (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Kernel Methods (0.75)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.52)

Neural Information Processing SystemsDec-31-2001

Regularized Winnow Methods

Zhang, Tong

In theory, the Winnow multiplicative update has certain advantages over the Perceptron additive update when there are many irrelevant attributes. Recently, there has been much effort on enhancing the Perceptron algorithm by using regularization, leading to a class of linear classification methods called support vector machines. Similarly, it is also possible to apply the regularization idea to the Winnow algorithm, which gives methods we call regularized Winnows. We show that the resulting methods compare with the basic Winnows in a similar way that a support vector machine compares with the Perceptron. We investigate algorithmic issues and learning properties of the derived methods. Some experimental results will also be provided to illustrate different methods. 1 Introduction In this paper, we consider the binary classification problem that is to determine a label y E {-1, 1} associated with an input vector x. A useful method for solving this problem is through linear discriminant functions, which consist of linear combinations of the components of the input variable.

algorithm, hyperplane, winnow, (17 more...)