AITopics

Classification is achieved by a linear or nonlinear separating surface in the input space of the dataset. In this work we propose a very fast, simple algorithm based on an active set strategy for solving quadratic programs with bounds [18]. The algorithm is capable of accurately solving problems with millions of points and requires nothing more complicated than a commonly available linear equation solver [17, 1, 6] for a typically small (100) dimensional input space of the problem. Key to our approach are the following two changes to the standard linear SVM: 1. Maximize the margin (distance) between the parallel separating planes with respect to both orientation (w) and location relative to the origin (b).
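Penalizing the offset is what leaves a dual with nothing but bounds. The following is a minimal sketch that assumes the squared-slack formulation min ν/2‖y‖² + ½(w'w + b²) subject to D(Aw − eb) + y ≥ e. Here A holds the points as rows, D is the diagonal matrix of ±1 labels, and e is a vector of ones. These symbols are our assumptions and do not come from the excerpt. Under this formulation the dual becomes min over u ≥ 0 of ½u'Qu − e'u, with Q = I/ν + HH' and H = D[A, −e]. From its solution, w = A'Du and b = −e'Du. The bounded QP is solved here with a generic Lawson-Hanson-style active-set loop. This is not necessarily the active set strategy the paper uses, and all function names are hypothetical.

```python
import numpy as np


def nonneg_qp_active_set(Q, c, tol=1e-10, max_iter=1000):
    """Minimize 0.5*u'Qu - c'u subject to u >= 0 (Q symmetric positive definite).

    Lawson-Hanson-style primal active set: indices marked `free` are solved
    for exactly; all other indices are held at the bound u_i = 0.
    """
    n = len(c)
    u = np.zeros(n)
    free = np.zeros(n, dtype=bool)
    for _ in range(max_iter):
        neg_grad = c - Q @ u
        violators = np.where(~free & (neg_grad > tol))[0]
        if violators.size == 0:            # KKT conditions hold: optimal
            break
        free[violators[np.argmax(neg_grad[violators])]] = True
        while True:
            F = np.where(free)[0]
            z = np.zeros(n)
            z[F] = np.linalg.solve(Q[np.ix_(F, F)], c[F])
            if np.all(z[F] > tol):         # unconstrained step is feasible
                u = z
                break
            # Move from u toward z until the first free variable reaches 0,
            # then return every variable sitting at 0 to the bound set.
            hit = F[z[F] <= tol]
            alpha = min(1.0, np.min(u[hit] / np.maximum(u[hit] - z[hit], 1e-300)))
            u = u + alpha * (z - u)
            free &= u > tol
            u[~free] = 0.0
    return u


def train_bounded_dual_svm(A, d, nu=1.0):
    """Linear SVM with the offset b penalized, solved through its bounded dual."""
    m = A.shape[0]
    H = d[:, None] * np.hstack([A, -np.ones((m, 1))])   # H = D [A, -e]
    Q = np.eye(m) / nu + H @ H.T
    u = nonneg_qp_active_set(Q, np.ones(m))
    w = A.T @ (d * u)           # w = A'Du
    b = -np.sum(d * u)          # b = -e'Du
    return w, b, u


def predict(X, w, b):
    """Classify rows of X by the side of the plane x'w = b they fall on."""
    return np.sign(X @ w - b)
```

This sketch builds the dense m × m matrix Q, so it is only practical for small m. The abstract's claim that millions of points can be handled with a small linear solve suggests exploiting the low-rank structure of HH' (for example, via the Sherman-Morrison-Woodbury identity). The sketch does not attempt this.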

algorithm, matrix, support vector machine

Country:

North America > United States > Wisconsin > Dane County > Madison (0.28)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

Lu, Wei, Rajapakse, Jagath C.

Constrained Independent Component Analysis

The paper presents a novel technique of constrained independent component analysis (CICA), which introduces constraints into classical ICA and solves the resulting constrained optimization problem with Lagrange multiplier methods. This paper shows that CICA can be used to order the resulting independent components in a specific manner and to normalize the demixing matrix in the signal separation procedure. It systematically eliminates ICA's indeterminacy with respect to permutation and dilation. The experiments demonstrate the use of CICA for ordering independent components while providing normalized demixing processes.

Keywords: Independent component analysis, constrained independent component analysis, constrained optimization, Lagrange multiplier methods

1 Introduction

Independent component analysis (ICA) is a technique for transforming a multivariate random signal into a signal whose components are mutually independent in the complete statistical sense [1].
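The permutation and dilation indeterminacy can be stated concretely. If W separates the sources, then so does PΛW for any permutation matrix P and any nonsingular diagonal matrix Λ. The sketch below is not CICA, which builds such requirements into the optimization through Lagrange multipliers. It is a simpler post hoc normalization that makes several assumptions: the data are zero-mean, each output should have unit variance, each row's largest-magnitude entry should be positive, and rows are ordered by decreasing excess kurtosis. The kurtosis ordering is chosen purely for illustration. The function name canonicalize_demixing is hypothetical.

```python
import numpy as np


def canonicalize_demixing(W, X):
    """Remove ICA's scale, sign and permutation freedom from W (post hoc).

    W: k x n demixing matrix; X: n x T zero-mean observations.
    """
    Y = W @ X
    W = W / Y.std(axis=1, keepdims=True)              # unit output variance
    rows = np.arange(len(W))
    signs = np.sign(W[rows, np.abs(W).argmax(axis=1)])
    W = W * signs[:, None]                             # largest entry positive
    Y = W @ X
    kurt = np.mean(Y**4, axis=1) / np.mean(Y**2, axis=1) ** 2 - 3.0
    return W[np.argsort(-kurt)]                        # decreasing kurtosis
```

Every step depends only on the outputs, so applying the function to W and to PΛW yields the same matrix up to rounding. That equivalence is exactly the indeterminacy being removed.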

constraint, independent component, lagrange multiplier method

Country:

North America > United States > New York (0.04)
Asia > Singapore (0.04)

Genre: Research Report (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.77)

Lodhi, Huma, Shawe-Taylor, John, Cristianini, Nello, Watkins, Christopher J. C. H.

Text Classification using String Kernels

We introduce a novel kernel for comparing two text documents. The kernel is an inner product in the feature space consisting of all subsequences of length k. A subsequence is any ordered sequence of k characters occurring in the text, though not necessarily contiguously. The subsequences are weighted by an exponentially decaying factor of their full length in the text, hence emphasising those occurrences that are close to contiguous. A direct computation of this feature vector would involve a prohibitive amount of computation even for modest values of k, since the dimension of the feature space grows exponentially with k. The paper describes how, despite this fact, the inner product can be evaluated efficiently by a dynamic programming technique.
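A small case shows the weighting at work. With k = 2, "cat" and "car" share only the subsequence "ca", which spans 2 characters in each word, so K = λ²·λ² = λ⁴. The sketch below assumes the recursion as we recall it from the string-kernel literature. It starts from K'_0 ≡ 1. The auxiliary kernel is K'_i(sx, t) = λK'_i(s, t) + Σ_{j: t_j = x} K'_{i−1}(s, t[1:j−1]) λ^{|t|−j+2}. The kernel itself is K_k(sx, t) = K_k(s, t) + Σ_{j: t_j = x} K'_{k−1}(s, t[1:j−1]) λ². This direct version costs O(k|s||t|²); a faster variant with an extra auxiliary table is possible. A brute-force enumeration of the explicit features is included to cross-check the result on short strings. Function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def ssk(s, t, k, lam):
    """Subsequence string kernel K_k(s, t) with decay lam, by dynamic programming."""
    m, n = len(s), len(t)
    # Kp[i][a][b] holds K'_i(s[:a], t[:b]); K'_0 is identically 1.
    Kp = [[[0.0] * (n + 1) for _ in range(m + 1)] for _ in range(k)]
    for a in range(m + 1):
        for b in range(n + 1):
            Kp[0][a][b] = 1.0
    for i in range(1, k):
        for a in range(i, m + 1):
            x = s[a - 1]
            for b in range(i, n + 1):
                total = lam * Kp[i][a - 1][b]
                for j in range(1, b + 1):
                    if t[j - 1] == x:
                        total += Kp[i - 1][a - 1][j - 1] * lam ** (b - j + 2)
                Kp[i][a][b] = total
    # K_k(s, t) adds one term per matching character pair (s[a-1], t[j-1]).
    K = 0.0
    for a in range(1, m + 1):
        for j in range(1, n + 1):
            if s[a - 1] == t[j - 1]:
                K += Kp[k - 1][a - 1][j - 1] * lam ** 2
    return K


def ssk_brute(s, t, k, lam):
    """Explicit feature map: each length-k subsequence gets weight lam**span."""
    def features(text):
        phi = defaultdict(float)
        for idx in combinations(range(len(text)), k):
            phi["".join(text[i] for i in idx)] += lam ** (idx[-1] - idx[0] + 1)
        return phi
    fs, ft = features(s), features(t)
    return sum(v * ft[u] for u, v in fs.items() if u in ft)
```

In the same way, K("cat", "cat") = 2λ⁴ + λ⁶: the contiguous pairs "ca" and "at" each contribute λ⁴, and the gapped pair "ct" contributes λ⁶.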

computation, feature space, kernel

Country:

North America > United States > California (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Industry: Health & Medicine (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Kjems, Ulrik, Hansen, Lars Kai, Strother, Stephen C.

Generalizable Singular Value Decomposition for Ill-posed Datasets

So which of the two variances is "correct"? From a modelling point of view, the variance from the test example tells us the true story, so the training set variance should be regarded as biased.
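The bias is easy to reproduce in an ill-posed setting with far fewer examples than dimensions. The toy sketch below assumes a hypothetical Gaussian model: one signal direction with total variance 4, embedded in otherwise unit isotropic noise. It compares two quantities along the leading principal direction. The first is the variance of the training projections onto that direction. The second is the variance of fresh test data projected onto the same direction, after centering with the training mean. This only illustrates the phenomenon; it is not the paper's generalizable SVD procedure, and the function name is hypothetical.

```python
import numpy as np


def train_vs_test_variance(n_train=20, n_test=2000, dim=500, signal_var=4.0, seed=0):
    """In-sample and out-of-sample variance along the leading training PC."""
    rng = np.random.default_rng(seed)
    direction = np.zeros(dim)
    direction[0] = 1.0                         # hypothetical signal direction

    def sample(n):
        noise = rng.standard_normal((n, dim))                   # unit isotropic noise
        signal = rng.standard_normal((n, 1)) * np.sqrt(signal_var - 1.0)
        return noise + signal * direction      # total variance signal_var along e_1

    X_train, X_test = sample(n_train), sample(n_test)
    mean = X_train.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
    v = Vt[0]                                  # leading right singular vector
    train_var = np.mean(((X_train - mean) @ v) ** 2)
    test_var = np.mean(((X_test - mean) @ v) ** 2)
    return train_var, test_var
```

With 20 examples in 500 dimensions, the leading singular vector partly fits the training noise. The in-sample variance therefore typically comes out far above the out-of-sample one.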

projection, singular value decomposition, variance

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.04)
Europe > Germany (0.04)
Europe > Denmark > Capital Region > Kongens Lyngby (0.04)
Asia > Middle East > Jordan (0.04)

Industry: Health & Medicine (0.96)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.31)

Højen-Sørensen, Pedro A. d. F. R., Winther, Ole, Hansen, Lars Kai

Ensemble Learning and Linear Response Theory for ICA

The naive mean-field approach fails in this case, whereas linear response theory, which gives an improved estimate of covariances, is very efficient. The examples given are for sources without temporal correlations.
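The contrast between naive mean field and linear response is exact in a toy Gaussian model, which we use here as an assumption in place of the paper's ICA posterior. Take p(x) ∝ exp(−½x'Ax + b'x). The factorized mean-field solution recovers the correct means A⁻¹b, but it reports variances 1/A_ii, which understate the true marginal variances. Differentiating the mean-field means with respect to b gives the linear response χ_ij = ∂m_i/∂b_j, and this recovers the full covariance A⁻¹. Function names and the finite-difference step are hypothetical choices.

```python
import numpy as np


def mean_field_means(A, b, tol=1e-13, max_sweeps=10_000):
    """Naive mean-field fixed point for p(x) proportional to exp(-x'Ax/2 + b'x).

    Each factor q_i is Gaussian with variance 1/A_ii and mean
    m_i = (b_i - sum_{j != i} A_ij m_j) / A_ii, updated coordinate-wise.
    """
    m = np.zeros(len(b))
    for _ in range(max_sweeps):
        m_old = m.copy()
        for i in range(len(b)):
            m[i] = (b[i] - A[i] @ m + A[i, i] * m[i]) / A[i, i]
        if np.max(np.abs(m - m_old)) < tol:
            break
    return m


def linear_response_covariance(A, b, eps=1e-4):
    """Covariance estimate chi_ij = d m_i / d b_j by central finite differences."""
    n = len(b)
    chi = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = eps
        chi[:, j] = (mean_field_means(A, b + e) - mean_field_means(A, b - e)) / (2 * eps)
    return chi
```

Because the Gaussian mean-field means depend linearly on b, the finite difference is exact up to the fixed-point tolerance. In the paper's non-Gaussian ICA setting, the same derivative gives an approximation rather than an identity.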

equation, noise level, temporal correlation

Country:

North America > United States > Massachusetts > Middlesex County > Reading (0.04)
Europe > Sweden > Skåne County > Lund (0.04)
Europe > Denmark > Capital Region > Kongens Lyngby (0.04)
Asia > Middle East > Jordan (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.49)

Hochreiter, Sepp, Mozer, Michael C.

Beyond Maximum Likelihood and Density Estimation: A Sample-Based Criterion for Unsupervised Learning of Complex Models

Two well-known classes of unsupervised procedures that can be cast in this manner are generative and recoding models. In a generative unsupervised framework, the environment generates training examples (which we will refer to as observations) by sampling from one distribution; the other distribution is embodied in the model. Examples of generative frameworks are mixtures of Gaussians (MoG) [2], factor analysis [4], and Boltzmann machines [8]. In the recoding unsupervised framework, the model transforms points from an observation space to an output space, and the output distribution is compared either to a reference distribution or to a distribution derived from the output distribution. An example is independent component analysis (ICA) [11], a method that discovers a representation of vector-valued observations in which the statistical dependence among the vector elements in the output space is minimized.
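For ICA, a natural "distribution derived from the output distribution" is the product of the output marginals. It can be sampled by shuffling each output coordinate independently. The sketch below compares the two sample sets with a biased estimate of the maximum mean discrepancy (MMD) under a Gaussian kernel. MMD is a swapped-in sample-based criterion chosen for illustration, not the criterion proposed in the paper. The bandwidth of 1 is an arbitrary assumption, and the function names are hypothetical.

```python
import numpy as np


def gaussian_mmd2(X, Y, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD with a Gaussian kernel."""
    def k(P, R):
        sq = np.sum(P**2, axis=1)[:, None] + np.sum(R**2, axis=1)[None, :] - 2 * P @ R.T
        return np.exp(-sq / (2 * bandwidth**2))
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()


def dependence_score(Y, rng, bandwidth=1.0):
    """Compare joint output samples with samples from the product of marginals."""
    Y_shuffled = np.column_stack([rng.permutation(col) for col in Y.T])
    return gaussian_mmd2(Y, Y_shuffled, bandwidth)
```

A recoding model of the ICA type would seek a transformation that drives such a score toward zero. Outputs with strongly dependent coordinates score noticeably higher than outputs with independent ones.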

nonlinear model, particle, sample-based approach

Country:

North America > United States > Colorado > Boulder County > Boulder (0.14)
North America > Canada > Ontario > Toronto (0.14)
Europe > France (0.05)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.41)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.41)

Herbrich, Ralf, Graepel, Thore

Large Scale Bayes Point Machines

Subsequently, SVMs have been modified to handle regression [12], and GPs have been adapted to the problem of classification [8]. Both schemes essentially work in the same function space, which is characterised by kernels (SVM) and covariance functions (GP), respectively. While the formal similarity of the two methods is striking, the underlying paradigms of inference are very different. The SVM was inspired by results from statistical/PAC learning theory, while GPs are usually considered in a Bayesian framework. This ideological clash can be viewed as a continuation in machine learning of the by now classical disagreement between Bayesian and frequentist statistics.

algorithm, classifier, generalisation error

Country:

North America > Canada > Ontario > Toronto (0.14)
Oceania > Australia > Australian Capital Territory > Canberra (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.35)

Gray, Alexander G., Moore, Andrew W.

'N-Body' Problems in Statistical Learning

We present efficient algorithms for all-point-pairs problems, or 'N-body'-like problems, which are ubiquitous in statistical learning. We focus on six examples, including nearest-neighbor classification, kernel density estimation, outlier detection, and the two-point correlation.
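The two-point correlation, for instance, needs the number of pairs closer than a radius r, which brute force computes with O(N²) distance evaluations. The sketch below contrasts brute force with a uniform grid of cells of side r. Any pair within distance r must lie in the same or adjacent cells, so each point is compared only with points in its own and neighbouring cells. The grid is a simple stand-in chosen for illustration, not the tree-based algorithms the paper develops, and the function names are hypothetical.

```python
import numpy as np
from collections import defaultdict
from itertools import product


def pair_count_brute(X, r):
    """Number of pairs i < j with ||x_i - x_j|| <= r, by O(N^2) enumeration."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt(np.sum(diff**2, axis=-1))
    i, j = np.triu_indices(len(X), k=1)
    return int(np.sum(dist[i, j] <= r))


def pair_count_grid(X, r):
    """Same count, comparing only points in the same or adjacent cells of side r."""
    cells = defaultdict(list)
    for idx, x in enumerate(X):
        cells[tuple(np.floor(x / r).astype(int))].append(idx)
    offsets = list(product((-1, 0, 1), repeat=X.shape[1]))
    count = 0
    for cell, members in cells.items():
        for off in offsets:
            neighbour = tuple(c + o for c, o in zip(cell, off))
            for i in members:
                for j in cells.get(neighbour, ()):
                    # i < j counts each unordered pair exactly once
                    if i < j and np.linalg.norm(X[i] - X[j]) <= r:
                        count += 1
    return count
```

The grid works well for small radii and roughly uniform data. Adaptive space-partitioning trees avoid its fixed cell size when the point density varies.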

algorithm, correlation, node

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.05)
North America > United States > California > San Francisco County > San Francisco (0.05)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Gentile, Claudio

A New Approximate Maximal Margin Classification Algorithm

A new incremental learning algorithm is described which approximates the maximal margin hyperplane w.r.t.

algorithm, almap, correction

Country:

North America > United States > District of Columbia > Washington (0.04)
Europe > Italy > Lombardy > Milan (0.04)
Asia > Singapore (0.04)

Genre: Research Report > New Finding (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.54)

Frey, Brendan J., Kannan, Anitha

Accumulator Networks: Suitors of Local Probability Propagation

One way to approximate inference in richly-connected graphical models is to apply the sum-product algorithm (a.k.a. The sum-product algorithm can be directly applied in Gaussian networks and in graphs for coding, but for many conditional probability functions, including the sigmoid function, direct application of the sum-product algorithm is not possible. We introduce "accumulator networks" that have low local complexity (but exponential global complexity) so the sum-product algorithm can be directly applied. In an accumulator network, the probability of a child given its parents is computed by accumulating the inputs from the parents in a Markov chain or, more generally, a tree. After giving expressions for inference and learning in accumulator networks, we give results on the "bars problem" and on the problem of extracting translated, overlapping faces from an image.

1 Introduction

Graphical probability models with hidden variables are capable of representing complex dependencies between variables, filling in missing data and making Bayes-optimal decisions using probabilistic inferences (Hinton and Sejnowski 1986; Pearl 1988; Neal 1992).
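The chain construction can be illustrated with a noisy-OR child. We choose noisy-OR here as an assumption because it accumulates exactly; the paper's treatment of the sigmoid and other conditional probability functions is not reproduced. Binary accumulators z_0, ..., z_n are linked in a Markov chain. Each local factor involves only z_{k−1}, the parent x_k, and z_k, so a single forward sum-product pass costs O(n) and reproduces the closed-form conditional probability. Names and parameter values are hypothetical.

```python
def noisy_or_direct(x, p, leak):
    """P(child = 1 | parents x) for a noisy-OR with link probabilities p."""
    prob_off = 1.0 - leak
    for xk, pk in zip(x, p):
        prob_off *= (1.0 - pk) ** xk
    return 1.0 - prob_off


def noisy_or_accumulator(x, p, leak):
    """Same quantity via a Markov chain of binary accumulators z_0, ..., z_n.

    z_0 ~ Bernoulli(leak); z_k = 1 if z_{k-1} = 1, otherwise z_k = 1 with
    probability p_k * x_k. Each step is one local sum-product message.
    """
    on = leak                          # message: P(z_k = 1)
    for xk, pk in zip(x, p):
        on = on + (1.0 - on) * pk * xk
    return on
```

Because every factor in the chain has a fixed fan-in of three variables, the local cost does not grow with the number of parents. This is the property that lets the sum-product algorithm be applied directly.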

accumulator network, inference, probability

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > Middle East > Jordan (0.05)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.91)