AITopics

We address in this paper the question of how the knowledge of the marginal distribution P(x) can be incorporated in a learning algorithm. We suggest three theoretical methods for taking into account this distribution for regularization and provide links to existing graph-based semi-supervised learning algorithms. We also propose practical implementations.

artificial intelligence, machine learning, regularizer, (17 more...)

Country: Europe > Germany (0.28)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Malzahn, Dörthe, Opper, Manfred

Approximate Analytical Bootstrap Averages for Support Vector Classifiers

We compute approximate analytical bootstrap averages for support vector classification using a combination of the replica method of statistical physics and the TAP approach for approximate inference. We test our method on a few datasets and compare it with exact averages obtained by extensive Monte-Carlo sampling.
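The Monte-Carlo baseline mentioned above can be illustrated with a minimal sketch. It is not the authors' analytical method: it simply resamples the training set with replacement, retrains, and averages the prediction at a new point. The 1-D nearest-centroid classifier is a hypothetical stand-in for a support vector classifier, and the function names are assumptions made for this example.

```python
import random

def train_centroid(sample):
    """Fit a 1-D nearest-centroid classifier (stand-in for an SVM, an assumption)."""
    pos = [x for x, y in sample if y == 1]
    neg = [x for x, y in sample if y == -1]
    if not pos or not neg:
        # Degenerate resample: only one class was drawn, so always predict it.
        only = 1 if pos else -1
        return lambda x: only
    mu_pos = sum(pos) / len(pos)
    mu_neg = sum(neg) / len(neg)
    return lambda x: 1 if abs(x - mu_pos) <= abs(x - mu_neg) else -1

def bootstrap_average_prediction(data, x_new, n_resamples=500, seed=0):
    """Average the predicted label at x_new over bootstrap resamples of data."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_resamples):
        # Draw len(data) training points with replacement.
        resample = [rng.choice(data) for _ in data]
        total += train_centroid(resample)(x_new)
    return total / n_resamples

data = [(-2.0, -1), (-1.5, -1), (-1.0, -1), (1.0, 1), (1.5, 1), (2.0, 1)]
print(bootstrap_average_prediction(data, 3.0))
```

The result lies in [-1, 1]; values near 1 mean that almost every resampled classifier labels the new point positive.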

artificial intelligence, machine learning, support vector classifier, (14 more...)

Country:

North America > United States (0.14)
Europe > Germany (0.14)
Europe > Denmark (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.72)

Still, Susanne, Bialek, William, Bottou, Léon

Geometric Clustering Using the Information Bottleneck Method

We argue that K-means and deterministic annealing algorithms for geometric clustering can be derived from the more general Information Bottleneck approach. If we cluster the identities of data points to preserve information about their location, the set of optimal solutions is massively degenerate. But if we treat the equations that define the optimal solution as an iterative algorithm, then a set of "smooth" initial conditions selects solutions with the desired geometrical properties. In addition to conceptual unification, we argue that this approach can be more efficient and robust than classic algorithms.
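For orientation, here is a minimal sketch of the soft-assignment iteration that underlies deterministic annealing at a fixed inverse temperature. It is not the Information Bottleneck derivation from the paper. The function name, the 1-D setting, and the parameter `beta` are assumptions for illustration. As `beta` grows, the assignments become hard and the update reduces to K-means.

```python
import math

def soft_kmeans_1d(points, centers, beta=5.0, n_iter=50):
    """Alternate soft assignments and weighted-mean updates for 1-D points."""
    centers = list(centers)
    for _ in range(n_iter):
        resp = []
        for x in points:
            d = [(x - c) ** 2 for c in centers]
            d_min = min(d)
            # Subtract the smallest distance before exponentiating for stability.
            w = [math.exp(-beta * (di - d_min)) for di in d]
            z = sum(w)
            resp.append([wi / z for wi in w])
        for k in range(len(centers)):
            mass = sum(r[k] for r in resp)
            if mass > 0:
                # New center is the responsibility-weighted mean of the points.
                centers[k] = sum(r[k] * x for r, x in zip(resp, points)) / mass
    return centers

points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
print(soft_kmeans_1d(points, [1.0, 4.0]))
```

On this toy data the two centers settle near the means of the two well-separated groups.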

algorithm, artificial intelligence, machine learning, (17 more...)

Country: North America > United States (0.29)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)

Opper, Manfred, Winther, Ole

Variational Linear Response

A general linear response method for deriving improved estimates of correlations in the variational Bayes framework is presented.

approximation, artificial intelligence, machine learning, (16 more...)

Country: Europe > United Kingdom (0.28)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Donoho, David, Stodden, Victoria

When Does Non-Negative Matrix Factorization Give a Correct Decomposition into Parts?

We interpret nonnegative matrix factorization geometrically, as the problem of finding a simplicial cone which contains a cloud of data points and which is contained in the positive orthant. We show that under certain conditions, basically requiring that some of the data are spread across the faces of the positive orthant, there is a unique such simplicial cone. We give examples of synthetic image articulation databases which obey these conditions; these require separated support and factorial sampling. For such databases there is a generative model in terms of 'parts' and NMF correctly identifies the 'parts'. We show that our theoretical results are predictive of the performance of published NMF code, by running the published algorithms on one of our synthetic image articulation databases.
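As a concrete reference point, the sketch below factors a small nonnegative matrix V into W and H using Lee and Seung's multiplicative updates for the squared-error objective. This is one common NMF algorithm and is shown only as an illustration; it is not necessarily the published code the authors ran. The helper names and the toy matrix are assumptions.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def frob_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, rank, n_iter=200, seed=0, eps=1e-9):
    """Multiplicative-update NMF; W and H stay nonnegative by construction."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), elementwise.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), elementwise.
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h

v = [[1.0, 0.0, 1.0, 2.0], [0.0, 1.0, 1.0, 1.0], [2.0, 1.0, 3.0, 5.0]]
w, h = nmf(v, rank=2)
print(frob_error(v, w, h))
```

The columns of W span a simplicial cone in the positive orthant that approximately contains the columns of V, which is the geometric picture the abstract describes. Whether that cone is unique depends on the conditions the paper establishes, not on this sketch.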

artificial intelligence, machine learning, simplicial cone, (15 more...)

Country: North America > United States > California > Santa Clara County (0.15)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.72)

Vovk, Vladimir, Shafer, Glenn, Nouretdinov, Ilia

Self-calibrating Probability Forecasting

In the problem of probability forecasting the learner's goal is to output, given a training set and a new object, a suitable probability measure on the possible values of the new object's label. An online algorithm for probability forecasting is said to be well-calibrated if the probabilities it outputs agree with the observed frequencies. We give a natural nonasymptotic formalization of the notion of well-calibratedness, which we then study under the assumption of randomness (the object/label pairs are independent and identically distributed). It turns out that, although no probability forecasting algorithm is automatically well-calibrated in our sense, there exists a wide class of algorithms for "multiprobability forecasting" (such algorithms are allowed to output a set, ideally very narrow, of probability measures) which satisfy this property; we call the algorithms in this class "Venn probability machines". Our experimental results demonstrate that a 1-Nearest Neighbor Venn probability machine performs reasonably well on a standard benchmark data set, and one of our theoretical results asserts that a simple Venn probability machine asymptotically approaches the true conditional probabilities regardless, and without knowledge, of the true probability measure generating the examples.
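The informal notion of calibration used above (forecast probabilities agreeing with observed frequencies) can be checked empirically with a simple binned table. This sketch is only an illustration of that informal idea; it is not the paper's nonasymptotic formalization and not a Venn probability machine. The function name and the equal-width binning are assumptions.

```python
def calibration_table(forecasts, outcomes, n_bins=5):
    """Group binary forecasts into equal-width bins and compare, per bin,
    the mean forecast probability with the observed frequency of label 1."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(forecasts, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    table = []
    for b in bins:
        if b:
            mean_p = sum(p for p, _ in b) / len(b)
            freq = sum(y for _, y in b) / len(b)
            table.append((mean_p, freq, len(b)))
    return table

forecasts = [0.1] * 10 + [0.9] * 10
outcomes = [1] + [0] * 9 + [1] * 9 + [0]
for mean_p, freq, count in calibration_table(forecasts, outcomes):
    print(round(mean_p, 3), round(freq, 3), count)
```

A forecaster is well calibrated in this informal sense when the first two columns are close in every populated bin.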

artificial intelligence, bayesian inference, machine learning, (18 more...)

Country: North America > United States > California (0.68)

Genre: Research Report (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.35)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.35)

Audibert, Jean-yves, Bousquet, Olivier

PAC-Bayesian Generic Chaining

There exist many different generalization error bounds for classification. Each of these bounds contains an improvement over the others for certain situations. Our goal is to combine these different improvements into a single bound. In particular we combine the PAC-Bayes approach introduced by McAllester [1], which is interesting for averaging classifiers, with the optimal union bound provided by the generic chaining technique developed by Fernique and Talagrand [2]. This combination is quite natural since the generic chaining is based on the notion of majorizing measures, which can be considered as priors on the set of classifiers, and such priors also arise in the PAC-Bayesian setting.

artificial intelligence, machine learning, union, (17 more...)

Country: Europe > Germany (0.28)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Scott, Clayton, Nowak, Robert

Near-Minimax Optimal Classification with Dyadic Classification Trees

The classifiers are based on dyadic classification trees (DCTs), which involve adaptively pruned partitions of the feature space. A key aspect of DCTs is their spatial adaptivity, which enables local (rather than global) fitting of the decision boundary. Our risk analysis involves a spatial decomposition of the usual concentration inequalities, leading to a spatially adaptive, data-dependent pruning criterion. For any distribution on (X, Y) whose Bayes decision boundary behaves locally like a Lipschitz smooth function, we show that the DCT error converges to the Bayes error at a rate within a logarithmic factor of the minimax optimal rate.

artificial intelligence, decision boundary, machine learning, (17 more...)

Country: North America > United States > Wisconsin (0.28)

Industry: Government (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Monteleoni, Claire, Jaakkola, Tommi S.

Online Learning of Non-stationary Sequences

We consider an online learning scenario in which the learner can make predictions on the basis of a fixed set of experts. We derive upper and lower relative loss bounds for a class of universal learning algorithms involving a switching dynamics over the choice of the experts. On the basis of the performance bounds we provide the optimal a priori discretization for learning the parameter that governs the switching dynamics. We demonstrate the new algorithm in the context of wireless networks.
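To make the idea of switching dynamics over experts concrete, the sketch below implements a Fixed-Share-style weight update in the manner of Herbster and Warmuth, with a fixed switching rate. It is an assumption that this is a representative member of the class studied here; the sketch does not learn the switching parameter, which is the paper's contribution. The names `alpha` (switching rate) and `eta` (learning rate) are illustrative.

```python
import math

def fixed_share(expert_losses, alpha=0.1, eta=1.0):
    """Track expert weights under a Fixed-Share-style update.

    expert_losses: list of rounds, each a list with one loss per expert.
    Returns the weights used at each round and the final weights.
    """
    n = len(expert_losses[0])
    if n < 2:
        raise ValueError("need at least two experts to switch between")
    weights = [1.0 / n] * n
    history = []
    for losses in expert_losses:
        history.append(list(weights))
        # Loss update: exponentially down-weight experts that did badly.
        scaled = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
        total = sum(scaled)
        posterior = [s / total for s in scaled]
        # Switching step: keep mass with probability 1 - alpha, otherwise
        # spread it uniformly over the other n - 1 experts.
        weights = [(1 - alpha) * p + alpha * (1 - p) / (n - 1) for p in posterior]
    return history, weights

# Expert 0 is best for 20 rounds, then expert 1 takes over.
losses = [[0.0, 1.0]] * 20 + [[1.0, 0.0]] * 20
history, final = fixed_share(losses)
print(history[20], final)
```

Because the switching step never lets a weight collapse to zero, the learner can shift to the second expert soon after the change point.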

algorithm, artificial intelligence, machine learning, (17 more...)

Country: North America > United States > New York (0.14)

Industry: Education > Educational Setting > Online (0.61)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Communications > Networks (0.89)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.61)

Steinwart, Ingo

Sparseness of Support Vector Machines---Some Asymptotically Sharp Bounds

The decision functions constructed by support vector machines (SVM's) usually depend only on a subset of the training set, the so-called support vectors. We derive asymptotically sharp lower and upper bounds on the number of support vectors for several standard types of SVM's. In particular, we show for the Gaussian RBF kernel that the fraction of support vectors tends to twice the Bayes risk for the L1-SVM, to the probability of noise for the L2-SVM, and to 1 for the LS-SVM.

artificial intelligence, machine learning, theorem 3, (15 more...)

Country: North America > United States (0.28)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)