AITopics

We consider the problem of learning classifiers for structurally incomplete data, where some objects have a subset of features inherently absent due to complex relationships between the features. The common approach for handling missing features is to begin with a preprocessing phase that completes the missing features, and then use a standard classification procedure. In this paper we show how incomplete data can be classified directly without any completion of the missing features using a max-margin learning framework. We formulate this task using a geometrically-inspired objective function, and discuss two optimization approaches: The linearly separable case is written as a set of convex feasibility problems, and the non-separable case has a non-convex objective that we optimize iteratively. By avoiding the pre-processing phase in which the data is completed, these approaches offer considerable computational savings. More importantly, we show that by elegantly handling complex patterns of missing values, our approach is both competitive with other methods when the values are missing at random and outperforms them when the missing values have nontrivial structure. We demonstrate our results on two real-world problems: edge prediction in metabolic pathways, and automobile detection in natural images.

classification, classifier, enzyme, (16 more...)

Country:

North America > United States > California > Santa Clara County > Stanford (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.34)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Transportation > Ground > Road (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)

Bickel, Steffen, Scheffer, Tobias

Dirichlet-Enhanced Spam Filtering based on Biased Samples

We study a setting that is motivated by the problem of filtering spam messages for many users. Each user receives messages according to an individual, unknown distribution, reflected only in the unlabeled inbox. The spam filter for a user is required to perform well with respect to this distribution. Labeled messages from publicly available sources can be utilized, but they are governed by a distinct distribution, not adequately representing most inboxes. We devise a method that minimizes a loss function with respect to a user's personal distribution based on the available biased sample. A nonparametric hierarchical Bayesian model furthermore generalizes across users by learning a common prior which is imposed on new email accounts. Empirically, we observe that bias-corrected learning outperforms naive reliance on the assumption of independent and identically distributed data; Dirichlet-enhanced generalization across users outperforms a single ("one size fits all") filter as well as independent filters for all users.

assumption, email, spam, (14 more...)

Country:

Europe > Germany > Saarland > Saarbrücken (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Security & Privacy > Spam Filtering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.66)
(2 more...)

Ando, Rie K., Zhang, Tong

Learning on Graph with Laplacian Regularization

We consider a general form of transductive learning on graphs with Laplacian regularization, and derive margin-based generalization bounds using appropriate geometric properties of the graph. We use this analysis to obtain a better understanding of the role of normalization of the graph Laplacian matrix as well as the effect of dimension reduction. The results suggest a limitation of the standard degree-based normalization. We propose a remedy from our analysis and demonstrate empirically that the remedy leads to improved classification performance.

dimension reduction, graph, normalization, (14 more...)

Country:

North America > United States > Rhode Island (0.04)
North America > United States > New York > Richmond County > New York City (0.04)
North America > United States > New York > Queens County > New York City (0.04)
(4 more...)

Genre:

Research Report > New Finding (0.66)
Research Report > Experimental Study (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.30)

Chen, Yuanhao, Zhu, Long, Yuille, Alan L.

Unsupervised Learning of a Probabilistic Grammar for Object Detection and Parsing

We describe an unsupervised method for learning a probabilistic grammar of an object from a set of training examples. Our approach is invariant to the scale and rotation of the objects. We illustrate our approach using thirteen objects from the Caltech 101 database. In addition, we learn the model of a hybrid object class where we do not know the specific object or its position, scale or pose. This is illustrated by learning a hybrid class consisting of faces, motorbikes, and airplanes. The individual objects can be recovered as different aspects of the grammar for the object class.

algorithm, grammar, triplet, (14 more...)

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.15)
North America > United States > New York (0.04)
North America > United States > New Jersey > Hudson County > Hoboken (0.04)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.69)
(2 more...)

Zass, Ron, Shashua, Amnon

Nonnegative Sparse PCA

We describe a nonnegative variant of the "Sparse PCA" problem. The goal is to create a low dimensional representation from a collection of points which on the one hand maximizes the variance of the projected points and on the other uses only parts of the original coordinates, and thereby creating a sparse representation. What distinguishes our problem from other Sparse PCA formulations is that the projection involves only nonnegative weights of the original coordinates -- a desired quality in various fields, including economics, bioinformatics and computer vision. Adding nonnegativity contributes to sparseness, where it enforces a partitioning of the original coordinates among the new axes. We describe a simple yet efficient iterative coordinate-descent type of scheme which converges to a local optimum of our optimization criteria, giving good results on large real world datasets.

algorithm, principal vector, sparseness, (14 more...)

Country:

Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.05)
North America > United States (0.04)
Europe > Netherlands > South Holland > Dordrecht (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.30)

Shelton, Christian R., Huie, Wesley, Kan, Kin F.

Chained Boosting

We describe a method to learn to make sequential stopping decisions, such as those made along a processing pipeline. We envision a scenario in which a series of decisions must be made as to whether to continue to process. Further processing costs time and resources, but may add value. Our goal is to create, based on historic data, a series of decision rules (one at each stage in the pipeline) that decide, based on information gathered up to that point, whether to continue processing the part. We demonstrate how our framework encompasses problems from manufacturing to vision processing. We derive a quadratic (in the number of decisions) bound on testing performance and provide empirical results on object detection.

algorithm, classifier, resolution, (17 more...)

Country:

North America > United States > California > Riverside County > Riverside (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.96)

Cross-Validation Optimization for Large Scale Hierarchical Classification Kernel Methods

Seeger, Matthias

We propose a highly efficient framework for kernel multi-class models with a large and structured set of classes. Kernel parameters are learned automatically by maximizing the cross-validation log likelihood, and predictive probabilities are estimated. We demonstrate our approach on large scale text classification tasks with hierarchical class structure, achieving state-of-the-art results in an order of magnitude less time than previous work.

classification, hyperparameter, mvm, (15 more...)

Country: Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (0.61)

Rieck, Konrad, Laskov, Pavel, Sonnenburg, Sören

Computation of Similarity Measures for Sequential Data using Generalized Suffix Trees

We propose a generic algorithm for computation of similarity measures for sequential data. The algorithm uses generalized suffix trees for efficient calculation of various kernel, distance and non-metric similarity functions. Its worst-case run-time is linear in the length of sequences and independent of the underlying embedding language, which can cover words, k-grams or all contained subsequences. Experiments with network intrusion detection, DNA analysis and text processing applications demonstrate the utility of distances and similarity coefficients for sequences as alternatives to classical kernel functions.

algorithm, generalized suffix tree, similarity measure, (14 more...)

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Germany > Berlin (0.05)
Oceania > Australia > Australian Capital Territory > Canberra (0.04)
(3 more...)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.69)
Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.48)

Keerthi, S. S., Sindhwani, Vikas, Chapelle, Olivier

An Efficient Method for Gradient-Based Adaptation of Hyperparameters in SVM Models

We consider the task of tuning hyperparameters in SVM models based on minimizing a smooth performance validation function, e.g., smoothed k-fold crossvalidation error, using nonlinear optimization techniques. The key computation in this approach is that of the gradient of the validation function with respect to hyperparameters. We show that for large-scale problems involving a wide choice of kernel-based models and validation functions, this computation can be very efficiently done; often within just a fraction of the training time. Empirical results show that a near-optimal set of hyperparameters can be identified by our approach with very few training rounds and gradient computations. .

computation, hyperparameter, validation function, (16 more...)

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Los Angeles County > Burbank (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.71)

Harel, Jonathan, Koch, Christof, Perona, Pietro

Graph-Based Visual Saliency

A new bottom-up visual saliency model, Graph-Based Visual Saliency (GBVS), is proposed. It consists of two steps: rst forming activation maps on certain feature channels, and then normalizing them in a way which highlights conspicuity and admits combination with other maps. The model is simple, and biologically plausible insofar as it is naturally parallelized. This model powerfully predicts human xations on 749 variations of 108 natural images, achieving 98% of the ROC area of a human-based control, whereas the classical algorithms of Itti & Koch ([2], [3], [4]) achieve only 84%.

algorithm, graph, saliency map, (15 more...)

Country: North America > United States > California > Los Angeles County > Pasadena (0.04)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)