AITopics

We consider the ensemble clustering problem where the task is to'aggregate' multiple clustering solutions into a single consolidated clustering that maximizes the shared information among given clustering solutions. We obtain several new results for this problem. First, we note that the notion of agreement under such circumstances can be better captured using an agreement measure based on a 2D string encoding rather than voting strategy based methods proposed in literature. Using this generalization, we first derive a nonlinear optimization model to maximize thenew agreement measure. We then show that our optimization problem can be transformed into a strict 0-1 Semidefinite Program (SDP) via novel convexification techniqueswhich can subsequently be relaxed to a polynomial time solvable SDP. Our experiments indicate improvements not only in terms of the proposed agreement measure but also the existing agreement measures based on voting strategies. We discuss evaluations on clustering and image segmentation databases.

algorithm, artificial intelligence, machine learning, (19 more...)

Country: North America > United States (0.47)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.70)

Shashanka, Madhusudana, Raj, Bhiksha, Smaragdis, Paris

Sparse Overcomplete Latent Variable Decomposition of Counts Data

An important problem in many fields is the analysis of counts data to extract meaningful latent components. Methods like Probabilistic Latent Semantic Analysis (PLSA) and Latent Dirichlet Allocation (LDA) have been proposed for this purpose. However, they are limited in the number of components they can extract and also do not have a provision to control the expressiveness" of the extracted components. In this paper, we present a learning formulation to address these limitations by employing the notion of sparsity. We start with the PLSA framework and use an entropic prior in a maximum a posteriori formulation to enforce sparsity. We show that this allows the extraction of overcomplete sets of latent components which better characterize the data. We present experimental evidence of the utility of such representations."

basis distribution, machine learning, natural language, (16 more...)

Country: North America > United States (0.28)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.34)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.34)

Shamir, Ohad, Tishby, Naftali

Cluster Stability for Finite Samples

Over the past few years, the notion of stability in data clustering has received growing attention as a cluster validation criterion in a sample-based framework. However, recent work has shown that as the sample size increases, any clustering model will usually become asymptotically stable. This led to the conclusion that stability is lacking as a theoretical and practical tool. The discrepancy between this conclusion and the success of stability in practice has remained an open question, whichwe attempt to address. Our theoretical approach is that stability, as used by cluster validation algorithms, is similar in certain respects to measures of generalization in a model-selection framework. In such cases, the model chosen governsthe convergence rate of generalization bounds. By arguing that these rates are more important than the sample size, we are led to the prediction that stability-based cluster validation algorithms should not degrade with increasing sample size, despite the asymptotic universal stability. This prediction is substantiated bya theoretical analysis as well as some empirical results. We conclude that stability remains a meaningful cluster validation criterion over finite samples.

artificial intelligence, machine learning, stability, (15 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.91)

Sanborn, Adam, Griffiths, Thomas L.

Markov Chain Monte Carlo with People

Many formal models of cognition implicitly use subjective probability distributions to capture the assumptions of human learners.

artificial intelligence, machine learning, probability, (17 more...)

Country: North America > United States > California (0.28)

Genre: Research Report > Experimental Study (0.94)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.45)

Mnih, Andriy, Salakhutdinov, Ruslan R.

Probabilistic Matrix Factorization

The resulting model is able to generalize considerably better for users with very few ratings.

artificial intelligence, machine learning, pmf model, (14 more...)

Country:

North America > United States (0.46)
North America > Canada > Ontario > Toronto (0.15)

Genre: Research Report > New Finding (0.68)

Industry:

Media > Film (0.34)
Leisure & Entertainment (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.99)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)

Ross, Michael, Cohen, Andrew

GRIFT: A graphical model for inferring visual classification features from human data

This paper describes a new model for human visual classification that enables the recovery of image features that explain human subjects' performance on different visual classification tasks. Unlike previous methods, this algorithm does not model their performance with a single linear classifier operating on raw image pixels. Instead, it models classification as the combination of multiple feature detectors. This approach extracts more information about human visual classification than has been previously possible with other methods and provides a foundation for further exploration.

artificial intelligence, detector, machine learning, (17 more...)

Country: North America > United States > Massachusetts (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.91)

Steck, Harald, Krishnapuram, Balaji, Dehing-oberije, Cary, Lambin, Philippe, Raykar, Vikas C.

On Ranking in Survival Analysis: Bounds on the Concordance Index

In this paper, we show that classical survival analysis involving censored data can naturally be cast as a ranking problem. The concordance index (CI), which quantifies the quality of rankings, is the standard performance measure for model \emph{assessment} in survival analysis. In contrast, the standard approach to \emph{learning} the popular proportional hazard (PH) model is based on Cox's partial likelihood. In this paper we devise two bounds on CI--one of which emerges directly from the properties of PH models--and optimize them \emph{directly}. Our experimental results suggest that both methods perform about equally well, with our new approach giving slightly better results than the Cox's method. We also explain why a method designed to maximize the Cox's partial likelihood also ends up (approximately) maximizing the CI.

artificial intelligence, concordance index, machine learning, (15 more...)

Country: Europe > Netherlands (0.14)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Oncology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Rahimi, Ali, Recht, Benjamin

Random Features for Large-Scale Kernel Machines

To accelerate the training of kernel machines, we propose to map the input data to a randomized low-dimensional feature space and then apply existing fast linear methods. The features are designed so that the inner products of the transformed data are approximately equal to those in the feature space of a user specified shiftinvariant kernel.We explore two sets of random features, provide convergence bounds on their ability to approximate various radial basis kernels, and show that in large-scale classification and regression tasks linear machine learning algorithms appliedto these features outperform state-of-the-art large-scale kernel machines.

artificial intelligence, kernel, machine learning, (16 more...)

Country: North America > United States (0.28)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.48)

Kiciman, Emre, Maltz, David, Platt, John C.

Fast Variational Inference for Large-scale Internet Diagnosis

Web servers on the Internet need to maintain high reliability, but the cause of intermittent failures of web transactions is non-obvious. We use Bayesian inference to diagnose problems with web services. This diagnosis problem is far larger than any previously attempted: it requires inference of 10^4 possible faults from 10^5 observations. Further, such inference must be performed in less than a second. Inference can be done at this speed by combining a variational approximation, a mean-field approximation, and the use of stochastic gradient descent to optimize a variational cost function. We use this fast inference to diagnose a time series of anomalous HTTP requests taken from a real web service. The inference is fast enough to analyze network logs with billions of entries in a matter of hours.

artificial intelligence, inference, machine learning, (14 more...)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.90)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.62)

Pelckmans, Kristiaan, Suykens, Johan, Moor, Bart D.

A Risk Minimization Principle for a Class of Parzen Estimators

This paper explores the use of a Maximal Average Margin (MAM) optimality principle for the design of learning algorithms. It is shown that the application of this risk minimization principle results in a class of (computationally) simple learning machines similar to the classical Parzen window classifier. A direct relation with the Rademacher complexities is established, as such facilitating analysis and providing a notion of certainty of prediction. This analysis is related to Support Vector Machines by means of a margin transformation. The power of the MAM principle is illustrated further by application to ordinal regression tasks, resulting in an $O(n)$ algorithm able to process large datasets in reasonable time.

artificial intelligence, machine learning, prediction, (13 more...)

Country:

North America (0.28)
Europe > Belgium (0.15)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.55)