Goto

Collaborating Authors

 Genre


Statistical Analysis of Semi-Supervised Regression

Neural Information Processing Systems

Semi-supervised methods use unlabeled data in addition to labeled data to construct predictors.While existing semi-supervised methods have shown some promising empirical performance, their development has been based largely based on heuristics. In this paper we study semi-supervised learning from the viewpoint of minimax theory. Our first result shows that some common methods based on regularization using graph Laplacians do not lead to faster minimax rates of convergence. Thus,the estimators that use the unlabeled data do not have smaller risk than the estimators that use only labeled data. We then develop several new approaches that provably lead to improved performance. The statistical tools of minimax analysis are thus used to offer some new perspective on the problem of semi-supervised learning.


A New View of Automatic Relevance Determination

Neural Information Processing Systems

Automatic relevance determination (ARD), and the closely-related sparse Bayesian learning (SBL) framework, are effective tools for pruning large numbers of irrelevant features. However, popular update rules used for this process are either prohibitively slow in practice and/or heuristic in nature without proven convergence properties. This paper furnishes an alternative means of optimizing a general ARD cost function using an auxiliary function that can naturally be solved using a series of re-weighted L1 problems. The result is an efficient algorithm that can be implemented using standard convex programming toolboxes and is guaranteed to converge to a stationary point unlike existing methods. The analysis also leads to additional insights into the behavior of previous ARD updates as well as the ARD cost function. For example, the standard fixed-point updates of MacKay (1992) are shown to be iteratively solving a particular min-max problem, although they are not guaranteed to lead to a stationary point. The analysis also reveals that ARD is exactly equivalent to performing MAP estimation using a particular feature- and noise-dependent \textit{non-factorial} weight prior with several desirable properties over conventional priors with respect to feature selection. In particular, it provides a tighter approximation to the L0 quasi-norm sparsity measure than the L1 norm. Overall these results suggests alternative cost functions and update procedures for selecting features and promoting sparse solutions.


COFI RANK - Maximum Margin Matrix Factorization for Collaborative Ranking

Neural Information Processing Systems

In this paper, we consider collaborative filtering as a ranking problem. We present a method which uses Maximum Margin Matrix Factorization and optimizes ranking insteadof rating. We employ structured output prediction to optimize directly for ranking scores. Experimental results show that our method gives very good ranking scores and scales well on collaborative filtering tasks.



An Analysis of Inference with the Universum

Neural Information Processing Systems

These assumptions are usually encoded in the regulariser or the prior. A generic learning algorithm usually makes rather weak assumptions about the regularities underlying the data.


Ensemble Clustering using Semidefinite Programming

Neural Information Processing Systems

We consider the ensemble clustering problem where the task is to'aggregate' multiple clustering solutions into a single consolidated clustering that maximizes the shared information among given clustering solutions. We obtain several new results for this problem. First, we note that the notion of agreement under such circumstances can be better captured using an agreement measure based on a 2D string encoding rather than voting strategy based methods proposed in literature. Using this generalization, we first derive a nonlinear optimization model to maximize thenew agreement measure. We then show that our optimization problem can be transformed into a strict 0-1 Semidefinite Program (SDP) via novel convexification techniqueswhich can subsequently be relaxed to a polynomial time solvable SDP. Our experiments indicate improvements not only in terms of the proposed agreement measure but also the existing agreement measures based on voting strategies. We discuss evaluations on clustering and image segmentation databases.




On Ranking in Survival Analysis: Bounds on the Concordance Index

Neural Information Processing Systems

In this paper, we show that classical survival analysis involving censored data can naturally be cast as a ranking problem. The concordance index (CI), which quantifies the quality of rankings, is the standard performance measure for model \emph{assessment} in survival analysis. In contrast, the standard approach to \emph{learning} the popular proportional hazard (PH) model is based on Cox's partial likelihood. In this paper we devise two bounds on CI--one of which emerges directly from the properties of PH models--and optimize them \emph{directly}. Our experimental results suggest that both methods perform about equally well, with our new approach giving slightly better results than the Cox's method. We also explain why a method designed to maximize the Cox's partial likelihood also ends up (approximately) maximizing the CI.


Congruence between model and human attention reveals unique signatures of critical visual events

Neural Information Processing Systems

Current computational models of bottom-up and top-down components of attention arepredictive of eye movements across a range of stimuli and of simple, fixed visual tasks (such as visual search for a target among distractors). However, todate there exists no computational framework which can reliably mimic human gaze behavior in more complex environments and tasks, such as driving a vehicle through traffic. Here, we develop a hybrid computational/behavioral framework, combining simple models for bottom-up salience and top-down relevance, andlooking for changes in the predictive power of these components at different critical event times during 4.7 hours (500,000 video frames) of observers playing car racing and flight combat video games. This approach is motivated by our observation that the predictive strengths of the salience and relevance models exhibitreliable temporal signatures during critical event windows in the task sequence--for example, when the game player directly engages an enemy plane in a flight combat game, the predictive strength of the salience model increases significantly, while that of the relevance model decreases significantly. Our new framework combines these temporal signatures to implement several event detectors. Critically,we find that an event detector based on fused behavioral and stimulus information (in the form of the model's predictive strength) is much stronger than detectors based on behavioral information alone (eye position) or image information alone(model prediction maps). This approach to event detection, based on eye tracking combined with computational models applied to the visual input, may have useful applications as a less-invasive alternative to other event detection approaches based on neural signatures derived from EEG or fMRI recordings.