Country
Class-size Independent Generalization Analsysis of Some Discriminative Multi-Category Classification
We consider the problem of deriving class-size independent generalization bounds for some regularized discriminative multi-category classification methods. In particular, we obtain an expected generalization bound for a standard formulation of multi-category support vector machines. Based on the theoretical result, we argue that the formulation over-penalizes misclassification error, which in theory may lead to poor generalization performance. A remedy, based on a generalization of multi-category logistic regression (conditional maximum entropy), is then proposed, and its theoretical properties are examined.
A Probabilistic Model for Online Document Clustering with Application to Novelty Detection
Zhang, Jian, Ghahramani, Zoubin, Yang, Yiming
In this paper we propose a probabilistic model for online document clustering. We use nonparametric Dirichlet process prior to model the growing number of clusters, and use a prior of general English language model as the base distribution to handle the generation of novel clusters. Furthermore, cluster uncertainty is modeled with a Bayesian Dirichletmultinomial distribution. We use empirical Bayes method to estimate hyperparameters based on a historical dataset. Our probabilistic model is applied to the novelty detection task in Topic Detection and Tracking (TDT) and compared with existing approaches in the literature.
Probabilistic Computation in Spiking Populations
Zemel, Richard S., Natarajan, Rama, Dayan, Peter, Huys, Quentin J.
As animals interact with their environments, they must constantly update estimates about their states. Bayesian models combine prior probabilities, a dynamical model and sensory evidence to update estimates optimally. These models are consistent with the results of many diverse psychophysical studies. However, little is known about the neural representation and manipulation of such Bayesian information, particularly in populations of spiking neurons. We consider this issue, suggesting a model based on standard neural architecture and activations. We illustrate the approach on a simple random walk example, and apply it to a sensorimotor integration task that provides a particularly compelling example of dynamic probabilistic computation. Bayesian models have been used to explain a gamut of experimental results in tasks which require estimates to be derived from multiple sensory cues.
Self-Tuning Spectral Clustering
Zelnik-manor, Lihi, Perona, Pietro
We study a number of open issues in spectral clustering: (i) Selecting the appropriate scale of analysis, (ii) Handling multi-scale data, (iii) Clustering with irregular background clutter, and, (iv) Finding automatically the number of groups. We first propose that a'local' scale should be used to compute the affinity between each pair of points. This local scaling leads to better clustering especially when the data includes multiple scales and when the clusters are placed within a cluttered background. We further suggest exploiting the structure of the eigenvectors to infer automatically the number of groups. This leads to a new algorithm in which the final randomly initialized k-means stage is eliminated.
The Rescorla-Wagner Algorithm and Maximum Likelihood Estimation of Causal Parameters
This paper analyzes generalization of the classic Rescorla-Wagner (R-W) learning algorithm and studies their relationship to Maximum Likelihood estimation of causal parameters. We prove that the parameters of two popular causal models, P and P C, can be learnt by the same generalized linear Rescorla-Wagner (GLRW) algorithm provided genericity conditions apply. We characterize the fixed points of these GLRW algorithms and calculate the fluctuations about them, assuming that the input is a set of i.i.d.
Two-Dimensional Linear Discriminant Analysis
Ye, Jieping, Janardan, Ravi, Li, Qi
Linear Discriminant Analysis (LDA) is a well-known scheme for feature extraction and dimension reduction. It has been used widely in many applications involving high-dimensional data, such as face recognition and image retrieval. An intrinsic limitation of classical LDA is the so-called singularity problem, that is, it fails when all scatter matrices are singular. A well-known approach to deal with the singularity problem is to apply an intermediate dimension reduction stage using Principal Component Analysis (PCA) before LDA. The algorithm, called PCA LDA, is used widely in face recognition. However, PCA LDA has high costs in time and space, due to the need for an eigen-decomposition involving the scatter matrices. In this paper, we propose a novel LDA algorithm, namely 2DLDA, which stands for 2-Dimensional Linear Discriminant Analysis.
Efficient Kernel Machines Using the Improved Fast Gauss Transform
Yang, Changjiang, Duraiswami, Ramani, Davis, Larry S.
Such a complexity is significant even for moderate size problems and is prohibitive for large datasets. We present an approximation technique based on the improved fast Gauss transform to reduce the computation to O(N). We also give an error bound for the approximation, and provide experimental results on the UCI datasets.
Solitaire: Man Versus Machine
Yan, Xiang, Diaconis, Persi, Rusmevichientong, Paat, Roy, Benjamin V.
In this paper, we use the rollout method for policy improvement to analyze a version of Klondike solitaire. This version, sometimes called thoughtful solitaire, has all cards revealed to the player, but then follows the usual Klondike rules. A strategy that we establish, using iterated rollouts, wins about twice as many games on average as an expert human player does.
Using Random Forests in the Structured Language Model
In this paper, we explore the use of Random Forests (RFs) in the structured language model (SLM), which uses rich syntactic information in predicting the next word based on words already seen. The goal in this work is to construct RFs by randomly growing Decision Trees (DTs) using syntactic information and investigate the performance of the SLM modeled by the RFs in automatic speech recognition. RFs, which were originally developed as classifiers, are a combination of decision tree classifiers. Each tree is grown based on random training data sampled independently and with the same distribution for all trees in the forest, and a random selection of possible questions at each node of the decision tree. Our approach extends the original idea of RFs to deal with the data sparseness problem encountered in language modeling. RFs have been studied in the context of n-gram language modeling and have been shown to generalize well to unseen data. We show in this paper that RFs using syntactic information can also achieve better performance in both perplexity (PPL) and word error rate (WER) in a large vocabulary speech recognition system, compared to a baseline that uses Kneser-Ney smoothing.