Fast Algorithms for Large-State-Space HMMs with Applications to Web Usage Analysis
Felzenszwalb, Pedro F., Huttenlocher, Daniel P., Kleinberg, Jon M.
In applying Hidden Markov Models to the analysis of massive data streams, it is often necessary to use an artificially reduced set of states; this is due in large part to the fact that the basic HMM estimation algorithms have a quadratic dependence on the size of the state set. We present algorithms that reduce this computational bottleneck to linear or near-linear time, when the states can be embedded in an underlying grid of parameters. This type of state representation arises in many domains; in particular, we show an application to traffic analysis at a high-volume Web site.
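To make the bottleneck concrete, here is a minimal Python sketch (not the authors' code; the names forward, trans, and emit_probs are illustrative assumptions) of the standard HMM forward pass, whose per-step transition sum is the quadratic cost the paper removes when the states embed in a grid:

import numpy as np

def forward(init, trans, emit_probs):
    # init: (N,) prior over states; trans: (N, N) transition matrix;
    # emit_probs: (T, N), row t holding P(obs_t | state) for each state.
    T, N = emit_probs.shape
    alpha = init * emit_probs[0]
    alpha /= alpha.sum()
    for t in range(1, T):
        # This matrix-vector product is the O(N^2) step per time slice.
        # When transition probability depends only on grid distance between
        # states, it can be replaced by a fast distance-transform-style
        # convolution, which is the source of the paper's speedup.
        alpha = (trans.T @ alpha) * emit_probs[t]
        alpha /= alpha.sum()  # rescale for numerical stability
    return alpha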
On the Dynamics of Boosting
Rudin, Cynthia, Daubechies, Ingrid, Schapire, Robert E.
In order to understand AdaBoost's dynamics, especially its ability to maximize margins, we derive an associated simplified nonlinear iterated map and analyze its behavior in low-dimensional cases. We find stable cycles for these cases, which can be used to solve explicitly for AdaBoost's output. By considering AdaBoost as a dynamical system, we are able to prove Rätsch and Warmuth's conjecture that AdaBoost may fail to converge to a maximal-margin combined classifier when given a 'nonoptimal' weak learning algorithm.
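The iterated map itself is derived in the paper; as a hedged companion illustration (an assumed setup, not the authors' code), the following sketch runs plain AdaBoost over a fixed set of weak classifiers and records the example-weight vector at each round, the state whose cycles the analysis characterizes:

import numpy as np

def adaboost_dynamics(X, y, stumps, n_rounds=50):
    # stumps: list of callables h(X) -> {-1,+1}^n; y: labels in {-1,+1}^n.
    n = len(y)
    d = np.full(n, 1.0 / n)            # example weights: the dynamical state
    trajectory = [d.copy()]
    for _ in range(n_rounds):
        errs = [np.sum(d[h(X) != y]) for h in stumps]
        j = int(np.argmin(errs))       # weak learner with lowest weighted error
        eps = min(max(errs[j], 1e-12), 1 - 1e-12)
        a = 0.5 * np.log((1 - eps) / eps)
        d = d * np.exp(-a * y * stumps[j](X))   # multiplicative update
        d /= d.sum()                   # renormalize onto the simplex
        trajectory.append(d.copy())
    return trajectory

Plotting successive entries of trajectory on low-dimensional examples is one way to look for the stable cycles described above.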
Inferring State Sequences for Non-linear Systems with Embedded Hidden Markov Models
Neal, Radford M., Beal, Matthew J., Roweis, Sam T.
We describe a Markov chain method for sampling from the distribution of the hidden state sequence in a nonlinear dynamical system, given a sequence of observations. This method updates all states in the sequence simultaneously using an embedded Hidden Markov Model (HMM). An update begins with the creation of "pools" of candidate states at each time. We then define an embedded HMM whose states are indexes within these pools. Using a forward-backward dynamic programming algorithm, we can efficiently choose a state sequence with the appropriate probabilities from the exponentially large number of state sequences that pass through states in these pools. We illustrate the method in a simple one-dimensional example, and in an example showing how an embedded HMM can be used, in effect, to discretize the state space without any discretization error. We also compare the embedded HMM to a particle smoother on a more substantial problem of inferring human motion from 2D traces of markers.
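A minimal sketch of one such update, under assumed interfaces (pools[t] is a list of K candidate states at time t; trans and emit return transition and emission probabilities; pool construction and the proposal corrections it requires are omitted here):

import numpy as np

def embedded_hmm_sample(pools, ys, trans, emit, rng):
    T, K = len(pools), len(pools[0])
    alpha = np.zeros((T, K))
    alpha[0] = [emit(s, ys[0]) for s in pools[0]]   # uniform pool prior assumed
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        for j, s in enumerate(pools[t]):
            # dynamic programming over pool indexes, not the full state space
            alpha[t, j] = emit(s, ys[t]) * sum(
                alpha[t - 1, i] * trans(pools[t - 1][i], s) for i in range(K))
        alpha[t] /= alpha[t].sum()
    # backward sampling picks one of the K^T sequences through the pools
    path = [None] * T
    j = rng.choice(K, p=alpha[-1])
    path[-1] = pools[-1][j]
    for t in range(T - 2, -1, -1):
        w = np.array([alpha[t, i] * trans(pools[t][i], path[t + 1])
                      for i in range(K)])
        path[t] = pools[t][rng.choice(K, p=w / w.sum())]
    return path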
Semi-supervised Protein Classification Using Cluster Kernels
Weston, Jason, Zhou, Dengyong, Elisseeff, André, Noble, William S., Leslie, Christina S.
A key issue in supervised protein classification is the representation of input sequences of amino acids. Recent work using string kernels for protein data has achieved state-of-the-art classification performance. However, such representations are based only on labeled data -- examples with known 3D structures, organized into structural classes -- while in practice, unlabeled data is far more plentiful.
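The abstract breaks off before describing the kernels themselves; as one hedged example of the general idea (reshaping a kernel with unlabeled data; the paper's actual cluster kernels may differ), a bagged kernel can score two examples by how often they co-cluster across repeated k-means runs on labeled plus unlabeled data:

import numpy as np
from sklearn.cluster import KMeans

def bagged_cluster_kernel(X, n_runs=20, n_clusters=10, seed=0):
    # X: (n, d) vector representations (e.g., string-kernel feature maps).
    n = X.shape[0]
    K = np.zeros((n, n))
    for r in range(n_runs):
        labels = KMeans(n_clusters=n_clusters, n_init=5,
                        random_state=seed + r).fit_predict(X)
        K += labels[:, None] == labels[None, :]   # co-clustering indicator
    return K / n_runs   # fraction of runs in which each pair co-clusters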
Learning Curves for Stochastic Gradient Descent in Linear Feedforward Networks
Werfel, Justin, Xie, Xiaohui, Seung, H. S.
Gradient-following learning methods can encounter problems of implementation in many applications, and stochastic variants are frequently used to overcome these difficulties. We derive quantitative learning curves for three online training methods used with a linear perceptron: direct gradient descent, node perturbation, and weight perturbation. The maximum learning rate for the stochastic methods scales inversely with the first power of the dimensionality of the noise injected into the system; with sufficiently small learning rate, all three methods give identical learning curves. These results suggest guidelines for when these stochastic methods will be limited in their utility, and considerations for architectures in which they will be effective.
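As a hedged sketch of the three updates being compared (an assumed squared-error setup for a linear map y = W x trained toward a target; not the paper's exact simulation):

import numpy as np

def three_updates(W, x, y_target, eta, sigma, rng):
    err = lambda V: 0.5 * np.sum((V @ x - y_target) ** 2)

    # 1) direct gradient descent on the squared error
    W_gd = W - eta * np.outer(W @ x - y_target, x)

    # 2) node perturbation: noise injected at the outputs; weight changes
    #    are reinforced in proportion to the error change they caused
    xi = sigma * rng.standard_normal(W.shape[0])
    dE = 0.5 * np.sum((W @ x + xi - y_target) ** 2) - err(W)
    W_np = W - (eta / sigma ** 2) * dE * np.outer(xi, x)

    # 3) weight perturbation: noise injected into the weights themselves,
    #    so the injected-noise dimensionality is that of the full matrix
    Z = sigma * rng.standard_normal(W.shape)
    dE = err(W + Z) - err(W)
    W_wp = W - (eta / sigma ** 2) * dE * Z

    return W_gd, W_np, W_wp

The dimensionality contrast between (2) and (3) is what drives the scaling of the maximum stable learning rate noted above.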
Large Margin Classifiers: Convex Loss, Low Noise, and Convergence Rates
Bartlett, Peter L., Jordan, Michael I., Mcauliffe, Jon D.
Many classification algorithms, including the support vector machine, boosting and logistic regression, can be viewed as minimum contrast methods that minimize a convex surrogate of the 0-1 loss function. We characterize the statistical consequences of using such a surrogate by providing a general quantitative relationship between the risk as assessed using the 0-1 loss and the risk as assessed using any nonnegative surrogate loss function. We show that this relationship gives nontrivial bounds under the weakest possible condition on the loss function -- that it satisfy a pointwise form of Fisher consistency for classification. The relationship is based on a variational transformation of the loss function that is easy to compute in many applications. We also present a refined version of this result in the case of low noise. Finally, we present applications of our results to the estimation of convergence rates in the general setting of function classes that are scaled hulls of a finite-dimensional base class.
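The central comparison inequality, sketched here from the published result (notation hedged; the precise construction of the transform is in the paper), reads

\psi\bigl(R(f) - R^*\bigr) \;\le\; R_\phi(f) - R_\phi^*,

where R is the 0-1 risk, R^* the Bayes risk, R_\phi the surrogate risk, R_\phi^* its infimum, and \psi the variational transform of the loss \phi. For the hinge loss \phi(\alpha) = \max(0, 1 - \alpha), the transform is the identity on [0, 1], so excess hinge risk directly bounds excess 0-1 risk.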
Towards Social Robots: Automatic Evaluation of Human-Robot Interaction by Facial Expression Classification
Littlewort, G.C., Bartlett, M.S., Fasel, I.R., Chenu, J., Kanda, T., Ishiguro, H., Movellan, J.R.
Computer animated agents and robots bring a social dimension to human-computer interaction and force us to think in new ways about how computers could be used in daily life. Face-to-face communication is a real-time process operating at a time scale of less than a second. In this paper we present progress on a perceptual primitive to automatically detect frontal faces in the video stream and code them with respect to 7 dimensions in real time: neutral, anger, disgust, fear, joy, sadness, surprise. The face finder employs a cascade of feature detectors trained with boosting techniques [13, 2]. The expression recognizer employs a novel combination of AdaBoost and SVMs. The generalization performance to new subjects for a 7-way forced choice was 93.3% and 97% correct on two publicly available datasets.
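As a hedged sketch of an "AdaBoost then SVM" combination in that spirit (generic scikit-learn components, not the system's actual features or training pipeline):

import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

def adaboost_then_svm(X, y, n_features=50):
    # use boosting over decision stumps to rank features ...
    booster = AdaBoostClassifier(
        estimator=DecisionTreeClassifier(max_depth=1),
        n_estimators=200).fit(X, y)
    keep = np.argsort(booster.feature_importances_)[::-1][:n_features]
    # ... then train an SVM on the selected features
    svm = SVC(kernel="linear").fit(X[:, keep], y)
    return keep, svm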
Semidefinite Relaxations for Approximate Inference on Graphs with Cycles
Jordan, Michael I., Wainwright, Martin J.
We present a new method for calculating approximate marginals for probability distributions defined by graphs with cycles, based on a Gaussian entropy bound combined with a semidefinite outer bound on the marginal polytope. This combination leads to a log-determinant maximization problem that can be solved by efficient interior point methods [8]. As with the Bethe approximation and its generalizations [12], the optimizing arguments of this problem can be taken as approximations to the exact marginals. In contrast to Bethe/Kikuchi approaches, our variational problem is strictly convex and so has a unique global optimum. An additional desirable feature is that the value of the optimal solution is guaranteed to provide an upper bound on the log partition function. In experimental trials, the performance of the log-determinant relaxation is comparable to or better than the sum-product algorithm, and by a substantial margin for certain problem classes. Finally, the zero-temperature limit of our log-determinant relaxation recovers a class of well-known semidefinite relaxations for integer programming [e.g., 3].
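Schematically (constants and the exact outer bound are model-dependent; this is a hedged paraphrase, not the paper's precise statement), the relaxation upper-bounds the log partition function A(\theta) by a strictly concave log-determinant problem:

A(\theta) \;\le\; \max_{\mu \,:\, M_1(\mu) \succeq 0} \Bigl\{ \langle \theta, \mu \rangle + \tfrac{1}{2} \log\det M_1(\mu) + c \Bigr\},

where M_1(\mu) is the first-order moment matrix assembled from the mean and pairwise second-moment entries of \mu. The constraint M_1(\mu) \succeq 0 is the semidefinite outer bound on the marginal polytope, and the maximizing \mu is read off as the vector of approximate marginals.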
New Algorithms for Efficient High Dimensional Non-parametric Classification
Liu, Ting, Moore, Andrew W., Gray, Alexander
This paper is about non-approximate acceleration of high dimensional nonparametric operations such as k nearest neighbor classifiers and the prediction phase of Support Vector Machine classifiers. We attempt to exploit the fact that even if we want exact answers to nonparametric queries, we usually do not need to explicitly find the datapoints close to the query, but merely need to ask questions about the properties of that set of datapoints. This offers a small amount of computational leeway, and we investigate how much that leeway can be exploited. For clarity, this paper concentrates on pure k-NN classification and the prediction phase of SVMs. We introduce new ball tree algorithms that on real-world datasets give accelerations of 2-fold up to 100-fold compared against highly optimized traditional ball-tree-based k-NN.
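For orientation, here is a minimal sketch of classical ball-tree pruning for exact k-NN (an illustrative baseline, not the paper's new algorithms, which prune further by answering the classification question without materializing the neighbor set):

import heapq
import numpy as np

class Ball:
    def __init__(self, pts, leaf_size=8):
        self.center = pts.mean(axis=0)
        self.radius = np.max(np.linalg.norm(pts - self.center, axis=1))
        if len(pts) <= leaf_size:
            self.pts, self.children = pts, None
        else:                                  # split on the widest dimension
            order = np.argsort(pts[:, np.argmax(np.ptp(pts, axis=0))])
            half = len(pts) // 2
            self.pts = None
            self.children = [Ball(pts[order[:half]], leaf_size),
                             Ball(pts[order[half:]], leaf_size)]

    def knn(self, q, k, heap):
        # prune: nothing in this ball can beat the current k-th distance
        lower = max(0.0, np.linalg.norm(q - self.center) - self.radius)
        if len(heap) == k and lower >= -heap[0][0]:
            return
        if self.children is None:
            for p in self.pts:
                d = np.linalg.norm(q - p)
                if len(heap) < k:
                    heapq.heappush(heap, (-d, tuple(p)))
                elif d < -heap[0][0]:
                    heapq.heapreplace(heap, (-d, tuple(p)))
        else:                                  # visit the closer child first
            for c in sorted(self.children,
                            key=lambda c: np.linalg.norm(q - c.center)):
                c.knn(q, k, heap)

A query builds the tree once with Ball(data) and then calls root.knn(query, k, heap=[]); the heap holds the k nearest points found so far.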
Identifying Structure across Pre-partitioned Data
Marx, Zvika, Dagan, Ido, Shamir, Eli
We propose an information-theoretic clustering approach that incorporates a pre-known partition of the data, aiming to identify common clusters that cut across the given partition. In the standard clustering setting the formation of clusters is guided by a single source of feature information. The newly utilized pre-partition factor introduces an additional bias that counterbalances the impact of the features whenever they become correlated with this known partition. The resulting algorithmic framework was applied successfully to synthetic data, as well as to identifying text-based cross-religion correspondences.
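The abstract does not state the objective; one hedged way to formalize "counterbalancing" a known partition, borrowed from information-bottleneck treatments of side information rather than taken from this paper, is

\max_C \; I(C; F) - \gamma \, I(C; P),

where C is the cluster assignment, F the feature information, P the pre-given partition, and \gamma trades informativeness of the clusters against their correlation with the known partition.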