Goto

Collaborating Authors

 Instructional Material


On-line Learning from Finite Training Sets in Nonlinear Networks

Neural Information Processing Systems

Online learning is one of the most common forms of neural net(cid:173) work training. We present an analysis of online learning from finite training sets for non-linear networks (namely, soft-committee ma(cid:173) chines), advancing the theory to more realistic learning scenarios. Dynamical equations are derived for an appropriate set of order parameters; these are exact in the limiting case of either linear networks or infinite training sets. Preliminary comparisons with simulations suggest that the theory captures some effects of finite training sets, but may not yet account correctly for the presence of local minima.


On-Line Learning with Restricted Training Sets: Exact Solution as Benchmark for General Theories

Neural Information Processing Systems

O(ws(s log d log(dqh/ s))) and O(ws((h/ s) log q) log(dqh/ s)) are upper bounds for the VC-dimension of a set of neural networks of units with piecewise polynomial activation functions, where s is the depth of the network, h is the number of hidden units, w is the number of adjustable parameters, q is the maximum of the number of polynomial segments of the activation function, and d is the maximum degree of the polynomials; also n(wslog(dqh/s)) is a lower bound for the VC-dimension of such a network set, which are tight for the cases s 8(h) and s is constant. For the special case q 1, the VC-dimension is 8(ws log d).


From Coexpression to Coregulation: An Approach to Inferring Transcriptional Regulation among Gene Classes from Large-Scale Expression Data

Neural Information Processing Systems

We provide preliminary evidence that eXlstmg algorithms for inferring from gene expression data can be adapted to large-scale gene expression data coming from hybridization microarrays. The essential steps are (1) clustering many genes by their expression time-course data into a minimal set of clusters of co-expressed genes, (2) theoretically modeling the various conditions under which the time-courses are measured using a continious-time analog recurrent neural network for the cluster mean time-courses, (3) fitting such a regulatory model to the cluster mean time courses by simulated annealing with weight decay, and (4) analysing several such fits for commonalities the connection matrices. This procedure can be used to assess the adequacy of existing and future gene expression time-course data sets for determ ining transcriptional regulatory relationships such as coregulation .


Efficiency versus Convergence of Boolean Kernels for On-Line Learning Algorithms

Neural Information Processing Systems

We study online learning in Boolean domains using kernels which cap- ture feature expansions equivalent to using conjunctions over basic fea- tures. We demonstrate a tradeoff between the computational efficiency with which these kernels can be computed and the generalization abil- ity of the resulting classifier. We first describe several kernel functions which capture either limited forms of conjunctions or all conjunctions. We show that these kernels can be used to efficiently run the Percep- tron algorithm over an exponential number of conjunctions; however we also prove that using such kernels the Perceptron algorithm can make an exponential number of mistakes even when learning simple func- tions. We also consider an analogous use of kernel functions to run the multiplicative-update Winnow algorithm over an expanded feature space of exponentially many conjunctions.


On the Generalization Ability of On-Line Learning Algorithms

Neural Information Processing Systems

In this paper we show that on-line algorithms for classification and re- gression can be naturally used to obtain hypotheses with good data- dependent tail bounds on their risk. Our results are proven without re- quiring complicated concentration-of-measure arguments and they hold for arbitrary on-line learning algorithms. Furthermore, when applied to concrete on-line algorithms, our results yield tail bounds that in many cases are comparable or better than the best known bounds.


Adaptive Caching by Refetching

Neural Information Processing Systems

We are constructing caching policies that have 13-20% lower miss rates than the best of twelve baseline policies over a large variety of request streams. This represents an improvement of 49–63% over Least Recently Used, the most commonly implemented policy. We achieve this not by designing a specific new policy but by using on-line Machine Learning algorithms to dynamically shift between the standard policies based on their observed miss rates. A thorough experimental evaluation of our techniques is given, as well as a discussion of what makes caching an interesting on-line learning problem.


Matrix Exponential Gradient Updates for On-line Learning and Bregman Projection

Neural Information Processing Systems

We address the problem of learning a symmetric positive definite matrix. The central issue is to design parameter updates that preserve positive definiteness. Our updates are motivated with the von Neumann diver- gence. Rather than treating the most general case, we focus on two key applications that exemplify our methods: On-line learning with a simple square loss and finding a symmetric positive definite matrix subject to symmetric linear constraints. The updates generalize the Exponentiated Gradient (EG) update and AdaBoost, respectively: the parameter is now a symmetric positive definite matrix of trace one instead of a probability vector (which in this context is a diagonal positive definite matrix with trace one).


Fast Gaussian Process Regression using KD-Trees

Neural Information Processing Systems

We consider (regression) estimation of a function x u(x) from noisy observations. If the data-generating process is not well understood, simple parametric learning algorithms, for example ones from the generalized linear model (GLM) family, may be hard to apply because of the difficulty of choosing good features. In contrast, the nonparametric Gaussian process (GP) model [19] offers a flexible and powerful alternative. However, a major drawback of GP models is that the computational cost of learning is about O(n 3), and the cost of making a single prediction is O(n), where n is the number of training examples. This high computational complexity severely limits its scalability to large problems, and we believe has proved a significant barrier to the wider adoption of the GP model.


An online Hebbian learning rule that performs Independent Component Analysis

Neural Information Processing Systems

Independent component analysis (ICA) is a powerful method to decouple signals. Most of the algorithms performing ICA do not consider the temporal correlations of the signal, but only higher moments of its amplitude distribution. Moreover, they require some preprocessing of the data (whitening) so as to remove second order correlations. In this paper, we are interested in understanding the neural mechanism responsible for solving ICA. We present an online learning rule that exploits delayed correlations in the input.


Online Metric Learning and Fast Similarity Search

Neural Information Processing Systems

Metric learning algorithms can provide useful distance functions for a variety of domains, and recent work has shown good accuracy for problems where the learner can access all distance constraints at once. However, in many real applications, constraints are only available incrementally, thus necessitating methods that can perform online updates to the learned metric. Existing online algorithms offer bounds on worst-case performance, but typically do not perform well in practice as compared to their offline counterparts. We present a new online metric learning algorithm that updates a learned Mahalanobis metric based on LogDet regularization and gradient descent. We prove theoretical worst-case performance bounds, and empirically compare the proposed method against existing online metric learning algorithms.