Gradient Descent
Text-Based Information Retrieval Using Exponentiated Gradient Descent
Papka, Ron, Callan, James P., Barto, Andrew G.
The following investigates the use of single-neuron learning algorithms to improve the performance of text-retrieval systems that accept natural-language queries. A retrieval process is explained that transforms the natural-language query into the query syntax of a real retrieval system: the initial query is expanded using statistical and learning techniques and is then used for document ranking and binary classification. The results of experiments suggest that Kivinen and Warmuth's Exponentiated Gradient Descent learning algorithm works significantly better than previous approaches. 1 Introduction The following work explores two learning algorithms - Least Mean Squared (LMS) [1] and Exponentiated Gradient Descent (EG) [2] - in the context of text-based Information Retrieval (IR) systems. The experiments presented in [3] use connectionist learning models to improve the retrieval of relevant documents from a large collection of text. Previous work in the area employs various techniques for improving retrieval [6, 7, 14].
Using Curvature Information for Fast Stochastic Search
Orr, Genevieve B., Leen, Todd K.
We present an algorithm for fast stochastic gradient descent that uses a nonlinear adaptive momentum scheme to optimize the late time convergence rate. The algorithm makes effective use of curvature information,requires only O(n) storage and computation, and delivers convergence rates close to the theoretical optimum. We demonstrate the technique on linear and large nonlinear backprop networks.
Text-Based Information Retrieval Using Exponentiated Gradient Descent
Papka, Ron, Callan, James P., Barto, Andrew G.
The following investigates the use of single-neuron learning algorithms to improve the performance of text-retrieval systems that accept natural-language queries. A retrieval process is explained that transforms the natural-language query into the query syntax of a real retrieval system: the initial query is expanded using statistical and learning techniques and is then used for document ranking and binary classification. The results of experiments suggest that Kivinen and Warmuth's Exponentiated Gradient Descent learning algorithm works significantly better than previous approaches. 1 Introduction The following work explores two learning algorithms - Least Mean Squared (LMS) [1] and Exponentiated Gradient Descent (EG) [2] - in the context of text-based Information Retrieval (IR) systems. The experiments presented in [3] use connectionist to improve the retrieval of relevant documents from a largelearning models collection of text. Previous the area employs various techniques for improving retrieval [6, 7, 14].
One-unit Learning Rules for Independent Component Analysis
Neural one-unit learning rules for the problem of Independent Component Analysis(ICA) and blind source separation are introduced. In these new algorithms, every ICA neuron develops into a separator thatfinds one of the independent components. The learning rules use very simple constrained Hebbianjanti-Hebbian learning in which decorrelating feedback may be added. To speed up the convergence of these stochastic gradient descent rules, a novel computationally efficientfixed-point algorithm is introduced. 1 Introduction Independent Component Analysis (ICA) (Comon, 1994; Jutten and Herault, 1991) is a signal processing technique whose goal is to express a set of random variables aslinear combinations of statistically independent component variables. The main applications of ICA are in blind source separation, feature extraction, and blind deconvolution.
Neural Learning in Structured Parameter Spaces - Natural Riemannian Gradient
Shun-ichi Amari RIKEN Frontier Research Program, RIKEN, Hirosawa 2-1, Wako-shi 351-01, Japan amari@zoo.riken.go.jp Abstract The parameter space of neural networks has a Riemannian metric structure.The natural Riemannian gradient should be used instead of the conventional gradient, since the former denotes the true steepest descent direction of a loss function in the Riemannian space. The behavior of the stochastic gradient learning algorithm is much more effective if the natural gradient is used. The present paper studies the information-geometrical structure of perceptrons and other networks, and prove that the online learning method based on the natural gradient is asymptotically as efficient as the optimal batch algorithm. Adaptive modification of the learning constant is proposed and analyzed in terms of the Riemannian measure andis shown to be efficient. The natural gradient is finally applied to blind separation of mixtured independent signal sources. 1 Introd uction Neural learning takes place in the parameter space of modifiable synaptic weights of a neural network.
Dynamics of On-Line Gradient Descent Learning for Multilayer Neural Networks
We consider the problem of online gradient descent learning for general two-layer neural networks. An analytic solution is presented and used to investigate the role of the learning rate in controlling the evolution and convergence of the learning process. Two-layer networks with an arbitrary number of hidden units have been shown to be universal approximators [1] for such N-to-one dimensional maps. We investigate the emergence of generalization ability in an online learning scenario [2], in which the couplings are modified after the presentation of each example so as to minimize the corresponding error. The resulting changes in {J} are described as a dynamical evolution; the number of examples plays the role of time.
Learning long-term dependencies is not as difficult with NARX networks
Lin, Tsungnan, Horne, Bill G., Tiño, Peter, Giles, C. Lee
It has recently been shown that gradient descent learning algorithms for recurrent neural networks can perform poorly on tasks that involve long-term dependencies. In this paper we explore this problem for a class of architectures called NARX networks, which have powerful representational capabilities. Previous work reported that gradient descent learning is more effective in NARX networks than in recurrent networks with "hidden states". We show that although NARX networks do not circumvent the problem of long-term dependencies, they can greatly improve performance on such problems. We present some experimental'results that show that NARX networks can often retain information for two to three times as long as conventional recurrent networks.
Dynamics of On-Line Gradient Descent Learning for Multilayer Neural Networks
We consider the problem of online gradient descent learning for general two-layer neural networks. An analytic solution is presented and used to investigate the role of the learning rate in controlling the evolution and convergence of the learning process. Two-layer networks with an arbitrary number of hidden units have been shown to be universal approximators [1] for such N-to-one dimensional maps. We investigate the emergence of generalization ability in an online learning scenario [2], in which the couplings are modified after the presentation of each example so as to minimize the corresponding error. The resulting changes in {J} are described as a dynamical evolution; the number of examples plays the role of time.
Dynamics of On-Line Gradient Descent Learning for Multilayer Neural Networks
Sollat CONNECT, The Niels Bohr Institute Blegdamsdvej 17 Copenhagen 2100, Denmark Abstract We consider the problem of online gradient descent learning for general two-layer neural networks. An analytic solution is presented andused to investigate the role of the learning rate in controlling theevolution and convergence of the learning process. Two-layer networks with an arbitrary number of hidden units have been shown to be universal approximators [1] for such N-to-one dimensional maps. We investigate the emergence of generalization ability in an online learning scenario [2], in which the couplings are modified after the presentation of each example so as to minimize the corresponding error. The resulting changes in {J} are described as a dynamical evolution; the number of examples plays the role of time.
Stable Dynamic Parameter Adaption
A stability criterion for dynamic parameter adaptation is given. In the case of the learning rate of backpropagation, a class of stable algorithms is presented and studied, including a convergence proof. 1 INTRODUCTION All but a few learning algorithms employ one or more parameters that control the quality of learning. Backpropagation has its learning rate and momentum parameter; Boltzmannlearning uses a simulated annealing schedule; Kohonen learning a learning rate and a decay parameter; genetic algorithms probabilities, etc. The investigator always has to set the parameters to specific values when trying to solve a certain problem. Traditionally, the metaproblem of adjusting the parameters is solved by relying on a set of well-tested values of other problems or an intensive search for good parameter regions by restarting the experiment with different values. Inthis situation, a great deal of expertise and/or time for experiment design is required (as well as a huge amount of computing time).