Noisy Neural Networks and Generalizations
Siegelmann, Hava T., Roitershtein, Alexander, Ben-Hur, Asa
In this paper we define a probabilistic computational model which generalizes many noisy neural network models, including the recent work of Maass and Sontag [5]. We identify weak ergodicity as the mechanism responsible for restricting the computational power of probabilistic models to definite languages, independent of the characteristics of the noise: whether it is discrete or analog, whether or not it depends on the input, and whether the variables are discrete or continuous. We give examples of weakly ergodic models, including noisy computational systems with noise depending on the current state and inputs, aggregate models, and computational systems which update in continuous time.
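For background on the central notion here, one standard way to formalize weak ergodicity for a sequence of stochastic kernels uses Dobrushin's ergodicity coefficient; the sketch below is a generic textbook formulation and not necessarily the exact definition adopted in the paper.

    \delta(P) \;=\; \tfrac{1}{2}\,\max_{i,j}\sum_{k}\bigl|P_{ik}-P_{jk}\bigr|,
    \qquad
    \lim_{n\to\infty}\delta\bigl(P_{m+1}P_{m+2}\cdots P_n\bigr) = 0 \quad \text{for every } m \ge 0.

A sequence of transition matrices {P_n} (one kernel per time step, possibly depending on the input and the noise) is weakly ergodic when the limit on the right holds: long products of the kernels forget the initial state, even though the products themselves need not converge to a fixed distribution.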
Understanding Stepwise Generalization of Support Vector Machines: a Toy Model
Risau-Gusman, Sebastian, Gordon, Mirta B.
In this article we study the effects of introducing structure in the input distribution of the data to be learnt by a simple perceptron. We determine the learning curves within the framework of Statistical Mechanics. Stepwise generalization occurs as a function of the number of examples when the distribution of patterns is highly anisotropic. Although extremely simple, the model seems to capture the relevant features of a class of Support Vector Machines which was recently shown to present this behavior.
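As a rough illustration of the kind of experiment described above, the toy sketch below (Python; all settings are assumptions made for illustration, not taken from the paper) draws patterns from a strongly anisotropic Gaussian, labels them with a fixed teacher perceptron, and tracks the generalization error of a student perceptron as the number of examples grows. The sharp steps analysed in the paper may appear only loosely in such a simulation.

    import numpy as np

    rng = np.random.default_rng(0)
    d = 50                                  # input dimension (assumed)
    scales = np.ones(d)
    scales[0] = 10.0                        # highly anisotropic inputs: one dominant direction
    teacher = rng.normal(size=d)            # fixed teacher perceptron defining the labels

    def sample(n):
        x = rng.normal(size=(n, d)) * scales
        return x, np.sign(x @ teacher)

    def train_perceptron(x, y, epochs=200):
        # classical perceptron updates until the training set is separated
        w = np.zeros(d)
        for _ in range(epochs):
            mistakes = 0
            for xi, yi in zip(x, y):
                if yi * (xi @ w) <= 0:
                    w += yi * xi
                    mistakes += 1
            if mistakes == 0:
                break
        return w

    x_test, y_test = sample(20000)
    for n in (5, 10, 20, 50, 100, 200, 500):
        x_train, y_train = sample(n)
        w = train_perceptron(x_train, y_train)
        err = np.mean(np.sign(x_test @ w) != y_test)
        print(f"n = {n:4d}   estimated generalization error = {err:.3f}")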
Broadband Direction-Of-Arrival Estimation Based on Second Order Statistics
Rosca, Justinian P., Ruanaidh, Joseph Ó, Jourjine, Alexander, Rickard, Scott
N wideband sources recorded using N closely spaced receivers can feasibly be separated based only on second-order statistics when using a physical model of the mixing process. In this case we show that the parameter estimation problem can be essentially reduced to considering the directions of arrival and attenuations of each signal. The paper presents two demixing methods operating in the time and frequency domains and experimentally shows that it is always possible to demix signals arriving at different angles. Moreover, one can use spatial cues to solve the channel selection problem and a post-processing Wiener filter to ameliorate the artifacts caused by demixing.
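The geometric relation underlying direction-of-arrival estimation from two closely spaced receivers can be illustrated with the minimal sketch below (Python; a single source, an integer-sample delay, and a white-noise signal are assumptions made for the toy, and this is not the demixing algorithm of the paper): a cross-correlation, a second-order statistic, recovers the inter-receiver delay, which maps to an angle of arrival.

    import numpy as np

    fs = 16000.0              # sample rate [Hz] (assumed)
    c = 343.0                 # speed of sound [m/s]
    d = 0.30                  # receiver spacing [m] (assumed)
    theta_true = np.deg2rad(40.0)

    rng = np.random.default_rng(1)
    s = rng.normal(size=4096)                     # broadband (white) source signal
    delay = d * np.sin(theta_true) / c            # inter-receiver delay [s]
    shift = int(round(delay * fs))                # integer-sample approximation

    x1 = s + 0.01 * rng.normal(size=s.size)                   # receiver 1
    x2 = np.roll(s, shift) + 0.01 * rng.normal(size=s.size)   # receiver 2: delayed copy

    # The cross-correlation (a second-order statistic) peaks at the relative delay.
    corr = np.correlate(x2, x1, mode="full")
    lag = np.argmax(corr) - (len(x1) - 1)
    theta_est = np.arcsin(np.clip(lag / fs * c / d, -1.0, 1.0))

    print(f"true DOA      = {np.rad2deg(theta_true):.1f} degrees")
    print(f"estimated DOA = {np.rad2deg(theta_est):.1f} degrees")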
Information Factorization in Connectionist Models of Perception
Movellan, Javier R., McClelland, James L.
We examine a psychophysical law that describes the influence of stimulus and context on perception. According to this law, choice probability ratios factorize into components independently controlled by stimulus and context. It has been argued that this pattern of results is incompatible with feedback models of perception. In this paper we examine this claim using neural network models defined via stochastic differential equations. We show that the law is related to a condition named channel separability and has little to do with the existence of feedback connections. In essence, channels are separable if they converge into the response units without direct lateral connections to other channels and if their sensors are not directly contaminated by external inputs to the other channels. Implications of the analysis for cognitive and computational neuroscience are discussed.
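For concreteness, the factorization property referred to above can be written in the following generic form (notation assumed here; the paper may use a different parameterization):

    p(r \mid s, c) \;=\; \frac{\theta_{rs}\,\phi_{rc}}{\sum_{r'} \theta_{r's}\,\phi_{r'c}},
    \qquad\text{so that}\qquad
    \frac{p(r_1 \mid s, c)}{p(r_2 \mid s, c)} \;=\; \frac{\theta_{r_1 s}}{\theta_{r_2 s}} \cdot \frac{\phi_{r_1 c}}{\phi_{r_2 c}},

where the first ratio is controlled only by the stimulus s and the second only by the context c.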
Probabilistic Methods for Support Vector Machines
One of the open questions that remains is how to set the 'tunable' parameters of an SVM algorithm: while methods for choosing the width of the kernel function and the noise parameter C (which controls how closely the training data are fitted) have been proposed [4, 5] (see also, very recently, [6]), the effect of the overall shape of the kernel function remains imperfectly understood [1]. Error bars (class probabilities) for SVM predictions - important for safety-critical applications, for example - are also difficult to obtain. In this paper I suggest that a probabilistic interpretation of SVMs could be used to tackle these problems. It shows that the SVM kernel defines a prior over functions on the input space, avoiding the need to think in terms of high-dimensional feature spaces. It also allows one to define quantities such as the evidence (likelihood) for a set of hyperparameters (C, kernel amplitude K0, etc.). I give a simple approximation to the evidence which can then be maximized to set such hyperparameters. The evidence is sensitive to the values of C and K0 individually, in contrast to properties (such as cross-validation error) of the deterministic solution, which only depends on the product C K0. It can therefore be used to assign an unambiguous value to C, from which error bars can be derived.
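To make the C-versus-K0 point concrete, one standard way to write the setup is sketched below (assumed notation, not necessarily the paper's exact formulation). The deterministic SVM solution minimizes

    C \sum_i \ell\bigl(y_i f(x_i)\bigr) \;+\; \frac{1}{2K_0}\,\|f\|_k^2,
    \qquad \ell(z) = \max(0,\, 1-z),

where \|f\|_k is the RKHS norm induced by a kernel of amplitude K_0. Multiplying the objective by K_0 leaves the minimizer unchanged, so the deterministic solution depends on C and K_0 only through the product C K_0. Interpreting exp(-\|f\|_k^2 / 2K_0) as a Gaussian-process prior over functions and exp(-C\,\ell(y_i f(x_i))) as per-example likelihood terms makes that solution a posterior mode, while the evidence (the normalizing integral over f) is sensitive to C and K_0 individually.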
Some Theoretical Results Concerning the Convergence of Compositions of Regularized Linear Functions
Recently, sample complexity bounds have been derived for problems involving linear functions, such as neural networks and support vector machines. In this paper, we extend some theoretical results in this area by deriving dimension-independent covering number bounds for regularized linear functions under certain regularization conditions. We show that such bounds lead to a class of new methods for training linear classifiers with theoretical advantages similar to those of the support vector machine. Furthermore, we also present a theoretical analysis of these new methods from the asymptotic statistical point of view. This technique provides a better description of the large-sample behavior of these algorithms. 1 Introduction In this paper, we are interested in the generalization performance of linear classifiers obtained from certain algorithms.
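For concreteness, the methods referred to above fit the generic regularized empirical-risk form sketched below (notation assumed here):

    \hat{w} \;=\; \arg\min_{w}\; \frac{1}{n}\sum_{i=1}^{n} L\bigl(w^{\mathsf T} x_i,\, y_i\bigr) \;+\; \lambda\, g(w),

with L a loss function and g a convex regularizer. When g controls a norm of w and the inputs are norm-bounded, covering numbers of the induced function class {x \mapsto w^{\mathsf T} x} can be bounded in terms of those norms alone, which is what allows sample complexity bounds of this type to be independent of the input dimension.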
Boosting Algorithms as Gradient Descent
Mason, Llew, Baxter, Jonathan, Bartlett, Peter L., Frean, Marcus R.
Recent theoretical results suggest that the effectiveness of these algorithms is due to their tendency to produce large margin classifiers [1, 18]. Loosely speaking, if a combination of classifiers correctly classifies most of the training data with a large margin, then its error probability is small. In [14] we gave improved upper bounds on the misclassification probability of a combined classifier in terms of the average over the training data of a certain cost function of the margins. That paper also described DOOM, an algorithm for directly minimizing the margin cost function by adjusting the weights associated with each base classifier (the base classifiers are supplied to DOOM). DOOM exhibits performance improvements over AdaBoost, even when using the same base hypotheses, which provides additional empirical evidence that these margin cost functions are appropriate quantities to optimize. In this paper, we present a general class of algorithms (called AnyBoost) which are gradient descent algorithms for choosing linear combinations of elements of an inner product function space so as to minimize some cost functional. The normal operation of a weak learner is shown to be equivalent to maximizing a certain inner product. We prove convergence of AnyBoost under weak conditions. In Section 3, we show that this general class of algorithms includes as special cases nearly all existing voting methods.
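The gradient-descent view described above can be illustrated with the short sketch below (Python; decision stumps as base classifiers, an exponential margin cost, and a fixed step size are assumptions made for the toy, and this is not the paper's exact AnyBoost pseudocode): at each round the weak learner is asked for the base classifier most correlated with the negative functional gradient, i.e. the one maximizing a weighted margin.

    import numpy as np

    def stump_predictions(X):
        """Enumerate axis-aligned decision stumps; rows are their {-1,+1} outputs."""
        preds = []
        for j in range(X.shape[1]):
            for t in np.unique(X[:, j]):
                h = np.where(X[:, j] <= t, 1.0, -1.0)
                preds.append(h)
                preds.append(-h)
        return np.array(preds)

    def anyboost(X, y, rounds=50, step=0.5, cost_grad=lambda m: -np.exp(-m)):
        """Gradient descent in function space on C(F) = mean over i of c(y_i F(x_i)).

        Each round the 'weak learner' returns the base classifier h maximizing
        sum_i w_i y_i h(x_i) with weights w_i = -c'(y_i F(x_i)), i.e. the base
        classifier most correlated with the negative functional gradient.
        """
        H = stump_predictions(X)
        F = np.zeros(len(y))
        ensemble = []
        for _ in range(rounds):
            w = -cost_grad(y * F)           # non-negative example weights
            scores = H @ (w * y)            # correlation with the descent direction
            k = int(np.argmax(scores))
            if scores[k] <= 0:              # no descent direction left: stop
                break
            F = F + step * H[k]
            ensemble.append((k, step))
        return ensemble, np.sign(F)

    # Toy usage on synthetic, linearly separable data (assumed setup).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)
    ensemble, train_pred = anyboost(X, y)
    print("rounds used:", len(ensemble), "  training error:", np.mean(train_pred != y))

With the exponential cost assumed here, the example weights have the same form as AdaBoost's, which gives a flavor of how existing voting methods can arise as special cases of the general scheme.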
Learning from User Feedback in Image Retrieval Systems
Vasconcelos, Nuno, Lippman, Andrew
We formulate the problem of retrieving images from visual databases as a problem of Bayesian inference. This leads to natural and effective solutions for two of the most challenging issues in the design of a retrieval system: providing support for region-based queries without requiring prior image segmentation, and accounting for user-feedback during a retrieval session. We present a new learning algorithm that relies on belief propagation to account for both positive and negative examples of the user's interests.
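A minimal sketch of the "retrieval as Bayesian inference" formulation is given below (Python; the diagonal-Gaussian image models, the uniform prior, and the idea that feedback examples would simply be appended to the query are assumptions made for the toy, and the paper's belief-propagation treatment of positive and negative feedback is not reproduced here): each database image is summarized by a feature density, and images are ranked by the posterior probability of having generated the query features.

    import numpy as np

    def fit_density(features):
        """Summarize one image's local feature vectors with a diagonal Gaussian."""
        return features.mean(axis=0), features.var(axis=0) + 1e-6

    def log_likelihood(query, mu, var):
        """Sum of log N(x; mu, diag(var)) over the query's feature vectors."""
        z = (query - mu) ** 2 / var + np.log(2.0 * np.pi * var)
        return -0.5 * z.sum()

    def rank_images(database, query_feats):
        """Rank images by posterior P(image | query) under a uniform prior."""
        scores = [log_likelihood(query_feats, mu, var) for mu, var in database]
        return np.argsort(scores)[::-1]

    # Toy usage (assumed data): three "images", each a bag of 2-D feature vectors.
    rng = np.random.default_rng(0)
    images = [rng.normal(loc=c, size=(100, 2)) for c in ([0, 0], [3, 3], [-3, 2])]
    database = [fit_density(f) for f in images]

    query = rng.normal(loc=[3, 3], size=(10, 2))     # query resembling image 1
    print("ranking (best first):", rank_images(database, query))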