Statistical Learning
Adaptive Online Gradient Descent
Hazan, Elad, Rakhlin, Alexander, Bartlett, Peter L.
We study the rates of growth of the regret in online convex optimization. First, we show that a simple extension of the algorithm of Hazan et al eliminates the need for a priori knowledge of the lower bound on the second derivatives of the observed functions. We then provide an algorithm, Adaptive Online Gradient Descent, which interpolates between the results of Zinkevich for linear functions and of Hazan et al for strongly convex functions, achieving intermediate rates between T and log T . Furthermore, we show strong optimality of the algorithm. Finally, we provide an extension of our results to general norms.
DIFFRAC: a discriminative and flexible framework for clustering
Bach, Francis R., Harchaoui, Zaรฏd
We present a novel linear clustering framework (Diffrac) which relies on a linear discriminative cost function and a convex relaxation of a combinatorial optimization problem. The large convex optimization problem is solved through a sequence of lower dimensional singular value decompositions. This framework has several attractive properties: (1) although apparently similar to K-means, it exhibits superior clustering performance than K-means, in particular in terms of robustness to noise. (2) It can be readily extended to non linear clustering if the discriminative cost function is based on positive definite kernels, and can then be seen as an alternative to spectral clustering. (3) Prior information on the partition is easily incorporated, leading to state-of-the-art performance for semi-supervised learning, for clustering or classification. We present empirical evaluations of our algorithms on synthetic and real medium-scale datasets.
Progressive mixture rules are deviation suboptimal
We consider the learning task consisting in predicting as well as the best function in a finite reference set G up to the smallest possible additive term. If R(g) denotes the generalization error of a prediction function g, under reasonable assumptions on the loss function (typically satisfied by the least square loss when the output is bounded), it is known that the progressive mixture rule g_n satisfies E R(g_n) < min_{g in G} R(g) + Cst (log|G|)/n where n denotes the size of the training set, E denotes the expectation wrt the training set distribution. This work shows that, surprisingly, for appropriate reference sets G, the deviation convergence rate of the progressive mixture rule is only no better than Cst / sqrt{n}, and not the expected Cst / n. It also provides an algorithm which does not suffer from this drawback.
A Spectral Regularization Framework for Multi-Task Structure Learning
Argyriou, Andreas, Pontil, Massimiliano, Ying, Yiming, Micchelli, Charles A.
Learning the common structure shared by a set of supervised tasks is an important practical and theoretical problem. Knowledge of this structure may lead to better generalizationperformance on the tasks and may also facilitate learning new tasks. We propose a framework for solving this problem, which is based on regularization withspectral functions of matrices. This class of regularization problems exhibits appealing computational properties and can be optimized efficiently by an alternating minimization algorithm. In addition, we provide a necessary and sufficient condition for convexity of the regularizer.
Variational Inference for Diffusion Processes
Archambeau, Cรฉdric, Opper, Manfred, Shen, Yuan, Cornford, Dan, Shawe-taylor, John S.
Diffusion processes are a family of continuous-time continuous-state stochastic processes that are in general only partially observed. The joint estimation of the forcing parameters and the system noise (volatility) in these dynamical systems is a crucial, but non-trivial task, especially when the system is nonlinear and multi-modal. We propose a variational treatment of diffusion processes, which allows us to estimate these parameters by simple gradient techniques and which is computationally less demanding than most MCMC approaches. Furthermore, our parameter inference scheme does not break down when the time step gets smaller, unlike most current approaches. Finally, we show how a cheap estimate of the posterior over the parameters can be constructed based on the variational free energy.
Fitted Q-iteration in continuous action-space MDPs
Antos, Andrรกs, Szepesvรกri, Csaba, Munos, Rรฉmi
We consider continuous state, continuous action batch reinforcement learning where the goal is to learn a good policy from a sufficiently rich trajectory generated by another policy. We study a variant of fitted Q-iteration, where the greedy action selection is replaced by searching for a policy in a restricted set of candidate policies by maximizing the average action values. We provide a rigorous theoretical analysis of this algorithm, proving what we believe is the first finite-time bounds for value-function based algorithms for continuous state- and action-space problems.
On the Geometry of Discrete Exponential Families with Application to Exponential Random Graph Models
Fienberg, Stephen E., Rinaldo, Alessandro, Zhou, Yi
There has been an explosion of interest in statistical models for analyzing network data, and considerable interest in the class of exponential random graph (ERG) models, especially in connection with difficulties in computing maximum likelihood estimates. The issues associated with these difficulties relate to the broader structure of discrete exponential families. This paper re-examines the issues in two parts. First we consider the closure of $k$-dimensional exponential families of distribution with discrete base measure and polyhedral convex support $\mathrm{P}$. We show that the normal fan of $\mathrm{P}$ is a geometric object that plays a fundamental role in deriving the statistical and geometric properties of the corresponding extended exponential families. We discuss its relevance to maximum likelihood estimation, both from a theoretical and computational standpoint. Second, we apply our results to the analysis of ERG models. In particular, by means of a detailed example, we provide some characterization of the properties of ERG models, and, in particular, of certain behaviors of ERG models known as degeneracy.
A New Clustering Algorithm Based Upon Flocking On Complex Network
Li, Qiang, He, Yan, Jiang, Jing-ping
We have proposed a model based upon flocking on a complex network, and then developed two clustering algorithms on the basis of it. In the algorithms, firstly a \textit{k}-nearest neighbor (knn) graph as a weighted and directed graph is produced among all data points in a dataset each of which is regarded as an agent who can move in space, and then a time-varying complex network is created by adding long-range links for each data point. Furthermore, each data point is not only acted by its \textit{k} nearest neighbors but also \textit{r} long-range neighbors through fields established in space by them together, so it will take a step along the direction of the vector sum of all fields. It is more important that these long-range links provides some hidden information for each data point when it moves and at the same time accelerate its speed converging to a center. As they move in space according to the proposed model, data points that belong to the same class are located at a same position gradually, whereas those that belong to different classes are away from one another. Consequently, the experimental results have demonstrated that data points in datasets are clustered reasonably and efficiently, and the rates of convergence of clustering algorithms are fast enough. Moreover, the comparison with other algorithms also provides an indication of the effectiveness of the proposed approach.
Identifying Relevant Eigenimages - a Random Matrix Approach
Ding, Yu, Chung, Yiu-Cho, Huang, Kun, Simonetti, Orlando P.
Dimensional reduction of high dimensional data can be achieved by keeping only the relevant eigenmodes after principal component analysis. However, differentiating relevant eigenmodes from the random noise eigenmodes is problematic. A new method based on the random matrix theory and a statistical goodness-of-fit test is proposed in this paper. It is validated by numerical simulations and applied to real-time magnetic resonance cardiac cine images.
A Growing Self-Organizing Network for Reconstructing Curves and Surfaces
In the original Self-Organizing Map (SOM) algorithm by Teuvo Kohonen [1] a lattice of connected units learns a representation of an input data distribution. During the learning process, the weight vector - i.e. a position in the input space - associated to each unit is progressively adapted to the input distribution by finding the unit that best matches each input and moving it'closer' to that input, together with a subset of neighboring units, to an extent that decreases with the distance on the lattice from the best matching unit. As the adaptation progresses, the SOM tends to represent the topology input data distribution in the sense that it maps inputs that are'close' in the input space to units that are neighbors in the lattice. In the Neural Gas (NG) algorithm [2], the topology of the network of units is not fixed, as it is with SOMs, but is learnt from the input distribution as part of the adaptation process. In particular, Martinetz and Schulten have shown in [3] that, under certain conditions, the Neural Gas algorithm tends to constructing a restricted Delaunay graph, namely a triangulation with remarkable topological properties to be discussed later. They deem the structure constructed by the algorithm a topology representing network (TRN). Besides the thread of subsequent developments in the field of neural networks, the work by Martintetz and Schulten have raised also a considerable interest in the community of computational topology and geometry. The studies that followed in this direction have produced a number of theoretical results that are nowadays at the foundations of some popular methods for curve and surface reconstruction in computer graphics ([4]), although they have little or nothing in common with neural networks algorithms.