Evolutionary Systems


Cortical.io - Biologically-inspired natural language understanding

#artificialintelligence

We help you extract key information from contracts, increase the speed and quality of support-request resolution, monitor in real time what your customers are saying on social media, and much more!


Information-Theoretic Generalization Bounds for SGLD via Data-Dependent Estimates

Neural Information Processing Systems

In this work, we improve upon the stepwise analysis of noisy iterative learning algorithms initiated by Pensia, Jog, and Loh (2018) and recently extended by Bu, Zou, and Veeravalli (2019). Our main contributions are significantly improved mutual information bounds for Stochastic Gradient Langevin Dynamics via data-dependent estimates. Our approach is based on the variational characterization of mutual information and the use of data-dependent priors that forecast the mini-batch gradient based on a subset of the training samples. Our approach is broadly applicable within the information-theoretic framework of Russo and Zou (2015) and Xu and Raginsky (2017). Our bound can be tied to a measure of flatness of the empirical risk surface.
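
As a point of reference (this is an illustration of the algorithm being analyzed, not the paper's bound), a minimal sketch of a single SGLD update in NumPy; the function name and the user-supplied grad_fn are illustrative assumptions:

    import numpy as np

    def sgld_step(w, grad_fn, batch, step_size, rng):
        """One Stochastic Gradient Langevin Dynamics update.

        w         : current parameter vector
        grad_fn   : callable returning the mini-batch gradient of the empirical risk (assumed given)
        batch     : current mini-batch of training samples or indices
        step_size : learning rate eta
        rng       : np.random.Generator supplying the injected Gaussian noise
        """
        g = grad_fn(w, batch)
        noise = rng.normal(size=w.shape) * np.sqrt(2.0 * step_size)
        return w - step_size * g + noise

The paper's analysis controls how much information such noisy updates reveal about the training data, using data-dependent priors that forecast the mini-batch gradient from a subset of the training samples, as described above.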


Generalization of ERM in Stochastic Convex Optimization: The Dimension Strikes Back

Neural Information Processing Systems

In stochastic convex optimization the goal is to minimize a convex function $F(x) \doteq \mathbb{E}_{f\sim D}[f(x)]$ over a convex set $\mathcal{K} \subset \mathbb{R}^d$, where $D$ is some unknown distribution and each $f(\cdot)$ in the support of $D$ is convex over $\mathcal{K}$. The optimization is based on i.i.d. samples from $D$. We also demonstrate that for a more general class of range-bounded (but not Lipschitz-bounded) stochastic convex programs, an even stronger gap appears already in dimension 2.
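
For intuition about the setup (an illustration of the problem, not the paper's construction), a hedged sketch of empirical risk minimization over the i.i.d. samples via projected subgradient descent; project_K and subgrad are assumed to be supplied by the user:

    import numpy as np

    def erm_projected_subgradient(fs, d, project_K, subgrad, steps=1000, eta=0.01):
        """Approximately minimize the empirical average of sampled convex functions over K.

        fs        : list of sampled convex functions f^1, ..., f^n
        d         : dimension of the parameter space
        project_K : Euclidean projection onto the convex set K (assumed given)
        subgrad   : subgrad(f, x) returning a subgradient of f at x (assumed given)
        """
        x = project_K(np.zeros(d))
        for _ in range(steps):
            g = np.mean([subgrad(f, x) for f in fs], axis=0)  # subgradient of the empirical risk
            x = project_K(x - eta * g)
        return x

Projected subgradient descent is used here only because it needs no smoothness assumptions; any ERM solver would serve for the illustration.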


Generalization Bounds for Domain Adaptation

Neural Information Processing Systems

In this paper, we provide a new framework for studying generalization bounds of the learning process for domain adaptation. Without loss of generality, we consider two representative domain adaptation settings: domain adaptation with multiple sources, and domain adaptation combining source and target data. In particular, we introduce two quantities that capture the inherent characteristics of the domains. For each setting, we use these two quantities to develop a specific Hoeffding-type deviation inequality and a symmetrization inequality, which yield the corresponding generalization bound in terms of the uniform entropy number. Using the resulting generalization bound, we analyze the asymptotic convergence and the rate of convergence of the learning process for each kind of domain adaptation.
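
As background (the paper develops domain-adaptation-specific variants), the classical Hoeffding inequality on which such deviation inequalities build reads as follows: for independent random variables $X_1, \ldots, X_n$ with $X_i \in [a_i, b_i]$,
$$
\Pr\left(\left|\frac{1}{n}\sum_{i=1}^{n} X_i - \mathbb{E}\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right]\right| \ge t\right) \le 2\exp\left(-\frac{2 n^2 t^2}{\sum_{i=1}^{n}(b_i - a_i)^2}\right).
$$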


Generalization Bounds and Consistency for Latent Structural Probit and Ramp Loss

Neural Information Processing Systems

We consider latent structural versions of probit loss and ramp loss. We show that these surrogate loss functions are consistent in the strong sense that for any feature map (finite or infinite dimensional) they yield predictors approaching the infimum task loss achievable by any linear predictor over the given features. We also give finite sample generalization bounds (convergence rates) for these loss functions. These bounds suggest that probit loss converges more rapidly. However, ramp loss is more easily optimized and may ultimately be more practical.
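
For intuition only, here are the binary (non-structural) analogues of the two surrogates; the latent structural versions in the paper are defined over structured outputs, and the noise scale sigma below is an assumed parameter:

    import numpy as np
    from scipy.stats import norm

    def ramp_loss(margin):
        """Binary ramp loss: a clipped hinge, min(1, max(0, 1 - margin))."""
        return np.clip(1.0 - margin, 0.0, 1.0)

    def probit_loss(margin, sigma=1.0):
        """Binary probit loss: probability of misclassification when the
        score is perturbed by Gaussian noise with standard deviation sigma."""
        return norm.cdf(-margin / sigma)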


Two-Layer Generalization Analysis for Ranking Using Rademacher Average

Neural Information Processing Systems

This paper is concerned with generalization analysis of learning to rank for information retrieval (IR). In IR, data are hierarchically organized, i.e., they consist of queries and of documents associated with each query. Previous generalization analyses for ranking, however, have not fully considered this structure and cannot explain how simultaneously changing the number of queries and the number of documents per query in the training data affects algorithm performance. In this paper, we propose performing generalization analysis under a two-layer sampling assumption, i.e., i.i.d. sampling of queries followed by i.i.d. sampling of documents for each query. Such sampling better describes the generation mechanism of real data, and the corresponding generalization analysis can better explain the real behaviors of learning-to-rank algorithms.
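
A minimal sketch of the two-layer sampling scheme described above, with query_dist and doc_dist_given_query as assumed, user-supplied samplers:

    def two_layer_sample(query_dist, doc_dist_given_query, n_queries, m_docs, rng):
        """Two-layer sampling: draw queries i.i.d., then documents i.i.d. per query.

        query_dist           : callable(rng) -> a query (assumed given)
        doc_dist_given_query : callable(query, rng) -> a (document, relevance) pair (assumed given)
        n_queries, m_docs    : number of queries and of documents per query
        """
        data = []
        for _ in range(n_queries):
            q = query_dist(rng)
            docs = [doc_dist_given_query(q, rng) for _ in range(m_docs)]
            data.append((q, docs))
        return data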


Generalization Errors and Learning Curves for Regression with Multi-task Gaussian Processes

Neural Information Processing Systems

We provide insights into how task correlations in multi-task Gaussian process (GP) regression affect the generalization error and the learning curve. We analyze the asymmetric two-task case, in which a secondary task helps the learning of a primary task. Within this setting, we give bounds on the generalization error and the learning curve of the primary task. Our approach yields an intuitive understanding of the multi-task GP by relating it to single-task GPs. For the case of a one-dimensional input space under optimal sampling, with data only for the secondary task, the limitations of the multi-task GP can be quantified explicitly.
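
As an illustration of the two-task setting (a generic intrinsic-coregionalization-style covariance, not necessarily the exact model analyzed in the paper), with rho as an assumed inter-task correlation:

    import numpy as np

    def multitask_cov(X, tasks, rho, ell=1.0):
        """Covariance matrix for a two-task GP on one-dimensional inputs.

        X     : (n,) array of inputs
        tasks : (n,) array of task labels in {0, 1}
        rho   : assumed correlation between the two tasks
        ell   : RBF length-scale
        """
        sq_dist = np.subtract.outer(X, X) ** 2
        k_x = np.exp(-0.5 * sq_dist / ell**2)                    # input covariance (RBF)
        k_t = np.where(np.equal.outer(tasks, tasks), 1.0, rho)   # task covariance
        return k_x * k_t

Setting rho = 0 decouples the tasks into independent single-task GPs, while rho = 1 treats the secondary task's data as if it came from the primary task.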


Generalization Error Analysis of Quantized Compressive Learning

Neural Information Processing Systems

Compressive learning is an effective method for dealing with very high dimensional datasets by applying learning algorithms in a randomly projected lower dimensional space. In this paper, we consider the learning problem in which the projected data are further compressed by scalar quantization, which we call quantized compressive learning. Generalization error bounds are derived for three models: the nearest neighbor (NN) classifier, the linear classifier, and least squares regression. Beyond the finite sample setting, our asymptotic analysis shows that the inner product estimators have a deep connection with the NN and linear classification problems through the variance of their debiased counterparts. By analyzing the extra error term introduced by quantization, our results provide useful guidance on the choice of quantizers in applications involving different learning tasks.
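
A hedged sketch of the quantized compressive pipeline: a Gaussian random projection followed by a generic uniform scalar quantizer (the paper's analysis covers specific quantizer choices; the uniform quantizer below is an illustrative assumption):

    import numpy as np

    def quantized_compressive_features(X, k, n_bits, rng):
        """Project data to k dimensions with a random Gaussian matrix,
        then uniformly quantize each coordinate to n_bits."""
        n, d = X.shape
        R = rng.normal(size=(d, k)) / np.sqrt(k)     # random projection matrix
        Z = X @ R                                    # compressed representation
        levels = 2 ** n_bits
        lo, hi = Z.min(), Z.max()
        step = (hi - lo) / levels
        idx = np.clip(np.floor((Z - lo) / step), 0, levels - 1)
        return lo + (idx + 0.5) * step               # scalar-quantized features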


Algorithm-Dependent Generalization Bounds for Overparameterized Deep Residual Networks

Neural Information Processing Systems

The skip-connections used in residual networks have become a standard architecture choice in deep learning due to the increased generalization and stability of networks with this architecture, although there have been limited theoretical guarantees for this improved performance. In this work, we analyze overparameterized deep residual networks trained by gradient descent following random initialization, and demonstrate that (i) the class of networks learned by gradient descent constitutes a small subset of the entire neural network function class, and (ii) this subclass of networks is sufficiently large to guarantee small training error. By showing (i) we are able to demonstrate that deep residual networks trained with gradient descent have a small generalization gap between training and test error, and together with (ii) this guarantees that the test error will be small. Our optimization and generalization guarantees require overparameterization that is only logarithmic in the depth of the network, which helps explain why residual networks are preferable to fully connected ones.
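
For reference, a minimal residual block in NumPy; the exact architecture and initialization analyzed in the paper differ, so this only illustrates the skip connection itself:

    import numpy as np

    def residual_block(x, W1, W2):
        """Basic residual block: output = x + W2 @ relu(W1 @ x).

        The '+ x' skip connection is what distinguishes this from a plain
        fully connected block; W1 and W2 are weight matrices whose shapes
        map x back to its own dimension.
        """
        h = np.maximum(W1 @ x, 0.0)  # ReLU hidden layer
        return x + W2 @ h            # skip connection adds the input back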


Generalization Bounds in the Predict-then-Optimize Framework

Neural Information Processing Systems

The predict-then-optimize framework is fundamental in many practical settings: predict the unknown parameters of an optimization problem, and then solve the problem using the predicted values of the parameters. A natural loss function in this setting is the cost of the decisions induced by the predicted parameters, in contrast to the prediction error of the parameters themselves. This loss function was recently introduced in [Elmachtoub and Grigas, 2017], where it is called the Smart Predict-then-Optimize (SPO) loss. Since the SPO loss is nonconvex and discontinuous, standard results for deriving generalization bounds do not apply. In this work, we provide an assortment of generalization bounds for the SPO loss function.
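
A hedged sketch of the SPO loss for a linear optimization problem, following the definition above (decision cost under the true parameters minus the best achievable cost); the LP encoding via scipy.optimize.linprog and the tie-breaking behavior are illustrative assumptions:

    import numpy as np
    from scipy.optimize import linprog

    def spo_loss(c_hat, c, A_ub, b_ub):
        """SPO loss for the LP  min_w c^T w  s.t.  A_ub w <= b_ub, w >= 0.

        Assumes the feasible region is nonempty and bounded so both LPs
        have optimal solutions; ties in the argmin are resolved by the solver.
        """
        w_pred = linprog(c_hat, A_ub=A_ub, b_ub=b_ub).x  # decision induced by predicted costs
        w_true = linprog(c,     A_ub=A_ub, b_ub=b_ub).x  # oracle decision under true costs
        return float(c @ w_pred - c @ w_true)

Because the induced decision can jump as the predicted cost vector varies, the loss is piecewise constant in the prediction, which is the source of the nonconvexity and discontinuity noted above.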