

Reviewer 1: these papers investigate the relationship between the regret and the stability of an online learning algorithm, and a comparison

Neural Information Processing Systems

We thank all reviewers for their comments. Minor comments will be addressed in the final version. Comparison with related work: thanks for the references to the work of Ross & Bagnell, Saha et al., and Arora et al. Your questioning of the dimension dependence in Theorem 3.2 and Corollary 3.3 is valid. OGD/FTRL algorithms in these settings will not incur the dimension dependence; moreover, this dimension dependence arises only in Theorem 3.2 and Corollary 3.3.


A Supplementary Material: A.1 Additional Literature Review

Neural Information Processing Systems

Batch Normalization, Softmax, and Other Layers: the current formulation of our improved Lipschitz estimation algorithm applies to architectures consisting of convolutional, fully connected, and slope-restricted activation layers.
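For context, the loose baseline that tighter Lipschitz estimation algorithms improve upon is the product of layer-wise spectral norms. A minimal sketch of that baseline (not the paper's improved method), assuming fully connected layers separated by 1-Lipschitz slope-restricted activations:

```python
import numpy as np

def naive_lipschitz_bound(weight_matrices):
    """Product-of-spectral-norms upper bound on a feedforward network's
    Lipschitz constant, assuming 1-Lipschitz (slope-restricted) activations
    between layers. Valid but typically very loose."""
    bound = 1.0
    for W in weight_matrices:
        bound *= np.linalg.norm(W, 2)  # largest singular value of the layer
    return bound

# Two orthogonal layers: the bound is exactly 1.
print(naive_lipschitz_bound([np.eye(3), np.eye(3)]))  # 1.0
```

Convolutional layers fit the same scheme once the convolution is viewed as a structured linear operator, which is one reason such architectures are amenable to layer-wise analysis.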


Using Trusted Data to Train Deep Networks on Labels Corrupted by Severe Noise

Neural Information Processing Systems

The growing importance of massive datasets used for deep learning makes robustness to label noise a critical property for classifiers. Sources of label noise include automatic labeling, non-expert labeling, and label corruption by data-poisoning adversaries. Numerous previous works assume that no source of labels can be trusted. We relax this assumption and assume that a small subset of the training data is trusted, which enables substantial gains in robustness to label corruption; even particularly severe label noise can be combated with a small set of cleanly labeled trusted data. We exploit this by proposing a loss correction technique that uses trusted examples in a data-efficient manner to mitigate the effects of label noise on deep neural network classifiers. Across vision and natural language processing tasks, we experiment with label noise of various types and strengths, and show that our method significantly outperforms existing methods.
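One way a trusted subset can drive a loss correction, sketched minimally (an illustrative reconstruction, not necessarily the paper's exact procedure): estimate a label-corruption matrix from the trusted examples using a model trained on the noisy labels, then compose the classifier's clean-label predictions with that matrix so they can be matched against noisy labels during training.

```python
import numpy as np

def estimate_corruption_matrix(noisy_model_probs, true_labels, num_classes):
    """Row i is the noisy-trained model's average predicted distribution over
    trusted examples whose true label is i, i.e. an estimate of
    P(noisy label = j | true label = i)."""
    C = np.zeros((num_classes, num_classes))
    for i in range(num_classes):
        C[i] = noisy_model_probs[true_labels == i].mean(axis=0)
    return C

def corrected_probs(clean_probs, C):
    """Push clean class probabilities through the corruption matrix, so the
    corrected outputs model the noisy labels the network actually sees."""
    return clean_probs @ C

# Tiny 2-class example: trusted set with two examples per true class.
probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.3, 0.7]])
labels = np.array([0, 0, 1, 1])
C = estimate_corruption_matrix(probs, labels, 2)
print(corrected_probs(np.array([[1.0, 0.0]]), C))  # [[0.85 0.15]]
```

The names `estimate_corruption_matrix` and `corrected_probs` are illustrative; the key property is that each row of `C` is a probability distribution, so the corrected predictions remain valid distributions.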


On Neuronal Capacity

Neural Information Processing Systems

We define the capacity of a learning machine to be the logarithm of the number (or volume) of the functions it can implement. We review known results, and derive new results, estimating the capacity of several neuronal models: linear and polynomial threshold gates, linear and polynomial threshold gates with constrained weights (binary weights, positive weights), and ReLU neurons. We also derive some capacity estimates and bounds for fully recurrent networks, as well as feedforward networks.


b59307fdacf7b2db12ec4bd5ca1caba8-AuthorFeedback.pdf

Neural Information Processing Systems

We thank the reviewers for their thoughtful comments. We will address all comments in our final version. Reviewer 1: we do not claim to improve a standard model for image classification; however, our sample-complexity speedup is most likely to be superior. Weight sharing across active rounds: we train the network from scratch in each active learning round. It might not always be the case that more data induce a deeper (more complicated) model... We think that this situation Random querying baseline: we agree that random querying (i.e.


Size-Noise Tradeoffs in Generative Networks

Neural Information Processing Systems

This paper investigates the ability of generative networks to convert their input noise distributions into other distributions. Firstly, we demonstrate a construction that allows ReLU networks to increase the dimensionality of their noise distribution by implementing a "space-filling" function based on iterated tent maps. We show this construction is optimal by analyzing the number of affine pieces in functions computed by multivariate ReLU networks. Secondly, we provide efficient ways (using polylog(1/ɛ) nodes) for networks to pass between univariate uniform and normal distributions, using a Taylor series approximation and a binary search gadget for computing function inverses. Lastly, we indicate how high-dimensional distributions can be efficiently transformed into low-dimensional ones.
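The tent-map ingredient of the first result can be sketched directly: a single tent map is exactly representable with two ReLUs, and composing k copies yields a sawtooth with 2^k affine pieces, so depth buys exponentially many pieces. A minimal illustration (not the paper's full space-filling construction):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def tent(x):
    # Exact two-ReLU form of the tent map on [0, 1]:
    # equals 2x for x < 1/2 and 2 - 2x for x >= 1/2.
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)

def iterated_tent(x, k):
    # Composing k tent maps produces a sawtooth with 2^k linear pieces.
    for _ in range(k):
        x = tent(x)
    return x

print(tent(0.25), tent(0.5), tent(0.75))  # 0.5 1.0 0.5
```

Each composition doubles the number of linear pieces, matching the affine-piece counting argument used to show optimality.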


Gradient Guidance for Diffusion Models: An Optimization Perspective

Neural Information Processing Systems

Diffusion models have demonstrated empirical successes in various applications and can be adapted to task-specific needs via guidance. This paper studies a form of gradient guidance for adapting a pre-trained diffusion model towards optimizing user-specified objectives. We establish a mathematical framework for guided diffusion to systematically study its optimization theory and algorithmic design. Our theoretical analysis reveals a strong link between guided diffusion models and optimization: gradient-guided diffusion models are essentially sampling solutions to a regularized optimization problem, where the regularization is imposed by the pre-training data. As for guidance design, naively using the gradient of an external objective function as guidance would jeopardize the structure of generated samples. We investigate a modified form of gradient guidance based on a forward prediction loss, which leverages the information in pre-trained score functions and provably preserves the latent structure. We further consider an iteratively fine-tuned version of gradient-guided diffusion in which the guidance and the score network are both updated with newly generated samples. This process mimics a first-order optimization iteration in expectation, for which we prove an Õ(1/K) convergence rate to the global optimum when the objective function is concave.
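The regularized-optimization view can be illustrated with a toy 1D sketch (my own simplification, not the paper's algorithm): augment the pre-trained score with the objective's gradient in a Langevin sampler. The stationary density becomes proportional to p(x)·exp(w·f(x)), i.e. samples optimize f regularized toward the pre-training distribution.

```python
import numpy as np

def guided_langevin_sample(score, grad_f, weight, steps=500, eta=0.01, seed=0):
    """Draw one sample by Langevin dynamics whose drift is the pre-trained
    score plus a weighted gradient of an external objective f. Stationary
    density is proportional to p(x) * exp(weight * f(x))."""
    rng = np.random.default_rng(seed)
    x = 0.0
    for _ in range(steps):
        x += eta * (score(x) + weight * grad_f(x)) + np.sqrt(2 * eta) * rng.normal()
    return x

# Pre-trained "model": standard normal, so score(x) = -x.
# Concave objective f(x) = -(x - 2)^2 / 2, so grad_f(x) = 2 - x.
samples = [guided_langevin_sample(lambda x: -x, lambda x: 2.0 - x,
                                  weight=1.0, seed=s) for s in range(200)]
print(np.mean(samples))  # near 1.0: between the prior mode 0 and the optimum 2
```

With equal weight on prior and objective, the tilted density is a Gaussian centered at 1 with halved variance, showing concretely how the pre-training distribution acts as a regularizer on the optimum.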


The Case for Evaluating Causal Models Using Interventional Measures and Empirical Data

Neural Information Processing Systems

Causal modeling is central to many areas of artificial intelligence, including complex reasoning, planning, knowledge-base construction, robotics, explanation, and fairness. An active community of researchers develops and enhances algorithms that learn causal models from data, and this work has produced a series of impressive technical advances. However, evaluation techniques for causal modeling algorithms have remained somewhat primitive, limiting what we can learn from experimental studies of algorithm performance, constraining the types of algorithms and model representations that researchers consider, and creating a gap between theory and practice. We argue for more frequent use of evaluation techniques that examine interventional measures rather than structural or observational measures, and that use empirical data rather than synthetic data. We survey current evaluation practice and show that the techniques we recommend are rarely used. We show that such techniques are feasible and that data sets are available to conduct such evaluations. We also show that these techniques produce substantially different results than structural measures and synthetic data do.