




Supplementary Material: Appendices

Neural Information Processing Systems

Symplectic integrators are numerical integrators that preserve this conservation law; hence, they can in a sense be considered as a discrete Hamiltonian system that approximates the target Hamiltonian system. As shown above, a discrete gradient is defined in Definition 1. However, most of the existing discrete gradients require an explicit representation of the Hamiltonian; hence, they are not available for neural networks. An exception is the Itoh–Abe method [24]. Hence, the proposed automatic discrete differentiation algorithm is indispensable for the practical application of the discrete gradient method to neural networks. See also [17, 22]. The target equations for this study are differential equations with a certain geometric structure. Typical examples of manifolds with such a 2-tensor are the Riemannian manifold [4] and the symplectic manifold [29].
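As a concrete illustration of a discrete gradient that needs only evaluations of the Hamiltonian (no explicit symbolic form), here is a minimal sketch of the Itoh–Abe construction mentioned above. The function name and the finite-difference fallback for coinciding coordinates are illustrative choices, not the paper's implementation.

```python
import numpy as np

def itoh_abe_discrete_gradient(H, x, y, eps=1e-12):
    """Itoh-Abe discrete gradient of a scalar function H between states x, y.

    Component i uses only point evaluations of H:
        g_i = (H(y_1..y_i, x_{i+1}..x_n) - H(y_1..y_{i-1}, x_i..x_n)) / (y_i - x_i)
    and satisfies the defining property H(y) - H(x) = g . (y - x).
    Falls back to a central difference when y_i == x_i.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    g = np.zeros(len(x))
    z = x.copy()
    for i in range(len(x)):
        z_prev = z.copy()           # (y_1..y_{i-1}, x_i..x_n)
        z[i] = y[i]                 # (y_1..y_i, x_{i+1}..x_n)
        if abs(y[i] - x[i]) > eps:
            g[i] = (H(z) - H(z_prev)) / (y[i] - x[i])
        else:
            h = 1e-6
            zp, zm = z.copy(), z.copy()
            zp[i] += h
            zm[i] -= h
            g[i] = (H(zp) - H(zm)) / (2 * h)
    return g

# Check the discrete-gradient property on a quadratic Hamiltonian.
H = lambda v: 0.5 * v @ v
x = np.array([1.0, 2.0])
y = np.array([1.5, 1.0])
dH = itoh_abe_discrete_gradient(H, x, y)
```

The defining property `H(y) - H(x) = dH . (y - x)` holds exactly, which is what makes energy conservation carry over to the discrete system.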



1f3b0b15d6bb860dcfa6e5c8ba7d3d96-Paper-Conference.pdf

Neural Information Processing Systems

Proof. The per-play action in hindsight is fixed: φ(a) = … However, this rate would be very slow, and the convergence metric not very strong, e.g., in expected iterates. Given λ > 0, consider the Moreau envelope φ_λ(a).
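For readers unfamiliar with the Moreau envelope invoked here, a small numerical sketch may help: φ_λ(a) = min_x { φ(x) + (1/2λ)‖x − a‖² }. The grid-search evaluation below is purely illustrative (a real implementation would use a proximal operator), and the test function φ = |·| is an assumption, not taken from the proof.

```python
import numpy as np

def moreau_envelope(phi, a, lam, half_width=10.0, n=200001):
    """Numerically evaluate the Moreau envelope
        phi_lam(a) = min_x  phi(x) + (1/(2*lam)) * (x - a)**2
    by a dense grid search around a (illustration only)."""
    x = np.linspace(a - half_width, a + half_width, n)
    return np.min(phi(x) + (x - a) ** 2 / (2 * lam))

# For phi(x) = |x| the envelope is the Huber function:
#   phi_lam(a) = a^2 / (2*lam)  if |a| <= lam,  else |a| - lam/2.
lam = 0.5
val = moreau_envelope(np.abs, 2.0, lam)   # closed form gives 2.0 - 0.25 = 1.75
```

The envelope is smooth even when φ is not, which is why it yields a stronger convergence metric than, e.g., expected iterates.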


146f7dd4c91bc9d80cf4458ad6d6cd1b-AuthorFeedback.pdf

Neural Information Processing Systems

Loosely speaking, the margin of a point depends on the output of the voting classifier and does not involve the sigmoid function. For base learners, the same size means the same number of leaves (and no restriction on depth for both algorithms compared). In the supplementary material, submitted along with the paper, we included the same experiment on three more data sets, giving 4 data sets of increasing size on which to analyze and demonstrate our new theoretical bound. The mean validation error and standard deviation for the Forest Cover dataset example from the paper are (0.0298, 0.00037) for LightGBM and (0.0327, 0.00053) for AdaBoost. The standard deviation was so small that we chose to only show 3 runs on the plots.
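To make the first sentence concrete, here is a minimal sketch of the margin of a point under a weighted voting classifier: the (normalized) weighted vote times the true label, with no sigmoid anywhere. Function and variable names are illustrative, not from the paper.

```python
import numpy as np

def margins(votes, alphas, y):
    """Normalized margins of a weighted voting classifier.

    votes:  (T, m) base-learner predictions in {-1, +1}
    alphas: (T,)   nonnegative voting weights
    y:      (m,)   true labels in {-1, +1}

    Margin of example j: y_j * sum_t alpha_t * votes[t, j] / sum_t alpha_t,
    a value in [-1, 1]; positive iff the weighted vote is correct.
    """
    alphas = np.asarray(alphas, float)
    f = alphas @ np.asarray(votes, float) / alphas.sum()
    return np.asarray(y, float) * f

# Two base learners, three examples.
votes = np.array([[1,  1, -1],
                  [1, -1, -1]])
m = margins(votes, alphas=[0.75, 0.25], y=[1, -1, -1])
```

Here the first and third examples have large positive margins (confident, correct votes) while the second is misclassified by the weighted majority.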


Anchor Data Augmentation

Neural Information Processing Systems

We propose a novel algorithm for data augmentation in nonlinear over-parametrized regression. Our data augmentation algorithm borrows from the literature on causality. In contrast to current state-of-the-art solutions that rely on modifications of the Mixup algorithm, we extend the recently proposed distributionally robust Anchor regression (AR) method to data augmentation. Our Anchor Data Augmentation (ADA) uses several replicas of the modified samples in AR to provide more training examples, leading to more robust regression predictions. We apply ADA to linear and nonlinear regression problems using neural networks. ADA is competitive with state-of-the-art C-Mixup solutions.
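The anchor-regression transform behind this idea rescales the component of the data explained by the anchor variables. The sketch below is our reading of that transform under stated assumptions (a linear anchor matrix A and a grid of γ values producing the replicas); function names and the choice of γ grid are illustrative, not the paper's implementation.

```python
import numpy as np

def anchor_augment(X, y, A, gammas):
    """Sketch of Anchor-style augmentation: for each gamma, scale the
    component of (X, y) explained by the anchors A by sqrt(gamma),
    keep the residual component, and stack the replicas:
        X_g = (I - P_A) X + sqrt(gamma) * P_A X,
    with P_A = A (A^T A)^{-1} A^T the projection onto span(A)."""
    P = A @ np.linalg.solve(A.T @ A, A.T)
    Xs, ys = [], []
    for g in gammas:
        Xs.append(X - P @ X + np.sqrt(g) * (P @ X))
        ys.append(y - P @ y + np.sqrt(g) * (P @ y))
    return np.vstack(Xs), np.concatenate(ys)

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
y = rng.normal(size=8)
A = rng.normal(size=(8, 2))          # anchor variables, one row per sample
X_aug, y_aug = anchor_augment(X, y, A, gammas=[0.5, 1.0, 2.0])
```

Note that γ = 1 reproduces the original data exactly, so the replicas interpolate between damping (γ < 1) and amplifying (γ > 1) the anchor-explained variation.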


A Continuous Mapping For Augmentation Design

Neural Information Processing Systems

Automated data augmentation (ADA) techniques have played an important role in boosting the performance of deep models. Such techniques mostly aim to optimize a parameterized distribution over a discrete augmentation space, and are thus restricted by the discretization of the search space, which is normally handcrafted. To overcome this limitation, we take the first step toward constructing a continuous mapping from $\mathbb{R}^d$ to image transformations (an augmentation space). Using this mapping, we take a novel approach in which 1) we pose ADA as a continuous optimization problem over the parameters of the augmentation distribution; and 2) we use Stochastic Gradient Langevin Dynamics to learn and sample augmentations. This allows us to explore the space of infinitely many possible augmentations, which otherwise would not be possible due to the discretization of the space. This view of ADA is radically different from the standard discretization-based view, and it opens avenues for utilizing the vast array of efficient gradient-based algorithms available for continuous optimization problems. Results over multiple benchmarks demonstrate the efficiency improvement of this work compared with previous methods.
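The Stochastic Gradient Langevin Dynamics step used to sample augmentation parameters can be sketched in a few lines: gradient ascent on log p(θ) plus Gaussian noise scaled to the step size. The toy standard-normal target and all names below are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def sgld_sample(grad_log_p, theta0, step=0.05, n_steps=20000, seed=0):
    """SGLD sketch: iterate
        theta <- theta + (step/2) * grad_log_p(theta) + N(0, step * I),
    so that (for small steps) the iterates approximately sample from p.
    In ADA, theta would parameterize a point in the continuous
    augmentation space and grad_log_p would come from the training loss.
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, float).copy()
    samples = np.empty((n_steps,) + theta.shape)
    for k in range(n_steps):
        theta = theta + 0.5 * step * grad_log_p(theta) \
                + np.sqrt(step) * rng.normal(size=theta.shape)
        samples[k] = theta
    return samples

# Toy target: standard normal, so grad log p(theta) = -theta.
chain = sgld_sample(lambda t: -t, theta0=np.zeros(1))
```

After a burn-in, the chain's mean and variance should be close to the target's (0 and 1), up to the O(step) discretization bias.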