PyTorch: An Imperative Style, High-Performance Deep Learning Library

arXiv.org Machine Learning

Deep learning frameworks have often focused on either usability or speed, but not both. PyTorch is a machine learning library that shows that these two goals are in fact compatible: it provides an imperative and Pythonic programming style that supports code as a model, makes debugging easy and is consistent with other popular scientific computing libraries, while remaining efficient and supporting hardware accelerators such as GPUs. In this paper, we detail the principles that drove the implementation of PyTorch and how they are reflected in its architecture. We emphasize that every aspect of PyTorch is a regular Python program under the full control of its user. We also explain how the careful and pragmatic implementation of the key components of its runtime enables them to work together to achieve compelling performance. We demonstrate the efficiency of individual subsystems, as well as the overall speed of PyTorch on several common benchmarks.


Online and Bandit Algorithms for Nonstationary Stochastic Saddle-Point Optimization

arXiv.org Machine Learning

Saddle-point optimization problems are an important class of optimization problems with applications to game theory, multi-agent reinforcement learning and machine learning. A majority of the rich literature available for saddle-point optimization has focused on the offline setting. In this paper, we study nonstationary versions of stochastic, smooth, strongly-convex and strongly-concave saddle-point optimization problems, in both online (or first-order) and multi-point bandit (or zeroth-order) settings. We first propose natural notions of regret for such nonstationary saddle-point optimization problems. We then analyze extragradient and Frank-Wolfe algorithms, for the unconstrained and constrained settings respectively, for the above class of nonstationary saddle-point optimization problems. We establish sub-linear regret bounds on the proposed notions of regret in both the online and bandit settings.
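For readers unfamiliar with the extragradient method named in this abstract, here is a minimal sketch of its basic update on a toy problem. It assumes a static, deterministic objective f(x, y) = 0.5x^2 + xy - 0.5y^2 (strongly convex in x, strongly concave in y, with its saddle point at the origin). The objective, step size, and function names are illustrative assumptions; the paper's nonstationary, stochastic setting and its regret analysis are not reproduced here.

```python
# Toy objective (an assumption for illustration, not taken from the paper):
#   f(x, y) = 0.5 * x**2 + x * y - 0.5 * y**2
# We minimize over x and maximize over y; the saddle point is (0, 0).

def grad_x(x, y):
    """Partial derivative of f with respect to x."""
    return x + y


def grad_y(x, y):
    """Partial derivative of f with respect to y."""
    return x - y


def extragradient(x, y, step=0.1, iterations=200):
    """Run the basic extragradient update and return the final iterate."""
    for _ in range(iterations):
        # Extrapolation step: a tentative descent/ascent move from (x, y).
        x_half = x - step * grad_x(x, y)
        y_half = y + step * grad_y(x, y)
        # Update step: move from (x, y) using gradients at the extrapolated point.
        x, y = (
            x - step * grad_x(x_half, y_half),
            y + step * grad_y(x_half, y_half),
        )
    return x, y


x_star, y_star = extragradient(1.0, 1.0)
```

On this toy problem the iterates contract toward the origin, so `x_star` and `y_star` end up very close to zero.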


A Study of Black Box Adversarial Attacks in Computer Vision

arXiv.org Machine Learning

Machine learning has seen tremendous advances in the past few years, which have led to deep learning models being deployed in varied applications of day-to-day life. Attacks on such models using perturbations, particularly in real-life scenarios, pose a serious challenge to their applicability, pushing research in a direction that aims to enhance the robustness of these models. After the introduction of these perturbations by Szegedy et al., a significant amount of research has focused on the reliability of such models, primarily in two settings: white-box, where the adversary has access to the targeted model and related parameters; and black-box, which resembles a real-life scenario in which the adversary has almost no knowledge of the model to be attacked. We draw attention to the latter scenario and present a comprehensive comparative study of the different adversarial black-box attack approaches proposed to date. The second half of this literature survey focuses on defense techniques. This is the first study, to the best of our knowledge, that specifically focuses on the black-box setting to motivate future work on the same.


Large scale representation learning from triplet comparisons

arXiv.org Machine Learning

In this paper, we discuss the fundamental problem of representation learning from a new perspective. It has been observed in many supervised/unsupervised DNNs that the final layer of the network often provides an informative representation for many tasks, even though the network has been trained to perform a particular task. The common ingredient in all previous studies is a low-level feature representation for items, for example, RGB values of images in the image context. In the present work, we assume that no meaningful representation of the items is given. Instead, we are provided with the answers to some triplet comparisons of the following form: Is item A more similar to item B or item C? We provide a fast algorithm based on DNNs that constructs a Euclidean representation for the items, using solely the answers to the above-mentioned triplet comparisons. This problem has been studied in a sub-community of machine learning by the name "Ordinal Embedding". Previous approaches to the problem are painfully slow and cannot scale to larger datasets. We demonstrate that our proposed approach is significantly faster than available methods, and can scale to real-world large datasets. Thereby, we also draw attention to the less explored idea of using neural networks to directly, approximately solve non-convex, NP-hard optimization problems that arise naturally in unsupervised learning problems. It has been widely recognized that deep neural networks (DNNs) provide a powerful tool for representation learning (Bengio et al., 2013). Representations learned in an unsupervised fashion have been demonstrated to be useful in learning tasks such as classification (Ranzato et al., 2007; 2008; Hinton & Salakhutdinov, 2008; Hinton et al., 2006; Bengio et al., 2007). In the context of supervised learning, representations are typically learned as byproducts in neural networks (Radford et al., 2015).
For example, in image classification, low-level representations of inputs (e.g., RGB values) are fed to a network together with class label information, and the network is trained to perform some supervised classification. As a byproduct, it discovers a condensed data representation in the last hidden layers of the network that turns out to be surprisingly successful for other computer vision tasks such as object detection or semantic segmentation (Girshick et al., 2014; Kümmerer et al., 2014; Long et al., 2015; Ren et al., 2015).
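To make the triplet-comparison setup concrete, the sketch below checks how well a candidate Euclidean embedding agrees with a set of triplet answers. It assumes each answer is stored as a tuple (a, b, c) meaning "item a is more similar to item b than to item c", and it reads similarity as Euclidean distance. The item names, coordinates, and the `triplet_error` helper are hypothetical; this only evaluates an embedding and is not the DNN-based algorithm the paper proposes.

```python
import math


def triplet_error(embedding, triplets):
    """Fraction of triplets (a, b, c) that the embedding violates.

    A triplet (a, b, c) is satisfied when the point for a lies strictly
    closer to the point for b than to the point for c.
    """
    violated = sum(
        1
        for a, b, c in triplets
        if math.dist(embedding[a], embedding[b])
        >= math.dist(embedding[a], embedding[c])
    )
    return violated / len(triplets)


# Hypothetical 2-D embedding of three items placed on a line.
embedding = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (3.0, 0.0)}

# Hypothetical triplet answers; the last one contradicts the embedding,
# because B is closer to A (distance 1) than to C (distance 2).
triplets = [("A", "B", "C"), ("C", "B", "A"), ("B", "C", "A")]

error = triplet_error(embedding, triplets)
```

An ordinal embedding method searches for coordinates that drive this error toward zero using only the triplet answers, without any input features for the items.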


Mo' States Mo' Problems: Emergency Stop Mechanisms from Observation

arXiv.org Machine Learning

In many environments, only a relatively small subset of the complete state space is necessary in order to accomplish a given task. We develop a simple technique using emergency stops (e-stops) to exploit this phenomenon. Using e-stops significantly improves sample complexity by reducing the amount of required exploration, while retaining a performance bound that efficiently trades off the rate of convergence with a small asymptotic sub-optimality gap. We analyze the regret behavior of e-stops and present empirical results in discrete and continuous settings demonstrating that our reset mechanism can provide order-of-magnitude speedups on top of existing reinforcement learning methods.


Stochastic Newton and Cubic Newton Methods with Simple Local Linear-Quadratic Rates

arXiv.org Machine Learning

We present two new remarkably simple stochastic second-order methods for minimizing the average of a very large number of sufficiently smooth and strongly convex functions. The first is a stochastic variant of Newton's method (SN), and the second is a stochastic variant of cubically regularized Newton's method (SCN). We establish local linear-quadratic convergence results. Unlike existing stochastic variants of second-order methods, which require the evaluation of a large number of gradients and/or Hessians in each iteration to guarantee convergence, our methods do not have this shortcoming. For instance, the simplest variants of our methods in each iteration need to compute the gradient and Hessian of a single randomly selected function only. In contrast to most existing stochastic Newton and quasi-Newton methods, our approach guarantees local convergence faster than with a first-order oracle and adapts to the problem's curvature. Interestingly, our method is not unbiased, so our theory provides new intuition for designing new stochastic methods.


Leveraging Procedural Generation to Benchmark Reinforcement Learning

arXiv.org Machine Learning

This evidence raises the possibility that overfitting pervades classic benchmarks like the Arcade Learning Environment (ALE) (Bellemare et al., 2013), which has long served as a gold standard in RL. While the diversity between games in the ALE is one of the benchmark's greatest strengths, the low emphasis on generalization presents a significant drawback. Previous work has sought to alleviate overfitting in the ALE by introducing sticky actions (Machado et al., 2018) or by embedding natural videos as backgrounds (Zhang et al., 2018b), but these methods only superficially address the underlying problem -- that agents perpetually encounter near-identical states. For each game the question must be asked: are agents robustly learning a relevant skill, or are they approximately memorizing specific trajectories? There have been several investigations of generalization in RL (Farebrother et al., 2018; Packer et al., 2018; Zhang et al., 2018a; Lee et al., 2019), but progress has largely proved elusive. Arguably one of the principal setbacks has been the lack of environments well-suited to measure generalization.


Learning Spatially Structured Image Transformations Using Planar Neural Networks

arXiv.org Machine Learning

Joel Michelson, Joshua H. Palmer, Aneesha Dasari, and Maithilee Kunda. Electrical Engineering and Computer Science, Vanderbilt University, Nashville TN, USA

Abstract: Learning image transformations is essential to the idea of mental simulation as a method of cognitive inference. We take a connectionist modeling approach, using planar neural networks to learn fundamental imagery transformations, like translation, rotation, and scaling, from perceptual experiences in the form of image sequences. We investigate how variations in network topology, training data, and image shape, among other factors, affect the efficiency and effectiveness of learning visual imagery transformations, including effectiveness of transfer to operating on new types of data.

Introduction: Visuospatial reasoning is ubiquitous in everyday human intelligence. In addition to its reliance on semantic knowledge about objects, categories, and scenes, visuospatial reasoning also requires non-semantic knowledge about object shapes, spatial relationships, etc., including, for example [1] (p. 182): "Transforming the spatial codings of objects, including expansions or reductions in size, rotation, [etc.]...accumulating sequences of such changes and visualizing change over time...." We do not know exactly how the human brain represents such non-semantic visuospatial knowledge about transformations, but we do know that this knowledge is learned through real-world perceptual experiences, especially in infancy and early childhood [2]; and that it is often deployed through top-down neural activations in brain regions associated with visual perception, i.e., using visual mental imagery [3]. Only a few studies have examined how AI systems can represent and learn transformation-based reasoning operations like image rotation from perceptual experience. One early study represented each operation as a distributed set of weights in a single-layer, 2D connectionist network, and used the perceptron learning rule to learn each operation in a supervised fashion from image sequences depicting that operation [4].


Simpson's Paradox and the implications for medical trials

arXiv.org Machine Learning

This paper describes Simpson's paradox, and explains its serious implications for randomised control trials. In particular, we show that for any number of variables we can simulate the result of a controlled trial which uniformly points to one conclusion (such as 'drug is effective') for every possible combination of the variable states, but when a previously unobserved confounding variable is included every possible combination of the variable states points to the opposite conclusion ('drug is not effective'). In other words, no matter how many variables are considered, and no matter how 'conclusive' the result, one cannot conclude the result is truly 'valid' since there is theoretically an unobserved confounding variable that could completely reverse the result.
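The reversal described in this abstract can be seen in a small worked example. The sketch below uses hypothetical recovery counts (illustrative only, not taken from the paper) for a drug arm and a control arm, split by an assumed confounder called severity. Within each severity level the drug has the higher recovery rate, yet the pooled rates favour the control arm because the two arms have very different mixes of mild and severe cases.

```python
from fractions import Fraction

# Hypothetical (recovered, total) counts per arm and per confounder level.
counts = {
    "drug": {"mild": (81, 87), "severe": (192, 263)},
    "control": {"mild": (234, 270), "severe": (55, 80)},
}


def stratum_rate(arm, stratum):
    """Recovery rate for one arm within one confounder level."""
    recovered, total = counts[arm][stratum]
    return Fraction(recovered, total)


def pooled_rate(arm):
    """Recovery rate for one arm when the confounder is ignored."""
    recovered = sum(r for r, _ in counts[arm].values())
    total = sum(t for _, t in counts[arm].values())
    return Fraction(recovered, total)


# The drug looks better inside every stratum of the confounder...
drug_wins_every_stratum = all(
    stratum_rate("drug", s) > stratum_rate("control", s)
    for s in ("mild", "severe")
)

# ...but looks worse once the strata are pooled together.
drug_wins_pooled = pooled_rate("drug") > pooled_rate("control")
```

With these counts the stratified comparison favours the drug in both strata while the pooled comparison favours the control, which is the sign flip that an unobserved confounder can produce.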


Optimal Laplacian regularization for sparse spectral community detection

arXiv.org Machine Learning

Abstract: Regularization of the classical Laplacian matrices was empirically shown to improve spectral clustering in sparse networks. It was observed that small regularizations are preferable, but this point was left as a heuristic argument. In this paper we formally determine a proper regularization which is intimately related to alternative state-of-the-art spectral techniques for sparse graphs.

Index Terms: Regularized Laplacian, Bethe-Hessian, spectral clustering, sparse networks, community detection

1. Introduction: Community detection [1] is one of the central unsupervised learning tasks on graphs. The community detection problem has vast applications in different fields of science [2] and can be seen as the simplest form of clustering, i.e. the problem of dividing objects into similarity classes.