differentiation
Automatic differentiation in ML: Where we are and where we should be going
We review the current state of automatic differentiation (AD) for array programming in machine learning (ML), including the different approaches such as operator overloading (OO) and source transformation (ST) used for AD, graph-based intermediate representations for programs, and source languages. Based on these insights, we introduce a new graph-based intermediate representation (IR) which specifically aims to efficiently support fully general AD for array programming. Unlike existing dataflow programming representations in ML frameworks, our IR naturally supports function calls, higher-order functions, and recursion, making ML models easier to implement. The ability to represent closures allows us to perform AD using ST without a tape, making the resulting derivative (adjoint) program amenable to ahead-of-time optimization using tools from functional language compilers, and enabling higher-order derivatives. Lastly, we introduce a proof-of-concept compiler toolchain called Myia which uses a subset of Python as a front end.
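To make the tape-free claim concrete, here is a minimal sketch (ours, not Myia's output; all names are illustrative) of AD by source transformation over a language with closures: each transformed primitive returns its value paired with a backpropagator closure, and the transformed caller composes those closures in reverse, so no runtime tape is recorded.

```python
import math

# Hypothetical transformed primitives: each returns (value, backpropagator).
# The closure captures exactly the intermediates its adjoint needs.
def mul(x, y):
    def backprop(dout):
        return dout * y, dout * x       # adjoints of x and y in x * y
    return x * y, backprop

def sin(x):
    def backprop(dout):
        return dout * math.cos(x)       # adjoint of x in sin(x)
    return math.sin(x), backprop

# Transformed form of f(x) = sin(x) * x: the forward pass chains the
# primitives, and the returned closure replays their backpropagators
# in reverse order, summing the two paths through x.
def f(x):
    s, bp_sin = sin(x)
    y, bp_mul = mul(s, x)
    def backprop(dout):
        ds, dx_direct = bp_mul(dout)
        dx_via_sin = bp_sin(ds)
        return dx_via_sin + dx_direct
    return y, backprop

value, bp = f(1.5)
print(bp(1.0))  # ~1.1036 = cos(1.5) * 1.5 + sin(1.5)
```

Because the adjoint is itself an ordinary closure-returning function, the same transformation can be applied to it again, which is one way higher-order derivatives fall out naturally.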
Tangent: Automatic differentiation using source-code transformation for dynamically typed array programming
The need to efficiently calculate first- and higher-order derivatives of increasingly complex models expressed in Python has stressed or exceeded the capabilities of available tools. In this work, we explore techniques from the field of automatic differentiation (AD) that can give researchers expressive power, performance, and strong usability. These include source-code transformation (SCT), flexible gradient surgery, efficient in-place array operations, and higher-order derivatives. We implement and demonstrate these ideas in the Tangent software library for Python, the first AD framework for a dynamic language that uses SCT.
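To illustrate what SCT output looks like, the sketch below hand-writes a derivative in the style such a tool generates: plain Python source with forward statements followed by adjoint statements in reverse order (this is our illustration, not Tangent's verbatim output; Tangent's actual entry point is `tangent.grad(f)`).

```python
def f(x):
    y = x * x
    return y

# Written in the style of SCT output: the derivative is itself plain,
# readable Python, so it can be inspected and edited like any other
# code, which is what makes gradient surgery convenient.
def dfdx(x, by=1.0):
    y = x * x               # forward pass
    bx = by * x + by * x    # adjoint of y = x * x, seeded with by
    return bx

print(dfdx(3.0))  # 6.0
```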
Backpropagation with Callbacks: Foundations for Efficient and Expressive Differentiable Programming
Training of deep learning models depends on gradient descent and end-to-end differentiation. Under the slogan of differentiable programming, there is an increasing demand for efficient automatic gradient computation for emerging network architectures that incorporate dynamic control flow, especially in NLP. In this paper we propose an implementation of backpropagation using functions with callbacks, where the forward pass is executed as a sequence of function calls, and the backward pass as a corresponding sequence of function returns. A key realization is that this technique of chaining callbacks is well known in the programming languages community as continuation-passing style (CPS). Any program can be converted to this form using standard techniques, and hence, any program can be mechanically converted to compute gradients. Our approach achieves the same flexibility as other reverse-mode automatic differentiation (AD) techniques, but it can be implemented without any auxiliary data structures besides the function call stack, and it can easily be combined with graph construction and native code generation techniques through forms of multi-stage programming, leading to a highly efficient implementation that combines the performance benefits of define-then-run software frameworks such as TensorFlow with the expressiveness of define-by-run frameworks such as PyTorch.
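A minimal Python sketch of the idea (Lantern itself is implemented in Scala; all names here are ours): every operation takes a continuation `k` standing for the rest of the forward pass, and its gradient update runs when that call returns, so the backward pass rides on the ordinary function call stack with no auxiliary tape.

```python
class Num:
    """A scalar with an accumulated adjoint."""
    def __init__(self, data):
        self.data = data
        self.grad = 0.0

# Each operation computes its forward value, calls the continuation k
# (the rest of the program), then propagates gradients on return.
def mul(a, b, k):
    c = Num(a.data * b.data)
    k(c)
    a.grad += c.grad * b.data
    b.grad += c.grad * a.data

def add(a, b, k):
    c = Num(a.data + b.data)
    k(c)
    a.grad += c.grad
    b.grad += c.grad

def grad(f, x0):
    x = Num(x0)
    f(x, lambda y: setattr(y, "grad", 1.0))  # seed dL/dy = 1 at the end
    return x.grad

# f(x) = x * x + x, written in continuation-passing style.
def f(x, k):
    mul(x, x, lambda t: add(t, x, k))

print(grad(f, 3.0))  # 7.0 = 2 * 3 + 1
```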
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Europe > France > Occitanie > Haute-Garonne > Toulouse (0.04)
- Europe > France > Provence-Alpes-Côte d'Azur > Alpes-Maritimes > Nice (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.93)
One-step differentiation of iterative algorithms
Automatic differentiation of an iterative algorithm unrolls the backward pass through every iteration, which becomes costly in time and memory when many iterations are needed. For such algorithms, implicit differentiation alleviates this issue but requires a custom implementation of Jacobian evaluation. In this paper, we study one-step differentiation, also known as Jacobian-free backpropagation: a method as easy as automatic differentiation and as efficient as implicit differentiation for fast (e.g., superlinearly convergent) algorithms.
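A minimal sketch of the method under illustrative assumptions: solve x² = θ with Newton's method with no derivative tracking, then differentiate a single update step at the solution (a central finite difference stands in here for one reverse-mode step through the update). Since Newton's update map has zero Jacobian in x at its fixed point, this single step already matches the exact derivative of θ ↦ √θ.

```python
import math

# Newton update for solving x^2 = theta; its fixed point is sqrt(theta).
def g(x, theta):
    return 0.5 * (x + theta / x)

def solve(theta, iters=20):
    x = 1.0
    for _ in range(iters):  # plain loop, nothing recorded
        x = g(x, theta)
    return x

# One-step (Jacobian-free) differentiation: differentiate only one
# application of g at the converged point.
def one_step_derivative(theta, eps=1e-6):
    x_star = solve(theta)
    return (g(x_star, theta + eps) - g(x_star, theta - eps)) / (2 * eps)

theta = 4.0
print(one_step_derivative(theta))       # ~0.25
print(1.0 / (2.0 * math.sqrt(theta)))   # exact derivative: 0.25
```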
- Europe > France > Occitanie > Haute-Garonne > Toulouse (0.05)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > France > Provence-Alpes-Côte d'Azur > Alpes-Maritimes > Nice (0.04)
- Europe > Finland > Uusimaa > Helsinki (0.04)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
- Asia > India (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.92)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
- Information Technology > Data Science (0.67)
Lipschitz regularity of deep neural networks: analysis and efficient estimation
Deep neural networks are notorious for being sensitive to small, well-chosen perturbations, and estimating the regularity of such architectures is of utmost importance for safe and robust practical applications. In this paper, we investigate one of the key quantities for assessing that regularity: the Lipschitz constant of deep learning architectures.
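For context, a cheap and standard upper bound that such analyses start from (our sketch, not the paper's estimator): for a feed-forward network with 1-Lipschitz activations such as ReLU, the Lipschitz constant is bounded by the product of the layers' spectral norms, though this bound is typically very loose.

```python
import numpy as np

# Illustrative sketch: for f(x) = W2 @ relu(W1 @ x) with 1-Lipschitz
# activations, Lip(f) <= ||W1||_2 * ||W2||_2 (largest singular values).
rng = np.random.default_rng(0)
W1 = rng.standard_normal((64, 32))
W2 = rng.standard_normal((10, 64))

bound = np.linalg.norm(W1, 2) * np.linalg.norm(W2, 2)  # spectral norms
print(bound)
```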
- North America > United States > California > Los Angeles County > Long Beach (0.05)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > Finland > Uusimaa > Helsinki (0.04)
- North America > United States > California > Santa Clara County > San Jose (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe > Monaco (0.04)
- Asia > Middle East > Jordan (0.06)
- North America > United States > California > Alameda County > Berkeley (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Research Report > Experimental Study (1.00)
- Overview (0.92)
- Workflow (0.67)
- Information Technology > Software (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.47)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)