tensorflow
Supplementary Material for Paper "Terra: Imperative-Symbolic Co-Execution of Imperative Deep Learning Programs"
For example, the MatMul operation of TensorFlow has 'MatMul' as [...]
As with the call id stack, Terra manages the loop id stack for the entire program execution.
Figure 2: The result of the case assignment algorithm for the given TraceGraph.
In this section, we describe the case assignment algorithm that Terra uses to explicitly insert the Switch-Case operations in the symbolic graph. The algorithm takes a TraceGraph as input and returns an ordered list of switch-cases. A switch-case is a set of (basic block, control edges), where the basic block is a linear chain of nodes and the control edges are the edges that point to the basic block. Every non-overlapping linear chain of nodes in the TraceGraph is uniquely assigned to a basic block, so that the ordered list of switch-cases covers every trace in the TraceGraph.
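The idea of splitting a graph into non-overlapping linear chains, each paired with the edges that enter it, can be illustrated with a minimal Python sketch. This is not Terra's actual algorithm: the edge-list encoding of the TraceGraph, the function name `assign_basic_blocks`, and the ordering by first appearance are all assumptions made for illustration, and the input is assumed to be acyclic.

```python
from collections import defaultdict


def assign_basic_blocks(edges):
    """Partition a DAG into maximal linear chains (basic blocks).

    `edges` is a list of (src, dst) pairs, a hypothetical encoding of a
    TraceGraph. Returns a list of (basic_block, control_edges) pairs, where
    control_edges are the edges pointing to the first node of the block.
    """
    succs, preds = defaultdict(list), defaultdict(list)
    nodes = []  # nodes in order of first appearance
    for src, dst in edges:
        succs[src].append(dst)
        preds[dst].append(src)
        for n in (src, dst):
            if n not in nodes:
                nodes.append(n)

    def starts_block(n):
        # A node opens a new block unless it is the sole successor
        # of its sole predecessor.
        if len(preds[n]) != 1:
            return True
        return len(succs[preds[n][0]]) != 1

    switch_cases = []
    for head in nodes:
        if not starts_block(head):
            continue
        block, cur = [head], head
        # Extend the chain while the path neither branches nor merges.
        while len(succs[cur]) == 1 and len(preds[succs[cur][0]]) == 1:
            cur = succs[cur][0]
            block.append(cur)
        control_edges = [(p, head) for p in preds[head]]
        switch_cases.append((block, control_edges))
    return switch_cases


# A graph that branches after "b" and merges again at "e".
example_edges = [
    ("start", "a"), ("a", "b"),
    ("b", "c"), ("b", "d"),
    ("c", "e"), ("d", "e"),
    ("e", "f"),
]
cases = assign_basic_blocks(example_edges)
```

In this example the branch targets `c` and `d` each become a single-node block whose control edge comes from `b`, and the merge point `e` opens a block with two incoming control edges.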
Automatic differentiation in ML: Where we are and where we should be going
Bart van Merrienboer, Olivier Breuleux, Arnaud Bergeron, Pascal Lamblin
We review the current state of automatic differentiation (AD) for array programming in machine learning (ML), including the different approaches such as operator overloading (OO) and source transformation (ST) used for AD, graph-based intermediate representations for programs, and source languages. Based on these insights, we introduce a new graph-based intermediate representation (IR) which specifically aims to efficiently support fully-general AD for array programming. Unlike existing dataflow programming representations in ML frameworks, our IR naturally supports function calls, higher-order functions and recursion, making ML models easier to implement. The ability to represent closures allows us to perform AD using ST without a tape, making the resulting derivative (adjoint) program amenable to ahead-of-time optimization using tools from functional language compilers, and enabling higher-order derivatives. Lastly, we introduce a proof of concept compiler toolchain called Myia which uses a subset of Python as a front end.
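The role of closures in tape-free reverse-mode AD can be sketched in plain Python. This is not Myia's implementation or output: the helper names (`mul`, `sin`, `f_adjoint`) are hypothetical, and `f_adjoint` is written by hand as a stand-in for what a source-transformation tool might generate from `f`. Each primitive returns its value together with a closure (a backpropagator) that captures the intermediate values it needs, so no global tape is recorded.

```python
import math


def mul(x, y):
    """Return x * y and a closure mapping the output gradient to input gradients."""
    def backprop(dz):
        return dz * y, dz * x
    return x * y, backprop


def sin(x):
    """Return sin(x) and its backpropagator closure."""
    def backprop(dz):
        return (dz * math.cos(x),)
    return math.sin(x), backprop


def f(x, y):
    """Primal program: f(x, y) = sin(x * y)."""
    return math.sin(x * y)


def f_adjoint(x, y):
    """Hand-written transformed version of f (illustrative assumption).

    The forward pass calls each primitive and keeps its backpropagator;
    the returned closure chains them in reverse order.
    """
    a, bp_mul = mul(x, y)
    b, bp_sin = sin(a)

    def backprop(db):
        (da,) = bp_sin(db)
        dx, dy = bp_mul(da)
        return dx, dy

    return b, backprop


value, backprop = f_adjoint(2.0, 3.0)
dx, dy = backprop(1.0)  # gradients of f with respect to x and y
```

Because `f_adjoint` is itself an ordinary function built from closures, it is the kind of program that a compiler could in principle optimize ahead of time.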
Scalable Planning with Tensorflow for Hybrid Nonlinear Domains
Given recent deep learning results that demonstrate the ability to effectively optimize high-dimensional non-convex functions with gradient descent optimization on GPUs, we ask in this paper whether symbolic gradient optimization tools such as Tensorflow can be effective for planning in hybrid (mixed discrete and continuous) nonlinear domains with high-dimensional state and action spaces. To this end, we demonstrate that hybrid planning with Tensorflow and RMSProp gradient descent is competitive with mixed integer linear program (MILP) based optimization on piecewise linear planning domains (where we can compute optimal solutions) and substantially outperforms state-of-the-art interior point methods for nonlinear planning domains. Furthermore, we remark that Tensorflow is highly scalable, converging to a strong plan on a large-scale concurrent domain with a total of 576,000 continuous action parameters distributed over a horizon of 96 time steps and 100 parallel instances in only 4 minutes. We provide a number of insights that clarify such strong performance, including observations that despite long horizons, RMSProp avoids both the vanishing and exploding gradient problems. Together these results suggest a new frontier for highly scalable planning in nonlinear hybrid domains by leveraging GPUs and the power of recent advances in gradient descent with highly optimized toolkits like Tensorflow.
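Planning by gradient descent over an action sequence can be illustrated with a small pure-Python sketch. It is a toy under stated assumptions, not the paper's setup: the one-dimensional domain (`s_{t+1} = s_t + a_t`), the cost function, the hyperparameters, and the function names are all invented for illustration, and the gradient is derived by hand rather than obtained from Tensorflow's symbolic differentiation. Only the RMSProp update rule itself (a running average of squared gradients used to scale each step) is the standard one.

```python
import math


def rollout(actions, s0=0.0):
    """Toy transition s_{t+1} = s_t + a_t; returns the final state."""
    s = s0
    for a in actions:
        s = s + a
    return s


def plan_with_rmsprop(horizon=10, goal=5.0, reg=0.01,
                      lr=0.01, decay=0.9, eps=1e-8, iters=2000):
    """Minimize (s_H - goal)^2 + reg * sum(a_t^2) over the action sequence."""
    actions = [0.0] * horizon
    mean_sq = [0.0] * horizon  # running average of squared gradients
    for _ in range(iters):
        final = rollout(actions)  # evaluated once per iteration
        for t in range(horizon):
            # Hand-derived gradient of the cost with respect to a_t.
            grad = 2.0 * (final - goal) + 2.0 * reg * actions[t]
            mean_sq[t] = decay * mean_sq[t] + (1.0 - decay) * grad * grad
            actions[t] -= lr * grad / (math.sqrt(mean_sq[t]) + eps)
    return actions


plan = plan_with_rmsprop()
final_state = rollout(plan)  # should end close to the goal of 5.0
```

Because RMSProp normalizes each step by the recent gradient magnitude, the step size stays on the order of `lr` regardless of how large or small the raw gradient is, which is one intuition for why it copes with long horizons.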