differentiating
Task Hierarchical Control via Null-Space Projection and Path Integral Approach
Patil, Apurva, Funada, Riku, Tanaka, Takashi, Sentis, Luis
This paper addresses the problem of hierarchical task control, where a robotic system must perform multiple subtasks with varying levels of priority. A commonly used approach for hierarchical control is the null-space projection technique, which ensures that higher-priority tasks are executed without interference from lower-priority ones. While effective, the state-of-the-art implementations of this method rely on low-level controllers, such as PID controllers, which can be prone to suboptimal solutions in complex tasks. This paper presents a novel framework for hierarchical task control, integrating the null-space projection technique with the path integral control method. Our approach leverages Monte Carlo simulations for real-time computation of optimal control inputs, allowing for the seamless integration of simpler PID-like controllers with a more sophisticated optimal control technique. Through simulation studies, we demonstrate the effectiveness of this combined approach, showing how it overcomes the limitations of traditional
- North America > United States > Texas > Travis County > Austin (0.04)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
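As a rough illustration of the null-space projection idea described in the abstract above, the sketch below combines a scalar primary task with a lower-priority joint-velocity request at the velocity level. It is not the authors' framework (it contains no path integral control and no PID loop); the function name `null_space_command` and all of its parameters are hypothetical, and the primary task Jacobian is assumed to be a single non-zero row.

```python
def null_space_command(j1, v1, qdot_secondary):
    """Combine a scalar primary task with a secondary joint-velocity goal.

    j1: row Jacobian of the primary task (list of floats, not all zero)
    v1: desired primary task velocity (float)
    qdot_secondary: joint velocity requested by the lower-priority task
    """
    norm_sq = sum(a * a for a in j1)
    # Moore-Penrose pseudoinverse of a row vector: J^+ = J^T / (J J^T).
    j1_pinv = [a / norm_sq for a in j1]
    # Part of the secondary request that the primary task would "see".
    leak = sum(a * q for a, q in zip(j1, qdot_secondary))
    # Null-space projection: N q2 = q2 - J^+ (J q2), with N = I - J^+ J.
    projected = [q - p * leak for q, p in zip(qdot_secondary, j1_pinv)]
    # Primary contribution plus the projected secondary contribution.
    return [p * v1 + n for p, n in zip(j1_pinv, projected)]


j1 = [1.0, 2.0]
command = null_space_command(j1, 0.5, [3.0, -1.0])
# The primary task velocity J1 * qdot is unaffected by the secondary request.
print(sum(a * q for a, q in zip(j1, command)))
```

Whatever secondary request is passed in, the projected term lies in the null space of `j1`, so the primary task velocity stays at `v1`; this is the sense in which the lower-priority task does not interfere.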
The Curse of Unrolling: Rate of Differentiating Through Optimization
Computing the Jacobian of the solution of an optimization problem is a central problem in machine learning, with applications in hyperparameter optimization, meta-learning, optimization as a layer, and dataset distillation, to name a few. Unrolled differentiation is a popular heuristic that approximates the solution using an iterative solver and differentiates it through the computational path. This work provides a non-asymptotic convergence-rate analysis of this approach on quadratic objectives for gradient descent and the Chebyshev method. We show that to ensure convergence of the Jacobian, we can either 1) choose a large learning rate leading to a fast asymptotic convergence but accept that the algorithm may have an arbitrarily long burn-in phase or 2) choose a smaller learning rate leading to an immediate but slower convergence. We refer to this phenomenon as the curse of unrolling. Finally, we discuss open problems relative to this approach, such as deriving a practical update rule for the optimal unrolling strategy and making novel connections with the field of Sobolev orthogonal polynomials.
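To make unrolled differentiation concrete, here is a minimal sketch on a one-dimensional quadratic, f(x) = 0.5 * theta * x^2 - b * x, whose minimizer is x* = b / theta with derivative dx*/dtheta = -b / theta^2. The objective, the function name `unrolled_solution_and_jacobian`, and the parameter values are assumptions chosen for illustration; they are not taken from the paper, and a scalar toy problem cannot reproduce its full analysis.

```python
def unrolled_solution_and_jacobian(theta, b, lr, steps, x0=0.0):
    """Run gradient descent on f(x) = 0.5*theta*x**2 - b*x and carry
    dx_t/dtheta along with the iterate (forward-mode unrolling)."""
    x, jac = x0, 0.0
    for _ in range(steps):
        # Derivative of the update x - lr*(theta*x - b) w.r.t. theta,
        # evaluated with the current (not yet updated) iterate x.
        jac = jac - lr * (x + theta * jac)
        # Plain gradient descent step on the quadratic.
        x = x - lr * (theta * x - b)
    return x, jac


theta, b = 2.0, 1.0
exact_jacobian = -b / theta**2
for lr in (0.25, 0.9):
    for steps in (1, 5, 20, 80):
        _, jac = unrolled_solution_and_jacobian(theta, b, lr, steps)
        # Error of the unrolled derivative against the closed form.
        print(lr, steps, abs(jac - exact_jacobian))
```

Printing the error for several step counts and learning rates gives a feel for how the unrolled derivative approaches the exact one; the derivative recursion depends on the iterate itself, which is why its error need not shrink at the same pace as the error of the iterate.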
Differentiating Policies for Non-Myopic Bayesian Optimization
Nwankwo, Darian, Bindel, David
Bayesian optimization (BO) methods choose sample points by optimizing an acquisition function derived from a statistical model of the objective. These acquisition functions are chosen to balance sampling regions with predicted good objective values against exploring regions where the objective is uncertain. Standard acquisition functions are myopic, considering only the impact of the next sample, but non-myopic acquisition functions may be more effective. In principle, one could model the sampling by a Markov decision process, and optimally choose the next sample by maximizing an expected reward computed by dynamic programming; however, this is infeasibly expensive. More practical approaches, such as rollout, consider a parametric family of sampling policies. In this paper, we show how to efficiently estimate rollout acquisition functions and their gradients, enabling stochastic gradient-based optimization of sampling policies.
- North America > United States > New York > Tompkins County > Ithaca (0.04)
- North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
D-Flow: Differentiating through Flows for Controlled Generation
Ben-Hamu, Heli, Puny, Omri, Gat, Itai, Karrer, Brian, Singer, Uriel, Lipman, Yaron
Taming the generation outcome of state-of-the-art Diffusion and Flow-Matching (FM) models without having to re-train a task-specific model unlocks a powerful tool for solving inverse problems, conditional generation, and controlled generation in general. In this work we introduce D-Flow, a simple framework for controlling the generation process by differentiating through the flow, optimizing for the source (noise) point. We motivate this framework by our key observation that, for Diffusion/FM models trained with Gaussian probability paths, differentiating through the generation process projects the gradient onto the data manifold, implicitly injecting the prior into the optimization process. We validate our framework on linear and non-linear controlled generation problems, including image and audio inverse problems and conditional molecule generation, reaching state-of-the-art performance across all of them.
- North America > United States > California (0.04)
- Asia > Middle East > Israel (0.04)
Implicit Bilevel Optimization: Differentiating through Bilevel Optimization Programming
Bilevel Optimization Programming is used to model complex and conflicting interactions between agents, for example in Robust AI or Privacy-preserving AI. Integrating bilevel mathematical programming within deep learning is thus an essential objective for the Machine Learning community. Previously proposed approaches only consider single-level programming. In this paper, we extend existing single-level optimization programming approaches and propose Differentiating through Bilevel Optimization Programming (BiGrad) for end-to-end learning of models that use Bilevel Programming as a layer. BiGrad has wide applicability and can be used in modern machine learning frameworks. BiGrad is applicable to both continuous and combinatorial Bilevel optimization problems. We describe a class of gradient estimators for the combinatorial case that reduces the requirements in terms of computational complexity; for the continuous case, the gradient computation takes advantage of the push-back approach (i.e., the vector-Jacobian product) for an efficient implementation. Experiments show that BiGrad successfully extends existing single-level approaches to Bilevel Programming.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Germany > Baden-Württemberg > Karlsruhe Region > Heidelberg (0.04)
- Government (0.47)
- Transportation (0.46)
Rotation Equivariant Operators for Machine Learning on Scalar and Vector Fields
Shen, Paul, Herbst, Michael, Viswanathan, Venkat
We develop theory and software for rotation equivariant operators on scalar and vector fields, with diverse applications in simulation, optimization and machine learning. Rotation equivariance (covariance) means all fields in the system rotate together, implying spatially invariant dynamics that preserve symmetry. Extending the convolution theorems of linear time invariant systems, we theorize that linear equivariant operators are characterized by tensor field convolutions using an appropriate product between the input field and a radially symmetric kernel field. Most Green's functions and differential operators are in fact equivariant operators, which can also fit unknown symmetry preserving dynamics by parameterizing the radial function. We implement the Julia package EquivariantOperators.jl for fully differentiable finite difference equivariant operators on scalar, vector and higher order tensor fields in 2d/3d. It can run forwards for simulation or image processing, or be back propagated for computer vision, inverse problems and optimal control. Code at https://aced-differentiate.github.io/EquivariantOperators.jl/
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.05)
- North America > United States > Gulf of Mexico > Central GOM (0.05)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.05)
How IBM Research Is Differentiating Its Hybrid Cloud Platform with AI
AI has already begun to automate many non-mission critical business processes, including aspects of customer service and human resources. As the technology advances, new opportunities continue to emerge, in particular AI's ability to automate the movement to, and management of mission-critical workloads on hybrid cloud environments. Many businesses--especially those in highly regulated industries such as telecom, financial services and healthcare--are hesitant to move mission-critical workloads to the cloud. In fact, data from multiple sources reveals that only 20 percent of all workloads have moved to the cloud. Businesses further along in their journey understand the benefits of cloud use and often have already turned to the cloud for non-mission critical workloads. The accelerated proliferation of mission-critical applications--combined with the fact that more than 70 percent of organizations using public cloud are working with multiple vendors--means companies must approach the migration of these applications to a hybrid cloud environment using a four-phased approach: advise, move, build and manage.
Differentiating through the Fréchet Mean
Lou, Aaron, Katsman, Isay, Jiang, Qingxuan, Belongie, Serge, Lim, Ser-Nam, De Sa, Christopher
Recent advances in deep representation learning on Riemannian manifolds extend classical deep learning operations to better capture the geometry of the manifold. One possible extension is the Fréchet mean, the generalization of the Euclidean mean; however, it has been difficult to apply because it lacks a closed form with an easily computable derivative. In this paper, we show how to differentiate through the Fréchet mean for arbitrary Riemannian manifolds. Then, focusing on hyperbolic space, we derive explicit gradient expressions and a fast, accurate, and hyperparameter-free Fréchet mean solver. This fully integrates the Fréchet mean into the hyperbolic neural network pipeline. To demonstrate this integration, we present two case studies. First, we apply our Fréchet mean to the existing Hyperbolic Graph Convolutional Network, replacing its projected aggregation to obtain state-of-the-art results on datasets with high hyperbolicity. Second, to demonstrate the Fréchet mean's capacity to generalize Euclidean neural network operations, we develop a hyperbolic batch normalization method that gives an improvement parallel to the one observed in the Euclidean setting.
- North America > United States > New York (0.04)
- Europe > Netherlands > South Holland > Dordrecht (0.04)
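The abstract above describes the Fréchet mean as the generalization of the Euclidean mean: the point that minimizes the mean squared geodesic distance to the samples. The sketch below illustrates that definition on the unit circle rather than on hyperbolic space, which is a deliberately simpler manifold swapped in for illustration. It is not the solver from the paper; the function names `wrap` and `circle_frechet_mean` are hypothetical, and the iteration only finds a local minimizer that depends on the starting point.

```python
import math


def wrap(angle):
    """Map an angle to the principal range [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def circle_frechet_mean(angles, steps=100, lr=1.0):
    """Minimise the mean squared geodesic distance on the unit circle
    by Riemannian gradient descent, starting from the first sample."""
    mu = angles[0]
    for _ in range(steps):
        # Log map on the circle: signed shortest arc from mu to each point.
        step = sum(wrap(a - mu) for a in angles) / len(angles)
        # Move along the circle and wrap back to the principal range.
        mu = wrap(mu + lr * step)
    return mu


samples = [3.0, -3.0]  # two points close to the angle pi
print(sum(samples) / len(samples))   # arithmetic mean of the raw angles
print(circle_frechet_mean(samples))  # mean that respects the circle's geometry
```

The two printed values differ because the arithmetic mean treats the angles as points on a line, while the Fréchet mean measures distance along the circle itself.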
Differentiating the Black-Box: Optimization with Local Generative Surrogates
Shirobokov, Sergey, Belavin, Vladislav, Kagan, Michael, Ustyuzhanin, Andrey, Baydin, Atılım Güneş
We propose a novel method for gradient-based optimization of black-box simulators using differentiable local surrogate models. In fields such as physics and engineering, many processes are modeled with non-differentiable simulators with intractable likelihoods. Optimization of these forward models is particularly challenging, especially when the simulator is stochastic. To address such cases, we introduce the use of deep generative models to iteratively approximate the simulator in local neighborhoods of the parameter space. We demonstrate that these local surrogates can be used to approximate the gradient of the simulator, and thus enable gradient-based optimization of simulator parameters. In cases where the dependence of the simulator on the parameter space is constrained to a low dimensional submanifold, we observe that our method attains minima faster than all baseline methods, including Bayesian optimization, numerical optimization, and REINFORCE driven approaches.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- North America > United States > Virginia > Arlington County > Arlington (0.04)
- North America > United States > California > San Mateo County > Menlo Park (0.04)
Mastering Backpropagation in Neural Network
In this article, we are going to learn one of the most important machine learning algorithms, backpropagation in neural networks, in the simplest way possible. Let's build some intuition for how backpropagation works. Think of a situation where we are playing against an elite grandmaster chess player. We are badly defeated, but the grandmaster allows us to undo our moves and correct the errors we made during the game. After going back through all the previous moves, we have corrected most of our errors.