Goto

Collaborating Authors

 gradient approximation



Bridging Discrete and Backpropagation: Straight-Through and Beyond Liyuan Liu Chengyu Dong Xiaodong Liu Bin Y u Jianfeng Gao Microsoft Research

Neural Information Processing Systems

Backpropagation, the cornerstone of deep learning, is limited to computing gradients for continuous variables. This limitation poses challenges for problems involving discrete latent variables. To address this issue, we propose a novel approach to approximate the gradient of parameters involved in generating discrete latent variables. First, we examine the widely used Straight-Through (ST) heuristic and demonstrate that it works as a first-order approximation of the gradient. Guided by our findings, we propose ReinMax, which achieves second-order accuracy by integrating Heun's method, a second-order numerical method for solving ODEs. ReinMax does not require Hessian or other second-order derivatives, thus having negligible computation overheads. Extensive experimental results on various tasks demonstrate the superiority of ReinMax over the state of the art.








Low-rank surrogate modeling and stochastic zero-order optimization for training of neural networks with black-box layers

Chertkov, Andrei, Basharin, Artem, Saygin, Mikhail, Frolov, Evgeny, Straupe, Stanislav, Oseledets, Ivan

arXiv.org Artificial Intelligence

The growing demand for energy-efficient, high-performance AI systems has led to increased attention on alternative computing platforms (e.g., photonic, neuromorphic) due to their potential to accelerate learning and inference. However, integrating such physical components into deep learning pipelines remains challenging, as physical devices often offer limited expressiveness, and their non-differentiable nature renders on-device backpropagation difficult or infeasible. This motivates the development of hybrid architectures that combine digital neural networks with reconfigurable physical layers, which effectively behave as black boxes. In this work, we present a framework for the end-to-end training of such hybrid networks. This framework integrates stochastic zeroth-order optimization for updating the physical layer's internal parameters with a dynamic low-rank surrogate model that enables gradient propagation through the physical layer. A key component of our approach is the implicit projector-splitting integrator algorithm, which updates the lightweight surrogate model after each forward pass with minimal hardware queries, thereby avoiding costly full matrix reconstruction. We demonstrate our method across diverse deep learning tasks, including: computer vision, audio classification, and language modeling. Notably, across all modalities, the proposed approach achieves near-digital baseline accuracy and consistently enables effective end-to-end training of hybrid models incorporating various non-differentiable physical components (spatial light modulators, microring resonators, and Mach-Zehnder interferometers). This work bridges hardware-aware deep learning and gradient-free optimization, thereby offering a practical pathway for integrating non-differentiable physical components into scalable, end-to-end trainable AI systems.


WoSNN: Stochastic Solver for PDEs with Machine Learning

Song, Silei, Fahim, Arash, Mascagni, Michael

arXiv.org Artificial Intelligence

Solving elliptic partial differential equations (PDEs) is a fundamental step in various scientific and engineering studies. As a classic stochastic solver, the Walk-on-Spheres (WoS) method is a well-established and efficient algorithm that provides accurate local estimates for PDEs. In this paper, by integrating machine learning techniques with WoS and space discretization approaches, we develop a novel stochastic solver, WoS-NN. This new method solves elliptic problems with Dirichlet boundary conditions, facilitating precise and rapid global solutions and gradient approximations. The method inherits excellent characteristics from the original WoS method, such as being meshless and robust to irregular regions. By integrating neural networks, WoS-NN also gives instant local predictions after training without re-sampling, which is especially suitable for intense requests on a static region. A typical experimental result demonstrates that the proposed WoS-NN method provides accurate field estimations, reducing errors by around $75\%$ while using only $8\%$ of path samples compared to the conventional WoS method, which saves abundant computational time and resource consumption.