Not enough data to create a plot.
Try a different view from the menu above.
Shi, Wenlei
Flaming-hot Initiation with Regular Execution Sampling for Large Language Models
Chen, Weizhe, Zhang, Zhicheng, Liu, Guanlin, Zheng, Renjie, Shi, Wenlei, Dun, Chen, Wu, Zheng, Jin, Xing, Yan, Lin
Since the release of ChatGPT, large language models (LLMs) have demonstrated remarkable capabilities across various domains. A key challenge in developing these general capabilities is efficiently sourcing diverse, high-quality data. This becomes especially critical in reasoning-related tasks with sandbox checkers, such as math or code, where the goal is to generate correct solutions to specific problems with higher probability. In this work, we introduce Flaming-hot Initiation with Regular Execution (FIRE) sampling, a simple yet highly effective method to efficiently find good responses. Our empirical findings show that FIRE sampling enhances inference-time generation quality and also benefits training in the alignment stage. Furthermore, we explore how FIRE sampling improves performance by promoting diversity and analyze the impact of employing FIRE at different positions within a response.
Process Supervision-Guided Policy Optimization for Code Generation
Dai, Ning, Wu, Zheng, Zheng, Renjie, Wei, Ziyun, Shi, Wenlei, Jin, Xing, Liu, Guanlin, Dun, Chen, Huang, Liang, Yan, Lin
Reinforcement Learning (RL) with unit test feedback has enhanced large language models (LLMs) code generation, but relies on sparse rewards provided only after complete code evaluation, limiting learning efficiency and incremental improvements. When generated code fails all unit tests, no learning signal is received, hindering progress on complex tasks. To address this, we propose a Process Reward Model (PRM) that delivers dense, line-level feedback on code correctness during generation, mimicking human code refinement and providing immediate guidance. We explore various strategies for training PRMs and integrating them into the RL framework, finding that using PRMs both as dense rewards and for value function initialization significantly boosts performance. Our approach increases our in-house LLM's pass rate from 28.2% to 29.8% on LiveCodeBench and from 31.8% to 35.8% on our internal benchmark. Our experimental results highlight the effectiveness of PRMs in enhancing RL-driven code generation, especially for long-horizon scenarios.
LordNet: An Efficient Neural Network for Learning to Solve Parametric Partial Differential Equations without Simulated Data
Huang, Xinquan, Shi, Wenlei, Gao, Xiaotian, Wei, Xinran, Zhang, Jia, Bian, Jiang, Yang, Mao, Liu, Tie-Yan
Neural operators, as a powerful approximation to the non-linear operators between infinite-dimensional function spaces, have proved to be promising in accelerating the solution of partial differential equations (PDE). However, it requires a large amount of simulated data, which can be costly to collect. This can be avoided by learning physics from the physics-constrained loss, which we refer to it as mean squared residual (MSR) loss constructed by the discretized PDE. We investigate the physical information in the MSR loss, which we called long-range entanglements, and identify the challenge that the neural network requires the capacity to model the long-range entanglements in the spatial domain of the PDE, whose patterns vary in different PDEs. To tackle the challenge, we propose LordNet, a tunable and efficient neural network for modeling various entanglements. Inspired by the traditional solvers, LordNet models the long-range entanglements with a series of matrix multiplications, which can be seen as the low-rank approximation to the general fully-connected layers and extracts the dominant pattern with reduced computational cost. The experiments on solving Poisson's equation and (2D and 3D) Navier-Stokes equation demonstrate that the long-range entanglements from the MSR loss can be well modeled by the LordNet, yielding better accuracy and generalization ability than other neural networks. The results show that the Lordnet can be $40\times$ faster than traditional PDE solvers. In addition, LordNet outperforms other modern neural network architectures in accuracy and efficiency with the smallest parameter size.
Monte Carlo Neural PDE Solver for Learning PDEs via Probabilistic Representation
Zhang, Rui, Meng, Qi, Zhu, Rongchan, Wang, Yue, Shi, Wenlei, Zhang, Shihua, Ma, Zhi-Ming, Liu, Tie-Yan
In scenarios with limited available or high-quality data, training the function-to-function neural PDE solver in an unsupervised manner is essential. However, the efficiency and accuracy of existing methods are constrained by the properties of numerical algorithms, such as finite difference and pseudo-spectral methods, integrated during the training stage. These methods necessitate careful spatiotemporal discretization to achieve reasonable accuracy, leading to significant computational challenges and inaccurate simulations, particularly in cases with substantial spatiotemporal variations. To address these limitations, we propose the Monte Carlo Neural PDE Solver (MCNP Solver) for training unsupervised neural solvers via the PDEs' probabilistic representation, which regards macroscopic phenomena as ensembles of random particles. Compared to other unsupervised methods, MCNP Solver naturally inherits the advantages of the Monte Carlo method, which is robust against spatiotemporal variations and can tolerate coarse step size. In simulating the random walk of particles, we employ Heun's method for the convection process and calculate the expectation via the probability density function of neighbouring grid points during the diffusion process. These techniques enhance accuracy and circumvent the computational memory and time issues associated with Monte Carlo sampling, offering an improvement over traditional Monte Carlo methods. Our numerical experiments on convection-diffusion, Allen-Cahn, and Navier-Stokes equations demonstrate significant improvements in accuracy and efficiency compared to other unsupervised baselines. The source code will be publicly available at: https://github.com/optray/MCNP.
NeuralStagger: Accelerating Physics-constrained Neural PDE Solver with Spatial-temporal Decomposition
Huang, Xinquan, Shi, Wenlei, Meng, Qi, Wang, Yue, Gao, Xiaotian, Zhang, Jia, Liu, Tie-Yan
Neural networks have shown great potential in accelerating the solution of partial differential equations (PDEs). Recently, there has been a growing interest in introducing physics constraints into training neural PDE solvers to reduce the use of costly data and improve the generalization ability. However, these physics constraints, based on certain finite dimensional approximations over the function space, must resolve the smallest scaled physics to ensure the accuracy and stability of the simulation, resulting in high computational costs from large input, output, and neural networks. This paper proposes a general acceleration methodology called NeuralStagger by spatially and temporally decomposing the original learning tasks into several coarser-resolution subtasks. We define a coarse-resolution neural solver for each subtask, which requires fewer computational resources, and jointly train them with the vanilla physics-constrained loss by simply arranging their outputs to reconstruct the original solution. Due to the perfect parallelism between them, the solution is achieved as fast as a coarse-resolution neural solver. In addition, the trained solvers bring the flexibility of simulating with multiple levels of resolution. We demonstrate the successful application of NeuralStagger on 2D and 3D fluid dynamics simulations, which leads to an additional $10\sim100\times$ speed-up. Moreover, the experiment also shows that the learned model could be well used for optimal control.
Learning Physics-Informed Neural Networks without Stacked Back-propagation
He, Di, Li, Shanda, Shi, Wenlei, Gao, Xiaotian, Zhang, Jia, Bian, Jiang, Wang, Liwei, Liu, Tie-Yan
Physics-Informed Neural Network (PINN) has become a commonly used machine learning approach to solve partial differential equations (PDE). But, facing high-dimensional secondorder PDE problems, PINN will suffer from severe scalability issues since its loss includes second-order derivatives, the computational cost of which will grow along with the dimension during stacked back-propagation. In this work, we develop a novel approach that can significantly accelerate the training of Physics-Informed Neural Networks. In particular, we parameterize the PDE solution by the Gaussian smoothed model and show that, derived from Stein's Identity, the second-order derivatives can be efficiently calculated without back-propagation. We further discuss the model capacity and provide variance reduction methods to address key limitations in the derivative estimation. Experimental results show that our proposed method can achieve competitive error compared to standard PINN training but is significantly faster. Our code is released at https://github.com/LithiumDA/PINN-without-Stacked-BP.
Cooperative Policy Learning with Pre-trained Heterogeneous Observation Representations
Shi, Wenlei, Wei, Xinran, Zhang, Jia, Ni, Xiaoyuan, Jiang, Arthur, Bian, Jiang, Liu, Tie-Yan
Multi-agent reinforcement learning (MARL) has been increasingly explored to learn the cooperative policy towards maximizing a certain global reward. Many existing studies take advantage of graph neural networks (GNN) in MARL to propagate critical collaborative information over the interaction graph, built upon inter-connected agents. Nevertheless, the vanilla GNN approach yields substantial defects in dealing with complex real-world scenarios since the generic message passing mechanism is ineffective between heterogeneous vertices and, moreover, simple message aggregation functions are incapable of accurately modeling the combinational interactions from multiple neighbors. While adopting complex GNN models with more informative message passing and aggregation mechanisms can obviously benefit heterogeneous vertex representations and cooperative policy learning, it could, on the other hand, increase the training difficulty of MARL and demand more intense and direct reward signals compared to the original global reward. To address these challenges, we propose a new cooperative learning framework with pre-trained heterogeneous observation representations. Particularly, we employ an encoder-decoder based graph attention to learn the intricate interactions and heterogeneous representations that can be more easily leveraged by MARL. Moreover, we design a pre-training with local actor-critic algorithm to ease the difficulty in cooperative policy learning. Extensive experiments over real-world scenarios demonstrate that our new approach can significantly outperform existing MARL baselines as well as operational research solutions that are widely-used in industry.