A Experiment Details

Neural Information Processing Systems

Source code for the training pipeline, tasks, and models used in this work is available as part of the supplementary material. We used the Adam [48] optimizer for all our experiments, with a learning rate of 0.001 and a batch size of 128. For solving the differential equations, both during ground-truth data generation and with the neural ODEs, we use the Tsitouras 5/4 Runge-Kutta (Tsit5) method from DifferentialEquations.jl [36].

A.1 Coupled Pendulum

The coupled pendulum dynamics are defined by the system's equations of motion (not reproduced here). We train the MP-NODE on a dataset of 500 trajectories, each randomly initialized with state values in [-π/2, π/2] for θ and in [-1, 1] for θ̇, with a time step of 0.1 s and each trajectory 10 s long. The dataset is normalized through Z-score normalization.
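As a rough illustration of this data-generation setup, the Julia sketch below integrates a coupled pendulum with Tsit5 from DifferentialEquations.jl, samples a 10 s trajectory at 0.1 s steps, and Z-score normalizes it. The dynamics function, parameter values, and names (coupled_pendulum!, gL, k) are illustrative assumptions, not the paper's exact equations or code.

    using DifferentialEquations, Statistics

    # Assumed dynamics: two pendula coupled by a torsional spring.
    # State u = [θ1, θ2, ω1, ω2]; parameters p = (g/L, coupling stiffness k).
    function coupled_pendulum!(du, u, p, t)
        θ1, θ2, ω1, ω2 = u
        gL, k = p
        du[1] = ω1
        du[2] = ω2
        du[3] = -gL * sin(θ1) - k * (θ1 - θ2)
        du[4] = -gL * sin(θ2) - k * (θ2 - θ1)
    end

    # Random initial state: angles in [-π/2, π/2], angular velocities in [-1, 1].
    u0 = vcat((rand(2) .- 0.5) .* π, (rand(2) .- 0.5) .* 2.0)

    prob = ODEProblem(coupled_pendulum!, u0, (0.0, 10.0), (9.81, 0.5))
    sol  = solve(prob, Tsit5(), saveat=0.1)        # 10 s trajectory sampled every 0.1 s

    X  = Array(sol)                                # 4 x 101 matrix of states over time
    Xn = (X .- mean(X, dims=2)) ./ std(X, dims=2)  # Z-score normalization per state dimension

Repeating the initialization and solve 500 times (and stacking the results) reproduces the shape of the training set described above.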


Learning Modular Simulations for Homogeneous Systems

Neural Information Processing Systems

Complex systems are often decomposed into modular subsystems for engineering tractability. Although various equation-based, white-box modeling techniques make use of such structure, learning-based methods have yet to incorporate these ideas broadly. We present a modular simulation framework for modeling homogeneous multibody dynamical systems, which combines ideas from graph neural networks and neural differential equations. We learn to model each individual dynamical subsystem as a neural ODE module. Full simulation of the composite system is orchestrated via spatio-temporal message passing between these modules. An arbitrary number of modules can be combined to simulate systems with a wide variety of coupling topologies. We evaluate our framework on a variety of systems and show that message passing allows coordination between multiple modules over time for accurate predictions and, in certain cases, enables zero-shot generalization to new system configurations. Furthermore, we show that our models can be transferred to new system configurations with lower data requirements and training effort compared to models trained from scratch.
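A minimal sketch of this idea, assuming a single shared neural-ODE module whose drift takes its own 2-dimensional state together with a message averaged from its neighbors' states; the network size, message function, and chain coupling used here (node_net, adjacency, composite!) are illustrative assumptions, and the sketch covers only forward simulation, not training.

    using DifferentialEquations, Flux

    # One shared module network: (own state, aggregated neighbor message) -> state derivative.
    node_net = Chain(Dense(4 => 32, tanh), Dense(32 => 2))

    adjacency = [[2], [1, 3], [2]]        # assumed chain coupling of 3 identical subsystems

    # Composite drift: the same module is applied to every subsystem with its neighbors' messages.
    function composite!(du, u, p, t)
        states = reshape(u, 2, :)                           # 2 x n_modules
        for i in 1:size(states, 2)
            msg = sum(states[:, j] for j in adjacency[i]) ./ length(adjacency[i])
            du[2i-1:2i] .= node_net(vcat(states[:, i], msg))
        end
    end

    u0  = 0.1f0 .* randn(Float32, 6)                        # 3 modules x 2 state dims
    sol = solve(ODEProblem(composite!, u0, (0.0f0, 1.0f0)), Tsit5(), saveat=0.1f0)

Because every subsystem reuses the same node_net, adding or removing entries in adjacency changes the coupling topology without retraining the module, which is the property the abstract describes as transfer to new system configurations.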


Scaling transformer neural networks for skillful and reliable medium-range weather forecasting

Neural Information Processing Systems

Weather forecasting is a fundamental problem for anticipating and mitigating the impacts of climate change. Recently, data-driven approaches for weather forecasting based on deep learning have shown great promise, achieving accuracies that are competitive with operational systems. However, those methods often employ complex, customized architectures without sufficient ablation analysis, making it difficult to understand what truly contributes to their success. Here we introduce Stormer, a simple transformer model that achieves state-of-the-art performance on weather forecasting with minimal changes to the standard transformer backbone. We identify the key components of Stormer through careful empirical analyses, including weather-specific embedding, randomized dynamics forecast, and pressure-weighted loss.


Global Convergence to Local Minmax Equilibrium in Classes of Nonconvex Zero-Sum Games

Neural Information Processing Systems

We study gradient descent-ascent learning dynamics with timescale separation (τ-GDA) in unconstrained continuous-action zero-sum games where the minimizing player faces a nonconvex optimization problem and the maximizing player optimizes a Polyak-Łojasiewicz (PŁ) or strongly-concave (SC) objective. In contrast to past work on gradient-based learning in nonconvex-PŁ/SC zero-sum games, we assess convergence in relation to natural game-theoretic equilibria instead of only notions of stationarity. In pursuit of this goal, we prove that the only locally stable points of the τ-GDA continuous-time limiting system correspond to strict local minmax equilibria in each class of games. For these classes of games, we exploit timescale separation to construct a potential function that, when combined with the stability characterization and an asymptotic saddle-avoidance result, gives a global asymptotic almost-sure convergence guarantee for the discrete-time gradient descent-ascent update to the set of strict local minmax equilibria. Moreover, we provide convergence rates for the gradient descent-ascent dynamics with timescale separation to approximate stationary points.
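For reference, gradient descent-ascent with timescale separation is usually written in the following form; the notation here is an assumption for illustration rather than a quotation of the paper:

    \[
    x_{k+1} = x_k - \eta \, \nabla_x f(x_k, y_k), \qquad
    y_{k+1} = y_k + \tau \eta \, \nabla_y f(x_k, y_k),
    \]

where f is the zero-sum objective (minimized in x, maximized in y), η > 0 is the step size, and τ ≥ 1 is the timescale-separation ratio, with the maximizing (PŁ/SC) player taken to run on the faster timescale.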


Generalized Block-Diagonal Structure Pursuit: Learning Soft Latent Task Assignment against Negative Transfer

Neural Information Processing Systems

In multi-task learning, a major challenge stems from a notorious issue known as negative transfer, which refers to the phenomenon that sharing knowledge with dissimilar and hard tasks often results in worsened performance. To circumvent this issue, we propose a novel multi-task learning method which simultaneously learns latent task representations and a block-diagonal Latent Task Assignment Matrix (LTAM).



Appendix for Multi-task Graph Neural Architecture Search with Task-aware Collaboration and Curriculum

Neural Information Processing Systems

Notation used in this appendix:

    o    An operation
    w    Model weight
    α    The architecture parameter
    N    The number of chunks
    θ    The trainable parameter in the soft task-collaborative module
    p    The parameter generated by Eq.(9)
    p̄    The parameter generated by Eq.(11), replacing p during curriculum training
    δ    The parameter to control graph structure diversity
    γ    The parameter to control task-wise curriculum training

BNRist is the abbreviation of Beijing National Research Center for Information Science and Technology.

Here we provide the detailed derivation process of Eq.(10), in which Eq.(9) is used for substitution.

We consider a search space of standard layer-by-layer architectures without sophisticated connections such as residual or jumping connections, though our proposed method can be easily generalized. We choose six widely used message-passing GNN layers as our operation candidate set O: GCN [4], GAT [9], GIN [10], SAGE [2], k-GNN [5], and ARMA [3]. In addition, we adopt an MLP, which does not consider graph structure.
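As context for how the architecture parameter α typically interacts with the candidate set O in differentiable architecture search, the standard continuous relaxation takes the following form; the paper's exact parameterization (for example, how the soft task-collaborative module enters) may differ, so treat this as an illustrative assumption:

    \[
    \bar{o}^{(l)}(x) = \sum_{o \in O} \frac{\exp\big(\alpha^{(l)}_{o}\big)}{\sum_{o' \in O} \exp\big(\alpha^{(l)}_{o'}\big)} \, o(x),
    \]

where each layer l mixes the candidate operations (GCN, GAT, GIN, SAGE, k-GNN, ARMA, MLP) with softmax weights over α, and a discrete architecture is recovered by keeping the highest-weight operation per layer.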


Multi-task Graph Neural Architecture Search with Task-aware Collaboration and Curriculum

Neural Information Processing Systems

Graph neural architecture search (GraphNAS) has shown great potential for automatically designing graph neural architectures for graph-related tasks. However, multi-task GraphNAS, capable of handling multiple tasks simultaneously and capturing the complex relationships and dependencies between them, has been largely unexplored in the literature.


Practical Two-Step Lookahead Bayesian Optimization

Neural Information Processing Systems

Expected improvement and other acquisition functions widely used in Bayesian optimization use a "one-step" assumption: they value objective function evaluations assuming no future evaluations will be performed. Because we usually evaluate over multiple steps, this assumption may leave substantial room for improvement. Existing theory gives acquisition functions that look multiple steps into the future, but calculating them requires solving a high-dimensional continuous-state, continuous-action Markov decision process (MDP). Fast exact solutions of this MDP remain out of reach of today's methods. As a result, previous two- and multi-step lookahead Bayesian optimization algorithms are either too expensive to implement in most practical settings or resort to heuristics that may fail to fully realize the promise of two-step lookahead. This paper proposes a computationally efficient algorithm that provides an accurate solution to the two-step lookahead Bayesian optimization problem in seconds to at most several minutes of computation per batch of evaluations. The resulting acquisition function provides increased query efficiency and robustness compared with previous two- and multi-step lookahead methods in both single-threaded and batch experiments. This unlocks the value of two-step lookahead in practice. We demonstrate the value of our algorithm with extensive experiments on synthetic test functions and real-world problems.
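To make the one-step versus two-step distinction concrete, the two-step acquisition generally takes the following nested form; the notation is assumed here for illustration rather than copied from the paper:

    \[
    \alpha_2(x) = \mathrm{EI}_n(x) + \mathbb{E}_{y}\!\left[\, \max_{x'} \mathrm{EI}_{n+1}(x' \mid x, y) \,\right],
    \]

where EI_n is the expected improvement under the posterior after n evaluations, y is the (random) outcome of evaluating at x, and EI_{n+1} is the expected improvement after additionally conditioning on (x, y). The inner maximization and the outer expectation over y are what make exact computation a step of an intractable continuous-state, continuous-action MDP, and they are the target of the efficient approximation the abstract describes.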