
Supplementary material for Variational Automatic Curriculum Learning for Sparse-Reward Cooperative Multi-Agent Problems

Neural Information Processing Systems

All the source code can be found at our project website https://sites.google.com/view/

In order to prove Theorem 1, we introduce the following lemma, which uses Assumption 1. Lemma 1. The proof is largely based on [2].

Pure Task Expansion Results on MPE: The VACL results in Figure 1 include the entity progression module. To specifically study the performance of task expansion, we exclude the entity progression module from VACL and compare it with the baselines in Simple-Spread with n = 4 and Push-Ball with n = 2.



Consistency Diffusion Bridge Models

Neural Information Processing Systems

Diffusion models (DMs) have become the dominant paradigm of generative modeling in a variety of domains by learning stochastic processes from noise to data. Recently, diffusion denoising bridge models (DDBMs), a new formulation of generative modeling that builds stochastic processes between fixed data endpoints based on a reference diffusion process, have achieved empirical success across tasks with coupled data distributions, such as image-to-image translation. However, DDBMs' sampling process typically requires hundreds of network evaluations to achieve decent performance, which may impede their practical deployment due to high computational demands. In this work, inspired by the recent advance of consistency models in DMs, we tackle this problem by learning the consistency function of the probability-flow ordinary differential equation (PF-ODE) of DDBMs, which directly predicts the solution at a starting step given any point on the ODE trajectory. Based on a dedicated general-form ODE solver, we propose two paradigms, consistency bridge distillation and consistency bridge training, which are flexible to apply to DDBMs with broad design choices. Experimental results show that our proposed method can sample 4× to 50× faster than the base DDBM and produce better visual quality given the same number of sampling steps in various tasks with pixel resolutions ranging from 64×64 to 256×256, as well as supporting downstream tasks such as semantic interpolation in the data space.
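The core property the abstract describes is self-consistency: a consistency function maps any point on a PF-ODE trajectory back to the solution at a fixed starting step. The following toy sketch (our own illustration, not the paper's model or the DDBM ODE) makes this concrete using the analytically solvable linear ODE dx/dt = -x, whose trajectories are x(t) = x(s)·exp(s - t):

```python
import math

def consistency_fn(x_t: float, t: float, s: float) -> float:
    """Map a point (x_t, t) on a trajectory of dx/dt = -x back to the
    solution at the starting step s. For this linear ODE the map is exact:
    x(s) = x(t) * exp(t - s)."""
    return x_t * math.exp(t - s)

# Two different points on the same trajectory (x(0) = 2.0):
x0 = 2.0
x_at_half = x0 * math.exp(-0.5)   # trajectory value at t = 0.5
x_at_one = x0 * math.exp(-1.0)    # trajectory value at t = 1.0

# Self-consistency: both points map back to the same starting value.
print(consistency_fn(x_at_half, 0.5, 0.0))  # ≈ 2.0
print(consistency_fn(x_at_one, 1.0, 0.0))   # ≈ 2.0
```

In consistency distillation or training, a neural network is fitted to satisfy this same property on the learned ODE, so that a single evaluation replaces many solver steps.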




Neuro-Symbolic Data Generation for Math Reasoning
Zenan Li, Zhi Zhou, Yuan Yao

Neural Information Processing Systems

A critical question about Large Language Models (LLMs) is whether their apparent deficiency in mathematical reasoning is inherent, or merely a result of insufficient exposure to high-quality mathematical data. To explore this, we developed an automated method for generating high-quality, supervised mathematical datasets. The method carefully mutates existing math problems, ensuring both diversity and validity of the newly generated problems. This is achieved by a neuro-symbolic data generation framework combining the intuitive informalization strengths of LLMs and the precise symbolic reasoning of math solvers, along with projected Markov chain Monte Carlo sampling in the highly irregular symbolic space. Experiments demonstrate the high quality of data generated by the proposed method, and that the LLMs, specifically LLaMA-2 and Mistral, when realigned with the generated data, surpass their state-of-the-art counterparts.
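The "projected" MCMC idea can be sketched in miniature: proposals that leave the valid symbolic space are projected back onto it before the standard Metropolis accept/reject step. The toy below is entirely our own illustration (the state space, constraint, and target are invented, not the paper's); it only shows the control flow of projection inside a sampler:

```python
import random

# Toy state space: integer pairs (a, b); validity constraint: a + b is even.
def is_valid(state):
    a, b = state
    return (a + b) % 2 == 0

def project(state):
    # Project an invalid proposal onto the constraint set (nudge b by 1).
    a, b = state
    return state if is_valid(state) else (a, b + 1)

def propose(state, rng):
    a, b = state
    return (a + rng.choice([-1, 0, 1]), b + rng.choice([-1, 0, 1]))

def target_score(state):
    # Unnormalized target favoring states near the origin.
    a, b = state
    return 1.0 / (1.0 + a * a + b * b)

def projected_mh(steps=1000, seed=0):
    rng = random.Random(seed)
    state = (0, 0)
    samples = []
    for _ in range(steps):
        cand = project(propose(state, rng))
        # Metropolis accept/reject on the projected candidate.
        if rng.random() < target_score(cand) / target_score(state):
            state = cand
        samples.append(state)
    return samples

samples = projected_mh()
assert all(is_valid(s) for s in samples)  # every kept sample is valid
```

In the paper's setting, the "projection" is the harder part: mapping a mutated problem back into the space of well-formed, solvable problems, which is where the symbolic solver comes in.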


Scalarization for Multi-Task and Multi-Domain Learning at Scale

Neural Information Processing Systems

Training a single model on multiple input domains and/or output tasks allows for compressing information from multiple sources into a unified backbone, hence improving model efficiency. It also enables potential positive knowledge transfer across tasks/domains, leading to improved accuracy and data-efficient training. However, optimizing such networks is a challenge, in particular due to discrepancies between the different tasks or domains. Despite several hypotheses and solutions proposed over the years, recent work has shown that uniform scalarization training, i.e., simply minimizing the average of the task losses, yields on-par performance with more costly SotA optimization methods. This raises the question of how well we understand the training dynamics of multi-task and multi-domain networks. In this work, we first devise a large-scale unified analysis of multi-domain and multi-task learning to better understand the dynamics of scalarization across varied task/domain combinations and model sizes. Following these insights, we then propose to leverage population-based training to efficiently search for the optimal scalarization weights when dealing with a large number of tasks or domains.
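Scalarization reduces a multi-task objective to a single weighted sum of per-task losses; uniform scalarization is the special case of equal weights. The sketch below (our own toy with two quadratic "task" losses, not the paper's code) shows the mechanism with plain gradient descent:

```python
import numpy as np

def task_losses(x):
    # Two toy tasks with conflicting optima at x = +1 and x = -1.
    return np.array([(x - 1.0) ** 2, (x + 1.0) ** 2])

def scalarized_grad(x, weights):
    # Gradient of the weighted sum of the task losses.
    grads = np.array([2.0 * (x - 1.0), 2.0 * (x + 1.0)])
    return float(weights @ grads)

def train(weights, lr=0.1, steps=200, x0=5.0):
    x = x0
    for _ in range(steps):
        x -= lr * scalarized_grad(x, weights)
    return x

# Uniform scalarization: equal weights; the optimum balances both tasks.
x_star = train(np.array([0.5, 0.5]))
print(round(x_star, 6))  # converges to 0.0, midway between the two task optima
```

Searching over the `weights` vector, e.g. with population-based training as the paper proposes, corresponds to trading the tasks off against each other along this weighted-sum family.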