Ctrl-X: Controlling Structure and Appearance for Text-To-Image Generation Without Guidance

Neural Information Processing Systems

Recent controllable generation approaches such as FreeControl [24] and Diffusion Self-Guidance [7] bring fine-grained spatial and appearance control to text-to-image (T2I) diffusion models without training auxiliary modules.


Oracle Inequalities for Model Selection in Offline Reinforcement Learning

Neural Information Processing Systems

In offline reinforcement learning (RL), a learner leverages prior logged data to learn a good policy without interacting with the environment. A major challenge in applying such methods in practice is the lack of both theoretically principled and practical tools for model selection and evaluation. To address this, we study the problem of model selection in offline RL with value function approximation. The learner is given a nested sequence of model classes to minimize squared Bellman error and must select among these to achieve a balance between approximation and estimation error of the classes. We propose the first model selection algorithm for offline RL that achieves minimax rate-optimal oracle inequalities up to logarithmic factors.
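To make the approximation/estimation trade-off concrete, here is a minimal sketch of a generic complexity-penalized selection rule (structural risk minimization) over a nested sequence of model classes. It is not the paper's algorithm: the penalty form c·d_k·log(n)/n, the function name `penalized_select`, and the example losses are illustrative assumptions.

```python
import math

def penalized_select(emp_losses, dims, n, c=1.0):
    """Return the index k minimizing empirical loss + c * d_k * log(n) / n.

    emp_losses[k]: empirical (e.g. squared Bellman) error of the best fit in class k.
    dims[k]: complexity of class k, nondecreasing along the nested sequence.
    The penalty form is an illustrative assumption, not the paper's bound.
    """
    scores = [loss + c * d * math.log(n) / n for loss, d in zip(emp_losses, dims)]
    return min(range(len(scores)), key=scores.__getitem__)

# Hypothetical losses: larger classes fit better (smaller approximation error)
# but pay a larger complexity penalty (larger estimation error).
losses = [1.0, 0.5, 0.30, 0.29, 0.285]
dims = [1, 2, 3, 4, 5]
k_small_n = penalized_select(losses, dims, n=100)
k_large_n = penalized_select(losses, dims, n=10**9)
```

With n = 100 the penalty stops the rule at the third class, whereas with a very large sample the penalty becomes negligible and the richest class wins.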


HiCo: Hierarchical Controllable Diffusion Model for Layout-to-image Generation

Neural Information Processing Systems

The task of layout-to-image generation involves synthesizing images based on the captions of objects and their spatial positions. Existing methods still struggle with complex layouts, where common failure cases include missing objects, inconsistent lighting, and conflicting view angles. To address these issues, we propose a Hierarchical Controllable (HiCo) diffusion model for layout-to-image generation, featuring an object-separable conditioning branch structure. Our key insight is to achieve spatial disentanglement through hierarchical modeling of layouts. We use a multi-branch structure to represent this hierarchy and aggregate the branches in a fusion module.
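A hypothetical sketch of how a fusion module might aggregate per-object branches: each branch's features are restricted to its object's layout box, overlapping objects are averaged, and uncovered pixels fall back to a global branch. The mask-weighted averaging, the box format (x0, y0, x1, y1), and the names `box_mask` and `fuse_branches` are assumptions for illustration, not HiCo's actual fusion module.

```python
import numpy as np

def box_mask(box, h, w):
    """Binary (H, W) mask for a box (x0, y0, x1, y1) with exclusive end coordinates."""
    x0, y0, x1, y1 = box
    m = np.zeros((h, w))
    m[y0:y1, x0:x1] = 1.0
    return m

def fuse_branches(global_feat, object_feats, boxes):
    """Hypothetical mask-weighted fusion of per-object branch features.

    global_feat: (H, W, C) features from a branch conditioned on the whole layout.
    object_feats: (K, H, W, C) features, one branch per object.
    boxes: K layout boxes (x0, y0, x1, y1).
    """
    h, w, _ = global_feat.shape
    masks = np.stack([box_mask(b, h, w) for b in boxes])      # (K, H, W)
    coverage = masks.sum(axis=0)                               # objects covering each pixel
    obj_sum = (masks[..., None] * object_feats).sum(axis=0)   # (H, W, C)
    covered = coverage[..., None] > 0
    # Average overlapping object branches; use the global branch where no box applies.
    return np.where(covered, obj_sum / np.maximum(coverage, 1.0)[..., None], global_feat)

# Two overlapping boxes on a 4x4 feature map with 2 channels.
g = np.zeros((4, 4, 2))
objs = np.stack([np.ones((4, 4, 2)), 2.0 * np.ones((4, 4, 2))])
fused = fuse_branches(g, objs, [(0, 0, 2, 2), (1, 1, 3, 3)])
```

Keeping each object in its own branch until the fusion step is one way to read "spatial disentanglement": an object's features only influence pixels inside its own region.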


Block Broyden's Methods for Solving Nonlinear Equations

Neural Information Processing Systems

This paper studies quasi-Newton methods for solving nonlinear equations. We propose block variants of both the good and bad Broyden's methods, which enjoy explicit local superlinear convergence rates. Our block good Broyden's method has a faster, condition-number-free convergence rate than existing Broyden's methods because it takes advantage of multiple-rank modifications of the Jacobian estimator. Our block bad Broyden's method, on the other hand, provably estimates the inverse of the Jacobian directly, which reduces the computational cost per iteration. Our theoretical results provide new insights into why the good Broyden's method outperforms the bad Broyden's method in most cases. The empirical results also demonstrate the superiority of our methods and validate our theoretical analysis.
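For reference, the classical rank-one good and bad Broyden updates that block variants generalize can be sketched as follows. This is the standard textbook form, not the paper's block method; the test system `F`, the starting point, and the tolerances are assumptions. The good method updates a Jacobian estimate B and solves a linear system at every step, while the bad method updates an inverse-Jacobian estimate H and needs only a matrix-vector product.

```python
import numpy as np

def broyden_good(F, x0, B0, tol=1e-10, max_iter=100):
    """Classical 'good' Broyden: rank-one update of a Jacobian estimate B."""
    x = np.asarray(x0, dtype=float)
    B = np.asarray(B0, dtype=float)
    Fx = F(x)
    for _ in range(max_iter):
        if np.linalg.norm(Fx) < tol:
            break
        s = np.linalg.solve(B, -Fx)          # quasi-Newton step: B s = -F(x)
        x_new = x + s
        F_new = F(x_new)
        y = F_new - Fx
        # Smallest (Frobenius) change to B satisfying the secant condition B s = y.
        B = B + np.outer(y - B @ s, s) / (s @ s)
        x, Fx = x_new, F_new
    return x

def broyden_bad(F, x0, H0, tol=1e-10, max_iter=100):
    """Classical 'bad' Broyden: rank-one update of an inverse-Jacobian estimate H."""
    x = np.asarray(x0, dtype=float)
    H = np.asarray(H0, dtype=float)
    Fx = F(x)
    for _ in range(max_iter):
        if np.linalg.norm(Fx) < tol:
            break
        s = -H @ Fx                           # no linear solve needed
        x_new = x + s
        F_new = F(x_new)
        y = F_new - Fx
        # Smallest change to H satisfying the inverse secant condition H y = s.
        H = H + np.outer(s - H @ y, y) / (y @ y)
        x, Fx = x_new, F_new
    return x

# Example system (an assumption): x0^2 + x1^2 = 4 and x0 = x1, root at (sqrt(2), sqrt(2)).
def F(x):
    return np.array([x[0] ** 2 + x[1] ** 2 - 4.0, x[0] - x[1]])

x_start = np.array([1.5, 1.2])
J_start = np.array([[3.0, 2.4], [1.0, -1.0]])   # exact Jacobian at x_start
x_good = broyden_good(F, x_start, J_start)
x_bad = broyden_bad(F, x_start, np.linalg.inv(J_start))
```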



Graph Structure Inference with BAM: Neural Dependency Processing via Bilinear Attention

Neural Information Processing Systems

Detecting dependencies among variables is a fundamental task across scientific disciplines. We propose a novel neural network model for graph structure inference, which aims to learn a mapping from observational data to the corresponding underlying dependence structures. The model is trained with variably shaped and coupled simulated input data and requires only a single forward pass through the trained network for inference. Central to our approach is a novel bilinear attention mechanism (BAM) operating on covariance matrices of transformed data while respecting the geometry of the manifold of symmetric positive definite (SPD) matrices. Inspired by graphical lasso methods, our model optimizes over continuous graph representations in the SPD space, where inverse covariance matrices encode conditional independence relations. Empirical evaluations demonstrate the robustness of our method in detecting diverse dependencies, excelling in undirected graph estimation and showing competitive performance in completed partially directed acyclic graph estimation via a novel two-step approach. The trained model effectively detects causal relationships and generalizes well across different functional forms of nonlinear dependencies.
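The graphical-lasso connection rests on a standard fact: for Gaussian data, a zero entry in the inverse covariance (precision) matrix means the two variables are conditionally independent given the others. The simulation below uses a chain X1 → X2 → X3 with coefficient 0.8 and unit noise (an illustrative assumption) to show that X1 and X3 are marginally correlated yet have near-zero partial correlation. It illustrates the target representation only, not the BAM model.

```python
import numpy as np

def partial_correlations(X):
    """Precision matrix and partial correlations from the sample covariance of X (n, p)."""
    prec = np.linalg.inv(np.cov(X, rowvar=False))
    d = np.sqrt(np.diag(prec))
    pc = -prec / np.outer(d, d)
    np.fill_diagonal(pc, 1.0)
    return prec, pc

rng = np.random.default_rng(0)
n = 200_000
x1 = rng.standard_normal(n)
x2 = 0.8 * x1 + rng.standard_normal(n)   # X1 -> X2
x3 = 0.8 * x2 + rng.standard_normal(n)   # X2 -> X3, no direct X1 -> X3 edge
X = np.column_stack([x1, x2, x3])

marginal = np.corrcoef(X, rowvar=False)
prec, pc = partial_correlations(X)
# marginal[0, 2] is clearly nonzero (about 0.45), while pc[0, 2] is close to 0:
# X1 and X3 are dependent, but conditionally independent given X2.
```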



Spiking Token Mixer: An Event-Driven Friendly Former Structure for Spiking Neural Networks

Shikuang Deng

Neural Information Processing Systems

Spiking neural networks (SNNs), inspired by biological processes, use spike signals for inter-layer communication, presenting an energy-efficient alternative to traditional neural networks. To realize the theoretical energy-efficiency advantages of SNNs, it is essential to deploy them on neuromorphic chips. On clock-driven synchronous chips, using fewer time steps can improve energy efficiency but degrades SNN performance. Compared with clock-driven synchronous chips, event-driven asynchronous chips achieve much lower energy consumption but support only a limited set of network operations. Recently, a series of SNN works have achieved strong results, significantly improving SNN performance. However, event-driven asynchronous chips do not support some of the proposed structures, which makes it impossible to deploy these SNNs on asynchronous hardware. In response, we propose the Spiking Token Mixer (STMixer) architecture, which consists exclusively of operations supported in asynchronous scenarios, including convolutional layers, fully connected layers, and residual paths. Our experiments also demonstrate that STMixer performs on par with spiking transformers in synchronous scenarios with very few timesteps, indicating that it can reach the same level of performance with lower power consumption in synchronous scenarios.
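To illustrate why spike-based computation can be cheap, here is a minimal sketch of a standard leaky integrate-and-fire (LIF) neuron unrolled over T timesteps, plus a linear layer driven by binary spikes. The decay `tau`, the `threshold`, the hard reset, and the function names are assumptions; this is a generic SNN building block, not the STMixer architecture. Because spikes are 0/1, the matrix-vector product reduces to summing selected weight columns, using additions only.

```python
import numpy as np

def lif_spikes(inputs, tau=0.5, threshold=1.0):
    """Simulate LIF neurons over T timesteps.

    inputs: (T, N) input current per timestep and neuron.
    Returns a (T, N) array of 0/1 spikes.
    """
    inputs = np.asarray(inputs, dtype=float)
    v = np.zeros(inputs.shape[1])          # membrane potentials
    spikes = np.zeros_like(inputs)
    for t in range(inputs.shape[0]):
        v = tau * v + inputs[t]            # leaky integration
        fired = v >= threshold
        spikes[t] = fired
        v = np.where(fired, 0.0, v)        # hard reset after a spike
    return spikes

def spike_driven_linear(weights, spike_vec):
    """W @ s for binary s: sum the columns of W whose input neuron spiked."""
    active = np.flatnonzero(spike_vec)
    return weights[:, active].sum(axis=1)

train = lif_spikes(np.full((6, 1), 0.6))    # constant input for 6 timesteps
W = np.arange(12, dtype=float).reshape(3, 4)
s = np.array([1, 0, 1, 0])
out = spike_driven_linear(W, s)
```

With a constant input of 0.6, this neuron fires every third step, so a window of only two timesteps would emit no spikes at all; this is one intuition for why very short time windows can hurt accuracy.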


A Related Work

Neural Information Processing Systems

Organization. In this supplementary file, we provide in-depth descriptions of the material not covered in the main paper and report additional experimental results. The document is organized as follows: Section A, Related Work.

Neural Architecture Search (NAS) was introduced to ease the manual design of complex neural networks. Early NAS efforts [1] took a brute-force approach, training candidate architectures and using their accuracy as a proxy for discovering superior designs. One-shot NAS methods [5, 6, 7] further reduced the cost by training large supernetworks and identifying high-accuracy subnetworks, often generated from pre-trained models. Nevertheless, as search spaces expand with architectural innovations [8, 9], more efficient methods are needed to predict neural network accuracy in vast design spaces. Recent mathematical programming (MP) based NAS methods [10, 11] are noteworthy because they formulate multi-objective NAS problems as mathematical programs.
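One common way to pose block-wise NAS as a mathematical program is a multiple-choice knapsack: pick exactly one block per node to maximize a predicted score under a latency budget. The sketch below solves it by dynamic programming over integer latencies. The additive per-block scores, the integer latencies, and all names and numbers are illustrative assumptions, not the formulations of [10, 11]; the exhaustive search is included only as a cross-check for tiny spaces.

```python
from itertools import product

def select_blocks(scores, latencies, budget):
    """Pick one block per node maximizing total score with total latency <= budget.

    scores[i][j], latencies[i][j]: predicted score and integer latency of block j at node i.
    Assumes per-block contributions simply add up (a modelling assumption).
    Returns (best_score, choice), or (None, None) if no choice fits the budget.
    """
    NEG = float("-inf")
    best = [NEG] * (budget + 1)   # best[b]: best score so far with total latency exactly b
    best[0] = 0.0
    back = []                     # back[i][b]: (block chosen at node i, latency before node i)
    for i in range(len(scores)):
        new = [NEG] * (budget + 1)
        ptr = [None] * (budget + 1)
        for b_prev in range(budget + 1):
            if best[b_prev] == NEG:
                continue
            for j, (s, l) in enumerate(zip(scores[i], latencies[i])):
                b = b_prev + l
                if b <= budget and best[b_prev] + s > new[b]:
                    new[b] = best[b_prev] + s
                    ptr[b] = (j, b_prev)
        best = new
        back.append(ptr)
    b_end = max(range(budget + 1), key=lambda b: best[b])
    if best[b_end] == NEG:
        return None, None
    choice, b = [], b_end
    for i in reversed(range(len(scores))):   # walk the back-pointers
        j, b = back[i][b]
        choice.append(j)
    return best[b_end], choice[::-1]

def brute_force(scores, latencies, budget):
    """Exhaustive search over all n^m architectures (only feasible for tiny spaces)."""
    best = (None, None)
    for choice in product(*[range(len(row)) for row in scores]):
        if sum(latencies[i][j] for i, j in enumerate(choice)) > budget:
            continue
        total = sum(scores[i][j] for i, j in enumerate(choice))
        if best[0] is None or total > best[0]:
            best = (total, list(choice))
    return best

scores = [[0.1, 0.3, 0.5], [0.2, 0.4, 0.6], [0.1, 0.2, 0.35]]   # hypothetical
latencies = [[1, 2, 4], [1, 3, 5], [1, 2, 3]]                  # hypothetical
dp_score, dp_choice = select_blocks(scores, latencies, budget=7)
bf_score, _ = brute_force(scores, latencies, budget=7)
```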


MathNAS: If Blocks Have a Role in Mathematical Architecture Design

Qinsi Wang

Neural Information Processing Systems

Neural Architecture Search (NAS) has emerged as a favoured method for discovering effective neural architectures. The recent development of large models has intensified the demand for faster searches and more accurate search results. However, designing large models with NAS is challenging because of the dramatic growth of the search space and the associated high cost of performance evaluation. Consider a typical modular search space widely used in NAS, in which a neural architecture consists of m block nodes and each block node has n alternative blocks.
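As a worked count for this modular search space (the values m = n = 10 are chosen only for illustration), choosing one of n blocks independently at each of m nodes gives a number of candidate architectures that grows exponentially in m, while the number of distinct (node, block) pairs grows only linearly:

```latex
|\mathcal{A}| = \underbrace{n \cdot n \cdots n}_{m\ \text{nodes}} = n^{m},
\qquad m = n = 10:\quad |\mathcal{A}| = 10^{10}, \qquad m \cdot n = 100.
```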