Spatially Sparse Inference for Generative Image Editing Supplementary Material

Neural Information Processing Systems

For all models, we use block size 6 for 3×3 convolutions and block size 4 for 1×1 convolutions. We omit element-wise operations for simplicity and follow the notation in Section 3. Because the convolution kernel sizes in the shortcut branch and the main branch differ, their reduced active block indices also differ (Indices and Shortcut Indices). To reduce tensor-copying overheads, we fuse each Scatter with the following Gather into a single Scatter-Gather operation, as mentioned in Section 3.2, and fuse the Scatter operations of the shortcut branch, the main branch, and the residual addition into a Scatter with Block Residual. Note that the pre-computation is cheap and only needs to be done once per resolution. Scatter accounts for a larger share of the runtime in the shortcut branch.
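To make the Gather/convolve/Scatter pattern concrete, here is a minimal NumPy sketch. All function names, shapes, and the assumption that the input feature map is pre-padded by the convolution halo are illustrative, not the paper's actual implementation (which fuses these kernels on GPU):

```python
import numpy as np

def gather_blocks(x, indices, block, halo):
    # x: (C, H+2*halo, W+2*halo) pre-padded feature map.
    # indices: active block origins (r, c) in unpadded coordinates.
    # Each gathered patch includes a halo so a "valid" conv fits.
    return np.stack([x[:, r:r + block + 2 * halo, c:c + block + 2 * halo]
                     for r, c in indices])

def conv3x3_valid(patches, w):
    # Naive 3x3 valid convolution over every gathered patch.
    # patches: (n, cin, h, w), w: (cout, cin, 3, 3).
    n, cin, h, wd = patches.shape
    out = np.zeros((n, w.shape[0], h - 2, wd - 2))
    for i in range(h - 2):
        for j in range(wd - 2):
            win = patches[:, :, i:i + 3, j:j + 3]        # (n, cin, 3, 3)
            out[:, :, i, j] = np.einsum('ncij,ocij->no', win, w)
    return out

def scatter_blocks(cached, patches, indices, block):
    # Write the updated blocks back into the cached full-size output,
    # leaving unedited regions untouched.
    out = cached.copy()
    for p, (r, c) in zip(patches, indices):
        out[:, r:r + block, c:c + block] = p
    return out
```

In the fused Scatter-Gather of the paper, the intermediate full-resolution write/read pair between two such layers is eliminated; the sketch above keeps them separate for clarity.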


Efficient Spatially Sparse Inference for Conditional GANs and Diffusion Models

Neural Information Processing Systems

During image editing, existing deep generative models tend to re-synthesize the entire output from scratch, including the unedited regions. This leads to a significant waste of computation, especially for minor editing operations. In this work, we present Spatially Sparse Inference (SSI), a general-purpose technique that selectively performs computation for edited regions and accelerates various generative models, including both conditional GANs and diffusion models. Our key observation is that users tend to make gradual changes to the input image.
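The key observation above, that edits are gradual and localized, suggests computing active blocks directly from the difference between the original and edited inputs. The following sketch is illustrative only (the function name and tolerance are assumptions, and the paper's actual index reduction also accounts for receptive-field growth across layers):

```python
import numpy as np

def active_block_indices(original, edited, block=6, tol=1e-3):
    """Return origins of block x block tiles touched by the edit.

    original, edited: (C, H, W) images. A tile is "active" if any
    pixel inside it changed by more than `tol`; only active tiles
    need to be recomputed by the sparse inference engine.
    """
    diff = np.abs(edited - original).max(axis=0)   # (H, W) change map
    h, w = diff.shape
    indices = []
    for r in range(0, h, block):
        for c in range(0, w, block):
            if diff[r:r + block, c:c + block].max() > tol:
                indices.append((r, c))
    return indices
```

For a minor edit, the list of active tiles is small, so most of the convolutional workload can be skipped and replaced by cached activations.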


PhyloGen: Language Model-Enhanced Phylogenetic Inference via Graph Structure Generation

Neural Information Processing Systems

Phylogenetic trees elucidate evolutionary relationships among species, but phylogenetic inference remains challenging due to the complexity of combining continuous (branch lengths) and discrete parameters (tree topology). Traditional Markov Chain Monte Carlo methods face slow convergence and computational burdens. Existing Variational Inference methods, which require pre-generated topologies and typically treat tree structures and branch lengths independently, may overlook critical sequence features, limiting their accuracy and flexibility. We propose PhyloGen, a novel method leveraging a pre-trained genomic language model to generate and optimize phylogenetic trees without dependence on evolutionary models or aligned sequence constraints. PhyloGen views phylogenetic inference as a conditionally constrained tree structure generation problem, jointly optimizing tree topology and branch lengths through three core modules: (i) Feature Extraction, (ii) PhyloTree Construction, and (iii) PhyloTree Structure Modeling. Meanwhile, we introduce a Scoring Function to guide the model towards a more stable gradient descent. We demonstrate the effectiveness and robustness of PhyloGen on eight real-world benchmark datasets. Visualization results confirm PhyloGen provides deeper insights into phylogenetic relationships.




Equivariant Blurring Diffusion for Hierarchical Molecular Conformer Generation

Neural Information Processing Systems

How can diffusion models process 3D geometries in a coarse-to-fine manner, akin to our multiscale view of the world? In this paper, we address the question by focusing on a fundamental biochemical problem of generating 3D molecular conformers conditioned on molecular graphs in a multiscale manner. Our approach consists of two hierarchical stages: i) generation of coarse-grained fragment-level 3D structure from the molecular graph, and ii) generation of fine atomic details from the coarse-grained approximated structure while allowing the latter to be adjusted simultaneously. For the challenging second stage, which demands preserving coarse-grained information while ensuring SE(3) equivariance, we introduce a novel generative model termed Equivariant Blurring Diffusion (EBD), which defines a forward process that moves towards the fragment-level coarse-grained structure by blurring the fine atomic details of conformers, and a reverse process that performs the opposite operation using equivariant networks. We demonstrate the effectiveness of EBD by geometric and chemical comparison to state-of-the-art denoising diffusion models on a benchmark of drug-like molecules. Ablation studies draw insights on the design of EBD by thoroughly analyzing its architecture, including the design of the loss function and the data corruption process. Code is released at https://github.com/Shen-Lab/EBD.
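The core of the blurring forward process can be sketched as interpolating each atom toward its fragment's centroid. This is a simplified illustration under assumed names and shapes; the noise terms, schedule, and equivariant parameterization of the actual EBD model are omitted:

```python
import numpy as np

def blur_towards_fragments(x, frag_id, alpha):
    """One (simplified) blurring step of the forward process.

    x: (N, 3) atom coordinates; frag_id: (N,) fragment assignment;
    alpha in [0, 1] controls how much fine detail is removed.
    At alpha=1 every atom collapses onto its fragment centroid,
    leaving only the coarse-grained structure.
    """
    centroids = np.zeros_like(x)
    for f in np.unique(frag_id):
        mask = frag_id == f
        centroids[mask] = x[mask].mean(axis=0)
    return (1 - alpha) * x + alpha * centroids
```

Because centroids rotate and translate with the point cloud, this blurring commutes with rigid motions, which is what makes an SE(3)-equivariant reverse process possible.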


Supplementary Materials for: NCP: Neural Correspondence Prior for Effective Unsupervised Shape Matching

Neural Information Processing Systems

Our work uses the functional map framework as a first estimator for point-to-point (p2p) maps, together with multiple losses on p2p maps, to learn robust features from which good correspondences can be extracted via nearest neighbor in feature space. We provide a brief overview in the next section. Functional maps. The functional map (fmap) framework is used in the first stage of our NCP-UN algorithm.
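Extracting a p2p map by nearest neighbor in feature space amounts to, for each source vertex, picking the target vertex with the closest learned feature vector. A minimal sketch (names and the brute-force distance computation are illustrative; practical pipelines use a KD-tree or GPU batching):

```python
import numpy as np

def p2p_from_features(feat_src, feat_tgt):
    """Nearest-neighbor point-to-point map in feature space.

    feat_src: (n_src, d), feat_tgt: (n_tgt, d) per-vertex features.
    Returns, for each source vertex, the index of the target vertex
    with the closest feature vector.
    """
    # Squared Euclidean distances via ||a - b||^2 = ||a||^2 - 2ab + ||b||^2
    d2 = (np.sum(feat_src ** 2, axis=1, keepdims=True)
          - 2.0 * feat_src @ feat_tgt.T
          + np.sum(feat_tgt ** 2, axis=1))
    return np.argmin(d2, axis=1)          # (n_src,) indices into target
```

The quality of this map is therefore entirely determined by the learned features, which is why the losses above are placed on the p2p maps they induce.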


Compositional 3D-aware Video Generation with LLM Director

Neural Information Processing Systems

Significant progress has been made in text-to-video generation through the use of powerful generative models and large-scale internet data. However, substantial challenges remain in precisely controlling individual concepts within the generated video, such as the motion and appearance of specific characters and the movement of viewpoints. In this work, we propose a novel paradigm that generates each concept in a 3D representation separately and then composes them with priors from Large Language Models (LLMs) and 2D diffusion models. Specifically, given an input textual prompt, our scheme consists of three stages: 1) We leverage an LLM as the director to first decompose the complex query into several sub-prompts that indicate individual concepts within the video (e.g., scene, objects, motions); we then let the LLM invoke pre-trained expert models to obtain the corresponding 3D representations of these concepts.


Rethinking No-reference Image Exposure Assessment from Holism to Pixel: Models, Datasets and Benchmarks

Neural Information Processing Systems

The past decade has witnessed an increasing demand for enhancing image quality through exposure, and as a crucial prerequisite in this endeavor, Image Exposure Assessment (IEA) is now being accorded serious attention. However, IEA faces two long-unresolved challenges: (i) the accuracy and generalizability of no-reference IEA are inadequate for practical applications; (ii) the scope of IEA is confined to qualitative and quantitative analysis of the entire image or a sub-image, such as providing only a single score to evaluate the exposure level, and thus lacks intuitive and precise fine-grained evaluation for complex exposure conditions. The objective of this paper is to address these persistent bottlenecks from three perspectives: model, dataset, and benchmark.