Improving Time Series Forecasting via Instance-aware Post-hoc Revision

Neural Information Processing Systems

Time series forecasting plays a vital role in various real-world applications and has attracted significant attention in recent decades. While recent methods have achieved remarkable accuracy by incorporating advanced inductive biases and training strategies, we observe that instance-level variations remain a significant challenge. These variations--stemming from distribution shifts, missing data, and long-tail patterns--often lead to suboptimal forecasts for specific instances, even when overall performance appears strong. To address this issue, we propose a model-agnostic framework, PIR, designed to enhance forecasting performance through Post-forecasting Identification and Revision. Specifically, PIR first identifies biased forecasting instances by estimating their accuracy. Based on this, the framework revises the forecasts using contextual information, including covariates and historical time series, from both local and global perspectives in a post-processing fashion. Extensive experiments on real-world datasets with mainstream forecasting models demonstrate that PIR effectively mitigates instance-level errors and significantly improves forecasting reliability.


A Compressive-Expressive Communication Framework for Compositional Representations

Neural Information Processing Systems

Compositionality in knowledge and language--the ability to represent complex concepts as a combination of simpler ones--is a hallmark of human cognition and communication. Despite recent advances, deep neural networks still struggle to acquire this property reliably. Neural models for emergent communication look to endow artificial agents with compositional language by simulating the pressures that form human language. In this work, we introduce CELEBI2 (Compressive-Expressive Language Emergence through a discrete Bottleneck and Iterated learning), a novel self-supervised framework for inducing compositional representations through a reconstruction-based communication game between a sender and a receiver. Building on theories of language emergence and the iterated learning framework, we integrate three mechanisms that jointly promote compressibility, expressivity, and efficiency in the emergent language. First, Progressive Decoding incentivizes intermediate reasoning by requiring the receiver to produce partial reconstructions after each symbol. Second, Final-State Imitation trains successive generations of agents to imitate reconstructions rather than messages, enforcing a tighter communication bottleneck.


OpenWorldSAM: Extending SAM2 for Universal Image Segmentation with Language Prompts

Neural Information Processing Systems

The ability to segment objects based on open-ended language prompts remains a critical challenge, requiring models to ground textual semantics into precise spatial masks while handling diverse and unseen categories. We present OpenWorldSAM, a framework that extends the prompt-driven Segment Anything Model v2 (SAM2) to open-vocabulary scenarios by integrating multi-modal embeddings extracted from a lightweight vision-language model (VLM). Our approach is guided by four key principles: i) Unified prompting: OpenWorldSAM supports a diverse range of prompts, including category-level and sentence-level language descriptions, providing a flexible interface for various segmentation tasks.


Dynamic Masking and Auxiliary Hash Learning for Enhanced Cross-Modal Retrieval

Neural Information Processing Systems

The growing demand for multimodal data processing is driving advances in information technology. Cross-modal hash retrieval has attracted much attention because it bridges differences between modalities while keeping retrieval efficient, and it has shown great potential in many practical scenarios. However, existing cross-modal hashing methods struggle to fully capture the semantic information of different modalities, which leaves a significant semantic gap between them. Moreover, these methods often ignore differences in importance across channels, and because they optimize a single objective, the matching quality between hash codes also suffers. To address these issues, we propose a Dynamic Masking and Auxiliary Hash Learning (AHLR) method for enhanced cross-modal retrieval.


HollowFlow: Efficient Sample Likelihood Evaluation using Hollow Message Passing

Neural Information Processing Systems

Flow and diffusion-based models have emerged as powerful tools for scientific applications, particularly for sampling non-normalized probability distributions, as exemplified by Boltzmann Generators (BGs). A critical challenge in deploying these models is their reliance on sample likelihood computations, which scale prohibitively with system size n, often rendering them infeasible for large-scale problems. To address this, we introduce HollowFlow, a flow-based generative model leveraging a novel non-backtracking graph neural network (NoBGNN). By enforcing a block-diagonal Jacobian structure, HollowFlow likelihoods are evaluated with a constant number of backward passes in n, yielding speed-ups of up to O(n^2): a significant step towards scaling BGs to larger systems. Crucially, our framework generalizes: any equivariant GNN or attention-based architecture can be adapted into a NoBGNN.


Unifying Proportional Fairness in Centroid and Non-Centroid Clustering

Neural Information Processing Systems

Proportional fairness criteria inspired by democratic ideals of proportional representation have received growing attention in the clustering literature. Prior work has investigated them in two separate paradigms. Chen et al. [1] study centroid clustering, in which each data point's loss is determined by its distance to a representative point (centroid) chosen in its cluster. Caragiannis et al. [2] study non-centroid clustering, in which each data point's loss is determined by its maximum distance to any other data point in its cluster. We generalize both paradigms to introduce semi-centroid clustering, in which each data point's loss is a combination of its centroid and non-centroid losses, and study two proportional fairness criteria--the core, and its relaxation, fully justified representation (FJR). Our main result is a novel algorithm which achieves a constant approximation to the core, in polynomial time, even when the distance metrics used for centroid and non-centroid loss measurements are different. We also derive improved results for more restricted loss functions and the weaker FJR criterion, and establish lower bounds in each case.
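The semi-centroid loss described above can be illustrated with a small sketch. The abstract only says the loss is "a combination" of the centroid and non-centroid losses; the weighted sum, the `weight` parameter, the function name, and the use of Euclidean distance for both terms are assumptions made here for illustration, not the authors' definition.

```python
import math


def semi_centroid_loss(point, cluster, centroid, weight=0.5):
    """Hypothetical semi-centroid loss for one data point.

    centroid loss     : distance from the point to its cluster's centroid
    non-centroid loss : maximum distance to any other point in the cluster
    The weighted sum below is an assumed way of combining the two.
    """
    centroid_loss = math.dist(point, centroid)
    others = [q for q in cluster if q != point]
    non_centroid_loss = max((math.dist(point, q) for q in others), default=0.0)
    return weight * centroid_loss + (1.0 - weight) * non_centroid_loss


# Toy cluster of three 2-D points; the centroid is chosen among them.
cluster = [(0.0, 0.0), (3.0, 4.0), (0.0, 1.0)]
centroid = (0.0, 1.0)
loss = semi_centroid_loss((0.0, 0.0), cluster, centroid)
```

Setting `weight=1.0` recovers the pure centroid loss and `weight=0.0` the pure non-centroid loss, which matches the claim that the semi-centroid setting generalizes both paradigms. The abstract also allows different metrics for the two terms; that could be modeled by passing two distance functions instead of using `math.dist` twice.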



Neural Information Processing Systems

Each row presents an example with overlapping instances, and image captions are shown below. More examples and more detailed failure descriptions can be found in Appendix C.

[...] in controllable image generation [Li et al., 2023b, Zhang et al., 2023]. A recent line of work proposes generating images conditioned on layouts, commonly referred to as Layout-to-Image (L2I) generation, which allows users to directly specify spatial locations [Xie et al., 2023b, Wang et al., 2024b, Li et al., 2023b] and object counts [Binyamin et al., 2024, Yang et al., 2023] in the generated outputs. While existing frameworks [Xie et al., 2023b, Wang et al., 2024b, Li et al., 2023b] can achieve satisfactory spatial and numerical control over image generation, they fail to generate distinct, coherent objects when multiple bounding boxes overlap in the layout and their associated categories are semantically similar. As illustrated in Figure 2, such scenarios lead to artifacts including object blending, spatial ambiguity, and visual distortion.


AI Research Agents for Machine Learning: Search, Exploration, and Generalization in MLE-bench

Neural Information Processing Systems

AI research agents are demonstrating great potential to accelerate scientific progress by automating the design, implementation, and training of machine learning models. We focus on methods for improving agents' performance on MLE-bench, a challenging benchmark where agents compete in Kaggle competitions to solve real-world machine learning problems. We formalize AI research agents as search policies that navigate a space of candidate solutions, iteratively modifying them using operators. By designing and systematically varying different operator sets and search policies (Greedy, MCTS, Evolutionary), we show that their interplay is critical for achieving high performance. Our best pairing of search strategy and operator set achieves a state-of-the-art result on MLE-bench lite, increasing the success rate of achieving a Kaggle medal from 39.6% to 47.7%. Our investigation underscores the importance of jointly considering the search strategy, operator design, and evaluation methodology in advancing automated machine learning.
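The framing of an agent as a search policy that applies operators to candidate solutions can be sketched with a toy greedy policy. Everything here is a stand-in: the function name `greedy_search`, the integer "solutions", the three operators, and the scoring function are hypothetical and only illustrate the loop structure, not the operators or policies evaluated in the paper.

```python
def greedy_search(initial, operators, score, steps=10):
    """Toy greedy policy: apply every operator to the current best
    candidate and move to the best-scoring result, stopping when no
    operator improves the score."""
    best, best_score = initial, score(initial)
    for _ in range(steps):
        candidates = [op(best) for op in operators]
        top = max(candidates, key=score)
        if score(top) <= best_score:
            break  # no operator improves on the current solution
        best, best_score = top, score(top)
    return best, best_score


# Hypothetical setup: a "solution" is an integer and the score peaks at 7.
operators = [lambda x: x + 1, lambda x: x - 1, lambda x: x * 2]
score = lambda x: -abs(x - 7)
best, best_score = greedy_search(1, operators, score)
```

In this picture, swapping the greedy step for tree search or an evolutionary population changes the search policy while leaving the operator set untouched, which is the kind of interplay the abstract says is varied systematically.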


MDNS: Masked Diffusion Neural Sampler via Stochastic Optimal Control

Neural Information Processing Systems

We study the problem of learning a neural sampler to generate samples from discrete state spaces where the target probability mass function π ∝ e^{-U} is known up to a normalizing constant, which is an important task in fields such as statistical physics, machine learning, and combinatorial optimization. To better address this challenging task when the state space has a large cardinality and the distribution is multi-modal, we propose Masked Diffusion Neural Sampler (MDNS), a novel framework for training discrete neural samplers by aligning two path measures through a family of learning objectives, theoretically grounded in the stochastic optimal control of continuous-time Markov chains. We validate the efficiency and scalability of MDNS through extensive experiments on various distributions with distinct statistical properties, where MDNS learns to accurately sample from the target distributions despite the extremely high problem dimensions and outperforms other learning-based baselines by a large margin. A comprehensive study of ablations and extensions is also provided to demonstrate the efficacy and potential of the proposed framework.
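To make "known up to a normalizing constant" concrete, the sketch below assumes a target proportional to exp(-U(x)) on a tiny discrete space and normalizes it by brute-force enumeration. The energy function (a chain of ±1 spins penalized for disagreeing neighbours) and the names `energy` and `exact_pmf` are hypothetical choices for illustration; this is not the MDNS method, only the baseline computation that becomes infeasible as the state space grows.

```python
import itertools
import math


def energy(state):
    """Hypothetical energy U: number of disagreeing neighbouring spins."""
    return sum(1.0 for a, b in zip(state, state[1:]) if a != b)


def exact_pmf(d):
    """Normalize exp(-U) by enumerating all 2**d states (small d only)."""
    states = list(itertools.product([-1, 1], repeat=d))
    weights = [math.exp(-energy(s)) for s in states]
    z = sum(weights)  # the normalizing constant
    return {s: w / z for s, w in zip(states, weights)}


pmf = exact_pmf(4)
```

The enumeration visits 2**d states, so it is only usable for very small d. The two aligned configurations are the most likely states, which also gives a minimal example of a multi-modal target.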


[Figure panels: Ref. Images, Ours, GT, Paint-by-Example, Target Images]

Neural Information Processing Systems

Reference-driven image completion, which restores missing regions in a target view using additional images, is particularly challenging when the target view differs significantly from the references. Existing generative methods rely solely on diffusion priors and, without geometric cues such as camera pose or depth, often produce misaligned or implausible content. We propose GeoComplete, a novel framework that incorporates explicit 3D structural guidance to enforce geometric consistency in the completed regions, setting it apart from prior image-only approaches. GeoComplete introduces two key ideas: conditioning the diffusion process on projected point clouds to infuse geometric information, and applying target-aware masking to guide the model toward relevant reference cues. The framework features a dual-branch diffusion architecture.