Cost-aware LLM-based Online Dataset Annotation

Neural Information Processing Systems

Recent advances in large language models (LLMs) have enabled automated dataset labeling with minimal human supervision. While majority voting across multiple LLMs can improve label reliability by mitigating individual model biases, it incurs high computational costs due to repeated querying. In this work, we propose a novel online framework, Cost-aware Majority Voting (CaMVo), for efficient and accurate LLM-based dataset annotation. CaMVo adaptively selects a subset of LLMs for each data instance based on contextual embeddings, balancing confidence and cost without requiring pre-training or ground-truth labels. Leveraging a LinUCB-based selection mechanism and a Bayesian estimator over confidence scores, CaMVo estimates a lower bound on labeling accuracy for each LLM and aggregates responses through weighted majority voting. Our empirical evaluation on the MMLU and IMDB Movie Review datasets demonstrates that CaMVo achieves comparable or superior accuracy to full majority voting while significantly reducing labeling costs. This establishes CaMVo as a practical and robust solution for cost-efficient annotation in dynamic labeling environments.
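As a rough illustration of the aggregation step only, the sketch below implements a generic weighted majority vote in which each selected model's vote is weighted by the log-odds of its estimated accuracy. The function name `weighted_majority_vote` and the log-odds weighting are assumptions made for this example; the abstract does not specify CaMVo's actual weights, and the LinUCB selection and Bayesian estimation steps are not shown.

```python
import math
from collections import defaultdict


def weighted_majority_vote(labels, accuracies):
    """Aggregate labels from several annotators (hypothetical helper).

    labels:      one label per selected LLM
    accuracies:  estimated accuracy of each LLM, in (0, 1)
    Each vote is weighted by log(acc / (1 - acc)), a common choice for
    weighted voting; this is an assumption, not the paper's formula.
    """
    scores = defaultdict(float)
    for label, acc in zip(labels, accuracies):
        # Clamp to avoid infinite weights at exactly 0 or 1.
        acc = min(max(acc, 1e-6), 1.0 - 1e-6)
        scores[label] += math.log(acc / (1.0 - acc))
    # Return the label with the largest total weight.
    return max(scores, key=scores.get)
```

With this weighting, a single model with a high estimated accuracy can outvote two weaker models that agree with each other, which is the behavior that distinguishes weighted from plain majority voting.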


CDFlow: Building Invertible Layers with Circulant and Diagonal Matrices

Neural Information Processing Systems

Normalizing flows are deep generative models that achieve efficient likelihood estimation and sampling through invertible transformations. A key challenge is designing linear layers that enhance expressiveness while enabling efficient computation of the Jacobian determinant and inverse. In this work, we introduce a novel invertible linear layer based on the product of circulant and diagonal matrices. This decomposition provides a parameter- and computation-efficient formulation, reducing the parameter complexity from $O(n^2)$ to $O(mn)$ by using $m$ diagonal matrices together with $m-1$ circulant matrices, while approximating arbitrary linear transformations. Furthermore, leveraging the Fast Fourier Transform (FFT), our method reduces the time complexity of matrix inversion from $O(n^3)$ to $O(mn \log n)$ and of the matrix log-determinant from $O(n^3)$ to $O(mn)$, where $n$ is the input dimension. Building upon this, we introduce a novel normalizing flow model called Circulant-Diagonal Flow (CDFlow). Empirical results demonstrate that CDFlow excels in density estimation for natural image datasets and effectively models data with inherent periodicity. In terms of computational efficiency, our method speeds up the matrix inverse and log-determinant computations by 1.17× and 4.31×, respectively, compared to the general dense matrix, when the number of channels is set to 96.
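To make the FFT argument concrete, here is a minimal NumPy sketch of a single diagonal-times-circulant factor, y = D C x. It relies on the standard fact that the eigenvalues of a circulant matrix are the discrete Fourier transform of its first column. The function names and the single-factor setup are assumptions for illustration; the abstract does not give CDFlow's exact parametrization or how the m factors are composed.

```python
import numpy as np


def circulant_matvec(c, x):
    """Compute C @ x, where C is circulant with first column c (circular convolution)."""
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))


def circulant_solve(c, y):
    """Solve C @ x = y by dividing by the eigenvalues fft(c) in the Fourier domain."""
    return np.real(np.fft.ifft(np.fft.fft(y) / np.fft.fft(c)))


def diag_circ_forward(d, c, x):
    """Apply y = diag(d) @ C @ x and return (y, log|det(diag(d) @ C)|)."""
    y = d * circulant_matvec(c, x)
    # log|det| splits into the diagonal part and the circulant eigenvalues.
    logdet = float(np.sum(np.log(np.abs(d))) + np.sum(np.log(np.abs(np.fft.fft(c)))))
    return y, logdet


def diag_circ_inverse(d, c, y):
    """Invert the layer: undo the diagonal scaling, then the circulant factor."""
    return circulant_solve(c, y / d)
```

Each call costs O(n log n) through the FFT instead of the O(n^3) needed to invert or factor a dense n-by-n matrix. The layer is invertible only when every entry of `d` and every eigenvalue `fft(c)` is nonzero.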


Evaluating Program Semantics Reasoning with Type Inference in System F

Neural Information Processing Systems

Large Language Models (LLMs) are increasingly integrated into the software engineering ecosystem. Their test-time compute (TTC) reasoning capabilities show significant potential for understanding program logic and semantics beyond mere token recognition. However, current benchmarks for code reasoning lack a formal, program-centric deductive framework to ensure sound evaluation, and are incapable of assessing whether models genuinely reason about program semantics or merely exploit superficial associations between natural language and code tokens. To bridge this gap, we introduce TF-Bench, a benchmark designed to evaluate LLM reasoning based on type inference in System F, a task we refer to as program semantics reasoning. By employing verified transformations to remove semantically irrelevant natural language, we construct TF-Bench_pure, a purely semantics-driven variant of TF-Bench. Our analysis reveals substantial limitations in state-of-the-art LLMs, with the best-performing LLM (Claude-3.7-sonnet)


AnimateQR: Bridging Aesthetics and Functionality in Dynamic QR Code Generation

Neural Information Processing Systems

Animated QR codes present an exciting frontier for dynamic content delivery and digital interaction. However, despite their potential, there has been no prior work focusing on the generation of animated QR codes that are both visually appealing and universally scannable. In this paper, we introduce AnimateQR, the first generative framework for creating animated QR codes that balance aesthetic flexibility with scannability. Unlike previous methods that focus on static QR codes, AnimateQR leverages hierarchical luminance guidance and progressive spatiotemporal control to produce high-quality dynamic QR codes. Our first innovation is a multi-scale hierarchical control signal that adjusts luminance across different spatial scales, ensuring that the QR code remains decodable while allowing for artistic expression. The second innovation is a progressive control mechanism that dynamically adjusts spatiotemporal guidance throughout the diffusion denoising steps, enabling fine-grained balance between visual quality and scannability. Extensive experimental results demonstrate that AnimateQR achieves state-of-the-art performance in both decoding success rates (96% vs. 56% baseline) and visual quality (user preference: 7.2 vs. 2.3 on a 10-point scale). Code is available at https://github.com/mulns/AnimateQR.


Prompt-Guided Alignment with Information Bottleneck Makes Image Compression Also a Restorer

Neural Information Processing Systems

Learned Image Compression (LIC) models face critical challenges in real-world scenarios due to various environmental degradations, such as fog and rain. Due to the distribution mismatch between degraded inputs and clean training data, well-trained LIC models suffer from reduced compression efficiency, while retraining dedicated models for diverse degradation types is costly and impractical. Our method addresses the above issue by leveraging prompt learning under the information bottleneck principle, enabling compact extraction of shared components between degraded and clean images for improved latent alignment and compression efficiency. In detail, we propose an Information Bottleneck-constrained Latent Representation Unifying (IB-LRU) scheme, in which a Probabilistic Prompt Generator (PPG) is deployed to simultaneously capture the distribution of different degradations.


MonarchAttention: Zero-Shot Conversion to Fast, Hardware-Aware Structured Attention

Neural Information Processing Systems

Transformers have achieved state-of-the-art performance across various tasks, but suffer from a notable quadratic complexity in sequence length due to the attention mechanism. In this work, we propose MonarchAttention, a novel approach to sub-quadratic attention approximation via Monarch matrices, an expressive class of structured matrices. Based on the variational form of softmax, we describe an efficient optimization-based algorithm to compute an approximate projection of softmax attention onto the class of Monarch matrices with $\Theta(N\sqrt{N}d)$ computational complexity and $\Theta(Nd)$ memory/IO complexity.


050b8ff31bee2dfea65b731e71baccd5-Paper-Conference.pdf

Neural Information Processing Systems

Object binding, the brain's ability to bind the many features that collectively represent an object into a coherent whole, is central to human cognition. It groups low-level perceptual features into high-level object representations, stores those objects efficiently and compositionally in memory, and supports human reasoning about individual object instances. While prior work often imposes object-centric attention (e.g., Slot Attention) explicitly to probe these benefits, it remains unclear whether this ability naturally emerges in pre-trained Vision Transformers (ViTs). Intuitively, they could: recognizing which patches belong to the same object should be useful for downstream prediction and thus guide attention. Motivated by the quadratic nature of self-attention, we hypothesize that ViTs represent whether two patches belong to the same object, a property we term IsSameObject.


OpenLex3D: A Tiered Evaluation Benchmark for Open-Vocabulary 3D Scene Representations

Neural Information Processing Systems

However, at present the evaluation of these representations is limited to datasets with closed-set semantics that do not capture the richness of language. This work presents OpenLex3D, a dedicated benchmark for evaluating 3D open-vocabulary scene representations. OpenLex3D provides entirely new label annotations for scenes from Replica, ScanNet++, and HM3D, which capture real-world linguistic variability by introducing synonymical object categories and additional nuanced descriptions. Our label sets provide 13 times more labels per scene than the original datasets. By introducing an open-set 3D semantic segmentation task and an object retrieval task, we evaluate various existing 3D open-vocabulary methods on OpenLex3D, showcasing failure cases and avenues for improvement. Our experiments provide insights on feature precision, segmentation, and downstream capabilities. The benchmark is publicly available at: https://openlex3d.github.io.


Structural Entropy Guided Agent for Detecting and Repairing Knowledge Deficiencies in LLMs

Neural Information Processing Systems

Large language models (LLMs) have achieved unprecedented performance by leveraging vast pretraining corpora, yet their performance remains suboptimal in knowledge-intensive domains such as medicine and scientific research, where high factual precision is required. While synthetic data provides a promising avenue for augmenting domain knowledge, existing methods frequently generate redundant samples that do not align with the model's true knowledge gaps. To overcome this limitation, we propose a novel Structural Entropy-guided Knowledge Navigator (SENATOR) framework that addresses the intrinsic knowledge deficiencies of LLMs. Our approach employs the Structure Entropy (SE) metric to quantify uncertainty along knowledge graph paths and leverages Monte Carlo Tree Search (MCTS) to selectively explore regions where the model lacks domain-specific knowledge. Guided by these insights, the framework generates targeted synthetic data for supervised fine-tuning, enabling continuous self-improvement. Experimental results on LLaMA-3 and Qwen2 across multiple domain-specific benchmarks show that SENATOR effectively detects and repairs knowledge deficiencies, achieving notable performance improvements.