Incremental Sequence Classification with Temporal Consistency

Neural Information Processing Systems

We address the problem of incremental sequence classification, where predictions are updated as new elements in the sequence are revealed. Drawing on temporal-difference learning from reinforcement learning, we identify a temporal-consistency condition that successive predictions should satisfy. We leverage this condition to develop a novel loss function for training incremental sequence classifiers. Through a concrete example, we demonstrate that optimizing this loss can offer substantial gains in data efficiency. We apply our method to text classification tasks and show that it improves predictive accuracy over competing approaches on several benchmark datasets. We further evaluate our approach on the task of verifying large language model generations for correctness in grade-school math problems. Our results show that models trained with our method are better able to distinguish promising generations from unpromising ones after observing only a few tokens.
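
The abstract does not spell out the loss, so the following is only a rough sketch of one plausible temporal-difference-style reading: the prediction after each prefix is pulled toward the prediction after the next prefix, and the final prediction is pulled toward the true label. The function names (`cross_entropy`, `td_consistency_loss`), the use of cross-entropy, and the uniform averaging over prefixes are all assumptions made for illustration, not the authors' formulation.

```python
import math


def cross_entropy(target, pred, eps=1e-12):
    """Cross-entropy H(target, pred) between two discrete distributions."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, pred))


def td_consistency_loss(prefix_preds, label, num_classes):
    """Hypothetical TD-style loss over the predictions for one sequence.

    prefix_preds[t] is the predicted class distribution after seeing
    the first t + 1 elements of the sequence.
    """
    one_hot = [1.0 if k == label else 0.0 for k in range(num_classes)]
    # Assumed targets: the next-step prediction for every prefix except
    # the last one, and the true label for the full sequence.
    targets = prefix_preds[1:] + [one_hot]
    total = sum(cross_entropy(tgt, p) for tgt, p in zip(targets, prefix_preds))
    return total / len(prefix_preds)


# Three successive predictions for a two-class problem, true class = 1.
preds = [[0.5, 0.5], [0.2, 0.8], [0.1, 0.9]]
loss = td_consistency_loss(preds, label=1, num_classes=2)
```

In an actual training loop the next-step targets would normally be treated as constants (no gradient flowing through them), which is the usual convention in temporal-difference learning; that detail is also an assumption here.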


ALTo: Adaptive-Length Tokenizer for Autoregressive Mask Generation

Neural Information Processing Systems

While humans effortlessly draw visual objects and shapes by adaptively allocating attention based on their complexity, existing multimodal large language models (MLLMs) remain constrained by rigid token representations. Bridging this gap, we propose ALTo, an adaptive-length tokenizer for autoregressive mask generation. To achieve this, a novel token length predictor is designed, along with a length regularization term and a differentiable token chunking strategy.


Russian strikes kill nine in Ukraine and damage historic cathedral, officials say

BBC News

Nine people have been killed and several others injured in a wave of Russian strikes on Ukraine during which a major religious landmark in Kyiv caught fire, reports say. Four people were killed in attacks on Kyiv, while five rescue workers died trying to put out a fire caused by a Russian strike on the north-eastern city of Kharkiv, Ukrainian officials said. The 11th Century Dormition Cathedral was significantly damaged in what Ukrainian Prime Minister Yulia Svyrydenko called "a brutal assault on our people and our heritage". Meanwhile, a Ukrainian drone attack in the Russian city of Tula, south of Moscow, killed three people and wounded three others, including a one-year-old, officials said. Drone and missile strikes set fire to buildings and cars and left more than 140,000 people in Ukraine's capital without electricity, Kyiv Mayor Vitali Klitschko said.


Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers

Neural Information Processing Systems

Academic poster generation is a crucial yet challenging task in scientific communication, requiring the compression of long-context interleaved documents into a single, visually coherent page. To address this challenge, we introduce the first benchmark and metric suite for poster generation, which pairs recent conference papers with author-designed posters and evaluates outputs on (i) Visual Quality--semantic alignment with human posters, (ii) Textual Coherence--language fluency, (iii) Holistic Assessment--six fine-grained aesthetic and informational criteria scored by a VLM-as-judge, and notably (iv) PaperQuiz--the poster's ability to convey core paper content as measured by VLMs answering generated quizzes. Building on this benchmark, we propose PosterAgent, a top-down, visual-in-the-loop multi-agent pipeline: the (a) Parser distills the paper into a structured asset library; the (b) Planner aligns text-visual pairs into a binary-tree layout that preserves reading order and spatial balance; and the (c) Painter-Commenter loop refines each panel by executing rendering code and using VLM feedback to eliminate overflow and ensure alignment. In our comprehensive evaluation, we find that GPT-4o outputs--though visually appealing at first glance--often exhibit noisy text and poor PaperQuiz scores, and we find that reader engagement is the primary aesthetic bottleneck, as human-designed posters rely largely on visual semantics to convey meaning. Our fully open-source variants (e.g., based on the Qwen-2.5 series) outperform existing 4o-driven multi-agent systems across nearly all metrics, while using 87% fewer tokens. It transforms a 22-page paper into a finalized yet editable '.pptx' poster -- all for just $0.005. These findings chart clear directions for the next generation of fully automated poster-generation models.



Encoder-Decoder Diffusion Language Models for Efficient Training and Inference

Neural Information Processing Systems

Discrete diffusion models enable parallel token sampling for faster inference than autoregressive approaches. However, prior diffusion models use a decoder-only architecture, which requires sampling algorithms that invoke the full network at every denoising step and incur high computational cost. Our key insight is that discrete diffusion models perform two types of computation: 1) representing clean tokens and 2) denoising corrupted tokens, which enables us to use separate modules for each task. We propose an encoder-decoder architecture to accelerate discrete diffusion inference, which relies on an encoder to represent clean tokens and a lightweight decoder to iteratively refine a noised sequence. We also show that this architecture enables faster training of block diffusion models, which partition sequences into blocks for better quality and are commonly used in diffusion language model inference. We introduce a framework for Efficient Encoder-Decoder Diffusion (E2D2), consisting of an architecture with specialized training and sampling algorithms, and we show that E2D2 achieves superior trade-offs between generation quality and inference throughput on summarization, translation, and mathematical reasoning tasks. We provide the code, model weights, and blog post on the project page: https://m-arriola.com/e2d2.


Structure-free Graph Condensation: From Large-scale Graphs to Condensed Graph-free Data

Neural Information Processing Systems

Graph condensation, which reduces the size of a large-scale graph by synthesizing a small-scale condensed graph as its substitution, has immediate benefits for various graph learning tasks. However, existing graph condensation methods rely on the joint optimization of nodes and structures in the condensed graph, and overlook critical issues in effectiveness and generalization ability. In this paper, we advocate a new Structure-Free Graph Condensation paradigm, named SFGC, to distill a large-scale graph into a small-scale graph node set without explicit graph structures, i.e., graph-free data. Our idea is to implicitly encode topology structure information into the node attributes in the synthesized graph-free data, whose topology is reduced to an identity matrix.


Overleaf Example

Neural Information Processing Systems

Although Federated Learning (FL) is promising for privacy-preserving collaborative model training, it suffers from low inference performance due to heterogeneous client data. Because data are heterogeneous across clients, FL training easily learns client-specific overfitting features. Existing FL methods adopt coarse-grained averaging, which can easily cause the global model to get stuck in local optima, leading to poor generalization. This paper presents a novel FL framework, FedPhoenix, to address this issue. It stochastically resets partial parameters in each round to destroy some features of the global model, guiding FL training to learn multiple generalized features for inference rather than specific overfitting features. Experimental results on various well-known datasets demonstrate that, compared to SOTA FL methods, FedPhoenix can achieve up to 20.73% higher accuracy. The implementation is publicly available at https://github.com/UniString/FedPhoenix.
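
The idea of stochastically resetting part of the global model each round can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration: the flat list of parameters, the `reset_fraction` and `init_scale` arguments, the Gaussian re-initialization, and the function name `reset_partial_parameters` are hypothetical and are not taken from the FedPhoenix code.

```python
import random


def reset_partial_parameters(params, reset_fraction, rng, init_scale=0.01):
    """Return a copy of params with a random subset re-initialized.

    params: flat list of floats standing in for the global model.
    reset_fraction: assumed share of parameters to reset in one round.
    rng: a random.Random instance, so the choice is reproducible.
    """
    n_reset = int(len(params) * reset_fraction)
    reset_idx = set(rng.sample(range(len(params)), n_reset))
    new_params = [
        rng.gauss(0.0, init_scale) if i in reset_idx else p
        for i, p in enumerate(params)
    ]
    return new_params, reset_idx


# One simulated round: reset 30% of a ten-parameter global model.
global_params = [1.0] * 10
new_params, reset_idx = reset_partial_parameters(
    global_params, reset_fraction=0.3, rng=random.Random(0)
)
```

In a full federated round, a step like this would presumably sit between server-side aggregation and the next broadcast to clients; where exactly the method applies the reset is not stated in the abstract.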


NFL-BA: Near-Field Light Bundle Adjustment for SLAM in Dynamic Lighting

Neural Information Processing Systems

Simultaneous Localization and Mapping (SLAM) systems typically assume static, distant illumination; however, many real-world scenarios, such as endoscopy, subterranean robotics, and search & rescue in collapsed environments, require agents to operate with a co-located light and camera in the absence of external lighting. In such cases, dynamic near-field lighting introduces strong, view-dependent ...


MaintainCoder: Maintainable Code Generation Under Dynamic Requirements

Neural Information Processing Systems

Modern code generation has made significant strides in functional correctness and execution efficiency. However, these systems often overlook a critical dimension in real-world software development: maintainability. To handle dynamic requirements with minimal rework, we propose MaintainCoder as a pioneering solution. It integrates the Waterfall model, design patterns, and multi-agent collaboration to systematically enhance cohesion and reduce coupling, achieving clear responsibility boundaries and better maintainability. We also introduce MaintainBench, a benchmark comprising requirement changes and novel dynamic metrics on maintenance efforts. Experiments demonstrate that existing code generation methods struggle to meet maintainability standards when requirements evolve. In contrast, MaintainCoder improves dynamic maintainability metrics by more than 60% with even higher correctness of the initial code. Furthermore, while static metrics fail to accurately reflect maintainability and even contradict each other, our proposed dynamic metrics exhibit high consistency. Our work not only provides the foundation for maintainable code generation, but also highlights the need for more realistic and comprehensive code generation research.