GLVD: Guided Learned Vertex Descent

Neural Information Processing Systems

Existing 3D face modeling methods usually depend on 3D Morphable Models, which inherently constrain the representation capacity to fixed shape priors. Optimization-based approaches offer high-quality reconstructions but tend to be computationally expensive. In this work, we introduce GLVD, a hybrid method for 3D face reconstruction from few-shot images that extends Learned Vertex Descent (LVD) [11] by integrating per-vertex neural field optimization with global structural guidance from dynamically predicted 3D keypoints. By incorporating relative spatial encoding, GLVD iteratively refines mesh vertices without requiring dense 3D supervision. This enables expressive and adaptable geometry reconstruction while maintaining computational efficiency. GLVD achieves state-of-the-art performance in single-view settings and remains highly competitive in multi-view scenarios, all while substantially reducing inference time.



Teaching Language Models to Reason with Tools

Neural Information Processing Systems

Large reasoning models (LRMs) like OpenAI-o1 have shown impressive capabilities in natural language reasoning. However, these models frequently demonstrate inefficiencies or inaccuracies when tackling complex mathematical operations. While integrating computational tools such as Code Interpreters (CIs) offers a promising solution, it introduces a critical challenge: a conflict between the model's internal, probabilistic reasoning and the external, deterministic knowledge provided by the CI, which often leads models to unproductive deliberation. To overcome this, we introduce CoRT (Code-Optimized Reasoning Training), a post-training framework designed to teach LRMs to effectively utilize CIs. We propose Hint-Engineering, a new data synthesis strategy that strategically injects diverse hints at optimal points within reasoning paths. This approach generates high-quality, code-integrated reasoning data specifically tailored to optimize LRM-CI interaction. Using this method, we have synthesized 30 high-quality samples to post-train models ranging from 1.5B to 32B parameters through supervised fine-tuning.


MAP Estimation with Denoisers: Convergence Rates and Guarantees

Neural Information Processing Systems

Denoiser models have become powerful tools for inverse problems, enabling the use of pretrained networks to approximate the score of a smoothed prior distribution. These models are often used in heuristic iterative schemes aimed at solving Maximum a Posteriori (MAP) optimisation problems, where the proximal operator of the negative log-prior plays a central role. In practice, this operator is intractable, and practitioners plug in a pretrained denoiser as a surrogate, despite the lack of general theoretical justification for this substitution. In this work, we show that a simple algorithm, closely related to several used in practice, provably converges to the proximal operator under a log-concavity assumption on the prior p. We show that this algorithm can be interpreted as a gradient descent on smoothed proximal objectives. Our analysis thus provides a theoretical foundation for a class of empirically successful but previously heuristic methods.
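
As a rough illustration of the idea, and not the algorithm analysed in the paper (whose details the abstract does not give), the sketch below runs gradient descent on a proximal objective in which the prior is replaced by a smoothed version, with the score obtained from a denoiser via Tweedie's formula. Everything here is an assumption made for the example: the prior is a 1D Gaussian N(0, s2) so that the MMSE denoiser and the exact proximal point are available in closed form, and the function names, the fixed smoothing level sigma2, and the step size are hypothetical choices.

```python
def prox_closed_form(y, tau, s2):
    """Exact prox of tau * (-log p) at y for the assumed prior p = N(0, s2)."""
    return y * s2 / (s2 + tau)


def gaussian_denoiser(x, sigma2, s2):
    """MMSE denoiser for the assumed prior N(0, s2) under Gaussian noise of variance sigma2."""
    return x * s2 / (s2 + sigma2)


def prox_via_denoiser(y, tau, s2, sigma2=1e-4, step=0.05, iters=2000):
    """Gradient descent on -log p_sigma(x) + (x - y)^2 / (2 * tau).

    p_sigma is the prior smoothed by Gaussian noise of variance sigma2; its
    score is read off the denoiser with Tweedie's formula.
    """
    x = y
    for _ in range(iters):
        # Tweedie's formula: grad log p_sigma(x) = (D(x) - x) / sigma2
        score = (gaussian_denoiser(x, sigma2, s2) - x) / sigma2
        grad = -score + (x - y) / tau
        x -= step * grad
    return x


approx = prox_via_denoiser(y=2.0, tau=0.5, s2=1.0)
exact = prox_closed_form(y=2.0, tau=0.5, s2=1.0)
print(approx, exact)
```

With a small smoothing level the iterate settles close to the exact proximal point; the remaining gap comes from the smoothing, which in this toy case shifts the minimiser slightly.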


Flexible Realignment of Language Models

Neural Information Processing Systems

Realignment becomes necessary when a language model (LM) fails to meet expected performance. We propose a flexible realignment framework that supports quantitative control of alignment degree during training and inference.


6075d47368ddf560e92efd53264b5405-Paper-Conference.pdf

Neural Information Processing Systems

Abstract Visual Reasoning (AVR) entails discerning latent patterns in visual data and inferring underlying rules. Existing solutions often lack scalability and adaptability, as deep architectures tend to overfit training data, and static neural networks fail to dynamically capture diverse rules. To tackle these challenges, we propose a Dynamic and Scalable Reasoning Framework (DSRF) that greatly enhances the reasoning ability by widening the network instead of deepening it, and by dynamically adjusting the reasoning network to better fit novel samples rather than keeping it static. Specifically, we design a Multi-View Reasoning Pyramid (MVRP) to capture complex rules through layered reasoning to focus features at each view on distinct combinations of attributes, widening the reasoning network to cover more attribute combinations analogous to complex reasoning rules. Additionally, we propose a Dynamic Domain-Contrast Prediction (DDCP) block to handle varying task-specific relationships dynamically by introducing a Gram matrix to model feature distributions, and a gate matrix to capture subtle domain differences between context and target features. Extensive experiments on six AVR tasks demonstrate DSRF's superior performance, achieving state-of-the-art results under various settings. Code is available here: https://github.com/UNNCRoxLi/DSRF.


College Grads Are Rejecting AI En Masse

Mother Jones

The wave of booing aimed at AI-pilled commencement speakers signals a sea change in public opinion.


A Partition Cover Approach for Tokenization

Neural Information Processing Systems

Tokenization is the process of encoding strings into tokens of a fixed vocabulary size, and is widely utilized in Natural Language Processing applications. The leading tokenization algorithm today is Byte-Pair Encoding (BPE), which formulates the tokenization problem as a compression problem and tackles it by performing sequences of merges. In this work, we formulate tokenization as an optimization objective, show that it is NP-hard via a simple reduction from vertex cover, and propose a polynomial-time greedy algorithm GREEDTOK. Our formulation naturally relaxes to the well-studied weighted maximum coverage problem which has a simple (1 - 1/e)-approximation algorithm GREEDWMC. Through empirical evaluations on real-world corpora, we show that GREEDTOK outperforms BPE and UNIGRAM on compression and achieves a covering score comparable to GREEDWMC.
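
For readers unfamiliar with the weighted maximum coverage problem mentioned above, here is a minimal sketch of the textbook greedy procedure that attains the (1 - 1/e) approximation ratio: repeatedly pick the set that adds the most not-yet-covered weight. It is not the paper's GREEDTOK or GREEDWMC implementation; the function name, the input layout, and the toy data are assumptions made for illustration.

```python
def greedy_weighted_max_coverage(sets, weights, k):
    """Choose up to k sets, each time taking the one with the largest uncovered weight.

    sets: dict mapping a set name to a Python set of elements (hypothetical layout)
    weights: dict mapping each element to a non-negative weight
    """
    covered = set()
    chosen = []
    for _ in range(k):
        best_name, best_gain = None, 0.0
        for name, elems in sets.items():
            if name in chosen:
                continue
            # Marginal gain: weight of elements this set would newly cover
            gain = sum(weights[e] for e in elems if e not in covered)
            if gain > best_gain:
                best_name, best_gain = name, gain
        if best_name is None:  # no remaining set adds any weight
            break
        chosen.append(best_name)
        covered |= sets[best_name]
    return chosen, sum(weights[e] for e in covered)


# Toy instance: six unit-weight elements, four candidate sets, budget of two.
toy_sets = {"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5, 6}, "D": {1, 6}}
toy_weights = {e: 1 for e in range(1, 7)}
chosen, total = greedy_weighted_max_coverage(toy_sets, toy_weights, k=2)
print(chosen, total)
```

Ties are broken by dictionary order in this sketch; any tie-breaking rule preserves the approximation guarantee.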


Object Centric Representation Learning for Enhanced Scene Graph Prediction

Neural Information Processing Systems

While previous research has addressed dataset limitations and explored various approaches including Open-Vocabulary settings, it frequently fails to optimize the representational capacity of object and relationship features, showing excessive reliance on Graph Neural Networks despite insufficient discriminative capability. In this work, we demonstrate through extensive analysis that the quality of object features plays a critical role in determining overall scene graph accuracy. To address this challenge, we design a highly discriminative object feature encoder and employ a contrastive pretraining strategy that decouples object representation learning from the scene graph prediction. This design not only enhances object classification accuracy but also yields direct improvements in relationship prediction. Notably, when plugging our pretrained encoder into existing frameworks, we observe substantial performance improvements across all evaluation metrics. Additionally, whereas existing approaches have not fully exploited the integration of relationship information, we effectively combine both geometric and semantic features to achieve superior relationship prediction. Comprehensive experiments on the 3DSSG dataset demonstrate that our approach significantly outperforms previous state-of-the-art methods.


Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning

Neural Information Processing Systems

Reinforcement learning (RL) has become the dominant paradigm for improving the performance of language models on complex reasoning tasks. Despite the substantial empirical gains demonstrated by RL-based training methods like GRPO, a granular understanding of why and how RL enhances performance is still lacking. To bridge this gap, we introduce SPARKLE, a fine-grained analytic framework to dissect the effects of RL across three key dimensions: (1) plan following and execution, (2) knowledge integration, and (3) chain of subproblems. Using this framework, we gain insights beyond mere accuracy.