DAPO: Improving Multi-Step Reasoning Abilities of Large Language Models with Direct Advantage-Based Policy Optimization

Neural Information Processing Systems

The role of reinforcement learning (RL) in enhancing the reasoning of large language models (LLMs) is becoming increasingly significant. Despite the success of RL in many scenarios, many challenges remain in improving the reasoning of LLMs. One key challenge is the sparse reward, which introduces more training variance in policy optimization and makes it difficult to obtain a good estimate of the value function in Actor-Critic (AC) methods. To address these issues, we introduce Direct Advantage-Based Policy Optimization (DAPO), a novel step-level offline RL algorithm with theoretical guarantees for enhancing the reasoning abilities of LLMs. Unlike response-level methods (such as DPO and GRPO), in which the update directions of all reasoning steps are governed uniformly by the outcome reward, DAPO employs a critic function to provide dense step-level signals for policy optimization. Additionally, the actor and critic in DAPO are trained independently, ensuring that the critic is a good estimate of the true state value function and avoiding the co-training instability observed in standard AC methods. We train DAPO on mathematical and code problems and then evaluate its performance on multiple benchmarks. Our results show that DAPO effectively enhances the mathematical and code capabilities of both SFT models and RL models.


BraVE: Offline Reinforcement Learning for Discrete Combinatorial Action Spaces

Neural Information Processing Systems

Offline reinforcement learning in high-dimensional, discrete action spaces is challenging due to the exponential scaling of the joint action space with the number of sub-actions and the complexity of modeling sub-action dependencies. Existing methods either exhaustively evaluate the action space, making them computationally infeasible, or factorize Q-values, failing to represent joint sub-action effects. We propose Branch Value Estimation (BraVE), a value-based method that uses tree-structured action traversal to evaluate a linear number of joint actions while preserving dependency structure. BraVE outperforms prior offline RL methods by up to 20 in environments with over four million actions.
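The two difficulties named in this abstract can be illustrated with a small sketch. It is not BraVE itself: the choice of 22 binary sub-actions and the helper `is_additively_factorizable` are assumptions made for illustration only. The sketch shows how quickly the joint action space grows, and that an XOR-like Q-table over two binary sub-actions cannot be written as a sum of per-sub-action values.

```python
# Illustrative sketch only; names and sizes here are hypothetical, not from the paper.

def joint_action_count(num_sub_actions: int, choices_per_sub_action: int = 2) -> int:
    """Size of the joint action space when every sub-action is chosen independently."""
    return choices_per_sub_action ** num_sub_actions


def is_additively_factorizable(q, tol: float = 1e-9) -> bool:
    """Check whether a 2x2 table q[a1][a2] equals q1[a1] + q2[a2] for some q1, q2.

    For two binary sub-actions this holds exactly when the interaction term
    q[0][0] + q[1][1] - q[0][1] - q[1][0] is zero.
    """
    interaction = q[0][0] + q[1][1] - q[0][1] - q[1][0]
    return abs(interaction) <= tol


# Assumed example: 22 binary sub-actions give a joint space above four million.
n_joint = joint_action_count(22)

# An additive table (no interaction) versus an XOR-like table (pure interaction).
additive_q = [[0.0, 1.0], [2.0, 3.0]]
xor_q = [[0.0, 1.0], [1.0, 0.0]]

additive_ok = is_additively_factorizable(additive_q)
xor_ok = is_additively_factorizable(xor_q)
```

Under these assumptions the additive table passes the check while the XOR-like table does not, which is the kind of joint sub-action effect a purely factorized Q-function cannot represent.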


Beyond Scalars: Concept-Based Alignment Analysis in Vision Transformers

Neural Information Processing Systems

Measuring the alignment between representations lets us understand similarities between the feature spaces of different models, such as Vision Transformers trained under diverse paradigms. However, traditional measures for representational alignment yield only scalar values that obscure how these spaces agree in terms of learned features. To address this, we combine alignment analysis with concept discovery, allowing a fine-grained breakdown of alignment into individual concepts. This approach reveals both universal concepts across models and each representation's internal concept structure. We introduce a new definition of concepts as non-linear manifolds, hypothesizing they better capture the geometry of the feature space. A sanity check demonstrates the advantage of this manifold-based definition over linear baselines for concept-based alignment. Finally, our alignment analysis of four different ViTs shows that increased supervision tends to reduce semantic organization in learned representations.


Attack by Yourself: Effective and Unnoticeable Multi-Category Graph Backdoor Attacks with Subgraph Triggers Pool

Neural Information Processing Systems

Graph Neural Networks (GNNs) have achieved significant success in various real-world applications, including social networks, finance systems, and traffic management. Recent research highlights their vulnerability to backdoor attacks in node classification, where GNNs trained on a poisoned graph misclassify a test node only when specific triggers are attached. These studies typically focus on single attack categories and use adaptive trigger generators to create node-specific triggers. However, adaptive trigger generators typically have a simple structure and limited parameters, and they lack category-aware graph knowledge, so they struggle to handle backdoor attacks across multiple categories as the number of target categories increases. We address this gap by proposing a novel approach for Effective and Unnoticeable Multi-Category (EUMC) graph backdoor attacks, leveraging subgraphs from the attacked graph as category-aware triggers to precisely control the target category.


Few-Shot Learning from Gigapixel Images via Hierarchical Vision-Language Alignment and Modeling

Neural Information Processing Systems

Vision-language models (VLMs) have recently been integrated into multiple instance learning (MIL) frameworks to address the challenge of few-shot, weakly supervised classification of whole slide images (WSIs). A key trend involves leveraging multi-scale information to better represent hierarchical tissue structures. However, existing methods often face two key limitations: (1) insufficient modeling of interactions within the same modalities across scales (e.g., 5× and 20×) and (2) inadequate alignment between visual and textual modalities on the same scale. To address these gaps, we propose HiVE-MIL, a hierarchical vision-language framework that constructs a unified graph consisting of (1) parent-child links between coarse (5×) and fine (20×) visual/textual nodes to capture hierarchical relationships, and (2) heterogeneous intra-scale edges linking visual and textual nodes on the same scale. To further enhance semantic consistency, HiVE-MIL incorporates a two-stage, text-guided dynamic filtering mechanism that removes weakly correlated patch-text pairs, and introduces a hierarchical contrastive loss to align textual semantics across scales. Extensive experiments on TCGA breast, lung, and kidney cancer datasets demonstrate that HiVE-MIL consistently outperforms both traditional MIL and recent VLM-based MIL approaches, achieving gains of up to 4.1% in macro F1 under 16-shot settings. Our results demonstrate the value of jointly modeling hierarchical structure and multimodal alignment for efficient and scalable learning from limited pathology data.


MODELSHAPLEY: Find Your Ideal Parameter Player via One Gradient Backpropagation

Neural Information Processing Systems

Measuring parameter importance is crucial for understanding and optimizing large language models (LLMs). Existing work predominantly focuses on pruning or probing at neuron/feature levels without fully considering the cooperative behaviors of model parameters. In this paper, we introduce a novel approach, MODEL SHAPLEY, to quantify parameter importance based on the Shapley value, a principled method from cooperative game theory that captures both individual and synergistic contributions among parameters, via only one gradient backpropagation. We derive a scalable second-order approximation to compute Shapley values at the parameter level, leveraging blockwise Fisher information for tractability in large-scale settings. Our method enables fine-grained differentiation of parameter importance, facilitating targeted knowledge injection and model compression. Through mini-batch Monte Carlo updates and efficient approximation of the Hessian structure, we achieve robust Shapley-based attribution with only modest computational overhead. Experimental results indicate that this cooperative game perspective enhances interpretability, guides more effective parameter-specific fine-tuning and model compression, and paves the way for continuous model improvement in various downstream tasks.
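For readers unfamiliar with the Shapley value that this abstract builds on, the following is a minimal sketch of the exact, textbook definition computed by enumerating coalitions of a toy game. It is not the paper's second-order approximation; the players `"a"`, `"b"`, `"c"` and the function `toy_value` are hypothetical stand-ins for parameters and their joint contribution.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values by enumerating every coalition (exponential cost)."""
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):
            # Weight of a coalition of size k that does not contain p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                # Marginal contribution of p when joining coalition s.
                phi[p] += weight * (value(s | {p}) - value(s))
    return phi


def toy_value(coalition):
    """Hypothetical game: a and b are worth 1 each, plus a synergy of 2 together; c adds nothing."""
    v = 0.0
    if "a" in coalition:
        v += 1.0
    if "b" in coalition:
        v += 1.0
    if "a" in coalition and "b" in coalition:
        v += 2.0
    return v


phi = shapley_values(["a", "b", "c"], toy_value)
total = sum(phi.values())
```

In this toy game the synergy of 2 is split evenly between `a` and `b`, `c` receives zero, and the values sum to the worth of the full coalition. Exact enumeration scales exponentially with the number of players, which is why approximations are needed at the scale of model parameters.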


Dynamic Gaussian Splatting from Defocused and Motion-blurred Monocular Videos

Neural Information Processing Systems

This paper presents a unified framework that allows high-quality dynamic Gaussian Splatting from both defocused and motion-blurred monocular videos. Because the formation processes of defocus blur and motion blur differ significantly, existing methods are tailored to only one of them and cannot handle both simultaneously. Although the two can be jointly modeled as blur kernel-based convolution, the inherent difficulty of estimating accurate blur kernels greatly limits progress in this direction. In this work, we go a step further in this direction. In particular, we propose to estimate reliable per-pixel blur kernels using a blur prediction network that exploits blur-related scene and camera information and is subject to a blur-aware sparsity constraint. In addition, we introduce a dynamic Gaussian densification strategy to mitigate the lack of Gaussians in incomplete regions, and boost the performance of novel view synthesis by incorporating unseen view information to constrain scene optimization. Extensive experiments show that our method outperforms state-of-the-art methods in photorealistic novel view synthesis from defocused and motion-blurred monocular videos.


RobustMerge: Parameter-Efficient Model Merging for MLLMs with Direction Robustness

Neural Information Processing Systems

Fine-tuning pre-trained models with custom data leads to numerous expert models on specific tasks. Merging these models into one universal model to provide multi-task ability while avoiding data leakage has gained popularity. With the expansion in data and model size, parameter-efficient tuning has become the common practice for obtaining task-specific models efficiently. However, few methods are dedicated to efficient merging, and existing methods designed for full fine-tuning merging fail under efficient merging. To address this issue, we analyze merging from the perspective of low-rank decomposition and reveal that direction robustness during merging is crucial for merging efficient modules. We further find that compensating for the gap between stark singular values contributes to direction robustness. Therefore, we propose RobustMerge, a training-free parameter-efficient merging method with complementary parameter adaptation to maintain direction robustness. Specifically, we (1) prune parameters and scale coefficients from inter-parameter relations for singular values to maintain direction stability away from task interference, and (2) perform cross-task normalization to enhance unseen task generalization. We establish a benchmark consisting of diverse multimodal tasks, on which we conduct experiments to verify the performance and generalizability of our method.


Stonehenge's secret SISTER: Archaeologists discover an ancient monument just three miles away that may have served as a 'prototype' for the famous stones

Daily Mail - Science & tech

Archaeologists have discovered a secret sister monument to Stonehenge that might have served as a 'prototype' for the famous stones. This ancient site is just three miles away from Stonehenge itself, located in the village of Bulford, Wiltshire. Consisting of two wooden poles placed 400 feet (120 metres) apart, this long-lost monument might appear rather basic at first glance.


CoPr: Awekis Pr: Ththth ooeenres mmm ve ecrafe ppduatorttbpn axsae b al akapitnnict'ingid dhcosk an,th oe h a wAtdendehu aoneudd.m pritoto ahnue cn htehd ey

Neural Information Processing Systems

Audio-driven human animation methods, such as talking head and talking body generation, have made remarkable progress in generating synchronized facial movements and appealing visual quality videos. However, existing methods primarily focus on single human animation and struggle with [...], facing incorrect binding problems between audio and persons.