What Matters in Data for DPO?
Direct Preference Optimization (DPO) has emerged as a simple and effective approach for aligning large language models (LLMs) with human preferences, bypassing the need for a learned reward model. Despite its growing adoption, a fundamental question remains open: which characteristics of preference data are most critical for DPO performance? In this work, we provide a systematic study of how the preference data distribution influences DPO, from both theoretical and empirical perspectives. We show that the quality of chosen responses plays a dominant role in optimizing the DPO objective, while the quality of rejected responses may have relatively limited impact. Our theoretical analysis characterizes the optimal response distribution under DPO and reveals that contrastiveness between responses helps primarily by improving the chosen samples. We further study an online DPO setting and show that it effectively reduces to supervised fine-tuning on the chosen responses. Extensive experiments across diverse tasks confirm our findings: improving the quality of chosen responses consistently boosts performance regardless of the quality of the rejected responses. We also investigate the benefit of mixing in on-policy data. Our results explain the mechanism behind several widely adopted strategies and offer practical insights for constructing high-impact preference datasets for LLM alignment.
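For reference, the standard DPO objective scores a preference pair by how much more the policy raises the chosen response's log-probability ratio (relative to a frozen reference model) than the rejected response's. The sketch below is a minimal, framework-free version for a single pair. It assumes the summed response log-probabilities have already been computed; the function names and the default beta = 0.1 are illustrative choices, not settings taken from this work.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair (chosen y_w, rejected y_l).

    Each argument is the summed log-probability of a full response given the
    prompt, under either the trainable policy or the frozen reference model.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp        # log pi/pi_ref for y_w
    rejected_logratio = policy_rejected_logp - ref_rejected_logp  # log pi/pi_ref for y_l
    margin = beta * (chosen_logratio - rejected_logratio)
    return -log_sigmoid(margin)


# Policy identical to the reference: margin is 0, so the loss is log 2.
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
# Policy raises the chosen response's likelihood: the loss drops below log 2.
print(dpo_loss(-9.0, -12.0, -10.0, -12.0))
```

Only the difference of the two log-ratios enters the loss, so the same margin can be reached by raising the chosen response or by lowering the rejected one.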
Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models
A long-standing goal in AI is to develop agents capable of solving diverse tasks across a range of environments, including those never seen during training. Two dominant paradigms address this challenge: (i) reinforcement learning (RL), which learns policies via trial and error, and (ii) optimal control, which plans actions using a known or learned dynamics model. However, their comparative strengths in the offline setting, where agents must learn from reward-free trajectories, remain underexplored. In this work, we systematically evaluate RL and control-based methods on a suite of navigation tasks, using offline datasets of varying quality. On the RL side, we consider goal-conditioned and zero-shot methods. On the control side, we train a latent dynamics model using the Joint Embedding Predictive Architecture (JEPA) and use it for planning. We investigate how factors such as data diversity, trajectory quality, and environment variability influence the performance of these approaches. Our results show that model-free RL benefits most from large amounts of high-quality data, whereas model-based planning generalizes better to unseen layouts, is more data-efficient, and achieves trajectory-stitching performance comparable to leading model-free methods. Notably, planning with a latent dynamics model proves to be a strong approach for handling suboptimal offline data and adapting to diverse environments.
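Planning with a latent dynamics model can be pictured as model-predictive control in latent space: roll candidate action sequences through the predictor, score the predicted final latent against an encoded goal, execute only the first action, and replan. The sketch below uses random shooting over a hypothetical 2-D point-mass model standing in for a trained JEPA predictor. The dynamics, the cost, and every parameter are assumptions for illustration, and the planner used in this work may differ (for example, a sampling-based optimizer such as CEM or MPPI).

```python
import random


def latent_step(z, a):
    # Hypothetical stand-in for a learned latent predictor z' = f(z, a):
    # a 2-D point mass whose action is a displacement.
    return (z[0] + a[0], z[1] + a[1])


def rollout_cost(z0, actions, z_goal):
    """Squared distance between the predicted final latent and the goal latent."""
    z = z0
    for a in actions:
        z = latent_step(z, a)
    return (z[0] - z_goal[0]) ** 2 + (z[1] - z_goal[1]) ** 2


def plan(z0, z_goal, horizon=5, num_samples=500, max_step=1.0, seed=0):
    """Random shooting: sample action sequences and keep the lowest-cost one."""
    rng = random.Random(seed)
    best_actions, best_cost = None, float("inf")
    for _ in range(num_samples):
        actions = [(rng.uniform(-max_step, max_step), rng.uniform(-max_step, max_step))
                   for _ in range(horizon)]
        cost = rollout_cost(z0, actions, z_goal)
        if cost < best_cost:
            best_actions, best_cost = actions, cost
    return best_actions, best_cost


def mpc(z0, z_goal, steps=10):
    """Receding horizon: execute only the first planned action, then replan.

    In this toy the model doubles as the environment; a real agent would
    observe and re-encode the next state instead.
    """
    z = z0
    for t in range(steps):
        actions, _ = plan(z, z_goal, seed=t)
        z = latent_step(z, actions[0])
    return z


print(mpc((0.0, 0.0), (3.0, 4.0)))
```

Because every decision is made by querying the model against a goal, the same model can be reused for new goals or layouts without retraining a policy, which is one plausible reading of the generalization advantage reported above.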
UEPI: Universal Energy-Behavior-Preserving Integrators for Energy Conservative/Dissipative Differential Equations
Physical phenomena in the real world are often described by energy-based modeling theories, such as Hamiltonian mechanics or Landau theory. Phenomena described by these theories obey an energy conservation law or an energy dissipation law. Therefore, numerical methods that preserve these conservation or dissipation laws are desirable for simulating such phenomena. However, because many energy-behavior-preserving numerical methods have been proposed, it is difficult to identify the best one for a given problem. In this study, we propose a method for learning highly accurate energy-behavior-preserving integrators from data. Numerical results show that our approach learns energy-behavior-preserving numerical methods that are more accurate than existing methods for various differential equations, including chaotic Hamiltonian systems, dissipative systems, and a nonlinear partial differential equation. We also provide universal approximation theorems for the proposed approach.
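To make "energy-behavior-preserving" concrete, the sketch below compares explicit Euler with a classical hand-designed discrete-gradient step on a harmonic oscillator with optional linear damping gamma, H(q, p) = (q^2 + p^2)/2. For this quadratic energy the discrete gradient is the midpoint of old and new states (so the step coincides with the implicit midpoint rule), and it satisfies H(x') - H(x) = -h * gamma * pbar^2 exactly, where pbar is the average of old and new momentum. Energy is therefore conserved when gamma = 0 and never increases when gamma > 0, whereas explicit Euler multiplies the energy by (1 + h^2) per step. This is a textbook example of such a scheme, not the learned integrators proposed in this work.

```python
def energy(q, p):
    """H(q, p) = (q^2 + p^2) / 2 for a unit harmonic oscillator."""
    return 0.5 * (q * q + p * p)


def explicit_euler_step(q, p, h):
    # Energy grows by a factor (1 + h^2) every step.
    return q + h * p, p - h * q


def discrete_gradient_step(q, p, h, gamma=0.0):
    """Discrete-gradient step for q' = p, p' = -q - gamma * p.

    The update solves (I - aA) x_new = (I + aA) x with a = h / 2 and
    A = [[0, 1], [-1, -gamma]], which gives H(x_new) - H(x) = -h * gamma * pbar^2.
    """
    a = h / 2.0
    r0 = q + a * p                          # first row of (I + aA) x
    r1 = -a * q + (1.0 - a * gamma) * p     # second row of (I + aA) x
    det = 1.0 + a * gamma + a * a           # det of (I - aA) = [[1, -a], [a, 1 + a*gamma]]
    q_new = ((1.0 + a * gamma) * r0 + a * r1) / det
    p_new = (-a * r0 + r1) / det
    return q_new, p_new


def simulate(step, q, p, n, **kwargs):
    """Return the energy of the initial state and after each of n steps."""
    energies = [energy(q, p)]
    for _ in range(n):
        q, p = step(q, p, **kwargs)
        energies.append(energy(q, p))
    return energies


euler = simulate(explicit_euler_step, 1.0, 0.0, 1000, h=0.1)
conservative = simulate(discrete_gradient_step, 1.0, 0.0, 1000, h=0.1)
damped = simulate(discrete_gradient_step, 1.0, 0.0, 1000, h=0.1, gamma=0.5)
print(euler[-1], conservative[-1], damped[-1])
```

After 1000 steps explicit Euler has inflated the energy by orders of magnitude, while the discrete-gradient run stays at 0.5 up to rounding and the damped run decays monotonically.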
The Structure of Relation Decoding Linear Operators in Large Language Models
This paper investigates the structure of linear operators introduced in Hernandez et al. [2023] that decode specific relational facts in transformer language models. We extend their single-relation findings to a collection of relations and systematically chart their organization. We show that such collections of relation decoders can be highly compressed by simple order-3 tensor networks without significant loss in decoding accuracy. To explain this surprising redundancy, we develop a cross-evaluation protocol in which we apply each linear decoder operator to the subjects of every other relation. Our results reveal that these linear maps do not encode distinct relations but instead extract recurring, coarse-grained semantic properties (e.g., country of capital city and country of food both fall under the country-of-X property). This property-centric structure explains both the operators' compressibility and why they generalize only to new relations that are semantically close. Our findings thus interpret linear relational decoding in transformer language models as primarily property-based rather than relation-specific.
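The cross-evaluation protocol can be illustrated on synthetic data. The sketch below assumes NumPy and uses toy embeddings rather than transformer hidden states. It fits one least-squares linear decoder per relation, then scores every decoder on every relation's subjects by nearest-object accuracy. Two hypothetical relations share a "country" feature block, so their decoders transfer to each other, while a third reads a different block and transfers only at chance level. Least squares is used here purely as a simple stand-in for how the operators are obtained.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 4          # size of each hypothetical feature block
N_PER_REL = 40   # samples per relation
countries = rng.normal(size=(5, DIM))  # objects shared by the country-of-X relations
colors = rng.normal(size=(5, DIM))     # objects for an unrelated property


def make_relation(objects, property_in_first_block):
    """Toy subjects: one block encodes the object, the other is nuisance."""
    idx = rng.integers(len(objects), size=N_PER_REL)
    nuisance = rng.normal(size=(N_PER_REL, DIM))
    if property_in_first_block:
        subjects = np.hstack([objects[idx], nuisance])
    else:
        subjects = np.hstack([nuisance, objects[idx]])
    return subjects, idx, objects


def fit_decoder(subjects, idx, objects):
    # Least-squares linear map W such that subjects @ W ~= object embeddings.
    W, *_ = np.linalg.lstsq(subjects, objects[idx], rcond=None)
    return W


def accuracy(W, subjects, idx, objects):
    preds = subjects @ W
    # Decode each prediction to its nearest candidate object (Euclidean).
    dists = np.linalg.norm(preds[:, None, :] - objects[None, :, :], axis=-1)
    return float(np.mean(dists.argmin(axis=1) == idx))


relations = {
    "capital_city->country": make_relation(countries, True),
    "food->country": make_relation(countries, True),
    "object->color": make_relation(colors, False),
}
names = list(relations)
decoders = {n: fit_decoder(*relations[n]) for n in names}
# cross[i, j]: decoder fitted on relation i, evaluated on subjects of relation j.
cross = np.array([[accuracy(decoders[a], *relations[b]) for b in names] for a in names])
print(names)
print(cross.round(2))
```

In this toy, the off-diagonal block of ones between the two country relations is what a property-based (rather than relation-specific) decoder looks like under cross-evaluation.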
Goblin shark filmed in its native habitat for the first time
One of these mysterious sharks was spotted 2,300 feet deeper than scientists expected. The goblin shark was first described in 1898.
Is Noise Conditioning Necessary? A Unified Theory of Unconditional Graph Diffusion Models
Explicit noise-level conditioning is widely regarded as essential for the effective operation of Graph Diffusion Models (GDMs). In this work, we challenge this assumption by investigating whether denoisers can implicitly infer noise levels directly from corrupted graph structures, potentially eliminating the need for explicit noise conditioning. To this end, we develop a theoretical framework centered on Bernoulli edge-flip corruptions and extend it to encompass more complex scenarios involving coupled structure-attribute noise. Extensive empirical evaluations on both synthetic and real-world graph datasets, using models such as GDSS and DiGress, provide strong support for our theoretical findings. Notably, unconditional GDMs achieve performance comparable or superior to their conditioned counterparts, while also offering reductions in parameters (4-6%) and computation time (8-10%). Our results suggest that the high-dimensional nature of graph data itself often encodes sufficient information for the denoising process, opening avenues for simpler, more efficient GDM architectures.
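A back-of-the-envelope calculation suggests why the noise level can be recoverable from the corrupted graph alone. Under Bernoulli edge-flip corruption with flip probability t, a clean graph of edge density p0 has expected noisy density p0(1 - t) + (1 - p0)t = p0 + t(1 - 2 p0), which can be solved for t whenever p0 is not 1/2. The sketch below applies this moment estimator to an Erdős–Rényi graph. The known clean density p0, the graph model, and the sizes are assumptions for illustration; the denoisers studied here are argued to infer the noise level implicitly, not through an explicit formula like this one.

```python
import random


def sample_er_graph(n, p, rng):
    # Edge set of an Erdos-Renyi graph as pairs (i, j) with i < j.
    return {(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p}


def flip_edges(edges, n, t, rng):
    """Bernoulli edge-flip corruption: toggle each node pair with probability t."""
    noisy = set()
    for i in range(n):
        for j in range(i + 1, n):
            present = (i, j) in edges
            if rng.random() < t:
                present = not present
            if present:
                noisy.add((i, j))
    return noisy


def estimate_noise_level(noisy, n, p0):
    """Moment estimator: E[density] = p0 + t * (1 - 2 * p0), solved for t."""
    density = len(noisy) / (n * (n - 1) / 2)
    return (density - p0) / (1 - 2 * p0)


rng = random.Random(0)
n, p0 = 200, 0.05
clean = sample_er_graph(n, p0, rng)
for t in (0.05, 0.2, 0.4):
    t_hat = estimate_noise_level(flip_edges(clean, n, t, rng), n, p0)
    print(f"true t = {t:.2f}, estimated t = {t_hat:.3f}")
```

The estimate averages over all n(n - 1)/2 node pairs, so its error shrinks as graphs grow, which fits the intuition that high-dimensional graph data carries enough signal about the noise level.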
621 trillion miles of fungi networks crisscross the planet
A new map explores the vast underground world supporting all life on Earth. The length of fungi networks is almost a billion times the distance between Earth and the sun.
SonoGym: High Performance Simulation for Challenging Surgical Tasks with Robotic Ultrasound
Ultrasound (US) is a widely used medical imaging modality due to its real-time capabilities, non-invasive nature, and cost-effectiveness. By reducing operator dependency and enhancing access to complex anatomical regions, robotic ultrasound can help improve workflow efficiency. Recent studies have demonstrated the potential of deep reinforcement learning (DRL) and imitation learning (IL) to enable more autonomous and intelligent robotic ultrasound navigation. However, the application of learning-based robotic ultrasound to computer-assisted surgical tasks, such as anatomy reconstruction and surgical guidance, remains largely unexplored. A key bottleneck for this is the lack of realistic and efficient simulation environments tailored to these tasks.
Don't Just Chase "Highlighted Tokens" in MLLMs: Revisiting Visual Holistic Context Retention
Despite their powerful capabilities, multimodal large language models (MLLMs) suffer from considerable computational overhead due to their reliance on massive numbers of visual tokens. Recent studies have explored token pruning to alleviate this problem, typically using text-vision cross-attention or [CLS] attention to assess and discard redundant visual tokens. In this work, we identify a critical limitation of such attention-first pruning approaches: they tend to preserve semantically similar tokens, resulting in pronounced performance drops under high pruning rates. To address this, we propose HoloV, a simple yet effective, plug-and-play visual token pruning framework for efficient inference.
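The limitation can be reproduced on a toy example. The sketch below keeps the k visual tokens with the highest attention scores and measures how redundant the kept set is via mean pairwise cosine similarity; the token features and scores are made up so that the most-attended tokens are near-duplicates. For contrast, it includes a hypothetical greedy selection that penalizes similarity to already-kept tokens, in the spirit of maximal marginal relevance. That alternative only illustrates retaining more diverse context and is not the HoloV algorithm.

```python
import math
from itertools import combinations


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def mean_pairwise_similarity(tokens, kept):
    pairs = list(combinations(kept, 2))
    return sum(cosine(tokens[i], tokens[j]) for i, j in pairs) / len(pairs)


def attention_topk(scores, k):
    """Attention-first pruning: keep the k tokens with the highest attention."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def attention_with_diversity(tokens, scores, k, lam=0.5):
    """Hypothetical MMR-style greedy selection (not HoloV): trade attention
    against the maximum similarity to tokens already kept."""
    kept, candidates = [], set(range(len(tokens)))
    while len(kept) < k:
        def gain(i):
            redundancy = max((cosine(tokens[i], tokens[j]) for j in kept), default=0.0)
            return lam * scores[i] - (1 - lam) * redundancy
        best = max(candidates, key=gain)
        kept.append(best)
        candidates.remove(best)
    return kept


# Toy visual tokens: the three most-attended ones point in almost the same direction.
tokens = [(1.0, 0.0, 0.0), (0.98, 0.2, 0.0), (0.97, 0.0, 0.24),
          (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (0.1, 0.7, 0.7)]
scores = [0.30, 0.28, 0.22, 0.10, 0.06, 0.04]

top = attention_topk(scores, 3)                    # [0, 1, 2]
div = attention_with_diversity(tokens, scores, 3)  # [0, 3, 4]
print(top, mean_pairwise_similarity(tokens, top))
print(div, mean_pairwise_similarity(tokens, div))
```

Here pure top-k attention keeps three near-duplicates (mean cosine similarity above 0.9), so a high pruning rate spends the whole token budget on one region, while the diversity-aware selection covers three orthogonal directions.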
Drug Sites Hijacked Spotify's Search Ranking Through Fake Podcasts
A joint congressional report describes a spam operation that turned tens of thousands of fake podcasts into search-engine bait for illegal pharmacy and scam sites. For the past year, Spotify has been quietly purging tens of thousands of podcasts that advertised illegal online pharmacies. A report released Thursday by Senator Maggie Hassan, ranking member of the Joint Economic Committee, faults the company for acting only after news outlets exposed the content and her office spent nearly a year pressing for answers. None of what it removed was sent to law enforcement, the report says. Spotify reportedly removed more than 57,000 podcast episodes and 3,000 shows, and took enforcement action against 3,500 accounts, all pushing links to illegal online pharmacies advertising opioids, benzodiazepines, and stimulants for sale without a prescription.