Industry
Can Knowledge-Graph-based Retrieval Augmented Generation Really Retrieve What You Need?
Retrieval-Augmented Generation (RAG) based on knowledge graphs (KGs) enhances large language models (LLMs) with structural and textual external knowledge. Yet existing KG-based RAG methods struggle to retrieve accurate and diverse information when handling complex queries. When KG-based retrieval is modeled as a multi-step decision process, Process Reward Models (PRMs) offer a promising way to align retrieval behavior with query-specific knowledge requirements. However, PRMs rely heavily on process-level supervision signals that are expensive and hard to obtain on KGs. To address this challenge, we propose GraphFlow, a framework that efficiently retrieves the accurate and diverse knowledge required for complex queries from text-rich KGs. GraphFlow employs a detailed balance objective with local exploration to jointly optimize a retrieval policy and a flow estimator.
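The abstract names a detailed balance objective but gives no formula. For reference, the standard detailed balance condition from the GFlowNet literature requires F(s) P_F(s'|s) = F(s') P_B(s|s') for every transition s -> s', with the flow of a terminal state tied to its reward. The following is a minimal sketch of the squared log-space residual. It assumes that the abstract's "retrieval policy" plays the role of P_F, that the "flow estimator" plays the role of F, and that P_B = 1, which holds when each state has a single parent (for example, when states are retrieval paths). The function names are hypothetical. The sketch omits local exploration and is not GraphFlow's implementation.

```python
import math
from typing import Sequence


def detailed_balance_loss(
    log_flow_s: float,
    log_pf: float,
    log_flow_next: float,
    log_pb: float = 0.0,
) -> float:
    """Squared detailed-balance residual for one transition s -> s'.

    Detailed balance: F(s) * P_F(s'|s) = F(s') * P_B(s|s').
    In log space the residual is log F(s) + log P_F - log F(s') - log P_B.
    log_pb defaults to 0.0, i.e. P_B = 1 (assumption: single-parent states).
    """
    residual = log_flow_s + log_pf - log_flow_next - log_pb
    return residual ** 2


def trajectory_loss(
    log_flows: Sequence[float],
    log_pfs: Sequence[float],
    log_reward: float,
) -> float:
    """Sum of detailed-balance losses along one trajectory s_0 -> ... -> s_T.

    log_flows[t] is the estimated log-flow of s_t for t < T, and log_pfs[t]
    is log P_F(s_{t+1} | s_t). The terminal flow is pinned to the reward:
    log F(s_T) = log R(s_T).
    """
    assert len(log_flows) == len(log_pfs)
    total = 0.0
    for t, (lf, lpf) in enumerate(zip(log_flows, log_pfs)):
        # Next state's log-flow, or the log-reward once we reach the terminal state.
        next_lf = log_flows[t + 1] if t + 1 < len(log_flows) else log_reward
        total += detailed_balance_loss(lf, lpf, next_lf)
    return total


# Consistent toy trajectory: F = (4, 2), P_F = 0.5 at each step, R = 1.
consistent = trajectory_loss([math.log(4.0), math.log(2.0)], [math.log(0.5), math.log(0.5)], 0.0)
# Inconsistent one-step trajectory: F(s_0) = 1, P_F = 1, but log R = 1.
inconsistent = trajectory_loss([0.0], [0.0], 1.0)
print(consistent, inconsistent)
```

A trajectory whose flows agree with its policy and reward has zero loss. In actual training, the log-flows and log-probabilities would come from neural networks, and the summed loss would be minimized by gradient descent.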
Physics-informed Neural Operator for Pansharpening
Over the past decades, pansharpening has contributed greatly to numerous remote sensing applications, with methods evolving from theoretically grounded models to deep learning approaches and their hybrids. Though promising, existing methods rarely address pansharpening through the lens of the underlying physical imaging process. In this work, we revisit the spectral imaging mechanism and propose a novel physics-informed neural operator framework for pansharpening, termed PINO, which faithfully models the end-to-end electro-optical sensor process. Specifically, PINO operates in stages: (1) a spatial-spectral encoder aggregates multi-granularity features of the high-resolution panchromatic (PAN) and low-resolution multispectral (LRMS) inputs.
STSBENCH: A Large-Scale Dataset for Modeling Neuronal Activity in the Dorsal Stream of Primate Visual Cortex
The primate visual system is typically divided into two streams: the ventral stream, responsible for object recognition, and the dorsal stream, responsible for encoding spatial relations and motion. Recent studies have shown that convolutional neural networks (CNNs) pretrained on object recognition tasks are remarkably effective at predicting neuronal responses in the ventral stream, shedding light on the neural mechanisms underlying object recognition. However, similar models of the dorsal stream remain underdeveloped due to the lack of large-scale datasets encompassing dorsal stream areas. To address this gap, we present STSBENCH, a dataset of large-scale, single-neuron recordings from over 2,000 neurons in the superior temporal sulcus (STS), collected while rhesus macaques viewed thousands of unique, natural videos; this is a nearly 50-fold increase over existing dorsal stream datasets. We show that our dataset can be used for benchmarking encoding models of dorsal stream neuronal responses and for reconstructing visual input from neural activity.
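A common baseline for benchmarking encoding models is to regress each neuron's responses onto stimulus features extracted by a pretrained network. Held-out predictions are then scored with a per-neuron Pearson correlation. The following is a minimal sketch of that setup on synthetic data. The feature matrix, the ridge penalty `alpha`, the train/test split, and the helper names `fit_ridge` and `per_neuron_pearson` are all hypothetical. They do not reflect STSBENCH's actual data format or evaluation protocol.

```python
import numpy as np


def fit_ridge(X: np.ndarray, Y: np.ndarray, alpha: float) -> np.ndarray:
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y (no intercept)."""
    n_features = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ Y)


def per_neuron_pearson(Y_true: np.ndarray, Y_pred: np.ndarray) -> np.ndarray:
    """Pearson correlation between measured and predicted responses, one value per neuron (column)."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    num = (yt * yp).sum(axis=0)
    den = np.sqrt((yt ** 2).sum(axis=0) * (yp ** 2).sum(axis=0))
    return num / den


rng = np.random.default_rng(0)
n_train, n_test, n_features, n_neurons = 800, 200, 64, 10
# Hypothetical stand-ins: X = network features per video clip, Y = neuronal responses.
W_true = rng.normal(size=(n_features, n_neurons))
X = rng.normal(size=(n_train + n_test, n_features))
Y = X @ W_true + rng.normal(scale=2.0, size=(n_train + n_test, n_neurons))

W = fit_ridge(X[:n_train], Y[:n_train], alpha=1.0)
scores = per_neuron_pearson(Y[n_train:], X[n_train:] @ W)
print(scores.round(2))
```

With real recordings, one would typically compare feature sets from different networks by their held-out correlations, tuning `alpha` by cross-validation on the training split.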
VideoVLA: Video Generators Can Be Generalizable Robot Manipulators
Generalization in robot manipulation is essential for deploying robots in open-world environments and advancing toward artificial general intelligence. While recent vision-language-action (VLA) models leverage large pre-trained understanding models for perception and instruction following, their ability to generalize to novel tasks, objects, and settings remains limited. In this work, we present VideoVLA, a simple approach that explores the potential of directly transforming large video generation models into robotic VLA manipulators. Given a language instruction and an image, VideoVLA predicts an action sequence as well as the future visual outcomes. Built on a multi-modal Diffusion Transformer, VideoVLA jointly models video, language, and action modalities, using pre-trained video generative models for joint visual and action forecasting. Our experiments show that high-quality imagined futures correlate with reliable action predictions and task success, highlighting the importance of visual imagination in manipulation. VideoVLA demonstrates strong generalization, including imitating other embodiments' skills and handling novel objects. This dual-prediction strategy, which forecasts both actions and their visual consequences, explores a paradigm shift in robot learning and unlocks generalization capabilities in manipulation systems.
MTL-KD: Multi-Task Learning Via Knowledge Distillation for Generalizable Neural Vehicle Routing Solver
Multi-Task Learning (MTL) in Neural Combinatorial Optimization (NCO) is a promising approach for training a unified model capable of solving multiple Vehicle Routing Problem (VRP) variants. However, existing Reinforcement Learning (RL)-based multi-task methods can only train light decoder models on small-scale problems, exhibiting limited generalization ability when solving large-scale problems. To overcome this limitation, this work introduces a novel multi-task learning method driven by knowledge distillation (MTL-KD), which enables efficient training of heavy decoder models with strong generalization ability. The proposed MTL-KD method transfers policy knowledge from multiple distinct RL-based single-task models to a single heavy decoder model, facilitating label-free training and effectively improving the model's generalization ability across diverse tasks. In addition, we introduce a flexible inference strategy termed Random Reordering Re-Construction (R3C), which is specifically adapted for diverse VRP tasks and further boosts the performance of the multi-task model. Experimental results on 6 seen and 10 unseen VRP variants with up to 1,000 nodes indicate that our proposed method consistently achieves superior performance on both uniform and real-world benchmarks, demonstrating robust generalization abilities.
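The abstract describes label-free transfer of policy knowledge from several single-task RL teachers into one heavy-decoder student. One common way to realize this is to minimize, at every decoding step, the KL divergence between a teacher's distribution over the next node and the student's distribution, using the teacher trained on the instance's own VRP variant. The following is a minimal sketch that assumes exactly this loss. The KL form, the feasibility mask, and the names `masked_log_softmax` and `distillation_loss` are illustrative assumptions, not the MTL-KD implementation. R3C is not shown.

```python
import numpy as np

NEG_INF = -1e9  # large finite negative, so masked nodes get ~0 probability without NaNs


def masked_log_softmax(logits: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Log-probabilities over candidate next nodes; infeasible nodes (mask=False) get ~0 mass."""
    z = np.where(mask, logits, NEG_INF)
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def distillation_loss(
    teacher_logits: np.ndarray, student_logits: np.ndarray, mask: np.ndarray
) -> float:
    """Mean over decoding steps of KL(teacher || student) on next-node distributions."""
    log_pt = masked_log_softmax(teacher_logits, mask)
    log_ps = masked_log_softmax(student_logits, mask)
    pt = np.exp(log_pt)
    kl_per_step = (pt * (log_pt - log_ps)).sum(axis=-1)
    return float(kl_per_step.mean())


# Toy example: 5 decoding steps over 8 candidate nodes; node 0 is infeasible throughout.
rng = np.random.default_rng(0)
steps, n_nodes = 5, 8
mask = np.ones((steps, n_nodes), dtype=bool)
mask[:, 0] = False
# Stand-in for the logits of the teacher trained on this instance's VRP variant.
teacher_logits = rng.normal(size=(steps, n_nodes))

loss_same = distillation_loss(teacher_logits, teacher_logits, mask)  # identical policies
loss_diff = distillation_loss(teacher_logits, np.zeros((steps, n_nodes)), mask)  # uniform student
print(loss_same, loss_diff)
```

In a multi-task batch, each instance would be routed to the teacher for its own variant, and the per-instance losses averaged. Because the targets are teacher distributions rather than optimal solutions, no labeled data is needed.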
Derivative-Free Guidance in Continuous and Discrete Diffusion Models with Soft Value-based Decoding
Diffusion models excel at capturing the natural design spaces of images, molecules, and biological sequences. However, for many applications, rather than merely generating designs that are natural, we aim to optimize downstream reward functions while preserving the naturalness of these design spaces. Existing methods for achieving this goal often require "differentiable" proxy models (e.g., classifier guidance) or computationally expensive fine-tuning of diffusion models (e.g., classifier-free guidance, RL-based fine-tuning). Here, we propose a new method, Soft Value-based Decoding in Diffusion models (SVDD), to address these challenges. SVDD is an iterative sampling method that integrates soft value functions, which look ahead to how intermediate noisy states lead to high rewards in the future, into the standard inference procedure of pre-trained diffusion models. Notably, SVDD avoids fine-tuning generative models and eliminates the need to construct differentiable models. This enables us to (1) directly use non-differentiable features/reward feedback, commonly used in many scientific domains, and (2) apply our method to recent discrete diffusion models in a principled way. Finally, we demonstrate the effectiveness of SVDD across several domains, including image generation, molecule generation (optimization of docking scores, QED, SA), and DNA/RNA generation (optimization of activity levels). The code is available at https://github.com/masa-ue/SVDD.
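The central idea is to reweight samples from the pre-trained model by a soft value estimate instead of differentiating through a reward. The following is a minimal sketch of that idea in a toy one-dimensional "denoising" chain. At each step, it draws several candidates from the pre-trained transition and resamples one with probability proportional to exp(v / alpha). The Gaussian transition `propose`, the value function `value_fn`, and all constants are hypothetical. Here `value_fn` is simply the reward evaluated on the candidate, standing in for a learned or posterior-mean soft value. This is an illustration in the spirit of SVDD, not the code in the linked repository.

```python
import numpy as np


def propose(x: float, rng: np.random.Generator) -> float:
    """Hypothetical stand-in for the pre-trained transition p(x_{t-1} | x_t)."""
    return 0.9 * x + 0.5 * rng.normal()


def value_fn(x: float) -> float:
    """Hypothetical soft value: reward = closeness to 2.0 (never differentiated)."""
    return -(x - 2.0) ** 2


def svdd_step(x_t, num_candidates, alpha, rng):
    """Draw candidates from the pre-trained transition, resample one with weight exp(v / alpha)."""
    candidates = [propose(x_t, rng) for _ in range(num_candidates)]
    logits = np.array([value_fn(c) for c in candidates]) / alpha
    weights = np.exp(logits - logits.max())  # stabilized softmax
    weights /= weights.sum()
    return candidates[rng.choice(num_candidates, p=weights)]


def sample(num_candidates: int, alpha: float, rng: np.random.Generator, steps: int = 20) -> float:
    x = rng.normal()  # start from "noise"
    for _ in range(steps):
        x = svdd_step(x, num_candidates, alpha, rng)
    return x


rng = np.random.default_rng(0)
guided = np.array([sample(16, 0.1, rng) for _ in range(100)])
unguided = np.array([sample(1, 0.1, rng) for _ in range(100)])  # one candidate = plain sampling
guided_err = float(np.mean(np.abs(guided - 2.0)))
unguided_err = float(np.mean(np.abs(unguided - 2.0)))
print(guided_err, unguided_err)
```

With `num_candidates=1`, the procedure reduces to ordinary sampling from the pre-trained chain. As `alpha` shrinks, the resampling approaches a greedy argmax over candidates. Because the value is only evaluated, never differentiated, the same loop applies to non-differentiable rewards and to discrete state spaces.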
894403f9604374a7a003063e480f65b9-Paper-Conference.pdf
Transformers have theoretical limitations in modeling certain sequence-to-sequence tasks, yet it remains largely unclear whether these limitations play a role in large-scale pretrained LLMs, or whether LLMs might effectively overcome these constraints in practice due to the scale of both the models themselves and their pretraining data. We explore how these architectural constraints manifest after pretraining by studying a family of retrieval and copying tasks inspired by Liu et al. [2024a]. We use a recently proposed framework for studying length generalization [Huang et al., 2025] to provide guarantees for each of our settings.
The Growing Political Power of Anti-Data Center Activists
GTA 6 - all you need to know about Rockstar's blockbuster game
The latest instalment in Rockstar's blockbuster game franchise, Grand Theft Auto, is set to be the biggest games launch of the year. Details are still scant, although we do now know that GTA 6 will be available to pre-order on 25 June, the developer has announced. Analysts believe Rockstar's action adventure could become the most expensive game ever made, with estimates putting development costs at more than $1bn (£866m). We're still awaiting some crucial information about the game, but here's what we do and don't know about GTA 6 so far.
When is GTA 6 coming out?