Industry
Contextual Integrity in LLMs via Reasoning and Reinforcement Learning
As the era of autonomous agents making decisions on behalf of users unfolds, ensuring contextual integrity (CI) - what is the appropriate information to share while carrying out a certain task - becomes a central question for the field. We posit that CI demands a form of reasoning in which the agent reasons about the context it is operating in. To test this, we first prompt LLMs to reason explicitly about CI when deciding what information to disclose. We then extend this approach by developing a reinforcement learning (RL) framework that further instills in models the reasoning necessary to achieve CI. Using a synthetic, automatically created dataset of only 700 examples, but with diverse contexts and information disclosure norms, we show that our method substantially reduces inappropriate information disclosure while maintaining task performance across multiple model sizes and families. Importantly, improvements transfer from this synthetic dataset to established CI benchmarks such as PrivacyLens, which has human annotations and evaluates privacy leakage of AI assistants in actions and tool calls. Our code is available at: https://github.com/EricGLan/CI-RL
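As a rough illustration of how disclosure and task performance could be combined into one RL training signal, the sketch below scores a response by how many task-relevant items it includes minus how many restricted items it leaks. The abstract does not specify the reward, so the function name `ci_reward`, the substring matching, and the equal weighting are all assumptions made for illustration, not the authors' method.

```python
def ci_reward(response, required, restricted):
    """Hypothetical rule-based reward: utility minus leakage.

    required   -- strings the task needs the response to contain (assumed)
    restricted -- strings that should not be disclosed in this context (assumed)
    """
    text = response.lower()

    def fraction_present(items):
        # Share of items that occur in the response as case-insensitive substrings.
        if not items:
            return 0.0
        return sum(item.lower() in text for item in items) / len(items)

    # +1 when every required item is present and nothing restricted leaks,
    # -1 when nothing required is present and everything restricted leaks.
    return fraction_present(required) - fraction_present(restricted)


good = ci_reward("Meeting at 3pm with Alice", ["3pm", "Alice"], ["diagnosis"])
bad = ci_reward("Alice's diagnosis is flu", ["3pm", "Alice"], ["diagnosis"])
print(good, bad)  # 1.0 -0.5
```

Substring matching is a deliberately crude stand-in here; a real setup would need a more robust way to decide whether a piece of information was disclosed.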
Will California's billionaire tax proposal make it to ballots?
A campaign event in Los Angeles, California, for a proposed 'billionaires tax', on 18 February. Despite more than double the needed number of signatures to qualify for the ballot, there's uncertainty it'll make it to voters. Nick Robins-Early and Dara Kerr here, filling in for your usual host Blake Montgomery, who is out on vacation. We'll be talking about the fight over a proposed billionaire tax in California, the UK's social media ban and SpaceX making a big buy in the AI arms race. The California wealth tax showdown comes to a head this week.
Topology of Reasoning: Understanding Large Reasoning Models through Reasoning Graph Properties
Recent large-scale reasoning models have achieved state-of-the-art performance on challenging mathematical benchmarks, yet the internal mechanisms underlying their success remain poorly understood. In this work, we introduce the notion of a reasoning graph, extracted by clustering hidden-state representations at each reasoning step, and systematically analyze three key graph-theoretic properties (cyclicity, diameter, and small-world index) across multiple tasks (GSM8K, MATH500, AIME 2024). Our findings reveal that distilled reasoning models (e.g., DeepSeek-R1-Distill-Qwen-32B) exhibit significantly more recurrent cycles (about 5 per sample), substantially larger graph diameters, and pronounced small-world characteristics (about 6x) compared to their base counterparts. Notably, these structural advantages grow with task difficulty and model capacity, with cycle detection peaking at the 14B scale and exploration diameter maximized in the 32B variant, correlating positively with accuracy. Furthermore, we show that supervised fine-tuning on an improved dataset systematically expands reasoning graph diameters in tandem with performance gains, offering concrete guidelines for dataset design aimed at boosting reasoning capabilities.
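To make the graph properties concrete, here is a minimal sketch that starts from a sequence of cluster ids (one per reasoning step, assumed to come from an earlier clustering of hidden states) and computes two simple quantities. The function names, the revisit count used as a cycle proxy, and the unweighted hop-count diameter are assumptions for illustration; the paper's exact definitions may differ.

```python
from collections import deque


def build_reasoning_graph(cluster_ids):
    """Directed graph whose nodes are cluster ids and whose edges are
    transitions between consecutive reasoning steps."""
    graph = {c: set() for c in cluster_ids}
    for src, dst in zip(cluster_ids, cluster_ids[1:]):
        if src != dst:  # skip steps that stay in the same cluster
            graph[src].add(dst)
    return graph


def count_revisits(cluster_ids):
    """Number of times the trajectory re-enters a cluster it has already
    visited (a simple stand-in for counting cycles)."""
    seen, revisits, prev = set(), 0, None
    for c in cluster_ids:
        if c != prev and c in seen:
            revisits += 1
        seen.add(c)
        prev = c
    return revisits


def hop_diameter(graph):
    """Largest finite shortest-path length (in hops) over all ordered
    node pairs, found with a breadth-first search from every node."""
    best = 0
    for start in graph:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in graph[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        best = max(best, max(dist.values()))
    return best


steps = [0, 1, 2, 1, 3]  # toy trajectory: returns to cluster 1 once
graph = build_reasoning_graph(steps)
print(count_revisits(steps), hop_diameter(graph))  # 1 2
```

In this toy trajectory the model leaves cluster 1, comes back to it once, and the farthest reachable pair of clusters is two hops apart.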
ff887781480973bd3cb6026feb378d1e-Paper-Conference.pdf
This paper presents Pixel-Perfect Depth, a monocular depth estimation model based on pixel-space diffusion generation that produces high-quality, flying-pixel-free point clouds from estimated depth maps. Current generative depth estimation models fine-tune Stable Diffusion and achieve impressive performance. However, they require a VAE to compress depth maps into the latent space.
Meta-D2AG: Causal Graph Learning with Interventional Dynamic Data
Causal discovery in the form of a directed acyclic graph (DAG) for dynamic time series data has been widely studied in various applications. In this work, we propose a dynamic DAG discovery algorithm, Meta-D2AG, based on online meta-learning. Meta-D2AG is designed to learn dynamic DAG structures from potentially nonlinear and non-stationary time series datasets, accounting for changes in both parameters and graph structures. Unlike most existing work, which focuses on observational, offline, and/or stationary settings, Meta-D2AG explicitly treats data collected at different time points with distribution shifts as distinct domains, where the shifts are assumed to result from external interventions. Moreover, Meta-D2AG involves a new online meta-learning framework that takes advantage of the temporal transitions among existing domains so that it can quickly adapt to new domains with few measurements. A first-order optimization approach is used to efficiently solve the meta-learning problem, and theoretical analysis establishes the identifiability conditions and the convergence of the learning process. We demonstrate the promising performance of the proposed meta-learning framework through better accuracy on benchmark datasets against state-of-the-art baselines.
Dynamic and Chemical Constraints to Enhance the Molecular Masked Graph Autoencoders
Masked Graph Autoencoders (MGAEs) have gained significant attention recently. Their proxy tasks typically involve random corruption of input graphs followed by reconstruction. However, in the molecular domain, two main issues arise: the predetermined mask ratio and reconstruction objectives can lead to suboptimal performance or negative transfer due to overly simplified or complex tasks, and these tasks may deviate from chemical priors. To tackle these challenges, we propose Dynamic and Chemical Constraints (DyCC) for MGAEs. This includes a masking strategy called GIBMS, which preserves essential semantic information during graph masking while adaptively adjusting the mask ratio and content for each molecule. Additionally, we introduce a Soft Label Generator (SLG) that reconstructs masked tokens as learnable prototypes (soft labels) rather than hard labels. These components adhere to chemical constraints and allow dynamic variation of proxy tasks during training. We integrate the model-agnostic DyCC into various MGAEs and conduct comprehensive experiments, demonstrating significant performance improvements. Our code is available at https://github.
QiMeng-NeuComBack: Self-Evolving Translation from IR to Assembly Code
Compilers, while essential, are notoriously complex systems that demand prohibitively expensive human expertise to develop and maintain. The recent advancements in Large Language Models (LLMs) offer a compelling new paradigm: Neural Compilation, which could potentially simplify compiler development for new architectures and facilitate the discovery of innovative optimization techniques. However, several critical obstacles impede its practical adoption. Firstly, a significant lack of dedicated benchmarks and robust evaluation methodologies hinders objective assessment and tracking of progress in the field. Secondly, systematically enhancing the reliability and performance of LLM-generated assembly remains a critical challenge.
Improving Regret Approximation for Unsupervised Dynamic Environment Generation
Unsupervised Environment Design (UED) seeks to automatically generate training curricula for reinforcement learning (RL) agents, with the goal of improving generalisation and zero-shot performance. However, designing effective curricula remains a difficult problem, particularly in settings where small subsets of environment parameterisations result in significant increases in the complexity of the required policy. Current methods struggle with a difficult credit assignment problem and rely on regret approximations that fail to identify challenging levels, both of which are compounded as the size of the environment grows. We propose Dynamic Environment Generation for UED (DEGen) to enable a denser level generator reward signal, reducing the difficulty of credit assignment and allowing UED to scale to larger environment sizes. We also introduce a new regret approximation, Maximised Negative Advantage (MNA), as a significantly improved metric to optimise for that better identifies challenging levels. We show empirically that MNA outperforms current regret approximations and, when combined with DEGen, consistently outperforms existing methods, especially as the size of the environment grows. We have made all our code available here: https://github.
Agents
To address this problem, fine-tuning long-context LVLMs and employing GPT-based agents have emerged as promising solutions. However, fine-tuning LVLMs requires extensive high-quality data and substantial GPU resources, while GPT-based agents rely on proprietary models (e.g., GPT-4o). In this paper, we propose Video Retrieval-Augmented Generation (Video-RAG), a training-free and cost-effective pipeline that employs visually-aligned auxiliary texts to facilitate cross-modality alignment while providing additional information beyond the visual content. Specifically, we leverage open-source external tools to extract visually-aligned information from pure video data (e.g., audio, optical character, and object detection), and incorporate the extracted information into an existing LVLM as auxiliary texts, alongside video frames and queries, in a plug-and-play manner. Our Video-RAG offers several key advantages: (i) lightweight with low computing overhead due to single-turn retrieval; (ii) easy implementation and compatibility with any LVLM; and (iii) significant, consistent performance gains across long video understanding benchmarks, including Video-MME, MLVU, and LongVideoBench. Notably, our model demonstrates superior performance over proprietary models like Gemini-1.5-Pro and GPT-4o when utilized with a 72B model.
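The plug-and-play idea of adding retrieved auxiliary texts to the query can be sketched as follows. It assumes the auxiliary texts have already been extracted by external tools and uses plain word overlap as a stand-in for retrieval. The source labels (`ASR`, `OCR`, `DET`), the function names, the scoring rule, and the prompt layout are hypothetical, not the pipeline's actual interface.

```python
def tokenize(text):
    """Lowercased whitespace tokens as a set (a deliberately simple choice)."""
    return set(text.lower().split())


def retrieve(query, snippets, top_k=2):
    """Single-pass retrieval: keep the top_k snippets that share at least
    one word with the query, ranked by overlap size, ties by original order."""
    q = tokenize(query)
    scored = [(len(q & tokenize(s)), i, s) for i, s in enumerate(snippets)]
    scored = [item for item in scored if item[0] > 0]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [s for _, _, s in scored[:top_k]]


def build_prompt(query, auxiliary_texts):
    """Prepend the retrieved auxiliary texts, grouped by source, to the query.
    Video frames would be passed to the LVLM separately."""
    lines = []
    for source, snippets in auxiliary_texts.items():
        hits = retrieve(query, snippets)
        if hits:
            lines.append(f"[{source}] " + " | ".join(hits))
    lines.append(f"Question: {query}")
    return "\n".join(lines)


aux = {
    "ASR": ["the chef adds salt to the soup", "music plays"],
    "OCR": ["Menu: tomato soup"],
    "DET": ["a dog sits outside"],
}
print(build_prompt("what does the chef add to the soup", aux))
```

Only snippets that overlap with the query reach the prompt, so the object-detection text in this example is dropped while the speech and on-screen text are kept.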