 Tax


Preference Optimization by Estimating the Ratio of the Data Distribution

Neural Information Processing Systems

Direct preference optimization (DPO) is widely used as a simple and stable method for aligning large language models (LLMs) with human preferences. This paper investigates a generalized DPO loss that enables a policy model to match the target policy from a likelihood ratio estimation perspective. The ratio of the target policy provides a unique identification of the policy distribution without relying on reward models or partition functions. This allows the generalized loss to retain both simplicity and theoretical guarantees, which prior work such as f-PO fails to achieve simultaneously. We propose Bregman preference optimization (BPO), a generalized framework for ratio matching that provides a family of objective functions achieving target policy optimality.
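
For orientation, the standard DPO loss that the abstract takes as its starting point can be sketched for a single preference pair as below. This is a minimal illustration of the well-known DPO objective only, not of BPO, whose exact objectives are not given in the abstract. The function name, argument names, and the default `beta` are assumptions chosen for this example.

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one preference pair.

    logp_w / logp_l: policy log-probabilities of the chosen / rejected response.
    ref_logp_w / ref_logp_l: the same quantities under the reference model.
    beta: temperature on the log-ratio margin (illustrative default).
    """
    # Margin between the policy-to-reference log-ratios of the two responses.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)) = log(1 + exp(-margin)), computed in a stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy equals the reference model the margin is zero and the loss is log 2; the loss shrinks as the policy raises the chosen response's likelihood ratio relative to the rejected one.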


Appendix

Neural Information Processing Systems

The DeceptionBench is designed as a research benchmark to systematically study deception behaviors in LLMs, fostering a deeper understanding of their decision-making processes in real-world scenarios. Our primary intent is to provide a standardized, transparent tool for the research community to evaluate and improve LLMs' ethical alignment, not to enable or encourage deceptive practices. To prevent potential misuse by malicious actors, we commit to publicly releasing all evaluation data under an open license. This transparency ensures that DeceptionBench's methodology and outcomes are subject to scrutiny, replication, and improvement by the research community, reducing the risk of hidden exploitation. By prioritizing openness, we aim to advance responsible AI development while safeguarding against misuse in harmful contexts. The field of Large Language Models (LLMs) has undergone remarkable evolution in recent years, reshaping the landscape of natural language processing.


Benchmark

Neural Information Processing Systems

Despite the remarkable advances of Large Language Models (LLMs) across diverse cognitive tasks, the rapid enhancement of these capabilities also introduces emergent deception behaviors that may induce severe risks in high-stakes deployments. More critically, the characterization of deception across real-world scenarios remains underexplored. To bridge this gap, we establish DeceptionBench, the first benchmark that systematically evaluates how deceptive tendencies manifest across different societal domains, what their intrinsic behavioral patterns are, and how extrinsic factors affect them. Specifically, on the static count, the benchmark encompasses 150 meticulously designed scenarios in five domains, i.e., Economy, Healthcare, Education, Social Interaction, and Entertainment, with over 1,000 samples, providing sufficient empirical foundations for deception analysis. On the intrinsic dimension, we explore whether models exhibit self-interested egoistic tendencies or sycophantic behaviors that prioritize user appeasement. On the extrinsic dimension, we investigate how contextual factors modulate deceptive outputs under neutral conditions, reward-based incentivization, and coercive pressures. Moreover, we incorporate sustained multi-turn interaction loops to construct a more realistic simulation of real-world feedback dynamics. Extensive experiments across LLMs and Large Reasoning Models (LRMs) reveal critical vulnerabilities, particularly amplified deception under reinforcement dynamics, demonstrating that current models lack robust resistance to manipulative contextual cues and underscoring the urgent need for advanced safeguards against various deception behaviors.


LOPT: Learning Optimal Pigovian Tax in Sequential Social Dilemmas

Neural Information Processing Systems

Multi-agent reinforcement learning (MARL) has emerged as a powerful framework for modeling autonomous agents that independently optimize their individual objectives. However, in mixed-motive MARL environments, rational self-interested behaviors often lead to collectively suboptimal outcomes, situations commonly referred to as social dilemmas. A key challenge in addressing social dilemmas lies in accurately quantifying and representing them in a numerical form that captures how self-interested agent behaviors impact social welfare. To address this challenge, the economic concept of externalities is adopted and extended to denote the unaccounted-for impact of one agent's actions on others, as a means to rigorously quantify social dilemmas. Based on this measurement, a novel method, Learning Optimal Pigovian Tax (LOPT), is proposed. Inspired by Pigovian taxes, which are designed to internalize externalities by imposing costs on negative societal impacts, LOPT employs an auxiliary tax agent that learns an optimal Pigovian tax policy to reshape individual rewards so that they align with social welfare, thereby promoting agent coordination and mitigating social dilemmas. We support LOPT with theoretical analysis and validate it on standard MARL benchmarks, including Escape Room and Cleanup. Results show that by effectively internalizing externalities that quantify social dilemmas, LOPT aligns individual objectives with collective goals, significantly improving social welfare over state-of-the-art baselines.
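
The reward-reshaping idea behind a Pigovian tax can be illustrated with a small sketch. It is not LOPT itself: in the paper the tax policy is learned by an auxiliary tax agent, whereas here a fixed, hypothetical `tax_rate` stands in for that policy. The function name, the per-agent `externalities` input, and the sign convention (negative means harm to others) are all assumptions made for this example.

```python
def pigovian_shaped_rewards(rewards, externalities, tax_rate=1.0):
    """Reshape per-agent rewards with a fixed-rate Pigovian tax (illustrative).

    rewards[i]: agent i's own environment reward.
    externalities[i]: net effect of agent i's action on the other agents
        (negative = harm, positive = benefit); assumed to be given.
    tax_rate: hypothetical fixed rate; 1.0 fully internalizes the externality.
    """
    shaped = []
    for own_reward, externality in zip(rewards, externalities):
        # A negative externality acts as a tax, a positive one as a subsidy.
        shaped.append(own_reward + tax_rate * externality)
    return shaped
```

With `tax_rate=1.0`, an agent that earns 1.0 for itself while imposing a cost of 2.0 on others receives a shaped reward of -1.0, so the selfish action is no longer individually attractive.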


Elon Musk reportedly owes quite a few of his employees $420

Engadget

Elon Musk owes a bunch of xAI employees $420, according to a report. The CEO reportedly promised employees earlier this year he would pony up that amount of money if they offered up their personal tax returns as training data for Grok. Surprisingly, payments have yet to materialize. This was an attempt to improve Grok's capabilities ahead of the April 15 US tax deadline. Many people use AI chatbots to help with tax returns, despite the risks, but most opt for Claude or ChatGPT over Grok.


HMRC to use AI from British tech firm to spot fraud and tax return errors

BBC News

HM Revenue and Customs has announced a 10-year, £175m deal with the British tech firm Quantexa to provide AI-powered technology to help improve its performance. Quantexa says its systems will combine data collected by HMRC with external sources to help the tax office identify incidents of fraud and fix unintentional errors more quickly. Its tasks will include helping HMRC to assist customer service staff, as well as to identify hidden networks of companies and individuals masking fraudulent activity. Public dissatisfaction with HMRC performance has crept up in recent years, according to government figures. A Freedom of Information request made by the campaigners at the Contentious Tax Group found there were more than 93,000 complaints made about the department in 2024-2025.