oversight
Scaling Laws For Scalable Oversight
Scalable oversight, the process by which weaker AI systems supervise stronger ones, has been proposed as a key strategy to control future superintelligent systems. However, it is still unclear how scalable oversight itself scales. To address this gap, we propose a framework that quantifies the probability of successful oversight as a function of the capabilities of the overseer and the system being overseen. Specifically, our framework models oversight as a game between capability-mismatched players; the players have oversight-specific Elo scores that are a piecewise-linear function of their general intelligence, with two plateaus corresponding to task incompetence and task saturation. We validate our framework with a modified version of the game Nim and then apply it to four oversight games: Mafia, Debate, Backdoor Code and Wargames. For each game, we find scaling laws that approximate how domain performance depends on general AI system capability. We then build on our findings in a theoretical study of Nested Scalable Oversight (NSO), a process in which trusted models oversee untrusted stronger models, which then become the trusted models in the next step. We identify conditions under which NSO succeeds and derive numerically (and in some cases analytically) the optimal number of oversight levels to maximize the probability of oversight success. We also apply our theory to our four oversight games, where we find that NSO success rates at a general Elo gap of 400 are 13.5% for Mafia, 51.7% for Debate, 10.0% for Backdoor Code, and 9.4% for Wargames; these rates decline further when overseeing stronger systems.
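The Elo-based model described in this abstract can be sketched in a few lines. This is only an illustrative sketch, not the authors' implementation: the function names, the breakpoint parameters (`low`, `high`, `floor_elo`, `ceiling_elo`), and the independence assumption used to combine oversight steps are all hypothetical choices made here. Only the standard Elo expected-score formula is taken as given.

```python
import math


def win_probability(elo_a: float, elo_b: float) -> float:
    """Standard Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((elo_b - elo_a) / 400.0))


def domain_elo(general_elo: float, low: float, high: float,
               floor_elo: float, ceiling_elo: float) -> float:
    """Piecewise-linear map from general Elo to task-specific Elo.

    Hypothetical parameterization: below `low` the player sits on the
    incompetence plateau (`floor_elo`); above `high` the task is saturated
    (`ceiling_elo`); in between the domain Elo rises linearly.
    """
    if general_elo <= low:
        return floor_elo
    if general_elo >= high:
        return ceiling_elo
    frac = (general_elo - low) / (high - low)
    return floor_elo + frac * (ceiling_elo - floor_elo)


def chained_success(step_probabilities: list[float]) -> float:
    """Probability that every oversight step succeeds.

    Assumption made for this sketch: steps are independent, so the
    overall probability is the product of the per-step probabilities.
    """
    return math.prod(step_probabilities)


if __name__ == "__main__":
    # Hypothetical overseer and overseen system, 400 general Elo points apart.
    overseer = domain_elo(1200, low=1000, high=2000, floor_elo=800, ceiling_elo=1600)
    overseen = domain_elo(1600, low=1000, high=2000, floor_elo=800, ceiling_elo=1600)
    print(win_probability(overseer, overseen))
```

Under these assumptions, adding oversight levels narrows the Elo gap at each step but multiplies more probabilities together, which is why an optimal number of levels can exist.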
Military AI Needs Technically-Informed Regulation to Safeguard AI Research and its Applications
Military weapon systems and command-and-control infrastructure augmented by artificial intelligence (AI) have seen rapid development and deployment in recent years. However, the sociotechnical impacts of AI on combat systems, military decision-making, and the norms of warfare have been understudied. We focus on a specific subset of lethal autonomous weapon systems (LAWS) that use AI for targeting or battlefield decisions. We refer to this subset as AI-powered lethal autonomous weapon systems (AI-LAWS) and argue that they introduce novel risks--including unanticipated escalation, poor reliability in unfamiliar environments, and erosion of human oversight--all of which threaten both military effectiveness and the openness of AI research. These risks cannot be addressed by high-level policy alone; effective regulation must be grounded in the technical behavior of AI models. We argue that AI researchers must be involved throughout the regulatory lifecycle. Thus, we propose a clear, behavior-based definition of AI-LAWS--systems that introduce unique risks through their use of modern AI--as a foundation for technically grounded regulation, given that existing frameworks do not distinguish them from conventional LAWS. Using this definition, we propose several technically-informed policy directions and invite greater participation from the AI research community in military AI policy discussions.
1ae5c1db7569a6c2f395020765b119a4-Paper-Position_Paper_Track.pdf
Artificial intelligence (AI) now permeates critical infrastructures and decision-making systems where failures produce social, economic, and democratic harm. This position paper challenges the entrenched belief that regulation and innovation are opposites. As evidenced by analogies from aviation, pharmaceuticals, and welfare systems, and by recent cases of synthetic misinformation, bias, and unaccountable decision-making, the absence of well-designed regulation has already created immeasurable damage. Regulation, when thoughtful and adaptive, is not a brake on innovation--it is its foundation. The present position paper examines the EU AI Act as a model of risk-based, responsibility-driven regulation that addresses the Collingridge Dilemma: acting early enough to prevent harm, yet flexibly enough to sustain innovation. Its adaptive mechanisms--regulatory sandboxes, support for small and medium-sized enterprises (SMEs), real-world testing, and fundamental rights impact assessments (FRIA)--demonstrate how regulation can responsibly accelerate, rather than delay, technological progress. The position paper summarises how governance tools transform perceived burdens into tangible advantages: legal certainty, consumer trust, and ethical competitiveness.
AI facial recognition oversight lagging far behind technology, watchdogs warn
Britain's biometrics watchdogs have warned that national oversight of AI-powered face scanning to catch criminals is lagging far behind the technology's rapid growth. With the Metropolitan police almost doubling the number of faces they scan in London over the past 12 months and a rising use of the technology by retailers in the UK, Prof William Webster, the biometrics commissioner for England and Wales, said the "slow pace of legislation was trying to catch up with the real world" and "the horse had gone before the cart". Dr Brian Plastow, who holds the same role in Scotland, warned the technology was "nowhere near as effective as the police claim it is" and said there was a "patchwork legal framework" throughout the UK. He said in England and Wales, police were "really just marking their own homework".
RWDS Big Questions: how do we balance innovation and regulation in the world of AI?
AI development is accelerating, while regulation moves more deliberately. That tension creates a core challenge: how do we maintain momentum without breaking the things that matter? The aim isn't to slow innovation unnecessarily, but to ensure progress happens at a pace that protects individuals and society. Responsible actors should not be disadvantaged -- yet safeguards are essential to maintain trust. For the latest video in our RWDS Big Questions series, our panel explores this delicate balance.
US House panel advances bill to give Congress authority on AI chip exports
The United States House of Representatives Foreign Affairs Committee has overwhelmingly voted to advance a bill that would give Congress more power over artificial intelligence chip exports despite pushback from White House AI tsar David Sacks and a social media campaign against the legislation. Representative Brian Mast of Florida, a Republican and the chair of the House Foreign Affairs Committee, introduced the "AI Overwatch Act" in December after US President Donald Trump greenlit shipments of Nvidia's powerful H200 AI chips to China. The bill claims that those "countries of concern" also include countries beyond China, such as Russia, Iran, North Korea, Cuba and Venezuela.
Big Balls Was Just the Beginning
DOGE dominated the news this year as Elon Musk's operatives shook up several US government agencies. Since the beginning of the Trump administration, the so-called Department of Government Efficiency (DOGE), the brainchild of billionaire Elon Musk, has gone through several iterations, leading periodically to claims--most recently from the director of the Office of Personnel Management--that the group doesn't exist, or has vanished altogether. Many of its original members are in full-time roles at various government agencies, and the new National Design Studio (NDS) is headed by Airbnb cofounder Joe Gebbia, a close ally of Musk's. Even if DOGE doesn't survive another year, or until the US semiquincentennial--its original expiration date, per the executive order establishing it--the organization's larger project will continue. DOGE from its inception was used for two things, both of which have continued apace: the destruction of the administrative state and the wholesale consolidation of data in service of concentrating power in the executive branch.
The SMART+ Framework for AI Systems
Kandikatla, Laxmiraju, Radeljic, Branislav
Artificial Intelligence (AI) systems are now an integral part of multiple industries. In clinical research, AI supports automated adverse event detection in clinical trials, patient eligibility screening for protocol enrollment, and data quality validation. Beyond healthcare, AI is transforming finance through real-time fraud detection, automated loan risk assessment, and algorithmic decision-making. Similarly, in manufacturing, AI enables predictive maintenance to reduce equipment downtime, enhances quality control through computer-vision inspection, and optimizes production workflows using real-time operational data. While these technologies enhance operational efficiency, they introduce new challenges regarding safety, accountability, and regulatory compliance. To address these concerns, we introduce the SMART+ Framework - a structured model built on the pillars of Safety, Monitoring, Accountability, Reliability, and Transparency, and further enhanced with Privacy & Security, Data Governance, Fairness & Bias, and Guardrails. SMART+ offers a practical, comprehensive approach to evaluating and governing AI systems across industries. This framework aligns with evolving mechanisms and regulatory guidance to integrate operational safeguards, oversight procedures, and strengthened privacy and governance controls. SMART+ demonstrates risk mitigation, trust-building, and compliance readiness. By enabling responsible AI adoption and ensuring auditability, SMART+ provides a robust foundation for effective AI governance in clinical research.