Information-Theoretic Limits of Safety Verification for Self-Improving Systems

Scrivens, Arsenios

arXiv.org Machine Learning

Can a safety gate permit unbounded beneficial self-modification while maintaining bounded cumulative risk? We formalize this question through dual conditions -- requiring sum delta_n < infinity (bounded risk) and sum TPR_n = infinity (unbounded utility) -- and establish a theory of their (in)compatibility. Classification impossibility (Theorem 1): For power-law risk schedules delta_n = O(n^{-p}) with p > 1, any classifier-based gate under overlapping safe/unsafe distributions satisfies TPR_n <= C_alpha * delta_n^beta via Holder's inequality, forcing sum TPR_n < infinity. This impossibility is exponent-optimal (Theorem 3). A second independent proof via the NP counting method (Theorem 4) yields a 13% tighter bound without Holder's inequality. Universal finite-horizon ceiling (Theorem 5): For any summable risk schedule, the exact maximum achievable classifier utility is U*(N, B) = N * TPR_NP(B/N), growing as exp(O(sqrt(log N))) -- subpolynomial. At N = 10^6 with budget B = 1.0, a classifier extracts at most U* ~ 87 versus a verifier's ~500,000. Verification escape (Theorem 2): A Lipschitz ball verifier achieves delta = 0 with TPR > 0, escaping the impossibility. Formal Lipschitz bounds for pre-LayerNorm transformers under LoRA enable LLM-scale verification. The separation is strict. We validate on GPT-2 (d_LoRA = 147,456): conditional delta = 0 with TPR = 0.352. Comprehensive empirical validation is in the companion paper [D2].
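The ceiling U*(N, B) = N * TPR_NP(B/N) can be made concrete under an assumed score model. The sketch below is our construction, not the paper's: we assume the safe and unsafe score distributions are unit-variance Gaussians separated by d (the paper's exact distributions are not stated in this abstract), so the Neyman-Pearson optimal TPR at false-positive rate alpha follows the standard Gaussian ROC formula Phi(Phi^{-1}(alpha) + d).

```python
from statistics import NormalDist

def tpr_np(alpha: float, d: float = 2.0) -> float:
    """Neyman-Pearson optimal TPR at false-positive rate alpha for two
    unit-variance Gaussians separated by d (assumed model, not the paper's)."""
    nd = NormalDist()
    return nd.cdf(nd.inv_cdf(alpha) + d)

def u_star(N: int, B: float, d: float = 2.0) -> float:
    """Finite-horizon classifier utility ceiling U*(N, B) = N * TPR_NP(B / N)."""
    return N * tpr_np(B / N, d)
```

Under this Gaussian assumption, N * TPR_NP(B/N) grows roughly like exp(d * sqrt(2 ln N)), which reproduces the abstract's subpolynomial exp(O(sqrt(log N))) rate: U* increases with N but far more slowly than N itself.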


Top AI ethics and policy issues of 2025 and what to expect in 2026

AIHub

2025 saw generative and agentic systems become essential in key sectors worldwide. This feature highlights the major AI ethics and policy developments of 2025, and concludes with a forward-looking perspective on the ethical and policy challenges likely to shape 2026.


'What is the mission?' With Iran, California military families fear another 'forever war'

Los Angeles Times

Shalena Critchlow, at the Oceanside Pier, holds a photo of her son Cpl. Saiveon Critchlow, who recently completed his service with the U.S. Marines.






Fair Multiple Decision Making Through Soft Interventions

Neural Information Processing Systems

How to ensure fairness in algorithmic decision-making models is an important task in machine learning [12,15]. Over the past years, many researchers have been devoted to the design of fair classification algorithms with respect to a pre-defined protected attribute, such as race or sex, and a decision task/model, such as hiring [1,11,24]. In particular, one line of work is to incorporate fairness constraints into classic learning algorithms to build fair classifiers from potentially biased data [4,13,29,31-33]. Most previous research focuses on a single decision model.
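The paper itself works through soft interventions; as a generic illustration of the fairness-constraint line of work cited above (our construction, not the paper's algorithm), the sketch below trains a logistic regression with a demographic-parity penalty: the squared gap between the two groups' mean predicted scores is added to the logistic loss, weighted by a hypothetical coefficient lam.

```python
import math

def sigmoid(z: float) -> float:
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow
    return 1.0 / (1.0 + math.exp(-z))

def train_fair_logreg(X, y, a, lam=1.0, lr=0.1, epochs=500):
    """Full-batch gradient descent on logistic loss + lam * (mean-score gap)^2,
    where the gap is between groups a == 0 and a == 1 (demographic parity)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    g0 = [i for i in range(n) if a[i] == 0]
    g1 = [i for i in range(n) if a[i] == 1]
    for _ in range(epochs):
        p = [sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) for x in X]
        gap = sum(p[i] for i in g0) / len(g0) - sum(p[i] for i in g1) / len(g1)
        for j in range(d):
            grad = sum((p[i] - y[i]) * X[i][j] for i in range(n)) / n
            # d(gap^2)/dw_j = 2*gap * (mean_{g0} p(1-p)x_j - mean_{g1} p(1-p)x_j)
            grad += 2 * lam * gap * (
                sum(p[i] * (1 - p[i]) * X[i][j] for i in g0) / len(g0)
                - sum(p[i] * (1 - p[i]) * X[i][j] for i in g1) / len(g1))
            w[j] -= lr * grad
        gb = sum(p[i] - y[i] for i in range(n)) / n
        gb += 2 * lam * gap * (sum(p[i] * (1 - p[i]) for i in g0) / len(g0)
                               - sum(p[i] * (1 - p[i]) for i in g1) / len(g1))
        b -= lr * gb
    return w, b
```

With lam = 0 this reduces to plain logistic regression; increasing lam trades accuracy for a smaller score gap between the two groups, which is the basic tension this literature studies.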