catastrophe


Twelve miners killed by Russian strike in Ukraine, energy company says

BBC News

Twelve miners have been killed by a Russian drone strike in eastern Ukraine, the country's largest private energy firm has said. DTEK said a bus carrying workers after a shift in the Dnipropetrovsk region had been targeted in Sunday's attack. At least seven people were injured. Earlier, at least two others were killed and nine injured in separate Russian attacks overnight and on Sunday. The victims included six people hurt when a drone hit a maternity hospital in Zaporizhzhia.


A Concrete Roadmap towards Safety Cases based on Chain-of-Thought Monitoring

Schulz, Julian

arXiv.org Artificial Intelligence

As AI systems approach dangerous capability levels where inability safety cases become insufficient, we need alternative approaches to ensure safety. This paper presents a roadmap for constructing safety cases based on chain-of-thought (CoT) monitoring in reasoning models and outlines our research agenda. We argue that CoT monitoring might support both control and trustworthiness safety cases. We propose a two-part safety case: (1) establishing that models lack dangerous capabilities when operating without their CoT, and (2) ensuring that any dangerous capabilities enabled by a CoT are detectable by CoT monitoring. We systematically examine two threats to monitorability: neuralese and encoded reasoning, which we categorize into three forms (linguistic drift, steganography, and alien reasoning) and analyze their potential drivers. We evaluate existing and novel techniques for maintaining CoT faithfulness. For cases where models produce non-monitorable reasoning, we explore the possibility of extracting a monitorable CoT from a non-monitorable CoT. To assess the viability of CoT monitoring safety cases, we establish prediction markets to aggregate forecasts on key technical milestones influencing their feasibility.
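The two-part safety case in the abstract can be caricatured as a pair of checks: (1) the model scores below a danger threshold when evaluated without its chain of thought, and (2) a monitor flags every CoT in a labeled evaluation set that is known to enable a dangerous capability. A minimal sketch, assuming hypothetical names throughout (`monitor_flags`, the keyword proxy, and the evaluation-set format are illustrative stand-ins, not the paper's actual tooling):

```python
# Toy proxy for a CoT monitor: flags reasoning containing dangerous terms.
# A real monitor would be a learned classifier, not a keyword list.
DANGEROUS_KEYWORDS = {"synthesize", "exploit", "bypass"}


def monitor_flags(cot: str) -> bool:
    """Part (2): does the (toy) monitor flag this chain of thought?"""
    return any(word in cot.lower() for word in DANGEROUS_KEYWORDS)


def safety_case_holds(score_without_cot: float,
                      labeled_cots: list[tuple[str, bool]],
                      capability_threshold: float = 0.1) -> bool:
    """Combine the two parts of the safety case.

    score_without_cot: dangerous-capability eval score with CoT disabled.
    labeled_cots: (cot_text, is_dangerous) pairs from an evaluation set.
    """
    # Part (1): the model lacks dangerous capabilities without its CoT.
    part1 = score_without_cot < capability_threshold
    # Part (2): every dangerous CoT in the eval set is caught by the monitor.
    # (In practice this would be a statistical detection-rate claim.)
    part2 = all(monitor_flags(cot)
                for cot, is_dangerous in labeled_cots if is_dangerous)
    return part1 and part2
```

Both clauses must hold: a model that is dangerously capable even without its CoT fails part (1) regardless of how good the monitor is.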


Liberals are catalysts to catastrophe, again

Al Jazeera

Yoav Litvin is an Israeli-American doctor of psychology/neuroscience, a writer and photographer. On September 17, the late-night talk show host Jimmy Kimmel was suspended after remarks he made about the death of right-wing activist Charlie Kirk. Days later, he was reinstated following liberal upheaval. In his first appearance back on air, Kimmel read US President Donald Trump's post on Truth Social: "I can't believe ABC fake news gave Jimmy Kimmel his job back." Without missing a beat, Kimmel responded, "You can't believe they gave me my job back. I can't believe we gave you your job back!"


The Simplistic Moral Lessons of "Superman"

The New Yorker

The world may be going to hell, but the writer and director James Gunn has graced it with a sunshine "Superman." The most recent installments in the franchise--Zack Snyder's diptych "Man of Steel" (2013) and "Batman v Superman: Dawn of Justice" (2016)--had a hectic, howling, near-apocalyptic sense of tragedy, but Gunn's vision is bright, chipper, and sentimental. A title card announces that Superman has endured his first defeat, and the hero (played by David Corenswet) is shown tumbling from the sky and slamming with a sickening thud onto the surface of a frozen wasteland, where he lies prostrate, spitting red blood on the snow. Fear not: no sooner does the wounded combatant put his lips together and whistle for Krypto than his faithful and frisky canine companion arrives and drags his master back to the Fortress of Solitude. There, loyal robots examine the patient and, by exposing him to sunlight, begin to heal him.


On the Mathematical Impossibility of Safe Universal Approximators

Yao, Jasper

arXiv.org Artificial Intelligence

We establish fundamental mathematical limits on universal approximation theorem (UAT) system alignment by proving that catastrophic failures are an inescapable feature of any useful computational system. Our central thesis is that for any universal approximator, the expressive power required for useful computation is inextricably linked to a dense set of instabilities that make perfect, reliable control a mathematical impossibility. We prove this through a three-level argument that leaves no escape routes for any class of universal approximator architecture. i) Combinatorial Necessity: For the vast majority of practical universal approximators (e.g., those using ReLU activations), we prove that the density of catastrophic failure points is directly proportional to the network's expressive power. ii) Topological Necessity: For any theoretical universal approximator, we use singularity theory to prove that the ability to approximate generic functions requires the ability to implement the dense, catastrophic singularities that characterize them. iii) Empirical Necessity: We prove that the universal existence of adversarial examples is empirical evidence that real-world tasks are themselves catastrophic, forcing any successful model to learn and replicate these instabilities. These results, combined with a quantitative "Impossibility Sandwich" showing that the minimum complexity for usefulness exceeds the maximum complexity for safety, demonstrate that perfect alignment is not an engineering challenge but a mathematical impossibility. This foundational result reframes UAT safety from a problem of "how to achieve perfect control" to one of "how to operate safely in the presence of irreducible uncontrollability," with profound implications for the future of UAT development and governance.


Misalignment or misuse? The AGI alignment tradeoff

Hellrigel-Holderbaum, Max, Dung, Leonard

arXiv.org Artificial Intelligence

Creating systems that are aligned with our goals is seen as a leading approach to safe and beneficial AI, both in leading AI companies and in the academic field of AI safety. We defend the view that misaligned AGI - future, generally intelligent (robotic) AI agents - poses catastrophic risks. At the same time, we support the view that aligned AGI creates a substantial risk of catastrophic misuse by humans. While both risks are severe and stand in tension with one another, we show that - in principle - there is room for alignment approaches which do not increase misuse risk. We then investigate how the tradeoff between misalignment and misuse looks empirically for different technical approaches to AI alignment. Here, we argue that many current alignment techniques and foreseeable improvements thereof plausibly increase risks of catastrophic misuse. Since the impacts of AI depend on the social context, we close by discussing important social factors and suggest that to reduce the risk of a misuse catastrophe due to aligned AGI, techniques such as robustness, AI control methods and especially good governance seem essential.


The Cybertruck was supposed to be apocalypse-proof. Can it even survive a trip to the grocery store?

The Guardian

The Cybertruck answers a question no one in the auto industry even thought to ask: what if there was a truck that a Chechen warlord couldn't possibly pass up – a bulletproof, bioweapons-resistant, road rage-inducing street tank that's illegal to drive in most of the world? Few had seen anything quite like the Cybertruck when it was unveiled in 2019. Wrapped in an "ultra-hard, 30X, cold-rolled stainless steel exoskeleton", the Cybertruck was touted as the ultimate doomsday chariot – a virtually indestructible, obtuse-angled, electrically powered behemoth that can repel handgun fire and outrun a Porsche while towing a Porsche, with enough juice left over to power your house in the event of a blackout. At the launch, Tesla's CEO, Elon Musk, said the truck could tackle any terrain on Earth and possibly also on Mars – and all for the low, low base price of $40,000. "Sometimes you get these late-civilization vibes [that the] apocalypse could come along at any moment," Musk said.


Asking for Help Enables Safety Guarantees Without Sacrificing Effectiveness

Plaut, Benjamin, Liévano-Karim, Juan, Russell, Stuart

arXiv.org Artificial Intelligence

Most reinforcement learning algorithms with regret guarantees rely on a critical assumption: that all errors are recoverable. Recent work by Plaut et al. discarded this assumption and presented algorithms that avoid "catastrophe" (i.e., irreparable errors) by asking for help. However, they provided only safety guarantees and did not consider reward maximization. We prove that any algorithm that avoids catastrophe in their setting also guarantees high reward (i.e., sublinear regret) in any Markov Decision Process (MDP), including MDPs with irreversible costs. This constitutes the first no-regret guarantee for general MDPs. More broadly, our result may be the first formal proof that it is possible for an agent to obtain high reward while becoming self-sufficient in an unknown, unbounded, and high-stakes environment without causing catastrophe or requiring resets.
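The core mechanism the abstract describes — avoiding irreparable errors by asking for help — can be sketched as a decision rule: if no action's estimated risk of irreversibility fits within a budget, defer to a mentor; otherwise act greedily on current value estimates. This is a minimal illustrative sketch under assumed interfaces (`q_estimate`, `uncertainty`, and `mentor` are hypothetical callables, not the authors' actual algorithm):

```python
def cautious_step(state, actions, q_estimate, uncertainty, mentor,
                  risk_budget=0.05):
    """Pick an action, deferring to a mentor when every option looks risky.

    state: current MDP state.
    actions: available actions in this state.
    q_estimate(state, a): current estimate of action value.
    uncertainty(state, a): estimated probability the action is irreversible
        (or otherwise catastrophic) - a stand-in for the agent's epistemic state.
    mentor(state): oracle that returns a safe action when asked for help.
    """
    # Keep only actions whose catastrophe risk fits within the budget.
    safe = [a for a in actions if uncertainty(state, a) <= risk_budget]
    if not safe:
        # No action is known to be safe: ask for help rather than gamble
        # on a potentially irreparable error.
        return mentor(state)
    # Among safe actions, act greedily to pursue reward.
    return max(safe, key=lambda a: q_estimate(state, a))
```

The paper's result is that, in their setting, a policy built around this kind of deferral not only avoids catastrophe but also achieves sublinear regret — the help-asking does not have to cost you reward in the long run.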


Doomsday Clock ticks forwards to 89 seconds to midnight - the closest humans have ever been to annihilation

Daily Mail - Science & tech

Humanity is officially one second closer to world annihilation, scientists say. The Doomsday Clock has been revealed – and it now sits at 89 seconds to midnight, one second closer than last year. It's also the closest the clock has ever been to midnight in its 78-year history, meaning we're nearer to world-ending catastrophe than ever before. The Bulletin of the Atomic Scientists, which decides where the hands are set, cited the Russia-Ukraine war, ongoing conflicts in the Middle East, the threat of nuclear war, climate change, a looming bird flu pandemic and an AI arms race for the update. The Chicago-based nonprofit created the Doomsday Clock in 1947 during the Cold War tensions that followed World War II to warn the public about how close humankind was to destroying the world.


Is humanity doomed? Doomsday Clock will be updated this MONTH to determine our fate - as the Russia-Ukraine war rages on and climate disasters continue to wreak havoc

Daily Mail - Science & tech

This month, humanity will learn just how close we are to annihilation. Every January, the Bulletin of the Atomic Scientists (BAS) sets a new time for the Doomsday Clock - the symbolic scale for humanity's proximity to the apocalypse. Last year, scientists left the clock sitting at 90 seconds to midnight - the closest humanity had come to destruction since the creation of the atomic bomb. But with war still raging in Ukraine and chaos across the Middle East, experts say that the risk of nuclear war is now 'far too high'. Dr Haydn Belfield, research associate at the Centre for the Study of Existential Risk, told MailOnline: 'We are probably closer to nuclear war than at any point in the last forty years.'