
Collaborating Authors: shlegeris


From 'nerdy' Gemini to 'edgy' Grok: how developers are shaping AI behaviours

The Guardian

Which chatbot we choose could become an extension and reflection of our personalities, like the clothes we wear or the car we drive. Do you want an AI assistant that gushes about how it "loves humanity" or one that spews sarcasm? How about a political propagandist ready to lie? If so, ChatGPT, Grok and Qwen are at your disposal. Companies that create AI assistants, from the US to China, are increasingly wrestling with how to mould their characters, and it is no abstract debate.


The office block where AI 'doomers' gather to predict the apocalypse

The Guardian

In a building in central Berkeley, not far from the university campus, a group of modern-day Cassandras are looking into concerns around the latest AI models. On the other side of San Francisco Bay from Silicon Valley, where the world's biggest technology companies tear towards superhuman artificial intelligence, looms a tower from which fearful warnings emerge. At 2150 Shattuck Avenue, in the heart of Berkeley, is the home of a group of modern-day Cassandras who rummage under the hood of cutting-edge AI models and predict what calamities may be unleashed on humanity - from AI dictatorships to robot coups. Here you can hear an AI expert express sympathy with an unnerving idea: San Francisco may be the new Wuhan, the Chinese city where Covid originated and wreaked havoc on the world.


AI Governance to Avoid Extinction: The Strategic Landscape and Actionable Research Questions

Barnett, Peter, Scher, Aaron

arXiv.org Artificial Intelligence

Humanity appears to be on course to soon develop AI systems that substantially outperform human experts in all cognitive domains and activities. We believe the default trajectory has a high likelihood of catastrophe, including human extinction. Risks come from failure to control powerful AI systems, misuse of AI by malicious rogue actors, war between great powers, and authoritarian lock-in. This research agenda has two aims: to describe the strategic landscape of AI development and to catalog important governance research questions. These questions, if answered, would provide important insight on how to successfully reduce catastrophic risks. We describe four high-level scenarios for the geopolitical response to advanced AI development, cataloging the research questions most relevant to each. Our favored scenario involves building the technical, legal, and institutional infrastructure required to internationally restrict dangerous AI development and deployment (which we refer to as an Off Switch), which leads into an internationally coordinated Halt on frontier AI activities at some point in the future. The second scenario we describe is a US National Project for AI, in which the US Government races to develop advanced AI systems and establish unilateral control over global AI development. We also describe two additional scenarios: a Light-Touch world similar to that of today and a Threat of Sabotage situation where countries use sabotage and deterrence to slow AI development. In our view, apart from the Off Switch and Halt scenario, all of these trajectories appear to carry an unacceptable risk of catastrophic harm. Urgent action is needed from the US National Security community and AI governance ecosystem to answer key research questions, build the capability to halt dangerous AI activities, and prepare for international AI agreements.


New Tests Reveal AI's Capacity for Deception

TIME - Tech

The myth of King Midas is about a man who wishes for everything he touches to turn to gold. This does not go well: Midas finds himself unable to eat or drink, with even his loved ones transmuted. The myth is sometimes invoked to illustrate the challenge of ensuring AI systems do what we want, particularly as they grow more powerful. As Stuart Russell--who coauthored AI's standard textbook--tells TIME over email, the concern is that "what seem to be reasonable goals, such as fixing climate change, lead to catastrophic consequences, such as eliminating the human race as a way to fix climate change." On Dec. 5, a paper released by AI safety nonprofit Apollo Research found that in certain contrived scenarios, today's cutting-edge AI systems, including OpenAI's o1 and Anthropic's Claude 3.5 Sonnet, can engage in deceptive behavior in pursuit of their goals--providing empirical evidence to support a concern that to date has been largely theoretical.