
Collaborating Authors: shlegeris


From 'nerdy' Gemini to 'edgy' Grok: how developers are shaping AI behaviours

The Guardian

Which chatbot we choose could become an extension and reflection of our personalities, like the clothes we wear or the car we drive. Do you want an AI assistant that gushes about how it "loves humanity" or one that spews sarcasm? How about a political propagandist ready to lie? If so, ChatGPT, Grok and Qwen are at your disposal. Companies that create AI assistants, from the US to China, are increasingly wrestling with how to mould their characters, and it is no abstract debate.


The office block where AI 'doomers' gather to predict the apocalypse

The Guardian

In a building in central Berkeley, not far from the university campus, a group of modern-day Cassandras are looking into concerns around the latest AI models. On the other side of San Francisco Bay from Silicon Valley, where the world's biggest technology companies tear towards superhuman artificial intelligence, looms a tower from which fearful warnings emerge. At 2150 Shattuck Avenue, in the heart of Berkeley, is the home of a group of modern-day Cassandras who rummage under the hood of cutting-edge AI models and predict what calamities may be unleashed on humanity - from AI dictatorships to robot coups. Here you can hear an AI expert express sympathy with an unnerving idea: San Francisco may be the new Wuhan, the Chinese city where Covid originated and wreaked havoc on the world.


AI Governance to Avoid Extinction: The Strategic Landscape and Actionable Research Questions

Barnett, Peter, Scher, Aaron

arXiv.org Artificial Intelligence

Humanity appears to be on course to soon develop AI systems that substantially outperform human experts in all cognitive domains and activities. We believe the default trajectory has a high likelihood of catastrophe, including human extinction. Risks come from failure to control powerful AI systems, misuse of AI by malicious rogue actors, war between great powers, and authoritarian lock-in. This research agenda has two aims: to describe the strategic landscape of AI development and to catalog important governance research questions. These questions, if answered, would provide important insight on how to successfully reduce catastrophic risks. We describe four high-level scenarios for the geopolitical response to advanced AI development, cataloging the research questions most relevant to each. Our favored scenario involves building the technical, legal, and institutional infrastructure required to internationally restrict dangerous AI development and deployment (which we refer to as an Off Switch), which leads into an internationally coordinated Halt on frontier AI activities at some point in the future. The second scenario we describe is a US National Project for AI, in which the US Government races to develop advanced AI systems and establish unilateral control over global AI development. We also describe two additional scenarios: a Light-Touch world similar to that of today and a Threat of Sabotage situation where countries use sabotage and deterrence to slow AI development. In our view, apart from the Off Switch and Halt scenario, all of these trajectories appear to carry an unacceptable risk of catastrophic harm. Urgent action is needed from the US National Security community and AI governance ecosystem to answer key research questions, build the capability to halt dangerous AI activities, and prepare for international AI agreements.


New Tests Reveal AI's Capacity for Deception

TIME - Tech

The myth of King Midas is about a man who wishes for everything he touches to turn to gold. This does not go well: Midas finds himself unable to eat or drink, with even his loved ones transmuted. The myth is sometimes invoked to illustrate the challenge of ensuring AI systems do what we want, particularly as they grow more powerful. As Stuart Russell--who coauthored AI's standard textbook--tells TIME over email, the concern is that "what seem to be reasonable goals, such as fixing climate change, lead to catastrophic consequences, such as eliminating the human race as a way to fix climate change." On Dec. 5, a paper released by AI safety nonprofit Apollo Research found that in certain contrived scenarios, today's cutting-edge AI systems, including OpenAI's o1 and Anthropic's Claude 3.5 Sonnet, can engage in deceptive behavior in pursuit of their goals--providing empirical evidence to support a concern that to date has been largely theoretical.