Constitution


The Only Thing Standing Between Humanity and AI Apocalypse Is … Claude?

WIRED

As AI systems grow more powerful, Anthropic's resident philosopher says the startup is betting Claude itself can learn the wisdom needed to avoid disaster. Anthropic is locked in a paradox: among the top AI companies, it is the most obsessed with safety and leads the pack in researching how models can go wrong. Yet even though the safety issues it has identified are far from resolved, Anthropic is pushing just as aggressively as its rivals toward the next, potentially more dangerous, level of artificial intelligence. Its core mission is figuring out how to resolve that contradiction. Last month, Anthropic released two documents that both acknowledged the risks associated with the path it's on and hinted at a route it could take to escape the paradox.


From 'nerdy' Gemini to 'edgy' Grok: how developers are shaping AI behaviours

The Guardian

Which chatbot we choose could become an extension and reflection of our personalities, like the clothes we wear or the car we drive. Do you want an AI assistant that gushes about how it "loves humanity" or one that spews sarcasm? How about a political propagandist ready to lie? If so, ChatGPT, Grok and Qwen are at your disposal. Companies that create AI assistants, from the US to China, are increasingly wrestling with how to mould their characters, and it is no abstract debate.


How Do You Teach an AI to Be Good? Anthropic Just Published Its Answer

TIME - Tech

Getting AI models to behave used to be a thorny mathematical problem. These days, it looks a bit more like raising a child. That, at least, is according to Amanda Askell, a trained philosopher whose unique role within Anthropic is crafting the personality of Claude, the AI firm's rival to ChatGPT. "Imagine you suddenly realize that your six-year-old child is a kind of genius," Askell says.


5 new quarters commemorate 250 years of American independence

Popular Science

The new designs honor the Constitution, Civil War, and more. While we've said goodbye to both the year 2025 and the penny, five new United States quarters will be finding their way into your pocket soon enough. The designs of each new quarter will honor the country's 250th anniversary (aka its semiquincentennial). According to a press release from the U.S. Mint, the coins "commemorate 250 years of American Liberty by reflecting our country's founding principles and honoring our Nation's history."


Self-Supervised Alignment with Mutual Information: Learning to Follow Principles without Preference Labels

Neural Information Processing Systems

When prompting a language model (LM), users often expect the model to adhere to a set of behavioral principles across diverse tasks, such as producing insightful content while avoiding harmful or biased language. Instilling such principles (i.e., a constitution) into a model is resource-intensive, technically challenging, and generally requires human preference labels or examples. We introduce SAMI, an iterative algorithm that finetunes a pretrained language model (without requiring preference labels or demonstrations) to increase the conditional mutual information between constitutions and self-generated responses given queries from a dataset. On single-turn dialogue and summarization, a SAMI-trained mistral-7b outperforms the initial pretrained model, with win rates between 66% and 77%.
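The conditional mutual information objective described in the SAMI abstract can be illustrated with a generic InfoNCE-style contrastive bound: for a batch of constitution-response pairs (given their queries), the matched pairing should outscore every mismatched one. A minimal NumPy sketch, assuming a precomputed matrix of log-probabilities; the paper's actual estimator and fine-tuning loop differ:

```python
import numpy as np

def log_softmax(x: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log-softmax along the given axis."""
    m = x.max(axis=axis, keepdims=True)
    return x - m - np.log(np.exp(x - m).sum(axis=axis, keepdims=True))

def sami_loss(logp: np.ndarray) -> float:
    """Symmetric contrastive lower-bound loss on conditional MI.

    logp[i, j] = log p(y_i | c_j, x_i): the log-probability the model
    assigns to response y_i when conditioned on constitution c_j and
    query x_i. Minimizing this loss pushes matched (diagonal) pairs to
    outscore mismatched ones, increasing the MI between constitutions
    and self-generated responses.
    """
    n = logp.shape[0]
    diag = np.arange(n)
    # "Which constitution produced this response?" (softmax over columns)
    loss_c = -log_softmax(logp, axis=1)[diag, diag].mean()
    # "Which response matches this constitution?" (softmax over rows)
    loss_r = -log_softmax(logp, axis=0)[diag, diag].mean()
    return 0.5 * (loss_c + loss_r)
```

With an uninformative score matrix (all zeros), the loss sits at log(n); as diagonal scores grow relative to off-diagonal ones, the loss shrinks toward zero.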


MORNING GLORY: A President Donald Trump-branded energy drink?

FOX News



AI-Generated Compromises for Coalition Formation: Modeling, Simulation, and a Textual Case Study

Briman, Eyal, Shapiro, Ehud, Talmon, Nimrod

arXiv.org Artificial Intelligence

The challenge of finding compromises between agent proposals is fundamental to AI sub-fields such as argumentation, mediation, and negotiation. Building on this tradition, Elkind et al. (2021) introduced a process for coalition formation that seeks majority-supported proposals preferable to the status quo, using a metric space where each agent has an ideal point. The crucial step in this iterative process involves identifying compromise proposals around which agent coalitions can unite. How to effectively find such compromise proposals, however, remains an open question. We address this gap by formalizing a holistic model that encompasses agent bounded rationality and uncertainty and developing AI models to generate such compromise proposals. We focus on the domain of collaboratively writing text documents -- e.g., to enable the democratic creation of a community constitution. We apply NLP (Natural Language Processing) techniques and utilize LLMs (Large Language Models) to create a semantic metric space for text and develop algorithms to suggest suitable compromise points. To evaluate the effectiveness of our algorithms, we simulate various coalition formation processes and demonstrate the potential of AI to facilitate large-scale democratic text editing, such as collaboratively drafting a constitution, an area where traditional tools are limited.
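The crucial step the abstract describes, finding a candidate that a majority of agents prefer to the status quo in a metric space, can be sketched directly. A minimal illustration (not the paper's algorithm), assuming agents' ideal points and candidate proposals have already been embedded as vectors, e.g. by a sentence-embedding model:

```python
import numpy as np

def majority_supported(candidate: np.ndarray, ideals: np.ndarray,
                       status_quo: np.ndarray) -> bool:
    """An agent supports the candidate if it lies strictly closer to the
    agent's ideal point than the status quo does (Euclidean metric)."""
    d_cand = np.linalg.norm(ideals - candidate, axis=1)
    d_sq = np.linalg.norm(ideals - status_quo, axis=1)
    return (d_cand < d_sq).sum() > len(ideals) / 2

def best_compromise(candidates, ideals, status_quo):
    """Among majority-supported candidates, pick the one minimizing the
    maximum distance to any agent's ideal point (an egalitarian choice).
    Returns None if no candidate beats the status quo for a majority."""
    supported = [c for c in candidates
                 if majority_supported(c, ideals, status_quo)]
    if not supported:
        return None
    return min(supported,
               key=lambda c: np.linalg.norm(ideals - c, axis=1).max())
```

The egalitarian (minimax) selection rule here is one plausible choice among several; the paper evaluates its own NLP-driven methods for proposing such compromise points.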


JBE-QA: Japanese Bar Exam QA Dataset for Assessing Legal Domain Knowledge

Cao, Zhihan, Nishino, Fumihito, Yamada, Hiroaki, Thanh, Nguyen Ha, Miyao, Yusuke, Satoh, Ken

arXiv.org Artificial Intelligence

We introduce JBE-QA, a Japanese Bar Exam Question-Answering dataset to evaluate large language models' legal knowledge. Derived from the multiple-choice (tanto-shiki) section of the Japanese bar exam (2015-2024), JBE-QA provides the first comprehensive benchmark for Japanese legal-domain evaluation of LLMs. It covers the Civil Code, the Penal Code, and the Constitution, extending beyond the Civil Code focus of prior Japanese resources. Each question is decomposed into independent true/false judgments with structured contextual fields. The dataset contains 3,464 items with balanced labels. We evaluate 26 LLMs, including proprietary, open-weight, Japanese-specialised, and reasoning models. Our results show that proprietary models with reasoning enabled perform best, and the Constitution questions are generally easier than the Civil Code or the Penal Code questions.
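Because each JBE-QA item reduces to an independent true/false judgment, scoring a model is a simple accuracy computation. A hypothetical sketch; the field names below are assumptions for illustration, not the dataset's actual schema:

```python
def evaluate(items, predict):
    """Score a model on decomposed true/false legal judgments.

    items:   list of dicts with (assumed) keys 'statement', 'context',
             and a gold boolean 'label'.
    predict: callable mapping (statement, context) -> bool, e.g. a
             wrapper around an LLM prompted for a true/false verdict.
    Returns accuracy in [0, 1].
    """
    correct = sum(predict(it["statement"], it["context"]) == it["label"]
                  for it in items)
    return correct / len(items)
```

Since the dataset's labels are balanced, a constant predictor scores 0.5, which makes accuracy directly interpretable as skill above chance.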


Latent Principle Discovery for Language Model Self-Improvement

Ramji, Keshav, Naseem, Tahira, Astudillo, Ramón Fernandez

arXiv.org Artificial Intelligence

When language model (LM) users aim to improve the quality of its generations, it is crucial to specify concrete behavioral attributes that the model should strive to reflect. However, curating such principles across many domains, even non-exhaustively, requires a labor-intensive annotation process. To automate this process, we propose eliciting these latent attributes that guide model reasoning toward human-preferred responses by explicitly modeling them in a self-correction setting. Our approach mines new principles from the LM itself and compresses the discovered elements to an interpretable set via clustering. Specifically, we employ a form of posterior-regularized Monte Carlo Expectation-Maximization to both identify a condensed set of the most effective latent principles and teach the LM to strategically invoke them in order to intrinsically refine its responses. We demonstrate that bootstrapping our algorithm over multiple iterations enables smaller language models (7-8B parameters) to self-improve, achieving +8-10% in AlpacaEval win-rate, an average of +0.3 on MT-Bench, and +19-23% in principle-following win-rate on IFEval. We also show that clustering the principles yields interpretable and diverse model-generated constitutions while retaining model performance. The gains that our method achieves highlight the potential of automated, principle-driven post-training recipes toward continual self-improvement.
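The compression step in the abstract, clustering mined principles into a small interpretable set, can be illustrated with plain k-means over principle embeddings. A hedged sketch: the paper's posterior-regularized Monte Carlo EM procedure is more involved, and the 2-D vectors here stand in for real embeddings:

```python
import numpy as np

def kmeans(X, k, iters=50):
    """Plain k-means with a simple deterministic init (first k points)."""
    X = np.asarray(X, dtype=float)
    centers = X[:k].copy()
    for _ in range(iters):
        # Assign each principle embedding to its nearest center.
        labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

def representative_principles(principles, X, labels, centers):
    """Pick, per cluster, the principle whose embedding lies nearest the
    centroid -- a readable stand-in for the whole cluster."""
    reps = {}
    for j, c in enumerate(centers):
        idx = np.where(labels == j)[0]
        if len(idx):
            nearest = idx[np.linalg.norm(X[idx] - c, axis=1).argmin()]
            reps[j] = principles[nearest]
    return reps
```

Replacing a long list of mined principles with k cluster representatives is what makes the resulting "constitution" both compact and human-auditable.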