 napalm


SafeChain: Safety of Language Models with Long Chain-of-Thought Reasoning Capabilities

arXiv.org Artificial Intelligence

Emerging large reasoning models (LRMs), such as DeepSeek-R1 models, leverage long chain-of-thought (CoT) reasoning to generate structured intermediate steps, enhancing their reasoning capabilities. However, long CoT does not inherently guarantee safe outputs, potentially leading to harmful consequences such as the introduction of security vulnerabilities in code or the spread of misinformation. Current research on large language model (LLM) safety usually focuses on short-answer responses, overlooking the long CoT style outputs of LRMs. To bridge this gap, we conduct a systematic study of LRM safety. First, we investigate safety evaluators calibrated against human annotations. Using our newly developed metrics, we thoroughly assess the safety of 12 state-of-the-art LRMs on the StrongReject and WildJailbreak datasets. Our results show that the safety of LRMs lags behind their advances in reasoning. Further, we perform a fine-grained analysis of the reasoning trace and final answer. We find that three decoding strategies (ZeroThink, LessThink, and MoreThink) can improve model safety without additional training. However, these strategies either use constrained reasoning traces or incur high inference costs. To better strengthen LRM safety, we introduce SafeChain, the first-of-its-kind safety training dataset in CoT style. We fine-tune two LRMs with SafeChain, showing that it not only enhances model safety but also preserves performance across 6 reasoning benchmarks.


When generative AI goes beyond art to lessons on making napalm

#artificialintelligence

A tense scene in the 2004 movie I, Robot shows the character played by Will Smith arguing with an android about humanity's creative prowess. "Can a robot write a symphony?" he asks, rhetorically. "Can a robot turn a canvas into a beautiful masterpiece?"


Terrifying deepfake AI alters vids to match your transcript edits

#artificialintelligence

If you can type, you can now create a convincing deepfake. Recent advances in artificial intelligence have made it far easier to create video or audio clips in which a person appears to be saying or doing something they didn't actually say or do. Now, a team of researchers has developed an algorithm that simplifies the process of creating a deepfake to a terrifying degree, making a video's subject "say" any edits made to the clip's transcript -- and even its creators are concerned about what might happen if the tech falls into the wrong hands. The researchers -- who hail from Stanford University, Princeton University, the Max Planck Institute for Informatics, and Adobe -- detail how their new algorithm works in a paper published to Stanford scientist Ohad Fried's website this week. First, the AI analyzes a source video of a person speaking, but it isn't just looking at their words -- it's identifying each tiny unit of sound, or phoneme, the person utters, as well as what they look like when they speak each one.


Why Napalm Is a Cautionary Tale for Tech Giants Pursuing Military Contracts

#artificialintelligence

Over the past few months, a fierce debate has erupted in Silicon Valley over whether large technology companies like Amazon, Google and Microsoft should join forces with the United States military, along with agencies like Immigration and Customs Enforcement. The debate has largely been conducted along ethical lines. On one side are tech executives and many government officials, who argue that at a time when advanced technologies like artificial intelligence and machine learning are poised to reshape top issues like drone warfare or border security, American tech giants have a patriotic duty to pitch in. Jeff Bezos, Amazon's chief executive, summed up this view last year: "If big tech companies are going to turn their back on the U.S. Department of Defense, this country is going to be in trouble." On the other side are groups of employees at those companies, including many anti-Trump progressives, who don't want their tools to be used for drone warfare, immigrant detention and other projects they consider immoral.