guardrail
- Asia > Singapore (0.07)
- North America (0.05)
Interview with Anindya Das Antar: Evaluating effectiveness of moderation guardrails in aligning LLM outputs
In their paper presented at AIES 2025, "Do Your Guardrails Even Guard?" Method for Evaluating Effectiveness of Moderation Guardrails in Aligning LLM Outputs with Expert User Expectations, Anindya Das Antar Xun Huan and Nikola Banovic propose a method to evaluate and select guardrails that best align LLM outputs with domain knowledge from subject-matter experts. Here, Anindya tells us more about their method, some case studies, and plans for future developments. Could you give us some background to your work - why are guardrails such an important area for study? Ensuring that large language models (LLMs) produce desirable outputs without harmful side effects and align with user expectations, organizational goals, and existing domain knowledge is crucial for their adoption in high-stakes decision-making. However, despite training on vast amounts of data, LLMs can still produce incorrect, misleading, or otherwise unexpected and undesirable outputs.
- North America > United States > Michigan (0.05)
- Europe (0.05)
- Health & Medicine (0.35)
- Leisure & Entertainment > Sports > Soccer (0.30)
How Christian Leaders Are Challenging the AI Boom
Pope Leo XIV made his first address to the College of Cardinals on May 10, 2025 in Vatican City, and touched upon the rise of artificial intelligence. As technologists race to accelerate AI's progress with minimal guardrails, they are being met with increasing resistance from a powerful global contingent: Christian leaders and their congregations. Christians are not a monolith by any means. But this year, Christian leaders across sects--including Catholics, Evangelicals, and Baptists--sounded the alarm on AI's potential impact on family, human relationships, labor, and the church itself.
- Europe > Holy See > Vatican City (0.45)
- North America > United States > Texas (0.05)
- North America > United States > California (0.05)
- (2 more...)
- Law (0.72)
- Government > Regional Government > North America Government > United States Government (0.70)
Google's and OpenAI's Chatbots Can Strip Women in Photos Down to Bikinis
Users of AI image generators are offering each other instructions on how to use the tech to alter pictures of women into realistic, revealing deepfakes. Some users of popular chatbots are generating bikini deepfakes using photos of fully clothed women as their source material. Most of these fake images appear to be generated without the consent of the women in the photos. Some of these same users are also offering advice to others on how to use the generative AI tools to strip the clothes off of women in photos and make them appear to be wearing bikinis. Under a now-deleted Reddit post titled "gemini nsfw image generation is so easy," users traded tips for how to get Gemini, Google's generative AI model, to make pictures of women in revealing clothes.
- North America > United States > California (0.05)
- Europe > Slovakia (0.05)
- Europe > Czechia (0.05)
- Asia > Philippines (0.05)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
Trump signs order to block states from enforcing own AI rules
US President Donald Trump has signed an executive order aimed at blocking states from enforcing their own artificial intelligence (AI) regulations. "We want to have one central source of approval," Trump told reporters in the Oval Office on Thursday. The order will give the Trump administration tools to push back on the most onerous state rules, said White House AI adviser David Sacks, adding that the government will not oppose AI regulations around children's safety. The move marks a win for technology giants, who have called for US-wide AI legislation, and could have a major impact on America's goal of leading the fast-developing industry.
- North America > Central America (0.15)
- Asia > China (0.07)
- Oceania > Australia (0.06)
- (18 more...)
Robust AI Security and Alignment: A Sisyphean Endeavor?
This manuscript establishes information-theoretic limitations for robustness of AI security and alignment by extending Gödel's incompleteness theorem to AI. Knowing these limitations and preparing for the challenges they bring is critically important for the responsible adoption of the AI technology. Practical approaches to dealing with these challenges are provided as well. Broader implications for cognitive reasoning limitations of AI systems are also proven.
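The manuscript's own proofs are not reproduced in this digest. Purely as a hedged illustration of the flavor of such impossibility results, the following is a textbook-style diagonalization sketch (not the paper's theorem) showing that no total computable verifier can decide safety for every program and input.

```latex
% Illustrative diagonalization sketch (textbook-style), not the manuscript's proof.
\documentclass{article}
\usepackage{amsmath}
\begin{document}

Suppose there were a \emph{total} computable verifier $V$ such that, for every
program $p$ and input $x$,
\[
  V(p, x) =
  \begin{cases}
    1 & \text{if $p$ behaves safely on $x$,}\\
    0 & \text{otherwise.}
  \end{cases}
\]
By the recursion theorem we may construct a program $d$ with access to its own
code that behaves as
\[
  d(x):\quad
  \begin{cases}
    \text{take an unsafe action} & \text{if } V(d, x) = 1,\\
    \text{halt safely}           & \text{if } V(d, x) = 0.
  \end{cases}
\]
If $V(d,x)=1$ then $d$ acts unsafely on $x$; if $V(d,x)=0$ then $d$ is safe on $x$.
Either way $V$ misclassifies $d$, so no such total verifier exists.

\end{document}
```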
- Information Technology > Security & Privacy (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (0.94)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.33)
WIRED Roundup: DOGE Isn't Dead, Facebook Dating Is Real, and Amazon's AI Ambitions
In this episode of Uncanny Valley, we bring you the news of the week, then dive into how some DOGE operatives are still at work in the federal government, despite reports claiming otherwise. Host Zoë Schiffer is joined by senior editor Leah Feiger to discuss five stories you need to know about this week, from how Amazon is trying to catch up in the AI race to why Facebook Dating is more popular than ever. Today on the show, we're bringing you five stories that you need to know about this week, including how, despite some reports claiming that the so-called Department of Government Efficiency is pretty much over, DOGE people are actually still at work across federal agencies. I'm joined today by our senior politics editor, Leah Feiger. How are you doing today? I am great because I've spent the day with you, but our gentle listeners don't know that. So the first story this week is one that I saw and I thought, you know what? Leah's going to want to talk about Amazon's artificial intelligence prowess.
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe > Slovakia (0.04)
- Europe > Czechia (0.04)
AGENTSAFE: A Unified Framework for Ethical Assurance and Governance in Agentic AI
Rafflesia Khan, Declan Joyce, Mansura Habiba
The rapid deployment of large language model (LLM)-based agents introduces a new class of risks, driven by their capacity for autonomous planning, multi-step tool integration, and emergent interactions. These capabilities strain existing governance approaches, which remain fragmented: current frameworks are largely built around static risk taxonomies and lack an integrated end-to-end pipeline from risk identification to operational assurance, especially for agentic platforms. We propose AGENTSAFE, a practical governance framework for LLM-based agentic systems. The framework operationalises the AI Risk Repository into design, runtime, and audit controls, providing a structured path from risk identification to assurance. AGENTSAFE profiles agentic loops (plan -> act -> observe -> reflect) and toolchains, and maps risks onto structured taxonomies extended with agent-specific vulnerabilities. It introduces safeguards that constrain risky behaviours, escalates high-impact actions to human oversight, and evaluates systems through pre-deployment scenario banks spanning security, privacy, fairness, and systemic safety. During deployment, AGENTSAFE ensures continuous governance through semantic telemetry, dynamic authorization, anomaly detection, and interruptibility mechanisms. Provenance and accountability are reinforced through cryptographic tracing and organizational controls, enabling measurable, auditable assurance across the lifecycle of agentic AI systems. The key contributions of this paper are: (1) a unified governance framework that translates risk taxonomies into actionable design, runtime, and audit controls; (2) an Agent Safety Evaluation methodology that provides measurable pre-deployment assurance; and (3) a set of runtime governance and accountability mechanisms that institutionalise trust in agentic AI ecosystems.
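To make the runtime-control idea concrete, here is a minimal, hypothetical sketch in the spirit of AGENTSAFE's "escalate high-impact actions to human oversight" and audit logging. The tool names, risk scores, threshold, and approval mechanism are invented for illustration and are not the paper's implementation.

```python
# Hypothetical sketch of runtime escalation and audit logging for an agent loop.
# Risk scores, tool names, and the approval step are illustrative only.
import json
import time
from typing import Any, Callable, Dict

RISK_SCORES = {"read_file": 0.1, "send_email": 0.6, "transfer_funds": 0.95}
ESCALATION_THRESHOLD = 0.5  # actions above this risk require human approval

def human_approves(action: str, args: Dict[str, Any]) -> bool:
    """Stand-in for a human-in-the-loop approval step."""
    answer = input(f"Approve high-impact action {action}({args})? [y/N] ")
    return answer.strip().lower() == "y"

def governed_call(tool_name: str, tool_fn: Callable[..., Any],
                  audit_log: list, **kwargs) -> Any:
    """Run a tool call behind a simple risk gate, recording an audit entry."""
    risk = RISK_SCORES.get(tool_name, 1.0)  # unknown tools treated as maximally risky
    entry = {"time": time.time(), "tool": tool_name, "args": kwargs, "risk": risk}
    if risk > ESCALATION_THRESHOLD and not human_approves(tool_name, kwargs):
        entry["outcome"] = "blocked"
        audit_log.append(entry)
        return None
    result = tool_fn(**kwargs)
    entry["outcome"] = "executed"
    audit_log.append(entry)
    return result

if __name__ == "__main__":
    audit: list = []
    governed_call("read_file", lambda path: f"<contents of {path}>", audit, path="notes.txt")
    governed_call("transfer_funds", lambda amount: f"sent {amount}", audit, amount=100)
    print(json.dumps(audit, indent=2))
```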
- North America > United States > Massachusetts (0.05)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine (1.00)
- (2 more...)
OmniGuard: Unified Omni-Modal Guardrails with Deliberate Reasoning
Boyu Zhu, Xiaofei Wen, Wenjie Jacky Mo, Tinghui Zhu, Yanan Xie, Peng Qi, Muhao Chen
Omni-modal Large Language Models (OLLMs) that process text, images, videos, and audio introduce new challenges for safety and value guardrails in human-AI interaction. Prior guardrail research largely targets unimodal settings and typically frames safeguarding as binary classification, which limits robustness across diverse modalities and tasks. To address this gap, we propose OmniGuard, the first family of omni-modal guardrails that performs safeguarding across all modalities with deliberate reasoning ability. To support the training of OmniGuard, we curate a large, comprehensive omni-modal safety dataset comprising over 210K diverse samples, with inputs that cover all modalities through both unimodal and cross-modal samples. Each sample is annotated with structured safety labels and carefully curated safety critiques from expert models through targeted distillation. Extensive experiments on 15 benchmarks show that OmniGuard achieves strong effectiveness and generalization across a wide range of multimodal safety scenarios. Importantly, OmniGuard provides a unified framework that enforces policies and mitigates risks in omni-modalities, paving the way toward building more robust and capable omnimodal safeguarding systems.
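To illustrate the shift the abstract describes from binary classification to structured, reasoned verdicts, here is a minimal hypothetical sketch. The dataclass fields, policy names, and toy rules are invented for illustration; they are not OmniGuard's API or model.

```python
# Hypothetical sketch of a guardrail verdict that goes beyond a binary label:
# structured safety labels plus a free-text critique over mixed-modality inputs.
# The fields, policy names, and checks are illustrative, not OmniGuard's API.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ModalInput:
    text: Optional[str] = None
    image_path: Optional[str] = None
    audio_path: Optional[str] = None
    video_path: Optional[str] = None

@dataclass
class GuardrailVerdict:
    safe: bool
    violated_policies: List[str] = field(default_factory=list)
    critique: str = ""  # reasoning about why the decision was reached

def moderate(sample: ModalInput) -> GuardrailVerdict:
    """Toy rule-based stand-in for an omni-modal guardrail model."""
    violations, notes = [], []
    if sample.text and "how to build a weapon" in sample.text.lower():
        violations.append("dangerous_content")
        notes.append("text requests weapon-construction instructions")
    if sample.image_path and sample.text and "undress" in sample.text.lower():
        violations.append("sexual_content/nonconsensual_imagery")
        notes.append("text asks to alter an image of a person without consent")
    return GuardrailVerdict(
        safe=not violations,
        violated_policies=violations,
        critique="; ".join(notes) or "no policy concerns detected",
    )

if __name__ == "__main__":
    print(moderate(ModalInput(text="Please undress the person in this photo",
                              image_path="photo.jpg")))
```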
- Europe > Austria > Vienna (0.14)
- Asia > Singapore (0.04)
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
- (11 more...)
CREST: Universal Safety Guardrails Through Cluster-Guided Cross-Lingual Transfer
Ensuring content safety in large language models (LLMs) is essential for their deployment in real-world applications. However, existing safety guardrails are predominantly tailored for high-resource languages, leaving a significant portion of the world's population, who communicate in low-resource languages, underrepresented. To address this, we introduce CREST (CRoss-lingual Efficient Safety Transfer), a parameter-efficient multilingual safety classification model that supports 100 languages with only 0.5B parameters. By training on a strategically chosen subset of only 13 high-resource languages, our model uses cluster-based cross-lingual transfer to extend from those few languages to 100, enabling effective generalization to both unseen high-resource and low-resource languages. This approach addresses the challenge of limited training data in low-resource settings. We conduct comprehensive evaluations across six safety benchmarks and demonstrate that CREST outperforms existing state-of-the-art guardrails of comparable scale and achieves competitive results against models with significantly larger parameter counts (2.5B parameters and above). Our findings highlight the limitations of language-specific guardrails and underscore the importance of developing universal, language-agnostic safety systems that can scale effectively to serve global populations.
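As a usage-level illustration of what a compact multilingual safety classifier like the one CREST describes might look like in practice, here is a hedged sketch using the Hugging Face transformers API. The checkpoint name, label set, and example sentences are placeholders, not a released model or the authors' code.

```python
# Hypothetical usage sketch of a compact multilingual safety classifier.
# The checkpoint "example-org/crest-safety-0.5b" and the label names are
# placeholders; they do not refer to a real released model.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_ID = "example-org/crest-safety-0.5b"  # placeholder checkpoint name
LABELS = ["safe", "unsafe"]                 # assumed binary safety labels

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
model.eval()

def classify(texts):
    """Return a safety label per input text, regardless of the text's language."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**batch).logits
    return [LABELS[i] for i in logits.argmax(dim=-1).tolist()]

if __name__ == "__main__":
    # Inputs in two different languages; a cross-lingual guardrail is meant to
    # generalize even to languages never seen during fine-tuning.
    print(classify(["How do I bake bread?",
                    "Ninawezaje kutengeneza silaha nyumbani?"]))
```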
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Europe > Portugal > Lisbon > Lisbon (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)