 safer


SAFER: Risk-Constrained Sample-then-Filter in Large Language Models

arXiv.org Artificial Intelligence

As large language models (LLMs) are increasingly deployed in risk-sensitive applications such as real-world open-ended question answering (QA), ensuring the trustworthiness of their outputs has become critical. Existing selective conformal prediction (SCP) methods provide statistical guarantees by constructing prediction sets with a constrained miscoverage rate for correct answers. However, prior works unrealistically assume that admissible answers for all instances can be obtained via finite sampling, even for open-ended QA scenarios that lack a fixed and finite solution space. To address this, we introduce a two-stage risk control framework comprising abstention-aware sampling and conformalized filtering (SAFER). First, on a held-out calibration set, SAFER calibrates a sampling budget within the maximum sampling cap, using the Clopper-Pearson exact method at a user-desired risk level (i.e., the maximum allowable miscoverage rate of the sampling sets). If the risk level cannot be satisfied within the cap, we abstain; otherwise, the calibrated sampling budget becomes the minimum requirement at test time. Then, we employ calibration instances where correct answers are attainable under the calibrated budget and apply the conformal risk control method to determine a statistically valid uncertainty threshold, which filters unreliable distractors from the candidate set for each test data point. In this stage, SAFER introduces an additional risk level to guide the calculation of the threshold, thereby controlling the risk of correct answers being excluded. Furthermore, we show that SAFER is compatible with various task-specific admission criteria and calibration-test split ratios, highlighting its robustness and high data efficiency.
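The two stages described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `first_hit` and `correct_scores` inputs, and the confidence parameter `delta` are all hypothetical (the abstract names only the risk levels, while a Clopper-Pearson bound also needs a confidence level). Stage one picks the smallest sampling budget whose Clopper-Pearson upper bound on the miscoverage rate stays within `alpha`, and abstains otherwise. Stage two applies the standard conformal risk control rule for a 0/1 loss to choose an uncertainty threshold.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson_upper(failures, n, delta):
    """One-sided (1 - delta) Clopper-Pearson upper bound on a failure rate."""
    if failures >= n:
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(60):  # bisection: binom_cdf decreases as p grows
        mid = (lo + hi) / 2
        if binom_cdf(failures, n, mid) > delta:
            lo = mid
        else:
            hi = mid
    return hi  # conservative end of the final bracket


def calibrate_budget(first_hit, cap, alpha, delta):
    """Stage 1 (assumed form): smallest budget meeting risk level alpha.

    first_hit[i] is the number of samples after which calibration item i
    first yields an admissible answer, or None if none appears within cap.
    Returns None to signal abstention.
    """
    n = len(first_hit)
    for budget in range(1, cap + 1):
        failures = sum(1 for h in first_hit if h is None or h > budget)
        if clopper_pearson_upper(failures, n, delta) <= alpha:
            return budget
    return None


def crc_threshold(correct_scores, beta):
    """Stage 2 (assumed form): conformal risk control with a 0/1 loss.

    correct_scores[i] is the uncertainty score of the correct answer for
    calibration item i; candidates with uncertainty above the returned
    threshold are filtered out. Returns None if no threshold satisfies beta.
    """
    n = len(correct_scores)
    for t in sorted(set(correct_scores)):
        excluded = sum(1 for s in correct_scores if s > t)
        if (excluded + 1) / (n + 1) <= beta:
            return t
    return None
```

In this sketch, `crc_threshold` would be called only on the calibration items whose `first_hit` value is within the budget returned by `calibrate_budget`, mirroring the abstract's restriction to instances where a correct answer is attainable under the calibrated budget.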


Instagram Is Introducing New Restrictions for Teen Users. Here's What to Know

TIME - Tech

In this photo illustration a 13-year-old boy looks at an iPhone screen display on May 21, 2025 in Bath, England. Instagram announced new restrictions for teen accounts on Tuesday amid mounting controversy over safety guidelines for younger users on the social media platform. The photo-sharing app will soon limit content for teens using guidelines similar to those in the film industry for PG-13-rated movies.


SAFER: Advancing Safety Alignment via Efficient Ex-Ante Reasoning

arXiv.org Artificial Intelligence

Recent advancements in large language models (LLMs) have accelerated progress toward artificial general intelligence, yet their potential to generate harmful content poses critical safety challenges. Existing alignment methods often struggle to cover diverse safety scenarios and remain vulnerable to adversarial attacks. In this work, we propose SAFER, a framework for Safety Alignment via eFficient Ex-Ante Reasoning. Our approach instantiates structured Ex-Ante reasoning through initial assessment, rule verification, and path calibration, and embeds predefined safety rules to provide transparent and verifiable safety judgments. Specifically, our approach consists of two training stages: (1) supervised fine-tuning with synthetic traces to teach the multi-stage Ex-Ante reasoning, and (2) step-level reasoning preference optimization to jointly enhance safety, utility, and efficiency. Experiments on multiple open-source LLMs demonstrate that SAFER significantly enhances safety performance while maintaining helpfulness and response efficiency.


OpenAI Designed GPT-5 to Be Safer. It Still Outputs Gay Slurs

WIRED

OpenAI is trying to make its chatbot less annoying with the release of GPT-5. And I'm not talking about adjustments to its synthetic personality that many users have complained about. Before GPT-5, if the AI tool determined it couldn't answer your prompt because the request violated OpenAI's content guidelines, it would hit you with a curt, canned apology. Now, ChatGPT is adding more explanations. OpenAI's general model spec lays out what is and isn't allowed to be generated.


SAFER: A Calibrated Risk-Aware Multimodal Recommendation Model for Dynamic Treatment Regimes

arXiv.org Machine Learning

Dynamic treatment regimes (DTRs) are critical to precision medicine, optimizing long-term outcomes through personalized, real-time decision-making in evolving clinical contexts, but require careful supervision for unsafe treatment risks. Existing efforts rely primarily on clinician-prescribed gold standards despite the absence of a known optimal strategy, and predominantly use structured EHR data without extracting valuable insights from clinical notes, limiting their reliability for treatment recommendations. In this work, we introduce SAFER, a calibrated risk-aware tabular-language recommendation framework for DTR that integrates both structured EHR and clinical notes, enabling them to learn from each other, and addresses inherent label uncertainty by assuming an ambiguous optimal treatment solution for deceased patients. Moreover, SAFER employs conformal prediction to provide statistical guarantees, ensuring safe treatment recommendations while filtering out uncertain predictions. Experiments on two publicly available sepsis datasets demonstrate that SAFER outperforms state-of-the-art baselines across multiple recommendation metrics and counterfactual mortality rate, while offering robust formal assurances. These findings underscore SAFER's potential as a trustworthy and theoretically grounded solution for high-stakes DTR applications.


Safety Aware Task Planning via Large Language Models in Robotics

arXiv.org Artificial Intelligence

The integration of large language models (LLMs) into robotic task planning has unlocked better reasoning capabilities for complex, long-horizon workflows. However, ensuring safety in LLM-driven plans remains a critical challenge, as these models often prioritize task completion over risk mitigation. This paper introduces SAFER (Safety-Aware Framework for Execution in Robotics), a multi-LLM framework designed to embed safety awareness into robotic task planning. SAFER employs a Safety Agent that operates alongside the primary task planner, providing safety feedback. Additionally, we introduce LLM-as-a-Judge, a novel metric leveraging LLMs as evaluators to quantify safety violations within generated task plans. Our framework integrates safety feedback at multiple stages of execution, enabling real-time risk assessment, proactive error correction, and transparent safety evaluation. We also integrate a control framework using Control Barrier Functions (CBFs) to ensure safety guarantees within SAFER's task planning. We evaluated SAFER against state-of-the-art LLM planners on complex long-horizon tasks involving heterogeneous robotic agents, demonstrating its effectiveness in reducing safety violations while maintaining task efficiency. We also verify the task planner and safety planner through actual hardware experiments involving multiple robots and a human.


o3-mini vs DeepSeek-R1: Which One is Safer?

arXiv.org Artificial Intelligence

The irruption of DeepSeek-R1 constitutes a turning point for the AI industry in general and the LLMs in particular. Its capabilities have demonstrated outstanding performance in several tasks, including creative thinking, code generation, maths and automated program repair, at apparently lower execution cost. However, LLMs must adhere to an important qualitative property, i.e., their alignment with safety and human values. A clear competitor of DeepSeek-R1 is its American counterpart, OpenAI's o3-mini model, which is expected to set high standards in terms of performance, safety and cost. In this technical report, we systematically assess the safety level of both DeepSeek-R1 (70b version) and OpenAI's o3-mini (beta version). To this end, we make use of our recently released automated safety testing tool, named ASTRAL. By leveraging this tool, we automatically and systematically generated and executed 1,260 test inputs on both models. After conducting a semi-automated assessment of the outcomes provided by both LLMs, the results indicate that DeepSeek-R1 produces significantly more unsafe responses (12%) than OpenAI's o3-mini (1.2%).


Few-shot Knowledge Graph Relational Reasoning via Subgraph Adaptation

arXiv.org Artificial Intelligence

Few-shot Knowledge Graph (KG) Relational Reasoning aims to predict unseen triplets (i.e., query triplets) for rare relations in KGs, given only several triplets of these relations as references (i.e., support triplets). This task has gained significant traction due to the widespread use of knowledge graphs in various natural language processing applications. Previous approaches have utilized meta-training methods and manually constructed meta-relation sets to tackle this task. Recent efforts have focused on edge-mask-based methods, which exploit the structure of the contextualized graphs of target triplets (i.e., a subgraph containing relevant triplets in the KG). However, existing edge-mask-based methods extract insufficient information from the KG and are highly influenced by spurious information in it. To overcome these challenges, we propose SAFER (Subgraph Adaptation for Few-shot Relational Reasoning), a novel approach that effectively adapts the information in contextualized graphs to various subgraphs generated from support and query triplets to perform the prediction. Specifically, SAFER enables the extraction of more comprehensive information from support triplets while minimizing the impact of spurious information when predicting query triplets. Experimental results on three prevalent datasets demonstrate the superiority of our proposed framework SAFER.


SAFER: Safe Collision Avoidance using Focused and Efficient Trajectory Search with Reinforcement Learning

arXiv.org Artificial Intelligence

Mobile robots are slowly but surely taking a place in our everyday lives and work environments with various applications: vacuum cleaning, video recording, companionship, security, tele-presence, etc. Whether they are autonomous agents or controlled by human operators, collision avoidance is key for mobile agents to operate safely and effectively in the real world. There are numerous approaches to collision avoidance, including search-based planning methods, trajectory optimization, learning-based methods, and emergency intervention systems.

Our collision avoidance system SAFER takes input from lidar and ultrasonic sensor scans, wheel odometry for robot state, and the upstream control commands. We fuse the lidar and ultrasonic sensor scans to detect a diverse set of obstacles, including transparent glass, reflective surfaces, furniture, humans, etc. We design a reward function for our RL agent with two terms. The first term encourages the reduction of AEB activation. The second term improves collision avoidance metrics through a cost function, such as average speed, distance to obstacles, and matching human ...


Why Halt AI Research When We Already Know How To Make It Safer

WIRED

Last week, the Future of Life Institute published an open letter proposing a six-month moratorium on the "dangerous" AI race. It has since been signed by over 3,000 people, including some influential members of the AI community. But while it is good that the risks of AI systems are gathering visibility within the community and across society, both the issues described and the actions proposed in the letter are unrealistic and unnecessary. The call for a pause on AI work is not only vague, but also unfeasible. While the training of large language models by for-profit companies gets most of the attention, it is far from the only type of AI work taking place.