AITopics | safer

safer

SAFER: Risk-Constrained Sample-then-Filter in Large Language Models

Wang, Qingni, Fan, Yue, Wang, Xin Eric

arXiv.org Artificial Intelligence | Oct-22-2025

As large language models (LLMs) are increasingly deployed in risk-sensitive applications such as real-world open-ended question answering (QA), ensuring the trustworthiness of their outputs has become critical. Existing selective conformal prediction (SCP) methods provide statistical guarantees by constructing prediction sets with a constrained miscoverage rate for correct answers. However, prior works unrealistically assume that admissible answers for all instances can be obtained via finite sampling, even for open-ended QA scenarios that lack a fixed and finite solution space. To address this, we introduce a two-stage risk control framework comprising abstention-aware sampling and conformalized filtering (SAFER). First, on a held-out calibration set, SAFER calibrates a sampling budget within the maximum sampling cap, using the Clopper-Pearson exact method at a user-desired risk level (i.e., the maximum allowable miscoverage rate of the sampling sets). If the risk level cannot be satisfied within the cap, we abstain; otherwise, the calibrated sampling budget becomes the minimum requirement at test time. Then, we employ calibration instances where correct answers are attainable under the calibrated budget and apply the conformal risk control method to determine a statistically valid uncertainty threshold, which filters unreliable distractors from the candidate set for each test data point. In this stage, SAFER introduces an additional risk level to guide the calculation of the threshold, thereby controlling the risk of correct answers being excluded. Furthermore, we show that SAFER is compatible with various task-specific admission criteria and calibration-test split ratios, highlighting its robustness and high data efficiency.
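The first stage described in the abstract (calibrating a sampling budget with a Clopper-Pearson bound, abstaining if the cap is too small) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `first_hit` input format, and the confidence parameter `delta` are assumptions made for illustration, and the sketch applies no correction for scanning several budgets.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(Bin(n, p) <= k), computed directly from the binomial pmf."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def clopper_pearson_upper(k, n, delta=0.05):
    """One-sided Clopper-Pearson upper bound on a binomial proportion.

    Solves binom_cdf(k, n, p) = delta for p by bisection; the CDF is
    decreasing in p, so the returned value errs on the conservative side.
    """
    if k >= n:
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if binom_cdf(k, n, mid) > delta:
            lo = mid
        else:
            hi = mid
    return hi

def calibrate_budget(first_hit, cap, alpha, delta=0.05):
    """Return the smallest budget whose miscoverage bound is <= alpha.

    first_hit[i] (hypothetical format) is the number of samples needed
    before calibration item i yields an admissible answer, or None if no
    admissible answer appears within the cap. Returns None to abstain.
    """
    n = len(first_hit)
    for budget in range(1, cap + 1):
        misses = sum(1 for h in first_hit if h is None or h > budget)
        if clopper_pearson_upper(misses, n, delta) <= alpha:
            return budget
    return None  # risk level not attainable within the cap: abstain
```

For example, `calibrate_budget([1] * 50 + [2] * 40 + [3] * 10, cap=5, alpha=0.1)` returns a budget of 3, while a calibration set in which 20% of items never produce an admissible answer leads to abstention at the same risk level.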

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2510.10193

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Instagram Is Introducing New Restrictions for Teen Users. Here's What to Know

TIME - Tech | Oct-14-2025, 16:16:49 GMT

Instagram announced new restrictions for teen accounts on Tuesday amid mounting controversy over safety guidelines for younger users on the social media platform. The photo-sharing app will soon limit content for teens using guidelines similar to those in the film industry for PG-13-rated movies.

instagram, instagram promised, teen, (13 more...)

TIME - Tech

Country:

Europe > United Kingdom > England > Somerset > Bath (0.46)
North America > United States > California (0.16)
Oceania > Australia (0.05)
North America > Canada (0.05)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)
Information Technology > Services (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.54)

SAFER: Advancing Safety Alignment via Efficient Ex-Ante Reasoning

Feng, Kehua, Ding, Keyan, Wang, Yuhao, Li, Menghan, Wei, Fanjunduo, Wang, Xinda, Zhang, Qiang, Chen, Huajun

arXiv.org Artificial Intelligence | Oct-8-2025

Recent advancements in large language models (LLMs) have accelerated progress toward artificial general intelligence, yet their potential to generate harmful content poses critical safety challenges. Existing alignment methods often struggle to cover diverse safety scenarios and remain vulnerable to adversarial attacks. In this work, we propose SAFER, a framework for Safety Alignment via eFficient Ex-Ante Reasoning. Our approach instantiates structured Ex-Ante reasoning through initial assessment, rule verification, and path calibration, and embeds predefined safety rules to provide transparent and verifiable safety judgments. Specifically, our approach consists of two training stages: (1) supervised fine-tuning with synthetic traces to teach the multi-stage Ex-Ante reasoning, and (2) step-level reasoning preference optimization to jointly enhance safety, utility, and efficiency. Experiments on multiple open-source LLMs demonstrate that SAFER significantly enhances safety performance while maintaining helpfulness and response efficiency.

arxiv preprint arxiv, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2504.02725

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.46)

Industry:

Water & Waste Management (1.00)
Materials > Chemicals (1.00)
Law > Criminal Law (1.00)
(5 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)

OpenAI Designed GPT-5 to Be Safer. It Still Outputs Gay Slurs

WIRED | Aug-13-2025, 23:06:08 GMT

OpenAI is trying to make its chatbot less annoying with the release of GPT-5. And I'm not talking about adjustments to its synthetic personality that many users have complained about. Before GPT-5, if the AI tool determined it couldn't answer your prompt because the request violated OpenAI's content guidelines, it would hit you with a curt, canned apology. Now, ChatGPT is adding more explanations. OpenAI's general model spec lays out what is and isn't allowed to be generated.

chatgpt, gpt-5, openai, (9 more...)

WIRED

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)

SAFER: A Calibrated Risk-Aware Multimodal Recommendation Model for Dynamic Treatment Regimes

Shen, Yishan, Ye, Yuyang, Xiong, Hui, Chen, Yong

arXiv.org Machine Learning | Jun-10-2025

Dynamic treatment regimes (DTRs) are critical to precision medicine, optimizing long-term outcomes through personalized, real-time decision-making in evolving clinical contexts, but require careful supervision for unsafe treatment risks. Existing efforts rely primarily on clinician-prescribed gold standards despite the absence of a known optimal strategy, and predominantly use structured EHR data without extracting valuable insights from clinical notes, limiting their reliability for treatment recommendations. In this work, we introduce SAFER, a calibrated risk-aware tabular-language recommendation framework for DTR that integrates both structured EHR and clinical notes, enabling them to learn from each other, and addresses inherent label uncertainty by assuming an ambiguous optimal treatment solution for deceased patients. Moreover, SAFER employs conformal prediction to provide statistical guarantees, ensuring safe treatment recommendations while filtering out uncertain predictions. Experiments on two publicly available sepsis datasets demonstrate that SAFER outperforms state-of-the-art baselines across multiple recommendation metrics and counterfactual mortality rate, while offering robust formal assurances. These findings underscore SAFER's potential as a trustworthy and theoretically grounded solution for high-stakes DTR applications.
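The abstract says conformal prediction is used to filter out uncertain predictions, without giving details. As a rough illustration only, the sketch below shows generic split conformal prediction: a threshold is taken from calibration nonconformity scores, and candidates scoring above it are dropped. The function names, the score convention (lower means more conforming), and the dictionary input are assumptions, not the paper's design.

```python
import math

def conformal_threshold(cal_scores, alpha):
    """Split-conformal threshold at miscoverage level alpha.

    Uses the ceil((n + 1) * (1 - alpha))-th smallest calibration score.
    If that rank exceeds n, no finite threshold gives the guarantee,
    so infinity is returned (nothing is filtered).
    """
    n = len(cal_scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf
    return sorted(cal_scores)[rank - 1]

def filter_candidates(scores, threshold):
    """Keep candidates whose nonconformity score is within the threshold.

    scores maps a (hypothetical) candidate name to its nonconformity score.
    """
    return [name for name, s in scores.items() if s <= threshold]
```

With ten calibration scores 0.1, 0.2, ..., 1.0 and `alpha=0.2`, the rank is 9 and the threshold is 0.9; a candidate scored 0.95 would then be filtered out while one scored 0.3 would be kept.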

machine learning, real time system, reinforcement learning, (23 more...)

arXiv.org Machine Learning

2506.06649

Country:

North America > United States > Pennsylvania (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China > Hong Kong (0.04)
Asia > China > Guangdong Province > Guangzhou (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Health Care Technology > Medical Record (0.70)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(6 more...)

Safety Aware Task Planning via Large Language Models in Robotics

Khan, Azal Ahmad, Andrev, Michael, Murtaza, Muhammad Ali, Aguilera, Sergio, Zhang, Rui, Ding, Jie, Hutchinson, Seth, Anwar, Ali

arXiv.org Artificial Intelligence | Mar-19-2025

The integration of large language models (LLMs) into robotic task planning has unlocked better reasoning capabilities for complex, long-horizon workflows. However, ensuring safety in LLM-driven plans remains a critical challenge, as these models often prioritize task completion over risk mitigation. This paper introduces SAFER (Safety-Aware Framework for Execution in Robotics), a multi-LLM framework designed to embed safety awareness into robotic task planning. SAFER employs a Safety Agent that operates alongside the primary task planner, providing safety feedback. Additionally, we introduce LLM-as-a-Judge, a novel metric leveraging LLMs as evaluators to quantify safety violations within generated task plans. Our framework integrates safety feedback at multiple stages of execution, enabling real-time risk assessment, proactive error correction, and transparent safety evaluation. We also integrate a control framework using Control Barrier Functions (CBFs) to ensure safety guarantees within SAFER's task planning. We evaluated SAFER against state-of-the-art LLM planners on complex long-horizon tasks involving heterogeneous robotic agents, demonstrating its effectiveness in reducing safety violations while maintaining task efficiency. We also verify the task planner and safety planner through actual hardware experiments involving multiple robots and a human.
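The abstract mentions Control Barrier Functions (CBFs) but does not state the formulation used. For readers unfamiliar with the idea, here is a textbook one-dimensional sketch, not the paper's controller: for a single integrator x' = u with barrier h(x) = x_max - x, the CBF condition h' >= -gamma * h reduces to u <= gamma * (x_max - x), so the safety filter simply clips the nominal command. All names and parameter values are illustrative assumptions.

```python
def cbf_filter(u_nom, x, x_max, gamma=1.0):
    """Clip a nominal velocity command so that h(x) = x_max - x stays >= 0.

    For dynamics x' = u, the condition dh/dt >= -gamma * h(x) is
    -u >= -gamma * (x_max - x), i.e. u <= gamma * (x_max - x).
    """
    u_bound = gamma * (x_max - x)
    return min(u_nom, u_bound)

def simulate(x0, u_nom, x_max, gamma=1.0, dt=0.01, steps=1000):
    """Forward-Euler rollout of the filtered system (assumes gamma * dt <= 1)."""
    x = x0
    for _ in range(steps):
        x += dt * cbf_filter(u_nom, x, x_max, gamma)
    return x
```

Far from the boundary the nominal command passes through unchanged; close to it the command is scaled down, so a rollout such as `simulate(0.0, 1.0, 1.0, gamma=2.0)` approaches x_max without crossing it.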

constraint, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2503.15707

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Minnesota (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)

Genre: Workflow (1.00)

Industry: Information Technology (0.66)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

o3-mini vs DeepSeek-R1: Which One is Safer?

Arrieta, Aitor, Ugarte, Miriam, Valle, Pablo, Parejo, José Antonio, Segura, Sergio

arXiv.org Artificial Intelligence | Jan-31-2025

The emergence of DeepSeek-R1 marks a turning point for the AI industry in general and for LLMs in particular. Its capabilities have demonstrated outstanding performance in several tasks, including creative thinking, code generation, maths and automated program repair, at apparently lower execution cost. However, LLMs must adhere to an important qualitative property, i.e., their alignment with safety and human values. A clear competitor of DeepSeek-R1 is its American counterpart, OpenAI's o3-mini model, which is expected to set high standards in terms of performance, safety and cost. In this technical report, we systematically assess the safety level of both DeepSeek-R1 (70b version) and OpenAI's o3-mini (beta version). To this end, we make use of our recently released automated safety testing tool, named ASTRAL. By leveraging this tool, we automatically and systematically generated and executed 1,260 test inputs on both models. After conducting a semi-automated assessment of the outcomes provided by both LLMs, the results indicate that DeepSeek-R1 produces significantly more unsafe responses (12%) than OpenAI's o3-mini (1.2%).

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2501.18438

Country:

North America > United States (0.46)
Asia > China (0.05)
South America (0.04)
(6 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Law Enforcement & Public Safety (1.00)
Law (1.00)
Information Technology > Security & Privacy (1.00)
(8 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.67)

Few-shot Knowledge Graph Relational Reasoning via Subgraph Adaptation

Liu, Haochen, Wang, Song, Chen, Chen, Li, Jundong

arXiv.org Artificial Intelligence | Jun-19-2024

Few-shot Knowledge Graph (KG) Relational Reasoning aims to predict unseen triplets (i.e., query triplets) for rare relations in KGs, given only several triplets of these relations as references (i.e., support triplets). This task has gained significant traction due to the widespread use of knowledge graphs in various natural language processing applications. Previous approaches have utilized meta-training methods and manually constructed meta-relation sets to tackle this task. Recent efforts have focused on edge-mask-based methods, which exploit the structure of the contextualized graphs of target triplets (i.e., a subgraph containing relevant triplets in the KG). However, existing edge-mask-based methods extract insufficient information from the KG and are highly influenced by spurious information in the KG. To overcome these challenges, we propose SAFER (Subgraph Adaptation for Few-shot Relational Reasoning), a novel approach that effectively adapts the information in contextualized graphs to various subgraphs generated from support and query triplets to perform the prediction. Specifically, SAFER enables the extraction of more comprehensive information from support triplets while minimizing the impact of spurious information when predicting query triplets. Experimental results on three prevalent datasets demonstrate the superiority of our proposed framework SAFER.

graph, information, relation, (17 more...)

arXiv.org Artificial Intelligence

2406.15507

Country: North America > United States > Virginia (0.05)

Genre:

Research Report > Promising Solution (0.34)
Overview > Innovation (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.83)

SAFER: Safe Collision Avoidance using Focused and Efficient Trajectory Search with Reinforcement Learning

Srouji, Mario, Thomas, Hugues, Tsai, Hubert, Farhadi, Ali, Zhang, Jian

arXiv.org Artificial Intelligence | Jun-28-2023

Mobile robots are slowly but surely taking a place in our everyday lives and work environments with various applications: vacuum cleaning, video recording, companionship, security, tele-presence, etc. Whether they are autonomous agents or controlled by human operators, collision avoidance is key for mobile agents to operate safely, and effectively in the real world. There are numerous approaches to collision avoidance, including search-based planning methods, trajectory optimization, learning-based methods, and emergency intervention systems.

Our collision avoidance system SAFER takes input from lidar and ultrasonic sensor scans, wheel odometry for robot state, and the upstream control commands. We fuse the lidar and ultrasonic sensor scans to detect a diverse set of obstacles, including transparent glass, reflective surfaces, furniture, humans, etc. We design a reward function for our RL agent with two terms. The first term encourages the reduction of AEB activation. The second term improves collision avoidance metrics through a cost function, such as average speed, distance to obstacles, and matching human ...

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

2209.11789

Country: North America > United States > District of Columbia > Washington (0.04)

Genre: Research Report (0.82)

Industry: Transportation (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Why Halt AI Research When We Already Know How To Make It Safer

WIRED | Apr-4-2023, 13:00:00 GMT

Last week, the Future of Life Institute published an open letter proposing a six-month moratorium on the "dangerous" AI race. It has since been signed by over 3,000 people, including some influential members of the AI community. But while it is good that the risks of AI systems are gathering visibility within the community and across society, both the issues described and the actions proposed in the letter are unrealistic and unnecessary. The call for a pause on AI work is not only vague, but also unfeasible. While the training of large language models by for-profit companies gets most of the attention, it is far from the only type of AI work taking place.

ai model, ai system, halt ai research, (6 more...)

WIRED

AI-Alerts: 2023 > 2023-04 > AAAI AI-Alert for Apr 5, 2023 (1.00)

Country:

North America > United States (0.16)
North America > Canada (0.05)

Industry:

Government (0.50)
Law (0.31)

Technology:

Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.95)
