standard mode
Ethics-Aware Safe Reinforcement Learning for Rare-Event Risk Control in Interactive Urban Driving
Autonomous vehicles hold great promise for reducing traffic fatalities and improving transportation efficiency, yet their widespread adoption hinges on embedding credible and transparent ethical reasoning into routine and emergency maneuvers, particularly to protect vulnerable road users (VRUs) such as pedestrians and cyclists. Here, we present a hierarchical Safe Reinforcement Learning (Safe RL) framework that augments standard driving objectives with ethics-aware cost signals. At the decision level, a Safe RL agent is trained using a composite ethical risk cost, combining collision probability and harm severity, to generate high-level motion targets. A dynamic, risk-sensitive Prioritized Experience Replay mechanism amplifies learning from rare but critical, high-risk events. At the execution level, polynomial path planning coupled with Proportional-Integral-Derivative (PID) and Stanley controllers translates these targets into smooth, feasible trajectories, ensuring both accuracy and comfort. We train and validate our approach on closed-loop simulation environments derived from large-scale, real-world traffic datasets encompassing diverse vehicles, cyclists, and pedestrians, and demonstrate that it outperforms baseline methods in reducing risk to others while maintaining ego performance and comfort. This work provides a reproducible benchmark for Safe RL with explicitly ethics-aware objectives in human-mixed traffic scenarios. Our results highlight the potential of combining formal control theory and data-driven learning to advance ethically accountable autonomy that explicitly protects those most at risk in urban traffic environments. Across two interactive benchmarks and five random seeds, our policy decreases conflict frequency by 25-45% compared to matched task successes while maintaining comfort metrics within 5%.
- Europe > Germany > Saxony > Leipzig (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Germany > Saxony > Dresden (0.04)
- Transportation > Ground > Road (1.00)
- Information Technology (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
MCPVerse: An Expansive, Real-World Benchmark for Agentic Tool Use
Lei, Fei, Yang, Yibo, Sun, Wenxiu, Lin, Dahua
Large Language Models (LLMs) are evolving from text generators into reasoning agents. This transition makes their ability to use external tools a critical capability. However, evaluating this skill presents a significant challenge. Existing benchmarks are often limited by their reliance on synthetic tools and severely constrained action spaces. To address these limitations, we introduce MCPV erse, an expansive, real-world benchmark for evaluating agentic tool use. MCPV erse integrates more than 550 real-world, executable tools to create an unprecedented action space exceeding 147k tokens, and employs outcome-based evaluation with real-time ground truth for time-sensitive tasks. We benchmarked the state-of-the-art LLMs across three modes (Oracle, Standard, and Max-Scale), revealing that while most models suffer performance degradation when confronted with larger tool sets, the agentic models, such as Claude-4-Sonnet, can effectively leverage expanded tool spaces to improve accuracy. This finding not only exposes the limitations of state-of-the-art models in complex, real-world scenarios but also establishes MCPV erse as a critical benchmark for measuring and advancing agentic tool use capabilities. The ability of Large Language Models (LLMs) to interact with external tools, typically through function calling, is fundamental to their application in real-world scenarios. This capability allows them to access live data, execute code, and operate other systems, thereby moving beyond their static knowledge. Despite its importance, the evaluation of tool use is hampered by two shortcomings in current benchmarks.
- Europe > Austria > Vienna (0.14)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
Clinical Semantic Intelligence (CSI): Emulating the Cognitive Framework of the Expert Clinician for Comprehensive Oral Disease Diagnosis
Mashayekhi, Mohammad, Majd, Sara Ahmadi, AmirAmjadi, Arian, Hosseini, Parsa
The diagnosis of oral diseases presents a problematic clinical challenge, characterized by a wide spectrum of pathologies with overlapping symptomatology. To address this, we developed Clinical Semantic Intelligence (CSI), a novel artificial intelligence framework that diagnoses 118 different oral diseases by computationally modeling the cognitive processes of an expert clinician. Our core hypothesis is that moving beyond simple pattern matching to emulate expert reasoning is critical to building clinically useful diagnostic aids. CSI's architecture integrates a fine-tuned multimodal CLIP model with a specialized ChatGLM-6B language model. This system executes a Hierarchical Diagnostic Reasoning Tree (HDRT), a structured framework that distills the systematic, multi-step logic of differential diagnosis. The framework operates in two modes: a Fast Mode for rapid screening and a Standard Mode that leverages the full HDRT for an interactive and in-depth diagnostic workup. To train and validate our system, we curated a primary dataset of 4,310 images, supplemented by an external hold-out set of 176 images for final validation. A clinically-informed augmentation strategy expanded our training data to over 30,000 image-text pairs. On a 431-image internal test set, CSI's Fast Mode achieved an accuracy of 73.4%, which increased to 89.5% with the HDRT-driven Standard Mode. The performance gain is directly attributable to the hierarchical reasoning process. Herein, we detail the architectural philosophy, development, and rigorous evaluation of the CSI framework.
- Asia > Middle East > Iran > Tehran Province > Tehran (0.05)
- North America > United States > Oklahoma > Payne County > Cushing (0.04)
- Europe > Germany (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.88)
Scaling Laws for Linear Complexity Language Models
Shen, Xuyang, Li, Dong, Leng, Ruitao, Qin, Zhen, Sun, Weigao, Zhong, Yiran
The interest in linear complexity models for large language models is on the rise, although their scaling capacity remains uncertain. In this study, we present the scaling laws for linear complexity language models to establish a foundation for their scalability. Specifically, we examine the scaling behaviors of three efficient linear architectures. These include TNL, a linear attention model with data-independent decay; HGRN2, a linear RNN with data-dependent decay; and cosFormer2, a linear attention model without decay. We also include LLaMA as a baseline architecture for softmax attention for comparison. These models were trained with six variants, ranging from 70M to 7B parameters on a 300B-token corpus, and evaluated with a total of 1,376 intermediate checkpoints on various downstream tasks. These tasks include validation loss, commonsense reasoning, and information retrieval and generation. The study reveals that existing linear complexity language models exhibit similar scaling capabilities as conventional transformer-based models while also demonstrating superior linguistic proficiency and knowledge retention.
- North America > United States (0.14)
- Asia > Middle East > Jordan (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.87)
Emotive Non-Anthropomorphic Robots Perceived as More Calming, Friendly, and Attentive for Victim Management
Bethel, Cindy L. (Yale University) | Murphy, Robin R. (Texas A and M University)
This paper describes results from a large-scale, complex human study using non-facial and non-verbal affect for victim management in robot-assisted Urban Search and Rescue Applications. Statistically significant results are presented that indicate participants felt emotive robots were more calming, friendlier, and attentive.
- North America > United States > Texas > Brazos County > College Station (0.15)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.05)
- North America > United States > Connecticut > New Haven County > New Haven (0.05)
- Europe > United Kingdom > England > Hertfordshire > Hatfield (0.05)