arXiv.org Artificial Intelligence
Towards Benchmarking and Assessing the Safety and Robustness of Autonomous Driving on Safety-critical Scenarios
Li, Jingzheng, Liu, Xianglong, Wei, Shikui, Chen, Zhijun, Li, Bing, Guo, Qing, Yang, Xianqi, Pu, Yanjun, Wang, Jiakai
Autonomous driving has made significant progress in both academia and industry, including performance improvements in perception tasks and the development of end-to-end autonomous driving systems. However, the safety and robustness assessment of autonomous driving has not received sufficient attention. Current evaluations of autonomous driving are typically conducted in natural driving scenarios, yet many accidents occur in edge cases, also known as safety-critical scenarios. These safety-critical scenarios are difficult to collect, and there is currently no clear definition of what constitutes a safety-critical scenario. In this work, we explore the safety and robustness of autonomous driving in safety-critical scenarios. First, we provide a definition of safety-critical scenarios, covering static traffic scenarios such as adversarial attack scenarios and natural distribution shifts, as well as dynamic traffic scenarios such as accident scenarios. We then develop an autonomous driving safety testing platform to comprehensively evaluate autonomous driving systems, encompassing not only the assessment of perception modules but also system-level evaluations. Our work systematically constructs a safety verification process for autonomous driving, providing technical support for the industry to establish standardized testing frameworks and reduce risks in real-world road deployment.
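The scenario taxonomy described in this abstract (static scenarios such as adversarial attacks and distribution shifts versus dynamic accident scenarios, evaluated at both the perception and system level) can be sketched as a simple data structure. The following is a minimal illustration only; all class names and the evaluation API are hypothetical placeholders, not the authors' platform.

```python
# Illustrative sketch of the scenario taxonomy and two-level evaluation.
# All names here are hypothetical; they only mirror the categories in the abstract.
from dataclasses import dataclass
from enum import Enum


class ScenarioKind(Enum):
    STATIC_ADVERSARIAL = "adversarial_attack"       # e.g. patch or perturbation attacks
    STATIC_DISTRIBUTION_SHIFT = "natural_shift"     # e.g. fog, rain, sensor noise
    DYNAMIC_ACCIDENT = "accident"                   # e.g. sudden cut-in, pedestrian dash


@dataclass
class SafetyCriticalScenario:
    kind: ScenarioKind
    description: str


def evaluate(system, scenarios):
    """Run a perception-level check and a system-level (closed-loop) check per scenario."""
    results = []
    for s in scenarios:
        perception_ok = system.perception_score(s) > 0.5   # hypothetical perception-module metric
        no_collision = system.closed_loop_rollout(s)        # hypothetical system-level rollout
        results.append((s.kind.value, perception_ok, no_collision))
    return results
```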
Controlled Latent Diffusion Models for 3D Porous Media Reconstruction
Naiff, Danilo, Schaeffer, Bernardo P., Pires, Gustavo, Stojkovic, Dragan, Rapstine, Thomas, Ramos, Fabio
Three-dimensional digital reconstruction of porous media presents a fundamental challenge in geoscience, requiring simultaneous resolution of fine-scale pore structures while capturing representative elementary volumes. We introduce a computational framework that addresses this challenge through latent diffusion models operating within the EDM framework. Our approach reduces dimensionality via a custom variational autoencoder trained on binary geological volumes, improving efficiency and enabling the generation of larger volumes than previously possible with diffusion models. A key innovation is our controlled unconditional sampling methodology, which enhances distribution coverage by first sampling target statistics from their empirical distributions, then generating samples conditioned on these values. Extensive testing on four distinct rock types demonstrates that conditioning on porosity - a readily computable statistic - is sufficient to ensure a consistent representation of multiple complex properties, including permeability, two-point correlation functions, and pore size distributions. The framework achieves better generation quality than pixel-space diffusion while enabling significantly larger volume reconstruction (256³ voxels) with substantially reduced computational requirements, establishing a new state-of-the-art for digital rock physics applications.
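The controlled unconditional sampling procedure described above (draw a target statistic from its empirical distribution, then condition the generator on it) is simple to outline. The sketch below is an assumption-laden illustration: `latent_diffusion_sample` and `decoder` stand in for the authors' trained networks and are not their actual API.

```python
# Minimal sketch of controlled unconditional sampling for porous-media generation.
# `latent_diffusion_sample` and `decoder` are hypothetical placeholders.
import numpy as np


def controlled_unconditional_sample(train_porosities, latent_diffusion_sample, decoder, n=8):
    samples = []
    for _ in range(n):
        # 1) sample the conditioning statistic from the training-set empirical distribution
        phi = np.random.choice(train_porosities)
        # 2) generate a latent volume conditioned on that porosity value
        z = latent_diffusion_sample(condition={"porosity": phi})
        # 3) decode to a binary pore/solid volume
        volume = decoder(z) > 0.5
        samples.append((phi, volume))
    return samples
```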
Parametric Shadow Control for Portrait Generation in Text-to-Image Diffusion Models
Cai, Haoming, Huang, Tsung-Wei, Gehlot, Shiv, Feng, Brandon Y., Shah, Sachin, Su, Guan-Ming, Metzler, Christopher
Text-to-image diffusion models excel at generating diverse portraits but lack intuitive shadow control. Existing editing approaches, applied as post-processing, struggle to offer effective manipulation across diverse styles. Additionally, these methods either rely on expensive real-world light-stage data collection or require extensive computational resources for training. To address these limitations, we introduce Shadow Director, a method that extracts and manipulates hidden shadow attributes within well-trained diffusion models. Our approach uses a small estimation network that requires only a few thousand synthetic images and hours of training, with no costly real-world light-stage data needed. Shadow Director enables parametric and intuitive control over shadow shape, placement, and intensity during portrait generation while preserving artistic integrity and identity across diverse styles. Despite training only on synthetic data built on real-world identities, it generalizes effectively to generated portraits with diverse styles, making it a more accessible and resource-friendly solution.
Quantum Complex-Valued Self-Attention Model
Chen, Fu, Zhao, Qinglin, Feng, Li, Tang, Longfei, Lin, Yangbin, Huang, Haitao
Self-attention has revolutionized classical machine learning, yet existing quantum self-attention models underutilize quantum states' potential due to oversimplified or incomplete mechanisms. To address this limitation, we introduce the Quantum Complex-Valued Self-Attention Model (QCSAM), the first framework to leverage complex-valued similarities, which captures amplitude and phase relationships between quantum states more comprehensively. To achieve this, QCSAM extends the Linear Combination of Unitaries (LCUs) into the Complex LCUs (CLCUs) framework, enabling precise complex-valued weighting of quantum states and supporting quantum multi-head attention. Experiments on MNIST and Fashion-MNIST show that QCSAM outperforms recent quantum self-attention models, including QKSAN, QSAN, and GQHAN. With only 4 qubits, QCSAM achieves 100% and 99.2% test accuracies on MNIST and Fashion-MNIST, respectively. Furthermore, we evaluate scalability across 3-8 qubits and 2-4 class tasks, while ablation studies validate the advantages of complex-valued attention weights over real-valued alternatives.
INTRODUCTION: The self-attention mechanism, as a key component of deep learning architectures, has significantly impacted the ways in which data is processed and features are learned [1]-[3]. By generating adaptive attention weights, self-attention not only highlights key features in the data but also integrates global contextual information, thereby improving the expressive power and computational efficiency of deep learning systems. For instance, in natural language processing [4]-[6], self-attention has enhanced language understanding and generation by capturing long-range dependencies and contextual information; in computer vision [7]-[9], it allows models to focus on key regions within images to optimize feature extraction; and in recommender systems [10], [11], it improves the accuracy of capturing user behavior and preferences, thereby enhancing the effectiveness of personalized recommendations. Large-scale models such as GPT-4 [12] have further exploited the potential of self-attention, allowing them to address multimodal tasks such as visual question answering, image captioning, and cross-modal reasoning. These developments demonstrate that the self-attention mechanism is a fundamental mechanism.
Corresponding author: Qinglin Zhao (e-mail: qlzhao@must.edu.mo). Fu Chen, Qinglin Zhao, Li Feng, and Haitao Huang are with the Faculty of Innovation Engineering, Macau University of Science and Technology, 999078, China.
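The core claim of this abstract is that complex-valued similarities carry both amplitude and phase information, which real-valued attention weights discard. The snippet below is a purely classical numpy illustration of that idea, not the QCSAM quantum circuit (LCU/CLCU construction); all quantities are toy stand-ins.

```python
# Classical illustration: inner products of unit-norm complex state vectors carry
# amplitude and phase, whereas a real-valued model only sees the real part.
import numpy as np

rng = np.random.default_rng(0)
n, d = 4, 8                                              # 4 "tokens", 8-dim complex states

states = rng.normal(size=(n, d)) + 1j * rng.normal(size=(n, d))
states /= np.linalg.norm(states, axis=1, keepdims=True)  # unit norm, like quantum states

sim = states @ states.conj().T                           # complex similarities <psi_i|psi_j>
amplitude = np.abs(sim)                                  # magnitude of overlap
phase = np.angle(sim)                                    # relative phase, lost by real weights

# One possible real attention map derived from the complex similarities.
attn = np.exp(amplitude) / np.exp(amplitude).sum(axis=1, keepdims=True)
print(attn.round(3))
print(phase.round(3))
```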
Explainable ICD Coding via Entity Linking
Barreiros, Leonor, Coutinho, Isabel, Correia, Gonçalo M., Martins, Bruno
Clinical coding is a critical task in healthcare, although traditional methods for automating clinical coding may not provide sufficient explicit evidence for coders in production environments. This evidence is crucial, as medical coders have to make sure there exists at least one explicit passage in the input health record that justifies the attribution of a code. We therefore propose to reframe the task as an entity linking problem, in which each document is annotated with its set of codes and respective textual evidence, enabling better human-machine collaboration. By leveraging parameter-efficient fine-tuning of Large Language Models (LLMs), together with constrained decoding, we introduce three approaches to solve this problem that prove effective at disambiguating clinical mentions and that perform well in few-shot scenarios.
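The abstract mentions constrained decoding, i.e. restricting generation so the model can only emit valid codes. The sketch below shows one generic way to do this over a closed code set; the toy code list, the greedy loop, and the `score_fn` stand-in for an LLM's next-token scores are all assumptions for illustration, not the paper's actual models.

```python
# Hedged sketch of prefix-constrained decoding over a closed set of ICD codes.
VALID_CODES = {"J96.0", "J96.9", "I50.9"}   # toy subset, illustration only


def allowed_next_chars(prefix):
    return {code[len(prefix)] for code in VALID_CODES
            if code.startswith(prefix) and len(code) > len(prefix)}


def constrained_decode(score_fn, max_len=10):
    out = ""
    while len(out) < max_len:
        allowed = allowed_next_chars(out)
        if not allowed:                      # a complete valid code was produced
            break
        out += max(allowed, key=lambda ch: score_fn(out, ch))
    return out


# usage with a dummy scorer that prefers '9', then lexicographic order
print(constrained_decode(lambda prefix, ch: (ch == "9", ch)))
```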
Multi-agent Application System in Office Collaboration Scenarios
Sun, Songtao, Li, Jingyi, Dong, Yuanfei, Liu, Haoguang, Xu, Chenxin, Li, Fuyang, Liu, Qiang
This paper introduces a multi-agent application system designed to enhance office collaboration efficiency and work quality. The system integrates artificial intelligence, machine learning, and natural language processing technologies, achieving functionalities such as task allocation, progress monitoring, and information sharing. The agents within the system are capable of providing personalized collaboration support based on team members' needs and incorporate data analysis tools to improve decision-making quality. The paper also proposes an intelligent agent architecture that separates Plan and Solver, and through techniques such as multi-turn query rewriting and business tool retrieval, it enhances the agent's multi-intent and multi-turn dialogue capabilities. Furthermore, the paper details the design of tools and multi-turn dialogue in the context of office collaboration scenarios, and validates the system's effectiveness through experiments and evaluations. Ultimately, the system has demonstrated outstanding performance in real business applications, particularly in query understanding, task planning, and tool calling. Looking forward, the system is expected to play a more significant role in addressing complex interaction issues within dynamic environments and large-scale multi-agent systems.
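The Plan/Solver separation described in this abstract can be outlined as two cooperating stages: a planner that decomposes the (possibly rewritten) query into steps, and a solver that executes each step via business tool retrieval. The sketch below is only an illustration of that pattern; every function and the `tool_registry` interface are hypothetical placeholders, not the paper's system.

```python
# Illustrative Plan/Solver sketch; all APIs are hypothetical.
def plan(query, llm):
    """Ask a planner LLM for an ordered list of sub-tasks."""
    return llm(f"Decompose the request into numbered steps: {query}").splitlines()


def solve(steps, llm, tool_registry):
    """Execute each planned step by retrieving and calling a business tool."""
    results = []
    for step in steps:
        tool = tool_registry.retrieve(step)                           # business tool retrieval
        args = llm(f"Extract arguments for '{tool.name}' from: {step}")
        results.append(tool(args))
    return llm("Summarize the results: " + "; ".join(map(str, results)))
```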
PHEONA: An Evaluation Framework for Large Language Model-based Approaches to Computational Phenotyping
Pungitore, Sarah, Yadav, Shashank, Subbian, Vignesh
Computational phenotyping is essential for biomedical research but often requires significant time and resources, especially since traditional methods typically involve extensive manual data review. While machine learning and natural language processing advancements have helped, further improvements are needed. Few studies have explored using Large Language Models (LLMs) for these tasks despite known advantages of LLMs for text-based tasks. To facilitate further research in this area, we developed an evaluation framework, Evaluation of PHEnotyping for Observational Health Data (PHEONA), that outlines context-specific considerations. We applied and demonstrated PHEONA on concept classification, a specific task within a broader phenotyping process for Acute Respiratory Failure (ARF) respiratory support therapies. From the sample concepts tested, we achieved high classification accuracy, suggesting the potential for LLM-based methods to improve computational phenotyping processes.
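Concept classification with an LLM, as used in the demonstration above, typically reduces to prompting the model to assign each clinical concept to one of a fixed set of categories. The sketch below is a generic illustration under assumed names: `call_llm` and the label set are placeholders, and PHEONA itself defines evaluation considerations rather than this exact prompt.

```python
# Hedged sketch of LLM-based concept classification for ARF respiratory support therapies.
LABELS = ["invasive ventilation", "non-invasive ventilation", "high-flow oxygen", "none"]


def classify_concept(concept_text, call_llm):
    prompt = (
        "Classify the clinical concept below into exactly one of: "
        + ", ".join(LABELS) + ".\n"
        f"Concept: {concept_text}\nAnswer:"
    )
    answer = call_llm(prompt).strip().lower()
    # fall back to "none" if the model's answer is not one of the allowed labels
    return answer if answer in LABELS else "none"
```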
Real-Time Evaluation Models for RAG: Who Detects Hallucinations Best?
This article surveys evaluation models that automatically detect hallucinations in Retrieval-Augmented Generation (RAG), and presents a comprehensive benchmark of their performance across six RAG applications. The methods included in our study are LLM-as-a-Judge, Prometheus, Lynx, the Hughes Hallucination Evaluation Model (HHEM), and the Trustworthy Language Model (TLM). These approaches are all reference-free, requiring no ground-truth answers/labels to catch incorrect LLM responses. Our study reveals that, across diverse RAG applications, some of these approaches consistently detect incorrect RAG responses with high precision/recall.
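Reference-free evaluation, as benchmarked here, means the judge sees only the retrieved context, the question, and the generated answer, with no gold label. The sketch below shows the generic LLM-as-a-Judge shape of such a check; `call_llm` and the prompt are assumptions, and the surveyed systems (Prometheus, Lynx, HHEM, TLM) each implement their own scoring.

```python
# Minimal reference-free hallucination check in the LLM-as-a-Judge style.
def judge_rag_response(question, context, answer, call_llm):
    prompt = (
        "You are checking a RAG answer for hallucinations.\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer: {answer}\n\n"
        "Reply with a number between 0 (unsupported) and 1 (fully supported)."
    )
    try:
        return float(call_llm(prompt).strip())
    except ValueError:
        return 0.0   # treat unparseable judgments as flagged
```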
Model Context Protocol (MCP): Landscape, Security Threats, and Future Research Directions
Hou, Xinyi, Zhao, Yanjie, Wang, Shenao, Wang, Haoyu
The Model Context Protocol (MCP) is a standardized interface designed to enable seamless interaction between AI models and external tools and resources, breaking down data silos and facilitating interoperability across diverse systems. This paper provides a comprehensive overview of MCP, focusing on its core components, workflow, and the lifecycle of MCP servers, which consists of three key phases: creation, operation, and update. We analyze the security and privacy risks associated with each phase and propose strategies to mitigate potential threats. The paper also examines the current MCP landscape, including its adoption by industry leaders and various use cases, as well as the tools and platforms supporting its integration. We explore future directions for MCP, highlighting the challenges and opportunities that will influence its adoption and evolution within the broader AI ecosystem. Finally, we offer recommendations for MCP stakeholders to ensure its secure and sustainable development as the AI landscape continues to evolve.
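At the wire level, MCP messages are JSON-RPC 2.0 requests exchanged between a client and an MCP server. The sketch below illustrates what listing and invoking a tool can look like; the method and parameter names follow the public specification as we understand it, but the payload and the example tool name should be treated as illustrative rather than normative.

```python
# Illustrative MCP-style JSON-RPC 2.0 requests (tool discovery and tool invocation).
import json

list_tools_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/list",
}

call_tool_request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
        "name": "search_files",                      # example tool exposed by some MCP server
        "arguments": {"query": "quarterly report"},  # tool-specific arguments
    },
}

print(json.dumps(call_tool_request, indent=2))
```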
A Multi-Agent Framework Integrating Large Language Models and Generative AI for Accelerated Metamaterial Design
Tian, Jie, Sobczak, Martin Taylor, Patil, Dhanush, Hou, Jixin, Pang, Lin, Ramanathan, Arunachalam, Yang, Libin, Chen, Xianyan, Golan, Yuval, Zhai, Xiaoming, Sun, Hongyue, Song, Kenan, Wang, Xianqiao
Metamaterials, renowned for their exceptional mechanical, electromagnetic, and thermal properties, hold transformative potential across diverse applications, yet their design remains constrained by labor-intensive trial-and-error methods and limited data interoperability. Here, we introduce CrossMatAgent -- a novel multi-agent framework that synergistically integrates large language models with state-of-the-art generative AI to revolutionize metamaterial design. By orchestrating a hierarchical team of agents -- each specializing in tasks such as pattern analysis, architectural synthesis, prompt engineering, and supervisory feedback -- our system leverages the multimodal reasoning of GPT-4o alongside the generative precision of DALL-E 3 and a fine-tuned Stable Diffusion Extra Large (XL) model. This integrated approach automates data augmentation, enhances design fidelity, and produces simulation- and 3D printing-ready metamaterial patterns. Comprehensive evaluations, including Contrastive Language-Image Pre-training (CLIP)-based alignment, SHAP (SHapley Additive exPlanations) interpretability analyses, and mechanical simulations under varied load conditions, demonstrate the framework's ability to generate diverse, reproducible, and application-ready designs. CrossMatAgent thus establishes a scalable, AI-driven paradigm that bridges the gap between conceptual innovation and practical realization, paving the way for accelerated metamaterial development.
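The hierarchical agent team described above (pattern analysis, prompt engineering, generation, supervisory feedback) can be outlined as a simple orchestration loop. The sketch below is a hypothetical illustration only; every function is a placeholder, whereas the actual CrossMatAgent pipeline wires GPT-4o, DALL-E 3, and a fine-tuned SDXL model.

```python
# Hedged sketch of a hierarchical multi-agent design loop; all agents are placeholders.
def design_loop(seed_pattern, analyst, prompt_engineer, generator, supervisor, max_rounds=3):
    design = None
    for _ in range(max_rounds):
        analysis = analyst(seed_pattern, previous=design)   # pattern-analysis agent
        prompt = prompt_engineer(analysis)                   # prompt-engineering agent
        design = generator(prompt)                           # generative model (e.g. an SDXL-like model)
        feedback = supervisor(design, analysis)              # supervisory-feedback agent
        if feedback.get("ready_for_simulation"):             # stop once the supervisor approves
            break
    return design
```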