South America
Navigating the Edge with the State-of-the-Art Insights into Corner Case Identification and Generation for Enhanced Autonomous Vehicle Safety
Shimanuki, Gabriel Kenji Godoy, Nascimento, Alexandre Moreira, Vismari, Lucio Flavio, Junior, Joao Batista Camargo, Junior, Jorge Rady de Almeida, Cugnasca, Paulo Sergio
In recent years, there has been significant development of autonomous vehicle (AV) technologies. However, despite the notable achievements of some industry players, a strong and appealing body of evidence that demonstrate AVs are actually safe is lacky, which could foster public distrust in this technology and further compromise the entire development of this industry, as well as related social impacts. To improve the safety of AVs, several techniques are proposed that use synthetic data in virtual simulation. In particular, the highest risk data, known as corner cases (CCs), are the most valuable for developing and testing AV controls, as they can expose and improve the weaknesses of these autonomous systems. In this context, the present paper presents a systematic literature review aiming to comprehensively analyze methodologies for CC identifi cation and generation, also pointing out current gaps and further implications of synthetic data for AV safety and reliability. Based on a selection criteria, 110 studies were picked from an initial sample of 1673 papers. These selected paper were mapped into multiple categories to answer eight inter-linked research questions. It concludes with the recommendation of a more integrated approach focused on safe development among all stakeholders, with active collaboration between industry, academia and regulatory bodies.
Societal Alignment Frameworks Can Improve LLM Alignment
Stańczak, Karolina, Meade, Nicholas, Bhatia, Mehar, Zhou, Hattie, Böttinger, Konstantin, Barnes, Jeremy, Stanley, Jason, Montgomery, Jessica, Zemel, Richard, Papernot, Nicolas, Chapados, Nicolas, Therien, Denis, Lillicrap, Timothy P., Marasović, Ana, Delacroix, Sylvie, Hadfield, Gillian K., Reddy, Siva
Recent progress in large language models (LLMs) has focused on producing responses that meet human expectations and align with shared values - a process coined alignment. However, aligning LLMs remains challenging due to the inherent disconnect between the complexity of human values and the narrow nature of the technological approaches designed to address them. Current alignment methods often lead to misspecified objectives, reflecting the broader issue of incomplete contracts, the impracticality of specifying a contract between a model developer, and the model that accounts for every scenario in LLM alignment. In this paper, we argue that improving LLM alignment requires incorporating insights from societal alignment frameworks, including social, economic, and contractual alignment, and discuss potential solutions drawn from these domains. Given the role of uncertainty within societal alignment frameworks, we then investigate how it manifests in LLM alignment. We end our discussion by offering an alternative view on LLM alignment, framing the underspecified nature of its objectives as an opportunity rather than perfect their specification. Beyond technical improvements in LLM alignment, we discuss the need for participatory alignment interface designs.
Towards Zero Touch Networks: Cross-Layer Automated Security Solutions for 6G Wireless Networks
Yang, Li, Naser, Shimaa, Shami, Abdallah, Muhaidat, Sami, Ong, Lyndon, Debbah, Mérouane
The transition from 5G to 6G mobile networks necessitates network automation to meet the escalating demands for high data rates, ultra-low latency, and integrated technology. Recently, Zero-Touch Networks (ZTNs), driven by Artificial Intelligence (AI) and Machine Learning (ML), are designed to automate the entire lifecycle of network operations with minimal human intervention, presenting a promising solution for enhancing automation in 5G/6G networks. However, the implementation of ZTNs brings forth the need for autonomous and robust cybersecurity solutions, as ZTNs rely heavily on automation. AI/ML algorithms are widely used to develop cybersecurity mechanisms, but require substantial specialized expertise and encounter model drift issues, posing significant challenges in developing autonomous cybersecurity measures. Therefore, this paper proposes an automated security framework targeting Physical Layer Authentication (PLA) and Cross-Layer Intrusion Detection Systems (CLIDS) to address security concerns at multiple Internet protocol layers. The proposed framework employs drift-adaptive online learning techniques and a novel enhanced Successive Halving (SH)-based Automated ML (AutoML) method to automatically generate optimized ML models for dynamic networking environments. Experimental results illustrate that the proposed framework achieves high performance on the public Radio Frequency (RF) fingerprinting and the Canadian Institute for CICIDS2017 datasets, showcasing its effectiveness in addressing PLA and CLIDS tasks within dynamic and complex networking environments. Furthermore, the paper explores open challenges and research directions in the 5G/6G cybersecurity domain. This framework represents a significant advancement towards fully autonomous and secure 6G networks, paving the way for future innovations in network automation and cybersecurity.
Few-Shot, No Problem: Descriptive Continual Relation Extraction
Thanh, Nguyen Xuan, Le, Anh Duc, Tran, Quyen, Le, Thanh-Thien, Van, Linh Ngo, Nguyen, Thien Huu
Few-shot Continual Relation Extraction is a crucial challenge for enabling AI systems to identify and adapt to evolving relationships in dynamic real-world domains. Traditional memory-based approaches often overfit to limited samples, failing to reinforce old knowledge, with the scarcity of data in few-shot scenarios further exacerbating these issues by hindering effective data augmentation in the latent space. In this paper, we propose a novel retrieval-based solution, starting with a large language model to generate descriptions for each relation. From these descriptions, we introduce a bi-encoder retrieval training paradigm to enrich both sample and class representation learning. Leveraging these enhanced representations, we design a retrieval-based prediction method where each sample "retrieves" the best fitting relation via a reciprocal rank fusion score that integrates both relation description vectors and class prototypes. Extensive experiments on multiple datasets demonstrate that our method significantly advances the state-of-the-art by maintaining robust performance across sequential tasks, effectively addressing catastrophic forgetting.
Visual Reasoning at Urban Intersections: FineTuning GPT-4o for Traffic Conflict Detection
Masri, Sari, Ashqar, Huthaifa I., Elhenawy, Mohammed
-- Traffic control in unsignalized urban intersections presents significant challenges due to the complexity, frequent conflicts, and blind spots. This study explores the capability of leveraging Multimodal L arge L anguage M odel s (MLLMs), such as GPT - 4o, to provide logical and visual reasoning by directly using birds - eye - view videos of four - legged intersections. In this proposed method, GPT - 4o act s as intelligent system to detect conflicts and provide explanations and recommendations for the drivers . The fine - tuned model achieved an accuracy of 77.14%, while the manual evaluation of the true predicted values of the fine - tuned GPT - 4o showed significant achievements of 89.9% accuracy for model - generated explanations and 92.3% for the recommended next a ctions. Urban intersections are highly challenging due to their unpredictability and dynamism, especially in cases of unsignalized intersections. Interactions often occur among motor vehicles and other road users in such areas.
HazardNet: A Small-Scale Vision Language Model for Real-Time Traffic Safety Detection at Edge Devices
Tami, Mohammad Abu, Elhenawy, Mohammed, Ashqar, Huthaifa I.
Traffic safety remains a vital concern in contemporary urban settings, intensified by the increase of vehicles and the complicated nature of road networks. Traditional safety-critical event detection systems predominantly rely on sensor-based approaches and conventional machine learning algorithms, necessitating extensive data collection and complex training processes to adhere to traffic safety regulations. This paper introduces HazardNet, a small-scale Vision Language Model designed to enhance traffic safety by leveraging the reasoning capabilities of advanced language and vision models. We built HazardNet by fine-tuning the pre-trained Qwen2-VL-2B model, chosen for its superior performance among open-source alternatives and its compact size of two billion parameters. This helps to facilitate deployment on edge devices with efficient inference throughput. In addition, we present HazardQA, a novel Vision Question Answering (VQA) dataset constructed specifically for training HazardNet on real-world scenarios involving safety-critical events. Our experimental results show that the fine-tuned HazardNet outperformed the base model up to an 89% improvement in F1-Score and has comparable results with improvement in some cases reach up to 6% when compared to larger models, such as GPT-4o. These advancements underscore the potential of HazardNet in providing real-time, reliable traffic safety event detection, thereby contributing to reduced accidents and improved traffic management in urban environments. Both HazardNet model and the HazardQA dataset are available at https://huggingface.co/Tami3/HazardNet and https://huggingface.co/datasets/Tami3/HazardQA, respectively.
A Thousand Words or An Image: Studying the Influence of Persona Modality in Multimodal LLMs
Broomfield, Julius, Sharma, Kartik, Kumar, Srijan
Large language models (LLMs) have recently demonstrated remarkable advancements in embodying diverse personas, enhancing their effectiveness as conversational agents and virtual assistants. Consequently, LLMs have made significant strides in processing and integrating multimodal information. However, even though human personas can be expressed in both text and image, the extent to which the modality of a persona impacts the embodiment by the LLM remains largely unexplored. In this paper, we investigate how do different modalities influence the expressiveness of personas in multimodal LLMs. To this end, we create a novel modality-parallel dataset of 40 diverse personas varying in age, gender, occupation, and location. This consists of four modalities to equivalently represent a persona: image-only, text-only, a combination of image and small text, and typographical images, where text is visually stylized to convey persona-related attributes. We then create a systematic evaluation framework with 60 questions and corresponding metrics to assess how well LLMs embody each persona across its attributes and scenarios. Comprehensive experiments on $5$ multimodal LLMs show that personas represented by detailed text show more linguistic habits, while typographical images often show more consistency with the persona. Our results reveal that LLMs often overlook persona-specific details conveyed through images, highlighting underlying limitations and paving the way for future research to bridge this gap. We release the data and code at https://github.com/claws-lab/persona-modality .
Why Are Web AI Agents More Vulnerable Than Standalone LLMs? A Security Analysis
Chiang, Jeffrey Yang Fan, Lee, Seungjae, Huang, Jia-Bin, Huang, Furong, Chen, Yizheng
Recent advancements in Web AI agents have demonstrated remarkable capabilities in addressing complex web navigation tasks. However, emerging research shows that these agents exhibit greater vulnerability compared to standalone Large Language Models (LLMs), despite both being built upon the same safety-aligned models. This discrepancy is particularly concerning given the greater flexibility of Web AI Agent compared to standalone LLMs, which may expose them to a wider range of adversarial user inputs. To build a scaffold that addresses these concerns, this study investigates the underlying factors that contribute to the increased vulnerability of Web AI agents. Notably, this disparity stems from the multifaceted differences between Web AI agents and standalone LLMs, as well as the complex signals - nuances that simple evaluation metrics, such as success rate, often fail to capture. To tackle these challenges, we propose a component-level analysis and a more granular, systematic evaluation framework. Through this fine-grained investigation, we identify three critical factors that amplify the vulnerability of Web AI agents; (1) embedding user goals into the system prompt, (2) multi-step action generation, and (3) observational capabilities. Our findings highlights the pressing need to enhance security and robustness in AI agent design and provide actionable insights for targeted defense strategies.
Bridging Legal Knowledge and AI: Retrieval-Augmented Generation with Vector Stores, Knowledge Graphs, and Hierarchical Non-negative Matrix Factorization
Barron, Ryan C., Eren, Maksim E., Serafimova, Olga M., Matuszek, Cynthia, Alexandrov, Boian S.
Agentic Generative AI, powered by Large Language Models (LLMs) with Retrieval-Augmented Generation (RAG), Knowledge Graphs (KGs), and Vector Stores (VSs), represents a transformative technology applicable to specialized domains such as legal systems, research, recommender systems, cybersecurity, and global security, including proliferation research. This technology excels at inferring relationships within vast unstructured or semi-structured datasets. The legal domain here comprises complex data characterized by extensive, interrelated, and semi-structured knowledge systems with complex relations. It comprises constitutions, statutes, regulations, and case law. Extracting insights and navigating the intricate networks of legal documents and their relations is crucial for effective legal research. Here, we introduce a generative AI system that integrates RAG, VS, and KG, constructed via Non-Negative Matrix Factorization (NMF), to enhance legal information retrieval and AI reasoning and minimize hallucinations. In the legal system, these technologies empower AI agents to identify and analyze complex connections among cases, statutes, and legal precedents, uncovering hidden relationships and predicting legal trends-challenging tasks that are essential for ensuring justice and improving operational efficiency. Our system employs web scraping techniques to systematically collect legal texts, such as statutes, constitutional provisions, and case law, from publicly accessible platforms like Justia. It bridges the gap between traditional keyword-based searches and contextual understanding by leveraging advanced semantic representations, hierarchical relationships, and latent topic discovery. This framework supports legal document clustering, summarization, and cross-referencing, for scalable, interpretable, and accurate retrieval for semi-structured data while advancing computational law and AI.
Selective Use of Yannakakis' Algorithm to Improve Query Performance: Machine Learning to the Rescue
Böhm, Daniela, Gottlob, Georg, Lanzinger, Matthias, Longo, Davide, Okulmus, Cem, Pichler, Reinhard, Selzer, Alexander
Query optimization has played a central role in database research for decades. However, more often than not, the proposed optimization techniques lead to a performance improvement in some, but not in all, situations. Therefore, we urgently need a methodology for designing a decision procedure that decides for a given query whether the optimization technique should be applied or not. In this work, we propose such a methodology with a focus on Yannakakis-style query evaluation as our optimization technique of interest. More specifically, we formulate this decision problem as an algorithm selection problem and we present a Machine Learning based approach for its solution. Empirical results with several benchmarks on a variety of database systems show that our approach indeed leads to a statistically significant performance improvement.