Goto

Collaborating Authors

 Overview


Large Language Models as Event Forecasters

arXiv.org Artificial Intelligence

Key elements of human events are extracted as quadruples that consist of subject, relation, object, and timestamp. This representation can be extended to a quintuple by adding a fifth element: a textual summary that briefly describes the event. These quadruples or quintuples, when organized within a specific domain, form a temporal knowledge graph (TKG). Current learning frameworks focus on a few TKG-related tasks, such as predicting an object given a subject and a relation or forecasting the occurrences of multiple types of events (i.e., relation) in the next time window. They typically rely on complex structural and sequential models like graph neural networks (GNNs) and recurrent neural networks (RNNs) to update intermediate embeddings. However, these methods often neglect the contextual information inherent in each quintuple, which can be effectively captured through concise textual descriptions. In this paper, we investigate how large language models (LLMs) can streamline the design of TKG learning frameworks while maintaining competitive accuracy in prediction and forecasting tasks. We develop multiple prompt templates to frame the object prediction (OP) task as a standard question-answering (QA) task, suitable for instruction fine-tuning with an encoder-decoder generative LLM. For multi-event forecasting (MEF), we design simple yet effective prompt templates for each TKG quintuple. This novel approach removes the need for GNNs and RNNs, instead utilizing an encoder-only LLM to generate fixed intermediate embeddings, which are subsequently processed by a prediction head with a self-attention mechanism to forecast potential future relations. Extensive experiments on multiple real-world datasets using various evaluation metrics validate the effectiveness and robustness of our approach.


Can We Catch the Elephant? A Survey of the Evolvement of Hallucination Evaluation on Natural Language Generation

arXiv.org Artificial Intelligence

Hallucination in Natural Language Generation (NLG) is like the elephant in the room, obvious but often overlooked until recent achievements significantly improved the fluency and grammaticality of generated text. As the capabilities of text generation models have improved, researchers have begun to pay more attention to the phenomenon of hallucination. Despite significant progress in this field in recent years, the evaluation system for hallucination is complex and diverse, lacking clear organization. We are the first to comprehensively survey how various evaluation methods have evolved with the development of text generation models from three dimensions, including hallucinated fact granularity, evaluator design principles, and assessment facets. This survey aims to help researchers identify current limitations in hallucination evaluation and highlight future research directions.


MLKV: Multi-Layer Key-Value Heads for Memory Efficient Transformer Decoding

arXiv.org Artificial Intelligence

Auto-regressive inference of transformers benefit greatly from Key-Value (KV) caching, but can lead to major memory bottlenecks as model size, batch size, and sequence length grow at scale. We introduce Multi-Layer Key-Value (MLKV) sharing, a novel approach extending KV sharing across transformer layers to reduce memory usage beyond what was possible with Multi-Query Attention (MQA) and Grouped-Query Attention (GQA). Evaluations on various NLP benchmarks and inference metrics using uptrained Pythia-160M variants demonstrate that MLKV significantly reduces memory usage with minimal performance loss, reducing KV cache size down Figure 1: Simplified overview of current KV sharing to a factor of 6x compared to MQA. These methods, vanilla MHA (top left), MQA (bottom left), results highlight MLKV's potential for efficient and GQA (top right). All of them share KV heads deployment of transformer models at within the same layer. Our proposed KV sharing scheme scale. We provide code at https://github. MLKV (bottom right) shares KV heads between layers.


Classical and Quantum Physical Reservoir Computing for Onboard Artificial Intelligence Systems: A Perspective

arXiv.org Artificial Intelligence

Artificial intelligence (AI) systems of autonomous systems such as drones, robots and self-driving cars may consume up to 50% of total power available onboard, thereby limiting the vehicle's range of functions and considerably reducing the distance the vehicle can travel on a single charge. Next-generation onboard AI systems need an even higher power since they collect and process even larger amounts of data in real time. This problem cannot be solved using the traditional computing devices since they become more and more power-consuming. In this review article, we discuss the perspectives of development of onboard neuromorphic computers that mimic the operation of a biological brain using nonlinear-dynamical properties of natural physical environments surrounding autonomous vehicles. Previous research also demonstrated that quantum neuromorphic processors (QNPs) can conduct computations with the efficiency of a standard computer while consuming less than 1% of the onboard battery power. Since QNPs is a semi-classical technology, their technical simplicity and low, compared with quantum computers, cost make them ideally suitable for application in autonomous AI system. Providing a perspective view on the future progress in unconventional physical reservoir computing and surveying the outcomes of more than 200 interdisciplinary research works, this article will be of interest to a broad readership, including both students and experts in the fields of physics, engineering, quantum technologies and computing.


On LLMs-Driven Synthetic Data Generation, Curation, and Evaluation: A Survey

arXiv.org Artificial Intelligence

Within the evolving landscape of deep learning, the dilemma of data quantity and quality has been a long-standing problem. The recent advent of Large Language Models (LLMs) offers a data-centric solution to alleviate the limitations of real-world data with synthetic data generation. However, current investigations into this field lack a unified framework and mostly stay on the surface. Therefore, this paper provides an organization of relevant studies based on a generic workflow of synthetic data generation. By doing so, we highlight the gaps within existing research and outline prospective avenues for future study. This work aims to shepherd the academic and industrial communities towards deeper, more methodical inquiries into the capabilities and applications of LLMs-driven synthetic data generation.


Challenging the Machine: Contestability in Government AI Systems

arXiv.org Artificial Intelligence

In an October 2023 executive order (EO), President Biden issued a detailed but largely aspirational road map for the safe and responsible development and use of artificial intelligence (AI). The challenge for the January 24-25, 2024 workshop was to transform those aspirations regarding one specific but crucial issue -- the ability of individuals to challenge government decisions made about themselves -- into actionable guidance enabling agencies to develop, procure, and use genuinely contestable advanced automated decision-making systems. While the Administration has taken important steps since the October 2023 EO, the insights garnered from our workshop remain highly relevant, as the requirements for contestability of advanced decision-making systems are not yet fully defined or implemented. The workshop brought together technologists, members of government agencies and civil society organizations, litigators, and researchers in an intensive two-day meeting that examined the challenges that users, developers, and agencies faced in enabling contestability in light of advanced automated decision-making systems. To ensure a free and open flow of discussion, the meeting was held under a modified version of the Chatham House rule. Participants were free to use any information or details that they learned, but they may not attribute any remarks made at the meeting by the identity or the affiliation of the speaker. Thus, the workshop summary that follows anonymizes speakers and their affiliation. Where an identification of an agency, company, or organization is made, it is done from a public, identified resource and does not necessarily reflect statements made by participants at the workshop. This document is a report of that workshop, along with recommendations and explanatory material.


BiVLC: Extending Vision-Language Compositionality Evaluation with Text-to-Image Retrieval

arXiv.org Artificial Intelligence

Existing Vision-Language Compositionality (VLC) benchmarks like SugarCrepe are formulated as image-to-text retrieval problems, where, given an image, the models need to select between the correct textual description and a synthetic hard negative text. In this work we present the Bidirectional Vision-Language Compositionality (BiVLC) dataset. The novelty of BiVLC is to add a synthetic hard negative image generated from the synthetic text, resulting in two image-to-text retrieval examples (one for each image) and, more importantly, two text-to-image retrieval examples (one for each text). Human annotators filter out ill-formed examples ensuring the validity of the benchmark. The experiments on BiVLC uncover a weakness of current multimodal models, as they perform poorly in the text-to-image direction. In fact, when considering both retrieval directions, the conclusions obtained in previous works change significantly. In addition to the benchmark, we show that a contrastive model trained using synthetic images and texts improves the state of the art in SugarCrepe and in BiVLC for both retrieval directions. The gap to human performance in BiVLC confirms that Vision-Language Compositionality is still a challenging problem. BiVLC and code are available at https://imirandam.github.io/BiVLC_project_page.


Provably Safe Neural Network Controllers via Differential Dynamic Logic

arXiv.org Artificial Intelligence

While neural networks (NNs) have potential as autonomous controllers for Cyber-Physical Systems, verifying the safety of NN based control systems (NNCSs) poses significant challenges for the practical use of NNs, especially when safety is needed for unbounded time horizons. One reason is the intractability of analyzing NNs, ODEs and hybrid systems. To this end, we introduce VerSAILLE (Verifiably Safe AI via Logically Linked Envelopes): The first general approach that allows reusing control theory results for NNCS verification. By joining forces, we exploit the efficiency of NN verification tools while retaining the rigor of differential dynamic logic (dL). Based on provably safe control envelopes in dL, we derive specifications for the NN which is proven via NN verification. We show that a proof of the NN adhering to the specification is mirrored by a dL proof on the infinite-time safety of the NNCS. The NN verification properties resulting from hybrid systems typically contain nonlinear arithmetic and arbitrary logical structures while efficient NN verification merely supports linear constraints. To overcome this divide, we present Mosaic: An efficient, sound and complete verification approach for polynomial real arithmetic properties on piece-wise linear NNs. Mosaic partitions complex verification queries into simple queries and lifts off-the-shelf linear constraint tools to the nonlinear setting in a completeness-preserving manner by combining approximation with exact reasoning for counterexample regions. Our evaluation demonstrates the versatility of VerSAILLE and Mosaic: We prove infinite-time safety on the classical Vertical Airborne Collision Avoidance NNCS verification benchmark for two scenarios while (exhaustively) enumerating counterexample regions in unsafe scenarios. We also show that our approach significantly outperforms State-of-the-Art tools in closed-loop NNV.


Vulnerable Road User Detection and Safety Enhancement: A Comprehensive Survey

arXiv.org Artificial Intelligence

Traffic incidents involving vulnerable road users (VRUs) constitute a significant proportion of global road accidents. Advances in traffic communication ecosystems, coupled with sophisticated signal processing and machine learning techniques, have facilitated the utilization of data from diverse sensors. Despite these advancements and the availability of extensive datasets, substantial progress is required to mitigate traffic casualties. This paper provides a comprehensive survey of state-of-the-art technologies and methodologies to enhance the safety of VRUs. The study delves into the communication networks between vehicles and VRUs, emphasizing the integration of advanced sensors and the availability of relevant datasets. It explores preprocessing techniques and data fusion methods to enhance sensor data quality. Furthermore, our study assesses critical simulation environments essential for developing and testing VRU safety systems. Our research also highlights recent advances in VRU detection and classification algorithms, addressing challenges such as variable environmental conditions. Additionally, we cover cutting-edge research in predicting VRU intentions and behaviors, which is crucial for proactive collision avoidance strategies. Through this survey, we aim to provide a comprehensive understanding of the current landscape of VRU safety technologies, identifying areas of progress and areas needing further research and development.


TEG-DB: A Comprehensive Dataset and Benchmark of Textual-Edge Graphs

arXiv.org Artificial Intelligence

Text-Attributed Graphs (TAGs) augment graph structures with natural language descriptions, facilitating detailed depictions of data and their interconnections across various real-world settings. However, existing TAG datasets predominantly feature textual information only at the nodes, with edges typically represented by mere binary or categorical attributes. This lack of rich textual edge annotations significantly limits the exploration of contextual relationships between entities, hindering deeper insights into graph-structured data. To address this gap, we introduce Textual-Edge Graphs Datasets and Benchmark (TEG-DB), a comprehensive and diverse collection of benchmark textual-edge datasets featuring rich textual descriptions on nodes and edges. The TEG-DB datasets are large-scale and encompass a wide range of domains, from citation networks to social networks. In addition, we conduct extensive benchmark experiments on TEG-DB to assess the extent to which current techniques, including pre-trained language models, graph neural networks, and their combinations, can utilize textual node and edge information. Our goal is to elicit advancements in textual-edge graph research, specifically in developing methodologies that exploit rich textual node and edge descriptions to enhance graph analysis and provide deeper insights into complex real-world networks.