 pov




DebateQA: Evaluating Question Answering on Debatable Knowledge

arXiv.org Artificial Intelligence

The rise of large language models (LLMs) has enabled us to seek answers to inherently debatable questions on LLM chatbots, necessitating a reliable way to evaluate their ability. However, traditional QA benchmarks, which assume fixed answers, are inadequate for this purpose. To address this, we introduce DebateQA, a dataset of 2,941 debatable questions, each accompanied by multiple human-annotated partial answers that capture a variety of perspectives. We develop two metrics: Perspective Diversity, which evaluates the comprehensiveness of perspectives, and Dispute Awareness, which assesses whether the LLM acknowledges the question's debatable nature. Experiments demonstrate that both metrics align with human preferences and are stable across different underlying models. Using DebateQA with these two metrics, we assess 12 popular LLMs and retrieval-augmented generation methods. Our findings reveal that while LLMs generally excel at recognizing debatable issues, their ability to provide comprehensive answers encompassing diverse perspectives varies considerably.


Temporal Logic Formalisation of ISO 34502 Critical Scenarios: Modular Construction with the RSS Safety Distance

arXiv.org Artificial Intelligence

As the development of autonomous vehicles progresses, efficient safety assurance methods become increasingly necessary. Safety assurance methods such as monitoring and scenario-based testing call for formalisation of driving scenarios. In this paper, we develop a temporal-logic formalisation of an important class of critical scenarios in the ISO standard 34502. We use signal temporal logic (STL) as a logical formalism. Our formalisation has two main features: 1) modular composition of logical formulas for systematic and comprehensive formalisation (following the compositional methodology of ISO 34502); 2) use of the RSS distance for defining danger. We find our formalisation comes with few parameters to tune thanks to the RSS distance. We experimentally evaluated our formalisation; using its results, we discuss the validity of our formalisation and its stability with respect to the choice of some parameter values.
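The abstract uses the RSS distance to define danger but does not state it. As a rough illustration only, the sketch below computes the longitudinal RSS safe distance as it is commonly given in the RSS literature (same-direction, rear vehicle following a front vehicle). The function and parameter names are hypothetical, and the paper's actual STL formulas are not reproduced here.

```python
def rss_safe_longitudinal_distance(
    v_rear: float,       # speed of the rear (following) vehicle, m/s
    v_front: float,      # speed of the front (lead) vehicle, m/s
    rho: float,          # response time of the rear vehicle, s
    a_max_accel: float,  # max acceleration of the rear vehicle during rho, m/s^2
    b_min_brake: float,  # min braking the rear vehicle applies afterwards, m/s^2
    b_max_brake: float,  # max braking the front vehicle may apply, m/s^2
) -> float:
    """Minimum gap such that the rear vehicle can always avoid a collision.

    Worst case assumed: the front vehicle brakes at b_max_brake while the rear
    vehicle accelerates at a_max_accel for rho seconds and then brakes at
    b_min_brake until it stops.
    """
    v_rear_after_response = v_rear + rho * a_max_accel
    distance = (
        v_rear * rho
        + 0.5 * a_max_accel * rho**2
        + v_rear_after_response**2 / (2.0 * b_min_brake)
        - v_front**2 / (2.0 * b_max_brake)
    )
    # A negative value means any non-negative gap is safe, so clamp at zero.
    return max(distance, 0.0)


def is_dangerous(gap: float, **rss_params: float) -> bool:
    """Hypothetical 'danger' predicate: the current gap is below the RSS distance."""
    return gap < rss_safe_longitudinal_distance(**rss_params)
```

A predicate of this kind depends only on the physical parameters above, which is one plausible reading of why the authors report few parameters to tune.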


Formal Verification of Safety Architectures for Automated Driving

arXiv.org Artificial Intelligence

Safety architectures play a crucial role in the safety assurance of automated driving vehicles (ADVs). They can be used as safety envelopes of black-box ADV controllers, and for graceful degradation from one ODD to another. Building on our previous work on the formalization of responsibility-sensitive safety (RSS), we introduce a novel program logic that accommodates assume-guarantee reasoning and fallback-like constructs. This allows us to formally define and prove the safety of existing and novel safety architectures. We apply the logic to a pull over scenario and experimentally evaluate the resulting safety architecture.


Formal Verification of Intersection Safety for Automated Driving

arXiv.org Artificial Intelligence

We build on our recent work on formalization of responsibility-sensitive safety (RSS) and present the first formal framework that enables mathematical proofs of the safety of control strategies in intersection scenarios. Intersection scenarios are challenging due to the complex interaction between vehicles; to cope with it, we extend the program logic dFHL in the previous work and introduce a novel formalism of hybrid control flow graphs on which our algorithm can automatically discover an RSS condition that ensures safety. An RSS condition thus discovered is experimentally evaluated; we observe that it is safe (as our safety proof says) and is not overly conservative.


A Diversity Analysis of Safety Metrics Comparing Vehicle Performance in the Lead-Vehicle Interaction Regime

arXiv.org Artificial Intelligence

Vehicle performance metrics analyze data sets consisting of a subject vehicle's interactions with other road users in a nominal driving environment and provide certain performance measures as outputs. To the best of the authors' knowledge, research on vehicle safety performance metrics dates back to at least 1967. To date, there is still no metric or set of metrics accepted across the community for vehicle safety performance assessment and justification. This issue is further amplified by the evolving interest in Advanced Driver Assistance Systems and Automated Driving Systems. In this paper, the authors seek to perform a unified study that facilitates an improved community-wide understanding of vehicle performance metrics using the lead-vehicle interaction operational design domain as a common means of performance comparison. In particular, the authors study the diversity (including constructive formulation discrepancies and empirical performance differences) among 33 base metrics with up to 51 metric variants (with different choices of hyper-parameters) in the existing literature, published between 1967 and 2022. Two data sets are adopted for the empirical performance diversity analysis, including vehicle trajectories from a normal highway driving environment and relatively high-risk incidents with collisions and near-miss cases. The analysis further implies that (i) the conceptual acceptance of a safety metric proposal can be problematic if the assumptions, conditions, and types of outcome assurance are not justified properly, and (ii) the empirical performance justification of an acceptable metric can also be problematic as a dominant consensus is not observed among metrics empirically.


Khabiri

AAAI Conferences

An "elevator pitch" is a brief, persuasive speech that an experienced seller can use to gain the attention of a prospective client. Unfortunately, when selling complex enterprise products and solutions, there is no one pitch that works for all customers. To craft a good pitch, a seller must study a large amount of documentation, including product descriptions, client references, and use cases. Leveraging experience developed over the years, sellers then determine which marketing message will work best with a client. The goal of our research is to automatically create knowledge snippets from a large set of enterprise documents that can be used in elevator pitches. We refer to these snippets of text as points of view (POVs). Our method is based on natural language understanding (NLU), clustering, and ranking techniques, where the most relevant and informative content is selected as POVs for a given product. In addition, our approach is tailored to create POVs for a given aspect of the product, like the business challenges or the benefits of deploying the product.


Changing the Narrative Perspective: From Deictic to Anaphoric Point of View

arXiv.org Artificial Intelligence

We introduce the task of changing the narrative point of view, where characters are assigned a narrative perspective that is different from the one originally used by the writer. The resulting shift in the narrative point of view alters the reading experience and can be used as a tool in fiction writing or to generate types of text ranging from educational to self-help and self-diagnosis. We introduce a benchmark dataset containing a wide range of types of narratives annotated with changes in point of view from deictic (first or second person) to anaphoric (third person) and describe a pipeline for processing raw text that relies on a neural architecture for mention selection. Evaluations on the new benchmark dataset show that the proposed architecture substantially outperforms the baselines by generating mentions that are less ambiguous and more natural.


Divide and Learn: A Divide and Conquer Approach for Predict+Optimize

arXiv.org Artificial Intelligence

Authors: Ali Ugur Guler (1), Emir Demirovic (2), Jeffrey Chan (3), James Bailey (1), Christopher Leckie (1), Peter J. Stuckey (4). Affiliations: (1) University of Melbourne, (2) Delft University of Technology, (3) RMIT University, (4) Monash University. Contact: aguler@student.unimelb.edu.au

Abstract: The predict+optimize problem combines machine learning of problem coefficients with a combinatorial optimization problem that uses the predicted coefficients. While this problem can be solved in two separate stages, it is better to directly minimize the optimization loss. However, this requires differentiating through a discrete, non-differentiable combinatorial function. Most existing approaches use some form of surrogate gradient. Demirovic et al. showed how to directly express the loss of the optimization problem in terms of the predicted coefficients as a piece-wise linear function. However, their approach is restricted to optimization problems with a dynamic programming formulation. In this work we propose a novel divide and conquer algorithm to tackle optimization problems without this restriction and predict its coefficients using the optimization loss. We also introduce a greedy version of this approach, which achieves similar results with less computation. We compare our approach with other approaches to the predict+optimize problem and show we can successfully tackle some hard combinatorial problems better than other predict+optimize methods.

Introduction: Machine Learning (ML) has gained substantial attention in the last decade, and has proven to be useful in a wide range of industries. ML models usually focus on making accurate predictions by minimizing errors, such as mean squared error (MSE). These predictions can then be used as coefficients in other decision making processes, such as a combinatorial optimization problem.
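To make the gap between prediction error and optimization loss concrete, here is a minimal sketch on a toy knapsack instance. It is not the authors' divide and conquer algorithm; the brute-force solver, the helper names, and the numbers are all assumptions chosen for illustration. It only shows how regret (the optimization loss) can disagree with MSE.

```python
from itertools import combinations


def solve_knapsack(values, weights, capacity):
    """Brute-force 0/1 knapsack: return the index tuple with the highest total value."""
    best_value, best_subset = 0, ()
    for size in range(len(values) + 1):
        for subset in combinations(range(len(values)), size):
            if sum(weights[i] for i in subset) <= capacity:
                total = sum(values[i] for i in subset)
                if total > best_value:
                    best_value, best_subset = total, subset
    return best_subset


def regret(predicted, true, weights, capacity):
    """True value lost by optimizing over predicted rather than true coefficients."""
    chosen = solve_knapsack(predicted, weights, capacity)
    optimal = solve_knapsack(true, weights, capacity)
    return sum(true[i] for i in optimal) - sum(true[i] for i in chosen)


def mse(predicted, true):
    """Mean squared error between predicted and true coefficients."""
    return sum((p - t) ** 2 for p, t in zip(predicted, true)) / len(true)


# Toy instance (hypothetical numbers).
true_values = [10, 6, 5]
weights = [4, 3, 3]
capacity = 6

close_predictions = [9, 4, 4]  # small errors, but they flip the chosen item set
far_predictions = [4, 6, 5]    # one large error that leaves the chosen set unchanged

for name, pred in [("close", close_predictions), ("far", far_predictions)]:
    print(name, "MSE:", mse(pred, true_values),
          "regret:", regret(pred, true_values, weights, capacity))
```

In this instance the predictions with the lower MSE incur positive regret, while the predictions with the higher MSE incur none, which is the motivation for training against the optimization loss directly.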