Collaborating Authors: pirate


Large-Scale Constraint Generation -- Can LLMs Parse Hundreds of Constraints?

Boffa, Matteo, You, Jiaxuan

arXiv.org Artificial Intelligence

Recent research has explored the constrained generation capabilities of Large Language Models (LLMs) when explicitly prompted with a few task-specific requirements. In contrast, we introduce Large-Scale Constraint Generation (LSCG), a new problem that evaluates whether LLMs can parse a large, fine-grained, generic list of constraints. To examine the LLMs' ability to handle an increasing number of constraints, we create a practical instance of LSCG, called Words Checker. In Words Checker, we evaluate the impact of model characteristics (e.g., size, family) and steering techniques (e.g., Simple Prompt, Chain of Thought, Best of N) on performance. We also propose FoCusNet, a small and dedicated model that parses the original list of constraints into a smaller subset, helping the LLM focus on relevant constraints. Experiments reveal that existing solutions suffer a significant performance drop as the number of constraints increases, with FoCusNet showing an 8-13% accuracy boost.
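The core idea of pre-filtering a large constraint list down to a relevant subset can be illustrated with a minimal sketch. Note this is not the paper's FoCusNet model (a trained network); the `focus_constraints` function and the word-overlap scoring below are invented stand-ins for the filtering step.

```python
def focus_constraints(text, constraints, top_k=3):
    """Toy stand-in for a constraint-filtering step: rank constraints by
    word overlap with the input text and keep the top_k most relevant."""
    text_words = set(text.lower().split())
    scored = [(len(text_words & set(c.lower().split())), c) for c in constraints]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop constraints with no overlap at all, then truncate to top_k.
    return [c for score, c in scored[:top_k] if score > 0]

constraints = [
    "Do not mention competitor brand names",
    "Avoid medical advice",
    "Use metric units for all measurements",
    "Never reveal internal ticket numbers",
]
relevant = focus_constraints("Convert 5 miles to metric units", constraints)
# Only the metric-units constraint shares words with the input, so the
# downstream LLM would be prompted with one constraint instead of four.
```

A learned filter (as in the paper) would replace the overlap score with a model's relevance prediction, but the interface, i.e., many constraints in and a focused subset out, stays the same.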


Can AI make novels better? Not if these attempts are anything to go by

New Scientist

Feedback is New Scientist's popular sideways look at the latest science and technology news. You can submit items you believe may amuse readers to Feedback by emailing feedback@newscientist.com One of the great joys in life, Feedback argues, is the perfect opening sentence of a book – and the concomitant realisation that, yes, this one is going to be good. "It was the day my grandmother exploded." "As the manager of the Performance sits before the curtain on the boards and looks into the Fair, a feeling of profound melancholy comes over him in his survey of the bustling place."


Pirates of the RAG: Adaptively Attacking LLMs to Leak Knowledge Bases

Di Maio, Christian, Cosci, Cristian, Maggini, Marco, Poggioni, Valentina, Melacci, Stefano

arXiv.org Artificial Intelligence

The growing ubiquity of Retrieval-Augmented Generation (RAG) systems in several real-world services triggers severe concerns about their security. A RAG system improves the generative capabilities of a Large Language Model (LLM) through a retrieval mechanism which operates on a private knowledge base, whose unintended exposure could lead to severe consequences, including breaches of private and sensitive information. This paper presents a black-box attack to force a RAG system to leak its private knowledge base which, differently from existing approaches, is adaptive and automatic. A relevance-based mechanism and an attacker-side open-source LLM favor the generation of effective queries to leak most of the (hidden) knowledge base. Extensive experimentation proves the quality of the proposed algorithm in different RAG pipelines and domains, compared to very recent related approaches, which turn out to be either not fully black-box, not adaptive, or not based on open-source models. The findings from our study underscore the urgent need for more robust privacy safeguards in the design and deployment of RAG systems.
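The adaptive loop described in the abstract can be sketched in a few lines. Everything here is invented for illustration: `query_rag` stands for the black-box target, and the attacker-side LLM is replaced by the crude heuristic of reusing words from newly leaked passages as follow-up queries.

```python
def extraction_loop(query_rag, seed_queries, rounds=3):
    """Collect leaked passages; reuse words from new leaks as next queries."""
    leaked, queue = set(), list(seed_queries)
    for _ in range(rounds):
        next_queue = []
        for q in queue:
            for passage in query_rag(q):
                if passage not in leaked:          # novel leak -> adapt
                    leaked.add(passage)
                    next_queue.extend(passage.split())
        queue = next_queue or queue                # keep probing if stuck
    return leaked

# Toy target: a knowledge base retrieved by keyword match.
KB = {"ships": ["pirates board ships"], "pirates": ["pirates hide gold"]}
def toy_rag(query):
    return KB.get(query, [])

leaked = extraction_loop(toy_rag, ["ships"])
# Starting from one seed query, the loop pivots through leaked text to
# reach passages the seed alone would never retrieve.
```

The real attack replaces the word-reuse heuristic with an open-source LLM generating queries and a relevance score deciding which leads to pursue, but the feedback structure of the loop is the same.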


Affectively Framework: Towards Human-like Affect-Based Agents

Barthet, Matthew, Gallotta, Roberto, Khalifa, Ahmed, Liapis, Antonios, Yannakakis, Georgios N.

arXiv.org Artificial Intelligence

Game environments offer a unique opportunity for training virtual agents due to their interactive nature, which provides diverse play traces and affect labels. Despite their potential, no reinforcement learning framework incorporates human affect models as part of their observation space or reward mechanism. To address this, we present the Affectively Framework, a set of OpenAI Gym environments that integrate affect as part of the observation space. This paper introduces the framework and its three game environments and provides baseline experiments to validate its effectiveness and potential. Video games are ideal stimuli for research in Affective Computing [1] for several reasons. Firstly, the user is free to play in many different ways, leading to diversity in their play traces and emotional experiences [2].


How Far Are We on the Decision-Making of LLMs? Evaluating LLMs' Gaming Ability in Multi-Agent Environments

Huang, Jen-tse, Li, Eric John, Lam, Man Ho, Liang, Tian, Wang, Wenxuan, Yuan, Youliang, Jiao, Wenxiang, Wang, Xing, Tu, Zhaopeng, Lyu, Michael R.

arXiv.org Artificial Intelligence

Figure 1: γ-Bench enables various LLMs and humans to participate in multi-agent, multi-round games. The framework includes eight classical games in Game Theory, each categorized into one of three groups.

Decision-making, a complicated task requiring various types of abilities, presents an excellent framework for assessing Large Language Models (LLMs). Our research investigates LLMs' decision-making capabilities through the lens of a well-established field, Game Theory. We focus specifically on games that support the participation of more than two agents simultaneously. Subsequently, we introduce our framework, γ-Bench, including eight classical multi-agent games. We design a scoring scheme to assess a model's performance in these games quantitatively. Through γ-Bench, we investigate LLMs' robustness, generalizability, and enhancement strategies. Results reveal that while GPT-3.5 shows satisfying robustness, its generalizability is relatively limited. However, its performance can be improved through approaches such as Chain-of-Thought. Additionally, we conduct evaluations across various LLMs and find that GPT-4 outperforms other models on γ-Bench, achieving a score of 60.5. Wenxiang Jiao is the corresponding author. We have recently witnessed the advancements in Artificial Intelligence (AI) made by Large Language Models (LLMs), which have marked a significant breakthrough in the field. Beyond the academic sphere, LLMs have entered diverse aspects of our everyday life, such as education (Baidoo-Anu & Ansah, 2023), legal service (Guha et al., 2023), product design (Lanzi & Loiacono, 2023), and healthcare (Johnson et al., 2023). Given their extensive capabilities, evaluating LLMs demands more than simple, isolated tasks. A comprehensive and multifaceted approach is highly in demand to assess the efficacy of these advanced models.


Abusing Images and Sounds for Indirect Instruction Injection in Multi-Modal LLMs

Bagdasaryan, Eugene, Hsieh, Tsung-Yin, Nassi, Ben, Shmatikov, Vitaly

arXiv.org Artificial Intelligence

We demonstrate how images and sounds can be used for indirect prompt and instruction injection in multi-modal LLMs. An attacker generates an adversarial perturbation corresponding to the prompt and blends it into an image or audio recording. When the user asks the (unmodified, benign) model about the perturbed image or audio, the perturbation steers the model to output the attacker-chosen text and/or make the subsequent dialog follow the attacker's instruction. We illustrate this attack with several proof-of-concept examples targeting LLaVA and PandaGPT.
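The blending step, which adds a small perturbation into an otherwise benign image, can be sketched concretely. This is a minimal illustration with invented names: in the actual attack the perturbation is optimized against the target model, whereas here it is an arbitrary fixed array; only the bounding and blending arithmetic is shown.

```python
def blend_perturbation(image, delta, epsilon=8):
    """Add a per-pixel perturbation clipped to [-epsilon, epsilon],
    keeping each result a valid pixel value in [0, 255]."""
    out = []
    for pixel, d in zip(image, delta):
        d = max(-epsilon, min(epsilon, d))       # bound the perturbation
        out.append(max(0, min(255, pixel + d)))  # keep a valid pixel value
    return out

image = [0, 120, 255, 30]          # a tiny "image" as a flat pixel list
delta = [50, -3, 5, -100]          # unbounded candidate perturbation
perturbed = blend_perturbation(image, delta)
# Clipped deltas become [8, -3, 5, -8], so pixels become [8, 117, 255, 22]:
# visually near-identical to the original, yet carrying the injected signal.
```

Keeping epsilon small is what makes the perturbed image look benign to the user while still steering the model's output.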


Toward the Automated Construction of Probabilistic Knowledge Graphs for the Maritime Domain

Shiri, Fatemeh, Wang, Teresa, Pan, Shirui, Chang, Xiaojun, Li, Yuan-Fang, Haffari, Reza, Nguyen, Van, Yu, Shuang

arXiv.org Artificial Intelligence

International maritime crime is becoming increasingly sophisticated, often associated with wider criminal networks. Detecting maritime threats by means of fusing data purely related to physical movement (i.e., those generated by physical sensors, or hard data) is not sufficient. This has led to research and development efforts aimed at combining hard data with other types of data (especially human-generated or soft data). Existing work often assumes that input soft data is available in a structured format, or is focused on extracting certain relevant entities or concepts to accompany or annotate hard data. Much less attention has been given to extracting the rich knowledge about the situations of interest implicitly embedded in the large amount of soft data existing in unstructured formats (such as intelligence reports and news articles). In order to exploit the potentially useful and rich information from such sources, it is necessary to extract not only the relevant entities and concepts but also their semantic relations, together with the uncertainty associated with the extracted knowledge (i.e., in the form of probabilistic knowledge graphs). This will increase the accuracy of, and confidence in, the extracted knowledge and facilitate subsequent reasoning and learning. To this end, we propose Maritime DeepDive, an initial prototype for the automated construction of probabilistic knowledge graphs from natural language data for the maritime domain. In this paper, we report on the current implementation of Maritime DeepDive, together with preliminary results on extracting probabilistic events from maritime piracy incidents. This pipeline was evaluated on a manually crafted gold standard, yielding promising results.


What happens before and after: Multi-Event Commonsense in Event Coreference Resolution

Ravi, Sahithya, Tanner, Chris, Ng, Raymond, Shwartz, Vered

arXiv.org Artificial Intelligence

Event coreference models cluster event mentions pertaining to the same real-world event. Recent models rely on contextualized representations to recognize coreference among lexically or contextually similar mentions. However, models typically fail to leverage commonsense inferences, which is particularly limiting for resolving lexically-divergent mentions. We propose a model that extends event mentions with temporal commonsense inferences. Given a complex sentence with multiple events, e.g., "The man killed his wife and got arrested", with the target event "arrested", our model generates plausible events that happen before the target event - such as "the police arrived", and after it, such as "he was sentenced". We show that incorporating such inferences into an existing event coreference model improves its performance, and we analyze the coreferences in which such temporal knowledge is required.
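The benefit of extending mentions with before/after inferences can be shown with a toy example. The inference table and Jaccard similarity below are invented stand-ins: the paper uses a generative commonsense model and a neural coreference scorer, but the effect, lexically divergent mentions becoming comparable once enriched, is the same.

```python
# Hypothetical lookup standing in for a generative commonsense model.
INFERENCES = {
    "arrested": {"before": "the police arrived", "after": "he was sentenced"},
    "detained": {"before": "the police arrived", "after": "he was sentenced"},
}

def extend_mention(mention):
    """Append plausible before/after events to an event mention."""
    inf = INFERENCES.get(mention, {})
    parts = [mention, inf.get("before", ""), inf.get("after", "")]
    return " ".join(p for p in parts if p)

def jaccard(a, b):
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb)

# "arrested" and "detained" share no surface words...
plain = jaccard("arrested", "detained")
# ...but once extended with shared temporal inferences, they overlap heavily.
extended = jaccard(extend_mention("arrested"), extend_mention("detained"))
```

The real model feeds the enriched mention text into contextualized representations rather than computing set overlap, but the extension step operates on the same principle.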


PANCETTA: Phoneme Aware Neural Completion to Elicit Tongue Twisters Automatically

Keh, Sedrick Scott, Feng, Steven Y., Gangal, Varun, Alikhani, Malihe, Hovy, Eduard

arXiv.org Artificial Intelligence

Tongue twisters are meaningful sentences that are difficult to pronounce. The process of automatically generating tongue twisters is challenging since the generated utterance must satisfy two conditions at once: phonetic difficulty and semantic meaning. Furthermore, phonetic difficulty is itself hard to characterize and is expressed in natural tongue twisters through a heterogeneous mix of phenomena such as alliteration and homophony. In this paper, we propose PANCETTA: Phoneme Aware Neural Completion to Elicit Tongue Twisters Automatically. We leverage phoneme representations to capture the notion of phonetic difficulty, and we train language models to generate original tongue twisters on two proposed task settings. To do this, we curate a dataset called PANCETTA, consisting of existing English tongue twisters. Through automatic and human evaluation, as well as qualitative analysis, we show that PANCETTA generates novel, phonetically difficult, fluent, and semantically meaningful tongue twisters.


Comment: how ships can outwit piracy with AI

#artificialintelligence

Deep learning is on the frontline in a new age of piracy, outwitting attacks with pre-emptive tech, explains Yarden Gross, CEO and co-founder of Orca AI. Almost a decade has passed since piracy raged off Somalia, and yet the danger posed by maritime hijackings is as present as ever. The global pandemic last year sparked a resurgence of attacks, with piracy incidents doubling across Asia, a worrying uptick also seen in the Gulf of Mexico and West Africa. The fallout from coronavirus, including the loss of key security personnel, turned quarantined vessels into easy targets. This wave has since receded a little, with the International Maritime Bureau reporting a 44 per cent YoY dip in piracy and armed robbery incidents in 2021.