
 adversarial question


IndicVisionBench: Benchmarking Cultural and Multilingual Understanding in VLMs

Faraz, Ali, Akash, Khan, Shaharukh, Kolla, Raja, Patidar, Akshat, Goswami, Suranjan, Ravi, Abhinav, Khatri, Chandra, Agarwal, Shubham

arXiv.org Artificial Intelligence

Vision-language models (VLMs) have demonstrated impressive generalization across multimodal tasks, yet most evaluation benchmarks remain Western-centric, leaving open questions about their performance in culturally diverse and multilingual settings. To address this gap, we introduce IndicVisionBench, the first large-scale benchmark centered on the Indian subcontinent. Our final benchmark consists of 5K images and 37K+ QA pairs across 13 culturally grounded topics. In addition, we release a paired parallel corpus of annotations across 10 Indic languages, creating a unique resource for analyzing cultural and linguistic biases in VLMs. We evaluate a broad spectrum of 8 models, from proprietary closed-source systems to open-weights medium and large-scale models. Our experiments reveal substantial performance gaps, underscoring the limitations of current VLMs in culturally diverse contexts. By centering cultural diversity and multilinguality, IndicVisionBench establishes a reproducible evaluation framework that paves the way for more inclusive multimodal research.

Vision-language models (VLMs) (Bai et al., 2023; Chen et al., 2024; Lu et al., 2024; Wang et al., 2024b; Laurençon et al., 2024; Tong et al., 2024; Xue et al., 2024) have demonstrated strong performance across a variety of multimodal tasks. However, existing benchmarks (Antol et al., 2015; Fu et al., 2023; Goyal et al., 2017) remain heavily Western-centric, limiting our understanding of how these models generalize to culturally diverse and multilingual settings. While some recent efforts partially cover this diversity (Romero et al., 2024; Nayak et al., 2024; Vayani et al., 2025), a systematic, large-scale benchmark capturing India-specific cultural concepts across multiple languages is still lacking. To address this gap, we introduce IndicVisionBench, a culturally grounded evaluation benchmark tailored to the Indian subcontinent.
To the best of our knowledge, this is the first large-scale benchmark explicitly designed to assess VLMs in the context of Indian culture and languages. We use states as a proxy for cultural groups following prior works (Adilazuarda et al., 2024; Nayak et al., 2024).
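Because the annotations are parallel across languages, the same question can be scored in every language, so per-language accuracies are directly comparable. The sketch below shows one simple way to summarize such results: accuracy per language and each language's gap to the best-performing one. It is an illustrative assumption, not the benchmark's evaluation code; the function names, the `(language, is_correct)` record layout, and the toy results are hypothetical.

```python
from collections import defaultdict


def accuracy_by_language(results):
    """Compute accuracy per language from (language, is_correct) records.

    Hypothetical record layout; not the benchmark's actual format.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for lang, ok in results:
        totals[lang] += 1
        correct[lang] += int(ok)
    return {lang: correct[lang] / totals[lang] for lang in totals}


def gaps_to_best(acc):
    """Gap between each language's accuracy and the best language's accuracy."""
    best = max(acc.values())
    return {lang: best - a for lang, a in acc.items()}


# Toy results for three languages (ISO 639-1 codes for Hindi, Tamil, Bengali).
results = [("hi", True), ("hi", True), ("ta", True), ("ta", False),
           ("bn", False), ("bn", False)]
acc = accuracy_by_language(results)
print(acc)                # {'hi': 1.0, 'ta': 0.5, 'bn': 0.0}
print(gaps_to_best(acc))  # {'hi': 0.0, 'ta': 0.5, 'bn': 1.0}
```

Measuring gaps against the best language, rather than a fixed pivot, avoids assuming which language the benchmark treats as the reference.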


ADVSCORE: A Metric for the Evaluation and Creation of Adversarial Benchmarks

Sung, Yoo Yeon, Fleisig, Eve, Mondal, Ishani, Boyd-Graber, Jordan Lee

arXiv.org Artificial Intelligence

Adversarial benchmarks validate model abilities by providing samples that fool models but not humans. However, despite the proliferation of datasets that claim to be adversarial, there is no established metric for evaluating how adversarial these datasets are. To address this lacuna, we introduce ADVSCORE, a metric which quantifies how adversarial and discriminative an adversarial dataset is and exposes the features that make data adversarial. We then use ADVSCORE to underpin a dataset creation pipeline that incentivizes writing a high-quality adversarial dataset. As a proof of concept, we use ADVSCORE to collect an adversarial question answering (QA) dataset, ADVQA, from our pipeline. The high-quality questions in ADVQA surpass three adversarial benchmarks across domains at fooling several models but not humans. We validate our result based on difficulty estimates from 9,347 human responses on four datasets and predictions from three models. Moreover, ADVSCORE uncovers which adversarial tactics used by human writers fool models (e.g., GPT-4) but not humans. Through ADVSCORE and its analyses, we offer guidance on revealing language model vulnerabilities and producing reliable adversarial examples.
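The core intuition above is that a good adversarial item fools models but not humans. As a toy illustration only, the sketch below turns that intuition into a per-item margin (human accuracy minus model accuracy) averaged over a dataset. This is a hypothetical simplification, not the ADVSCORE formula: the abstract does not give the metric's definition, and ADVSCORE also captures how discriminative a dataset is, which this sketch ignores. The names `item_margin` and `adversarial_margin` and the toy data are invented.

```python
from statistics import mean


def item_margin(human_correct: list[bool], model_correct: list[bool]) -> float:
    """Human accuracy minus model accuracy on one question (range -1 to 1).

    Hypothetical toy proxy; not the ADVSCORE definition.
    """
    human_acc = sum(human_correct) / len(human_correct)
    model_acc = sum(model_correct) / len(model_correct)
    return human_acc - model_acc


def adversarial_margin(items: list[tuple[list[bool], list[bool]]]) -> float:
    """Average margin over a dataset; positive means humans beat models."""
    return mean(item_margin(h, m) for h, m in items)


items = [
    ([True, True, False], [False, False, False]),  # most humans right, all models wrong
    ([True, True, True], [True, True, True]),      # easy for everyone
]
print(round(adversarial_margin(items), 3))  # 0.333
```

An item that is easy for everyone contributes zero, so it dilutes the dataset-level score, which matches the intuition that such items are not adversarial.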


Incorporating Explanations into Human-Machine Interfaces for Trust and Situation Awareness in Autonomous Vehicles

Atakishiyev, Shahin, Salameh, Mohammad, Goebel, Randy

arXiv.org Artificial Intelligence

Autonomous vehicles often make complex decisions via machine learning-based predictive models applied to collected sensor data. While this combination of methods provides a foundation for real-time actions, self-driving behavior primarily remains opaque to end users. In this sense, explainability of real-time decisions is a crucial and natural requirement for building trust in autonomous vehicles. Moreover, as autonomous vehicles still cause serious traffic accidents for various reasons, timely conveyance of upcoming hazards to road users can help improve scene understanding and prevent potential risks. Hence, there is also a need to supply autonomous vehicles with user-friendly interfaces for effective human-machine teaming. Motivated by this problem, we study the role of explainable AI and human-machine interface jointly in building trust in vehicle autonomy. We first present a broad context of the explanatory human-machine systems with the "3W1H" (what, whom, when, how) approach. Based on these findings, we present a situation awareness framework for calibrating users' trust in self-driving behavior. Finally, we perform an experiment on our framework, conduct a user study on it, and validate the empirical findings with hypothesis testing.


Sandwich attack: Multi-language Mixture Adaptive Attack on LLMs

Upadhayay, Bibek, Behzadan, Vahid

arXiv.org Artificial Intelligence

Large Language Models (LLMs) are increasingly being developed and applied, but their widespread use faces challenges. One is aligning LLMs' responses with human values to prevent harmful outputs, which is addressed through safety training methods. Even so, bad actors and malicious users have succeeded in manipulating LLMs into generating misaligned responses to harmful questions, such as methods to create a bomb in school labs, recipes for harmful drugs, and ways to evade privacy rights. Another challenge is the multilingual capability of LLMs, which enables a model to understand and respond in multiple languages. Attackers exploit the unbalanced pre-training data of LLMs across languages and the comparatively lower model performance in low-resource languages than in high-resource ones, intentionally using low-resource languages to manipulate the model into producing harmful responses. Many similar attack vectors have been patched by model providers, making LLMs more robust against language-based manipulation. In this paper, we introduce a new black-box attack vector called the Sandwich attack: a multi-language mixture attack that manipulates state-of-the-art LLMs into generating harmful and misaligned responses. Experiments with models including GPT-4 and Claude-3-OPUS show that adversaries can use this attack vector to elicit harmful and misaligned responses from these models. By detailing both the mechanism and impact of the Sandwich attack, this paper aims to guide future research and development towards more secure and resilient LLMs, ensuring they serve the public good while minimizing potential for misuse.

Content Warning: This paper contains examples of harmful language.

Ethics and Disclosure

This paper introduces a new universal attack method for SOTA LLMs that could potentially be used to elicit harmful content from publicly available LLMs. The adversarial attack method we used in this paper is easy to design and inexpensive to implement. Despite the associated risks, we firmly believe that sharing the full details of this research and its methodology will be invaluable to other researchers, scholars, and model creators. It encourages them to delve into the root causes behind these attacks and devise ways to fortify and patch existing models. Additionally, it promotes cooperative initiatives centered around the safety of LLMs in multilingual scenarios.


A novel interface for adversarial trivia question-writing

Liu, Jason

arXiv.org Artificial Intelligence

A critical component when developing question-answering AIs is an adversarial dataset that challenges models to adapt to the complex syntax and reasoning underlying natural language. Present techniques for procedurally generating adversarial texts are not robust enough for training on complex tasks such as answering multi-sentence trivia questions. We instead turn to human-generated data by introducing an interface for collecting adversarial human-written trivia questions. Our interface is aimed at question writers and players of Quiz Bowl, a buzzer-based trivia competition where paragraph-long questions consist of a sequence of clues of decreasing difficulty. To incentivize usage, a suite of machine learning-based tools in our interface assists humans in writing questions that are more challenging to answer for Quiz Bowl players and computers alike. Not only does our interface gather training data for the groundbreaking Quiz Bowl AI project QANTA, but it is also a proof of concept for future adversarial data collection for question-answering systems. The results of performance-testing our interface with ten originally composed questions indicate that, despite some flaws, our interface's novel question-writing features, as well as its real-time exposure of useful responses from our machine models, could facilitate and enhance the collection of adversarial questions. The code for our interface is available at: https://github.com/Zefan-Cai/QAML
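Because Quiz Bowl questions reveal clues in order of decreasing difficulty, a model's guesses can be read incrementally: the earlier it confidently buzzes with the right answer, the less adversarial the question is for that model. The sketch below illustrates this idea under stated assumptions: the per-clue `(guess, confidence)` list, the 0.8 buzz threshold, and the rule for "adversarial" are all hypothetical and are not taken from the interface's code in the linked repository.

```python
from typing import Optional


def buzz_position(guesses: list[tuple[str, float]],
                  threshold: float = 0.8) -> Optional[int]:
    """Index of the first clue where confidence reaches the threshold, else None."""
    for i, (_, confidence) in enumerate(guesses):
        if confidence >= threshold:
            return i
    return None


def is_adversarial(guesses: list[tuple[str, float]], answer: str,
                   threshold: float = 0.8) -> bool:
    """Hypothetical rule: the question fools the model if it never buzzes
    or buzzes with a wrong answer."""
    pos = buzz_position(guesses, threshold)
    return pos is None or guesses[pos][0] != answer


# Hypothetical guesser output after each of four clues (hardest clue first).
guesses = [("Mozart", 0.2), ("Haydn", 0.5), ("Haydn", 0.85), ("Beethoven", 0.95)]
print(buzz_position(guesses))                # 2
print(is_adversarial(guesses, "Beethoven"))  # True: confident wrong buzz
```

Surfacing the buzz position to a writer in real time is one way a tool could show which clue gives the answer away to the model.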


How the Advent of Ubiquitous Large Language Models both Stymie and Turbocharge Dynamic Adversarial Question Generation

Sung, Yoo Yeon, Mondal, Ishani, Boyd-Graber, Jordan

arXiv.org Artificial Intelligence

Dynamic adversarial question generation, where humans write examples to stump a model, aims to create examples that are realistic and informative. However, the advent of large language models (LLMs) has been a double-edged sword for human authors: more people are interested in seeing and pushing the limits of these models, but because the models are so much stronger an opponent, they are harder to defeat. To understand how these models impact the adversarial question-writing process, we enrich the writing guidance with LLMs and retrieval models so that authors can reason about why their questions are not adversarial. While authors could create interesting, challenging adversarial questions, they sometimes resort to tricks that result in poor questions that are ambiguous, subjective, or confusing not just to a computer but also to humans. To address these issues, we propose new metrics and incentives for eliciting good, challenging questions and present a new dataset of adversarially authored questions.


From Text to Multimodal: A Comprehensive Survey of Adversarial Example Generation in Question Answering Systems

Yigit, Gulsum, Amasyali, Mehmet Fatih

arXiv.org Artificial Intelligence

Integrating adversarial machine learning with Question Answering (QA) systems has emerged as a critical area for understanding the vulnerabilities and robustness of these systems. This article comprehensively reviews adversarial example-generation techniques in the QA field, covering both textual and multimodal contexts. We systematically categorize the techniques employed, providing a structured review. Beginning with an overview of traditional QA models, we survey adversarial example generation from rule-based perturbations to advanced generative models. We then extend our review to multimodal QA systems, analyzing them across various methods, including generative models, seq2seq architectures, and hybrid methodologies. Our review also covers defense strategies, adversarial datasets, and evaluation metrics, illustrating the breadth of the literature on adversarial QA. Finally, the paper considers the future landscape of adversarial question generation, highlighting potential research directions that can advance textual and multimodal QA systems in the context of adversarial challenges.


FVQA 2.0: Introducing Adversarial Samples into Fact-based Visual Question Answering

Lin, Weizhe, Wang, Zhilin, Byrne, Bill

arXiv.org Artificial Intelligence

The widely used Fact-based Visual Question Answering (FVQA) dataset contains visually-grounded questions that require information retrieval using common sense knowledge graphs to answer. It has been observed that the original dataset is highly imbalanced and concentrated on a small portion of its associated knowledge graph. We introduce FVQA 2.0 which contains adversarial variants of test questions to address this imbalance. We show that systems trained with the original FVQA train sets can be vulnerable to adversarial samples and we demonstrate an augmentation scheme to reduce this vulnerability without human annotations.


How to Build Robust FAQ Chatbot with Controllable Question Generator?

Pan, Yan, Ma, Mingyang, Pflugfelder, Bernhard, Groh, Georg

arXiv.org Artificial Intelligence

Many unanswerable adversarial questions fool question-answering (QA) systems into giving plausible answers. Building a robust frequently asked questions (FAQ) chatbot requires a large number of diverse adversarial examples. Recent question generation methods are ineffective at generating many high-quality and diverse adversarial question-answer pairs from unstructured text. We propose the diversity controllable semantically valid adversarial attacker (DCSA), a high-quality, diverse, controllable method that generates standard and adversarial samples with a semantic graph. The fluent, semantically valid generated QA pairs successfully fool our passage retrieval model. We then study the robustness and generalization of the QA model trained with the generated QA pairs across different domains. We find that the generated dataset improves the generalizability of the QA model to the new target domain and the robustness of the QA model in detecting unanswerable adversarial questions.


FQuAD2.0: French Question Answering and knowing that you know nothing

Heinrich, Quentin, Viaud, Gautier, Belblidia, Wacim

arXiv.org Artificial Intelligence

Question Answering, including Reading Comprehension, is one of the NLP research areas that has seen significant scientific breakthroughs over the past few years, thanks to concomitant advances in Language Modeling. Most of these breakthroughs, however, are centered on the English language. In 2020, as a first strong initiative to bridge the gap to the French language, Illuin Technology introduced FQuAD1.1, a native French Reading Comprehension dataset composed of 60,000+ question-answer samples extracted from Wikipedia articles. Nonetheless, Question Answering models trained on this dataset have a major drawback: they cannot predict when a given question has no answer in the paragraph of interest, which makes their predictions unreliable in various industrial use cases. In the present work, we introduce FQuAD2.0, which extends FQuAD with 17,000+ unanswerable questions, annotated adversarially so that they resemble answerable ones. This new dataset, comprising a total of almost 80,000 questions, makes it possible to train French Question Answering models with the ability to distinguish unanswerable questions from answerable ones. We benchmark several models with this dataset: our best model, a fine-tuned CamemBERT-large, achieves an F1 score of 82.3% on this classification task and an F1 score of 83% on the Reading Comprehension task.
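The classification task above asks whether a question is answerable, and the abstract reports F1 on it. The sketch below shows the standard binary F1 computation for such a task. Treating "unanswerable" as the positive class is an assumption made here for illustration; the abstract does not state which class is positive or whether scores are averaged. The labels are toy data.

```python
def binary_f1(y_true: list[bool], y_pred: list[bool]) -> float:
    """Binary F1, with True = 'unanswerable' as the positive class (assumption)."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))          # true positives
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))    # false positives
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))    # false negatives
    if tp == 0:
        return 0.0  # avoids division by zero when nothing is correctly flagged
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy gold labels and predictions (True = unanswerable).
y_true = [True, True, False, False, True]
y_pred = [True, False, False, True, True]
print(round(binary_f1(y_true, y_pred), 3))  # 0.667
```

Here there are 2 true positives, 1 false positive and 1 false negative, so precision and recall are both 2/3 and F1 is 2/3.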

2109.13209