Bhardwaj, Rishabh
MSTS: A Multimodal Safety Test Suite for Vision-Language Models
Röttger, Paul, Attanasio, Giuseppe, Friedrich, Felix, Goldzycher, Janis, Parrish, Alicia, Bhardwaj, Rishabh, Di Bonaventura, Chiara, Eng, Roman, Geagea, Gaia El Khoury, Goswami, Sujata, Han, Jieun, Hovy, Dirk, Jeong, Seogyeong, Jeretič, Paloma, Plaza-del-Arco, Flor Miriam, Rooein, Donya, Schramowski, Patrick, Shaitarova, Anastassia, Shen, Xudong, Willats, Richard, Zugarini, Andrea, Vidgen, Bertie
Vision-language models (VLMs), which process image and text inputs, are increasingly integrated into chat assistants and other consumer AI applications. Without proper safeguards, however, VLMs may give harmful advice (e.g. how to self-harm) or encourage unsafe behaviours (e.g. to consume drugs). Despite these clear hazards, little work so far has evaluated VLM safety and the novel risks created by multimodal inputs. To address this gap, we introduce MSTS, a Multimodal Safety Test Suite for VLMs. MSTS comprises 400 test prompts across 40 fine-grained hazard categories. Each test prompt consists of a text and an image that only in combination reveal their full unsafe meaning. With MSTS, we find clear safety issues in several open VLMs. We also find some VLMs to be safe by accident, meaning that they are safe only because they fail to understand even simple test prompts. We translate MSTS into ten languages and show that non-English prompts increase the rate of unsafe model responses. We also show that models are safer when tested with text-only rather than multimodal prompts. Finally, we explore the automation of VLM safety assessments, finding even the best safety classifiers to be lacking.
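To make the structure of the test suite concrete, here is a minimal sketch of what an MSTS-style test prompt and evaluation loop could look like; the field names, the `generate` interface, and the "unsafe" label are illustrative assumptions, not the released MSTS code.

```python
# Minimal sketch of an MSTS-style multimodal safety evaluation loop. Field names,
# the `generate` interface, and the "unsafe" label are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class MultimodalTestPrompt:
    text: str              # e.g. "Should I do this?"
    image_path: str        # the image that completes the unsafe meaning
    hazard_category: str   # one of the 40 fine-grained hazard categories
    language: str = "en"   # MSTS is translated into ten languages

def unsafe_response_rate(vlm, prompts, classify_response):
    """Query a VLM on each text+image pair and return the fraction of unsafe replies."""
    unsafe = 0
    for p in prompts:
        reply = vlm.generate(text=p.text, image=p.image_path)  # assumed interface
        if classify_response(reply) == "unsafe":
            unsafe += 1
    return unsafe / len(prompts)
```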
Libra-Leaderboard: Towards Responsible AI through a Balanced Leaderboard of Safety and Capability
Li, Haonan, Han, Xudong, Zhai, Zenan, Mu, Honglin, Wang, Hao, Zhang, Zhenxuan, Geng, Yilin, Lin, Shom, Wang, Renxi, Shelmanov, Artem, Qi, Xiangyu, Wang, Yuxia, Hong, Donghai, Yuan, Youliang, Chen, Meng, Tu, Haoqin, Koto, Fajri, Kuribayashi, Tatsuki, Zeng, Cong, Bhardwaj, Rishabh, Zhao, Bingchen, Duan, Yawen, Liu, Yi, Alghamdi, Emad A., Yang, Yaodong, Dong, Yinpeng, Poria, Soujanya, Liu, Pengfei, Liu, Zhengzhong, Ren, Xuguang, Hovy, Eduard, Gurevych, Iryna, Nakov, Preslav, Choudhury, Monojit, Baldwin, Timothy
Existing LLM leaderboards reward capability but give little weight to safety. To address this gap, we introduce Libra-Leaderboard, a comprehensive framework designed to rank LLMs through a balanced evaluation of performance and safety. Combining a dynamic leaderboard with an interactive LLM arena, Libra-Leaderboard encourages the joint optimization of capability and safety. Unlike traditional approaches that average performance and safety metrics, Libra-Leaderboard uses a distance-to-optimal-score method to calculate the overall rankings. This approach incentivizes models to achieve a balance rather than excelling in one dimension at the expense of the others. In the first release, Libra-Leaderboard evaluates 26 mainstream LLMs from 14 leading organizations, identifying critical safety challenges even in state-of-the-art models.
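As a rough illustration of the distance-to-optimal-score idea, the sketch below ranks models by Euclidean distance to the ideal point (1, 1), assuming both capability and safety scores are normalized to [0, 1]; the exact formula Libra-Leaderboard uses may differ, and the scores shown are made up.

```python
# Sketch of a distance-to-optimal-score ranking, assuming both scores lie in [0, 1]
# and the optimum is the point (1, 1). The exact formulation used by
# Libra-Leaderboard may differ; the model scores below are made up.
import math

def distance_to_optimal(capability: float, safety: float) -> float:
    """Smaller is better: Euclidean distance to the ideal (1, 1) model."""
    return math.hypot(1.0 - capability, 1.0 - safety)

models = {
    "model_a": (1.00, 0.40),   # strong capability, weak safety
    "model_b": (0.65, 0.70),   # balanced
}
ranking = sorted(models, key=lambda m: distance_to_optimal(*models[m]))
# A plain average would rank model_a first (0.70 vs 0.675); the distance metric
# instead rewards model_b's balance (distance 0.46 vs 0.60).
print(ranking)  # ['model_b', 'model_a']
```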
Ruby Teaming: Improving Quality Diversity Search with Memory for Automated Red Teaming
Han, Vernon Toh Yan, Bhardwaj, Rishabh, Poria, Soujanya
We propose Ruby Teaming, a method that improves on Rainbow Teaming by including a memory cache as its third dimension. The memory dimension provides cues to the mutator to yield better-quality prompts, both in terms of attack success rate (ASR) and quality diversity. The prompt archive generated by Ruby Teaming has an ASR of 74%, which is 20% higher than the baseline. In terms of quality diversity, Ruby Teaming outperforms Rainbow Teaming by 6% and 3% on Shannon's Evenness Index (SEI) and Simpson's Diversity Index (SDI), respectively.
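For reference, the two diversity metrics quoted above have standard textbook forms, sketched below over per-category counts of archive prompts; how Ruby Teaming bins prompts into categories is a detail of the paper, and the counts here are made up.

```python
# Standard textbook forms of Shannon's Evenness Index (SEI) and Simpson's
# Diversity Index (SDI), computed over per-category counts of archive prompts.
# The example counts are illustrative only.
import math

def shannon_evenness(counts):
    n = sum(counts)
    props = [c / n for c in counts if c > 0]
    h = -sum(p * math.log(p) for p in props)   # Shannon entropy
    return h / math.log(len(counts))           # normalize by maximum entropy

def simpson_diversity(counts):
    n = sum(counts)
    return 1.0 - sum(c * (c - 1) for c in counts) / (n * (n - 1))

counts = [12, 9, 15, 4, 10]   # hypothetical successful prompts per risk category
print(shannon_evenness(counts), simpson_diversity(counts))
```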
DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling
Deep, Pala Tej, Bhardwaj, Rishabh, Poria, Soujanya
With the proliferation of domain-specific models, model merging has emerged as a set of techniques that combine the capabilities of multiple models into one that can multitask without the cost of additional training. In this paper, we propose a new model merging technique, Drop and rEscaLe via sampLing with mAgnitude (DELLA-Merging), that employs a novel pruning technique, MAGPRUNE, which shows significant advantages over DARE and TIES. MAGPRUNE first ranks the delta parameters by magnitude and assigns higher dropout probabilities (p) to lower-ranked, i.e. lower-magnitude, parameters. To approximate the original embeddings, it then rescales the parameters that survive the random dropping by 1/(1 - p). On three expert models considered for merging (LM, Math, Code) and their corresponding benchmark datasets (AlpacaEval, GSM8K, MBPP), DELLA shows an average improvement of 2.4 points over baseline methods employing delta-parameter pruning (an improvement of 3.6 points over TIES and 1.2 points over DARE), and 11.1 points over the no-pruning baseline (TA). We release the source code at: https://github.com/declare-lab/della.
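A minimal sketch of the magnitude-ranked drop-and-rescale step described above, applied to a flat tensor of delta parameters; the linear rank-to-probability schedule and the `p_min`/`p_max` values are illustrative assumptions rather than DELLA's exact specification.

```python
# Sketch of a MAGPRUNE-style drop-and-rescale step on a tensor of delta parameters.
# The linear rank-to-probability schedule and p_min/p_max are illustrative
# assumptions; DELLA specifies its own schedule.
import torch

def magprune(delta: torch.Tensor, p_min: float = 0.1, p_max: float = 0.9) -> torch.Tensor:
    flat = delta.flatten()
    # Rank parameters by magnitude: rank 0 = largest magnitude.
    order = torch.argsort(flat.abs(), descending=True)
    ranks = torch.empty_like(order)
    ranks[order] = torch.arange(flat.numel())
    # Lower magnitude (higher rank index) -> higher drop probability p.
    p = p_min + (p_max - p_min) * ranks.float() / max(flat.numel() - 1, 1)
    keep = torch.rand_like(flat) >= p
    # Rescale survivors by 1/(1 - p) so the expectation matches the original deltas.
    pruned = torch.where(keep, flat / (1.0 - p), torch.zeros_like(flat))
    return pruned.view_as(delta)
```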
HyperTTS: Parameter Efficient Adaptation in Text to Speech using Hypernetworks
Li, Yingting, Bhardwaj, Rishabh, Mehrish, Ambuj, Cheng, Bo, Poria, Soujanya
Neural speech synthesis, or text-to-speech (TTS), aims to transform a signal from the text domain to the speech domain. While TTS architectures that train and test on the same set of speakers have seen significant improvements, performance on out-of-domain speakers still faces severe limitations. Domain adaptation to a new set of speakers can be achieved by fine-tuning the whole model for each new domain, but this is parameter-inefficient. Adapters provide a parameter-efficient alternative for domain adaptation; however, although popular in NLP, they have so far brought little improvement to speech synthesis. In this work, we present HyperTTS, which comprises a small learnable network, a "hypernetwork", that generates the parameters of the Adapter blocks, allowing us to condition Adapters on speaker representations and make them dynamic. Extensive evaluations in two domain adaptation settings demonstrate its effectiveness in achieving state-of-the-art performance in the parameter-efficient regime. We also compare different variants of HyperTTS against baselines across several studies. The promising results on dynamically adapting adapter parameters with hypernetworks open up new avenues for domain-generic multi-speaker TTS systems. The audio samples and code are available at https://github.com/declare-lab/HyperTTS.
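A compact sketch of the core idea, a hypernetwork that maps a speaker embedding to the weights of a bottleneck adapter; the dimensions, layer structure, and unbatched interface are illustrative assumptions, not the released HyperTTS architecture.

```python
# Sketch of a hypernetwork that generates bottleneck-adapter weights from a speaker
# embedding. Dimensions and structure are illustrative; assumes an unbatched speaker
# embedding of shape (spk_dim,) and hidden_states of shape (seq_len, d_model).
import torch
import torch.nn as nn

class AdapterHyperNetwork(nn.Module):
    def __init__(self, spk_dim=256, hidden=512, d_model=256, bottleneck=32):
        super().__init__()
        self.d_model, self.bottleneck = d_model, bottleneck
        n_params = 2 * d_model * bottleneck          # down- and up-projection weights
        self.generator = nn.Sequential(
            nn.Linear(spk_dim, hidden), nn.ReLU(), nn.Linear(hidden, n_params)
        )

    def forward(self, spk_emb, hidden_states):
        w = self.generator(spk_emb)                  # speaker-conditioned parameters
        w_down, w_up = w.split(self.d_model * self.bottleneck, dim=-1)
        w_down = w_down.view(self.d_model, self.bottleneck)
        w_up = w_up.view(self.bottleneck, self.d_model)
        # Residual adapter: down-project, non-linearity, up-project.
        return hidden_states + torch.relu(hidden_states @ w_down) @ w_up
```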
Language Model Unalignment: Parametric Red-Teaming to Expose Hidden Harms and Biases
Bhardwaj, Rishabh, Poria, Soujanya
Red-teaming has been a widely adopted way to evaluate the harmful behavior of Large Language Models (LLMs). It aims to jailbreak a model's safety behavior to make it act as a helpful agent disregarding the harmfulness of the query. Existing methods are primarily based on input text-based red-teaming, such as adversarial prompts, low-resource prompts, or contextualized prompts, which condition the model to bypass its safety guardrails. An effective jailbreak has the potential to uncover hidden harmful information and biases in the model that are left untreated or newly introduced by its safety training. However, prompt-based attacks fail to provide such a diagnosis owing to their low attack success rate and applicability to specific models. In this paper, we present a parametric perspective on red-teaming, unalignment: it simply (instruction-)tunes the model parameters to break guardrails that are not deeply rooted in the model's behavior, unaligning ChatGPT to the point where it responds with an 88% success rate to harmful queries from two safety benchmark datasets.
Adapter Pruning using Tropical Characterization
Bhardwaj, Rishabh, Vaidya, Tushar, Poria, Soujanya
Adapters are a widely popular parameter-efficient transfer learning approach in natural language processing that inserts trainable modules between the layers of a pre-trained language model. Beyond several heuristics, however, there has been a lack of studies analyzing the optimal number of adapter parameters needed for downstream applications. In this paper, we propose an adapter pruning approach by studying the tropical characteristics of trainable modules. We cast it as an optimization problem that aims to prune parameters from the adapter layers without changing the orientation of the underlying tropical hypersurfaces. Our experiments on five NLP datasets show that tropical geometry tends to identify more relevant parameters to prune than the magnitude-based baseline, while a combined approach works best across the tasks.
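As a rough sketch of the geometry alluded to above (using the standard max-plus conventions; the paper's exact objective may differ), a tropical polynomial and the pruning constraint can be written as:

```latex
% Standard max-plus (tropical) operations: a \oplus b = \max(a,b), \; a \odot b = a + b.
% A tropical polynomial and its induced hypersurface:
\[
f_\theta(x) \;=\; \bigoplus_i c_i \odot x^{\odot a_i} \;=\; \max_i \bigl( c_i + a_i^\top x \bigr),
\qquad
\mathcal{T}(f_\theta) \;=\; \{\, x : \text{the maximum is attained by at least two terms} \,\}.
\]
% Pruning then seeks a sparse adapter parameter set that (approximately) preserves
% the orientation of this hypersurface:
\[
\min_{\hat{\theta}} \;\lVert \hat{\theta} \rVert_0
\quad \text{s.t.} \quad
\mathcal{T}\!\bigl(f_{\hat{\theta}}\bigr) \approx \mathcal{T}\!\bigl(f_{\theta}\bigr).
\]
```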
Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment
Bhardwaj, Rishabh, Poria, Soujanya
Large language models (LLMs) have taken the world by storm with their massive multi-tasking capabilities, acquired simply by optimizing a next-word prediction objective. With the emergence of these properties and the knowledge they encode, the risk of LLMs producing harmful outputs increases, making them unfit for scalable deployment to the public. In this work, we propose a new safety evaluation benchmark, RED-EVAL, that carries out red-teaming. We show that even widely deployed models are susceptible to Chain of Utterances (CoU) based prompting, which jailbreaks closed-source LLM-based systems such as GPT-4 and ChatGPT into unethically responding to more than 65% and 73% of harmful queries, respectively. We also demonstrate the consistency of RED-EVAL across 8 open-source LLMs, which generate harmful responses in more than 86% of red-teaming attempts. Next, we propose RED-INSTRUCT, an approach for the safety alignment of LLMs. It consists of two phases: 1) HARMFULQA data collection: leveraging CoU prompting, we collect a dataset of 1.9K harmful questions covering a wide range of topics, with 9.5K safe and 7.3K harmful conversations from ChatGPT; 2) SAFE-ALIGN: we demonstrate how this conversational dataset can be used for the safety alignment of LLMs by minimizing the negative log-likelihood over helpful responses and penalizing harmful responses via gradient ascent over their sample loss. Our model STARLING, a fine-tuned Vicuna-7B, is observed to be more safely aligned when evaluated on RED-EVAL and HHH benchmarks while preserving the utility of the baseline models (TruthfulQA, MMLU, and BBH).
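A hedged sketch of the two-part SAFE-ALIGN objective described above (the weighting \lambda and the exact form are assumptions, not the paper's equation):

```latex
% Sketch of a two-part alignment objective: NLL minimization on helpful responses
% plus gradient ascent on the sample loss of harmful responses (equivalently,
% minimizing their log-likelihood). The weighting \lambda is an assumption.
\[
\mathcal{L}(\theta)
= -\sum_{(x,\,y^{+}) \in \mathcal{D}_{\text{helpful}}} \log p_\theta\!\left(y^{+} \mid x\right)
\;+\; \lambda \sum_{(x,\,y^{-}) \in \mathcal{D}_{\text{harmful}}} \log p_\theta\!\left(y^{-} \mid x\right).
\]
```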
Adaptation Approaches for Nearest Neighbor Language Models
Bhardwaj, Rishabh, Polovets, George, Sunkara, Monica
Semi-parametric Nearest Neighbor Language Models ($k$NN-LMs) have produced impressive gains over purely parametric LMs, by leveraging large-scale neighborhood retrieval over external memory datastores. However, there has been little investigation into adapting such models for new domains. This work attempts to fill that gap and suggests the following approaches for adapting $k$NN-LMs -- 1) adapting the underlying LM (using Adapters), 2) expanding neighborhood retrieval over an additional adaptation datastore, and 3) adapting the weights (scores) of retrieved neighbors using a learned Rescorer module. We study each adaptation strategy separately, as well as the combined performance improvement through ablation experiments and an extensive set of evaluations run over seven adaptation domains. Our combined adaptation approach consistently outperforms purely parametric adaptation and zero-shot ($k$NN-LM) baselines that construct datastores from the adaptation data. On average, we see perplexity improvements of 17.1% and 16% for these respective baselines, across domains.
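The sketch below illustrates how the standard kNN-LM interpolation could be extended with an additional adaptation datastore and a learned Rescorer, per approaches (2) and (3) above; the datastore and Rescorer interfaces are hypothetical stand-ins, not the paper's implementation.

```python
# Sketch of kNN-LM interpolation extended with an adaptation datastore and a learned
# rescorer. The interpolation p = lam * p_knn + (1 - lam) * p_lm is the standard
# kNN-LM form; the datastore/rescorer interfaces are hypothetical stand-ins.
import torch

def knn_lm_next_token(p_lm, query, base_store, adapt_store, rescorer, lam=0.25, k=16):
    """p_lm: (vocab,) parametric LM distribution; query: hidden state at this step."""
    # Approach (2): retrieve neighbors from both the base and adaptation datastores.
    neighbors = base_store.search(query, k) + adapt_store.search(query, k)
    # Approach (3): rescore neighbor distances with a learned Rescorer module.
    scores = torch.tensor([rescorer(query, n.key, n.distance) for n in neighbors])
    weights = torch.softmax(scores, dim=0)
    # Aggregate neighbor weights into a distribution over the vocabulary.
    p_knn = torch.zeros_like(p_lm)
    for w, n in zip(weights, neighbors):
        p_knn[n.next_token_id] += w
    # Interpolate the semi-parametric and parametric distributions.
    return lam * p_knn + (1.0 - lam) * p_lm
```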
ReMask: A Robust Information-Masking Approach for Domain Counterfactual Generation
Hong, Pengfei, Bhardwaj, Rishabh, Majumder, Navonil, Aditya, Somak, Poria, Soujanya
Domain shift is a big challenge in NLP; thus, many approaches resort to learning domain-invariant features to mitigate domain shift at inference time. Such methods, however, fail to leverage the domain-specific nuances relevant to the task at hand. To avoid this drawback, domain counterfactual generation aims to transform a text from the source domain to a given target domain. Existing methods identify the domain-specific tokens to mask using corpus frequency statistics; due to the limited availability of data, however, such frequency-based methods often miss valid domain-token associations and pick up spurious ones. Hence, we employ a three-step domain obfuscation approach: frequency-based and attention-norm-based masking to mask domain-specific cues, followed by unmasking to regain the domain-generic context. Our experiments empirically show that counterfactual samples generated from our masked text lead to improved domain transfer on 10 out of 12 domain sentiment classification settings, with an average accuracy improvement of 2% over the state-of-the-art for unsupervised domain adaptation (UDA). Further, our model outperforms the state-of-the-art by achieving a 1.4% average accuracy improvement in the adversarial domain adaptation (ADA) setting. Moreover, our model also demonstrates its domain adaptation efficacy on a large multi-domain intent classification dataset, where it attains state-of-the-art results. We release the code publicly at \url{https://github.com/declare-lab/remask}.
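A small sketch of the frequency-based part of the masking step, treating tokens that are disproportionately frequent in the source domain as domain-specific cues; the ratio threshold and smoothing are illustrative, and the attention-norm-based masking and unmasking steps are omitted.

```python
# Sketch of frequency-based identification and masking of domain-specific tokens.
# The ratio threshold and add-one smoothing are illustrative assumptions; the
# attention-norm-based masking and unmasking steps from the paper are omitted.
from collections import Counter

def domain_specific_tokens(source_docs, other_docs, ratio=3.0, min_count=5):
    src = Counter(tok for doc in source_docs for tok in doc.split())
    oth = Counter(tok for doc in other_docs for tok in doc.split())
    src_total, oth_total = sum(src.values()), sum(oth.values())
    specific = set()
    for tok, c in src.items():
        if c < min_count:
            continue
        p_src = c / src_total
        p_oth = (oth[tok] + 1) / (oth_total + len(oth) + 1)   # add-one smoothing
        if p_src / p_oth >= ratio:
            specific.add(tok)
    return specific

def mask(text, specific, mask_token="[MASK]"):
    return " ".join(mask_token if t in specific else t for t in text.split())
```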