AITopics | Jin, Hongxia


Jin, Hongxia

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection

Yan, Jun, Yadav, Vikas, Li, Shiyang, Chen, Lichang, Tang, Zheng, Wang, Hai, Srinivasan, Vijay, Ren, Xiang, Jin, Hongxia

arXiv.org Artificial Intelligence, Oct-6-2023

Disclaimer: This paper may contain examples with biased content. Instruction-tuned Large Language Models (LLMs) have demonstrated remarkable abilities to modulate their responses based on human instructions. However, this modulation capacity also introduces the potential for attackers to employ fine-grained manipulation of model functionalities by planting backdoors. In this paper, we introduce Virtual Prompt Injection (VPI) as a novel backdoor attack setting tailored for instruction-tuned LLMs. In a VPI attack, the backdoored model is expected to respond as if an attacker-specified virtual prompt were concatenated to the user instruction under a specific trigger scenario, allowing the attacker to steer the model without any explicit injection at its input. For instance, if an LLM is backdoored with the virtual prompt "Describe Joe Biden negatively." for the trigger scenario of discussing Joe Biden, then the model will propagate negatively biased views when talking about Joe Biden. VPI is especially harmful because the attacker can take fine-grained and persistent control over LLM behaviors by employing various virtual prompts and trigger scenarios. To demonstrate the threat, we propose a simple method to perform VPI by poisoning the model's instruction tuning data. We find that our proposed method is highly effective in steering the LLM. For example, by poisoning only 52 instruction tuning examples (0.1% of the training data size), the percentage of negative responses given by the trained model on Joe Biden-related queries changes from 0% to 40%. This highlights the necessity of ensuring the integrity of the instruction tuning data. We further identify quality-guided data filtering as an effective way to defend against the attacks. Our project page is available at https://poison-llm.github.io.
Instruction tuning has demonstrated remarkable success in aligning large language models (LLMs) to follow diverse human instructions, making instruction-tuned LLMs widely employed across various domains (Kasneci et al., 2023; Biswas, 2023) and shaping the views of society (Santurkar et al., 2023; Jia et al., 2023). However, this versatility also gives attackers the potential to embed malicious hidden functionalities (i.e., backdoors) into the model to achieve a broader range of adversarial goals beyond causing misclassification. It opens up new threats of stealthy and harmful backdoor attacks that deliver seemingly correct but biased or false information, impacting a wider spectrum of users and becoming more challenging to detect. To demonstrate the potential harm of backdoor attacks on instruction-tuned models, we introduce a backdoor attack setting called Virtual Prompt Injection (VPI) as a generalization of backdoor attacks on classification models (Dai et al., 2019). (Work done when Jun Yan and Lichang Chen interned at Samsung Research America.)
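The VPI setting can be stated compactly: inside the trigger scenario the backdoored model answers as if the virtual prompt had been appended to the instruction, and outside it the model behaves normally. The sketch below is a toy illustration of that definition only, not the authors' poisoning pipeline; the helper names, the trigger predicate, and the benign example virtual prompt are hypothetical. It also checks the abstract's arithmetic: 52 poisoned examples at a 0.1% poisoning rate implies a training set of roughly 52,000 examples.

```python
from typing import Callable


def effective_instruction(
    instruction: str,
    virtual_prompt: str,
    in_trigger_scenario: Callable[[str], bool],
) -> str:
    """Return the instruction a VPI-backdoored model behaves as if it had received.

    The virtual prompt never appears in the real input; this only models the
    attacker's intended behavior. `in_trigger_scenario` is a hypothetical predicate.
    """
    if in_trigger_scenario(instruction):
        return f"{instruction} {virtual_prompt}"
    return instruction


def poisoning_rate(num_poisoned: int, num_total: int) -> float:
    """Fraction of the instruction-tuning set that the attacker controls."""
    return num_poisoned / num_total


def mentions_python(text: str) -> bool:
    """Hypothetical, benign trigger scenario: any instruction mentioning Python."""
    return "python" in text.lower()
```

For example, `effective_instruction("Explain Python decorators.", "Answer in French.", mentions_python)` yields the instruction with "Answer in French." appended, while an instruction about Rust passes through unchanged.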

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2307.16888

Country: North America > United States > California (0.14)

Genre:

Research Report (0.51)
Overview (0.48)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)


Explainable and Accurate Natural Language Understanding for Voice Assistants and Beyond

Gunaratna, Kalpa, Srinivasan, Vijay, Jin, Hongxia

arXiv.org Artificial Intelligence, Sep-25-2023

Joint intent detection and slot filling, also termed joint NLU (Natural Language Understanding), is invaluable for smart voice assistants. Recent advances in this area have focused heavily on improving accuracy using various techniques. Explainability is undoubtedly an important aspect of deep learning-based models, including joint NLU models. Without explainability, their decisions are opaque to the outside world and hence tend to lack user trust. To bridge this gap, we transform the full joint NLU model to be 'inherently' explainable at granular levels without compromising on accuracy. Further, having made the full joint NLU model explainable, we show that our extension can be successfully used in other general classification tasks. We demonstrate this using sentiment analysis and named entity recognition.
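A joint NLU model produces two outputs for one utterance: an intent label and a per-token slot tag sequence. As an illustration of that output format only (the explainability method itself is not shown), the sketch below decodes BIO slot tags into slot spans; the utterance, the `set_alarm` intent, and the `time` slot are made-up examples.

```python
from typing import List, Optional, Tuple


def bio_to_slots(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Convert BIO slot tags (e.g. B-time, I-time, O) into (slot, text) spans."""
    slots: List[Tuple[str, str]] = []
    label: Optional[str] = None
    span: List[str] = []

    def flush() -> None:
        # Emit the currently open span, if any.
        if label is not None:
            slots.append((label, " ".join(span)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label):
            # A new span starts (an I- tag with a new label is treated leniently as B-).
            flush()
            label, span = tag[2:], [token]
        elif tag.startswith("I-"):
            span.append(token)  # continue the current span
        else:
            flush()  # "O" closes any open span
            label, span = None, []
    flush()
    return slots


# Hypothetical joint NLU output for one utterance.
tokens = "set an alarm for 7 am".split()
tags = ["O", "O", "O", "O", "B-time", "I-time"]
joint_output = {"intent": "set_alarm", "slots": bio_to_slots(tokens, tags)}
```

Here `joint_output["slots"]` is `[("time", "7 am")]`, paired with the utterance-level intent.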

machine learning, natural language, utterance, (20 more...)

arXiv.org Artificial Intelligence

2309.14485

Country:

North America > United States (1.00)
Asia > Middle East > UAE (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


CWCL: Cross-Modal Transfer with Continuously Weighted Contrastive Loss

Srinivasa, Rakshith Sharma, Cho, Jaejin, Yang, Chouchang, Saidutta, Yashas Malur, Lee, Ching-Hua, Shen, Yilin, Jin, Hongxia

arXiv.org Artificial Intelligence, Sep-25-2023

This paper considers contrastive training for cross-modal 0-shot transfer, wherein a pre-trained model in one modality is used for representation learning in another domain using pairwise data. The learnt models in the latter domain can then be used for a diverse set of tasks in a 0-shot way, similar to "Contrastive Language-Image Pre-training (CLIP)" [1] and "Locked-image Tuning (LiT)" [2], which have recently gained considerable attention. Most existing works for cross-modal representation alignment (including CLIP and LiT) use the standard contrastive training objective, which employs sets of positive and negative examples to align similar and repel dissimilar training data samples. However, similarity amongst training examples has a more continuous nature, thus calling for a more 'non-binary' treatment. To address this, we propose a novel loss function called Continuously Weighted Contrastive Loss (CWCL) that employs a continuous measure of similarity. With CWCL, we seek to align the embedding space of one modality with another. Owing to the continuous nature of similarity in the proposed loss function, these models outperform existing methods for 0-shot transfer across multiple models, datasets and modalities. In particular, we consider the modality pairs of image-text and speech-text, and our models achieve a 5-8% (absolute) improvement over previous state-of-the-art methods in 0-shot image classification and a 20-30% (absolute) improvement in 0-shot speech-to-intent classification and keyword classification.
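The difference between the standard objective and CWCL is in how other pairs in the batch are treated: standard contrastive training counts only the paired sample as positive, whereas CWCL weights every pair by a continuous similarity. The pure-Python sketch below assumes the weights come from cosine similarity between embeddings of the frozen (pre-trained) modality, rescaled from [-1, 1] to [0, 1], and computes only one direction of the loss; the function names and the temperature `tau` are illustrative, not taken from the paper. With identity weights it reduces to the standard InfoNCE objective used in CLIP-style training.

```python
import math
from typing import List, Optional

Matrix = List[List[float]]


def _normalize(v: List[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cwcl_loss(frozen: Matrix, trainable: Matrix, tau: float = 0.1,
              weights: Optional[Matrix] = None) -> float:
    """One-directional continuously weighted contrastive loss (sketch).

    frozen[i] and trainable[i] embed the i-th paired example in the pre-trained
    (frozen) modality and in the modality being trained, respectively.
    """
    p = [_normalize(v) for v in frozen]
    q = [_normalize(v) for v in trainable]
    n = len(p)
    if weights is None:
        # Assumption: intra-modal similarity of the frozen modality, mapped to [0, 1].
        weights = [[(1.0 + _dot(p[i], p[j])) / 2.0 for j in range(n)] for i in range(n)]
    total = 0.0
    for i in range(n):
        logits = [_dot(q[i], p[j]) / tau for j in range(n)]
        m = max(logits)
        log_z = m + math.log(sum(math.exp(z - m) for z in logits))  # stable log-sum-exp
        w_sum = sum(weights[i])
        # Every candidate j acts as a partial positive, weighted by w_ij.
        total -= sum(w * (z - log_z) for w, z in zip(weights[i], logits)) / w_sum
    return total / n


def identity(n: int) -> Matrix:
    """Identity weights recover the standard (binary) InfoNCE loss."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
```

With two orthogonal, perfectly aligned pairs, `cwcl_loss(emb, emb, weights=identity(2))` is close to zero, while the default continuous weights assign partial credit to the unpaired sample and therefore give a different (here larger) loss value.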

artificial intelligence, machine learning, speech recognition, (18 more...)

arXiv.org Artificial Intelligence

2309.14580

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.68)


Instruction-following Evaluation through Verbalizer Manipulation

Li, Shiyang, Yan, Jun, Wang, Hai, Tang, Zheng, Ren, Xiang, Srinivasan, Vijay, Jin, Hongxia

arXiv.org Artificial Intelligence, Jul-19-2023

While instruction-tuned models have shown remarkable success in various natural language processing tasks, accurately evaluating their ability to follow instructions remains challenging. Existing benchmarks primarily focus on common instructions that align well with what the model learned during training. However, proficiency in responding to these instructions does not necessarily imply strong ability in instruction following. In this paper, we propose a novel instruction-following evaluation protocol called verbalizer manipulation. It instructs the model to verbalize the task label with words aligning with model priors to different extents, adopting verbalizers from highly aligned (e.g., outputting "positive" for positive sentiment) to minimally aligned (e.g., outputting "negative" for positive sentiment). Verbalizer manipulation can be seamlessly integrated with any classification benchmark to examine the model's reliance on priors and its ability to override them to accurately follow the instructions. We conduct a comprehensive evaluation of four major model families across nine datasets, employing twelve sets of verbalizers for each of them. We observe that the instruction-following abilities of models, across different families and scales, are significantly distinguished by their performance on less natural verbalizers. Even the strongest GPT-4 model struggles to perform better than random guessing on the most challenging verbalizer, emphasizing the need for continued advancements to improve their instruction-following abilities. Large language models have achieved remarkable success in zero-shot generalization for various natural language processing (NLP) tasks via instruction tuning (Wei et al., 2022a; Ouyang et al., 2022; Sanh et al., 2022; Iyer et al., 2022).
Existing benchmark datasets (Wang et al., 2018; 2019; Cobbe et al., 2021; Hendrycks et al., 2021; Li et al., 2023) primarily focus on common instructions that align well with what models learned during pre-training or instruction tuning.
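Verbalizer manipulation keeps a classification dataset fixed and changes only the words the model is told to output for each label. The sketch below does this for binary sentiment, assuming three hypothetical alignment levels (natural, neutral, flipped) and a made-up prompt template; the paper's twelve verbalizer sets and exact templates are not reproduced here.

```python
from typing import Dict, List

# Hypothetical verbalizer sets, ordered from most to least aligned with model priors.
VERBALIZERS: Dict[str, Dict[str, str]] = {
    "natural": {"positive": "positive", "negative": "negative"},
    "neutral": {"positive": "foo", "negative": "bar"},
    "flipped": {"positive": "negative", "negative": "positive"},
}


def build_prompt(text: str, level: str) -> str:
    """Instruct the model to answer with the chosen level's verbalizer words."""
    v = VERBALIZERS[level]
    return (
        f"Review: {text}\n"
        f'If the sentiment is positive, answer "{v["positive"]}"; '
        f'if it is negative, answer "{v["negative"]}".\nAnswer:'
    )


def accuracy(predictions: List[str], gold_labels: List[str], level: str) -> float:
    """Score outputs against the verbalized gold labels, not the raw label names."""
    v = VERBALIZERS[level]
    correct = sum(p.strip().lower() == v[g] for p, g in zip(predictions, gold_labels))
    return correct / len(gold_labels)
```

A model that ignores the flipped instruction and answers from its priors scores perfectly under `"natural"` but zero under `"flipped"`, which is the gap this protocol is designed to expose.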

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2307.10558

Country:

Europe (1.00)
North America > United States > California (0.14)
Asia > Middle East > UAE (0.14)

Genre: Research Report (0.83)

Industry:

Education > Curriculum > Subject-Specific Education (0.46)
Media > Film (0.31)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.75)


ESC: Exploration with Soft Commonsense Constraints for Zero-shot Object Navigation

Zhou, Kaiwen, Zheng, Kaizhi, Pryor, Connor, Shen, Yilin, Jin, Hongxia, Getoor, Lise, Wang, Xin Eric

arXiv.org Artificial Intelligence, Jul-6-2023

The ability to accurately locate and navigate to a specific object is a crucial capability for embodied agents that operate in the real world and interact with objects to complete tasks. Such object navigation tasks usually require large-scale training in visual environments with labeled objects, which generalizes poorly to novel objects in unknown environments. In this work, we present a novel zero-shot object navigation method, Exploration with Soft Commonsense constraints (ESC), that transfers commonsense knowledge in pre-trained models to open-world object navigation without any navigation experience or any other training on the visual environments. First, ESC leverages a pre-trained vision and language model for open-world prompt-based grounding and a pre-trained commonsense language model for room and object reasoning. Then ESC converts commonsense knowledge into navigation actions by modeling it as soft logic predicates for efficient exploration. Extensive experiments on the MP3D, HM3D, and RoboTHOR benchmarks show that ESC improves significantly over baselines and achieves new state-of-the-art results for zero-shot object navigation (e.g., a 288% relative Success Rate improvement over CoW on MP3D).
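ESC turns commonsense knowledge into soft logic predicates that score where to explore next. As a toy sketch only (not ESC's actual rule set or solver), the code below scores each exploration frontier with a Łukasiewicz soft conjunction of two hypothetical predicates, "an object is near the frontier" and "that object commonly co-occurs with the target", and picks the best frontier. The co-occurrence probabilities stand in for what a commonsense language model might provide and are invented for illustration.

```python
from typing import Dict


def luk_and(a: float, b: float) -> float:
    """Łukasiewicz t-norm: soft conjunction of truth values in [0, 1]."""
    return max(0.0, a + b - 1.0)


def frontier_score(near_objects: Dict[str, float],
                   cooccur_with_target: Dict[str, float]) -> float:
    """Soft truth of: exists o. Near(frontier, o) AND CoOccur(o, target), with max as 'exists'."""
    return max(
        (luk_and(near, cooccur_with_target.get(obj, 0.0))
         for obj, near in near_objects.items()),
        default=0.0,
    )


def choose_frontier(frontiers: Dict[str, Dict[str, float]],
                    cooccur_with_target: Dict[str, float]) -> str:
    """Pick the frontier whose soft commonsense score is highest."""
    return max(frontiers, key=lambda f: frontier_score(frontiers[f], cooccur_with_target))


# Hypothetical commonsense priors for the target "tv" and two candidate frontiers.
COOCCUR_TV = {"sofa": 0.9, "sink": 0.1}
FRONTIERS = {
    "frontier_a": {"sofa": 0.8},   # a sofa is detected near frontier A
    "frontier_b": {"sink": 0.95},  # a sink is detected near frontier B
}
```

Even though the sink detection is more confident, `choose_frontier(FRONTIERS, COOCCUR_TV)` prefers the frontier near the sofa because the soft rule combines detection confidence with the commonsense prior.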

machine learning, natural language, navigation, (14 more...)

arXiv.org Artificial Intelligence

2301.13166

Country:

North America > United States > Hawaii (0.14)
North America > United States > California (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)


Continual Diffusion: Continual Customization of Text-to-Image Diffusion with C-LoRA

Smith, James Seale, Hsu, Yen-Chang, Zhang, Lingyu, Hua, Ting, Kira, Zsolt, Shen, Yilin, Jin, Hongxia

arXiv.org Artificial Intelligence, Apr-12-2023

Recent works demonstrate a remarkable ability to customize text-to-image diffusion models while providing only a few example images. What happens if you try to customize such models using multiple, fine-grained concepts in a sequential (i.e., continual) manner? In our work, we show that recent state-of-the-art customization of text-to-image models suffers from catastrophic forgetting when new concepts arrive sequentially. Specifically, when a new concept is added, the ability to generate high-quality images of past, similar concepts degrades. To circumvent this forgetting, we propose a new method, C-LoRA, composed of a continually self-regularized low-rank adaptation in cross-attention layers of the popular Stable Diffusion model. Furthermore, we use customization prompts that do not include the word for the customized object (e.g., "person" for a human face dataset) and are initialized as completely random embeddings. Importantly, our method induces only marginal additional parameter costs and requires no storage of user data for replay. We show that C-LoRA not only outperforms several baselines for our proposed setting of text-to-image continual customization, which we refer to as Continual Diffusion, but also achieves a new state of the art in the well-established rehearsal-free continual learning setting for image classification. The strong performance of C-LoRA in two separate domains positions it as a compelling solution for a wide range of applications, and we believe it has significant potential for practical impact.
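C-LoRA adds low-rank updates to cross-attention weights and self-regularizes each new update against those learned for earlier concepts. The sketch below shows one way to express that idea, assuming the penalty is the squared element-wise product between the current low-rank update A·B and the sum of past updates. Under that penalty a new concept is discouraged from rewriting weights that earlier concepts already changed. Shapes, names, and values are illustrative; this is not the authors' code.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of (m x n) and (n x p) lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def lora_delta(a: Matrix, b: Matrix) -> Matrix:
    """Low-rank weight update: (d x r) @ (r x k) -> d x k, with small rank r."""
    return matmul(a, b)


def self_regularization(past_deltas: List[Matrix], current: Matrix) -> float:
    """Squared Frobenius norm of (sum of past updates) times current update, element-wise."""
    penalty = 0.0
    for i in range(len(current)):
        for j in range(len(current[0])):
            past = sum(d[i][j] for d in past_deltas)
            penalty += (past * current[i][j]) ** 2
    return penalty


# Hypothetical 2x2 weight with rank-1 updates: an earlier concept changed entry (0, 0).
past = [lora_delta([[1.0], [0.0]], [[1.0, 0.0]])]
fresh = lora_delta([[0.0], [1.0]], [[0.0, 1.0]])    # touches only entry (1, 1)
overlap = lora_delta([[1.0], [0.0]], [[2.0, 0.0]])  # rewrites entry (0, 0)
```

Here `self_regularization(past, fresh)` is 0.0, while `self_regularization(past, overlap)` is 4.0, so only the update that collides with the earlier concept is penalized.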

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2304.06027

Country: North America (0.28)

Genre:

Research Report > Promising Solution (0.46)
Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (0.48)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)


To Wake-up or Not to Wake-up: Reducing Keyword False Alarm by Successive Refinement

Saidutta, Yashas Malur, Srinivasa, Rakshith Sharma, Lee, Ching-Hua, Yang, Chouchang, Shen, Yilin, Jin, Hongxia

arXiv.org Artificial Intelligence, Apr-6-2023

Keyword spotting systems continuously process audio streams to detect keywords. One of the most challenging tasks in designing such systems is reducing false alarms (FA), which occur when the system registers a keyword even though the keyword was not uttered. In this paper, we propose a simple yet elegant solution to this problem that follows from the law of total probability. We show that existing deep keyword spotting mechanisms can be improved by Successive Refinement, where the system first classifies whether the input audio is speech or not, then whether the input is keyword-like or not, and finally which keyword was uttered. Across multiple models with sizes ranging from 13K to 2.41M parameters, we show that the successive refinement technique reduces FA by up to a factor of 8 on in-domain held-out FA data and up to a factor of 7 on out-of-domain (OOD) FA data. Further, our proposed approach is "plug-and-play" and can be applied to any deep keyword spotting model.
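The successive-refinement factorization can be written out directly: a keyword k can only be uttered if the audio is speech and keyword-like, so P(k | x) = P(speech | x) · P(keyword-like | speech, x) · P(k | keyword-like, speech, x), and all remaining probability mass goes to "no keyword". The sketch below assumes three already-trained classifier stages whose outputs are passed in as numbers; the keyword names, the `"<none>"` class, and the argmax decision rule are hypothetical.

```python
from typing import Dict, Optional


def refine(p_speech: float, p_keyword_like: float,
           p_keyword: Dict[str, float]) -> Dict[str, float]:
    """Combine the three stages into one distribution over keywords plus '<none>'.

    p_keyword_like is P(keyword-like | speech, x); p_keyword holds
    P(k | keyword-like, speech, x) and should sum to 1.
    """
    gate = p_speech * p_keyword_like  # probability that any keyword could be present
    probs = {k: gate * p for k, p in p_keyword.items()}
    probs["<none>"] = 1.0 - gate  # non-speech, or speech that is not keyword-like
    return probs


def wake_word(probs: Dict[str, float]) -> Optional[str]:
    """Fire only if the most likely outcome is an actual keyword."""
    best = max(probs, key=probs.get)
    return None if best == "<none>" else best
```

For a noisy, non-speech input such as `refine(0.1, 0.9, {"hey_device": 0.95, "stop": 0.05})`, the keyword stage alone would be confident, but the gate of 0.09 keeps the device asleep; the same keyword scores on clear speech (`p_speech=0.98`) wake it up.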

artificial intelligence, keyword, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2304.03416

Genre: Research Report (0.40)

Industry: Media (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


GOHSP: A Unified Framework of Graph and Optimization-based Heterogeneous Structured Pruning for Vision Transformer

Yin, Miao, Uzkent, Burak, Shen, Yilin, Jin, Hongxia, Yuan, Bo

arXiv.org Artificial Intelligence, Feb-6-2023

Recently proposed vision transformers (ViTs) have shown very impressive empirical performance in various computer vision tasks and are viewed as an important type of foundation model. However, ViTs are typically large, which severely hinders their deployment in many practical resource-constrained applications. To mitigate this problem, structured pruning is a promising way to compress model size and achieve practical efficiency. However, unlike its popularity for CNNs and RNNs, structured pruning for ViT models is little explored. In this paper, we propose GOHSP, a unified framework of Graph and Optimization-based Structured Pruning for ViT models. We first develop a graph-based ranking to measure the importance of attention heads, and the extracted importance information is then integrated into an optimization-based procedure that imposes heterogeneous structured sparsity patterns on the ViT models. Experimental results show that GOHSP demonstrates excellent compression performance. On the CIFAR-10 dataset, our approach achieves a 40% parameter reduction with no accuracy loss for the ViT-Small model. On the ImageNet dataset, with 30% and 35% sparsity ratios for the DeiT-Tiny and DeiT-Small models, our approach achieves 1.65% and 0.76% accuracy increases over existing structured pruning methods, respectively.
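The abstract does not specify how GOHSP builds its attention-head graph, so the sketch below assumes a PageRank-style ranking. Heads are nodes, and a non-negative matrix of hypothetical edge weights (for example, derived from how related two heads' attention maps are) is row-normalized into a Markov chain. The stationary distribution, found by power iteration, serves as the importance score, and the lowest-ranked heads become candidates for structured pruning. This is an illustration of graph-based ranking in general, not the paper's exact procedure.

```python
from typing import List


def rank_heads(weights: List[List[float]], damping: float = 0.85,
               iters: int = 100) -> List[float]:
    """PageRank-style importance scores for n attention heads (assumed graph)."""
    n = len(weights)
    # Row-normalize edge weights into transition probabilities.
    trans = []
    for row in weights:
        s = sum(row)
        trans.append([w / s for w in row] if s > 0 else [1.0 / n] * n)
    score = [1.0 / n] * n
    for _ in range(iters):
        # Power iteration with damping (teleportation to a uniform head).
        score = [
            (1 - damping) / n + damping * sum(score[i] * trans[i][j] for i in range(n))
            for j in range(n)
        ]
    return score


def heads_to_prune(scores: List[float], ratio: float) -> List[int]:
    """Indices of the lowest-scoring heads; ratio is the fraction of heads to remove."""
    k = int(len(scores) * ratio)
    return sorted(range(len(scores)), key=lambda h: scores[h])[:k]


# Hypothetical 4-head graph in which head 0 is connected to every other head.
STAR = [[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
```

On this star-shaped toy graph, head 0 receives the highest score, so pruning 75% of the heads keeps it and removes heads 1 to 3.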

machine learning, natural language, pruning, (17 more...)

arXiv.org Artificial Intelligence

2301.05345

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Hybrid Rule-Neural Coreference Resolution System based on Actor-Critic Learning

Wang, Yu, Jin, Hongxia

arXiv.org Artificial Intelligence, Dec-20-2022

The goal of a coreference resolution system is to cluster all mentions that refer to the same entity in a given context. All coreference resolution systems need to tackle two main tasks: detecting all of the potential mentions, and learning to link each possible mention to an antecedent. In this paper, we propose a hybrid rule-neural coreference resolution system based on actor-critic learning, which achieves better coreference performance by leveraging the advantages of both heuristic rules and a neural coreference model. This end-to-end system can also perform both mention detection and resolution by leveraging a joint training algorithm. We use the BERT model to generate input span representations. Our model with the BERT span representation achieves state-of-the-art performance among the models on the CoNLL-2012 Shared Task English Test Set.
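Once each mention has been linked to its best-scoring antecedent, or to none (which starts a new entity), the entity clusters follow by transitively grouping the links. The sketch below implements that final step, assuming antecedent scores are already available from some model. The mention indices, scores, and the `threshold` standing in for the dummy antecedent are hypothetical, and neither the rule/neural hybrid nor the actor-critic training is shown.

```python
from typing import Dict, List, Optional


def best_antecedents(scores: Dict[int, Dict[int, float]],
                     threshold: float = 0.0) -> Dict[int, Optional[int]]:
    """For mention i, pick the highest-scoring earlier mention j, or None if none beats the threshold."""
    links: Dict[int, Optional[int]] = {}
    for i, cands in scores.items():
        best = max(cands, key=cands.get, default=None)
        links[i] = best if best is not None and cands[best] > threshold else None
    return links


def clusters_from_links(num_mentions: int,
                        links: Dict[int, Optional[int]]) -> List[List[int]]:
    """Group mentions (in textual order) into entities by following antecedent links."""
    entity_of: Dict[int, int] = {}
    groups: List[List[int]] = []
    for m in range(num_mentions):
        ante = links.get(m)
        if ante is None:
            entity_of[m] = len(groups)  # start a new entity
            groups.append([m])
        else:
            entity_of[m] = entity_of[ante]  # join the antecedent's entity
            groups[entity_of[m]].append(m)
    return groups


# Hypothetical mentions: 0 "Alice", 1 "the company", 2 "she", 3 "it".
SCORES = {
    1: {0: -0.3},
    2: {0: 2.1, 1: -0.5},
    3: {0: -1.0, 1: 1.7, 2: -2.0},
}
```

With these scores, "she" links to "Alice" and "it" links to "the company", so the resulting clusters are `[[0, 2], [1, 3]]`.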

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2212.10087

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)


Neural Coreference Resolution based on Reinforcement Learning

Wang, Yu, Jin, Hongxia

arXiv.org Artificial Intelligence, Dec-18-2022

The goal of a coreference resolution system is to cluster all mentions that refer to the same entity in a given context. All coreference resolution systems need to solve two subtasks: one is to detect all of the potential mentions, and the other is to learn to link each possible mention to an antecedent. In this paper, we propose an actor-critic reinforcement learning-based neural coreference resolution system, which achieves both mention detection and mention clustering by leveraging an actor-critic deep reinforcement learning technique and a joint training algorithm. We use the BERT model to generate different input span representations. Our model with the BERT span representation achieves state-of-the-art performance among the models on the CoNLL-2012 Shared Task English Test Set.
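To illustrate the actor-critic update itself, the sketch below uses a single-step toy rather than the paper's architecture or reward. Choosing an antecedent among one mention's candidates is treated as an action. A softmax policy (the actor) samples a candidate, a scalar baseline (the critic) tracks the expected reward, and the advantage (reward minus baseline) scales the policy-gradient step. The reward of 1 for the correct antecedent and all hyperparameters are assumptions.

```python
import math
import random
from typing import List


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def train_antecedent_policy(num_candidates: int, correct: int, steps: int = 500,
                            actor_lr: float = 0.5, critic_lr: float = 0.1,
                            seed: int = 0) -> List[float]:
    """Return the learned policy over candidate antecedents for one toy mention."""
    rng = random.Random(seed)
    logits = [0.0] * num_candidates  # actor parameters
    value = 0.0                      # critic: estimated expected reward
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(num_candidates), weights=probs)[0]
        reward = 1.0 if action == correct else 0.0  # assumed reward signal
        advantage = reward - value
        value += critic_lr * advantage  # critic moves toward the observed reward
        # Gradient of log softmax w.r.t. logits: one-hot(action) - probs.
        for j in range(num_candidates):
            grad = (1.0 if j == action else 0.0) - probs[j]
            logits[j] += actor_lr * advantage * grad
    return softmax(logits)
```

Subtracting the critic's baseline reduces the variance of the update without changing its expected direction. In this toy, `train_antecedent_policy(4, correct=2)` concentrates most probability on candidate 2.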

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

2212.09028

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
