AITopics | Li, Peipei

Collaborating Authors

Li, Peipei


Jailbreak Attacks and Defenses against Multimodal Generative Models: A Survey

Liu, Xuannan, Cui, Xing, Li, Peipei, Li, Zekun, Huang, Huaibo, Xia, Shuhan, Zhang, Miaoxuan, Zou, Yueying, He, Ran

arXiv.org Artificial Intelligence, Dec-9-2024

The rapid evolution of multimodal foundation models has led to significant advancements in cross-modal understanding and generation across diverse modalities, including text, images, audio, and video. However, these models remain susceptible to jailbreak attacks, which can bypass built-in safety mechanisms and induce the production of potentially harmful content. Consequently, understanding jailbreak attack methods and existing defense mechanisms is essential for the safe deployment of multimodal generative models in real-world scenarios, particularly in security-sensitive applications. To provide comprehensive insight into this topic, this survey reviews jailbreak attacks and defenses in multimodal generative models. First, following the generalized lifecycle of a multimodal jailbreak, we systematically explore attacks and the corresponding defense strategies at four levels: input, encoder, generator, and output. Based on this analysis, we present a detailed taxonomy of attack methods, defense mechanisms, and evaluation frameworks specific to multimodal generative models. Additionally, we cover a wide range of input-output configurations within generative systems, such as Any-to-Text, Any-to-Vision, and Any-to-Any. Finally, we highlight current research challenges and propose potential directions for future research. The open-source repository corresponding to this work can be found at https://github.com/liuxuannan/Awesome-Multimodal-Jailbreak.

large language model, machine learning, natural language

arXiv:2411.09259

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Law (0.92)
Government > Military (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)


Online Multi-Label Classification under Noisy and Changing Label Distribution

Zou, Yizhang, Hu, Xuegang, Li, Peipei, Hu, Jun, Wu, You

arXiv.org Artificial Intelligence, Oct-3-2024

Multi-label data streams in real-world applications usually contain noisy labels, which can occur among both relevant and irrelevant labels. However, most existing online multi-label classification methods depend on label quality and fail to deal with noisy labels. Moreover, the ground-truth label distribution may change over time; this change is hidden in the observed noisy label distribution and is difficult to track, posing a major challenge for concept drift adaptation. Motivated by this, we propose an online multi-label classification algorithm under Noisy and Changing Label Distribution (NCLD). A convex objective is designed to simultaneously model label scoring and label ranking for high accuracy, and its robustness to NCLD benefits from three novel components: 1) A local feature graph is used to reconstruct the label scores jointly with the observed labels, and an unbiased ranking loss is derived and applied to learn reliable ranking information. 2) By detecting the difference in unbiased label cardinality between two adjacent chunks, we identify changes in the ground-truth label distribution and reset either the learned ranking information or all information learned from the past to match the new distribution. 3) Efficient and accurate updating is achieved through an update rule derived from the closed-form optimal model solution. Finally, empirical results validate the effectiveness of our method in classifying instances under NCLD.
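The chunk-level change check in point 2 can be illustrated with a minimal sketch. It assumes each chunk is a 0/1 label matrix (one row per instance), uses the raw observed label cardinality rather than the paper's noise-corrected unbiased estimate, and flags a change when two adjacent chunks differ by more than a threshold. The function names and the threshold are hypothetical, not taken from the paper.

```python
from typing import List, Sequence

# A chunk of the stream: one row per instance, one 0/1 entry per label.
Chunk = Sequence[Sequence[int]]


def label_cardinality(chunk: Chunk) -> float:
    """Average number of relevant (observed) labels per instance in a chunk."""
    if not chunk:
        raise ValueError("chunk must contain at least one instance")
    return sum(sum(row) for row in chunk) / len(chunk)


def detect_cardinality_shift(chunks: List[Chunk], threshold: float) -> List[int]:
    """Return indices i where chunk i differs from chunk i-1 by more than threshold.

    Hypothetical illustration: uses the raw observed cardinality, not the
    unbiased estimate described in the abstract; `threshold` is an assumed knob.
    """
    shifts = []
    previous = label_cardinality(chunks[0])
    for i in range(1, len(chunks)):
        current = label_cardinality(chunks[i])
        if abs(current - previous) > threshold:
            shifts.append(i)  # a learner would reset (part of) its state here
        previous = current  # always compare adjacent chunks
    return shifts


stream = [
    [[1, 0, 0], [0, 1, 0]],  # cardinality 1.0
    [[1, 0, 0], [0, 0, 1]],  # cardinality 1.0
    [[1, 1, 1], [1, 1, 0]],  # cardinality 2.5
]
print(detect_cardinality_shift(stream, threshold=0.5))  # [2]
```

Comparing only adjacent chunks keeps the check cheap and local, which suits an online setting; the trade-off is that slow, gradual drift spread over many chunks may never cross the threshold.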

artificial intelligence, machine learning, noisy label

arXiv:2410.02394

Country:

Asia > China (0.14)
Asia > Singapore (0.14)

Genre: Research Report (0.50)

Industry: Education > Educational Setting > Online (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)


MMFakeBench: A Mixed-Source Multimodal Misinformation Detection Benchmark for LVLMs

Liu, Xuannan, Li, Zekun, Li, Peipei, Xia, Shuhan, Cui, Xing, Huang, Linzhi, Huang, Huaibo, Deng, Weihong, He, Zhaofeng

arXiv.org Artificial Intelligence, Jun-12-2024

Current multimodal misinformation detection (MMD) methods often assume a single source and type of forgery for each sample, which is insufficient for real-world scenarios where multiple forgery sources coexist. The lack of a benchmark for mixed-source misinformation has hindered progress in this field. To address this, we introduce MMFakeBench, the first comprehensive benchmark for mixed-source MMD. MMFakeBench includes 3 critical sources: textual veracity distortion, visual veracity distortion, and cross-modal consistency distortion, along with 12 sub-categories of misinformation forgery types. We further conduct an extensive evaluation of 6 prevalent detection methods and 15 large vision-language models (LVLMs) on MMFakeBench under a zero-shot setting. The results indicate that current methods struggle under this challenging and realistic mixed-source MMD setting. Additionally, we propose an innovative unified framework, which integrates rationales, actions, and tool-use capabilities of LVLM agents, significantly enhancing accuracy and generalization. We believe this study will catalyze future research into more realistic mixed-source multimodal misinformation and provide a fair evaluation of misinformation detection methods.

large language model, machine learning, natural language

arXiv:2406.08772

Country:

Europe (1.00)
North America > United States > California (0.14)

Genre: Research Report > New Finding (0.67)

Industry:

Media > News (1.00)
Government > Regional Government > Europe Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)


FakeNewsGPT4: Advancing Multimodal Fake News Detection through Knowledge-Augmented LVLMs

Liu, Xuannan, Li, Peipei, Huang, Huaibo, Li, Zekun, Cui, Xing, Liang, Jiahao, Qin, Lixiong, Deng, Weihong, He, Zhaofeng

arXiv.org Artificial Intelligence, Mar-4-2024

The massive generation of multimodal fake news exhibits substantial distribution discrepancies, prompting the need for generalized detectors. However, the insulated nature of training within specific domains restricts the ability of classical detectors to obtain open-world facts. In this paper, we propose FakeNewsGPT4, a novel framework that augments Large Vision-Language Models (LVLMs) with forgery-specific knowledge for manipulation reasoning while inheriting their extensive world knowledge as a complement. Knowledge augmentation in FakeNewsGPT4 involves acquiring two types of forgery-specific knowledge, i.e., semantic correlations and artifact traces, and merging them into LVLMs. Specifically, we design a multi-level cross-modal reasoning module that establishes interactions across modalities to extract semantic correlations. Concurrently, a dual-branch fine-grained verification module is presented to comprehend localized details and encode artifact traces. The generated knowledge is translated into refined embeddings compatible with LVLMs. We also incorporate candidate answer heuristics and soft prompts to enhance input informativeness. Extensive experiments on the public benchmark demonstrate that FakeNewsGPT4 achieves superior cross-domain performance compared to previous methods. Code will be available.

large language model, machine learning, natural language

arXiv:2403.01988

Country: North America > United States > California (0.14)

Genre: Research Report (1.00)

Industry: Media > News (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)


Learning-to-Rank Meets Language: Boosting Language-Driven Ordering Alignment for Ordinal Classification

Wang, Rui, Li, Peipei, Huang, Huaibo, Cao, Chunshui, He, Ran, He, Zhaofeng

arXiv.org Artificial Intelligence, Oct-23-2023

We present a novel language-driven ordering alignment method for ordinal classification. Labels in ordinal classification carry additional ordering relations, which makes models that rely solely on training data prone to overfitting. Recent developments in pre-trained vision-language models inspire us to leverage the rich ordinal priors in human language by converting the original task into a vision-language alignment task. Consequently, we propose L2RCLIP, which fully utilizes the language priors from two perspectives. First, we introduce a complementary prompt tuning technique called RankFormer, designed to enhance the ordering relation of the original rank prompts. It employs token-level attention with residual-style prompt blending in the word embedding space. Second, to further incorporate language priors, we revisit the approximate bound optimization of the vanilla cross-entropy loss and restructure it within the cross-modal embedding space. On this basis, we propose a cross-modal ordinal pairwise loss to refine the CLIP feature space, in which texts and images maintain both semantic alignment and ordering alignment. Extensive experiments on three ordinal classification tasks, namely facial age estimation, historical color image (HCI) classification, and aesthetic assessment, demonstrate its promising performance.
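The idea of recasting ordinal classification as vision-language alignment can be sketched in its generic CLIP-style form: each rank gets a text prompt, an image embedding is scored against every prompt embedding by cosine similarity, and a softmax over those scores gives rank probabilities. This is an illustrative assumption, not L2RCLIP itself: it omits RankFormer and the cross-modal ordinal pairwise loss, the toy 2-D vectors stand in for real encoder outputs, and the temperature of 0.01 is a hypothetical choice.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_probabilities(image_emb: Sequence[float],
                       prompt_embs: Sequence[Sequence[float]],
                       temperature: float = 0.01) -> List[float]:
    """Softmax over image-to-prompt similarities, one text prompt per rank."""
    logits = [cosine(image_emb, p) / temperature for p in prompt_embs]
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def alignment_loss(image_emb, prompt_embs, true_rank: int,
                   temperature: float = 0.01) -> float:
    """Plain cross-entropy of the true rank under the similarity softmax."""
    probs = rank_probabilities(image_emb, prompt_embs, temperature)
    return -math.log(probs[true_rank])


def expected_rank(probs: Sequence[float]) -> float:
    """Ordinal readout: probability-weighted average of rank indices."""
    return sum(k * p for k, p in enumerate(probs))


prompts = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0]]  # toy embeddings for ranks 0, 1, 2
image = [1.0, 0.0]
probs = rank_probabilities(image, prompts)
print(max(range(len(probs)), key=probs.__getitem__))  # 0
```

Vanilla cross-entropy like this treats ranks as unordered classes, so predicting rank 2 for a rank-0 image costs the same as predicting rank 1; that gap is what ordering-aware objectives such as the pairwise loss described above aim to close.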

artificial intelligence, image understanding, machine learning

arXiv:2306.13856

Country: Asia > China (0.14)

Genre: Research Report > New Finding (0.68)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.46)


Pluralistic Aging Diffusion Autoencoder

Li, Peipei, Wang, Rui, Huang, Huaibo, He, Ran, He, Zhaofeng

arXiv.org Artificial Intelligence, Aug-23-2023

Face aging is an ill-posed problem because multiple plausible aging patterns may correspond to a given input. Most existing methods produce a single deterministic estimate. This paper proposes a novel CLIP-driven Pluralistic Aging Diffusion Autoencoder (PADA) to enhance the diversity of aging patterns. First, we employ diffusion models to generate diverse low-level aging details via a sequential denoising reverse process. Second, we present Probabilistic Aging Embedding (PAE) to capture diverse high-level aging patterns; it represents age information as probabilistic distributions in the common CLIP latent space. A text-guided KL-divergence loss is designed to guide this learning. Our method can achieve pluralistic face aging conditioned on open-world aging texts and arbitrary unseen face images. Qualitative and quantitative experiments demonstrate that our method generates more diverse, high-quality, and plausible aging results.
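For a sense of what a KL-divergence term between probabilistic embeddings computes, the sketch below gives the standard closed-form KL divergence between two diagonal Gaussians, KL(p || q) = sum_i [ log(sigma_q,i / sigma_p,i) + (sigma_p,i^2 + (mu_p,i - mu_q,i)^2) / (2 sigma_q,i^2) - 1/2 ]. Treating the image-side age embedding as p and a text-derived distribution as q is an assumption made for illustration; the abstract does not specify the parameterization or the direction of PADA's loss.

```python
import math
from typing import Sequence


def kl_diag_gaussians(mu_p: Sequence[float], sigma_p: Sequence[float],
                      mu_q: Sequence[float], sigma_q: Sequence[float]) -> float:
    """KL(p || q) for p = N(mu_p, diag(sigma_p^2)) and q = N(mu_q, diag(sigma_q^2)).

    Assumed setting: p is an image-side embedding distribution and q a
    text-guided target; both are diagonal Gaussians given by mean and std.
    """
    if not (len(mu_p) == len(sigma_p) == len(mu_q) == len(sigma_q)):
        raise ValueError("all inputs must have the same dimension")
    total = 0.0
    for mp, sp, mq, sq in zip(mu_p, sigma_p, mu_q, sigma_q):
        if sp <= 0 or sq <= 0:
            raise ValueError("standard deviations must be positive")
        # Per-dimension closed form; dimensions are independent, so terms add up.
        total += (math.log(sq / sp)
                  + (sp ** 2 + (mp - mq) ** 2) / (2.0 * sq ** 2)
                  - 0.5)
    return total


print(kl_diag_gaussians([0.0], [1.0], [1.0], [1.0]))                  # 0.5
print(kl_diag_gaussians([0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0]))  # 0.0
```

Because KL is asymmetric, swapping p and q changes the gradient behavior: KL(p || q) penalizes p for placing mass where q has little, which is one reason the direction of such a loss is a deliberate design choice.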

artificial intelligence, latent space, machine learning

arXiv:2303.11086

Country: Asia > China (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Vision > Face Recognition (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.86)


Semi-supervised representation learning via dual autoencoders for domain adaptation

Yang, Shuai, Wang, Hao, Zhang, Yuhong, Zhu, Yi, Li, Peipei, Hu, Xuegang

arXiv.org Machine Learning, Aug-4-2019

Domain adaptation, which exploits knowledge in a source domain to improve learning tasks in a target domain, plays a critical role in real-world applications. Recently, many deep learning approaches based on autoencoders have achieved strong performance in domain adaptation. However, most existing methods focus on minimizing the distribution divergence by putting the source and target data together to learn global feature representations, and do not take into account the local relationships between instances of the same category in different domains. To address this problem, we propose a novel Semi-Supervised Representation Learning framework via Dual Autoencoders for domain adaptation, named SSRLDA. More specifically, we extract richer feature representations by learning global and local feature representations simultaneously using two novel autoencoders, referred to as the marginalized denoising autoencoder with adaptation distribution (MDA_ad) and the multi-class marginalized denoising autoencoder (MMDA), respectively. Meanwhile, we adopt an iterative strategy to make full use of label information to optimize the feature representations. Experimental results show that our proposed approach outperforms several state-of-the-art baseline methods.

deep learning, feature representation, neural network

arXiv:1908.01342

Country:

Europe (1.00)
Asia (0.68)
North America > United States > California > San Francisco County > San Francisco (0.28)

Genre: Research Report > New Finding (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Learning from Concept Drifting Data Streams with Unlabeled Data

Li, Peipei (Hefei University of Technology) | Wu, Xindong (University of Vermont) | Hu, Xuegang (Hefei University of Technology)

AAAI Conferences, Jul-15-2010

Contrary to the previous assumption that all arriving streaming data are labeled and that the class labels are immediately available, we propose a Semi-supervised classification algorithm for data streams with concept drifts and UNlabeled data, called SUN. SUN is based on an evolved decision tree. Concept drifts are distinguished from noise at the leaves according to the deviation between historical concept clusters and new ones generated by a clustering algorithm developed from k-Modes. Extensive studies on both synthetic and real data demonstrate that SUN performs well on unlabeled data compared to several known online algorithms. We therefore conclude that SUN provides a feasible reference framework for tackling concept-drifting data streams with unlabeled data.
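The clustering step in SUN builds on k-Modes, which clusters categorical records by counting mismatched attributes and updates each cluster's mode attribute by attribute. The sketch below is the basic k-Modes loop plus a hypothetical deviation score comparing historical and new cluster modes. SUN's actual extension of k-Modes, its leaf-level bookkeeping, and its rule for separating drift from noise are not reproduced here; a real system would compare the score against a tuned threshold, which is also an assumption.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Record = Tuple[str, ...]


def mismatch(a: Sequence[str], b: Sequence[str]) -> int:
    """Simple-matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))


def k_modes(records: List[Record], initial_modes: List[Record],
            max_iter: int = 10) -> List[Record]:
    """Basic k-Modes: assign each record to its nearest mode, then recompute modes."""
    modes = list(initial_modes)
    for _ in range(max_iter):
        clusters: List[List[Record]] = [[] for _ in modes]
        for r in records:
            nearest = min(range(len(modes)), key=lambda k: mismatch(r, modes[k]))
            clusters[nearest].append(r)
        new_modes: List[Record] = []
        for k, members in enumerate(clusters):
            if not members:  # keep an empty cluster's previous mode
                new_modes.append(modes[k])
                continue
            # New mode = most frequent value of each attribute within the cluster.
            new_modes.append(tuple(Counter(col).most_common(1)[0][0]
                                   for col in zip(*members)))
        if new_modes == modes:  # assignments no longer change the modes
            break
        modes = new_modes
    return modes


def mode_deviation(old_modes: List[Record], new_modes: List[Record]) -> float:
    """Hypothetical score: mean distance from each new mode to its closest old mode."""
    return sum(min(mismatch(n, o) for o in old_modes) for n in new_modes) / len(new_modes)


history = [("a", "x"), ("a", "x"), ("a", "y"), ("b", "y"), ("b", "y"), ("b", "x")]
recent = [("c", "z"), ("c", "z"), ("c", "w")]
old_modes = k_modes(history, [("a", "x"), ("b", "y")])
new_modes = k_modes(recent, [("c", "z")])
print(old_modes)                             # [('a', 'x'), ('b', 'y')]
print(mode_deviation(old_modes, new_modes))  # 2.0
```

Matching-based distances and per-attribute modes are what let k-Modes work on categorical stream attributes without numeric encoding; a large deviation like the one above suggests a new concept, whereas a small one is more consistent with noise around existing clusters.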

algorithm, artificial intelligence, machine learning


Twenty-Fourth AAAI Conference on Artificial Intelligence

Country:

Asia > China (0.48)
North America > United States > Vermont > Chittenden County > Burlington (0.15)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.37)
