AITopics | Wang, Wenhao

Collaborating Authors

Wang, Wenhao

KnowledgeSG: Privacy-Preserving Synthetic Text Generation with Knowledge Distillation from Server

Wang, Wenhao, Liang, Xiaoyu, Ye, Rui, Chai, Jingyi, Chen, Siheng, Wang, Yanfeng

arXiv.org Artificial Intelligence, Oct-9-2024

The success of large language models (LLMs) enables many parties to fine-tune LLMs on their own private data. However, this practice raises privacy concerns because LLMs memorize their training data. Existing solutions, such as substituting synthetic data, struggle to improve performance and preserve privacy at the same time. They either rely on a local model for generation, resulting in a performance decline, or take advantage of APIs, directly exposing the data to API servers. To address this issue, we propose KnowledgeSG, a novel client-server framework that enhances synthetic data quality and improves model performance while ensuring privacy. We achieve this by learning local knowledge from the private data with differential privacy (DP) and distilling professional knowledge from the server. Additionally, inspired by federated learning, we transmit models rather than data between the client and server to prevent privacy leakage. Extensive experiments in medical and financial domains demonstrate the effectiveness of KnowledgeSG. Our code is now publicly available at https://github.com/wwh0411/KnowledgeSG.
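The abstract does not say which DP mechanism is used when learning from the private data. One widely used option is DP-SGD, which clips each example's gradient and adds Gaussian noise before averaging. The sketch below illustrates only that general idea in plain Python; `dp_sgd_step`, `clip_norm`, and `noise_multiplier` are illustrative names and are not taken from the KnowledgeSG code.

```python
import math
import random


def clip_gradient(grad, clip_norm):
    """Scale one example's gradient so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One DP-SGD-style update: clip per-example gradients, sum, add noise, average.

    Assumption: this is a generic DP-SGD sketch, not the paper's training loop.
    """
    clipped = [clip_gradient(g, clip_norm) for g in per_example_grads]
    batch_size = len(clipped)
    sigma = noise_multiplier * clip_norm  # std of the Gaussian noise added to the sum
    new_params = []
    for i, p in enumerate(params):
        total = sum(g[i] for g in clipped)
        noisy_total = total + rng.gauss(0.0, sigma)
        new_params.append(p - lr * noisy_total / batch_size)
    return new_params


if __name__ == "__main__":
    rng = random.Random(0)
    grads = [[3.0, 4.0], [0.1, -0.2]]
    print(dp_sgd_step([0.0, 0.0], grads, lr=0.1, clip_norm=1.0,
                      noise_multiplier=1.0, rng=rng))
```

Clipping bounds how much any single record can move the model, and the noise scale is tied to that bound; the actual privacy guarantee depends on accounting over all training steps, which this sketch leaves out.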

large language model, machine learning, natural language, (19 more...)

arXiv:2410.05725

Country:

Asia > China (0.14)
North America > United States (0.14)
North America > Canada (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models

Wang, Wenhao, Yang, Yi

arXiv.org Artificial Intelligence, May-14-2024

The arrival of Sora marks a new era for text-to-video diffusion models, bringing significant advancements in video generation and potential applications. However, Sora, along with other text-to-video diffusion models, is highly reliant on prompts, and there is no publicly available dataset that features a study of text-to-video prompts. In this paper, we introduce VidProM, the first large-scale dataset comprising 1.67 Million unique text-to-Video Prompts from real users. Additionally, this dataset includes 6.69 million videos generated by four state-of-the-art diffusion models, alongside some related data. We initially discuss the curation of this large-scale dataset, a process that is both time-consuming and costly. Subsequently, we underscore the need for a new prompt dataset specifically designed for text-to-video generation by illustrating how VidProM differs from DiffusionDB, a large-scale prompt-gallery dataset for image generation. Our extensive and diverse dataset also opens up many exciting new research areas. For instance, we suggest exploring text-to-video prompt engineering, efficient video generation, and video copy detection for diffusion models to develop better, more efficient, and safer models. The project (including the collected dataset VidProM and related code) is publicly available at https://vidprom.github.io under the CC-BY-NC 4.0 License.

large language model, machine learning, natural language, (18 more...)

arXiv:2403.06098

Country:

Asia > Middle East (0.28)
Asia > Japan > Honshū (0.14)

Genre: Research Report (1.00)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

AnyPattern: Towards In-context Image Copy Detection

Wang, Wenhao, Sun, Yifan, Tan, Zhentao, Yang, Yi

arXiv.org Artificial Intelligence, Apr-28-2024

This paper explores in-context learning for image copy detection (ICD), i.e., prompting an ICD model to identify replicated images with new tampering patterns without the need for additional training. The prompts (or the contexts) are from a small set of image-replica pairs that reflect the new patterns and are used at inference time. Such in-context ICD has good realistic value, because it requires no fine-tuning and thus facilitates fast reaction against the emergence of unseen patterns. To accommodate the "seen → unseen" generalization scenario, we construct the first large-scale pattern dataset named AnyPattern, which has the largest number of tamper patterns (90 for training and 10 for testing) among all the existing ones. We benchmark AnyPattern with popular ICD methods and reveal that existing methods barely generalize to novel patterns. We further propose a simple in-context ICD method named ImageStacker. ImageStacker learns to select the most representative image-replica pairs and employs them as the pattern prompts in a stacking manner (rather than the popular concatenation manner). Experimental results show (1) training with our large-scale dataset substantially benefits pattern generalization (+26.66% µAP), (2) the proposed ImageStacker facilitates effective in-context ICD (another round of +16.75% µAP), and (3) AnyPattern enables in-context ICD, i.e., without such a large-scale dataset, in-context learning does not emerge even with our ImageStacker. Beyond the ICD task, we also demonstrate how AnyPattern can benefit artists, i.e., the pattern retrieval method trained on AnyPattern can be generalized to identify style mimicry by text-to-image models. The project is publicly available at https://anypattern.github.io.
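The abstract reports gains in µAP without defining it. In copy-detection benchmarks µAP usually denotes micro average precision: all query-reference predictions are pooled, ranked by confidence, and a single average precision is computed over the pooled list. The sketch below assumes that reading; the function name and input layout are hypothetical, and tied scores get no special handling.

```python
def micro_average_precision(predictions, num_positives):
    """Average precision over predictions pooled across all queries.

    predictions: list of (score, is_correct) pairs, one per predicted match.
    num_positives: total number of ground-truth matches across all queries.
    """
    ranked = sorted(predictions, key=lambda p: p[0], reverse=True)
    hits = 0
    ap = 0.0
    for rank, (_, is_correct) in enumerate(ranked, start=1):
        if is_correct:
            hits += 1
            ap += hits / rank  # precision at this rank, counted once per true match
    return ap / num_positives if num_positives else 0.0


if __name__ == "__main__":
    preds = [(0.9, True), (0.8, False), (0.7, True)]
    print(micro_average_precision(preds, num_positives=2))
```

Because the ranking is global rather than per query, a model must produce scores that are comparable across queries, which is why this metric rewards well-calibrated confidence.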

image-replica pair, machine learning, natural language, (15 more...)

arXiv:2404.13788

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.87)

Industry:

Law (1.00)
Government (0.92)
Information Technology > Security & Privacy (0.67)
Education (0.67)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

TRAD: Enhancing LLM Agents with Step-Wise Thought Retrieval and Aligned Decision

Zhou, Ruiwen, Yang, Yingxuan, Wen, Muning, Wen, Ying, Wang, Wenhao, Xi, Chunling, Xu, Guoqiang, Yu, Yong, Zhang, Weinan

arXiv.org Artificial Intelligence, Mar-10-2024

Numerous large language model (LLM) agents have been built for tasks such as web navigation and online shopping, owing to LLMs' broad knowledge and text-understanding ability. Many of these works use in-context examples to generalize without fine-tuning, but few have considered how to select and effectively use these examples. Recently, methods that retrieve whole trajectories using task meta-data and use those trajectories as in-context examples have been proposed to improve the agent's overall performance in some sequential decision-making tasks. However, these methods can be problematic: the retrieved examples may look plausible yet ignore task-specific state transition dynamics, and the input becomes long and full of irrelevant context. In this paper, we propose a novel framework (TRAD) to address these issues. TRAD first conducts Thought Retrieval, achieving step-level demonstration selection via thought matching, leading to more helpful demonstrations and less irrelevant input noise. Then, TRAD introduces Aligned Decision, complementing retrieved demonstration steps with their previous or subsequent steps, which enables tolerance for imperfect thought and provides a choice for balance between more context and less noise. Extensive experiments on the ALFWorld and Mind2Web benchmarks show that TRAD not only outperforms state-of-the-art models but also effectively helps in reducing noise and promoting generalization. Furthermore, TRAD has been deployed in real-world scenarios of a global business insurance company and improves the success rate of robotic process automation.
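The two stages described in the abstract can be sketched roughly as follows. This is a toy illustration, not the TRAD implementation: the token-overlap similarity stands in for whatever thought-matching model the authors use, and the function names, the `thought`/`action` step layout, and the `before`/`after` window parameters are all assumptions made for the example.

```python
def jaccard(a, b):
    """Token-overlap similarity; a stand-in for a real embedding-based matcher."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def retrieve_steps(query_thought, trajectories, k=1):
    """Step-level retrieval: rank every step of every trajectory by thought similarity."""
    scored = [
        (jaccard(query_thought, step["thought"]), t_idx, s_idx)
        for t_idx, traj in enumerate(trajectories)
        for s_idx, step in enumerate(traj)
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(t_idx, s_idx) for _, t_idx, s_idx in scored[:k]]


def align_context(trajectories, hits, before=1, after=1):
    """Complement each retrieved step with neighbouring steps from the same trajectory."""
    demos = []
    for t_idx, s_idx in hits:
        traj = trajectories[t_idx]
        start = max(0, s_idx - before)
        end = min(len(traj), s_idx + after + 1)
        demos.append(traj[start:end])
    return demos


if __name__ == "__main__":
    trajectories = [
        [
            {"thought": "find the mug in the kitchen", "action": "go to countertop"},
            {"thought": "pick up the mug", "action": "take mug"},
            {"thought": "heat the mug in the microwave", "action": "heat mug"},
        ],
    ]
    hits = retrieve_steps("i should pick up the mug now", trajectories, k=1)
    print(align_context(trajectories, hits, before=1, after=1))
```

Widening `before` and `after` gives the agent more surrounding context at the cost of a longer prompt, which is the trade-off between context and noise that the abstract mentions.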

artificial intelligence, large language model, natural language, (19 more...)

arXiv:2403.06221

Country:

North America > United States (0.16)
Asia > China (0.15)

Genre: Research Report (1.00)

Industry:

Banking & Finance (0.74)
Information Technology > Services > e-Commerce Services (0.48)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

OpenFedLLM: Training Large Language Models on Decentralized Private Data via Federated Learning

Ye, Rui, Wang, Wenhao, Chai, Jingyi, Li, Dihan, Li, Zexi, Xu, Yinda, Du, Yaxin, Wang, Yanfeng, Chen, Siheng

arXiv.org Artificial Intelligence, Feb-10-2024

Trained on massive publicly available data, large language models (LLMs) have demonstrated tremendous success across various fields. While more data contributes to better performance, a disconcerting reality is that high-quality public data will be exhausted in a few years. In this paper, we offer a potential next step for contemporary LLMs: collaborative and privacy-preserving LLM training on underutilized distributed private data via federated learning (FL), where multiple data owners collaboratively train a shared model without transmitting raw data. To achieve this, we build a concise, integrated, and research-friendly framework/codebase, named OpenFedLLM. It covers federated instruction tuning for enhancing instruction-following capability, federated value alignment for aligning with human values, and 7 representative FL algorithms. In addition, OpenFedLLM supports training on diverse domains, covering 8 training datasets, and provides comprehensive evaluations, covering 30+ evaluation metrics. Through extensive experiments, we observe that all FL algorithms outperform local training when training LLMs, demonstrating a clear performance improvement across a variety of settings. Notably, in a financial benchmark, Llama2-7B fine-tuned by applying any FL algorithm can outperform GPT-4 by a significant margin while the model obtained through individual training cannot, demonstrating strong motivation for clients to participate in FL.
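The abstract does not name the 7 FL algorithms. As a point of reference, the classic FedAvg aggregation step, in which a server averages client model parameters weighted by local dataset size, can be sketched as below. The parameter layout (a dict of flat float lists) and the function name are assumptions for illustration and do not mirror the OpenFedLLM codebase.

```python
def fedavg(client_params, client_sizes):
    """Average client parameters, weighting each client by its local dataset size.

    client_params: list of dicts mapping a parameter name to a flat list of floats.
    client_sizes: number of local training examples held by each client.
    """
    total = sum(client_sizes)
    first = client_params[0]
    return {
        name: [
            sum(params[name][i] * n for params, n in zip(client_params, client_sizes)) / total
            for i in range(len(first[name]))
        ]
        for name in first
    }


if __name__ == "__main__":
    clients = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]
    print(fedavg(clients, client_sizes=[1, 3]))
```

Only parameters cross the network in this scheme; each client's raw data stays local, which is the property the abstract relies on.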

arxiv preprint arxiv, large language model, machine learning, (16 more...)

arXiv:2402.06954

Country: North America > United States > Pennsylvania (0.14)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Memorization in Self-Supervised Learning Improves Downstream Generalization

Wang, Wenhao, Kaleem, Muhammad Ahmad, Dziedzic, Adam, Backes, Michael, Papernot, Nicolas, Boenisch, Franziska

arXiv.org Artificial Intelligence, Jan-24-2024

Self-supervised learning (SSL) has recently received significant attention due to its ability to train high-performance encoders purely on unlabeled data--often scraped from the internet. This data can still be sensitive, and empirical evidence suggests that SSL encoders memorize private information from their training data and can disclose it at inference time. Since existing theoretical definitions of memorization from supervised learning rely on labels, they do not transfer to SSL. To address this gap, we propose SSLMem, a framework for defining memorization within SSL. Our definition compares the alignment of representations for data points and their augmented views returned by encoders that were trained on these data points and by encoders that were not. Through comprehensive empirical analysis on diverse encoder architectures and datasets we highlight that even though SSL relies on large datasets and strong augmentations--both known in supervised learning as regularization techniques that reduce overfitting--significant fractions of training data points still experience high memorization. Through our empirical results, we show that this memorization is essential for encoders to achieve higher generalization performance on different downstream tasks.

In recent years, self-supervised learning (SSL) has emerged as a new potent learning paradigm. SSL encoders can be trained without reliance on labeled data, which is often hard and expensive to obtain. Instead, SSL leverages the existence of large amounts of unlabeled data--often scraped from the internet--to obtain state-of-the-art performance in various domains, ranging from computer vision (He et al., 2022; Chen et al., 2020; Chen & He, 2021; Caron et al., 2021) to natural language processing (Devlin et al., 2018; Radford et al.). Empirical studies suggest that SSL encoders can disclose information about their training data at inference time (Meehan et al., 2023). An unintended revelation of private information is often associated with machine learning models' ability to memorize their training data (Zhang et al., 2016; Arpit et al., 2017; Chatterjee, 2018; Carlini et al., 2019; 2021; 2022). Additionally, it was found that in supervised learning memorization happens in the feature extractor (encoder) layers (Feldman & Zhang, 2020; Maini et al., 2023). Those are exactly the type of layers that SSL trains. Yet, given that SSL differs significantly from supervised learning in terms of learning objective, data processing, and augmentation strength, it remains unclear whether the trends from supervised learning transfer to self-supervised learning.

Part of the work was done while the authors were at the University of Toronto and the Vector Institute.

Higher memorization scores indicate stronger memorization. We observe that outliers and atypical examples experience higher memorization than more standard samples.
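The memorization definition in the abstract compares how well two kinds of encoders align the augmented views of a data point. A minimal sketch of that comparison follows, under stated assumptions: Euclidean distance as the alignment measure, one encoder of each kind rather than an average over many, and no normalization. The abstract specifies none of these details, and all function names are hypothetical.

```python
import math


def l2(u, v):
    """Euclidean distance between two representation vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def alignment_loss(encoder, views):
    """Mean pairwise distance between representations of one data point's augmented views.

    Requires at least two views. Lower values mean the views are better aligned.
    """
    reps = [encoder(v) for v in views]
    pairs = [(i, j) for i in range(len(reps)) for j in range(i + 1, len(reps))]
    return sum(l2(reps[i], reps[j]) for i, j in pairs) / len(pairs)


def memorization_score(encoder_with_x, encoder_without_x, views):
    """Alignment gap between an encoder that never saw x and one trained on x.

    Positive when the encoder trained on x aligns x's views more tightly.
    """
    return alignment_loss(encoder_without_x, views) - alignment_loss(encoder_with_x, views)


if __name__ == "__main__":
    views = [[0.0, 0.0], [3.0, 4.0]]          # two toy "augmentations" of one point
    trained_on_x = lambda v: [0.0, 0.0]       # maps every view to the same vector
    not_trained_on_x = lambda v: v            # leaves the views apart
    print(memorization_score(trained_on_x, not_trained_on_x, views))
```

Under this sketch a larger score corresponds to stronger memorization, consistent with the statement above that higher memorization scores indicate stronger memorization.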

artificial intelligence, machine learning, memorization, (19 more...)

arXiv:2401.12233

Country: North America > Canada > Ontario > Toronto (0.54)

Genre:

Research Report > New Finding (0.93)
Research Report > Experimental Study (0.66)

Industry: Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)

FedRSU: Federated Learning for Scene Flow Estimation on Roadside Units

Fang, Shaoheng, Ye, Rui, Wang, Wenhao, Liu, Zuhong, Wang, Yuxiao, Wang, Yafei, Chen, Siheng, Wang, Yanfeng

arXiv.org Artificial Intelligence, Jan-23-2024

Roadside units (RSUs) can significantly improve the safety and robustness of autonomous vehicles through Vehicle-to-Everything (V2X) communication. Currently, the usage of a single RSU mainly focuses on real-time inference and V2X collaboration, while neglecting the potential value of the high-quality data collected by RSU sensors. Integrating the vast amounts of data from numerous RSUs can provide a rich source of data for model training. However, the absence of ground truth annotations and the difficulty of transmitting enormous volumes of data are two inevitable barriers to fully exploiting this hidden value. In this paper, we introduce FedRSU, an innovative federated learning framework for self-supervised scene flow estimation. In FedRSU, we present a recurrent self-supervision training paradigm, where for each RSU, the scene flow prediction of points at every timestamp can be supervised by its subsequent multi-modality observation. Another key component of FedRSU is federated learning (FL), where multiple devices collaboratively train an ML model while keeping the training data local and private. With the power of the recurrent self-supervised learning paradigm, FL is able to leverage vast amounts of underutilized data from RSUs. To verify the FedRSU framework, we construct a large-scale multi-modality dataset, RSU-SF. The dataset consists of 17 RSU clients, covering various scenarios, modalities, and sensor settings. Based on RSU-SF, we show that FedRSU can greatly improve model performance in intelligent transportation systems (ITS) and provide a comprehensive benchmark under diverse FL scenarios. To the best of our knowledge, we provide the first real-world LiDAR-camera multi-modal dataset and benchmark for the FL community.

artificial intelligence, deep learning, machine learning, (13 more...)

arXiv:2401.12862

Country:

North America > United States > Texas (0.14)
Asia > Middle East > Israel (0.14)
Asia > China > Zhejiang Province (0.14)

Genre: Research Report (1.00)

Industry:

Automobiles & Trucks (0.93)
Health & Medicine (0.68)
Information Technology > Security & Privacy (0.67)
Education > Educational Setting (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.88)
