AITopics | Zeng, Kai

Zeng, Kai

MC-LLaVA: Multi-Concept Personalized Vision-Language Model

An, Ruichuan, Yang, Sihan, Lu, Ming, Zhang, Renrui, Zeng, Kai, Luo, Yulin, Cao, Jiajun, Liang, Hao, Chen, Ying, She, Qi, Zhang, Shanghang, Zhang, Wentao

arXiv.org Artificial Intelligence, Mar-25-2025

Current vision-language models (VLMs) show exceptional abilities across diverse tasks, such as visual question answering. To enhance user experience, recent studies investigate VLM personalization to understand user-provided concepts. However, they mainly focus on single-concept personalization, neglecting the existence and interplay of multiple concepts, which limits real-world applicability. This paper proposes the first multi-concept personalization paradigm, MC-LLaVA. Specifically, MC-LLaVA employs a multi-concept instruction tuning strategy, effectively integrating multiple concepts in a single training step. To reduce the costs related to joint training, we propose a personalized textual prompt that uses visual token information to initialize concept tokens. Additionally, we introduce a personalized visual prompt during inference, aggregating location confidence maps for enhanced recognition and grounding capabilities. To advance multi-concept personalization research, we further contribute a high-quality instruction tuning dataset. We carefully collect images with multiple characters and objects from movies and manually generate question-answer samples for multi-concept scenarios, featuring superior diversity. Comprehensive qualitative and quantitative experiments demonstrate that MC-LLaVA can achieve impressive multi-concept personalized responses, paving the way for VLMs to become better user-specific assistants. The code and dataset will be publicly available at https://github.com/arctanxarc/MC-LLaVA.

artificial intelligence, multi-concept personalized vision-language model, natural language, (1 more...)

arXiv.org Artificial Intelligence

2503.18854

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language (0.53)

Concept-as-Tree: Synthetic Data is All You Need for VLM Personalization

An, Ruichuan, Zeng, Kai, Lu, Ming, Yang, Sihan, Zhang, Renrui, Ji, Huitong, Zhang, Qizhe, Luo, Yulin, Liang, Hao, Zhang, Wentao

arXiv.org Artificial Intelligence, Mar-17-2025

Vision-Language Models (VLMs) have demonstrated exceptional performance in various multi-modal tasks. Recently, there has been an increasing interest in improving the personalization capabilities of VLMs. To better integrate user-provided concepts into VLMs, many methods use positive and negative samples to fine-tune these models. However, the scarcity of user-provided positive samples and the low quality of retrieved negative samples pose challenges for fine-tuning. To reveal the relationship between sample and model performance, we systematically investigate the impact of positive and negative samples (easy and hard) and their diversity on VLM personalization tasks. Based on the detailed analysis, we introduce Concept-as-Tree (CaT), which represents a concept as a tree structure, thereby enabling the data generation of positive and negative samples with varying difficulty and diversity for VLM personalization. With a well-designed data filtering strategy, our CaT framework can ensure the quality of generated data, constituting a powerful pipeline. We perform thorough experiments with various VLM personalization baselines to assess the effectiveness of the pipeline, alleviating the lack of positive samples and the low quality of negative samples. Our results demonstrate that CaT equipped with the proposed data filter significantly enhances the personalization capabilities of VLMs across the MyVLM, Yo'LLaVA, and MC-LLaVA datasets. To our knowledge, this work is the first controllable synthetic data pipeline for VLM personalization. The code is released at https://github.com/zengkaiya/CaT.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2503.12999

Country: Asia (0.14)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (0.93)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Novel Object 6D Pose Estimation with a Single Reference View

Liu, Jian, Sun, Wei, Zeng, Kai, Zheng, Jin, Yang, Hui, Wang, Lin, Rahmani, Hossein, Mian, Ajmal

arXiv.org Artificial Intelligence, Mar-7-2025

Existing novel object 6D pose estimation methods typically rely on CAD models or dense reference views, which are both difficult to acquire. Using only a single reference view is more scalable, but challenging due to large pose discrepancies and limited geometric and spatial information. To address these issues, we propose a Single-Reference-based novel object 6D (SinRef-6D) pose estimation method. Our key idea is to iteratively establish point-wise alignment in the camera coordinate system based on state space models (SSMs). Specifically, iterative camera-space point-wise alignment can effectively handle large pose discrepancies, while our proposed RGB and Points SSMs can capture long-range dependencies and spatial information from a single view, offering linear complexity and superior spatial modeling capability. Once pre-trained on synthetic data, SinRef-6D can estimate the 6D pose of a novel object using only a single reference view, without requiring retraining or a CAD model. Extensive experiments on six popular datasets and real-world robotic scenes demonstrate that we achieve on-par performance with CAD-based and dense reference view-based methods, despite operating in the more challenging single reference setting. Code will be released at https://github.com/CNJianLiu/SinRef-6D.

artificial intelligence, pose estimation, spatial reasoning, (17 more...)

arXiv.org Artificial Intelligence

2503.05578

Country:

Asia > China (0.14)
Oceania > Australia (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Vision > Video Understanding (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.54)

MC-LLaVA: Multi-Concept Personalized Vision-Language Model

An, Ruichuan, Yang, Sihan, Lu, Ming, Zeng, Kai, Luo, Yulin, Chen, Ying, Cao, Jiajun, Liang, Hao, She, Qi, Zhang, Shanghang, Zhang, Wentao

arXiv.org Artificial Intelligence, Dec-5-2024

Current vision-language models (VLMs) show exceptional abilities across diverse tasks including visual question answering. To enhance user experience in practical applications, recent studies investigate VLM personalization to understand user-provided concepts. However, existing studies mainly focus on single-concept personalization, neglecting the existence and interplay of multiple concepts, which limits the real-world applicability of personalized VLMs. In this paper, we propose the first multi-concept personalization method named MC-LLaVA along with a high-quality multi-concept personalization dataset. Specifically, MC-LLaVA uses a joint training strategy incorporating multiple concepts in a single training step, allowing VLMs to perform accurately in multi-concept personalization. To reduce the cost of joint training, MC-LLaVA leverages visual token information for concept token initialization, yielding improved concept representation and accelerating joint training. To advance multi-concept personalization research, we further contribute a high-quality dataset. We carefully collect images from various movies that contain multiple characters and manually generate the multi-concept question-answer samples. Our dataset features diverse movie types and question-answer types. We conduct comprehensive qualitative and quantitative experiments to demonstrate that MC-LLaVA can achieve impressive multi-concept personalized responses, paving the way for VLMs to become better user-specific assistants. The code and dataset will be publicly available at https://github.com/arctanxarc/MC-LLaVA.

large language model, mc-llava, natural language, (15 more...)

arXiv.org Artificial Intelligence

2411.11706

Country: Asia > China (0.28)

Genre: Research Report (1.00)

Industry: Media > Film (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.96)

Distributed Swarm Learning for Edge Internet of Things

Wang, Yue, Tian, Zhi, Fan, Xin, Cai, Zhipeng, Nowzari, Cameron, Zeng, Kai

arXiv.org Artificial Intelligence, Mar-29-2024

The rapid growth of Internet of Things (IoT) has led to the widespread deployment of smart IoT devices at wireless edge for collaborative machine learning tasks, ushering in a [...] With a huge number of hardware-constrained IoT devices operating in resource-limited wireless [...] Edge learning including communication and computation bottlenecks, device and data heterogeneity, security risks, privacy leakages, nonconvex [...] To address these issues, this article explores a novel framework [...]

Challenge-2: Non-convex optimization. Gradient-based algorithms get trapped in local optima when tackling non-convex problems, e.g., training neural networks with nonlinear activation. This problem worsens in distributed learning, particularly in IoT scenarios where edge devices access limited data. [...] faces statistical heterogeneity in local training data across workers, also known as the non-i.i.d. [...] heterogeneity in IoT hardware capability and link quality, which degrades edge learning performance significantly.

artificial intelligence, machine learning, optimization problem, (17 more...)

arXiv.org Artificial Intelligence

2403.20188

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Information Technology > Smart Houses & Appliances (0.71)

Technology:

Information Technology > Internet of Things (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Provably Secure Disambiguating Neural Linguistic Steganography

Qi, Yuang, Chen, Kejiang, Zeng, Kai, Zhang, Weiming, Yu, Nenghai

arXiv.org Artificial Intelligence, Mar-26-2024

Recent research in provably secure neural linguistic steganography has overlooked a crucial aspect: the sender must detokenize stegotexts to avoid raising suspicion from the eavesdropper. The segmentation ambiguity problem, which arises when using language models based on subwords, leads to occasional decoding failures in all neural language steganography implementations based on these models. Current solutions to this issue involve altering the probability distribution of candidate words, rendering them incompatible with provably secure steganography. We propose a novel secure disambiguation method named SyncPool, which effectively addresses the segmentation ambiguity problem. We group all tokens with prefix relationships in the candidate pool before the steganographic embedding algorithm runs to eliminate uncertainty among ambiguous tokens. To enable the receiver to synchronize the sampling process of the sender, a shared cryptographically-secure pseudorandom number generator (CSPRNG) is deployed to select a token from the ambiguity pool. SyncPool does not change the size of the candidate pool or the distribution of tokens and thus is applicable to provably secure language steganography methods. We provide theoretical proofs and experimentally demonstrate the applicability of our solution to various languages and models, showing its potential to significantly improve the reliability and security of neural linguistic steganography systems.
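The grouping and synchronized selection described in this abstract can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not the authors' SyncPool implementation: the function names `build_ambiguity_pools` and `pick_from_pool` are hypothetical, tokens are grouped transitively under their shortest common prefix token, and Python's `random.Random` stands in for the shared CSPRNG (it is not cryptographically secure).

```python
import random


def build_ambiguity_pools(candidates: dict[str, float]) -> list[dict[str, float]]:
    """Group candidate tokens so that a token and all of its extensions share a pool.

    Assumption: pools are formed transitively around the shortest prefix token.
    In lexicographic order every extension of a token follows it contiguously,
    so a single pass over the sorted tokens is enough.
    """
    pools: list[dict[str, float]] = []
    root = None
    for token in sorted(candidates):
        if root is not None and token.startswith(root):
            pools[-1][token] = candidates[token]
        else:
            root = token
            pools.append({token: candidates[token]})
    return pools


def pick_from_pool(pool: dict[str, float], shared_seed: int) -> str:
    """Pick one token inside a pool, weighted by its probability.

    Sender and receiver call this with the same seed, so both obtain the
    same token. random.Random is a stand-in for a shared CSPRNG.
    """
    rng = random.Random(shared_seed)
    tokens = sorted(pool)
    weights = [pool[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


candidates = {"a": 0.2, "ab": 0.1, "cat": 0.3, "dog": 0.4}
pools = build_ambiguity_pools(candidates)
sender_token = pick_from_pool(pools[0], shared_seed=1234)
receiver_token = pick_from_pool(pools[0], shared_seed=1234)
```

In this sketch the steganographic embedding step would operate on pools (each carrying the summed probability of its members) rather than on individual tokens; choosing within a pool in proportion to the original probabilities leaves each token's overall probability unchanged, which matches the distribution-preserving property the abstract claims.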

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2403.17524

Country: North America > United States > California (0.14)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.68)

Multi-Bit Distortion-Free Watermarking for Large Language Models

Boroujeny, Massieh Kordi, Jiang, Ya, Zeng, Kai, Mark, Brian

arXiv.org Artificial Intelligence, Feb-26-2024

Methods for watermarking large language models have been proposed that distinguish AI-generated text from human-generated text by slightly altering the model output distribution, but they also distort the quality of the text, exposing the watermark to adversarial detection. More recently, distortion-free watermarking methods were proposed that require a secret key to detect the watermark. The prior methods generally embed zero-bit watermarks that do not provide additional information beyond tagging a text as being AI-generated. We extend an existing zero-bit distortion-free watermarking method by embedding multiple bits of meta-information as part of the watermark. We also develop a computationally efficient decoder that extracts the embedded information from the watermark with low bit error rate.

large language model, machine learning, natural language, (14 more...)

arXiv.org Artificial Intelligence

2402.16578

Country: North America > United States (0.14)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)

Quiver: Supporting GPUs for Low-Latency, High-Throughput GNN Serving with Workload Awareness

Tan, Zeyuan, Yuan, Xiulong, He, Congjie, Sit, Man-Kit, Li, Guo, Liu, Xiaoze, Ai, Baole, Zeng, Kai, Pietzuch, Peter, Mai, Luo

arXiv.org Artificial Intelligence, May-18-2023

Systems for serving inference requests on graph neural networks (GNN) must combine low latency with high throughput, but they face irregular computation due to skew in the number of sampled graph nodes and aggregated GNN features. This makes it challenging to exploit GPUs effectively: using GPUs to sample only a few graph nodes yields lower performance than CPU-based sampling; and aggregating many features exhibits high data movement costs between GPUs and CPUs. Therefore, current GNN serving systems use CPUs for graph sampling and feature aggregation, limiting throughput. We describe Quiver, a distributed GPU-based GNN serving system with low latency and high throughput. Quiver's key idea is to exploit workload metrics for predicting the irregular computation of GNN requests, and governing the use of GPUs for graph sampling and feature aggregation: (1) for graph sampling, Quiver calculates the probabilistic sampled graph size, a metric that predicts the degree of parallelism in graph sampling. Quiver uses this metric to assign sampling tasks to GPUs only when the performance gains surpass CPU-based sampling; and (2) for feature aggregation, Quiver relies on the feature access probability to decide which features to partition and replicate across a distributed GPU NUMA topology. We show that Quiver achieves up to 35 times lower latency with an 8 times higher throughput compared to state-of-the-art GNN approaches (DGL and PyG).

artificial intelligence, data mining, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2305.10863

Country: North America > United States (1.00)

Genre: Research Report (0.64)

Industry: Information Technology (0.68)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Cardinality Estimation in DBMS: A Comprehensive Benchmark Evaluation

Han, Yuxing, Wu, Ziniu, Wu, Peizhi, Zhu, Rong, Yang, Jingyi, Tan, Liang Wei, Zeng, Kai, Cong, Gao, Qin, Yanzhao, Pfadler, Andreas, Qian, Zhengping, Zhou, Jingren, Li, Jiangneng, Cui, Bin

arXiv.org Artificial Intelligence, Sep-15-2021

Cardinality estimation (CardEst) plays a significant role in generating high-quality query plans for a query optimizer in DBMS. In the last decade, an increasing number of advanced CardEst methods (especially ML-based) have been proposed with outstanding estimation accuracy and inference latency. However, there exists no study that systematically evaluates the quality of these methods and answers the fundamental problem: to what extent can these methods improve the performance of query optimizer in real-world settings, which is the ultimate goal of a CardEst method. In this paper, we comprehensively and systematically compare the effectiveness of CardEst methods in a real DBMS. We establish a new benchmark for CardEst, which contains a new complex real-world dataset STATS and a diverse query workload STATS-CEB. We integrate multiple most representative CardEst methods into an open-source database system PostgreSQL, and comprehensively evaluate their true effectiveness in improving query plan quality, and other important aspects affecting their applicability, ranging from inference latency, model size, and training time, to update efficiency and accuracy. We obtain a number of key findings for the CardEst methods, under different data and query settings. Furthermore, we find that the widely used estimation accuracy metric (Q-Error) cannot distinguish the importance of different sub-plan queries during query optimization and thus cannot truly reflect the query plan quality generated by CardEst methods. Therefore, we propose a new metric P-Error to evaluate the performance of CardEst methods, which overcomes the limitation of Q-Error and is able to reflect the overall end-to-end performance of CardEst methods. We have made all of the benchmark data and evaluation code publicly available at https://github.com/Nathaniel-Han/End-to-End-CardEst-Benchmark.
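The limitation of Q-Error noted in this abstract can be seen in a small sketch. It uses the commonly cited definition of Q-Error as the larger of the two ratios between estimated and true cardinality; the function name `q_error` and the `floor` parameter (which guards against division by zero) are assumptions for illustration and are not taken from the benchmark code. P-Error is not sketched here because the abstract does not define it.

```python
def q_error(estimated: float, actual: float, floor: float = 1.0) -> float:
    """Return max(est / true, true / est); the result is always >= 1.

    Both values are clamped to `floor` (an assumed convention) so that
    empty results do not cause a division by zero.
    """
    est = max(estimated, floor)
    act = max(actual, floor)
    return max(est / act, act / est)


# A small and a very large sub-plan query, each misestimated by a factor of 10.
small_subplan = q_error(estimated=10, actual=100)
large_subplan = q_error(estimated=1_000_000, actual=10_000_000)
```

Both calls yield the same Q-Error even though an error on the much larger sub-plan could plausibly matter more for the chosen query plan, which is the kind of distinction the abstract says Q-Error cannot make.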

artificial intelligence, cardest method, information retrieval query processing, (19 more...)

arXiv.org Artificial Intelligence

2109.05877

Genre: Research Report (0.82)

Technology:

Information Technology > Databases (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

A Unified Transferable Model for ML-Enhanced DBMS

Wu, Ziniu, Yang, Peilun, Yu, Pei, Zhu, Rong, Han, Yuxing, Li, Yaliang, Lian, Defu, Zeng, Kai, Zhou, Jingren

arXiv.org Artificial Intelligence, May-5-2021

Recently, the database management system (DBMS) community has witnessed the power of machine learning (ML) solutions for DBMS tasks. Despite their promising performance, these existing solutions can hardly be considered satisfactory. First, these ML-based methods in DBMS are not effective enough because they are optimized on each specific task, and cannot explore or understand the intrinsic connections between tasks. Second, the training process has serious limitations that hinder their practicality, because they need to retrain the entire model from scratch for a new DB. Moreover, for each retraining, they require an excessive amount of training data, which is very expensive to acquire and unavailable for a new DB. We propose to explore the transferabilities of the ML methods both across tasks and across DBs to tackle these fundamental drawbacks. In this paper, we propose a unified model MTMLF that uses a multi-task training procedure to capture the transferable knowledge across tasks and a pretrain-finetune procedure to distill the transferable meta knowledge across DBs. We believe this paradigm is more suitable for cloud DB service, and has the potential to revolutionize the way ML is used in DBMS. Furthermore, to demonstrate the predicting power and viability of MTMLF, we provide a concrete and very promising case study on query optimization tasks. Last but not least, we discuss several concrete research opportunities along this line of work.

information retrieval query processing, knowledge, neural network, (19 more...)

arXiv.org Artificial Intelligence

2105.02418

Genre: Research Report (0.50)

Industry: Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
