AITopics | Chen, Wei

Collaborating Authors

Chen, Wei

Bridging Social Psychology and LLM Reasoning: Conflict-Aware Meta-Review Generation via Cognitive Alignment

Chen, Wei, Ding, Han, Yuan, Meng, Zhang, Zhao, Wang, Deqing, Zhuang, Fuzhen

arXiv.org Artificial Intelligence, Mar-21-2025

The rapid growth of scholarly submissions has overwhelmed traditional peer review systems, driving the need for intelligent automation to preserve scientific rigor. While large language models (LLMs) show promise in automating manuscript critiques, their ability to synthesize high-stakes meta-reviews, which require conflict-aware reasoning and consensus derivation, remains underdeveloped. Existing methods fail to effectively handle conflicting viewpoints within differing opinions, and often introduce additional cognitive biases, such as anchoring effects and conformity bias. To overcome these limitations, we propose the Cognitive Alignment Framework (CAF), a dual-process architecture that transforms LLMs into adaptive scientific arbitrators. By operationalizing Kahneman's dual-process theory, CAF introduces a three-step cognitive pipeline: review initialization, incremental integration, and cognitive alignment. Empirical validation shows that CAF outperforms existing LLM-based methods, with sentiment consistency gains reaching up to 19.47% and content consistency improving by as much as 12.95%.

conflict-aware meta-review generation, large language model, natural language, (4 more...)

arXiv:2503.13879

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Scalable Trajectory-User Linking with Dual-Stream Representation Networks

Zhang, Hao, Chen, Wei, Zhao, Xingyu, Qi, Jianpeng, Jiang, Guiyuan, Yu, Yanwei

arXiv.org Artificial Intelligence, Mar-19-2025

Trajectory-user linking (TUL) aims to match anonymous trajectories to the most likely users who generated them, offering benefits for a wide range of real-world spatio-temporal applications. However, existing TUL methods are limited by high model complexity and poor learning of effective trajectory representations, rendering them ineffective in handling large-scale user trajectory data. In this work, we propose a novel Scalable Trajectory-User Linking method with dual-stream representation networks for the large-scale TUL problem, named ScaleTUL. Specifically, ScaleTUL generates two views using temporal and spatial augmentations and exploits a supervised contrastive learning framework to effectively capture the irregularities of trajectories. In each view, a dual-stream trajectory encoder, consisting of a long-term encoder and a short-term encoder, is designed to learn unified trajectory representations that fuse different temporal-spatial dependencies. Then, a TUL layer is used to associate the trajectories with the corresponding users in the representation space using a two-stage training model. Experimental results on check-in mobility datasets from three real-world cities and the nationwide U.S. demonstrate the superiority of ScaleTUL over state-of-the-art baselines for large-scale TUL tasks.

data mining, machine learning, trajectory, (19 more...)

arXiv:2503.15002

Country:

Asia > China (0.46)
North America > United States (0.29)

Genre: Research Report (1.00)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

Growing a Twig to Accelerate Large Vision-Language Models

Shao, Zhenwei, Wang, Mingyang, Yu, Zhou, Pan, Wenwen, Yang, Yan, Wei, Tao, Zhang, Hongyuan, Mao, Ning, Chen, Wei, Yu, Jun

arXiv.org Artificial Intelligence, Mar-18-2025

Large vision-language models (VLMs) have demonstrated remarkable capabilities in open-world multimodal understanding, yet their high computational overheads pose great challenges for practical deployment. Some recent works have proposed methods to accelerate VLMs by pruning redundant visual tokens guided by the attention maps of VLM's early layers. Despite the success of these token pruning methods, they still suffer from two major shortcomings: (i) considerable accuracy drop due to insensitive attention signals in early layers, and (ii) limited speedup when generating long responses (e.g., 30 tokens). To address the limitations above, we present TwigVLM -- a simple and general architecture by growing a lightweight twig upon an early layer of the base VLM. Compared with most existing VLM acceleration methods purely based on visual token pruning, our TwigVLM not only achieves better accuracy retention by employing a twig-guided token pruning (TTP) strategy, but also yields higher generation speed by utilizing a self-speculative decoding (SSD) strategy. Taking LLaVA-1.5-7B as the base VLM, experimental results show that TwigVLM preserves 96% of the original performance after pruning 88.9% of visual tokens and achieves 154% speedup in generating long responses, delivering significantly better performance in terms of both accuracy and speed over the state-of-the-art VLM acceleration methods. Code will be made publicly available.

preprint arxiv, pruning, twigvlm, (15 more...)

arXiv:2503.14075

Country:

Asia > China (0.46)
North America > United States (0.28)

Industry:

Leisure & Entertainment (1.00)
Media > Film (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.67)

Step-Video-TI2V Technical Report: A State-of-the-Art Text-Driven Image-to-Video Generation Model

Huang, Haoyang, Ma, Guoqing, Duan, Nan, Chen, Xing, Wan, Changyi, Ming, Ranchen, Wang, Tianyu, Wang, Bo, Lu, Zhiying, Li, Aojie, Zeng, Xianfang, Zhang, Xinhao, Yu, Gang, Yin, Yuhe, Wu, Qiling, Sun, Wen, An, Kang, Han, Xin, Sun, Deshan, Ji, Wei, Huang, Bizhu, Li, Brian, Wu, Chenfei, Huang, Guanzhe, Xiong, Huixin, He, Jiaxin, Wu, Jianchang, Yuan, Jianlong, Wu, Jie, Liu, Jiashuai, Guo, Junjing, Tan, Kaijun, Chen, Liangyu, Chen, Qiaohui, Sun, Ran, Yuan, Shanshan, Yin, Shengming, Liu, Sitong, Chen, Wei, Dai, Yaqi, Luo, Yuchu, Ge, Zheng, Guan, Zhisheng, Song, Xiaoniu, Zhou, Yu, Jiao, Binxing, Chen, Jiansheng, Li, Jing, Zhou, Shuchang, Zhang, Xiangyu, Xiu, Yi, Zhu, Yibo, Shum, Heung-Yeung, Jiang, Daxin

arXiv.org Artificial Intelligence, Mar-14-2025

We present Step-Video-TI2V, a state-of-the-art text-driven image-to-video generation model with 30B parameters, capable of generating videos up to 102 frames based on both text and image inputs. We build Step-Video-TI2V-Eval as a new benchmark for the text-driven image-to-video task and compare Step-Video-TI2V with open-source and commercial TI2V engines using this dataset. Experimental results demonstrate the state-of-the-art performance of Step-Video-TI2V in the image-to-video generation task.

artificial intelligence, machine learning, step-video-ti2v, (18 more...)

arXiv:2503.11251

Genre: Research Report (0.71)

Industry: Media > Photography (0.31)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.97)

Extra Clients at No Extra Cost: Overcome Data Heterogeneity in Federated Learning with Filter Decomposition

Chen, Wei, Qiu, Qiang

arXiv.org Artificial Intelligence, Mar-11-2025

Data heterogeneity is one of the major challenges in federated learning (FL), which results in substantial client variance and slow convergence. In this study, we propose a novel solution: decomposing a convolutional filter in FL into a linear combination of filter subspace elements, i.e., filter atoms. This simple technique transforms global filter aggregation in FL into aggregating filter atoms and their atom coefficients. The key advantage here involves mathematically generating numerous cross-terms by expanding the product of two weighted sums from filter atom and atom coefficient. These cross-terms effectively emulate many additional latent clients, significantly reducing model variance, which is validated by our theoretical analysis and empirical observation. Furthermore, our method permits different training schemes for filter atoms and atom coefficients for highly adaptive model personalization and communication efficiency. Empirical results on benchmark datasets demonstrate that our filter decomposition technique substantially improves the accuracy of FL methods, confirming its efficacy in addressing data heterogeneity.
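
The cross-term argument in this abstract can be checked numerically. The sketch below is only an illustration of the algebra, not the authors' aggregation scheme: it assumes each client holds its own atom coefficients and filter atoms, averages the two separately, and confirms that the rebuilt filter equals the mean over all client pairs. The sizes `N`, `M`, `P` and the helper names `compose` and `average` are hypothetical.

```python
import random

def compose(coeffs, atoms):
    """Build a filter as a linear combination of atoms: sum_m coeffs[m] * atoms[m]."""
    size = len(atoms[0])
    return [sum(c * atom[p] for c, atom in zip(coeffs, atoms)) for p in range(size)]

def average(vectors):
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(len(vectors[0]))]

random.seed(0)
N, M, P = 3, 2, 9  # clients, atoms per filter, entries per flattened 3x3 atom (assumed sizes)
client_coeffs = [[random.gauss(0, 1) for _ in range(M)] for _ in range(N)]
client_atoms = [[[random.gauss(0, 1) for _ in range(P)] for _ in range(M)] for _ in range(N)]

# Aggregate coefficients and atoms separately, then rebuild the filter.
avg_coeffs = average(client_coeffs)
avg_atoms = [average([client_atoms[i][m] for i in range(N)]) for m in range(M)]
aggregated = compose(avg_coeffs, avg_atoms)

# The same filter, written as the mean over all N * N (coefficient, atom) client pairs.
pairs = [compose(client_coeffs[i], client_atoms[j]) for i in range(N) for j in range(N)]
cross_term_mean = average(pairs)
```

With 3 clients the product expands into 9 pairs, 6 of which mix the coefficients of one client with the atoms of another; these are the cross-terms the abstract describes as emulating additional latent clients.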

artificial intelligence, filter atom, machine learning, (13 more...)

arXiv:2503.08652

Country: North America > United States > Indiana > Tippecanoe County (0.14)

Genre: Research Report > New Finding (0.48)

Industry: Education (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

GraphGarment: Learning Garment Dynamics for Bimanual Cloth Manipulation Tasks

Chen, Wei, Li, Kelin, Lee, Dongmyoung, Chen, Xiaoshuai, Zong, Rui, Kormushev, Petar

arXiv.org Artificial Intelligence, Mar-10-2025

Physical manipulation of garments is often crucial when performing fabric-related tasks, such as hanging garments. However, due to the deformable nature of fabrics, these operations remain a significant challenge for robots in household, healthcare, and industrial environments. In this paper, we propose GraphGarment, a novel approach that models garment dynamics based on robot control inputs and applies the learned dynamics model to facilitate garment manipulation tasks such as hanging. Specifically, we use graphs to represent the interactions between the robot end-effector and the garment. GraphGarment uses a graph neural network (GNN) to learn a dynamics model that can predict the next garment state given the current state and input action in simulation. To address the substantial sim-to-real gap, we propose a residual model that compensates for garment state prediction errors, thereby improving real-world performance. The garment dynamics model is then applied to a model-based action sampling strategy, where it is utilized to manipulate the garment to a reference pre-hanging configuration for garment-hanging tasks. We conducted four experiments using six types of garments to validate our approach in both simulation and real-world settings. In simulation experiments, GraphGarment achieves better garment state prediction performance, with a prediction error 0.46 cm lower than the best baseline. Our approach also demonstrates improved performance in the garment-hanging simulation experiment with enhancements of 12%, 24%, and 10%, respectively. Moreover, real-world robot experiments confirm the robustness of sim-to-real transfer, with an error increase of 0.17 cm compared to simulation results. Supplementary material is available at: https://sites.google.com/view/graphgarment.

artificial intelligence, machine learning, modeling & simulation, (15 more...)

arXiv:2503.05817

Genre: Research Report > Promising Solution (0.48)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

FedEM: A Privacy-Preserving Framework for Concurrent Utility Preservation in Federated Learning

Xu, Mingcong, Zhang, Xiaojin, Chen, Wei, Jin, Hai

arXiv.org Artificial Intelligence, Mar-7-2025

Federated Learning (FL) enables collaborative training of models across distributed clients without sharing local data, addressing privacy concerns in decentralized systems. However, the gradient-sharing process exposes private data to potential leakage, compromising FL's privacy guarantees in real-world applications. To address this issue, we propose Federated Error Minimization (FedEM), a novel algorithm that incorporates controlled perturbations through adaptive noise injection. This mechanism effectively mitigates gradient leakage attacks while maintaining model performance. Experimental results on benchmark datasets demonstrate that FedEM significantly reduces privacy risks and preserves model accuracy, achieving a robust balance between privacy protection and utility preservation.

data mining, machine learning, privacy protection, (13 more...)

arXiv:2503.06021

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.53)

CLIPure: Purification in Latent Space via CLIP for Adversarially Robust Zero-Shot Classification

Zhang, Mingkun, Bi, Keping, Chen, Wei, Guo, Jiafeng, Cheng, Xueqi

arXiv.org Artificial Intelligence, Mar-2-2025

In this paper, we aim to build an adversarially robust zero-shot image classifier. We ground our work on CLIP, a vision-language pre-trained encoder model that can perform zero-shot classification by matching an image with text prompts "a photo of a <class-name>.". Purification is the path we choose since it does not require adversarial training on specific attack types and thus can cope with any unforeseen attacks. We then formulate purification risk as the KL divergence between the joint distributions of the purification process of denoising the adversarial samples and the attack process of adding perturbations to benign samples, through bidirectional Stochastic Differential Equations (SDEs). The final derived results inspire us to explore purification in the multi-modal latent space of CLIP. We propose two variants for our CLIPure approach: CLIPure-Diff, which models the likelihood of images' latent vectors with the DiffusionPrior module in DALL-E 2 (modeling the generation process of CLIP's latent vectors), and CLIPure-Cos, which models the likelihood with the cosine similarity between the embeddings of an image and "a photo of a.". As far as we know, CLIPure is the first purification method in multi-modal latent space and CLIPure-Cos is the first purification method that is not based on generative models, which substantially improves defense efficiency. We conducted extensive experiments on CIFAR-10, ImageNet, and 13 datasets that previous CLIP-based defense methods used for evaluating zero-shot classification robustness.
Among zero-shot classifiers, CLIP (Radford et al., 2021) is a popular, effective, and efficient example. CLIP performs zero-shot classification by forming text prompts "a photo of a <class-name>." for all the candidate categories and selecting the class with the highest similarity to the image embedding. Despite its efficacy, its accuracy can drop to zero under adversarial attacks, making it as vulnerable as other neural classifiers.
Existing methods to enhance adversarial robustness follow two primary paths: adversarial training and purification. Adversarial Training (AT) (Madry et al., 2017; Rebuffi et al., 2021; Wang et al., 2023) incorporates adversarial examples into model training to boost robustness. It often achieves [...] FARE (Schlarmann et al., 2024) and TeCoA (Mao et al., 2022) are two AT approaches integrated with CLIP, which enhance CLIP's zero-shot classification robustness but significantly harm clean accuracy and do not generalize to other types of attacks.
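
The zero-shot step described here (one text prompt per candidate class, then pick the class whose prompt embedding is most similar to the image embedding) can be sketched as follows. This is a toy illustration only: the three-dimensional vectors stand in for real CLIP embeddings, no CLIP model is loaded, and `cosine` and `zero_shot_classify` are hypothetical helper names.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def zero_shot_classify(image_emb, prompt_embs):
    """Return the label whose prompt embedding is most similar to the image embedding."""
    scores = {label: cosine(image_emb, emb) for label, emb in prompt_embs.items()}
    return max(scores, key=scores.get), scores

# Made-up embeddings for the prompts "a photo of a cat.", "a photo of a dog.", "a photo of a car."
prompt_embs = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.1, 0.9, 0.0],
    "car": [0.0, 0.1, 0.9],
}
image_emb = [0.8, 0.3, 0.1]  # made-up image embedding

label, scores = zero_shot_classify(image_emb, prompt_embs)
```

An adversarial perturbation succeeds when it moves `image_emb` far enough that a wrong label attains the highest score, which is the failure mode purification aims to undo.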

large language model, machine learning, natural language, (19 more...)

arXiv:2502.18176

Country:

Asia > China (0.14)
Europe > Germany (0.14)
Europe > Spain (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Energy (0.46)
Information Technology > Security & Privacy (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Continuous K-Max Bandits

Chen, Yu, Wang, Siwei, Huang, Longbo, Chen, Wei

arXiv.org Artificial Intelligence, Feb-19-2025

We study the $K$-Max combinatorial multi-armed bandits problem with continuous outcome distributions and weak value-index feedback: each base arm has an unknown continuous outcome distribution, and in each round the learning agent selects $K$ arms, obtains the maximum value sampled from these $K$ arms as reward and observes this reward together with the corresponding arm index as feedback. This setting captures critical applications in recommendation systems, distributed computing, server scheduling, etc. The continuous $K$-Max bandits introduce unique challenges, including discretization error from continuous-to-discrete conversion, non-deterministic tie-breaking under limited feedback, and biased estimation due to partial observability. Our key contribution is the computationally efficient algorithm DCK-UCB, which combines adaptive discretization with bias-corrected confidence bounds to tackle these challenges. For general continuous distributions, we prove that DCK-UCB achieves a $\widetilde{\mathcal{O}}(T^{3/4})$ regret upper bound, establishing the first sublinear regret guarantee for this setting. Furthermore, we identify an important special case with exponential distributions under full-bandit feedback. In this case, our proposed algorithm MLE-Exp enables $\widetilde{\mathcal{O}}(\sqrt{T})$ regret upper bound through maximal log-likelihood estimation, achieving near-minimax optimality.
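
The value-index feedback model in this abstract can be made concrete with a small simulation of one round. This sketch covers only the environment, not DCK-UCB or MLE-Exp; the exponential rates, the arm ids, and the function name `kmax_round` are assumptions chosen for illustration.

```python
import random

def kmax_round(selected_arms, samplers, rng):
    """Play one round: sample every selected arm, reveal only the maximum and its arm index."""
    outcomes = {arm: samplers[arm](rng) for arm in selected_arms}
    winner = max(outcomes, key=outcomes.get)
    return outcomes[winner], winner

# Hypothetical base arms with exponential outcome distributions of different rates.
rates = {0: 0.5, 1: 1.0, 2: 2.0, 3: 4.0}
samplers = {arm: (lambda rng, r=rate: rng.expovariate(r)) for arm, rate in rates.items()}

rng = random.Random(0)
reward, index = kmax_round([0, 2, 3], samplers, rng)  # K = 3 arms selected this round
```

The learner sees `reward` and `index` but never the outcomes of the losing arms, which is the partial observability that the abstract says leads to biased estimation.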

bandit, data mining, machine learning, (18 more...)

arXiv:2502.13467

Genre: Research Report (0.63)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

FedEAT: A Robustness Optimization Framework for Federated LLMs

Pang, Yahao, Wu, Xingyuan, Zhang, Xiaojin, Chen, Wei, Jin, Hai

arXiv.org Artificial Intelligence, Feb-17-2025

Significant advancements have been made by Large Language Models (LLMs) in the domains of natural language understanding and automated content creation. However, they still face persistent problems, including substantial computational costs and inadequate availability of training data. The combination of Federated Learning (FL) and LLMs (federated LLMs) offers a solution by leveraging distributed data while protecting privacy, which positions it as an ideal choice for sensitive domains. However, federated LLMs still suffer from robustness challenges, including data heterogeneity, malicious clients, and adversarial attacks, which greatly hinder their applications. We first introduce the robustness problems in federated LLMs. To address these challenges, we propose FedEAT (Federated Embedding space Adversarial Training), a novel framework that applies adversarial training in the embedding space of the client LLM and employs a robust aggregation approach, specifically geometric median aggregation, to enhance the robustness of federated LLMs. Our experiments demonstrate that FedEAT effectively improves the robustness of federated LLMs with minimal performance loss.
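
To illustrate why geometric median aggregation is considered robust, the sketch below compares it with a plain mean on toy client updates. The abstract does not say how the median is computed; the Weiszfeld iteration used here is a standard approximation chosen for this example, and the two-dimensional update vectors and the function name `geometric_median` are hypothetical.

```python
import math

def geometric_median(points, iters=100, eps=1e-9):
    """Approximate the geometric median of equal-length vectors with Weiszfeld iterations."""
    dim = len(points[0])
    # Start from the coordinate-wise mean.
    est = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    for _ in range(iters):
        weights = []
        for p in points:
            dist = math.sqrt(sum((p[d] - est[d]) ** 2 for d in range(dim)))
            weights.append(1.0 / max(dist, eps))  # guard against division by zero
        total = sum(weights)
        new_est = [sum(w * p[d] for w, p in zip(weights, points)) / total for d in range(dim)]
        shift = math.sqrt(sum((new_est[d] - est[d]) ** 2 for d in range(dim)))
        est = new_est
        if shift < eps:
            break
    return est

# Four honest client updates near (1, 1) and one malicious outlier (toy values).
updates = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.05], [100.0, -100.0]]
mean = [sum(u[d] for u in updates) / len(updates) for d in range(2)]
median = geometric_median(updates)
```

The single outlier drags the mean far from the honest cluster, while the geometric median stays close to it.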

artificial intelligence, large language model, natural language, (16 more...)

arXiv:2502.11863

Country: Asia (0.28)

Genre: Research Report (0.82)

Industry:

Information Technology > Security & Privacy (0.49)
Government > Military (0.35)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
