AITopics | Wang, Xintao

Wang, Xintao

DiffMoE: Dynamic Token Selection for Scalable Diffusion Transformers

Shi, Minglei, Yuan, Ziyang, Yang, Haotian, Wang, Xintao, Zheng, Mingwu, Tao, Xin, Zhao, Wenliang, Zheng, Wenzhao, Zhou, Jie, Lu, Jiwen, Wan, Pengfei, Zhang, Di, Gai, Kun

arXiv.org Artificial Intelligence, Mar-18-2025

Diffusion models have demonstrated remarkable success in various image generation tasks, but their performance is often limited by the uniform processing of inputs across varying conditions and noise levels. To address this limitation, we propose a novel approach that leverages the inherent heterogeneity of the diffusion process. Our method, DiffMoE, introduces a batch-level global token pool that enables experts to access global token distributions during training, promoting specialized expert behavior. To unleash the full potential of the diffusion process, DiffMoE incorporates a capacity predictor that dynamically allocates computational resources based on noise levels and sample complexity. Through comprehensive evaluation, DiffMoE achieves state-of-the-art performance among diffusion models on the ImageNet benchmark, substantially outperforming both dense architectures with 3x activated parameters and existing MoE approaches while maintaining 1x activated parameters. The effectiveness of our approach extends beyond class-conditional generation to more challenging tasks such as text-to-image generation, demonstrating its broad applicability across different diffusion model applications. Project Page: https://shiml20.github.io/DiffMoE/
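
The idea of a batch-level token pool can be illustrated with a minimal sketch. The abstract does not give the routing rule, so the expert-choice top-k selection below is an assumption, and the fixed `capacity` argument is only a hypothetical stand-in for the learned capacity predictor. The function and variable names are illustrative, not taken from the DiffMoE code.

```python
import random


def route_global_pool(scores, capacity):
    """Expert-choice routing over a batch-level token pool (illustrative).

    scores[t][e] is the router score of pooled token t for expert e, where
    the pool flattens the tokens of every sample in the batch.
    Each expert keeps its `capacity` highest-scoring tokens from the pool.
    Returns {expert_index: sorted list of pooled token indices}.
    """
    num_tokens = len(scores)
    num_experts = len(scores[0])
    assignment = {}
    for e in range(num_experts):
        ranked = sorted(range(num_tokens), key=lambda t: scores[t][e], reverse=True)
        assignment[e] = sorted(ranked[:capacity])
    return assignment


# Toy batch: 2 samples x 4 tokens, flattened into one pool of 8 tokens.
TOKENS_PER_SAMPLE = 4
rng = random.Random(0)
pool_scores = [[rng.random() for _ in range(2)] for _ in range(8)]
routing = route_global_pool(pool_scores, capacity=3)

# Count how many expert slots each sample received. Because selection runs
# over the whole pool, the split between samples is not fixed in advance.
slots_per_sample = [0, 0]
for token_ids in routing.values():
    for t in token_ids:
        slots_per_sample[t // TOKENS_PER_SAMPLE] += 1
print(routing, slots_per_sample)
```

Selecting over the pooled tokens rather than per sample is what lets compute shift toward harder samples; a per-sample top-k would give every sample the same number of slots.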

artificial intelligence, diffmoe, machine learning, (12 more...)

arXiv.org Artificial Intelligence

2503.14487

Country: Asia > China (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

PowerAttention: Exponentially Scaling of Receptive Fields for Effective Sparse Attention

Chen, Lida, Xu, Dong, An, Chenxin, Wang, Xintao, Zhang, Yikai, Chen, Jiangjie, Liang, Zujie, Wei, Feng, Liang, Jiaqing, Xiao, Yanghua, Wang, Wei

arXiv.org Artificial Intelligence, Mar-5-2025

Large Language Models (LLMs) face efficiency bottlenecks due to the quadratic complexity of the attention mechanism when processing long contexts. Sparse attention methods offer a promising solution, but existing approaches often suffer from incomplete effective context and/or require a complex pipeline implementation. We present a comprehensive analysis of sparse attention for autoregressive LLMs from the perspective of the receptive field, recognize the suboptimal nature of existing methods for expanding the receptive field, and introduce PowerAttention, a novel sparse attention design, grounded in this theoretical analysis, that facilitates effective and complete context extension. PowerAttention achieves exponential receptive field growth in $d$-layer LLMs, allowing each output token to attend to $2^d$ tokens, ensuring completeness and continuity of the receptive field. Experiments demonstrate that PowerAttention outperforms existing static sparse attention methods by $5\sim 40\%$, especially on tasks demanding long-range dependencies like Passkey Retrieval and RULER, while maintaining a comparable time complexity to sliding window attention. Efficiency evaluations further highlight PowerAttention's superior speedup in both prefilling and decoding phases compared with dynamic sparse attentions and full attention ($3.0\times$ faster on 128K context), making it a highly effective and user-friendly solution for processing long sequences in LLMs.
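
The claim that a $d$-layer model can reach $2^d$ tokens with a complete, contiguous receptive field can be checked on a toy pattern. The sketch below assumes a hypothetical dilation scheme in which layer l lets each token attend to itself and to the token 2**l positions earlier; the abstract does not specify PowerAttention's actual mask, so this is an illustration of exponential growth, not the paper's design. A two-token sliding window is included for contrast.

```python
def receptive_field(num_layers, query_pos, step_for_layer):
    """Input positions that can influence `query_pos` after `num_layers` layers.

    Assumes each layer lets a token attend to itself and to the single token
    `step_for_layer(layer)` positions earlier (causal, illustrative pattern).
    """
    reachable = {query_pos}
    for layer in range(num_layers):
        step = step_for_layer(layer)
        reachable |= {p - step for p in reachable if p - step >= 0}
    return reachable


def power_step(layer):
    # Hypothetical exponential pattern: offsets 1, 2, 4, 8, ...
    return 2 ** layer


def window_step(layer):
    # Sliding window of width 2: always look one token back.
    return 1


d = 4
power_field = receptive_field(d, query_pos=15, step_for_layer=power_step)
window_field = receptive_field(d, query_pos=15, step_for_layer=window_step)
print(len(power_field), len(window_field))
```

Both patterns attend to two tokens per layer, so the per-layer cost is the same, but the exponential offsets cover every position from 0 to 15 while the sliding window only covers positions 11 to 15.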

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2503.03588

Country:

Europe > Austria > Vienna (0.14)
North America > United States > Hawaii (0.14)
North America > United States > Florida > Miami-Dade County > Miami (0.14)
North America > Mexico > Mexico City (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

CoSER: Coordinating LLM-Based Persona Simulation of Established Roles

Wang, Xintao, Wang, Heng, Zhang, Yifei, Yuan, Xinfeng, Xu, Rui, Huang, Jen-tse, Yuan, Siyu, Guo, Haoran, Chen, Jiangjie, Wang, Wei, Xiao, Yanghua, Zhou, Shuchang

arXiv.org Artificial Intelligence, Feb-13-2025

Role-playing language agents (RPLAs) have emerged as promising applications of large language models (LLMs). However, simulating established characters presents a challenging task for RPLAs, due to the lack of authentic character datasets and nuanced evaluation methods using such data. In this paper, we present CoSER, a collection of a high-quality dataset, open models, and an evaluation protocol towards effective RPLAs of established characters. The CoSER dataset covers 17,966 characters from 771 renowned books. It provides authentic dialogues with real-world intricacies, as well as diverse data types such as conversation setups, character experiences and internal thoughts. Drawing from acting methodology, we introduce given-circumstance acting for training and evaluating role-playing LLMs, where LLMs sequentially portray multiple characters in book scenes. Using our dataset, we develop CoSER 8B and CoSER 70B, i.e., advanced open role-playing LLMs built on LLaMA-3.1 models. Extensive experiments demonstrate the value of the CoSER dataset for RPLA training, evaluation and retrieval. Moreover, CoSER 70B exhibits state-of-the-art performance surpassing or matching GPT-4o on our evaluation and three existing benchmarks, i.e., achieving 75.80% and 93.47% accuracy on the InCharacter and LifeChoice benchmarks respectively.

coordinating llm-based persona simulation, large language model, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2502.09082

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.53)

Improving Video Generation with Human Feedback

Liu, Jie, Liu, Gongye, Liang, Jiajun, Yuan, Ziyang, Liu, Xiaokun, Zheng, Mingwu, Wu, Xiele, Wang, Qiulin, Qin, Wenyu, Xia, Menghan, Wang, Xintao, Liu, Xiaohong, Yang, Fei, Wan, Pengfei, Zhang, Di, Gai, Kun, Yang, Yujiu, Ouyang, Wanli

arXiv.org Artificial Intelligence, Jan-23-2025

Video generation has achieved significant advances through rectified flow techniques, but issues like unsmooth motion and misalignment between videos and prompts persist. In this work, we develop a systematic pipeline that harnesses human feedback to mitigate these problems and refine the video generation model. Specifically, we begin by constructing a large-scale human preference dataset focused on modern video generation models, incorporating pairwise annotations across multiple dimensions. We then introduce VideoReward, a multi-dimensional video reward model, and examine how annotations and various design choices impact its rewarding efficacy. From a unified reinforcement learning perspective aimed at maximizing reward with KL regularization, we introduce three alignment algorithms for flow-based models by extending those from diffusion models. These include two training-time strategies, direct preference optimization for flow (Flow-DPO) and reward weighted regression for flow (Flow-RWR), and an inference-time technique, Flow-NRG, which applies reward guidance directly to noisy videos. Experimental results indicate that VideoReward significantly outperforms existing reward models, and Flow-DPO demonstrates superior performance compared to both Flow-RWR and standard supervised fine-tuning methods. Additionally, Flow-NRG lets users assign custom weights to multiple objectives during inference, meeting personalized video quality needs. Project page: https://gongyeliu.github.io/videoalign.
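
For orientation, the standard direct preference optimization (DPO) loss that Flow-DPO builds on can be written in a few lines. This is the generic form for models with tractable log-likelihoods; the abstract does not state how it is adapted to flow-based models, so the flow-specific version is not reproduced here. The parameter names and the default `beta` are illustrative assumptions.

```python
import math


def dpo_loss(logp_win, logp_lose, ref_logp_win, ref_logp_lose, beta=0.1):
    """Generic DPO loss for one preference pair (not the Flow-DPO variant).

    logp_*     : log-likelihood of the preferred / rejected sample under the
                 model being trained.
    ref_logp_* : the same quantities under the frozen reference model.
    beta       : strength of the implicit KL regularization.
    """
    margin = beta * ((logp_win - ref_logp_win) - (logp_lose - ref_logp_lose))
    # -log(sigmoid(margin)), computed in a numerically stable way.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# The loss is lower when the model favors the preferred sample more than
# the reference model does, and higher in the opposite case.
agrees = dpo_loss(logp_win=-1.0, logp_lose=-3.0, ref_logp_win=-2.0, ref_logp_lose=-2.0)
disagrees = dpo_loss(logp_win=-3.0, logp_lose=-1.0, ref_logp_win=-2.0, ref_logp_lose=-2.0)
print(agrees, disagrees)
```

The reference-model terms are what tie the objective to KL-regularized reward maximization: the model is rewarded only for preference margins beyond those the reference already has.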

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2501.13918

Country: Asia > China (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.67)

MINDECHO: Role-Playing Language Agents for Key Opinion Leaders

Xu, Rui, Lu, Dakuan, Tan, Xiaoyu, Wang, Xintao, Yuan, Siyu, Chen, Jiangjie, Chu, Wei, Xu, Yinghui

arXiv.org Artificial Intelligence, Jul-7-2024

Large language models (LLMs) have demonstrated impressive performance in various applications, among which role-playing language agents (RPLAs) have engaged a broad user base. Now, there is a growing demand for RPLAs that represent Key Opinion Leaders (KOLs), i.e., Internet celebrities who shape the trends and opinions in their domains. However, this line of research remains underexplored. In this paper, we hence introduce MINDECHO, a comprehensive framework for the development and evaluation of KOL RPLAs. MINDECHO collects KOL data from Internet video transcripts in various professional fields, and synthesizes their conversations leveraging GPT-4. Then, the conversations and the transcripts are used for individualized model training and inference-time retrieval, respectively. Our evaluation covers both general dimensions (i.e., knowledge and tones) and fan-centric dimensions for KOLs. Extensive experiments validate the effectiveness of MINDECHO in developing and evaluating KOL RPLAs.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2407.05305

Country: Asia (0.14)

Genre:

Research Report (0.64)
Personal (0.46)

Industry: Health & Medicine > Consumer Health (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Evaluating Character Understanding of Large Language Models via Character Profiling from Fictional Works

Yuan, Xinfeng, Yuan, Siyu, Cui, Yuhan, Lin, Tianhe, Wang, Xintao, Xu, Rui, Chen, Jiangjie, Yang, Deqing

arXiv.org Artificial Intelligence, Jul-2-2024

Large language models (LLMs) have demonstrated impressive performance and spurred numerous AI applications, in which role-playing agents (RPAs) are particularly popular, especially for fictional characters. The prerequisite for these RPAs lies in the capability of LLMs to understand characters from fictional works. Previous efforts have evaluated this capability via basic classification tasks or characteristic imitation, failing to capture the nuanced character understanding with LLMs. In this paper, we propose evaluating LLMs' character understanding capability via the character profiling task, i.e., summarizing character profiles from corresponding materials, a widely adopted yet understudied practice for RPA development. Specifically, we construct the CroSS dataset from literature experts and assess the generated profiles by comparing ground truth references and their applicability in downstream tasks. Our experiments, which cover various summarization methods and LLMs, have yielded promising results. These results strongly validate the character understanding capability of LLMs. Resources are available at https://github.com/Joanna0123/character_profiling.

information, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2404.12726

Country:

Asia (0.28)
North America > United States (0.14)

Genre: Research Report > New Finding (0.93)

Industry:

Leisure & Entertainment (0.46)
Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

Ground Every Sentence: Improving Retrieval-Augmented LLMs with Interleaved Reference-Claim Generation

Xia, Sirui, Wang, Xintao, Liang, Jiaqing, Zhang, Yifei, Zhou, Weikang, Deng, Jiaji, Yu, Fei, Xiao, Yanghua

arXiv.org Artificial Intelligence, Jul-1-2024

Retrieval-Augmented Generation (RAG) has been widely adopted to enhance Large Language Models (LLMs) in knowledge-intensive tasks. Recently, Attributed Text Generation (ATG), which provides citations to support the model's responses in RAG, has attracted growing attention as a way to enhance the credibility of LLM-generated content and facilitate verification. Prior methods mainly adopt coarse-grained attributions, linking to passage-level references or providing paragraph-level citations. However, these methods still fall short in verifiability and make fact checking time-consuming. This paper proposes a fine-grained ATG method called ReClaim (Refer & Claim), which alternates the generation of references and answers step by step. Unlike traditional coarse-grained attribution, ReClaim allows the model to add sentence-level fine-grained citations to each answer sentence in long-form question-answering tasks. Our experiments encompass various training and inference methods and multiple LLMs, verifying the effectiveness of our approach.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2407.01796

Genre: Research Report (0.50)

Industry:

Leisure & Entertainment > Sports > Soccer (0.95)
Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)

Capturing Minds, Not Just Words: Enhancing Role-Playing Language Models with Personality-Indicative Data

Ran, Yiting, Wang, Xintao, Xu, Rui, Yuan, Xinfeng, Liang, Jiaqing, Xiao, Yanghua, Yang, Deqing

arXiv.org Artificial Intelligence, Jun-29-2024

Role-playing agents (RPAs) have been a popular application area for large language models (LLMs), attracting significant interest from both industry and academia. While existing RPAs portray the characters' knowledge and tones well, they face challenges in capturing their minds, especially for small role-playing language models (RPLMs). In this paper, we propose to enhance RPLMs via personality-indicative data. Specifically, we leverage questions from psychological scales and distill advanced RPAs to generate dialogues that grasp the minds of characters. Experimental results validate that RPLMs trained with our dataset exhibit advanced role-playing capabilities for both general and personality-related evaluations. Code and data are available at https://github.com/alienet1109/RolePersonality.

artificial intelligence, large language model, natural language, (17 more...)

arXiv.org Artificial Intelligence

2406.18921

Genre: Research Report (0.82)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.69)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.90)

Image Conductor: Precision Control for Interactive Video Synthesis

Li, Yaowei, Wang, Xintao, Zhang, Zhaoyang, Wang, Zhouxia, Yuan, Ziyang, Xie, Liangbin, Zou, Yuexian, Shan, Ying

arXiv.org Artificial Intelligence, Jun-21-2024

Filmmaking and animation production often require sophisticated techniques for coordinating camera transitions and object movements, typically involving labor-intensive real-world capturing. Despite advancements in generative AI for video creation, achieving precise control over motion for interactive video asset generation remains challenging. To this end, we propose Image Conductor, a method for precise control of camera transitions and object movements to generate video assets from a single image. A well-cultivated training strategy is proposed to separate distinct camera and object motion by camera LoRA weights and object LoRA weights. To further address cinematographic variations from ill-posed trajectories, we introduce a camera-free guidance technique during inference, enhancing object movements while eliminating camera transitions. Additionally, we develop a trajectory-oriented video motion data curation pipeline for training. Quantitative and qualitative experiments demonstrate our method's precision and fine-grained control in generating motion-controllable videos from images, advancing the practical application of interactive video synthesis. Project webpage available at https://liyaowei-stu.github.io/project/ImageConductor/

artificial intelligence, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2406.15339

Country: Asia (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.48)

Teaching Large Language Models to Express Knowledge Boundary from Their Own Signals

Chen, Lida, Liang, Zujie, Wang, Xintao, Liang, Jiaqing, Xiao, Yanghua, Wei, Feng, Chen, Jinglei, Hao, Zhenghong, Han, Bing, Wang, Wei

arXiv.org Artificial Intelligence, Jun-16-2024

Large language models (LLMs) have achieved great success, but their occasional content fabrication, or hallucination, limits their practical application. Hallucination arises because LLMs struggle to admit ignorance due to inadequate training on knowledge boundaries. A key limitation is that LLMs cannot accurately express their knowledge boundary, that is, answer questions they know while admitting ignorance of questions they do not know. In this paper, we aim to teach LLMs to recognize and express their knowledge boundary, so they can reduce hallucinations caused by fabricating when they do not know. We propose CoKE, which first probes LLMs' knowledge boundary via internal confidence given a set of questions, and then leverages the probing results to elicit the expression of the knowledge boundary. Extensive experiments show CoKE helps LLMs express knowledge boundaries, answering known questions while declining unknown ones, significantly improving in-domain and out-of-domain performance.
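
The probing step can be pictured with a small sketch. The abstract says only that the knowledge boundary is probed "via internal confidence"; the specific signal used below (the minimum token probability of the model's own answer), the two thresholds, and all names are assumptions made for illustration, not CoKE's actual procedure.

```python
def label_knowledge_boundary(probes, known_threshold=0.8, unknown_threshold=0.4):
    """Split probed questions into known / unknown / uncertain (illustrative).

    probes maps each question to the per-token probabilities the model
    assigned to its own generated answer. The minimum token probability is
    used as a hypothetical confidence signal.
    """
    labels = {}
    for question, token_probs in probes.items():
        confidence = min(token_probs)
        if confidence >= known_threshold:
            labels[question] = "known"
        elif confidence <= unknown_threshold:
            labels[question] = "unknown"
        else:
            # Ambiguous cases are left out of the training signal.
            labels[question] = "uncertain"
    return labels


# Made-up probabilities for three hypothetical questions.
probes = {
    "capital of France?": [0.99, 0.97],
    "population of a small village in 1423?": [0.35, 0.20, 0.61],
    "boiling point of an obscure compound?": [0.70, 0.65],
}
labels = label_knowledge_boundary(probes)
print(labels)
```

Labels of this kind could then serve as targets when training the model to answer "known" questions and decline "unknown" ones, which is the second stage the abstract describes.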

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2406.10881

Country:

Oceania > New Zealand (0.15)
North America > Canada (0.14)
Asia > China (0.14)

Genre: Research Report (0.64)

Industry: Law (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
