Li, Jintao
HeadRouter: A Training-free Image Editing Framework for MM-DiTs by Adaptively Routing Attention Heads
Xu, Yu, Tang, Fan, Cao, Juan, Zhang, Yuxin, Kong, Xiaoyu, Li, Jintao, Deussen, Oliver, Lee, Tong-Yee
Diffusion Transformers (DiTs) have exhibited robust capabilities in image generation tasks. However, accurate text-guided image editing for multimodal DiTs (MM-DiTs) still poses a significant challenge. Unlike UNet-based structures that could utilize self/cross-attention maps for semantic editing, MM-DiTs inherently lack support for explicit and consistent incorporated text guidance, resulting in semantic misalignment between the edited results and texts. In this study, we disclose the sensitivity of different attention heads to different image semantics within MM-DiTs and introduce HeadRouter, a training-free image editing framework that edits the source image by adaptively routing the text guidance to different attention heads in MM-DiTs. Furthermore, we present a dual-token refinement module to refine text/image token representations for precise semantic guidance and accurate region expression. Experimental results on multiple benchmarks demonstrate HeadRouter's performance in terms of editing fidelity and image quality.
Let Silence Speak: Enhancing Fake News Detection with Generated Comments from Large Language Models
Nan, Qiong, Sheng, Qiang, Cao, Juan, Hu, Beizhe, Wang, Danding, Li, Jintao
Fake news detection plays a crucial role in protecting social media users and maintaining a healthy news ecosystem. Among existing works, comment-based fake news detection methods are empirically shown as promising because comments could reflect users' opinions, stances, and emotions and deepen models' understanding of fake news. Unfortunately, due to exposure bias and users' different willingness to comment, it is not easy to obtain diverse comments in reality, especially for early detection scenarios. Without obtaining the comments from the ``silent'' users, the perceived opinions may be incomplete, subsequently affecting news veracity judgment. In this paper, we explore the possibility of finding an alternative source of comments to guarantee the availability of diverse comments, especially those from silent users. Specifically, we propose to adopt large language models (LLMs) as a user simulator and comment generator, and design GenFEND, a generated feedback-enhanced detection framework, which generates comments by prompting LLMs with diverse user profiles and aggregating generated comments from multiple subpopulation groups. Experiments demonstrate the effectiveness of GenFEND and further analysis shows that the generated comments cover more diverse users and could even be more effective than actual comments.
FlightLLM: Efficient Large Language Model Inference with a Complete Mapping Flow on FPGAs
Zeng, Shulin, Liu, Jun, Dai, Guohao, Yang, Xinhao, Fu, Tianyu, Wang, Hongyi, Ma, Wenheng, Sun, Hanbo, Li, Shiyao, Huang, Zixiao, Dai, Yadong, Li, Jintao, Wang, Zehao, Zhang, Ruoyu, Wen, Kairui, Ning, Xuefei, Wang, Yu
Transformer-based Large Language Models (LLMs) have made a significant impact on various domains. However, LLMs' efficiency suffers from both heavy computation and memory overheads. Compression techniques like sparsification and quantization are commonly used to mitigate the gap between LLM's computation/memory overheads and hardware capacity. However, existing GPU and transformer-based accelerators cannot efficiently process compressed LLMs, due to the following unresolved challenges: low computational efficiency, underutilized memory bandwidth, and large compilation overheads. This paper proposes FlightLLM, enabling efficient LLMs inference with a complete mapping flow on FPGAs. In FlightLLM, we highlight an innovative solution that the computation and memory overhead of LLMs can be solved by utilizing FPGA-specific resources (e.g., DSP48 and heterogeneous memory hierarchy). We propose a configurable sparse DSP chain to support different sparsity patterns with high computation efficiency. Second, we propose an always-on-chip decode scheme to boost memory bandwidth with mixed-precision support. Finally, to make FlightLLM available for real-world LLMs, we propose a length adaptive compilation method to reduce the compilation overhead. Implemented on the Xilinx Alveo U280 FPGA, FlightLLM achieves 6.0$\times$ higher energy efficiency and 1.8$\times$ better cost efficiency against commercial GPUs (e.g., NVIDIA V100S) on modern LLMs (e.g., LLaMA2-7B) using vLLM and SmoothQuant under the batch size of one. FlightLLM beats NVIDIA A100 GPU with 1.2$\times$ higher throughput using the latest Versal VHK158 FPGA.
Exploiting User Comments for Early Detection of Fake News Prior to Users' Commenting
Nan, Qiong, Sheng, Qiang, Cao, Juan, Zhu, Yongchun, Wang, Danding, Yang, Guang, Li, Jintao, Shu, Kai
Both accuracy and timeliness are key factors in detecting fake news on social media. However, most existing methods encounter an accuracy-timeliness dilemma: Content-only methods guarantee timeliness but perform moderately because of limited available information, while social context-based ones generally perform better but inevitably lead to latency because of social context accumulation needs. To break such a dilemma, a feasible but not well-studied solution is to leverage social contexts (e.g., comments) from historical news for training a detection model and apply it to newly emerging news without social contexts. This requires the model to (1) sufficiently learn helpful knowledge from social contexts, and (2) be well compatible with situations that social contexts are available or not. To achieve this goal, we propose to absorb and parameterize useful knowledge from comments in historical news and then inject it into a content-only detection model. Specifically, we design the Comments Assisted Fake News Detection method (CAS-FEND), which transfers useful knowledge from a comments-aware teacher model to a content-only student model during training. The student model is further used to detect newly emerging fake news. Experiments show that the CAS-FEND student model outperforms all content-only methods and even those with 1/4 comments as inputs, demonstrating its superiority for early detection.
A Dual Prompt Learning Framework for Few-Shot Dialogue State Tracking
Yang, Yuting, Lei, Wenqiang, Huang, Pei, Cao, Juan, Li, Jintao, Chua, Tat-Seng
Dialogue state tracking (DST) module is an important component for task-oriented dialog systems to understand users' goals and needs. Collecting dialogue state labels including slots and values can be costly, especially with the wide application of dialogue systems in more and more new-rising domains. In this paper, we focus on how to utilize the language understanding and generation ability of pre-trained language models for DST. We design a dual prompt learning framework for few-shot DST. Specifically, we consider the learning of slot generation and value generation as dual tasks, and two prompts are designed based on such a dual structure to incorporate task-related knowledge of these two tasks respectively. In this way, the DST task can be formulated as a language modeling task efficiently under few-shot settings. Experimental results on two task-oriented dialogue datasets show that the proposed method not only outperforms existing state-of-the-art few-shot methods, but also can generate unseen slots. It indicates that DST-related knowledge can be probed from PLM and utilized to address low-resource DST efficiently with the help of prompt learning.
MDFEND: Multi-domain Fake News Detection
Nan, Qiong, Cao, Juan, Zhu, Yongchun, Wang, Yanyan, Li, Jintao
Fake news spread widely on social media in various domains, which lead to real-world threats in many aspects like politics, disasters, and finance. Most existing approaches focus on single-domain fake news detection (SFND), which leads to unsatisfying performance when these methods are applied to multi-domain fake news detection. As an emerging field, multi-domain fake news detection (MFND) is increasingly attracting attention. However, data distributions, such as word frequency and propagation patterns, vary from domain to domain, namely domain shift. Facing the challenge of serious domain shift, existing fake news detection techniques perform poorly for multi-domain scenarios. Therefore, it is demanding to design a specialized model for MFND. In this paper, we first design a benchmark of fake news dataset for MFND with domain label annotated, namely Weibo21, which consists of 4,488 fake news and 4,640 real news from 9 different domains. We further propose an effective Multi-domain Fake News Detection Model (MDFEND) by utilizing a domain gate to aggregate multiple representations extracted by a mixture of experts. The experiments show that MDFEND can significantly improve the performance of multi-domain fake news detection. Our dataset and code are available at https://github.com/kennqiang/MDFEND-Weibo21.
Overcoming Classifier Imbalance for Long-tail Object Detection with Balanced Group Softmax
Li, Yu, Wang, Tao, Kang, Bingyi, Tang, Sheng, Wang, Chunfeng, Li, Jintao, Feng, Jiashi
Solving long-tail large vocabulary object detection with deep learning based models is a challenging and demanding task, which is however under-explored.In this work, we provide the first systematic analysis on the underperformance of state-of-the-art models in front of long-tail distribution. We find existing detection methods are unable to model few-shot classes when the dataset is extremely skewed, which can result in classifier imbalance in terms of parameter magnitude. Directly adapting long-tail classification models to detection frameworks can not solve this problem due to the intrinsic difference between detection and classification.In this work, we propose a novel balanced group softmax (BAGS) module for balancing the classifiers within the detection frameworks through group-wise training. It implicitly modulates the training process for the head and tail classes and ensures they are both sufficiently trained, without requiring any extra sampling for the instances from the tail classes.Extensive experiments on the very recent long-tail large vocabulary object recognition benchmark LVIS show that our proposed BAGS significantly improves the performance of detectors with various backbones and frameworks on both object detection and instance segmentation. It beats all state-of-the-art methods transferred from long-tail image classification and establishes new state-of-the-art.Code is available at https://github.com/FishYuLi/BalancedGroupSoftmax.
SOML: Sparse Online Metric Learning with Application to Image Retrieval
Gao, Xingyu (Chinese Academy of Sciences and Nanyang Technological University) | Hoi, Steven C.H. (Nanyang Technological University) | Zhang, Yongdong (Chinese Academy of Sciences) | Wan, Ji (Chinese Academy of Sciences and Nanyang Technological University) | Li, Jintao (Chinese Academy of Sciences)
Image similarity search plays a key role in many multimediaapplications, where multimedia data (such as images and videos) areusually represented in high-dimensional feature space. In thispaper, we propose a novel Sparse Online Metric Learning (SOML)scheme for learning sparse distance functions from large-scalehigh-dimensional data and explore its application to imageretrieval. In contrast to many existing distance metric learningalgorithms that are often designed for low-dimensional data, theproposed algorithms are able to learn sparse distance metrics fromhigh-dimensional data in an efficient and scalable manner. Ourexperimental results show that the proposed method achieves betteror at least comparable accuracy performance than thestate-of-the-art non-sparse distance metric learning approaches, butenjoys a significant advantage in computational efficiency andsparsity, making it more practical for real-world applications.