AITopics | Liu, Qingwen

Collaborating Authors

Liu, Qingwen

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Squeeze Out Tokens from Sample for Finer-Grained Data Governance

Lin, Weixiong, Ju, Chen, Wang, Haicheng, Hu, Shengchao, Xiao, Shuai, Chen, Mengting, Jiao, Yuheng, Yao, Mingshuai, Lan, Jinsong, Liu, Qingwen, Chen, Ying

arXiv.org Artificial IntelligenceMar-18-2025

Widely observed data scaling laws, in which error falls off as a power of the training size, demonstrate the diminishing returns of unselective data expansion. Hence, data governance is proposed to downsize datasets through pruning non-informative samples. Yet, isolating the impact of a specific sample on overall model performance is challenging, due to the vast computation required for tryout all sample combinations. Current data governors circumvent this complexity by estimating sample contributions through heuristic-derived scalar scores, thereby discarding low-value ones. Despite thorough sample sieving, retained samples contain substantial undesired tokens intrinsically, underscoring the potential for further compression and purification. In this work, we upgrade data governance from a 'sieving' approach to a 'juicing' one. Instead of scanning for least-flawed samples, our dual-branch DataJuicer applies finer-grained intra-sample governance. It squeezes out informative tokens and boosts image-text alignments. Specifically, the vision branch retains salient image patches and extracts relevant object classes, while the text branch incorporates these classes to enhance captions. Consequently, DataJuicer yields more refined datasets through finer-grained governance. Extensive experiments across datasets demonstrate that DataJuicer significantly outperforms existing DataSieve in image-text retrieval, classification, and dense visual reasoning.

datajuicer, wang, zhang, (13 more...)

arXiv.org Artificial Intelligence

2503.14559

Country: Europe > Switzerland > Zürich > Zürich (0.14)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Quality (0.93)
(2 more...)

Add feedback

SlimGPT: Layer-wise Structured Pruning for Large Language Models

Ling, Gui, Wang, Ziyang, Yan, Yuliang, Liu, Qingwen

arXiv.org Artificial IntelligenceDec-23-2024

Large language models (LLMs) have garnered significant attention for their remarkable capabilities across various domains, whose vast parameter scales present challenges for practical deployment. Structured pruning is an effective method to balance model performance with efficiency, but performance restoration under computational resource constraints is a principal challenge in pruning LLMs. Therefore, we present a low-cost and fast structured pruning method for LLMs named SlimGPT based on the Optimal Brain Surgeon framework. We propose Batched Greedy Pruning for rapid and near-optimal pruning, which enhances the accuracy of head-wise pruning error estimation through grouped Cholesky decomposition and improves the pruning efficiency of FFN via Dynamic Group Size, thereby achieving approximate local optimal pruning results within one hour. Besides, we explore the limitations of layer-wise pruning from the perspective of error accumulation and propose Incremental Pruning Ratio, a non-uniform pruning strategy to reduce performance degradation. Experimental results on the LLaMA benchmark show that SlimGPT outperforms other methods and achieves state-of-the-art results.

large language model, machine learning, pruning, (16 more...)

arXiv.org Artificial Intelligence

2412.1811

Genre:

Overview (0.93)
Research Report > Promising Solution (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Can video generation replace cinematographers? Research on the cinematic language of generated video

Li, Xiaozhe, WU, Kai, Yang, Siyi, Qu, YiZhan, Zhang, Guohua., Chen, Zhiyu, Li, Jiayao, Mu, Jiangchuan, Hu, Xiaobin, Fang, Wen, Xiong, Mingliang, Deng, Hao, Liu, Qingwen, Li, Gang, He, Bin

arXiv.org Artificial IntelligenceDec-16-2024

Recent advancements in text-to-video (T2V) generation have leveraged diffusion models to enhance the visual coherence of videos generated from textual descriptions. However, most research has primarily focused on object motion, with limited attention given to cinematic language in videos, which is crucial for cinematographers to convey emotion and narrative pacing. To address this limitation, we propose a threefold approach to enhance the ability of T2V models to generate controllable cinematic language. Specifically, we introduce a cinematic language dataset that encompasses shot framing, angle, and camera movement, enabling models to learn diverse cinematic styles. Building on this, to facilitate robust cinematic alignment evaluation, we present CameraCLIP, a model fine-tuned on the proposed dataset that excels in understanding complex cinematic language in generated videos and can further provide valuable guidance in the multi-shot composition process. Finally, we propose CLIPLoRA, a cost-guided dynamic LoRA composition method that facilitates smooth transitions and realistic blending of cinematic language by dynamically fusing multiple pre-trained cinematic LoRAs within a single video. Our experiments demonstrate that CameraCLIP outperforms existing models in assessing the alignment between cinematic language and video, achieving an R@1 score of 0.81. Additionally, CLIPLoRA improves the ability for multi-shot composition, potentially bridging the gap between automatically generated videos and those shot by professional cinematographers.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2412.12223

Genre: Research Report (0.64)

Industry: Media > Film (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Add feedback

No More Tuning: Prioritized Multi-Task Learning with Lagrangian Differential Multiplier Methods

Cheng, Zhengxing, Huang, Yuheng, Zhang, Zhixuan, Ou, Dan, Liu, Qingwen

arXiv.org Artificial IntelligenceDec-16-2024

Given the ubiquity of multi-task in practical systems, Multi-Task Learning (MTL) has found widespread application across diverse domains. In real-world scenarios, these tasks often have different priorities. For instance, In web search, relevance is often prioritized over other metrics, such as click-through rates or user engagement. Existing frameworks pay insufficient attention to the prioritization among different tasks, which typically adjust task-specific loss function weights to differentiate task priorities. However, this approach encounters challenges as the number of tasks grows, leading to exponential increases in hyper-parameter tuning complexity. Furthermore, the simultaneous optimization of multiple objectives can negatively impact the performance of high-priority tasks due to interference from lower-priority tasks. In this paper, we introduce a novel multi-task learning framework employing Lagrangian Differential Multiplier Methods for step-wise multi-task optimization. It is designed to boost the performance of high-priority tasks without interference from other tasks. Its primary advantage lies in its ability to automatically optimize multiple objectives without requiring balancing hyper-parameters for different tasks, thereby eliminating the need for manual tuning. Additionally, we provide theoretical analysis demonstrating that our method ensures optimization guarantees, enhancing the reliability of the process. We demonstrate its effectiveness through experiments on multiple public datasets and its application in Taobao search, a large-scale industrial search ranking system, resulting in significant improvements across various business metrics.

artificial intelligence, machine learning, objective, (17 more...)

arXiv.org Artificial Intelligence

2412.12092

Country: North America > United States (0.94)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

Branches, Assemble! Multi-Branch Cooperation Network for Large-Scale Click-Through Rate Prediction at Taobao

Chen, Xu, Cheng, Zida, Pan, Yuangang, Xiao, Shuai, Liu, Xiaoming, Lan, Jinsong, Liu, Qingwen, Tsang, Ivor W.

arXiv.org Artificial IntelligenceNov-20-2024

Existing click-through rate (CTR) prediction works have studied the role of feature interaction through a variety of techniques. Each interaction technique exhibits its own strength, and solely using one type could constrain the model's capability to capture the complex feature relationships, especially for industrial large-scale data with enormous users and items. Recent research shows that effective CTR models often combine an MLP network with a dedicated feature interaction network in a two-parallel structure. However, the interplay and cooperative dynamics between different streams or branches remain under-researched. In this work, we introduce a novel Multi-Branch Cooperation Network (MBCnet) which enables multiple branch networks to collaborate with each other for better complex feature interaction modeling. Specifically, MBCnet consists of three branches: the Expert-based Feature Grouping and Crossing (EFGC) branch that promotes the model's memorization ability of specific feature fields, the low rank Cross Net branch and Deep branch to enhance both explicit and implicit feature crossing for improved generalization. Among branches, a novel cooperation scheme is proposed based on two principles: branch co-teaching and moderate differentiation. Branch co-teaching encourages well-learned branches to support poorly-learned ones on specific training samples. Moderate differentiation advocates branches to maintain a reasonable level of difference in their feature representations. The cooperation strategy improves learning through mutual knowledge sharing via co-teaching and boosts the discovery of diverse feature interactions across branches. Extensive experiments on large-scale industrial datasets and online A/B test demonstrate MBCnet's superior performance, delivering a 0.09 point increase in CTR, 1.49% growth in deals, and 1.62% rise in GMV. Core codes will be released soon.

artificial intelligence, machine learning, prediction, (14 more...)

arXiv.org Artificial Intelligence

2411.13057

Country:

Asia (0.68)
North America > United States > New York (0.14)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

NoiseBoost: Alleviating Hallucination with Noise Perturbation for Multimodal Large Language Models

Wu, Kai, Jiang, Boyuan, Jiang, Zhengkai, He, Qingdong, Luo, Donghao, Wang, Shengzhi, Liu, Qingwen, Wang, Chengjie

arXiv.org Artificial IntelligenceMay-31-2024

Multimodal large language models (MLLMs) contribute a powerful mechanism to understanding visual information building on large language models. However, MLLMs are notorious for suffering from hallucinations, especially when generating lengthy, detailed descriptions for images. Our analysis reveals that hallucinations stem from the inherent summarization mechanism of large language models, leading to excessive dependence on linguistic tokens while neglecting vision information. In this paper, we propose NoiseBoost, a broadly applicable and simple method for alleviating hallucinations for MLLMs through the integration of noise feature perturbations. Noise perturbation acts as a regularizer, facilitating a balanced distribution of attention weights among visual and linguistic tokens. Despite its simplicity, NoiseBoost consistently enhances the performance of MLLMs across common training strategies, including supervised fine-tuning and reinforcement learning. Further, NoiseBoost pioneerly enables semi-supervised learning for MLLMs, unleashing the power of unlabeled data. Comprehensive experiments demonstrate that NoiseBoost improves dense caption accuracy by 8.1% with human evaluation and achieves comparable results with 50% of the data by mining unlabeled data. Code and models are available at https://kaiwu5.github.io/noiseboost.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2405.20081

Genre: Research Report (0.64)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

COURIER: Contrastive User Intention Reconstruction for Large-Scale Pre-Train of Image Features

Yang, Jia-Qi, Dai, Chenglei, Dan, OU, Huang, Ju, Zhan, De-Chuan, Liu, Qingwen, Zeng, Xiaoyi, Yang, Yang

arXiv.org Artificial IntelligenceJun-8-2023

With the development of the multi-media internet, visual characteristics have become an important factor affecting user interests. Thus, incorporating visual features is a promising direction for further performance improvements in click-through rate (CTR) prediction. However, we found that simply injecting the image embeddings trained with established pre-training methods only has marginal improvements. We attribute the failure to two reasons: First, The pre-training methods are designed for well-defined computer vision tasks concentrating on semantic features, and they cannot learn personalized interest in recommendations. Secondly, pre-trained image embeddings only containing semantic information have little information gain, considering we already have semantic features such as categories and item titles as inputs in the CTR prediction task. We argue that a pre-training method tailored for recommendation is necessary for further improvements. To this end, we propose a recommendation-aware image pre-training method that can learn visual features from user click histories. Specifically, we propose a user interest reconstruction module to mine visual features related to user interests from behavior histories. We further propose a contrastive training method to avoid collapsing of embedding vectors. We conduct extensive experiments to verify that our method can learn users' visual interests, and our method achieves $0.46\%$ improvement in offline AUC and $0.88\%$ improvement in Taobao online GMV with p-value$<0.01$.

information, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2306.05001

Country:

North America > United States (0.31)
Asia > China (0.29)

Genre: Research Report (1.00)

Industry: Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.88)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.66)

Add feedback

Exercise-Enhanced Sequential Modeling for Student Performance Prediction

Su, Yu (Anhui University) | Liu, Qingwen (iFLYTEK CO.,LTD. ) | Liu, Qi (iFLYTEK CO.,LTD.) | Huang, Zhenya (University of Science and Technology of China ) | Yin, Yu ( University of Science and Technology of China ) | Chen, Enhong ( University of Science and Technology of China ) | Ding, Chris ( University of Science and Technology of China ) | Wei, Si ( University of Science and Technology of China ) | Hu, Guoping (University of Texas at Arlington)

AAAI ConferencesFeb-8-2018

In online education systems, for offering proactive services to students (e.g., personalized exercise recommendation), a crucial demand is to predict student performance (e.g., scores) on future exercising activities. Existing prediction methods mainly exploit the historical exercising records of students, where each exercise is usually represented as the manually labeled knowledge concepts, and the richer information contained in the text description of exercises is still underexplored. In this paper, we propose a novel Exercise-Enhanced Recurrent Neural Network (EERNN) framework for student performance prediction by taking full advantage of both student exercising records and the text of each exercise. Specifically, for modeling the student exercising process, we first design a bidirectional LSTM to learn each exercise representation from its text description without any expertise and information loss. Then, we propose a new LSTM architecture to trace student states (i.e., knowledge states) in their sequential exercising process with the combination of exercise representations. For making final predictions, we design two strategies under EERNN, i.e., EERNNM with Markov property and EERNNA with Attention mechanism. Extensive experiments on large-scale real-world data clearly demonstrate the effectiveness of EERNN framework. Moreover, by incorporating the exercise correlations, EERNN can well deal with the cold start problems from both student and exercise perspectives.

computer based training, deep learning, student, (24 more...)

AAAI Conferences

Thirty-Second AAAI Conference on Artificial Intelligence

Country: Asia > China (0.14)

Industry:

Education > Educational Setting > Online (1.00)
Education > Educational Technology > Educational Software > Computer Based Training (0.88)
Education > Assessment & Standards > Student Performance (0.83)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback