AITopics | Liu, Feng

Collaborating Authors

Liu, Feng

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Analyzable Chain-of-Musical-Thought Prompting for High-Fidelity Music Generation

Lam, Max W. Y., Xing, Yijin, You, Weiya, Wu, Jingcheng, Yin, Zongyu, Jiang, Fuqiang, Liu, Hangyu, Liu, Feng, Li, Xingda, Lu, Wei-Tsung, Chen, Hanyu, Feng, Tong, Zhao, Tianwei, Liu, Chien-Hung, Song, Xuchen, Li, Yang, Zhou, Yahui

arXiv.org Artificial IntelligenceMar-25-2025

Autoregressive (AR) models have demonstrated impressive capabilities in generating high-fidelity music. However, the conventional next-token prediction paradigm in AR models does not align with the human creative process in music composition, potentially compromising the musicality of generated samples. To overcome this limitation, we introduce MusiCoT, a novel chain-of-thought (CoT) prompting technique tailored for music generation. MusiCoT empowers the AR model to first outline an overall music structure before generating audio tokens, thereby enhancing the coherence and creativity of the resulting compositions. By leveraging the contrastive language-audio pretraining (CLAP) model, we establish a chain of "musical thoughts", making MusiCoT scalable and independent of human-labeled data, in contrast to conventional CoT methods. Moreover, MusiCoT allows for in-depth analysis of music structure, such as instrumental arrangements, and supports music referencing -- accepting variable-length audio inputs as optional style references. This innovative approach effectively addresses copying issues, positioning MusiCoT as a vital practical method for music prompting. Our experimental results indicate that MusiCoT consistently achieves superior performance across both objective and subjective metrics, producing music quality that rivals state-of-the-art generation models. Our samples are available at https://MusiCoT.github.io/.

arxiv preprint arxiv, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2503.19611

Genre:

Research Report > Promising Solution (0.48)
Research Report > New Finding (0.46)
Overview > Innovation (0.34)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Add feedback

MedPlan:A Two-Stage RAG-Based System for Personalized Medical Plan Generation

Hsu, Hsin-Ling, Dao, Cong-Tinh, Wang, Luning, Shuai, Zitao, Phan, Thao Nguyen Minh, Ding, Jun-En, Liao, Chun-Chieh, Hu, Pengfei, Han, Xiaoxue, Hsu, Chih-Ho, Luo, Dongsheng, Peng, Wen-Chih, Liu, Feng, Hung, Fang-Ming, Wu, Chenwei

arXiv.org Artificial IntelligenceMar-22-2025

Despite recent success in applying large language models (LLMs) to electronic health records (EHR), most systems focus primarily on assessment rather than treatment planning. We identify three critical limitations in current approaches: they generate treatment plans in a single pass rather than following the sequential reasoning process used by clinicians; they rarely incorporate patient-specific historical context; and they fail to effectively distinguish between subjective and objective clinical information. Motivated by the SOAP methodology (Subjective, Objective, Assessment, Plan), we introduce MedPlan, a novel framework that structures LLM reasoning to align with real-life clinician workflows. Our approach employs a two-stage architecture that first generates a clinical assessment based on patient symptoms and objective data, then formulates a structured treatment plan informed by this assessment and enriched with patient-specific information through retrieval-augmented generation. Comprehensive evaluation demonstrates that our method significantly outperforms baseline approaches in both assessment accuracy and treatment plan quality.

information, large language model, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2503.179

Genre: Research Report > Experimental Study (0.68)

Industry:

Health & Medicine > Diagnostic Medicine (1.00)
Health & Medicine > Health Care Technology > Medical Record (0.87)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Add feedback

OThink-MR1: Stimulating multimodal generalized reasoning capabilities through dynamic reinforcement learning

Liu, Zhiyuan, Zhang, Yuting, Liu, Feng, Zhang, Changwang, Sun, Ying, Wang, Jun

arXiv.org Artificial IntelligenceMar-20-2025

Multimodal Language Models have gained significant traction for their ability to process diverse input data types and generate coherent, contextually relevant outputs across various applications. While supervised fine-tuning (SFT) has been the predominant approach to enhance MLLM capabilities in task-specific optimization, it often falls short in fostering crucial generalized reasoning abilities. Despite the potential of reinforcement learning (RL) to address these limitations, it faces two issues: (1) its generalized capabilities in multimodal tasks remain underexplored. (2) its training constraints such as constant Kullback-Leibler or clamp strategy easily lead to suboptimal bottleneck. To adress these issues, we introduce OThink-MR1, a framework that extends RL to MLLMs, enabling them to achieve deeper understanding and reasoning across multimodal tasks. We design a dynamic Kullback-Leibler strategy that significantly enhances RL performance, surpassing SFT in same-task evaluations. Also, we are the first to reveal that RL exhibits remarkable cross-task generalization capabilities, which shows that models post-trained with RL on one multimodal task can be effectively transfered to another tasks. Finally, extensive experiments demonstrate the great reasoning ability of our proposed OThink-MR1.

machine learning, reasoning task, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

2503.16081

Country: Asia > China > Guangdong Province (0.15)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.87)

Add feedback

GRU: Mitigating the Trade-off between Unlearning and Retention for Large Language Models

Wang, Yue, Wang, Qizhou, Liu, Feng, Huang, Wei, Du, Yali, Du, Xiaojiang, Han, Bo

arXiv.org Artificial IntelligenceMar-12-2025

Large language model (LLM) unlearning has demonstrated its essential role in removing privacy and copyright-related responses, crucial for their legal and safe applications. However, the pursuit of complete unlearning often comes with substantial costs due to its compromises in their general functionality, leading to a notorious trade-off between unlearning and retention. In examining the update process for unlearning dynamically, we find gradients hold essential information for revealing this trade-off. In particular, we look at the varying relationship between retention performance and directional disparities between gradients during unlearning. It motivates the sculpting of an update mechanism derived from gradients from two sources, i.e., harmful for retention and useful for unlearning. Accordingly, we propose Gradient Rectified Unlearning (GRU), an enhanced unlearning framework controlling the updating gradients in a geometry-focused and optimization-driven manner such that their side impacts on other, unrelated responses can be minimized. Specifically, GRU derives a closed-form solution to project the unlearning gradient onto the orthogonal space of that gradient harmful for retention, ensuring minimal deviation from its original direction under the condition that overall performance is retained. Comprehensive experiments are conducted to demonstrate that GRU, as a general framework, is straightforward to implement and efficiently enhances a range of baseline methods through its adaptable and compatible characteristics. Additionally, experimental results show its broad effectiveness across a diverse set of benchmarks for LLM unlearning.

artificial intelligence, large language model, natural language, (15 more...)

arXiv.org Artificial Intelligence

2503.09117

Genre: Research Report > New Finding (1.00)

Industry: Law (0.88)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

REGEN: Learning Compact Video Embedding with (Re-)Generative Decoder

Zhang, Yitian, Mai, Long, Mahapatra, Aniruddha, Bourgin, David, Hong, Yicong, Casebeer, Jonah, Liu, Feng, Fu, Yun

arXiv.org Artificial IntelligenceMar-11-2025

We present a novel perspective on learning video embedders for generative modeling: rather than requiring an exact reproduction of an input video, an effective embedder should focus on synthesizing visually plausible reconstructions. This relaxed criterion enables substantial improvements in compression ratios without compromising the quality of downstream generative models. Specifically, we propose replacing the conventional encoder-decoder video embedder with an encoder-generator framework that employs a diffusion transformer (DiT) to synthesize missing details from a compact latent space. Therein, we develop a dedicated latent conditioning module to condition the DiT decoder on the encoded video latent embedding. Our experiments demonstrate that our approach enables superior encoding-decoding performance compared to state-of-the-art methods, particularly as the compression ratio increases. To demonstrate the efficacy of our approach, we report results from our video embedders achieving a temporal compression ratio of up to 32x (8x higher than leading video embedders) and validate the robustness of this ultra-compact latent space for text-to-video generation, providing a significant efficiency boost in latent diffusion model training and inference.

decoder, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2503.08665

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.88)

Add feedback

NeuroTree: Hierarchical Functional Brain Pathway Decoding for Mental Health Disorders

Ding, Jun-En, Luo, Dongsheng, Zilverstand, Anna, Liu, Feng

arXiv.org Artificial IntelligenceMar-9-2025

Analyzing functional brain networks using functional magnetic resonance imaging (fMRI) is crucial for understanding psychiatric disorders and addictive behaviors. While existing fMRI-based graph convolutional networks (GCNs) show considerable promise for feature extraction, they often fall short in characterizing complex relationships between brain regions and demographic factors and accounting for interpretable variables linked to psychiatric conditions. We propose NeuroTree to overcome these limitations, integrating a k-hop AGE-GCN with neural ordinary differential equations (ODEs). This framework leverages an attention mechanism to optimize functional connectivity (FC), thereby enhancing dynamic FC feature learning for brain disease classification. Furthermore, NeuroTree effectively decodes fMRI network features into tree structures, which improves the capture of high-order brain regional pathway features and enables the identification of hierarchical neural behavioral patterns essential for understanding disease-related brain subnetworks. Our empirical evaluations demonstrate that NeuroTree achieves state-of-the-art performance across two distinct mental disorder datasets and provides valuable insights into age-related deterioration patterns. These findings underscore the model's efficacy in predicting psychiatric disorders and elucidating their underlying neural mechanisms.

artificial intelligence, data mining, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2502.18786

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Data Science > Data Mining (0.88)
(2 more...)

Add feedback

DDAD: A Two-pronged Adversarial Defense Based on Distributional Discrepancy

Zhang, Jiacheng, Rubinstein, Benjamin I. P., Zhang, Jingfeng, Liu, Feng

arXiv.org Artificial IntelligenceMar-3-2025

Statistical adversarial data detection (SADD) detects whether an upcoming batch contains adversarial examples (AEs) by measuring the distributional discrepancies between clean examples (CEs) and AEs. In this paper, we reveal the potential strength of SADD-based methods by theoretically showing that minimizing distributional discrepancy can help reduce the expected loss on AEs. Nevertheless, despite these advantages, SADD-based methods have a potential limitation: they discard inputs that are detected as AEs, leading to the loss of clean information within those inputs. To address this limitation, we propose a two-pronged adversarial defense method, named Distributional-Discrepancy-based Adversarial Defense (DDAD). In the training phase, DDAD first optimizes the test power of the maximum mean discrepancy (MMD) to derive MMD-OPT, and then trains a denoiser by minimizing the MMD-OPT between CEs and AEs. In the inference phase, DDAD first leverages MMD-OPT to differentiate CEs and AEs, and then applies a two-pronged process: (1) directly feeding the detected CEs into the classifier, and (2) removing noise from the detected AEs by the distributional-discrepancy-based denoiser. Extensive experiments show that DDAD outperforms current state-of-the-art (SOTA) defense methods by notably improving clean and robust accuracy on CIFAR-10 and ImageNet-1K against adaptive white-box attacks.

artificial intelligence, distributional discrepancy, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2503.02169

Genre: Research Report (0.82)

Industry: Information Technology > Security & Privacy (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Vision (0.68)
Information Technology > Sensing and Signal Processing (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

SeisMoLLM: Advancing Seismic Monitoring via Cross-modal Transfer with Pre-trained Large Language Model

Wang, Xinghao, Liu, Feng, Su, Rui, Wang, Zhihui, Bai, Lei, Ouyang, Wanli

arXiv.org Artificial IntelligenceFeb-27-2025

Recent advances in deep learning have revolutionized seismic monitoring, yet developing a foundation model that performs well across multiple complex tasks remains challenging, particularly when dealing with degraded signals or data scarcity. This work presents SeisMoLLM, the first foundation model that utilizes cross-modal transfer for seismic monitoring, to unleash the power of large-scale pre-training from a large language model without requiring direct pre-training on seismic datasets. Through elaborate waveform tokenization and fine-tuning of pre-trained GPT-2 model, SeisMoLLM achieves state-of-the-art performance on the DiTing and STEAD datasets across five critical tasks: back-azimuth estimation, epicentral distance estimation, magnitude estimation, phase picking, and first-motion polarity classification. It attains 36 best results out of 43 task metrics and 12 top scores out of 16 few-shot generalization metrics, with many relative improvements ranging from 10% to 50%. In addition to its superior performance, SeisMoLLM maintains efficiency comparable to or even better than lightweight models in both training and inference. These findings establish SeisMoLLM as a promising foundation model for practical seismic monitoring and highlight cross-modal transfer as an exciting new direction for earthquake studies, showcasing the potential of advanced deep learning techniques to propel seismology research forward.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2502.1996

Country: Asia > China (0.29)

Genre: Research Report (1.00)

Industry: Energy > Oil & Gas > Upstream (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Robust Dynamic Facial Expression Recognition

Liu, Feng, Wang, Hanyang, Shen, Siyuan

arXiv.org Artificial IntelligenceFeb-22-2025

The study of Dynamic Facial Expression Recognition (DFER) is a nascent field of research that involves the automated recognition of facial expressions in video data. Although existing research has primarily focused on learning representations under noisy and hard samples, the issue of the coexistence of both types of samples remains unresolved. In order to overcome this challenge, this paper proposes a robust method of distinguishing between hard and noisy samples. This is achieved by evaluating the prediction agreement of the model on different sampled clips of the video. Subsequently, methodologies that reinforce the learning of hard samples and mitigate the impact of noisy samples can be employed. Moreover, to identify the principal expression in a video and enhance the model's capacity for representation learning, comprising a key expression re-sampling framework and a dual-stream hierarchical network is proposed, namely Robust Dynamic Facial Expression Recognition (RDFER). The key expression re-sampling framework is designed to identify the key expression, thereby mitigating the potential confusion caused by non-target expressions. RDFER employs two sequence models with the objective of disentangling short-term facial movements and long-term emotional changes. The proposed method has been shown to outperform current State-Of-The-Art approaches in DFER through extensive experimentation on benchmark datasets such as DFEW and FERV39K. A comprehensive analysis provides valuable insights and observations regarding the proposed agreement. This work has significant implications for the field of dynamic facial expression recognition and promotes the further development of the field of noise-consistent robust learning in dynamic facial expression recognition. The code is available from [https://github.com/Cross-Innovation-Lab/RDFER].

artificial intelligence, expression, recognition, (16 more...)

arXiv.org Artificial Intelligence

2502.16129

Country: Asia > China (0.14)

Genre: Research Report > New Finding (0.93)

Technology: Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)

Add feedback

Membership Inference Attack Should Move On to Distributional Statistics for Distilled Generative Models

Li, Muxing, Ye, Zesheng, Li, Yixuan, Song, Andy, Zhang, Guangquan, Liu, Feng

arXiv.org Artificial IntelligenceFeb-5-2025

Membership inference attacks (MIAs) determine whether certain data instances were used to train a model by exploiting the differences in how the model responds to seen versus unseen instances. This capability makes MIAs important in assessing privacy leakage within modern generative AI systems. However, this paper reveals an oversight in existing MIAs against \emph{distilled generative models}: attackers can no longer detect a teacher model's training instances individually when targeting the distilled student model, as the student learns from the teacher-generated data rather than its original member data, preventing direct instance-level memorization. Nevertheless, we find that student-generated samples exhibit a significantly stronger distributional alignment with teacher's member data than non-member data. This leads us to posit that MIAs \emph{on distilled generative models should shift from instance-level to distribution-level statistics}. We thereby introduce a \emph{set-based} MIA framework that measures \emph{relative} distributional discrepancies between student-generated data\emph{sets} and potential member/non-member data\emph{sets}, Empirically, distributional statistics reliably distinguish a teacher's member data from non-member data through the distilled model. Finally, we discuss scenarios in which our setup faces limitations.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2502.0297

Country: North America > United States > Wisconsin (0.14)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Education > Educational Technology > Educational Software (0.35)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)

Add feedback