AITopics | Liu, Yukun

Collaborating Authors

Liu, Yukun

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

MTPareto: A MultiModal Targeted Pareto Framework for Fake News Detection

Yan, Kaiying, Liu, Moyang, Liu, Yukun, Fu, Ruibo, Wen, Zhengqi, Tao, Jianhua, Liu, Xuefei, Li, Guanjun

arXiv.org Artificial IntelligenceJan-12-2025

Multimodal fake news detection is essential for maintaining the authenticity of Internet multimedia information. Significant differences in form and content of multimodal information lead to intensified optimization conflicts, hindering effective model training as well as reducing the effectiveness of existing fusion methods for bimodal. To address this problem, we propose the MTPareto framework to optimize multimodal fusion, using a Targeted Pareto(TPareto) optimization algorithm for fusion-level-specific objective learning with a certain focus. Based on the designed hierarchical fusion network, the algorithm defines three fusion levels with corresponding losses and implements all-modal-oriented Pareto gradient integration for each. This approach accomplishes superior multimodal fusion by utilizing the information obtained from intermediate fusion to provide positive effects to the entire process. Experiment results on FakeSV and FVC datasets show that the proposed framework outperforms baselines and the TPareto optimization algorithm achieves 2.40% and 1.89% accuracy improvement respectively.

artificial intelligence, detection, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2501.06764

Country: Asia > China (0.30)

Genre: Research Report (0.82)

Industry: Media > News (0.89)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Fake News Detection and Manipulation Reasoning via Large Vision-Language Models

Jin, Ruihan, Fu, Ruibo, Wen, Zhengqi, Zhang, Shuai, Liu, Yukun, Tao, Jianhua

arXiv.org Artificial IntelligenceJul-2-2024

Fake news becomes a growing threat to information security and public opinion with the rapid sprawl of media manipulation. Therefore, fake news detection attracts widespread attention from academic community. Traditional fake news detection models demonstrate remarkable performance on authenticity binary classification but their ability to reason detailed faked traces based on the news content remains under-explored. Furthermore, due to the lack of external knowledge, the performance of existing methods on fact-related news is questionable, leaving their practical implementation unclear. In this paper, we propose a new multi-media research topic, namely manipulation reasoning. Manipulation reasoning aims to reason manipulations based on news content. To support the research, we introduce a benchmark for fake news detection and manipulation reasoning, referred to as Human-centric and Fact-related Fake News (HFFN). The benchmark highlights the centrality of human and the high factual relevance, with detailed manual annotations. HFFN encompasses four realistic domains with fake news samples generated through three manipulation approaches. Moreover, a Multi-modal news Detection and Reasoning langUage Model (M-DRUM) is presented not only to judge on the authenticity of multi-modal news, but also raise analytical reasoning about potential manipulations. On the feature extraction level, a cross-attention mechanism is employed to extract fine-grained fusion features from multi-modal inputs. On the reasoning level, a large vision-language model (LVLM) serves as the backbone to facilitate fact-related reasoning. A two-stage training framework is deployed to better activate the capacity of identification and reasoning. Comprehensive experiments demonstrate that our model outperforms state-of-the-art (SOTA) fake news detection models and powerful LVLMs like GPT-4 and LLaVA.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2407.02042

Country:

Oceania > Australia (0.16)
North America > United States (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Media > News (1.00)
Information Technology (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.68)

Add feedback

ICAGC 2024: Inspirational and Convincing Audio Generation Challenge 2024

Fu, Ruibo, Liu, Rui, Qiang, Chunyu, Gao, Yingming, Lu, Yi, Wang, Tao, Li, Ya, Wen, Zhengqi, Zhang, Chen, Bu, Hui, Liu, Yukun, Shi, Shuchen, Qi, Xin, Li, Guanjun

arXiv.org Artificial IntelligenceJul-1-2024

The Inspirational and Convincing Audio Generation Challenge 2024 (ICAGC 2024) is part of the ISCSLP 2024 Competitions and Challenges track. While current text-to-speech (TTS) technology can generate high-quality audio, its ability to convey complex emotions and controlled detail content remains limited. This constraint leads to a discrepancy between the generated audio and human subjective perception in practical applications like companion robots for children and marketing bots. The core issue lies in the inconsistency between high-quality audio generation and the ultimate human subjective experience. Therefore, this challenge aims to enhance the persuasiveness and acceptability of synthesized audio, focusing on human alignment convincing and inspirational audio generation.

artificial intelligence, icagc 2024, machine learning, (13 more...)

arXiv.org Artificial Intelligence

2407.12038

Country: Asia > China (0.47)

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (0.30)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.35)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.30)

Add feedback

MINT: a Multi-modal Image and Narrative Text Dubbing Dataset for Foley Audio Content Planning and Generation

Fu, Ruibo, Shi, Shuchen, Guo, Hongming, Wang, Tao, Qiang, Chunyu, Wen, Zhengqi, Tao, Jianhua, Qi, Xin, Lu, Yi, Wang, Xiaopeng, Wang, Zhiyong, Liu, Yukun, Liu, Xuefei, Zhang, Shuai, Li, Guanjun

arXiv.org Artificial IntelligenceJun-15-2024

Foley audio, critical for enhancing the immersive experience in multimedia content, faces significant challenges in the AI-generated content (AIGC) landscape. Despite advancements in AIGC technologies for text and image generation, the foley audio dubbing remains rudimentary due to difficulties in cross-modal scene matching and content correlation. Current text-to-audio technology, which relies on detailed and acoustically relevant textual descriptions, falls short in practical video dubbing applications. Existing datasets like AudioSet, AudioCaps, Clotho, Sound-of-Story, and WavCaps do not fully meet the requirements for real-world foley audio dubbing task. To address this, we introduce the Multi-modal Image and Narrative Text Dubbing Dataset (MINT), designed to enhance mainstream dubbing tasks such as literary story audiobooks dubbing, image/silent video dubbing. Besides, to address the limitations of existing TTA technology in understanding and planning complex prompts, a Foley Audio Content Planning, Generation, and Alignment (CPGA) framework is proposed, which includes a content planning module leveraging large language models for complex multi-modal prompts comprehension. Additionally, the training process is optimized using Proximal Policy Optimization based reinforcement learning, significantly improving the alignment and auditory realism of generated foley audio. Experimental results demonstrate that our approach significantly advances the field of foley audio dubbing, providing robust solutions for the challenges of multi-modal dubbing. Even when utilizing the relatively lightweight GPT-2 model, our framework outperforms open-source multimodal large models such as LLaVA, DeepSeek-VL, and Moondream2. The dataset is available at https://github.com/borisfrb/MINT .

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2406.10591

Country:

Asia > China (0.46)
Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report > New Finding (0.34)

Industry: Media (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

The Codecfake Dataset and Countermeasures for the Universally Detection of Deepfake Audio

Xie, Yuankun, Lu, Yi, Fu, Ruibo, Wen, Zhengqi, Wang, Zhiyong, Tao, Jianhua, Qi, Xin, Wang, Xiaopeng, Liu, Yukun, Cheng, Haonan, Ye, Long, Sun, Yi

arXiv.org Artificial IntelligenceMay-15-2024

With the proliferation of Audio Language Model (ALM) based deepfake audio, there is an urgent need for generalized detection methods. ALM-based deepfake audio currently exhibits widespread, high deception, and type versatility, posing a significant challenge to current audio deepfake detection (ADD) models trained solely on vocoded data. To effectively detect ALM-based deepfake audio, we focus on the mechanism of the ALM-based audio generation method, the conversion from neural codec to waveform. We initially construct the Codecfake dataset, an open-source large-scale dataset, including 2 languages, over 1M audio samples, and various test conditions, focus on ALM-based audio detection. As countermeasure, to achieve universal detection of deepfake audio and tackle domain ascent bias issue of original SAM, we propose the CSAM strategy to learn a domain balanced and generalized minima. In our experiments, we first demonstrate that ADD model training with the Codecfake dataset can effectively detects ALM-based audio. Furthermore, our proposed generalization countermeasure yields the lowest average Equal Error Rate (EER) of 0.616% across all test conditions compared to baseline models. The dataset and associated code are available online.

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2405.0488

Country: Asia > China (0.29)

Genre: Research Report (0.70)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback