AITopics | Wen, Jingyuan

Collaborating Authors

Wen, Jingyuan

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Alleviating LLM-based Generative Retrieval Hallucination in Alipay Search

Shen, Yedan, Wu, Kaixin, Ding, Yuechen, Wen, Jingyuan, Liu, Hong, Zhong, Mingjie, Lin, Zhouhan, Xu, Jia, Mo, Linjian

arXiv.org Artificial IntelligenceMar-26-2025

Generative retrieval (GR) has revolutionized document retrieval with the advent of large language models (LLMs), and LLM-based GR is gradually being adopted by the industry. Despite its remarkable advantages and potential, LLM-based GR suffers from hallucination and generates documents that are irrelevant to the query in some instances, severely challenging its credibility in practical applications. We thereby propose an optimized GR framework designed to alleviate retrieval hallucination, which integrates knowledge distillation reasoning in model training and incorporate decision agent to further improve retrieval precision. Specifically, we employ LLMs to assess and reason GR retrieved query-document (q-d) pairs, and then distill the reasoning data as transferred knowledge to the GR model. Moreover, we utilize a decision agent as post-processing to extend the GR retrieved documents through retrieval model and select the most relevant ones from multi perspectives as the final generative retrieval result. Extensive offline experiments on real-world datasets and online A/B tests on Fund Search and Insurance Search in Alipay demonstrate our framework's superiority and effectiveness in improving search quality and conversion gains.

large language model, machine learning, natural language, (13 more...)

arXiv.org Artificial Intelligence

2503.21098

Country: Asia (0.28)

Genre: Research Report (1.00)

Industry:

Banking & Finance > Financial Services (0.73)
Health & Medicine > Therapeutic Area (0.49)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

TikTalk: A Video-Based Dialogue Dataset for Multi-Modal Chitchat in Real World

Lin, Hongpeng, Ruan, Ludan, Xia, Wenke, Liu, Peiyu, Wen, Jingyuan, Xu, Yixin, Hu, Di, Song, Ruihua, Zhao, Wayne Xin, Jin, Qin, Lu, Zhiwu

arXiv.org Artificial IntelligenceSep-8-2023

To facilitate the research on intelligent and human-like chatbots with multi-modal context, we introduce a new video-based multi-modal dialogue dataset, called TikTalk. We collect 38K videos from a popular video-sharing platform, along with 367K conversations posted by users beneath them. Users engage in spontaneous conversations based on their multi-modal experiences from watching videos, which helps recreate real-world chitchat context. Compared to previous multi-modal dialogue datasets, the richer context types in TikTalk lead to more diverse conversations, but also increase the difficulty in capturing human interests from intricate multi-modal information to generate personalized responses. Moreover, external knowledge is more frequently evoked in our dataset. These facts reveal new challenges for multi-modal dialogue models. We quantitatively demonstrate the characteristics of TikTalk, propose a video-based multi-modal chitchat task, and evaluate several dialogue baselines. Experimental results indicate that the models incorporating large language models (LLM) can generate more diverse responses, while the model utilizing knowledge graphs to introduce external knowledge performs the best overall. Furthermore, no existing model can solve all the above challenges well. There is still a large room for future improvements, even for LLM with visual extensions. Our dataset is available at \url{https://ruc-aimind.github.io/projects/TikTalk/}.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3581783.3612425

2301.0588

Country: Asia > China (0.15)

Genre: Research Report > New Finding (0.46)

Industry:

Information Technology (0.46)
Consumer Products & Services (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

WenLan 2.0: Make AI Imagine via a Multimodal Foundation Model

Fei, Nanyi, Lu, Zhiwu, Gao, Yizhao, Yang, Guoxing, Huo, Yuqi, Wen, Jingyuan, Lu, Haoyu, Song, Ruihua, Gao, Xin, Xiang, Tao, Sun, Hao, Wen, Ji-Rong

arXiv.org Artificial IntelligenceOct-27-2021

The fundamental goal of artificial intelligence (AI) is to mimic the core cognitive activities of human including perception, memory, and reasoning. Although tremendous success has been achieved in various AI research fields (e.g., computer vision and natural language processing), the majority of existing works only focus on acquiring single cognitive ability (e.g., image classification, reading comprehension, or visual commonsense reasoning). To overcome this limitation and take a solid step to artificial general intelligence (AGI), we develop a novel foundation model pre-trained with huge multimodal (visual and textual) data, which is able to be quickly adapted for a broad class of downstream cognitive tasks. Such a model is fundamentally different from the multimodal foundation models recently proposed in the literature that typically make strong semantic correlation assumption and expect exact alignment between image and text modalities in their pre-training data, which is often hard to satisfy in practice thus limiting their generalization abilities. To resolve this issue, we propose to pre-train our foundation model by self-supervised learning with weak semantic correlation data crawled from the Internet and show that state-of-the-art results can be obtained on a wide range of downstream tasks (both single-modal and cross-modal). Particularly, with novel model-interpretability tools developed in this work, we demonstrate that strong imagination ability (even with hints of commonsense) is now possessed by our foundation model. We believe our work makes a transformative stride towards AGI and will have broad impact on various AI+ fields (e.g., neuroscience and healthcare).

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2110.14378

Country:

North America > United States > California (0.14)
Europe > United Kingdom > England (0.14)
Asia > Middle East > Saudi Arabia (0.14)

Genre: Research Report > Promising Solution (0.34)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.54)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Add feedback