Jiang, Yong
Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent
Li, Yangning, Li, Yinghui, Wang, Xinyu, Jiang, Yong, Zhang, Zhen, Zheng, Xinran, Wang, Hui, Zheng, Hai-Tao, Xie, Pengjun, Yu, Philip S., Huang, Fei, Zhou, Jingren
Multimodal Retrieval Augmented Generation (mRAG) plays an important role in mitigating the "hallucination" issue inherent in multimodal large language models (MLLMs). Although promising, existing heuristic mRAG approaches typically rely on predefined, fixed retrieval processes, which causes two issues: (1) non-adaptive retrieval queries and (2) overloaded retrieval queries. However, these flaws cannot be adequately reflected by current knowledge-seeking visual question answering (VQA) datasets, since most of the required knowledge can be readily obtained with a standard two-step retrieval. To bridge this dataset gap, we first construct the Dyn-VQA dataset, consisting of three types of "dynamic" questions that require complex knowledge retrieval strategies varying in query, tool, and time: (1) questions with rapidly changing answers, (2) questions requiring multi-modal knowledge, and (3) multi-hop questions. Experiments on Dyn-VQA reveal that existing heuristic mRAG approaches struggle to provide sufficient and precisely relevant knowledge for dynamic questions due to their rigid retrieval processes. Hence, we further propose the first self-adaptive planning agent for multimodal retrieval, OmniSearch. The underlying idea is to emulate human behavior in question solving, dynamically decomposing complex multimodal questions into chains of sub-questions with retrieval actions. Extensive experiments demonstrate the effectiveness of OmniSearch and also provide directions for advancing mRAG. The code and dataset will be open-sourced at https://github.com/Alibaba-NLP/OmniSearch.
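To make the planning-agent idea concrete, here is a minimal sketch of a self-adaptive retrieval loop in the spirit described above. It is not the authors' implementation; `call_mllm` and `retrieve` are hypothetical placeholders for a multimodal LLM and a retrieval tool, and the plan format is an assumption for illustration.

```python
def call_mllm(prompt: str) -> str:
    """Placeholder: query a multimodal LLM and return its text response."""
    raise NotImplementedError

def retrieve(query: str, tool: str) -> str:
    """Placeholder: run one retrieval action (text search, image search, ...)."""
    raise NotImplementedError

def answer_with_planning(question: str, max_steps: int = 5) -> str:
    context = []
    for _ in range(max_steps):
        # Ask the planner for the next sub-question and which tool to use,
        # or a final answer if the gathered evidence is already sufficient.
        plan = call_mllm(
            f"Question: {question}\nEvidence so far: {context}\n"
            "Reply either 'FINAL: <answer>' or 'SEARCH <tool>: <sub-question>'."
        )
        if plan.startswith("FINAL:"):
            return plan[len("FINAL:"):].strip()
        tool, sub_question = plan.split(":", 1)
        tool = tool.replace("SEARCH", "").strip()
        context.append(retrieve(sub_question.strip(), tool))
    # Fall back to answering with whatever evidence was collected.
    return call_mllm(f"Question: {question}\nEvidence: {context}\nAnswer concisely.")
```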
Exploring Knowledge Boundaries in Large Language Models for Retrieval Judgment
Zhang, Zhen, Wang, Xinyu, Jiang, Yong, Chen, Zhuo, Mu, Feiteng, Hu, Mengting, Xie, Pengjun, Huang, Fei
Large Language Models (LLMs) are increasingly recognized for their practical applications. However, these models often struggle with dynamically changing knowledge, as well as with unknown static knowledge. Retrieval-Augmented Generation (RAG) tackles these challenges and has shown a significant impact on LLMs. In fact, we find that the impact of RAG on the question answering capabilities of LLMs can be categorized into three groups: beneficial, neutral, and harmful. By minimizing retrieval requests that yield neutral or harmful results, we can effectively reduce both time and computational costs while also improving the overall performance of LLMs. This insight motivates us to differentiate between types of questions, using certain metrics as indicators, in order to decrease the retrieval ratio without compromising performance. In this work, we propose to identify these question types by training a Knowledge Boundary Model (KBM). Experiments conducted on 11 English and Chinese datasets illustrate that the KBM effectively delineates the knowledge boundary, significantly decreasing the proportion of retrievals required for optimal end-to-end performance. Specifically, we evaluate the effectiveness of the KBM in three complex scenarios: dynamic knowledge, long-tail static knowledge, and multi-hop problems, as well as its functionality as an external LLM plug-in.
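The core idea reduces to gating retrieval behind a learned classifier. The sketch below illustrates that gating logic only, not the paper's actual KBM; `kbm_score`, `retrieve`, and `generate` are assumed placeholders.

```python
def kbm_score(question: str) -> float:
    """Placeholder: predicted probability that retrieval would benefit this question."""
    raise NotImplementedError

def retrieve(question: str) -> str:
    raise NotImplementedError

def generate(question: str, context: str = "") -> str:
    raise NotImplementedError

def answer(question: str, threshold: float = 0.5) -> str:
    # Retrieve only when the classifier predicts retrieval is likely beneficial;
    # otherwise answer directly from parametric knowledge and save a retrieval call.
    if kbm_score(question) >= threshold:
        return generate(question, context=retrieve(question))
    return generate(question)
```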
Benchmarking Agentic Workflow Generation
Qiao, Shuofei, Fang, Runnan, Qiu, Zhisong, Wang, Xiaobin, Zhang, Ningyu, Jiang, Yong, Xie, Pengjun, Huang, Fei, Chen, Huajun
Large Language Models (LLMs), with their exceptional ability to handle a wide range of tasks, have driven significant advancements in reasoning and planning, in which decomposing complex problems into executable workflows is a crucial step. Existing workflow evaluation frameworks either focus solely on holistic performance or suffer from limitations such as restricted scenario coverage, simplistic workflow structures, and lax evaluation standards. To this end, we introduce WorFBench, a unified workflow generation benchmark with multi-faceted scenarios and intricate graph workflow structures. Additionally, we present WorFEval, a systematic evaluation protocol that uses subsequence and subgraph matching algorithms to accurately quantify an LLM agent's workflow generation capabilities. Through comprehensive evaluations across different types of LLMs, we discover distinct gaps between the sequence planning and graph planning capabilities of LLM agents, with even GPT-4 exhibiting a gap of around 15%. We also train two open-source models and evaluate their generalization abilities on held-out tasks. Furthermore, we observe that the generated workflows can enhance downstream tasks, enabling them to achieve superior performance in less time during inference. Code and dataset are available at https://github.com/zjunlp/WorFBench.
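As a rough illustration of subsequence-based scoring of predicted workflows (an assumption about the flavor of metric, not WorFEval itself), one can measure how much of the gold node sequence is recovered in order by the predicted sequence:

```python
def lcs_length(pred: list[str], gold: list[str]) -> int:
    """Length of the longest common subsequence between two node sequences."""
    dp = [[0] * (len(gold) + 1) for _ in range(len(pred) + 1)]
    for i, p in enumerate(pred, 1):
        for j, g in enumerate(gold, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if p == g else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(pred)][len(gold)]

def sequence_plan_score(pred: list[str], gold: list[str]) -> float:
    """Fraction of the gold workflow recovered, in order, by the prediction."""
    return lcs_length(pred, gold) / max(len(gold), 1)

print(sequence_plan_score(["search", "filter", "summarize"],
                          ["search", "summarize"]))  # 1.0: gold is a subsequence of pred
```

Graph-structured workflows would additionally require subgraph matching over node and edge sets, which this toy score does not capture.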
An Adaptive Framework for Generating Systematic Explanatory Answer in Online Q&A Platforms
Chen, Ziyang, Wang, Xiaobin, Jiang, Yong, Liao, Jinzhi, Xie, Pengjun, Huang, Fei, Zhao, Xiang
Question Answering (QA) systems face challenges in handling complex questions that require multi-domain knowledge synthesis. Naive RAG models, although effective in information retrieval, struggle with complex questions that require comprehensive and in-depth answers. We define this pioneering task as explanatory answer generation, which entails challenges such as the need for comprehensive information and logical coherence within the generated content. To address these issues, we draw on systematic thinking theory and propose SynthRAG, an innovative framework designed to enhance QA performance. SynthRAG improves on conventional models by employing adaptive outlines for dynamic content structuring, generating systematic information to ensure detailed coverage, and producing customized answers tailored to specific user inquiries. This structured approach guarantees logical coherence and thorough integration of information, yielding responses that are both insightful and methodically organized. Empirical evaluations underscore SynthRAG's effectiveness, demonstrating its superiority in handling complex questions, overcoming the limitations of naive RAG models, and significantly improving answer quality and depth. Furthermore, an online deployment on the Zhihu platform showed that SynthRAG's answers achieved notable user engagement, with each response averaging 5.73 upvotes and outperforming 79.8% of human contributors, highlighting the practical relevance and impact of the proposed framework. Our code is available at https://github.com/czy1999/SynthRAG.
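The outline-then-generate flow described above can be pictured with a short sketch. This is only an illustration of the general pattern under stated assumptions; `call_llm` and `retrieve` are hypothetical placeholders and the prompts are not SynthRAG's.

```python
def call_llm(prompt: str) -> str:
    raise NotImplementedError

def retrieve(query: str) -> str:
    raise NotImplementedError

def explanatory_answer(question: str) -> str:
    # 1. Draft an adaptive outline that structures the answer for this question.
    outline = call_llm(f"Draft a short outline (one section per line) for: {question}")
    sections = []
    for heading in filter(None, (h.strip() for h in outline.splitlines())):
        # 2. Retrieve evidence per section and write it with the question in view.
        evidence = retrieve(f"{question} {heading}")
        sections.append(call_llm(
            f"Question: {question}\nSection: {heading}\nEvidence: {evidence}\n"
            "Write this section so it stays coherent with the overall outline."
        ))
    # 3. Assemble the sections into one systematic, logically ordered answer.
    return "\n\n".join(sections)
```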
Adaptive Selection for Homogeneous Tools: An Instantiation in the RAG Scenario
Mu, Feiteng, Jiang, Yong, Zhang, Liwen, Liu, Chu, Li, Wenjie, Xie, Pengjun, Huang, Fei
Current research on tool learning primarily focuses on selecting the most effective tool from a wide array of options, often overlooking cost-effectiveness, a crucial factor in human problem-solving. In this paper, we address the selection of homogeneous tools by predicting both their performance and the associated cost required to accomplish a given task. We then assign queries to the optimal tools in a cost-effective manner. Our experimental results demonstrate that our method achieves higher performance at a lower cost compared to strong baseline approaches.
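A toy sketch of cost-aware selection among homogeneous tools follows, assuming per-query predictions of each tool's success probability and cost are already available; the tool names, numbers, and trade-off rule below are invented for illustration only.

```python
def pick_tool(predictions: dict[str, tuple[float, float]], trade_off: float) -> str:
    """predictions maps tool name -> (predicted performance, cost).
    Chooses the tool with the best performance-minus-weighted-cost utility."""
    return max(predictions, key=lambda t: predictions[t][0] - trade_off * predictions[t][1])

query_predictions = {
    "small_retriever": (0.78, 1.0),   # cheaper but slightly weaker
    "large_retriever": (0.84, 4.0),   # stronger but four times the cost
}
print(pick_tool(query_predictions, trade_off=0.01))  # large_retriever
print(pick_tool(query_predictions, trade_off=0.05))  # small_retriever
```

Raising the trade-off weight shifts queries toward the cheaper tool whenever the predicted performance gap does not justify the extra cost.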
Improving Low-Resource Knowledge Tracing Tasks by Supervised Pre-training and Importance Mechanism Fine-tuning
Zhang, Hengyuan, Liu, Zitao, Huang, Shuyan, Shang, Chenming, Zhan, Bojun, Jiang, Yong
Knowledge tracing (KT) aims to estimate students' knowledge mastery based on their historical interactions. Recently, deep learning based KT (DLKT) approaches have achieved impressive performance on the KT task. These DLKT models rely heavily on a large number of available student interactions. However, due to various reasons such as budget constraints and privacy concerns, observed interactions are very limited in many real-world scenarios, a.k.a. low-resource KT datasets. Directly training a DLKT model on a low-resource KT dataset may lead to overfitting, and it is difficult to choose an appropriate deep neural architecture. Therefore, in this paper, we propose a low-resource KT framework called LoReKT to address the above challenges. Inspired by the prevalent "pre-training and fine-tuning" paradigm, we aim to learn transferable parameters and representations from rich-resource KT datasets during the pre-training stage and subsequently facilitate effective adaptation to low-resource KT datasets. Specifically, we simplify existing sophisticated DLKT model architectures into a plain stack of transformer decoders. We design an encoding mechanism to incorporate student interactions from multiple KT data sources and develop an importance mechanism that prioritizes updating parameters with high importance while constraining less important ones during the fine-tuning stage. We evaluate LoReKT on six public KT datasets, and the experimental results demonstrate the superiority of our approach in terms of AUC and accuracy. To encourage reproducible research, we make our data and code publicly available at https://anonymous.4open.science/r/LoReKT-C619.
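A rough sketch of what importance-guided fine-tuning of this kind might look like is given below: parameters whose accumulated gradient magnitude is low get their updates damped. This is an illustrative approximation under assumed choices (gradient magnitude as importance, a fixed damping factor), not the paper's exact mechanism.

```python
import torch

def importance_scores(model, loader, loss_fn):
    """Accumulate |gradient| per parameter as a crude importance estimate."""
    scores = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for x, y in loader:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                scores[n] += p.grad.abs()
    return scores

def damp_unimportant_gradients(model, scores, keep_ratio=0.3):
    """Call between loss.backward() and optimizer.step():
    keep gradients of high-importance entries, shrink the rest."""
    for n, p in model.named_parameters():
        if p.grad is None:
            continue
        k = max(1, int(keep_ratio * scores[n].numel()))
        threshold = scores[n].flatten().topk(k).values.min()
        mask = (scores[n] >= threshold).to(p.grad.dtype)
        p.grad.mul_(mask + 0.1 * (1.0 - mask))  # damp low-importance updates
```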
Improving Retrieval Augmented Open-Domain Question-Answering with Vectorized Contexts
Chen, Zhuo, Wang, Xinyu, Jiang, Yong, Xie, Pengjun, Huang, Fei, Tu, Kewei
In the era of large language models, techniques such as Retrieval Augmented Generation can better address Open-Domain Question-Answering problems. Due to constraints such as model size and computing resources, the context length is often limited, and it is challenging to enable the model to cover overlong contexts while answering open-domain questions. This paper proposes a general and convenient method for covering longer contexts in Open-Domain Question-Answering tasks. It leverages a small encoder language model that effectively encodes contexts, and applies cross-attention between the encodings and the original inputs. With our method, the original language model can cover contexts several times longer while keeping the computing requirements close to the baseline. Our experiments demonstrate that, after fine-tuning, performance improves across two held-in datasets, four held-out datasets, and two In-Context Learning settings.
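A simplified sketch of the general idea, cross-attending from the model's hidden states to compressed context vectors produced by a small encoder, is shown below. The module choice, shapes, and residual connection are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class VectorizedContextAttention(nn.Module):
    def __init__(self, hidden_size: int = 512, num_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)

    def forward(self, decoder_states, context_vectors):
        # decoder_states:  (batch, query_len, hidden)  -- hidden states of the original input
        # context_vectors: (batch, num_chunks * vecs_per_chunk, hidden) -- encoded contexts
        attended, _ = self.cross_attn(decoder_states, context_vectors, context_vectors)
        # Residual connection keeps the original representation intact.
        return decoder_states + attended

layer = VectorizedContextAttention()
out = layer(torch.randn(2, 16, 512), torch.randn(2, 128, 512))
print(out.shape)  # torch.Size([2, 16, 512])
```

Because each chunk is compressed to a few vectors before attention, the attended sequence stays short even when the raw context is several times longer than the model's window.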
Make Your Home Safe: Time-aware Unsupervised User Behavior Anomaly Detection in Smart Homes via Loss-guided Mask
Xiao, Jingyu, Xu, Zhiyao, Zou, Qingsong, Li, Qing, Zhao, Dan, Fang, Dong, Li, Ruoyu, Tang, Wenxin, Li, Kang, Zuo, Xudong, Hu, Penghui, Jiang, Yong, Weng, Zixuan, Lyu, Michael R.
Smart homes, powered by the Internet of Things, offer great convenience but also raise security concerns due to abnormal behaviors, such as improper operations by users and potential attacks from malicious attackers. Several behavior modeling methods have been proposed to identify abnormal behaviors and mitigate potential risks. However, their performance often falls short because they do not effectively learn less frequent behaviors, consider temporal context, or account for the impact of noise in human behaviors. In this paper, we propose SmartGuard, an autoencoder-based unsupervised user behavior anomaly detection framework. First, we design a Loss-guided Dynamic Mask Strategy (LDMS) to encourage the model to learn less frequent behaviors, which are often overlooked during training. Second, we propose a Three-level Time-aware Position Embedding (TTPE) that incorporates temporal information into the positional embedding to detect temporal context anomalies. Third, we propose a Noise-aware Weighted Reconstruction Loss (NWRL) that assigns different weights to routine behaviors and noise behaviors to mitigate the interference of noise behaviors during inference. Comprehensive experiments on three datasets with ten types of anomalous behaviors demonstrate that SmartGuard consistently outperforms state-of-the-art baselines and also offers highly interpretable results.
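To illustrate the flavor of a noise-aware weighted reconstruction loss, here is a minimal sketch: behavior steps judged noisy receive lower weight so they perturb the reconstruction objective less. The specific weighting rule is an assumption, not the NWRL formulation from the paper.

```python
import torch

def weighted_reconstruction_loss(reconstructed, original, noise_scores):
    """reconstructed / original: (batch, seq_len, dim) behavior embeddings;
    noise_scores: (batch, seq_len), higher means the behavior looks more like noise."""
    per_step = ((reconstructed - original) ** 2).mean(dim=-1)   # (batch, seq_len)
    weights = 1.0 / (1.0 + noise_scores)                        # down-weight noisy steps
    return (weights * per_step).sum() / weights.sum()
```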
RTF: Region-based Table Filling Method for Relational Triple Extraction
An, Ning, Hei, Lei, Jiang, Yong, Meng, Weiping, Hu, Jingjing, Huang, Boran, Ren, Feiliang
Relational triple extraction is a crucial task for the automatic construction of knowledge graphs. Existing methods construct only shallow representations at the token or token-pair level. Moreover, previous works ignore the local spatial dependencies of relational triples, resulting in weak detection of entity pair boundaries. To tackle this problem, we propose a novel Region-based Table Filling method (RTF). We devise a novel region-based tagging scheme and bi-directional decoding strategy, which regards each relational triple as a region on the relation-specific table and identifies triples by determining the two endpoints of each region. We also introduce convolution to construct region-level table representations from a spatial perspective, which makes triples easier to capture. In addition, we share partial tagging scores among different relations to improve the learning efficiency of the relation classifier. Experimental results show that our method achieves state-of-the-art performance with better generalization capability on three variants of two widely used benchmark datasets.
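The region idea can be pictured with a toy decoding sketch: a triple is read off as the rectangle spanned by an upper-left and a lower-right corner tag on the relation-specific table. The nearest-corner pairing heuristic below is an assumption for illustration, not the paper's bi-directional decoding strategy.

```python
def decode_regions(upper_left, lower_right):
    """upper_left / lower_right: 2D boolean grids of corner predictions
    (rows index subject tokens, columns index object tokens)."""
    triples = []
    corners = [(i, j) for i, row in enumerate(lower_right) for j, v in enumerate(row) if v]
    for i, row in enumerate(upper_left):
        for j, v in enumerate(row):
            if not v:
                continue
            # Pair each upper-left corner with the nearest lower-right corner at or
            # beyond it; the rectangle gives the entity-pair boundaries.
            match = min(((a, b) for a, b in corners if a >= i and b >= j),
                        key=lambda c: (c[0] - i) + (c[1] - j), default=None)
            if match is not None:
                triples.append(((i, match[0]), (j, match[1])))  # (subject span, object span)
    return triples

ul = [[0, 1, 0], [0, 0, 0], [0, 0, 0]]
lr = [[0, 0, 0], [0, 0, 1], [0, 0, 0]]
print(decode_regions(ul, lr))  # [((0, 1), (1, 2))]
```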
Balancing Speciality and Versatility: a Coarse to Fine Framework for Supervised Fine-tuning Large Language Model
Zhang, Hengyuan, Wu, Yanru, Li, Dawei, Yang, Sak, Zhao, Rui, Jiang, Yong, Tan, Fei
Aligned Large Language Models (LLMs) showcase remarkable versatility, capable of handling diverse real-world tasks. Meanwhile, aligned LLMs are also expected to exhibit speciality, excelling in specific applications. However, fine-tuning with extra data, a common practice to gain speciality, often leads to catastrophic forgetting (CF) of previously acquired versatility, hindering the model's performance across diverse tasks. In response to this challenge, we propose CoFiTune, a coarse-to-fine framework that attempts to strike a balance between speciality and versatility. At the coarse-grained level, an empirical tree-search algorithm is utilized to pinpoint and update specific modules that are crucial for speciality, while keeping other parameters frozen; at the fine-grained level, a soft-masking mechanism regulates updates to the LLM, mitigating the CF issue without harming speciality. In an overall evaluation of both speciality and versatility, CoFiTune consistently outperforms baseline methods across diverse tasks and model scales. Compared to full-parameter SFT, CoFiTune yields about a 14% versatility improvement with marginal speciality loss on a 13B model. Lastly, based on further analysis, we provide a speculative insight into the information forwarding process in LLMs, which helps explain the effectiveness of the proposed method. The code is available at https://github.com/rattlesnakey/CoFiTune.
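A very rough sketch of the coarse-to-fine update scheme described above: freeze everything outside the modules judged speciality-critical, then soft-mask gradients inside them. Module naming by substring and the mask dictionary are hypothetical conventions, not the paper's interface.

```python
def freeze_outside(model, trainable_modules):
    """Coarse level: only parameters whose names match `trainable_modules` stay trainable."""
    for name, param in model.named_parameters():
        param.requires_grad_(any(m in name for m in trainable_modules))

def soft_mask_gradients(model, soft_masks, default=1.0):
    """Fine level: scale each remaining gradient by a soft mask in [0, 1]
    (call between loss.backward() and optimizer.step())."""
    for name, param in model.named_parameters():
        if param.grad is not None:
            param.grad.mul_(soft_masks.get(name, default))  # soften risky updates
```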