Collaborating Authors

 Fei, Weizhi


Efficient Prompt Compression with Evaluator Heads for Long-Context Transformer Inference

arXiv.org Artificial Intelligence

Large language models (LLMs) have exhibited exceptional capabilities in a variety of real-world tasks and applications, with an increasing need for processing long inputs in areas such as literary novels, legal documents, instruction manuals, and code documentation. Inference tasks that require understanding of long contexts, such as long-document summarization (Zhang et al., 2024), reasoning (Fei et al., 2024a), and autonomous agents (Singh et al., 2024; Chen et al., 2024), are of particular importance due to the high stakes in these scenarios. However, the deployment of LLMs is challenged by the computational and memory demands inherent to transformer-based architectures, resulting in increased latency, particularly when processing lengthy input prompts. Prompt compression, which entails substituting the input prompts provided to a language model with more succinct versions, has surfaced as a promising strategy for enhancing long-text understanding and mitigating the associated costs. Current mainstream methods, such as Selective-Context (Li et al., 2023), LLMLingua (Jiang et al., 2023a), and LongLLMLingua (Jiang et al., 2023b), typically rely on pre-trained LLMs, utilizing the logits or perplexity of the prompts. (Table 1 of the paper gives an overall comparison of the proposed method with these baselines in terms of average performance and latency on the LongBench dataset, under the constraint of a compressed prompt length of 2048 tokens; comprehensive results appear in Table 4 and Table 5.)
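The core compression step such methods share can be sketched as follows. This is an illustrative toy, not the paper's implementation: given per-token importance scores (which the paper derives from selected "evaluator" attention heads; here they are supplied directly), keep the highest-scoring tokens under a length budget while preserving their original order.

```python
# Toy sketch of budgeted token selection for prompt compression.
# `scores` stands in for a per-token importance signal (e.g. attention mass
# from selected heads); this is an assumption for illustration only.

def compress_prompt(tokens, scores, budget):
    """Keep the `budget` highest-scoring tokens, preserving original order."""
    if budget >= len(tokens):
        return list(tokens)
    # Rank positions by importance, take the top `budget`, then restore order.
    keep = sorted(sorted(range(len(tokens)), key=lambda i: -scores[i])[:budget])
    return [tokens[i] for i in keep]

tokens = ["The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"]
scores = [0.05, 0.30, 0.10, 0.90, 0.80, 0.05, 0.04, 0.20, 0.70]
print(compress_prompt(tokens, scores, 4))  # ['quick', 'fox', 'jumps', 'dog']
```

The interesting part of any such method lies in where the scores come from; the selection itself is a simple top-k over positions.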


Top Ten Challenges Towards Agentic Neural Graph Databases

arXiv.org Artificial Intelligence

Graph databases (GDBs) like Neo4j and TigerGraph excel at handling interconnected data but lack advanced inference capabilities. Neural Graph Databases (NGDBs) address this by integrating Graph Neural Networks (GNNs) for predictive analysis and reasoning over incomplete or noisy data. However, NGDBs rely on predefined queries and lack autonomy and adaptability. This paper introduces Agentic Neural Graph Databases (Agentic NGDBs), which extend NGDBs with three core functionalities: autonomous query construction, neural query execution, and continuous learning. We identify ten key challenges in realizing Agentic NGDBs: semantic unit representation, abductive reasoning, scalable query execution, and integration with foundation models like large language models (LLMs). By addressing these challenges, Agentic NGDBs can enable intelligent, self-improving systems for modern data-driven applications, paving the way for adaptable and autonomous data management solutions.


Retrieval Meets Reasoning: Dynamic In-Context Editing for Long-Text Understanding

arXiv.org Artificial Intelligence

Current Large Language Models (LLMs) face inherent limitations due to their pre-defined context lengths, which impede their capacity for multi-hop reasoning within extensive textual contexts. While existing techniques like Retrieval-Augmented Generation (RAG) have attempted to bridge this gap by sourcing external information, they fall short when direct answers are not readily available. We introduce a novel approach that re-imagines information retrieval through dynamic in-context editing, inspired by recent breakthroughs in knowledge editing. By treating lengthy contexts as malleable external knowledge, our method interactively gathers and integrates relevant information, thereby enabling LLMs to perform sophisticated reasoning steps. Experimental results demonstrate that our method effectively empowers context-limited LLMs, such as Llama2, to engage in multi-hop reasoning with improved performance, outperforming state-of-the-art context-window extrapolation methods and even comparing favorably to more advanced commercial long-context models. Our interactive method not only enhances reasoning capabilities but also mitigates the associated training and computational costs, making it a pragmatic solution for enhancing LLMs' reasoning within expansive contexts.
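The interactive gather-and-integrate pattern described above can be sketched as a hop-by-hop loop. This is a hedged stand-in, not the paper's system: `retrieve` here is a toy lookup over a hand-written fact store, where the real method queries the long context itself and the question decomposition would come from the LLM.

```python
# Toy sketch of interactive multi-hop answering: the long context is treated
# as an external store queried one decomposed sub-question at a time.
# KNOWLEDGE and `retrieve` are illustrative stand-ins, not the paper's design.

KNOWLEDGE = {
    "Where was Marie Curie born?": "Warsaw",
    "What country is Warsaw in?": "Poland",
}

def retrieve(question):
    # Stand-in retriever: look the sub-question up in the toy fact store.
    return KNOWLEDGE.get(question)

def multi_hop_answer(sub_questions):
    """Answer a chain of decomposed sub-questions, collecting one fact per hop."""
    gathered = []
    for q in sub_questions:
        fact = retrieve(q)
        if fact is None:
            return None, gathered  # retrieval miss: stop early
        gathered.append(fact)
    return gathered[-1], gathered

answer, facts = multi_hop_answer([
    "Where was Marie Curie born?",
    "What country is Warsaw in?",
])
print(answer)  # Poland
```

The point of the loop structure is that each hop can condition on facts gathered so far, which is what lets a context-limited model traverse a chain no single window could hold.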


Soft Reasoning on Uncertain Knowledge Graphs

arXiv.org Artificial Intelligence

The uncertain nature of knowledge is widely observed in the real world, but does not align seamlessly with the first-order logic underpinning existing studies. To bridge this gap, we study the setting of soft queries on uncertain knowledge, which is motivated by the establishment of soft constraint programming. We further propose an ML-based approach with both forward inference and backward calibration to answer soft queries on large-scale, incomplete, and uncertain knowledge graphs. Theoretical discussions present that our methods share the same complexity as state-of-the-art inference algorithms for first-order queries.

... further possibilities in data management (Wang et al., 2022; Ren et al., 2023). Uncertain knowledge is widely observed, from daily events (Zhang et al., 2020) to the interaction of biological systems (Szklarczyk et al., 2023). Besides, uncertainty is also particularly pervasive in KGs because KGs are constructed by information extraction models that could introduce errors (Angeli et al., 2015; Ponte & Croft, 2017) and from large corpora that could be noisy (Carlson et al., 2010). To represent uncertain knowledge, confidence values p are associated with triples in many well-established KGs (Carlson et al., 2010; Speer et al., 2017; Szklarczyk et al., 2023).
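The notion of confidence-valued triples and soft queries over them can be made concrete with a small sketch. All names here are mine for illustration, not the paper's formulation: a conjunctive query is scored with the min t-norm as a fuzzy conjunction, and answers are filtered by a confidence threshold.

```python
# Illustrative sketch: an uncertain KG stores triples with confidence values;
# a soft conjunctive query scores each answer with a t-norm (min here) and
# keeps answers above a threshold. Data and scoring choice are assumptions.

TRIPLES = {
    ("curie", "born_in", "warsaw"): 0.95,
    ("warsaw", "located_in", "poland"): 0.99,
    ("curie", "born_in", "paris"): 0.20,
}

def soft_query(person, threshold):
    """Find countries c with born_in(person, x) AND located_in(x, c),
    scoring each answer by the min of the triple confidences."""
    answers = {}
    for (h1, r1, x), p1 in TRIPLES.items():
        if r1 != "born_in" or h1 != person:
            continue
        for (h2, r2, c), p2 in TRIPLES.items():
            if r2 == "located_in" and h2 == x:
                score = min(p1, p2)  # fuzzy conjunction via the min t-norm
                if score >= threshold:
                    answers[c] = max(answers.get(c, 0.0), score)
    return answers

print(soft_query("curie", 0.5))  # {'poland': 0.95}
```

A learned approach replaces this exhaustive enumeration with embedding-based inference, which is what makes the setting tractable on large, incomplete graphs.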


Extending Context Window of Large Language Models via Semantic Compression

arXiv.org Artificial Intelligence

Transformer-based Large Language Models (LLMs) often impose limitations on the length of the text input to ensure the generation of fluent and relevant responses. This constraint restricts their applicability in scenarios involving long texts. We propose a novel semantic compression method that enables generalization to texts that are 6-8 times longer, without incurring significant computational costs or requiring fine-tuning. Our proposed framework draws inspiration from source coding in information theory and employs a pre-trained model to reduce the semantic redundancy of long inputs before passing them to the LLMs for downstream tasks. Experimental results demonstrate that our method effectively extends the context window of LLMs across a range of tasks including question answering, summarization, few-shot learning, and information retrieval. Furthermore, the proposed semantic compression method exhibits consistent fluency in text generation while reducing the associated computational overhead.
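The general idea of reducing semantic redundancy before the text reaches the LLM can be illustrated with a deliberately simple sketch. This is not the paper's pre-trained-model pipeline: here, near-duplicate sentences are greedily dropped using Jaccard similarity over word sets, a crude stand-in for the semantic redundancy measure.

```python
# Toy sketch of redundancy reduction before LLM inference: greedily keep
# sentences that are not near-duplicates (by word-set Jaccard similarity)
# of sentences already kept. The similarity measure is a simple stand-in.
import re

def words(s):
    return set(re.findall(r"[a-z]+", s.lower()))

def jaccard(a, b):
    wa, wb = words(a), words(b)
    return len(wa & wb) / len(wa | wb)

def deduplicate(sentences, max_sim=0.6):
    """Greedily keep sentences that are not near-duplicates of kept ones."""
    kept = []
    for s in sentences:
        if all(jaccard(s, k) < max_sim for k in kept):
            kept.append(s)
    return kept

doc = [
    "The model compresses long inputs.",
    "The model compresses long inputs before inference.",
    "Compression reduces latency on long documents.",
]
print(deduplicate(doc))
```

Swapping the word-overlap score for a similarity derived from a pre-trained encoder turns the same greedy skeleton into a semantic, rather than lexical, compressor.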


$\text{EFO}_{k}$-CQA: Towards Knowledge Graph Complex Query Answering beyond Set Operation

arXiv.org Artificial Intelligence

To answer complex queries on knowledge graphs, logical reasoning over incomplete knowledge is required due to the open-world assumption. Learning-based methods are essential because they are capable of generalizing over unobserved knowledge. Therefore, an appropriate dataset is fundamental to both obtaining and evaluating such methods under this paradigm. In this paper, we propose a comprehensive framework for data generation, model training, and method evaluation that covers the combinatorial space of Existential First-order Queries with multiple variables ($\text{EFO}_{k}$). The combinatorial query space in our framework significantly extends those defined by set operations in the existing literature. Additionally, we construct a dataset, $\text{EFO}_{k}$-CQA, with 741 types of queries for empirical evaluation, and our benchmark results provide new insights into how query hardness affects the results. Furthermore, we demonstrate that the existing dataset construction process is systematically biased in a way that hinders the appropriate development of query-answering methods, highlighting the importance of our work. Our code and data are provided in~\url{https://github.com/HKUST-KnowComp/EFOK-CQA}.
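To make the query class concrete, here is one illustrative shape of an $\text{EFO}_{k}$ query with $k=2$ free variables; the relation names are invented for the example, and the notation is a sketch rather than the paper's exact syntax:

$\phi(y_1, y_2) = \exists x \,\big( \mathrm{Collab}(y_1, x) \wedge \mathrm{Collab}(y_2, x) \wedge \mathrm{WorksAt}(x, \mathrm{UnivA}) \big)$

Because the answer is a set of *pairs* $(y_1, y_2)$ linked through a shared existential variable, such queries fall outside the single-free-variable, set-operation formulations (intersection, union, projection) used in most prior benchmarks.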


Wasserstein-Fisher-Rao Embedding: Logical Query Embeddings with Local Comparison and Global Transport

arXiv.org Artificial Intelligence

Answering complex queries on knowledge graphs is important but particularly challenging because of the data incompleteness. Query embedding methods address this issue with learning-based models that simulate logical reasoning with set operators. Previous works focus on specific forms of embeddings, but scoring functions between embeddings are underexplored. In contrast to existing scoring functions motivated by local comparison or global transport, this work investigates the local and global trade-off with unbalanced optimal transport theory. Specifically, we embed sets as bounded measures in $\mathbb{R}$ endowed with a scoring function motivated by the Wasserstein-Fisher-Rao metric. Such a design also facilitates closed-form set operators in the embedding space. Moreover, we introduce a convolution-based algorithm for linear time computation and a block-diagonal kernel to enforce the trade-off. Results show that WFRE can outperform existing query embedding methods on standard datasets, evaluation sets with combinatorially complex queries, and hierarchical knowledge graphs. An ablation study shows that finding a better local and global trade-off is essential for performance improvement.