AITopics | Liu, Zhenghao

Collaborating Authors

Liu, Zhenghao

VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents

Yu, Shi, Tang, Chaoyue, Xu, Bokai, Cui, Junbo, Ran, Junhao, Yan, Yukun, Liu, Zhenghao, Wang, Shuo, Han, Xu, Liu, Zhiyuan, Sun, Maosong

arXiv.org Artificial Intelligence, Oct-14-2024

Retrieval-augmented generation (RAG) is an effective technique that enables large language models (LLMs) to utilize external knowledge sources for generation. However, current RAG systems are solely based on text, rendering it impossible to utilize vision information like layout and images that play crucial roles in real-world multi-modality documents. In this paper, we introduce VisRAG, which tackles this issue by establishing a vision-language model (VLM)-based RAG pipeline. In this pipeline, instead of first parsing the document to obtain text, the document is directly embedded using a VLM as an image and then retrieved to enhance the generation of a VLM. Compared to traditional text-based RAG, VisRAG maximizes the retention and utilization of the data information in the original documents, eliminating the information loss introduced during the parsing process. We collect both open-source and synthetic data to train the retriever in VisRAG and explore a variety of generation methods. Experiments demonstrate that VisRAG outperforms traditional RAG in both the retrieval and generation stages, achieving a 25-39% end-to-end performance gain over the traditional text-based RAG pipeline. Further analysis reveals that VisRAG is effective in utilizing training data and demonstrates strong generalization capability, positioning it as a promising solution for RAG on multi-modality documents. Our code and data are available at https://github.com/openbmb/visrag.

Trained on massive data, large language models (LLMs) like GPT-4 (Achiam et al., 2023) have shown strong abilities in common NLP tasks using their parametric knowledge (Wei et al., 2022; Zhao et al., 2023). Retrieval-augmented generation (RAG) alleviates this problem by using a knowledge retriever, which has access to a custom outer knowledge base, to supply the LLM with the necessary information for generating outputs (Guu et al., 2020; Lewis et al., 2020; Yu et al., 2023). Open-source RAG frameworks like LlamaIndex (Liu, 2022) have been developed to facilitate the research and deployment of common RAG pipelines. Typical retrieval-augmented generation (RAG) pipelines are text-based, operating on segmented texts as retrieval units (Yu et al., 2023; Asai et al., 2024; Yan et al., 2024), which we refer to as TextRAG.
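
The retrieve-then-generate flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the page embeddings are assumed to be precomputed by some VLM-based encoder, and the names `retrieve_pages`, `answer_with_pages`, and the `generate` callback are hypothetical placeholders. Cosine similarity is used here as an assumed ranking function.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_pages(query_vec: Vector, page_vecs: Dict[str, Vector], k: int = 2) -> List[str]:
    """Rank page ids by similarity to the query and keep the top k.

    page_vecs is assumed to hold one embedding per page IMAGE, so no
    text parsing step happens before retrieval.
    """
    ranked = sorted(page_vecs, key=lambda pid: cosine(query_vec, page_vecs[pid]), reverse=True)
    return ranked[:k]


def answer_with_pages(
    query: str,
    query_vec: Vector,
    page_vecs: Dict[str, Vector],
    page_images: Dict[str, bytes],
    generate: Callable[[str, List[bytes]], str],  # hypothetical VLM generation callback
    k: int = 2,
) -> str:
    """Retrieve the top-k page images and hand them, with the query, to a generator."""
    top_ids = retrieve_pages(query_vec, page_vecs, k)
    return generate(query, [page_images[pid] for pid in top_ids])
```

In this sketch the retrieval unit is a whole page image rather than a parsed text chunk, which is the main contrast with a TextRAG pipeline.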

arxiv preprint arxiv, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2410.10594

Country: North America > United States (0.67)

Genre: Research Report > Promising Solution (0.34)

Industry: Transportation > Air (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Retriever-and-Memory: Towards Adaptive Note-Enhanced Retrieval-Augmented Generation

Wang, Ruobing, Zha, Daren, Yu, Shi, Zhao, Qingfei, Chen, Yuxuan, Wang, Yixuan, Wang, Shuo, Yan, Yukun, Liu, Zhenghao, Han, Xu, Liu, Zhiyuan, Sun, Maosong

arXiv.org Artificial Intelligence, Oct-11-2024

Retrieval-Augmented Generation (RAG) mitigates the factual errors and hallucinated outputs generated by Large Language Models (LLMs) in open-domain question-answering tasks (OpenQA) by introducing external knowledge. For complex QA, however, existing RAG methods use LLMs to actively predict retrieval timing and directly use the retrieved information for generation, regardless of whether the retrieval timing accurately reflects the actual information needs, or sufficiently considers prior retrieved knowledge, which may result in insufficient information gathering and interaction, yielding low-quality answers. To address these issues, we propose a generic RAG approach called Adaptive Note-Enhanced RAG (Adaptive-Note) for complex QA tasks, which includes the iterative information collector, adaptive memory reviewer, and task-oriented generator, while following a new Retriever-and-Memory paradigm. Specifically, Adaptive-Note introduces an overarching view of knowledge growth, iteratively gathering new information in the form of notes and updating them into the existing optimal knowledge structure, enhancing high-quality knowledge interactions. In addition, we employ an adaptive, note-based stop-exploration strategy to decide "what to retrieve and when to stop" to encourage sufficient knowledge exploration. We conduct extensive experiments on five complex QA datasets, and the results demonstrate the superiority and effectiveness of our method and its components. The code and data are at https://github.com/thunlp/Adaptive-Note.
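
The iterative note collection and stop-exploration idea in the abstract might be sketched as the loop below. This is an illustrative assumption, not the released code: `retrieve`, `merge`, and `improves` are hypothetical callbacks (in practice they would likely be a retriever and LLM prompts), and the `patience` rule is one simple way to decide when to stop.

```python
from typing import Callable, List


def collect_notes(
    question: str,
    retrieve: Callable[[str, str], List[str]],  # hypothetical: (question, note) -> passages
    merge: Callable[[str, List[str]], str],     # hypothetical: fold passages into a note
    improves: Callable[[str, str], bool],       # hypothetical: is the candidate note better?
    max_steps: int = 5,
    patience: int = 1,
) -> str:
    """Iteratively grow a note, keeping only updates judged to be improvements.

    Exploration stops after `patience` consecutive rounds without
    improvement, or after `max_steps` rounds, whichever comes first.
    """
    best_note = ""
    stale_rounds = 0
    for _ in range(max_steps):
        passages = retrieve(question, best_note)
        candidate = merge(best_note, passages)
        if improves(candidate, best_note):
            best_note = candidate      # keep the better note as the current memory
            stale_rounds = 0
        else:
            stale_rounds += 1          # nothing new was learned this round
            if stale_rounds >= patience:
                break
    return best_note
```

The returned note would then be passed to a generator to produce the final answer; that step is omitted here.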

information, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2410.08821

Country:

Europe (1.00)
Asia (0.69)
North America > United States > Maryland (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

DyGPrompt: Learning Feature and Time Prompts on Dynamic Graphs

Yu, Xingtong, Liu, Zhenghao, Fang, Yuan, Zhang, Xinming

arXiv.org Artificial Intelligence, Jul-2-2024

Dynamic graphs are pervasive in the real world, modeling dynamic relations between objects across various fields. For dynamic graph modeling, dynamic graph neural networks (DGNNs) have emerged as a mainstream technique, which are generally pre-trained on the link prediction task, leaving a significant gap from the objectives of downstream tasks such as node classification. To bridge the gap, prompt-based learning has gained traction on graphs. However, existing efforts focus on static graphs, neglecting the evolution of dynamic graphs. In this paper, we propose DyGPrompt, a novel pre-training and prompting framework for dynamic graph modeling. First, we design dual prompts to address the gap in both task objectives and dynamic variations across pre-training and downstream tasks. Second, we recognize that node and time features mutually characterize each other, and propose dual condition-nets to model the evolving node-time patterns in downstream tasks. Finally, we thoroughly evaluate and analyze DyGPrompt through extensive experiments on three public datasets.

artificial intelligence, deep learning, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2405.13937

Country: Asia (0.28)

Genre:

Research Report (1.00)
Overview (0.68)

Industry: Education (0.32)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Cleaner Pretraining Corpus Curation with Neural Web Scraping

Xu, Zhipeng, Liu, Zhenghao, Yan, Yukun, Liu, Zhiyuan, Yu, Ge, Xiong, Chenyan

arXiv.org Artificial Intelligence, Jun-14-2024

The web contains large-scale, diverse, and abundant information to satisfy the information-seeking needs of humans. Through meticulous data collection, preprocessing, and curation, webpages can be used as a fundamental data resource for language model pretraining. However, when confronted with the progressively revolutionized and intricate nature of webpages, rule-based/feature-based web scrapers are becoming increasingly inadequate. This paper presents a simple, fast, and effective Neural web Scraper (NeuScraper) to help extract primary and clean text contents from webpages. Experimental results show that NeuScraper surpasses the baseline scrapers by achieving more than a 20% improvement, demonstrating its potential in extracting higher-quality data to facilitate the language model pretraining. All of the code is available at https://github.com/OpenMatch/NeuScraper.

data mining, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2402.14652

Country:

Europe (0.68)
Asia > China (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Communications > Web (0.95)
(2 more...)

Multi-Evidence based Fact Verification via A Confidential Graph Neural Network

Lan, Yuqing, Liu, Zhenghao, Gu, Yu, Yi, Xiaoyuan, Li, Xiaohua, Yang, Liner, Yu, Ge

arXiv.org Artificial Intelligence, May-16-2024

Fact verification tasks aim to identify the integrity of textual contents according to the truthful corpus. Existing fact verification models usually build a fully connected reasoning graph, which regards claim-evidence pairs as nodes and connects them with edges. They employ the graph to propagate the semantics of the nodes. Nevertheless, the noisy nodes usually propagate their semantics via the edges of the reasoning graph, which misleads the semantic representations of other nodes and amplifies the noise signals. To mitigate the propagation of noisy semantic information, we introduce a Confidential Graph Attention Network (CO-GAT), which proposes a node masking mechanism for modeling the nodes. Specifically, CO-GAT calculates the node confidence score by estimating the relevance between the claim and evidence pieces. Then, the node masking mechanism uses the node confidence scores to control the noise information flow from the vanilla node to the other graph nodes. CO-GAT achieves a 73.59% FEVER score on the FEVER dataset and shows the generalization ability by broadening the effectiveness to the science-specific domain.
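
The effect of confidence-based node masking can be illustrated with a toy example. This is a sketch under stated assumptions, not CO-GAT itself: plain mean aggregation over a fully connected graph stands in for graph attention, the confidence scores are given rather than learned, and `masked_aggregate` is a hypothetical name.

```python
from typing import List


def masked_aggregate(node_vecs: List[List[float]], confidences: List[float]) -> List[List[float]]:
    """Aggregate over a fully connected graph with confidence-masked messages.

    Each node keeps its own representation unchanged, while the message it
    sends to other nodes is scaled by its confidence score in [0, 1].
    A node with confidence 0 therefore contributes nothing to its neighbours.
    """
    n = len(node_vecs)
    dim = len(node_vecs[0])
    # Outgoing messages: the node representation scaled by its confidence.
    masked = [[c * x for x in vec] for vec, c in zip(node_vecs, confidences)]
    out = []
    for i in range(n):
        agg = list(node_vecs[i])  # the node's own, unmasked representation
        for j in range(n):
            if j != i:
                for d in range(dim):
                    agg[d] += masked[j][d]
        out.append([v / n for v in agg])  # mean over all n nodes
    return out
```

With two nodes and confidences `[1.0, 0.0]`, the first node receives nothing from the low-confidence second node, while the second still receives the first node's full message.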

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2405.10481

Country: Asia > China > Liaoning Province (0.14)

Genre: Research Report (1.00)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)
Information Technology (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)

Advancing LLM Reasoning Generalists with Preference Trees

Yuan, Lifan, Cui, Ganqu, Wang, Hanbin, Ding, Ning, Wang, Xingyao, Deng, Jia, Shan, Boji, Chen, Huimin, Xie, Ruobing, Lin, Yankai, Liu, Zhenghao, Zhou, Bowen, Peng, Hao, Liu, Zhiyuan, Sun, Maosong

arXiv.org Artificial Intelligence, Apr-2-2024

We introduce Eurus, a suite of large language models (LLMs) optimized for reasoning. Finetuned from Mistral-7B and CodeLlama-70B, Eurus models achieve state-of-the-art results among open-source models on a diverse set of benchmarks covering mathematics, code generation, and logical reasoning problems. Notably, Eurus-70B beats GPT-3.5 Turbo in reasoning through a comprehensive benchmarking across 12 tests covering five tasks, and achieves a 33.3% pass@1 accuracy on LeetCode and 32.6% on TheoremQA, two challenging benchmarks, substantially outperforming existing open-source models by margins of more than 13.3%. The strong performance of Eurus can be primarily attributed to UltraInteract, our newly curated large-scale, high-quality alignment dataset specifically designed for complex reasoning tasks. UltraInteract can be used in both supervised fine-tuning and preference learning. For each instruction, it includes a preference tree consisting of (1) reasoning chains with diverse planning strategies in a unified format, (2) multi-turn interaction trajectories with the environment and the critique, and (3) pairwise data to facilitate preference learning. UltraInteract allows us to conduct an in-depth exploration of preference learning for reasoning tasks. Our investigation reveals that some well-established preference learning algorithms may be less suitable for reasoning tasks compared to their effectiveness in general conversations. Inspired by this, we derive a novel reward modeling objective which, together with UltraInteract, leads to a strong reward model.

arxiv preprint, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2404.02078

Country: North America > United States > Illinois (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

MatPlotAgent: Method and Evaluation for LLM-Based Agentic Scientific Data Visualization

Yang, Zhiyu, Zhou, Zihan, Wang, Shuo, Cong, Xin, Han, Xu, Yan, Yukun, Liu, Zhenghao, Tan, Zhixing, Liu, Pengyuan, Yu, Dong, Liu, Zhiyuan, Shi, Xiaodong, Sun, Maosong

arXiv.org Artificial Intelligence, Mar-19-2024

Scientific data visualization plays a crucial role in research by enabling the direct display of complex information and assisting researchers in identifying implicit patterns. Despite its importance, the use of Large Language Models (LLMs) for scientific data visualization remains rather unexplored. In this study, we introduce MatPlotAgent, an efficient model-agnostic LLM agent framework designed to automate scientific data visualization tasks. Leveraging the capabilities of both code LLMs and multi-modal LLMs, MatPlotAgent consists of three core modules: query understanding, code generation with iterative debugging, and a visual feedback mechanism for error correction. To address the lack of benchmarks in this field, we present MatPlotBench, a high-quality benchmark consisting of 100 human-verified test cases. Additionally, we introduce a scoring approach that utilizes GPT-4V for automatic evaluation. Experimental results demonstrate that MatPlotAgent can improve the performance of various LLMs, including both commercial and open-source models. Furthermore, the proposed evaluation method shows a strong correlation with human-annotated scores.

artificial intelligence, large language model, natural language, (18 more...)

arXiv.org Artificial Intelligence

2402.11453

Country:

North America (0.46)
Europe (0.46)
Asia > China (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.32)

Modeling User Viewing Flow Using Large Language Models for Article Recommendation

Liu, Zhenghao, Chen, Zulong, Zhang, Moufeng, Duan, Shaoyang, Wen, Hong, Li, Liangyue, Li, Nan, Gu, Yu, Yu, Ge

arXiv.org Artificial Intelligence, Mar-7-2024

This paper proposes the User Viewing Flow Modeling (SINGLE) method for the article recommendation task, which models the user constant preference and instant interest from user-clicked articles. Specifically, we first employ a user constant viewing flow modeling method to summarize the user's general interest to recommend articles. In this case, we utilize Large Language Models (LLMs) to capture constant user preferences from previously clicked articles, such as skills and positions. Then we design the user instant viewing flow modeling method to build interactions between user-clicked article history and candidate articles. It attentively reads the representations of user-clicked articles and aims to learn the user's different interest views to match the candidate article. Our experimental results on the Alibaba Technology Association (ATA) website show the advantage of SINGLE, achieving a 2.4% improvement over previous baseline models in the online A/B test. Our further analyses illustrate that SINGLE has the ability to build a more tailored recommendation system by mimicking different article viewing behaviors of users and recommending more appropriate and diverse articles to match user interests.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2311.07619

Country: Asia > China > Liaoning Province (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Say More with Less: Understanding Prompt Learning Behaviors through Gist Compression

Li, Xinze, Liu, Zhenghao, Xiong, Chenyan, Yu, Shi, Yan, Yukun, Wang, Shuo, Yu, Ge

arXiv.org Artificial Intelligence, Feb-25-2024

Large language models (LLMs) require lengthy prompts as the input context to produce output aligned with user intentions, a process that incurs extra costs during inference. In this paper, we propose the Gist COnditioned deCOding (Gist-COCO) model, introducing a novel method for compressing prompts that can also assist prompt interpretation and engineering. Gist-COCO employs an encoder-decoder based language model and then incorporates an additional encoder as a plugin module to compress prompts with inputs using gist tokens. It finetunes the compression plugin module and uses the representations of gist tokens to emulate the raw prompts in the vanilla language model. By verbalizing the representations of gist tokens into gist prompts, the compression ability of Gist-COCO can be generalized to different LLMs with high compression rates. Our experiments demonstrate that Gist-COCO outperforms previous prompt compression models in both passage and instruction compression tasks. Further analysis of the gist verbalization results suggests that our gist prompts serve different functions in aiding language models. They may directly provide potential answers, generate the chain-of-thought, or simply repeat the inputs. All data and code are available at https://github.com/OpenMatch/Gist-COCO.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2402.16058

Country:

Asia > China (0.14)
North America > United States (0.14)

Genre: Research Report > New Finding (0.48)

Industry:

Leisure & Entertainment (1.00)
Media > Film (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

From Text to CQL: Bridging Natural Language and Corpus Search Engine

Lu, Luming, An, Jiyuan, Wang, Yujie, Yang, Liner, Kong, Cunliang, Liu, Zhenghao, Wang, Shuo, Lin, Haozhe, Fang, Mingwei, Huang, Yaping, Yang, Erhong

arXiv.org Artificial Intelligence, Feb-21-2024

Natural Language Processing (NLP) technologies have revolutionized the way we interact with information systems, with a significant focus on converting natural language queries into formal query languages such as SQL. However, less emphasis has been placed on the Corpus Query Language (CQL), a critical tool for linguistic research and detailed analysis within text corpora. The manual construction of CQL queries is a complex and time-intensive task that requires a great deal of expertise, which presents a notable challenge for both researchers and practitioners. This paper presents the first text-to-CQL task, which aims to automate the translation of natural language into CQL. We present a comprehensive framework for this task, including a specifically curated large-scale dataset and methodologies leveraging large language models (LLMs) for the text-to-CQL task. In addition, we establish advanced evaluation metrics to assess the syntactic and semantic accuracy of the generated queries. We develop innovative LLM-based conversion approaches and conduct detailed experiments. The results demonstrate the efficacy of our methods and provide insights into the complexities of the text-to-CQL task.

artificial intelligence, large language model, natural language, (15 more...)

arXiv.org Artificial Intelligence

2402.13740

Country:

Asia > China (0.14)
North America > United States (0.14)
Europe > Belgium (0.14)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
