AITopics | Ji, Zhiyuan

Ji, Zhiyuan

MoC: Mixtures of Text Chunking Learners for Retrieval-Augmented Generation System

Zhao, Jihao, Ji, Zhiyuan, Fan, Zhaoxin, Wang, Hanyu, Niu, Simin, Tang, Bo, Xiong, Feiyu, Li, Zhiyu

arXiv.org Artificial Intelligence, Mar-12-2025

Retrieval-Augmented Generation (RAG), while serving as a viable complement to large language models (LLMs), often overlooks the crucial aspect of text chunking within its pipeline. This paper first introduces a dual-metric evaluation method, comprising Boundary Clarity and Chunk Stickiness, to enable the direct quantification of chunking quality. Leveraging this assessment method, we highlight the inherent limitations of traditional and semantic chunking in handling complex contextual nuances, thereby substantiating the necessity of integrating LLMs into the chunking process. To address the inherent trade-off between computational efficiency and chunking precision in LLM-based approaches, we devise the granularity-aware Mixture-of-Chunkers (MoC) framework, which consists of a three-stage processing mechanism. Notably, our objective is to guide the chunker towards generating a structured list of chunking regular expressions, which are subsequently employed to extract chunks from the original text. Extensive experiments demonstrate that both our proposed metrics and the MoC framework effectively address the challenges of the chunking task, revealing the chunking kernel while enhancing the performance of the RAG system.
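
The regex-based extraction step mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `start_end_pattern` and `extract_chunks`, and the idea of describing each chunk by its opening and closing words, are assumptions made here for illustration only.

```python
import re


def start_end_pattern(start, end):
    """Hypothetical helper: build a regex matching from `start` to the nearest `end`."""
    return re.escape(start) + r".*?" + re.escape(end)


def extract_chunks(text, patterns):
    """Apply each chunking regex in order, scanning forward from the previous match."""
    chunks = []
    pos = 0
    for pattern in patterns:
        # DOTALL lets '.' span newlines, so a chunk may cover several lines.
        match = re.compile(pattern, re.DOTALL).search(text, pos)
        if match is None:
            continue  # assumption: silently skip a pattern that does not match
        chunks.append(match.group(0))
        pos = match.end()
    return chunks


text = (
    "RAG pairs a retriever with a generator. "
    "Chunking decides what the retriever sees. Bad chunks hurt answers."
)
patterns = [
    start_end_pattern("RAG pairs", "generator."),
    start_end_pattern("Chunking decides", "answers."),
]
chunks = extract_chunks(text, patterns)
```

Because the chunker only has to emit short patterns rather than the full chunk text, the extracted chunks are copied verbatim from the original document, which is one plausible reading of why the abstract routes the output through regular expressions.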

large language model, machine learning, natural language

2503.096

Country: Asia > China (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Meta-Chunking: Learning Efficient Text Segmentation via Logical Perception

Zhao, Jihao, Ji, Zhiyuan, Feng, Yuchen, Qi, Pengnian, Niu, Simin, Tang, Bo, Xiong, Feiyu, Li, Zhiyu

arXiv.org Artificial Intelligence, Nov-25-2024

Retrieval-Augmented Generation (RAG), while serving as a viable complement to large language models (LLMs), often overlooks the crucial aspect of text chunking within its pipeline, which impacts the quality of knowledge-intensive tasks. This paper introduces the concept of Meta-Chunking, which refers to a granularity between sentences and paragraphs, consisting of a collection of sentences within a paragraph that have deep linguistic logical connections. To implement Meta-Chunking, we designed Perplexity (PPL) Chunking, which balances performance and speed, and precisely identifies the boundaries of text chunks by analyzing the characteristics of context perplexity distribution. Additionally, considering the inherent complexity of different texts, we propose a strategy that combines PPL Chunking with dynamic merging to achieve a balance between fine-grained and coarse-grained text chunking. Experiments conducted on eleven datasets demonstrate that Meta-Chunking can more efficiently improve the performance of single-hop and multi-hop question answering based on RAG. For instance, on the 2Wiki-MultihopQA dataset, it outperforms similarity chunking by 1.32 while only consuming 45.8% of the time. Furthermore, through the analysis of models of various scales and types, we observed that PPL Chunking exhibits notable flexibility and adaptability.

RAG is particularly relevant in knowledge-intensive tasks like open-domain question answering (Lazaridou et al., 2022). By integrating two key components, the retriever and the generator, this technology enables more precise responses to input queries (Singh et al., 2021; Lin et al., 2023). While the feasibility of the retrieval-augmentation strategy has been widely demonstrated through practice, its effectiveness heavily relies on the relevance and accuracy of the retrieved documents (Li et al., 2022; Tan et al., 2022). The introduction of excessive redundant or incomplete information through retrieval not only fails to enhance the performance of the generation model but may also lead to a decline in answer quality (Shi et al., 2023; Yan et al., 2024).
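
The abstract above describes PPL Chunking as locating chunk boundaries from the perplexity distribution and then combining it with dynamic merging. The following Python sketch shows one plausible reading of that idea, under stated assumptions: per-sentence perplexities are already computed by some language model, a boundary is placed after a sentence whose perplexity is a local minimum by more than `threshold`, and merging is capped by a sentence count. The function names, the threshold rule, and the cap are all hypothetical and are not taken from the paper.

```python
def ppl_boundaries(ppl, threshold=0.0):
    """Indices whose perplexity is lower than both neighbours by more than `threshold`."""
    return [
        i
        for i in range(1, len(ppl) - 1)
        if ppl[i - 1] - ppl[i] > threshold and ppl[i + 1] - ppl[i] > threshold
    ]


def split_at(sentences, boundaries):
    """Close a chunk after each boundary sentence (assumed placement)."""
    chunks, start = [], 0
    for b in boundaries:
        chunks.append(sentences[start : b + 1])
        start = b + 1
    if start < len(sentences):
        chunks.append(sentences[start:])
    return chunks


def merge_chunks(chunks, max_sentences):
    """Greedily merge adjacent chunks while the result stays within `max_sentences`."""
    merged = []
    for chunk in chunks:
        if merged and len(merged[-1]) + len(chunk) <= max_sentences:
            merged[-1] = merged[-1] + chunk
        else:
            merged.append(list(chunk))
    return merged


sentences = ["s0", "s1", "s2", "s3", "s4", "s5"]
ppl = [5.0, 2.0, 4.0, 6.0, 1.0, 3.0]  # made-up values for illustration

boundaries = ppl_boundaries(ppl, threshold=0.5)
fine = split_at(sentences, boundaries)
coarse = merge_chunks(fine, max_sentences=4)
```

Raising `threshold` yields fewer, longer chunks, while `max_sentences` controls how far the merging step coarsens them; this mirrors the balance between fine-grained and coarse-grained chunking that the abstract mentions.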

chunking, large language model, machine learning

2410.12788

Country: Asia > China (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)
