AITopics | semantic chunk

Collaborating Authors

semantic chunk

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

SCAN: Semantic Document Layout Analysis for Textual and Visual Retrieval-Augmented Generation

Dong, Yuyang, Ueda, Nobuhiro, Boros, Krisztián, Ito, Daiki, Sera, Takuya, Oyamada, Masafumi

arXiv.org Artificial IntelligenceDec-12-2025

With the increasing adoption of Large Language Models (LLMs) and Vision-Language Models (VLMs), rich document analysis technologies for applications like Retrieval-Augmented Generation (RAG) and visual RAG are gaining significant attention. Recent research indicates that using VLMs yields better RAG performance, but processing rich documents remains a challenge since a single page contains large amounts of information. In this paper, we present SCAN (SemantiC Document Layout ANalysis), a novel approach that enhances both textual and visual Retrieval-Augmented Generation (RAG) systems that work with visually rich documents. It is a VLM-friendly approach that identifies document components with appropriate semantic granularity, balancing context preservation with processing efficiency. SCAN uses a coarse-grained semantic approach that divides documents into coherent regions covering contiguous components. We trained the SCAN model by fine-tuning object detection models on an annotated dataset. Our experimental results across English and Japanese datasets demonstrate that applying SCAN improves end-to-end textual RAG performance by up to 9.4 points and visual RAG performance by up to 10.4 points, outperforming conventional approaches and even commercial document processing solutions.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2505.14381

Genre: Research Report > New Finding (0.88)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Semantic Parsing for Question Answering over Knowledge Graphs

Wei, Sijia, Zhang, Wenwen, Li, Qisong, Zhao, Jiang

arXiv.org Artificial IntelligenceJan-27-2024

In this paper, we introduce a novel method with graph-to-segment mapping for question answering over knowledge graphs, which helps understanding question utterances. This method centers on semantic parsing, a key approach for interpreting these utterances. The challenges lie in comprehending implicit entities, relationships, and complex constraints like time, ordinality, and aggregation within questions, contextualized by the knowledge graph. Our framework employs a combination of rule-based and neural-based techniques to parse and construct highly accurate and comprehensive semantic segment sequences. These sequences form semantic query graphs, effectively representing question utterances. We approach question semantic parsing as a sequence generation task, utilizing an encoder-decoder neural network to transform natural language questions into semantic segments. Moreover, to enhance the parsing of implicit entities and relations, we incorporate a graph neural network that leverages the context of the knowledge graph to better understand question representations. Our experimental evaluations on two datasets demonstrate the effectiveness and superior performance of our model in semantic parsing for question answering.

graph, information, sequence, (13 more...)

arXiv.org Artificial Intelligence

2401.06772

Country:

North America > United States > Alaska (0.05)
North America > United States > Texas (0.05)
North America > United States > Rhode Island (0.04)
(11 more...)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Abstractive Summarization of Large Document Collections Using GPT

Liu, Sengjie, Healey, Christopher G.

arXiv.org Artificial IntelligenceOct-9-2023

This paper proposes a method of abstractive summarization designed to scale to document collections instead of individual documents. Our approach applies a combination of semantic clustering, document size reduction within topic clusters, semantic chunking of a cluster's documents, GPT-based summarization and concatenation, and a combined sentiment and text visualization of each topic to support exploratory data analysis. Statistical comparison of our results to existing state-of-the-art systems BART, BRIO, PEGASUS, and MoCa using ROGUE summary scores showed statistically equivalent performance with BART and PEGASUS on the CNN/Daily Mail test dataset, and with BART on the Gigaword test dataset. This finding is promising since we view document collection summarization as more challenging than individual document summarization. We conclude with a discussion of how issues of scale are

abstractive summarization, semantic chunk, summarization, (13 more...)

arXiv.org Artificial Intelligence

2310.0569

Country:

North America > United States > California > Los Angeles County > Long Beach (0.14)
North America > United States > New York > New York County > New York City (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
(20 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.48)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Education (0.67)
Leisure & Entertainment (0.67)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Social Media (1.00)
(11 more...)

Add feedback