AITopics | structured document

structured document

Large Language Models are Pattern Matchers: Editing Semi-Structured and Structured Documents with ChatGPT

Weber, Irene

arXiv.org Artificial Intelligence, Sep-11-2024

Large Language Models (LLMs) are extensive artificial neural networks trained on vast amounts of textual data to generate coherent continuations of given prompts. The initial training, which is time-consuming and computationally intensive, is typically followed by additional training phases. Fine-tuning with specific tasks and example responses enables LLMs to solve particular types of problems, while Reinforcement Learning with Human Feedback focuses them on delivering high-quality and socially preferred responses. Research has shown that LLMs can not only produce correct natural and formal language texts conveying plausible contents, but are also capable of reasoning, planning, and simulating other forms of intelligent behaviors. Thus, LLMs offer a wide range of potential applications, the extent of which is still not fully explored. Frequently, LLMs are applied for creating and processing texts, for communicating, planning, and computer programming. LLMs require that all tasks and inputs are provided in a textual format. For many applications, LLMs are prompted with freely phrased, natural language text or program code. Yet, they are also capable of processing texts that are structured such that they represent data or formatted documents.

experiment, llm, seehuber

doi: 10.18420/AKWI2024-001

2409.07732

Country:

Europe > Germany > North Rhine-Westphalia > Düsseldorf Region > Düsseldorf (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)

PDFTriage: Question Answering over Long, Structured Documents

Saad-Falcon, Jon, Barrow, Joe, Siu, Alexa, Nenkova, Ani, Yoon, David Seunghyun, Rossi, Ryan A., Dernoncourt, Franck

arXiv.org Artificial Intelligence, Nov-8-2023

Large Language Models (LLMs) struggle with document question answering (QA) when the document does not fit in the small context length of an LLM. To overcome this issue, most existing works focus on retrieving the relevant context from the document and representing it as plain text. However, documents such as PDFs, web pages, and presentations are naturally structured with different pages, tables, sections, and so on. Representing such structured documents as plain text is incongruous with the user's mental model of these documents with rich structure. When a system has to query the document for context, this incongruity is brought to the fore, and seemingly trivial questions can trip up the QA system. To bridge this fundamental gap in handling structured documents, we propose an approach called PDFTriage that enables models to retrieve the context based on either structure or content. Our experiments demonstrate the effectiveness of the proposed PDFTriage-augmented models across several classes of questions where existing retrieval-augmented LLMs fail. To facilitate further research on this fundamental problem, we release our benchmark dataset consisting of 900+ human-generated questions over 80 structured documents from 10 different categories of question types for document QA. Our code and datasets will be released soon on GitHub.
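
The distinction between retrieving by structure and retrieving by content can be illustrated with a minimal sketch. This is not PDFTriage's code or API: the `Section` type, the three lookup functions, and the sample document are all hypothetical and exist only to show the idea that a question such as "what is on page 2?" is answered from document structure, while a keyword question is answered from the text.

```python
from dataclasses import dataclass


@dataclass
class Section:
    """Hypothetical structured unit of a document (not from the paper)."""
    title: str
    page: int
    text: str


def fetch_by_page(doc, page):
    # Structure-based lookup: select sections by page number.
    return [s for s in doc if s.page == page]


def fetch_by_title(doc, title):
    # Structure-based lookup: select sections by (case-insensitive) title.
    return [s for s in doc if s.title.lower() == title.lower()]


def fetch_by_content(doc, keyword):
    # Content-based lookup: select sections whose text mentions the keyword.
    return [s for s in doc if keyword.lower() in s.text.lower()]


# Invented sample document for illustration only.
doc = [
    Section("Introduction", 1, "This report covers quarterly revenue."),
    Section("Results", 2, "Revenue grew in the third quarter."),
    Section("Appendix", 3, "Table of raw figures."),
]

page_two = fetch_by_page(doc, 2)
about_revenue = fetch_by_content(doc, "revenue")
```

A plain-text representation would only support the last of these lookups; keeping page and title metadata is what makes the first two possible.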

pdftriage, structured document

2309.08872

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.60)

From Texts to Structured Documents: The Case of Health Practice Guidelines

Bouffier, Amanda

arXiv.org Artificial Intelligence, Sep-25-2007

This paper describes a system capable of semi-automatically filling an XML template from free texts in the clinical domain (practice guidelines). The XML template includes semantic information not explicitly encoded in the text (pairs of conditions and actions/recommendations). Therefore, there is a need to compute the exact scope of conditions over text sequences expressing the required actions. We present in this paper the rules developed for this task. We show that the system yields good performance when applied to the analysis of French practice guidelines.
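
As a rough illustration of the target representation, the sketch below builds an XML structure that pairs each condition with the actions in its scope. The element names (`guideline`, `recommendation`, `condition`, `action`), the `build_guideline` helper, and the sample pair are assumptions made for this example; the paper's actual template and its scope-computation rules are not reproduced here, and the pairs are supplied by hand rather than extracted from text.

```python
import xml.etree.ElementTree as ET


def build_guideline(pairs):
    """Fill a hypothetical XML template from (condition, actions) pairs."""
    root = ET.Element("guideline")
    for condition, actions in pairs:
        rec = ET.SubElement(root, "recommendation")
        ET.SubElement(rec, "condition").text = condition
        # Every action listed here is treated as inside the condition's scope.
        for action in actions:
            ET.SubElement(rec, "action").text = action
    return root


# Placeholder pair, invented for illustration (not taken from any guideline).
pairs = [("condition A holds", ["perform action 1", "perform action 2"])]

root = build_guideline(pairs)
xml_text = ET.tostring(root, encoding="unicode")
```

The hard part described in the abstract is deciding which actions belong to which condition; this sketch only shows where that decision ends up once it has been made.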

artificial intelligence, guideline, natural language

doi: 10.1007/978-3-540-76298-0_69

0709.4015

Country: Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)

Genre: Research Report (0.40)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.70)
