AITopics | table extraction

Collaborating Authors

table extraction

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

RAPTOR: Refined Approach for Product Table Object Recognition

Thomas, Eliott, Coustaty, Mickael, Joseph, Aurelie, Deloin, Gaspar, Carel, Elodie, D'Andecy, Vincent Poulain, Ogier, Jean-Marc

arXiv.org Artificial IntelligenceFeb-24-2025

Extracting tables from documents is a critical task across various industries, especially on business documents like invoices and reports. Existing systems based on DEtection TRansformer (DETR) such as TAble TRansformer (TATR), offer solutions for T able Detection (TD) and T able Structure Recognition (TSR) but face challenges with diverse table formats and common errors like incorrect area detection and overlapping columns. This research introduces RAPTOR, a modular post-processing system designed to enhance state-of-the-art models for improved table extraction, particularly for product tables. RAPTOR, addresses recurrent TD and TSR issues, improving both precision and structural predictions. F or TD, we use DETR (trained on ICDAR 2019) and TATR (trained on PubT ables-1M and FinT abNet), while TSR only relies on TATR. A Genetic Algorithm is incorporated to optimize RAPTOR's module parameters, using a private dataset of product tables to align with industrial needs. W e evaluate our method on two private datasets of product tables, the public DOCILE dataset (which contains tables similar to our target product tables), and the ICDAR 2013 and ICDAR 2019 datasets. The results demonstrate that while our approach excels at product tables, it also maintains reasonable performance across diverse table formats.

corpusid, dataset, semanticscholar, (14 more...)

arXiv.org Artificial Intelligence

2502.14918

Country: Europe > France (0.05)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

H-STAR: LLM-driven Hybrid SQL-Text Adaptive Reasoning on Tables

Abhyankar, Nikhil, Gupta, Vivek, Roth, Dan, Reddy, Chandan K.

arXiv.org Artificial IntelligenceJun-29-2024

Tabular reasoning involves interpreting unstructured queries against structured tables, requiring a synthesis of textual understanding and symbolic reasoning. Existing methods rely on either of the approaches and are constrained by their respective limitations. Textual reasoning excels in semantic interpretation unlike symbolic reasoning (SQL logic), but falls short in mathematical reasoning where SQL excels. In this paper, we introduce a novel algorithm H-STAR, comprising table extraction and adaptive reasoning, integrating both symbolic and semantic (text-based) approaches. To enhance evidence extraction, H-STAR employs a multi-view approach, incorporating step-by-step row and column retrieval. It also adapts reasoning strategies based on question types, utilizing symbolic reasoning for quantitative and logical tasks, and semantic reasoning for direct lookup and complex lexical queries. Our extensive experiments demonstrate that H-STAR significantly outperforms state-of-the-art methods across three tabular question-answering (QA) and fact-verification datasets, underscoring its effectiveness and efficiency.

dataset, h-star, reasoning, (14 more...)

arXiv.org Artificial Intelligence

2407.05952

Country:

North America > United States > New York (0.05)
North America > United States > Virginia (0.04)
North America > United States > Pennsylvania (0.04)
(2 more...)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)

Add feedback

Table Detection, Information Extraction and Structuring using Deep Learning

#artificialintelligenceApr-7-2021, 10:37:58 GMT

The amount of data being collected is drastically increasing day-by-day with lots of applications, tools, and online platforms booming in the present technological era. To handle and access this humongous data productively, it's necessary to develop valuable information extraction tools. One of the sub-areas that's demanding attention in the Information Extraction field is the fetching and accessing of data from tabular forms. To explain this in a subtle way, imagine you have lots of paperwork and documents where you would be using tables, and using the same, you would like to manipulate data. Conventionally, you can copy them manually (onto a paper) or load them into excel sheets. However, with table extraction, no sooner have you sent tables as pictures to the computer than it extracts all the information and stacks them into a neat document. This saves an ample of time and is less erroneous. As discussed in the previous section, tables are used frequently to represent data in a clean format. We can see them so often across several areas, from organizing our work by structuring data across tables to storing huge assets of companies.

extraction, information, table extraction, (14 more...)

#artificialintelligence

Genre: Workflow (0.70)

Industry: Health & Medicine > Therapeutic Area (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.91)
Information Technology > Data Science > Data Mining > Text Mining (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)

Add feedback

Bringing IBM NLP capabilities to the CORD-19 Dataset

#artificialintelligenceJul-14-2020, 20:00:55 GMT

To assist in the fight against the COVID-19 pandemic, prominent research institutes led by Allen Institute for AI (AI2) released earlier this year the COVID-19 Open Research Dataset (CORD-19). Comprised of scientific articles related to COVID-19, Sars-Cov-2, and related coronaviruses, the dataset (which at the time of writing this contains more than 75,000 full text scientific papers) is intended to mobilize researchers to apply recent advances in natural language processing to generate new insights in support of the fight against this infectious disease (1,2). While a tremendous resource, the dataset initially did not include information found in tables due to the difficulty of extracting tabular data. However, following the launch of the Kaggle challenge associated with CORD-19, table information rose to become the most requested feature by challenge participants. Recognizing that critical scientific facts and data are often organized in a tabular format, IBM Research AI offered to apply our extensive experience in document and table conversion to update the CORD-19 dataset and, in turn, open up additional critical information to the global science and medical community in efforts to fight COVID-19.

artificial intelligence, information, natural language, (14 more...)

#artificialintelligence

Country:

North America > United States > New York (0.05)
Europe > United Kingdom > England > Greater London > London (0.05)
Europe > Switzerland > Geneva > Geneva (0.05)
Asia > India (0.05)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

Add feedback