AITopics | extraction task

extraction task

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

ClinBench: A Standardized Multi-Domain Framework for Evaluating Large Language Models in Clinical Information Extraction

Neural Information Processing Systems, Jun-12-2026, 11:35:12 GMT

Large Language Models (LLMs) offer substantial promise for clinical natural language processing (NLP); however, a lack of standardized benchmarking methodologies limits their objective evaluation and practical translation. To address this gap, we introduce ClinBench, an open-source, multi-model, multi-domain benchmarking framework. ClinBench is designed for the rigorous evaluation of LLMs on important structured information extraction tasks (e.g., tumor staging, histologic diagnoses, atrial fibrillation, and social determinants of health) from unstructured clinical notes. The framework standardizes the evaluation pipeline by: (i) operating on consistently structured input datasets; (ii) employing dynamic, YAML-based prompting for uniform task definition; and (iii) enforcing output validation via JSON schemas, supporting robust comparison across diverse LLM architectures. We demonstrate ClinBench through a large-scale study of 11 prominent LLMs (e.g., GPT-4o series, LLaMA3 variants, Mixtral) across three clinical domains using configurations of public datasets (TCGA for lung cancer, MIMIC-IV-ECG for atrial fibrillation, and MIMIC notes for SDOH). Our results reveal significant performance-efficiency trade-offs. For example, when averaged across the four benchmarked clinical extraction tasks, GPT-3.5-turbo …
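ClinBench's third step, enforcing output validation via JSON schemas, is what lets outputs from different models be compared on equal terms. The sketch below is a minimal, dependency-free illustration of that idea, not ClinBench's code: the tumor-staging field names and allowed values are hypothetical, and it hand-rolls only a small subset of JSON Schema (required fields, types, enumerations, no extra keys). A real pipeline would more likely use a full JSON Schema validator.

```python
import json

# Hypothetical output schema for a tumor-staging task; field names and allowed
# values are illustrative assumptions, not ClinBench's actual schema.
SCHEMA = {
    "t_stage": {"type": str, "enum": {"T1", "T2", "T3", "T4", "TX"}},
    "n_stage": {"type": str, "enum": {"N0", "N1", "N2", "N3", "NX"}},
    "confidence": {"type": float, "enum": None},
}


def validate_output(raw: str, schema: dict = SCHEMA) -> list[str]:
    """Return validation errors for a model's raw JSON output (empty list = valid)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be an object"]

    errors = []
    for field, rule in schema.items():
        if field not in data:
            errors.append(f"missing field: {field}")
            continue
        value = data[field]
        if not isinstance(value, rule["type"]):
            errors.append(f"{field}: expected {rule['type'].__name__}")
        elif rule["enum"] is not None and value not in rule["enum"]:
            errors.append(f"{field}: {value!r} not in allowed values")
    # Reject keys the schema does not define (like additionalProperties: false).
    for extra in sorted(set(data) - set(schema)):
        errors.append(f"unexpected field: {extra}")
    return errors
```

For example, `validate_output('{"t_stage": "T5", "n_stage": "N0"}')` reports both the out-of-range stage and the missing `confidence` field, so a harness can count such outputs as failures instead of silently accepting them.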

large language model, machine learning, natural language, (12 more...)

Neural Information Processing Systems

Industry: Health & Medicine > Therapeutic Area (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.85)

d4e1c24ac41ff0b82ca1b171731f0b23-Paper-Conference.pdf

Neural Information Processing Systems, Apr-29-2026, 21:49:51 GMT

computational linguistic, large language model, machine learning, (19 more...)

Neural Information Processing Systems

Country: North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (0.46)

Industry:

Leisure & Entertainment > Sports > Football (1.00)
Education (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.68)

From Cloze to Comprehension: Retrofitting Pre-trained Masked Language Models to Pre-trained Machine Reader

Neural Information Processing Systems, Feb-17-2026, 07:57:02 GMT

Discriminative methods were used to execute such tasks and achieved state-of-the-art performance.

computational linguistic, large language model, machine learning, (19 more...)

Neural Information Processing Systems

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Middle East > Syria (0.05)
Asia > Japan (0.05)
(8 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Leisure & Entertainment > Sports > Football (1.00)
Education (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.68)

Supervised Fine Tuning of Large Language Models for Domain Specific Knowledge Graph Construction: A Case Study on Hunan's Historical Celebrities

Hao, Junjie, Wang, Chun, Qiao, Ying, Zuo, Qiuyue, Song, Qiya, Ma, Hua, Gao, Xieping

arXiv.org Artificial Intelligence, Nov-24-2025

Large language models and knowledge graphs hold broad application potential in the field of historical culture, facilitating the excavation, research, and comprehension of cultural heritage. Taking the historical celebrities of Hunan who emerged from modern Huxiang culture as a case, pre-trained large models can help researchers rapidly extract specific information about historical figures from the literature (including basic details, life events, and social relationships) and construct structured knowledge graphs, thereby supporting related research. Currently, systematic data collection on Hunan's historical celebrities remains scarce. Moreover, general-purpose large language models often show insufficient accuracy in domain knowledge extraction and weak structured-output capabilities in such low-resource scenarios. Therefore, this paper proposes a supervised fine-tuning approach for domain-specific large models to improve the quality and efficiency of information extraction about Hunan's historical celebrities. Specifically, the paper first designs a fine-grained, schema-guided instruction fine-tuning template for this domain. Using this template, we construct an instruction fine-tuning dataset, addressing the current lack of instruction datasets for domain-specific model fine-tuning. Second, we conduct parameter-efficient instruction fine-tuning on four publicly available large language models (Qwen2.5-7B, Qwen3-8B, DeepSeek-R1-Distill-Qwen-7B, and Llama-3.1-8B-Instruct) using the proposed instruction dataset, and establish evaluation criteria for assessing their performance in historical-figure information extraction. Experimental results show that the performance of all four base models improved significantly after domain-specific fine-tuning. Among them, Qwen3-8B performed best after training with 100 samples and 50 fine-tuning iterations, scoring 89.3866 on the evaluation metrics. This research offers new insights for fine-tuning vertical large models tailored to regional historical and cultural domains, with significant implications for promoting the cost-effective application of large models and knowledge graphs to historical and cultural heritage.

Introduction

With the rapid advancement of large language models (LLMs), unprecedented opportunities have emerged for the in-depth exploration, systematic research, and widespread dissemination of Huxiang culture. At the same time, this presents new challenges for the digital transformation of traditional cultural resources [1].
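The abstract does not reproduce the paper's schema-guided instruction template, so the following is only a hypothetical illustration of the general pattern: the schema's field descriptions are written into the instruction, and the annotated answer is serialized as JSON with every schema field present. The field names in `PERSON_SCHEMA` and the `instruction`/`input`/`output` record layout (a common instruction-tuning format) are assumptions.

```python
import json

# Hypothetical fine-grained schema for one historical figure; field names and
# descriptions are illustrative, not the paper's template.
PERSON_SCHEMA = {
    "name": "full name of the person",
    "birth_year": "year of birth as an integer, or null if not stated",
    "native_place": "place of origin",
    "life_events": "list of {year, event} objects",
    "relations": "list of {person, relation} objects",
}


def build_instruction_record(passage: str, annotation: dict, schema: dict = PERSON_SCHEMA) -> dict:
    """Turn one annotated passage into an instruction/input/output training example."""
    field_lines = "\n".join(f"- {key}: {desc}" for key, desc in schema.items())
    instruction = (
        "Extract information about the historical figure described in the text. "
        "Return a JSON object with exactly these fields:\n" + field_lines
    )
    # Emit every schema field in schema order; unannotated fields become null.
    target = {key: annotation.get(key) for key in schema}
    return {
        "instruction": instruction,
        "input": passage,
        # ensure_ascii=False keeps non-ASCII (e.g. Chinese) text readable in the target.
        "output": json.dumps(target, ensure_ascii=False),
    }
```

Keeping every field in the target, even when null, gives the fine-tuned model a fixed output shape to learn, which is one way to address the weak structured-output behavior the abstract describes.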

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2511.17012

Country: Asia > China (0.28)

Genre: Research Report > New Finding (0.87)

Industry:

Health & Medicine (0.46)
Government (0.46)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Evaluating Open-Weight Large Language Models for Structured Data Extraction from Narrative Medical Reports Across Multiple Use Cases and Languages

Spaanderman, Douwe J., Prathaban, Karthik, Zelina, Petr, Mouheb, Kaouther, Hejtmánek, Lukáš, Marzetti, Matthew, Schurink, Antonius W., Chan, Damian, Niemantsverdriet, Ruben, Hartmann, Frederik, Qian, Zhen, Thomeer, Maarten G. J., Holub, Petr, Akram, Farhan, Wolters, Frank J., Vernooij, Meike W., Verhoef, Cornelis, Bron, Esther E., Nováček, Vít, Grünhagen, Dirk J., Niessen, Wiro J., Starmans, Martijn P. A., Klein, Stefan

arXiv.org Artificial Intelligence, Nov-17-2025

Large language models (LLMs) are increasingly used to extract structured information from free-text clinical records, but prior work often focuses on single tasks, limited models, and English-language reports. We evaluated 15 open-weight LLMs on pathology and radiology reports across six use cases (colorectal liver metastases, liver tumours, neurodegenerative diseases, soft-tissue tumours, melanomas, and sarcomas) at three institutes in the Netherlands, the UK, and the Czech Republic. Models included general-purpose and medical-specialised LLMs of various sizes, and six prompting strategies were compared: zero-shot, one-shot, few-shot, chain-of-thought, self-consistency, and prompt graph. Performance was assessed using task-appropriate metrics, with consensus rank aggregation and linear mixed-effects models quantifying variance. Top-ranked models achieved macro-average scores close to inter-rater agreement across tasks. Small-to-medium general-purpose models performed comparably to large models, while tiny and specialised models performed worse. Prompt graph and few-shot prompting improved performance by about 13%. Task-specific factors, including variable complexity and annotation variability, influenced results more than model size or prompting strategy. These findings show that open-weight LLMs can extract structured data from clinical reports across diseases, languages, and institutions, offering a scalable approach for clinical data curation.
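The abstract mentions consensus rank aggregation without giving the exact procedure. One simple form, assumed here for illustration, ranks models within each task and then orders them by mean rank across tasks, with tied scores sharing their average rank. Model and task names are hypothetical, and the linear mixed-effects analysis is not sketched.

```python
from statistics import mean


def rank_scores(scores: dict[str, float]) -> dict[str, float]:
    """Rank models on one task (1 = best); tied scores share their average rank."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    ranks: dict[str, float] = {}
    i = 0
    while i < len(ordered):
        j = i
        # Advance j across every model whose score ties with ordered[i].
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for model in ordered[i : j + 1]:
            ranks[model] = shared_rank
        i = j + 1
    return ranks


def consensus_ranking(per_task: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Order models by mean per-task rank (lower is better).

    Assumes every task scores the same set of models.
    """
    task_ranks = [rank_scores(scores) for scores in per_task.values()]
    models = task_ranks[0]
    mean_ranks = {m: mean(ranks[m] for ranks in task_ranks) for m in models}
    return sorted(mean_ranks.items(), key=lambda item: item[1])
```

With per-task scores `{"task_a": {"model_x": 0.91, "model_y": 0.88, "model_z": 0.70}, "task_b": {"model_x": 0.75, "model_y": 0.80, "model_z": 0.80}}`, `model_y` comes first with a mean rank of 1.75 even though it never ranks first on its own, which is the kind of cross-task consistency a consensus ranking rewards.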

large language model, machine learning, use case, (21 more...)

arXiv.org Artificial Intelligence

2511.10658

Country: Europe > United Kingdom > England (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Nuclear Medicine (1.00)
Health & Medicine > Health Care Technology > Medical Record (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Towards Competent AI for Fundamental Analysis in Finance: A Benchmark Dataset and Evaluation

Wu, Zonghan, Zou, Congyuan, Wang, Junlin, Wang, Chenhan, Yang, Hangjing, Shao, Yilei

arXiv.org Artificial Intelligence, Nov-11-2025

Generative AI, particularly large language models (LLMs), is beginning to transform the financial industry by automating tasks and helping to make sense of complex financial information. One especially promising use case is the automatic creation of fundamental analysis reports, which are essential for making informed investment decisions, evaluating credit risks, guiding corporate mergers, etc. When LLMs generate these reports from a single prompt, the risk of inaccuracy is significant. Poor analysis can lead to misguided investments, regulatory issues, and loss of trust. Existing financial benchmarks mainly evaluate how well LLMs answer financial questions but do not reflect performance in real-world tasks like generating financial analysis reports. In this paper, we propose FinAR-Bench, a solid benchmark dataset focusing on financial statement analysis, a core competence of fundamental analysis. To make the evaluation more precise and reliable, we break this task into three measurable steps: extracting key information, calculating financial indicators, and applying logical reasoning. This structured approach allows us to objectively assess how well LLMs perform each step of the process. Our findings offer a clear understanding of LLMs' current strengths and limitations in fundamental analysis and provide a more practical way to benchmark their performance in real-world financial settings.
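FinAR-Bench separates extraction, indicator calculation, and reasoning so that each step can be scored on its own. The calculation step lends itself to a mechanical check, sketched here under assumptions: the line-item names, the three ratios, and the 1% relative tolerance are illustrative choices, not the benchmark's definitions.

```python
import math


def compute_indicators(items: dict[str, float]) -> dict[str, float]:
    """Compute three common ratios from extracted statement items (hypothetical field names)."""
    return {
        "current_ratio": items["current_assets"] / items["current_liabilities"],
        "debt_to_equity": items["total_liabilities"] / items["shareholders_equity"],
        "net_margin": items["net_income"] / items["revenue"],
    }


def score_indicators(
    predicted: dict[str, float], reference: dict[str, float], rel_tol: float = 0.01
) -> float:
    """Fraction of reference indicators the model reproduced within a relative tolerance."""
    hits = sum(
        1
        for name, ref in reference.items()
        if name in predicted and math.isclose(predicted[name], ref, rel_tol=rel_tol)
    )
    return hits / len(reference)
```

Computing the reference from the gold line items, rather than from the model's own extraction, keeps a calculation score from being contaminated by earlier extraction errors; the extraction step would be scored separately.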

computational linguistic, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2506.07315

Country:

North America (0.67)
Asia > Middle East (0.28)

Genre: Research Report > New Finding (0.66)

Industry: Banking & Finance > Trading (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Metadata Extraction Leveraging Large Language Models

Han, Cuize, Jalagam, Sesh

arXiv.org Machine Learning, Oct-23-2025

The advent of large language models has revolutionized tasks across domains, including the automation of legal document analysis, a critical component of modern contract management systems. This paper presents a comprehensive implementation of LLM-enhanced metadata extraction for contract review, focusing on the automatic detection and annotation of salient legal clauses. Leveraging both the publicly available Contract Understanding Atticus Dataset (CUAD) and proprietary contract datasets, our work demonstrates the integration of advanced LLM methodologies with practical applications. We identify three pivotal elements for optimizing metadata extraction: robust text conversion, strategic chunk selection, and advanced LLM-specific techniques, including Chain of Thought (CoT) prompting and structured tool calling. Our experimental results show substantial improvements in clause identification accuracy and efficiency. Our approach shows promise in reducing the time and cost associated with contract review while maintaining high accuracy in legal clause identification. The results suggest that carefully optimized LLM systems could serve as valuable tools for legal professionals, potentially increasing access to efficient contract review services for organizations of all sizes.
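The abstract names strategic chunk selection as one of three levers but does not say how chunks are chosen. As a stand-in, the sketch below splits a contract into overlapping word windows and keeps the windows that mention the most keywords for a target clause, so that only those are sent to the model. Window sizes and keyword sets are hypothetical; an embedding-based retriever would be a common alternative.

```python
import re


def chunk_words(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into windows of `size` words, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    words = text.split()
    step = size - overlap
    # Stop once the remaining words are already covered by the previous window's tail.
    return [
        " ".join(words[start : start + size])
        for start in range(0, max(len(words) - overlap, 1), step)
    ]


def select_chunks(chunks: list[str], keywords: set[str], k: int = 2) -> list[str]:
    """Keep the k chunks mentioning the most distinct (lowercase) keywords; ties keep document order."""
    def score(chunk: str) -> int:
        return len(set(re.findall(r"[a-z]+", chunk.lower())) & keywords)

    return sorted(chunks, key=score, reverse=True)[:k]
```

The overlap keeps a clause that straddles a window boundary intact in at least one chunk, at the cost of sending some text to the model twice.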

large language model, machine learning, natural language, (19 more...)

arXiv.org Machine Learning

2510.19334

Country:

North America > United States > Illinois > Cook County > Chicago (0.05)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > San Mateo County > Redwood City (0.04)
Asia > China > Liaoning Province > Dalian (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Law (1.00)
Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

PARSE: LLM Driven Schema Optimization for Reliable Entity Extraction

Shrimal, Anubhav, Jain, Aryan, Chowdhury, Soumyajit, Yenigalla, Promod

arXiv.org Artificial Intelligence, Oct-13-2025

Structured information extraction from unstructured text is critical for emerging Software 3.0 systems in which LLM agents autonomously interact with APIs and tools. Recent approaches apply large language models directly to extraction tasks using existing JSON schemas, often with constrained decoding or reinforcement learning to ensure syntactic validity. However, they treat JSON schemas as static contracts designed for human developers, which leads to suboptimal extraction performance, frequent hallucinations, and unreliable agent behavior when schemas contain ambiguous or incomplete specifications. We recognize that JSON schemas are themselves a form of natural language understanding contract: they encode rules, relationships, and expectations about data structure that LLMs should be able both to interpret and to systematically improve. Consequently, we develop PARSE (Parameter Automated Refinement and Schema Extraction), a novel system with two synergistic components: ARCHITECT, which autonomously optimizes JSON schemas for LLM consumption while maintaining backward compatibility through RELAY (an integrated code generation system), and SCOPE, which implements reflection-based extraction with combined static and LLM-based guardrails. We evaluate PARSE qualitatively and quantitatively on three datasets: Schema-Guided Dialogue (SGD), Structured Web Data Extraction (SWDE), and internal retail conversation data. We find that it achieves up to a 64.7% improvement in extraction accuracy on SWDE, with combined framework improvements reaching 10% across models, while reducing extraction errors by 92% within the first retry and maintaining practical latency.
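SCOPE's reflection-based extraction with static guardrails, and the reported error reduction within the first retry, suggest a loop in which deterministic checks feed their error messages back to the model. The sketch below shows only that static-guardrail-and-retry pattern; `call_llm` is a hypothetical prompt-in, text-out function, and PARSE's LLM-based guardrails and schema optimization (ARCHITECT) are not modeled.

```python
import json
from typing import Callable, Optional

# Hypothetical model interface: takes a prompt string, returns the raw completion.
LLMFn = Callable[[str], str]


def static_check(raw: str, required: list[str]) -> list[str]:
    """Deterministic guardrail: the output must be a JSON object with every required key."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ["output is not valid JSON"]
    if not isinstance(data, dict):
        return ["output must be a JSON object"]
    return [f"missing key: {key}" for key in required if key not in data]


def extract_with_retry(
    call_llm: LLMFn, text: str, required: list[str], max_retries: int = 2
) -> Optional[dict]:
    """Request an extraction, feed guardrail errors back, and retry up to max_retries times."""
    prompt = f"Extract {', '.join(required)} from the text as a JSON object.\n\nText: {text}"
    for _ in range(max_retries + 1):
        raw = call_llm(prompt)
        errors = static_check(raw, required)
        if not errors:
            return json.loads(raw)
        # Reflection step: show the model its previous answer and the specific problems.
        prompt += f"\n\nPrevious answer:\n{raw}\nFix these problems: {'; '.join(errors)}"
    return None  # still invalid after all retries
```

For testing, `call_llm` can be a stub that returns a malformed answer first and a corrected one second, which exercises the retry path without a real model.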

artificial intelligence, large language model, natural language, (17 more...)

arXiv.org Artificial Intelligence

2510.08623

Country:

North America > United States (0.68)
Asia > Middle East > UAE (0.28)

Genre: Overview (0.93)

Industry:

Transportation > Passenger (0.47)
Transportation > Ground > Road (0.47)
Automobiles & Trucks (0.47)
Consumer Products & Services (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Benchmarking Agentic Systems in Automated Scientific Information Extraction with ChemX

Vepreva, Anastasia, Razlivina, Julia, Eremeeva, Maria, Gubina, Nina, Orlova, Anastasia, Dmitrenko, Aleksei, Kapranova, Ksenya, Jyakhwo, Susan, Vasilev, Nikita, Sarkisyan, Arsen, Chernyshov, Ivan Yu., Vinogradov, Vladimir, Dmitrenko, Andrei

arXiv.org Artificial Intelligence, Oct-2-2025

The emergence of agent-based systems represents a significant advancement in artificial intelligence, with growing applications in automated data extraction. However, chemical information extraction remains a formidable challenge due to the inherent heterogeneity of chemical data. Current agent-based approaches, both general-purpose and domain-specific, exhibit limited performance in this domain. To address this gap, we present ChemX, a comprehensive collection of 10 manually curated and domain-expert-validated datasets focusing on nanomaterials and small molecules. These datasets are designed to rigorously evaluate and enhance automated extraction methodologies in chemistry. To demonstrate their utility, we conduct an extensive benchmarking study comparing existing state-of-the-art agentic systems such as ChatGPT Agent and chemical-specific data extraction agents. Additionally, we introduce our own single-agent approach that enables precise control over document preprocessing prior to extraction. We further evaluate the performance of modern baselines, such as GPT-5 and GPT-5 Thinking, to compare their capabilities with agentic approaches. Our empirical findings reveal persistent challenges in chemical information extraction, particularly in processing domain-specific terminology, complex tabular and schematic representations, and context-dependent ambiguities. The ChemX benchmark serves as a critical resource for advancing automated information extraction in chemistry, challenging the generalization capabilities of existing methods, and providing valuable insights into effective evaluation strategies.
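The abstract does not spell out ChemX's scoring rules. A common way to score record-level extraction, assumed here, is precision, recall, and F1 over normalized tuples. The field names below are hypothetical; compound strings are kept case-sensitive because chemical formulas are (for example, `Co` versus `CO`), while property names and units are lowercased.

```python
FIELDS = ("compound", "property", "value", "unit")
# Formulas stay case-sensitive ("Co" is cobalt, "CO" is carbon monoxide);
# property names and units are compared case-insensitively.
CASE_INSENSITIVE = {"property", "unit"}


def normalize(record: dict) -> tuple[str, ...]:
    """Canonicalize one extracted record into a comparable tuple."""
    parts = []
    for key in FIELDS:
        value = " ".join(str(record.get(key, "")).split())  # collapse whitespace
        parts.append(value.lower() if key in CASE_INSENSITIVE else value)
    return tuple(parts)


def precision_recall_f1(predicted: list[dict], gold: list[dict]) -> tuple[float, float, float]:
    """Set-based scores: duplicate records count once on each side."""
    pred_set = {normalize(r) for r in predicted}
    gold_set = {normalize(r) for r in gold}
    tp = len(pred_set & gold_set)
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Exact tuple matching is strict: `"10"` and `"10.0"` count as different values, so a real benchmark would likely add numeric and unit normalization on top.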

data mining, large language model, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2510.00795

Country:

Europe > Switzerland (0.28)
Europe > Russia (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Materials > Chemicals (0.93)
Health & Medicine > Therapeutic Area (0.68)

Technology:

Information Technology > Data Science > Data Mining > Text Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Extract-0: A Specialized Language Model for Document Information Extraction

Godoy, Henrique

arXiv.org Artificial Intelligence, Sep-30-2025

This paper presents Extract-0, a 7-billion parameter language model specifically optimized for document information extraction that achieves performance exceeding models with parameter counts several orders of magnitude larger. Through a novel combination of synthetic data generation, supervised fine-tuning with Low-Rank Adaptation (LoRA), and reinforcement learning via Group Relative Policy Optimization (GRPO), Extract-0 achieves a mean reward of 0.573 on a benchmark of 1,000 diverse document extraction tasks, outperforming GPT-4.1 (0.457), o3 (0.464), and GPT-4.1-2025 (0.459). The training methodology employs a memory-preserving synthetic data generation pipeline that produces 280,128 training examples from diverse document sources, followed by parameter-efficient fine-tuning that modifies only 0.53% of model weights (40.4M out of 7.66B parameters). The reinforcement learning phase introduces a novel semantic similarity-based reward function that handles the inherent ambiguity in information extraction tasks. This research demonstrates that task-specific optimization can yield models that surpass general-purpose systems while requiring substantially fewer computational resources.
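Extract-0's reinforcement learning phase uses a semantic similarity-based reward, but the abstract gives no formula. The sketch below shows one plausible shape under stated assumptions: a per-field score averaged over the reference fields, with exact matching for numbers. It swaps in `difflib.SequenceMatcher`, a purely lexical measure, where the paper's reward is described as semantic; an embedding model would be needed for genuinely semantic comparison. Field names in the example are hypothetical.

```python
from difflib import SequenceMatcher


def field_similarity(pred, ref) -> float:
    """Similarity in [0, 1]: exact match for numbers, lexical string ratio otherwise."""
    if isinstance(ref, (int, float)) and isinstance(pred, (int, float)):
        return 1.0 if pred == ref else 0.0
    # Lexical stand-in for a semantic (embedding-based) similarity.
    return SequenceMatcher(None, str(pred).lower(), str(ref).lower()).ratio()


def extraction_reward(pred: dict, ref: dict) -> float:
    """Mean per-field similarity over the reference fields; missing fields score 0."""
    if not ref:
        return 0.0
    total = sum(field_similarity(pred[k], ref[k]) if k in pred else 0.0 for k in ref)
    return total / len(ref)
```

With reference `{"invoice_id": "INV-001", "total": 120.5, "vendor": "Acme Corp"}` and a prediction that differs only in `"vendor": "ACME Corporation"`, the reward is about 0.91: partial credit for the near-miss field, rather than the 0.67 an exact-match reward would give. Graded rewards like this give a policy-gradient method such as GRPO a denser signal than all-or-nothing scoring.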

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2509.22906

Country: North America > United States (0.14)

Genre: Research Report (1.00)

Industry:

Law (0.47)
Government (0.46)
Banking & Finance (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
