AITopics | extraction rule


extraction rule


Finding Answers in Thought Matters: Revisiting Evaluation on Large Language Models with Reasoning

Jo, Hwiyeol, Lee, Joosung, Lee, Jaehone, Lee, Sang-Woo, Park, Joonsuk, Yoo, Kang Min

arXiv.org Artificial Intelligence, Oct-17-2025

Evaluating generative models, such as large language models (LLMs), commonly involves question-answering tasks where the final answer is selected based on the probability of answer choices. On the other hand, for models requiring reasoning, the method of answer extraction plays a critical role. Our research reveals that the performance of reasoning models and their final answer distributions are highly sensitive to the answer extraction algorithm employed. In order to mitigate this, we propose a basic framework: Answer Regeneration. The method uses an additional model inference, providing the prior input and output prefaced by the prompt "Answer:". The final answer is then selected or extracted from the regenerated output. We show that this extraction-rule-agnostic approach exhibits improved performance and enhanced robustness. Furthermore, we have applied this framework to general math problems and open-ended question answering tasks. Our analysis and this framework could offer more reliable results for model evaluation.
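The regeneration step described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: `generate` is a hypothetical callable standing in for any model call, `fake_model` is a canned stub, and the exact prompt layout and the `select_choice` helper are guesses rather than the authors' implementation.

```python
import re
from typing import Callable, Optional, Sequence


def regenerate_answer(generate: Callable[[str], str], question: str, first_output: str) -> str:
    """Run one extra inference on the prior input and output, followed by 'Answer:'."""
    # Assumed layout: question, then the model's first output, then the cue.
    prompt = f"{question}\n{first_output}\nAnswer:"
    return generate(prompt).strip()


def select_choice(regenerated: str, choices: Sequence[str]) -> Optional[str]:
    """Return the first token of the regenerated text that is a valid choice label."""
    # Simplification: a stray article such as "a" would also match label "A".
    for token in re.findall(r"[A-Za-z0-9]+", regenerated):
        if token.upper() in choices:
            return token.upper()
    return None


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM; returns canned text."""
    if prompt.endswith("Answer:"):
        return " B"
    return "Let me think. 3 is larger than 2, so it is B."


question = "Which is larger?\n(A) 2 (B) 3"
first_output = fake_model(question)
regenerated = regenerate_answer(fake_model, question, first_output)
choice = select_choice(regenerated, ["A", "B"])
print(choice)  # B
```

Because the second call sees the full reasoning and is cued with "Answer:", the final selection works on a short regenerated string instead of depending on a hand-tuned rule applied to the long reasoning trace.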

artificial intelligence, large language model, natural language, (16 more...)

arXiv.org Artificial Intelligence

2510.14773

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Industry: Education (0.69)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)


Benchmarking Large Language Models with Augmented Instructions for Fine-grained Information Extraction

Gao, Jun, Zhao, Huan, Zhang, Yice, Wang, Wei, Yu, Changlong, Xu, Ruifeng

arXiv.org Artificial Intelligence, Oct-8-2023

Information Extraction (IE) is an essential task in Natural Language Processing. Traditional methods have relied on coarse-grained extraction with simple instructions. However, with the emergence of Large Language Models (LLMs), there is a need to adapt IE techniques to leverage the capabilities of these models. This paper introduces a fine-grained IE benchmark dataset tailored for LLMs, employing augmented instructions for each information type, which include task descriptions, extraction rules, output formats, and examples. Through extensive evaluations, we observe that encoder-decoder models, particularly T5 and FLAN-T5, perform well in generalizing to unseen information types, while ChatGPT exhibits greater adaptability to new task forms. Our results also indicate that performance is not solely dictated by model scale, and highlight the significance of architecture, data diversity, and learning techniques. This work paves the way for a more refined and versatile utilization of LLMs in Information Extraction.
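One way to picture an augmented instruction with the four components named in this abstract is the sketch below. The class name, field names, and rendered layout are all hypothetical; the benchmark's actual prompt format is not given here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AugmentedInstruction:
    """Hypothetical container for the four components of one information type."""
    task_description: str
    extraction_rules: List[str]
    output_format: str
    examples: List[str] = field(default_factory=list)

    def render(self, text: str) -> str:
        """Assemble the components and the input text into a single prompt string."""
        parts = [f"Task: {self.task_description}", "Extraction rules:"]
        parts += [f"- {rule}" for rule in self.extraction_rules]
        parts.append(f"Output format: {self.output_format}")
        if self.examples:
            parts.append("Examples:")
            parts += [f"- {example}" for example in self.examples]
        parts.append(f"Input: {text}")
        return "\n".join(parts)


instruction = AugmentedInstruction(
    task_description="Extract the launch date of the product.",
    extraction_rules=["Return only dates that refer to a launch."],
    output_format="YYYY-MM-DD",
    examples=["'Launched on 3 May 2021' -> 2021-05-03"],
)
prompt = instruction.render("The phone launched on 1 June 2022.")
print(prompt)
```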

large language model, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2310.05092

Country:

Asia > China > Hong Kong (0.04)
Asia > China > Heilongjiang Province > Harbin (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.52)


Human-in-the-loop Text Extraction System

#artificialintelligence, Aug-25-2022, 15:45:46 GMT

In this article, we will talk in depth about an interactive, human-in-the-loop tool called SEER. SEER helps users who work with text datasets extract relevant data from them. A SEER user highlights examples of text. Positive examples are texts they wish to extract. Negative examples are texts they do not wish to extract.
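To make the role of positive and negative examples concrete, here is a minimal sketch that keeps only the candidate rules consistent with both sets. It assumes rules are plain regular expressions and that candidates are supplied by hand; how SEER actually represents or learns its rules is not described here, so treat `consistent_rules` as a hypothetical helper.

```python
import re
from typing import Iterable, List


def consistent_rules(candidates: Iterable[str], positives: List[str], negatives: List[str]) -> List[str]:
    """Keep patterns that fully match every positive example and no negative example."""
    kept = []
    for pattern in candidates:
        rx = re.compile(pattern)
        matches_all_positives = all(rx.fullmatch(p) for p in positives)
        matches_any_negative = any(rx.fullmatch(n) for n in negatives)
        if matches_all_positives and not matches_any_negative:
            kept.append(pattern)
    return kept


# Assumed scenario: the user wants percentages but not bare years.
candidates = [r"\d+%", r"\d+", r"\d+(\.\d+)?%"]
rules = consistent_rules(candidates, positives=["45%", "7%"], negatives=["2019"])
print(rules)
```

Each additional highlighted example narrows the set of surviving rules, which is the general idea behind example-driven rule selection.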

extraction rule, positive example, seer, (13 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.41)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.41)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.37)


Lessons Learned from Integrating the Human for Data Analytics

#artificialintelligence, Dec-11-2021, 04:35:52 GMT

For most technical folks, coding things up is easy. If there is a tool you are not happy with, you can hack one up yourself without much of a hassle. If you want to extract data, you can quickly write up some regular expressions. If you want to combine some CSV files, you can quickly write a Python script for that. If you need to debug a program, you know the ins and outs of debugging tools well enough to diagnose the fault. These technical folks are often the same folks who develop a lot of the software that end-users use.

operation, output datapoint, visualization, (12 more...)

#artificialintelligence

Technology:

Information Technology > Software (0.50)
Information Technology > Data Science (0.41)
Information Technology > Communications > Social Media (0.40)
Information Technology > Artificial Intelligence (0.35)


An Ontology-Based Information Extraction System for Residential Land Use Suitability Analysis

Al-Ageili, Munira, Mouhoub, Malek

arXiv.org Artificial Intelligence, Sep-15-2021

We propose an Ontology-Based Information Extraction (OBIE) system to automate the extraction of the criteria and values applied in Land Use Suitability Analysis (LUSA) from bylaw and regulation documents related to the geographic area of interest. The results obtained by our proposed LUSA OBIE system (land use suitability criteria and their values) are presented as an ontology populated with instances of the extracted criteria and property values. This latter output ontology is incorporated into a Multi-Criteria Decision Making (MCDM) model applied for constructing suitability maps for different kinds of land uses. The resulting maps may be the final desired product or can be incorporated into the cellular automata urban modeling and simulation for predicting future urban growth. A case study has been conducted where the output from LUSA OBIE is applied to help produce a suitability map for the City of Regina, Saskatchewan, to assist in the identification of suitable areas for residential development. A set of Saskatchewan bylaw and regulation documents were downloaded and input to the LUSA OBIE system. We accessed the extracted information using both the populated LUSA ontology and the set of annotated documents. In this regard, the LUSA OBIE system was effective in producing a final suitability map.

annotation, information, ontology, (14 more...)

arXiv.org Artificial Intelligence

2109.07672

Country:

North America > Canada > Saskatchewan > Regina (0.34)
North America > United States > Maine (0.04)
North America > Mexico > Mexico City > Mexico City (0.04)
(3 more...)

Genre: Research Report (0.50)

Industry: Law > Real Estate Law (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)


A logic-based relational learning approach to relation extraction: The OntoILPER system

Lima, Rinaldo, Espinasse, Bernard, Freitas, Fred

arXiv.org Artificial Intelligence, Jan-13-2020

Relation Extraction (RE), the task of detecting and characterizing semantic relations between entities in text, has gained much importance in the last two decades, mainly in the biomedical domain. Many papers have been published on Relation Extraction using supervised machine learning techniques. Most of these techniques rely on statistical methods, such as feature-based and tree-kernel-based methods. Such statistical learning techniques are usually based on a propositional hypothesis space for representing examples, i.e., they employ an attribute-value representation of features. This kind of representation has some drawbacks, particularly in the extraction of complex relations, which demand more contextual information about the instances involved: it cannot effectively capture structural information from parse trees without loss of information. In this work, we present OntoILPER, a logic-based relational learning approach to Relation Extraction that uses Inductive Logic Programming for generating extraction models in the form of symbolic extraction rules. OntoILPER takes advantage of a rich relational representation of examples, which can alleviate the aforementioned drawbacks. The proposed relational approach seems to be more suitable for Relation Extraction than statistical ones, for several reasons that we discuss. Moreover, OntoILPER uses a domain ontology that guides the background knowledge generation process and is used for storing the extracted relation instances. The induced extraction rules were evaluated on three protein-protein interaction datasets from the biomedical domain. The performance of OntoILPER extraction models was compared with other state-of-the-art RE systems. The encouraging results seem to demonstrate the effectiveness of the proposed solution.

kernel, ontoilper, relation, (13 more...)

arXiv.org Artificial Intelligence

doi: 10.1016/j.engappai.2018.11.001

2001.04192

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
South America > Brazil > Pernambuco > Recife (0.04)
(18 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.67)
Research Report > Experimental Study (0.46)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.48)
Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
(5 more...)


Querying Knowledge via Multi-Hop English Questions

Gao, Tiantian, Fodor, Paul, Kifer, Michael

arXiv.org Artificial Intelligence, Jul-18-2019

The inherent difficulty of knowledge specification and the lack of trained specialists are some of the key obstacles on the way to making intelligent systems based on the knowledge representation and reasoning (KRR) paradigm commonplace. Knowledge and query authoring using natural language, especially controlled natural language (CNL), is one of the promising approaches that could enable domain experts, who are not trained logicians, to both create formal knowledge and query it. In previous work, we introduced the KALM system (Knowledge Authoring Logic Machine), which supports knowledge authoring (and simple querying) with very high accuracy that at present is unachievable via machine learning approaches. The present paper expands on the question answering aspect of KALM and introduces KALM-QA (KALM for Question Answering), which is capable of answering much more complex English questions. We show that KALM-QA achieves 100% accuracy on an extensive suite of movie-related questions, called MetaQA, which contains almost 29,000 test questions and over 260,000 training questions. We contrast this with a published machine learning approach, which falls far short of this high mark. This paper is under consideration for acceptance in TPLP.

machine learning, natural language, question answering, (19 more...)

arXiv.org Artificial Intelligence

1907.08176

Country:

North America > United States > New York > Suffolk County > Stony Brook (0.04)
Asia > China > Beijing > Beijing (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(9 more...)

Genre: Research Report (0.84)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
(2 more...)


Neural Aspect and Opinion Term Extraction with Mined Rules as Weak Supervision

Dai, Hongliang, Song, Yangqiu

arXiv.org Machine Learning, Jul-7-2019

Lack of labeled training data is a major bottleneck for neural-network-based aspect and opinion term extraction on product reviews. To alleviate this problem, we first propose an algorithm to automatically mine extraction rules from existing training examples based on dependency parsing results. The mined rules are then applied to label a large amount of auxiliary data. Finally, we study training procedures to train a neural model which can learn from both the data automatically labeled by the rules and a small amount of data accurately annotated by humans. Experimental results show that although the mined rules themselves do not perform well due to their limited flexibility, the combination of human-annotated data and rule-labeled auxiliary data can improve the neural model and allow it to achieve performance better than or comparable with the current state-of-the-art.
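The step in which rules label auxiliary data can be illustrated with a small sketch. Everything specific here is an assumption: the single rule is hand-written rather than mined, the dependency edges are typed in by hand rather than produced by a parser, and the label names are invented for the example.

```python
from typing import List, Tuple

# (head index, dependency relation, dependent index); assumed parser output.
Edge = Tuple[int, str, int]


def label_with_rule(tokens: List[str], pos_tags: List[str], edges: List[Edge]) -> List[str]:
    """Apply one illustrative rule: an adjective modifying a noun via 'amod'
    marks the noun as an aspect term and the adjective as an opinion term."""
    labels = ["O"] * len(tokens)
    for head, relation, dependent in edges:
        if relation == "amod" and pos_tags[head] == "NOUN" and pos_tags[dependent] == "ADJ":
            labels[head] = "ASPECT"
            labels[dependent] = "OPINION"
    return labels


tokens = ["great", "battery", "life"]
pos_tags = ["ADJ", "NOUN", "NOUN"]
edges = [(2, "amod", 0), (2, "compound", 1)]
weak_labels = label_with_rule(tokens, pos_tags, edges)
print(list(zip(tokens, weak_labels)))
```

Labels produced this way are noisy, which matches the abstract's point that the rules alone perform poorly and are useful mainly as extra training signal alongside human-annotated data.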

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Machine Learning

1907.0375

Country:

Europe > France > Pays de la Loire > Loire-Atlantique > Nantes (0.04)
Europe > Belarus (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)


The Stanford Natural Language Processing Group

@machinelearnbot, Apr-18-2017, 20:55:05 GMT

TokensRegex is a generic framework included in Stanford CoreNLP for defining patterns over text (sequences of tokens) and mapping it to semantic objects represented as Java objects. TokensRegex emphasizes describing text as a sequence of tokens (words, punctuation marks, etc.), which may have additional attributes, and writing patterns over those tokens, rather than working at the character level, as with standard regular expression packages. TokensRegex was used to develop SUTime, a rule-based temporal tagger for recognizing and normalizing temporal expressions. An included set of slides provides an overview of this package. There is quite detailed Javadoc for several of the key classes: for the matching patterns, see the Javadoc for TokenSequencePattern and for actions, see the Javadoc for Expressions.
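The difference between token-level and character-level matching can be sketched in a few lines of Python. This is not the TokensRegex API or syntax (which is Java and part of Stanford CoreNLP); the token dictionaries, the predicate-list pattern, and `find_matches` are all hypothetical and only illustrate matching over tokens with attributes.

```python
from typing import Callable, Dict, List, Tuple

Token = Dict[str, str]               # e.g. {"word": "May", "pos": "NNP"}
Predicate = Callable[[Token], bool]  # one test per pattern position

MONTHS = {"January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"}


def find_matches(tokens: List[Token], pattern: List[Predicate]) -> List[Tuple[int, int]]:
    """Return (start, end) token spans where every predicate holds in order."""
    spans = []
    width = len(pattern)
    for start in range(len(tokens) - width + 1):
        window = tokens[start:start + width]
        if all(pred(tok) for pred, tok in zip(pattern, window)):
            spans.append((start, start + width))
    return spans


tokens = [
    {"word": "Meet", "pos": "VB"},
    {"word": "on", "pos": "IN"},
    {"word": "May", "pos": "NNP"},
    {"word": "7", "pos": "CD"},
]
# Pattern: a month name followed by a token tagged as a cardinal number.
date_pattern = [lambda t: t["word"] in MONTHS, lambda t: t["pos"] == "CD"]
spans = find_matches(tokens, date_pattern)
print(spans)  # [(2, 4)]
```

Because each position tests token attributes such as the part-of-speech tag, the pattern can use information that a character-level regular expression over the raw string has no access to.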

artificial intelligence, expression, natural language, (15 more...)

@machinelearnbot

Country: North America > United States > California > Santa Clara County > Palo Alto (0.41)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.41)


The Stanford Natural Language Processing Group

@machinelearnbot, May-7-2016, 17:25:30 GMT

TokensRegex is a generic framework included in Stanford CoreNLP for defining patterns over text (sequences of tokens) and mapping it to semantic objects represented as Java objects. TokensRegex emphasizes describing text as a sequence of tokens (words, punctuation marks, etc.), which may have additional attributes, and writing patterns over those tokens, rather than working at the character level, as with standard regular expression packages. TokensRegex was used to develop SUTime, a rule-based temporal tagger for recognizing and normalizing temporal expressions. An included set of slides and the javadoc for TokenSequencePattern provide an overview of this package. Some additional information is available in some older slides.

artificial intelligence, expression, natural language, (15 more...)

@machinelearnbot

Country: North America > United States > California > Santa Clara County > Palo Alto (0.41)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.42)
