AITopics | automated extraction

Collaborating Authors

automated extraction

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

What Are the Facts? Automated Extraction of Court-Established Facts from Criminal-Court Opinions

Bendová, Klára, Knap, Tomáš, Černý, Jan, Pour, Vojtěch, Savelka, Jaromir, Kvapilíková, Ivana, Drápal, Jakub

arXiv.org Artificial IntelligenceNov-10-2025

Criminal justice administrative data contain only a limited amount of information about the committed offense. However, there is an unused source of extensive information in continental European courts' decisions: descriptions of criminal behaviors in verdicts by which offenders are found guilty. In this paper, we study the feasibility of extracting these descriptions from publicly available court decisions from Slovakia. We use two different approaches for retrieval: regular expressions and large language models (LLMs). Our baseline was a simple method employing regular expressions to identify typical words occurring before and after the description. The advanced regular expression approach further focused on "sparing" and its normalization (insertion of spaces between individual letters), typical for delineating the description. The LLM approach involved prompting the Gemini Flash 2.0 model to extract the descriptions using predefined instructions. Although the baseline identified descriptions in only 40.5% of verdicts, both methods significantly outperformed it, achieving 97% with advanced regular expressions and 98.75% with LLMs, and 99.5% when combined. Evaluation by law students showed that both advanced methods matched human annotations in about 90% of cases, compared to just 34.5% for the baseline. LLMs fully matched human-labeled descriptions in 91.75% of instances, and a combination of advanced regular expressions with LLMs reached 92%.

artificial intelligence, large language model, natural language, (19 more...)

arXiv.org Artificial Intelligence

2511.0532

Country:

North America > United States (0.46)
Europe > Czechia (0.29)

Genre: Research Report (1.00)

Industry: Law > Criminal Law (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Automated Extraction of Spatio-Semantic Graphs for Identifying Cognitive Impairment

Ng, Si-Ioi, Ambadi, Pranav S., Mueller, Kimberly D., Liss, Julie, Berisha, Visar

arXiv.org Artificial IntelligenceFeb-2-2025

Existing methods for analyzing linguistic content from picture descriptions for assessment of cognitive-linguistic impairment often overlook the participant's visual narrative path, which typically requires eye tracking to assess. Spatio-semantic graphs are a useful tool for analyzing this narrative path from transcripts alone, however they are limited by the need for manual tagging of content information units (CIUs). In this paper, we propose an automated approach for estimation of spatio-semantic graphs (via automated extraction of CIUs) from the Cookie Theft picture commonly used in cognitive-linguistic analyses. The method enables the automatic characterization of the visual semantic path during picture description. Experiments demonstrate that the automatic spatio-semantic graphs effectively differentiate between cognitively impaired and unimpaired speakers. Statistical analyses reveal that the features derived by the automated method produce comparable results to the manual method, with even greater group differences between clinical groups of interest. These results highlight the potential of the automated approach for extracting spatio-semantic features in developing clinical speech models for cognitive impairment assessment.

artificial intelligence, natural language, text processing, (15 more...)

arXiv.org Artificial Intelligence

2502.01685

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > Arizona (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.69)

Industry: Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (0.78)

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)

Add feedback

LLMCloudHunter: Harnessing LLMs for Automated Extraction of Detection Rules from Cloud-Based CTI

Schwartz, Yuval, Benshimol, Lavi, Mimran, Dudu, Elovici, Yuval, Shabtai, Asaf

arXiv.org Artificial IntelligenceJul-6-2024

As the number and sophistication of cyber attacks have increased, threat hunting has become a critical aspect of active security, enabling proactive detection and mitigation of threats before they cause significant harm. Open-source cyber threat intelligence (OS-CTI) is a valuable resource for threat hunters, however, it often comes in unstructured formats that require further manual analysis. Previous studies aimed at automating OSCTI analysis are limited since (1) they failed to provide actionable outputs, (2) they did not take advantage of images present in OSCTI sources, and (3) they focused on on-premises environments, overlooking the growing importance of cloud environments. To address these gaps, we propose LLMCloudHunter, a novel framework that leverages large language models (LLMs) to automatically generate generic-signature detection rule candidates from textual and visual OSCTI data. We evaluated the quality of the rules generated by the proposed framework using 12 annotated real-world cloud threat reports. The results show that our framework achieved a precision of 92% and recall of 98% for the task of accurately extracting API calls made by the threat actor and a precision of 99% with a recall of 98% for IoCs. Additionally, 99.18% of the generated detection rule candidates were successfully compiled and converted into Splunk queries.

api call, llmcloudhunter, sigma candidate, (14 more...)

arXiv.org Artificial Intelligence

2407.05194

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (0.49)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Cloud Computing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

GLOCON Database: Design Decisions and User Manual (v1.0)

Hürriyetoğlu, Ali, Mutlu, Osman, Duruşan, Fırat, Yörük, Erdem

arXiv.org Artificial IntelligenceMay-28-2024

GLOCON is a database of contentious events automatically extracted from national news sources from various countries in multiple languages. National news sources are utilized, and complete news archives are processed to create an event list for each source. Automation is achieved using a gold standard corpus sampled randomly from complete news archives (Yörük et al. 2022) and all annotated by at least two domain experts based on the event definition provided in Duruşan et al. (2022). The database consists of the following countries and sources provided in Table 1 as of May 2024.

information, news article, place name, (13 more...)

arXiv.org Artificial Intelligence

2405.18613

Country:

Asia > India (0.07)
South America > Brazil (0.05)
South America > Argentina > Pampas > Buenos Aires F.D. > Buenos Aires (0.04)
(2 more...)

Genre: Research Report (0.41)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.48)

Add feedback

Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2023): Workshop and Shared Task Report

Hürriyetoğlu, Ali, Tanev, Hristo, Mutlu, Osman, Thapa, Surendrabikram, Tan, Fiona Anting, Yörük, Erdem

arXiv.org Artificial IntelligenceDec-2-2023

We provide a summary of the sixth edition of the CASE workshop that is held in the scope of RANLP 2023. The workshop consists of regular papers, three keynotes, working papers of shared task participants, and shared task overview papers. This workshop series has been bringing together all aspects of event information collection across technical and social science fields. In addition to contributing to the progress in text based event extraction, the workshop provides a space for the organization of a multimodal event information collection task.

automated extraction, challenge and application, computational linguistic, (13 more...)

arXiv.org Artificial Intelligence

2312.01244

Country:

Europe > Ukraine (0.14)
Asia > Russia (0.14)
Europe > Bulgaria > Varna Province > Varna (0.07)
(15 more...)

Genre:

Research Report (1.00)
Instructional Material > Course Syllabus & Notes (0.88)

Industry:

Education (0.47)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
(2 more...)

Add feedback

A Neural Expert System with Automated Extraction of Fuzzy If-Then Rules and Its Application to Medical Diagnosis

Neural Information Processing SystemsApr-6-2023, 19:36:48 GMT

This paper proposes ajuzzy neural expert system (FNES) with the following two functions: (1) Generalization of the information derived from the training data and embodiment of knowledge in the form of the fuzzy neural network; (2) Extraction of fuzzy If-Then rules with linguistic relative importance of each proposition in an antecedent (I f -part) from a trained neural network. This paper also gives a method to extract automatically fuzzy If-Then rules from the trained neural network. To prove the effectiveness and validity of the proposed fuzzy neural expert system.

automated extraction, fuzzy if-then rule, neural expert system, (4 more...)

Neural Information Processing Systems

Industry: Health & Medicine > Diagnostic Medicine (0.49)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)

Add feedback

Automated Extraction of Fine-Grained Standardized Product Information from Unstructured Multilingual Web Data

Flick, Alexander, Jäger, Sebastian, Trajanovska, Ivana, Biessmann, Felix

arXiv.org Artificial IntelligenceFeb-23-2023

Extracting structured information from unstructured data is one of the key challenges in modern information retrieval applications, including e-commerce. Here, we demonstrate how recent advances in machine learning, combined with a recently published multilingual data set with standardized fine-grained product category information, enable robust product attribute extraction in challenging transfer learning settings. Our models can reliably predict product attributes across online shops, languages, or both. Furthermore, we show that our models can be used to match product taxonomies between online retailers.

category, information retrieval, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2302.12139

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > United States > New York > New York County > New York City (0.05)
North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
(2 more...)

Genre: Research Report (0.64)

Industry: Retail (0.54)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.37)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.30)

Add feedback

ClassBases at CASE-2022 Multilingual Protest Event Detection Tasks: Multilingual Protest News Detection and Automatically Replicating Manually Created Event Datasets

Wiriyathammabhum, Peratham

arXiv.org Artificial IntelligenceJan-16-2023

In this report, we describe our ClassBases submissions to a shared task on multilingual protest event detection. For the multilingual protest news detection, we participated in subtask-1, subtask-2, and subtask-4, which are document classification, sentence classification, and token classification. In subtask-1, we compare XLM-RoBERTa-base, mLUKE-base, and XLM-RoBERTa-large on finetuning in a sequential classification setting. We always use a combination of the training data from every language provided to train our multilingual models. We found that larger models seem to work better and entity knowledge helps but at a non-negligible cost. For subtask-2, we only submitted an mLUKE-base system for sentence classification. For subtask-4, we only submitted an XLM-RoBERTa-base for token classification system for sequence labeling. For automatically replicating manually created event datasets, we participated in COVID-related protest events from the New York Times news corpus. We created a system to process the crawled data into a dataset of protest events.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2301.06617

Country:

North America > United States (0.14)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Event Causality Identification with Causal News Corpus -- Shared Task 3, CASE 2022

Tan, Fiona Anting, Hettiarachchi, Hansi, Hürriyetoğlu, Ali, Caselli, Tommaso, Uca, Onur, Liza, Farhana Ferdousi, Oostdijk, Nelleke

arXiv.org Artificial IntelligenceNov-22-2022

The Event Causality Identification Shared Task of CASE 2022 involved two subtasks working on the Causal News Corpus. Subtask 1 required participants to predict if a sentence contains a causal relation or not. This is a supervised binary classification task. Subtask 2 required participants to identify the Cause, Effect and Signal spans per causal sentence. This could be seen as a supervised sequence labeling task. For both subtasks, participants uploaded their predictions for a held-out test set, and ranking was done based on binary F1 and macro F1 scores for Subtask 1 and 2, respectively. This paper summarizes the work of the 17 teams that submitted their results to our competition and 12 system description papers that were received. The best F1 scores achieved for Subtask 1 and 2 were 86.19% and 54.15%, respectively. All the top-performing approaches involved pre-trained language models fine-tuned to the targeted task. We further discuss these approaches and analyze errors across participants' systems in this paper.

computational linguistic, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2211.12154

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Netherlands (0.04)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
(18 more...)

Genre: Research Report (0.50)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2022): Workshop and Shared Task Report

Hürriyetoğlu, Ali, Tanev, Hristo, Zavarella, Vanni, Yeniterzi, Reyyan, Mutlu, Osman, Yörük, Erdem

arXiv.org Artificial IntelligenceNov-21-2022

We provide a summary of the fifth edition of the CASE workshop that is held in the scope of EMNLP 2022. The workshop consists of regular papers, two keynotes, working papers of shared task participants, and task overview papers. This workshop has been bringing together all aspects of event information collection across technical and social science fields. In addition to the progress in depth, the submission and acceptance of multimodal approaches show the widening of this interdisciplinary research topic.

artificial intelligence, computational linguistic, natural language, (16 more...)

arXiv.org Artificial Intelligence

2211.11359

Country:

Europe > Ukraine (0.14)
Asia > Russia (0.14)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.05)
(6 more...)

Genre:

Research Report (1.00)
Instructional Material > Course Syllabus & Notes (0.48)

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

Add feedback