AITopics

2508.14801

Country: North America > United States (1.00)

Genre:

Research Report (1.00)
Instructional Material (0.68)

Industry:

Energy (0.68)
Health & Medicine > Therapeutic Area > Oncology (0.67)
Health & Medicine > Diagnostic Medicine > Imaging (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)

Romberg, Julia, Schröder, Christopher, Gonsior, Julius, Tomanek, Katrin, Olsson, Fredrik

Have LLMs Made Active Learning Obsolete? Surveying the NLP Community

arXiv.org Artificial Intelligence, Mar-12-2025

Supervised learning relies on annotated data, which is expensive to obtain. A longstanding strategy to reduce annotation costs is active learning, an iterative process in which a human annotates only data instances deemed informative by a model. Large language models (LLMs) have pushed the effectiveness of active learning, but have also improved methods such as few- or zero-shot learning and text synthesis, thereby introducing potential alternatives. This raises the question: has active learning become obsolete? To answer this fully, we must look beyond literature to practical experiences. We conduct an online survey in the NLP community to collect previously intangible insights on the perceived relevance of data annotation, particularly focusing on active learning, including best practices, obstacles and expected future developments. Our findings show that annotated data remains a key factor, and active learning continues to be relevant. While the majority of active learning users find it effective, a comparison with a community survey from over a decade ago reveals persistent challenges: setup complexity, estimation of cost reduction, and tooling. We publish an anonymized version of the collected dataset

active learning, computational linguistic, proceedings, (13 more...)

2503.09701

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
North America > United States > Florida > Miami-Dade County > Miami (0.04)
North America > Canada > Ontario > Toronto (0.04)

Genre:

Research Report > New Finding (1.00)
Questionnaire & Opinion Survey (1.00)

Industry:

Education (0.93)
Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Darji, Harshil, Mitrović, Jelena, Granitzer, Michael

Challenges and Considerations in Annotating Legal Data: A Comprehensive Overview

arXiv.org Artificial Intelligence, Jul-5-2024

The process of annotating data within the legal sector is filled with distinct challenges that differ from other fields, primarily due to the inherent complexities of legal language and documentation. The initial task usually involves selecting an appropriate raw dataset that captures the intricate aspects of legal texts. Following this, extracting text becomes a complicated task, as legal documents often have complex structures, footnotes, references, and unique terminology. The importance of data cleaning is magnified in this context, ensuring that redundant information is eliminated while maintaining crucial legal details and context. Creating comprehensive yet straightforward annotation guidelines is imperative, as these guidelines serve as the road map for maintaining uniformity and addressing the subtle nuances of legal terminology. Another critical aspect is the involvement of legal professionals in the annotation process. Their expertise is valuable in ensuring that the data not only remains contextually accurate but also adheres to prevailing legal standards and interpretations. This paper provides an expanded view of these challenges and aims to offer a foundational understanding and guidance for researchers and professionals engaged in legal data annotation projects. In addition, we provide links to our created and fine-tuned datasets and language models. These resources are outcomes of our discussed projects and solutions to challenges faced while working on them.

annotation, dataset, legal text, (17 more...)

2407.17503

Country:

Europe > Serbia (0.05)
North America > United States (0.04)
Europe > Germany > Hesse > Darmstadt Region > Frankfurt (0.04)

Genre: Overview (0.70)

Industry: Law (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.71)

Jukić, Josip, Jelenić, Fran, Bićanić, Miroslav, Šnajder, Jan

ALANNO: An Active Learning Annotation System for Mortals

arXiv.org Artificial Intelligence, Feb-21-2023

Supervised machine learning has become the cornerstone of today's data-driven society, increasing the need for labeled data. However, the process of acquiring labels is often expensive and tedious. One possible remedy is to use active learning (AL) -- a special family of machine learning algorithms designed to reduce labeling costs. Although AL has been successful in practice, a number of practical challenges hinder its effectiveness and are often overlooked in existing AL annotation tools. To address these challenges, we developed ALANNO, an open-source annotation system for NLP tasks equipped with features to make AL effective in real-world annotation projects. ALANNO facilitates annotation management in a multi-annotator setup and supports a variety of AL methods and underlying models, which are easily configurable and extensible.

annotator, artificial intelligence, machine learning, (16 more...)

2211.06224

Country:

North America > United States > New Mexico > Santa Fe County > Santa Fe (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)

Genre: Research Report (0.65)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

#artificialintelligence, Jan-3-2023, 13:25:22 GMT

Spark NLP Training

Data Annotation is an important part of Natural Language Processing (NLP) projects. To train a successful NLP model, it is necessary to extract data in an accurate and consistent way, combining different features such as Named-Entity Recognition (NER), Assertion Status Detection, Relation Extraction, and Text Classification. During this training, you will develop key skills to carry out a complete annotation project using John Snow Labs' high-productivity annotation tool: The Annotation Lab. You will also learn and practice how to develop effective Annotation Guidelines, best practices for leading a team of annotators to ensure accurate results, and how to track your project's progress and the quality of your annotations. The instructors have led multiple large data annotation projects and will be available during the assignments to answer questions.

annotation project, spark nlp training

Genre: Instructional Material > Course Syllabus & Notes (0.87)

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.66)

arXiv.org Artificial Intelligence, Sep-6-2021

LightTag: Text Annotation Platform

Perry, Tal

Text annotation tools assume that their user's goal is to create a labeled corpus. However, users view annotation as a necessary evil on the way to delivering business value through NLP. Thus an annotation tool should optimize for the throughput of the global NLP process, not only the productivity of individual annotators. LightTag is a text annotation tool designed and built on that principle. This paper shares our design rationale, data modeling choices, and user interface decisions, then illustrates how those choices serve the full NLP lifecycle.

annotation, annotation tool, lighttag, (15 more...)

2109.0232

Country: North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.40)

Industry:

Government (0.69)
Information Technology > Security & Privacy (0.47)
Health & Medicine > Health Care Technology > Medical Record (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science (0.95)
Information Technology > Information Management (0.89)

#artificialintelligence, Oct-4-2019, 23:10:03 GMT

The 5 Pitfalls of Document Labeling -- And How to Avoid Them -- TagWorks

Don't let your annotation project bury you. Whether you call it "content analysis," "textual data labeling," "hand-coding," or "tagging," a lot more researchers and data science teams are starting up annotation projects these days. Many want human judgment labeled onto text so they can train AI (via supervised machine learning approaches). Others have tried automated text analysis and found it wanting. Now they're looking for ways to label text that aren't so hard to interpret and explain.

annotation project, annotator, pitfall, (12 more...)

Country:

North America > United States > California > Alameda County > Oakland (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.35)

#artificialintelligence, Sep-11-2019, 02:33:22 GMT

NLP, AI, and Social Science are About to Get A Lot Better

If robots can do backflips and cars can nearly drive themselves, why can't Siri and Alexa carry their side of a simple conversation? And how come there's no artificial intelligence (AI) able to read through all of our news and policy discussions to solve our social and economic problems? The answer is simpler than you might think. As it happens, human languages create very noisy data. Our ambiguous words, metaphors, and idioms make for beautiful poetry, but computers were built to compute math and logic on unambiguous numbers and categories.

artificial intelligence, natural language, tagwork, (14 more...)

Country: North America > United States > California > Alameda County (0.16)

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

#artificialintelligence, Sep-10-2019, 23:07:28 GMT

The five pitfalls of document labeling - and how to avoid them -- SAGE Ocean Big Data, New Tech, Social Science

Whether you call it 'content analysis', 'textual data labeling', 'hand-coding', or 'tagging', a lot more researchers and data science teams are starting up annotation projects these days. Many want human judgment labeled onto text to train AI (via supervised machine learning approaches). Others have tried automated text analysis and found it wanting. Now they're looking for ways to label text that aren't so hard to interpret and explain. Some just want what social scientists have always wanted: a way to analyze massive archives of human behavior (like the Supreme Court's transcripts or diplomatic correspondence) at high scales.

annotation project, sage ocean big data, variable label, (7 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.98)
Information Technology > Data Science > Data Mining > Big Data (0.40)