AITopics | annotation model

Collaborating Authors

annotation model

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

VGR: Visual Grounded Reasoning

Wang, Jiacong, Kang, Zijian, Wang, Haochen, Jiang, Haiyong, Li, Jiawen, Wu, Bohong, Wang, Ya, Ran, Jiao, Liang, Xiao, Feng, Chao, Xiao, Jun

arXiv.org Artificial IntelligenceJun-17-2025

In the field of multimodal chain-of-thought (CoT) reasoning, existing approaches predominantly rely on reasoning on pure language space, which inherently suffers from language bias and is largely confined to math or science domains. This narrow focus limits their ability to handle complex visual reasoning tasks that demand comprehensive understanding of image details. To address these limitations, this paper introduces VGR, a novel reasoning multimodal large language model (MLLM) with enhanced fine-grained visual perception capabilities. Unlike traditional MLLMs that answer the question or reasoning solely on the language space, our VGR first detects relevant regions that may help to solve problems, and then provides precise answers based on replayed image regions. To achieve this, we conduct a large-scale SFT dataset called VGR -SFT that contains reasoning data with mixed vision grounding and language deduction. The inference pipeline of VGR allows the model to choose bounding boxes for visual reference and a replay stage is introduced to integrates the corresponding regions into the reasoning process, enhancing multimodel comprehension. Experiments on the LLaVA-NeXT-7B baseline show that VGR achieves superior performance on multi-modal benchmarks requiring comprehensive image detail understanding. Compared to the baseline, VGR uses only 30\% of the image token count while delivering scores of +4.1 on MMStar, +7.1 on AI2D, and a +12.9 improvement on ChartQA.

arxiv preprint arxiv, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2506.11991

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Grapheme-Coherent Phonemic and Prosodic Annotation of Speech by Implicit and Explicit Grapheme Conditioning

Ohnaka, Hien, Shirahata, Yuma, Park, Byeongseon, Yamamoto, Ryuichi

arXiv.org Artificial IntelligenceJun-6-2025

We propose a model to obtain phonemic and prosodic labels of speech that are coherent with graphemes. Unlike previous methods that simply fine-tune a pre-trained ASR model with the labels, the proposed model conditions the label generation on corresponding graphemes by two methods: 1) Add implicit grapheme conditioning through prompt encoder using pre-trained BERT features. 2) Explicitly prune the label hypotheses inconsistent with the grapheme during inference. These methods enable obtaining parallel data of speech, the labels, and graphemes, which is applicable to various downstream tasks such as text-to-speech and accent estimation from text. Experiments showed that the proposed method significantly improved the consistency between graphemes and the predicted labels. Further, experiments on accent estimation task confirmed that the created parallel data by the proposed method effectively improve the estimation accuracy.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2506.04527

Country: Asia > Japan (0.14)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.37)

Add feedback

DarkBench: Benchmarking Dark Patterns in Large Language Models

Kran, Esben, Nguyen, Hieu Minh "Jord", Kundu, Akash, Jawhar, Sami, Park, Jinsuk, Jurewicz, Mateusz Maria

arXiv.org Artificial IntelligenceMar-13-2025

Measuring these dark patterns is essential for understanding and mitigating the potential manipulative behaviors of LLMs. While some patterns, like Brand Bias and User Retention, were adapted directly from known dark patterns in UI/UX, others, like Harmful Generation and Anthropomorphization, represent critical risks not explicitly addressed in Brignull and Darlo (2010)'s taxonomy. Table 4 demonstrates how these categories map to or expand on established dark patterns, providing a foundation for their inclusion. However, some risks, particularly Anthropomorphization and Harmful Generation, require additional justification. Anthropomorphization, the attribution of human-like characteristics to AI systems, has been identified as a key factor in enhancing user engagement and trust.

benchmark, conference paper, dark pattern, (16 more...)

arXiv.org Artificial Intelligence

2503.10728

Country:

North America > United States > New York > New York County > New York City (0.04)
Asia > Singapore (0.04)

Genre: Research Report (1.00)

Industry:

Law (0.93)
Health & Medicine > Consumer Health (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Estimating Agreement by Chance for Sequence Annotation

Li, Diya, Rosé, Carolyn, Yuan, Ao, Zhou, Chunxiao

arXiv.org Artificial IntelligenceJul-16-2024

In the field of natural language processing, correction of performance assessment for chance agreement plays a crucial role in evaluating the reliability of annotations. However, there is a notable dearth of research focusing on chance correction for assessing the reliability of sequence annotation tasks, despite their widespread prevalence in the field. To address this gap, this paper introduces a novel model for generating random annotations, which serves as the foundation for estimating chance agreement in sequence annotation tasks. Utilizing the proposed randomization model and a related comparison approach, we successfully derive the analytical form of the distribution, enabling the computation of the probable location of each annotated text segment and subsequent chance agreement estimation. Through a combination simulation and corpus-based evaluation, we successfully assess its applicability and validate its accuracy and efficacy.

annotation, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2407.11371

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Illinois (0.04)

Genre: Research Report (1.00)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

ARAIDA: Analogical Reasoning-Augmented Interactive Data Annotation

Huang, Chen, Jin, Yiping, Ilievski, Ilija, Lei, Wenqiang, Lv, Jiancheng

arXiv.org Artificial IntelligenceJun-1-2024

Human annotation is a time-consuming task that requires a significant amount of effort. To address this issue, interactive data annotation utilizes an annotation model to provide suggestions for humans to approve or correct. However, annotation models trained with limited labeled data are prone to generating incorrect suggestions, leading to extra human correction effort. To tackle this challenge, we propose Araida, an analogical reasoning-based approach that enhances automatic annotation accuracy in the interactive data annotation setting and reduces the need for human corrections. Araida involves an error-aware integration strategy that dynamically coordinates an annotation model and a k-nearest neighbors (KNN) model, giving more importance to KNN's predictions when predictions from the annotation model are deemed inaccurate. Empirical studies demonstrate that Araida is adaptable to different annotation tasks and models. On average, it reduces human correction labor by 11.02% compared to vanilla interactive data annotation methods.

annotation, annotation model, raida, (16 more...)

arXiv.org Artificial Intelligence

2405.11912

Country:

Asia > China (0.04)
North America > United States > Oregon (0.04)
North America > United States > New Mexico (0.04)
(11 more...)

Genre: Research Report > New Finding (0.67)

Industry:

Leisure & Entertainment (1.00)
Media > Film (0.46)
Media > Music (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.54)

Add feedback

Wakeword Detection under Distribution Shifts

Parthasarathi, Sree Hari Krishnan, Zeng, Lu, Jose, Christin, Wang, Joseph

arXiv.org Artificial IntelligenceJul-13-2022

We propose a novel approach for semi-supervised learning (SSL) designed to overcome distribution shifts between training and real-world data arising in the keyword spotting (KWS) task. Shifts from training data distribution are a key challenge for real-world KWS tasks: when a new model is deployed on device, the gating of the accepted data undergoes a shift in distribution, making the problem of timely updates via subsequent deployments hard. Despite the shift, we assume that the marginal distributions on labels do not change. We utilize a modified teacher/student training framework, where labeled training data is augmented with unlabeled data. Note that the teacher does not have access to the new distribution as well. To train effectively with a mix of human and teacher labeled data, we develop a teacher labeling strategy based on confidence heuristics to reduce entropy on the label distribution from the teacher model; the data is then sampled to match the marginal distribution on the labels. Large scale experimental results show that a convolutional neural network (CNN) trained on far-field audio, and evaluated on far-field audio drawn from a different distribution, obtains a 14.3% relative improvement in false discovery rate (FDR) at equal false reject rate (FRR), while yielding a 5% improvement in FDR under no distribution shift. Under a more severe distribution shift from far-field to near-field audio with a smaller fully connected network (FCN) our approach achieves a 52% relative improvement in FDR at equal FRR, while yielding a 20% relative improvement in FDR on the original distribution.

baseline model, distribution shift, relative improvement, (15 more...)

arXiv.org Artificial Intelligence

2207.06423

Country: North America > United States (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Education (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Add feedback

Joint Multi-Dimensional Model for Global and Time-Series Annotations

Ramakrishna, Anil, Gupta, Rahul, Narayanan, Shrikanth

arXiv.org Machine LearningMay-6-2020

Crowdsourcing is a popular approach to collect annotations for unlabeled data instances. It involves collecting a large number of annotations from several, often naive untrained annotators for each data instance which are then combined to estimate the ground truth. Further, annotations for constructs such as affect are often multi-dimensional with annotators rating multiple dimensions, such as valence and arousal, for each instance. Most annotation fusion schemes however ignore this aspect and model each dimension separately. In this work we address this by proposing a generative model for multi-dimensional annotation fusion, which models the dimensions jointly leading to more accurate ground truth estimates. The model we propose is applicable to both global and time series annotation fusion problems and treats the ground truth as a latent variable distorted by the annotators. The model parameters are estimated using the Expectation-Maximization algorithm and we evaluate its performance using synthetic data and real emotion corpora as well as on an artificial task with human annotations

annotation, machine learning, natural language, (19 more...)

arXiv.org Machine Learning

2005.03117

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > California > Los Angeles County > Los Angeles (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
(4 more...)

Genre: Research Report > Experimental Study (0.46)

Industry:

Health & Medicine (1.00)
Media > Film (0.93)
Leisure & Entertainment (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
(4 more...)

Add feedback

GPM: A Generic Probabilistic Model to Recover Annotator's Behavior and Ground Truth Labeling

Li, Jing, Ling, Suiyi, Wang, Junle, Li, Zhi, Callet, Patrick Le

arXiv.org Artificial IntelligenceMar-1-2020

In the big data era, data labeling can be obtained through crowdsourcing. Nevertheless, the obtained labels are generally noisy, unreliable or even adversarial. In this paper, we propose a probabilistic graphical annotation model to infer the underlying ground truth and annotator's behavior. To accommodate both discrete and continuous application scenarios (e.g., classifying scenes vs. rating videos on a Likert scale), the underlying ground truth is considered following a distribution rather than a single value. In this way, the reliable but potentially divergent opinions from "good" annotators can be recovered. The proposed model is able to identify whether an annotator has worked diligently towards the task during the labeling procedure, which could be used for further selection of qualified annotators. Our model has been tested on both simulated data and real-world data, where it always shows superior performance than the other state-of-the-art models in terms of accuracy and robustness.

annotator, categorical distribution, ground truth, (16 more...)

arXiv.org Artificial Intelligence

2003.00475

Country: Europe > France > Pays de la Loire > Loire-Atlantique > Nantes (0.04)

Genre: Research Report > Promising Solution (0.34)

Industry:

Media > Film (0.68)
Leisure & Entertainment (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Data Science > Data Mining (0.88)
Information Technology > Communications > Social Media > Crowdsourcing (0.68)
(3 more...)

Add feedback

Kill Two Birds With One Stone: Weakly-Supervised Neural Network for Image Annotation and Tag Refinement

Zhang, Junjie (University of Technology Sydney) | Wu, Qi (University of Adelaide) | Zhang, Jian (University of Technology Sydney) | Shen, Chunhua (University of Adelaide) | Lu, Jianfeng (Nanjing University of Science and Technology)

AAAI ConferencesFeb-8-2018

The number of social images has exploded by the wide adoption of social networks, and people like to share their comments about them. These comments can be a description of the image, or some objects, attributes, scenes in it, which are normally used as the user-provided tags. However, it is well-known that user-provided tags are incomplete and imprecise to some extent. Directly using them can damage the performance of related applications, such as the image annotation and retrieval. In this paper, we propose to learn an image annotation model and refine the user-provided tags simultaneously in a weakly-supervised manner. The deep neural network is utilized as the image feature learning and backbone annotation model, while visual consistency, semantic dependency, and user-error sparsity are introduced as the constraints at the batch level to alleviate the tag noise. Therefore, our model is highly flexible and stable to handle large-scale image sets. Experimental results on two benchmark datasets indicate that our proposed model achieves the best performance compared to the state-of-the-art methods.

artificial intelligence, constraint, machine learning, (18 more...)

AAAI Conferences

Thirty-Second AAAI Conference on Artificial Intelligence

Country:

Asia (0.46)
Oceania > Australia (0.28)

Genre: Research Report > Promising Solution (0.34)

Industry: Information Technology > Services (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback