AITopics | collective outlier

Collaborating Authors

collective outlier

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Learnability Matters: Active Learning for Video Captioning

Neural Information Processing SystemsMar-19-2026, 23:44:44 GMT

This work focuses on the active learning in video captioning. In particular, we propose to address the learnability problem in active learning, which has been brought up by collective outliers in video captioning and neglected in the literature. To start with, we conduct a comprehensive study of collective outliers, exploring their hard-to-learn property and concluding that ground truth inconsistency is one of the main causes. Motivated by this, we design a novel active learning algorithm that takes three complementary aspects, namely learnability, diversity, and uncertainty, into account. Ideally, learnability is reflected by ground truth consistency. Under the active learning scenario where ground truths are not available until human involvement, we measure the consistency on estimated ground truths, where predictions from off-the-shelf models are utilized as approximations to ground truths. These predictions are further used to estimate sample frequency and reliability, evincing the diversity and uncertainty respectively. With the help of our novel caption-wise active learning protocol, our algorithm is capable of leveraging knowledge from humans in a more effective yet intellectual manner. Results on publicly available video captioning datasets with diverse video captioning models demonstrate that our algorithm outperforms SOTA active learning methods by a large margin, e.g.

artificial intelligence, machine learning, proceedings, (11 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Learnability Matters: Active Learning for Video Captioning

Neural Information Processing SystemsFeb-11-2026, 23:48:03 GMT

This work focuses on the active learning in video captioning.

large language model, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > Washington > King County > Seattle (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
(23 more...)

Genre:

Research Report > Experimental Study (0.93)
Overview (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.92)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)

Add feedback

Learnability Matters: Active Learning for Video Captioning

Neural Information Processing SystemsOct-10-2025, 00:37:57 GMT

This work focuses on the active learning in video captioning.

annotation, collective outlier, video, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > Washington > King County > Seattle (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
(23 more...)

Genre:

Research Report > Experimental Study (0.93)
Overview (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.92)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)

Add feedback

Learnability Matters: Active Learning for Video Captioning

Neural Information Processing SystemsMay-26-2025, 22:55:16 GMT

artificial intelligence, learnability matter, machine learning, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Investigating Multi-source Active Learning for Natural Language Inference

Snijders, Ard, Kiela, Douwe, Margatina, Katerina

arXiv.org Artificial IntelligenceFeb-14-2023

In recent years, active learning has been successfully applied to an array of NLP tasks. However, prior work often assumes that training and test data are drawn from the same distribution. This is problematic, as in real-life settings data may stem from several sources of varying relevance and quality. We show that four popular active learning schemes fail to outperform random selection when applied to unlabelled pools comprised of multiple data sources on the task of natural language inference. We reveal that uncertainty-based strategies perform poorly due to the acquisition of collective outliers, i.e., hard-to-learn instances that hamper learning and generalization. When outliers are removed, strategies are found to recover and outperform random baselines. In further analysis, we find that collective outliers vary in form between sources, and show that hard-to-learn data is not always categorically harmful. Lastly, we leverage dataset cartography to introduce difficulty-stratified testing and find that different strategies are affected differently by example learnability and difficulty.

artificial intelligence, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2302.06976

Country:

Asia > Middle East > Jordan (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
(7 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Media > Film (1.00)
Leisure & Entertainment > Sports > Baseball (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

C-AllOut: Catching & Calling Outliers by Type

Silva, Guilherme D. F., Akoglu, Leman, Cordeiro, Robson L. F.

arXiv.org Artificial IntelligenceOct-13-2021

Given an unlabeled dataset, wherein we have access only to pairwise similarities (or distances), how can we effectively (1) detect outliers, and (2) annotate/tag the outliers by type? Outlier detection has a large literature, yet we find a key gap in the field: to our knowledge, no existing work addresses the outlier annotation problem. Outliers are broadly classified into 3 types, representing distinct patterns that could be valuable to analysts: (a) global outliers are severe yet isolate cases that do not repeat, e.g., a data collection error; (b) local outliers diverge from their peers within a context, e.g., a particularly short basketball player; and (c) collective outliers are isolated micro-clusters that may indicate coalition or repetitions, e.g., frauds that exploit the same loophole. This paper presents C-AllOut: a novel and effective outlier detector that annotates outliers by type. It is parameter-free and scalable, besides working only with pairwise similarities (or distances) when it is needed. We show that C-AllOut achieves on par or significantly better performance than state-of-the-art detectors when spotting outliers regardless of their type. It is also highly effective in annotating outliers of particular types, a task that none of the baselines can perform.

c-allout, collective outlier, outlier, (14 more...)

arXiv.org Artificial Intelligence

2110.08257

Country:

South America > Brazil > São Paulo (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback

Mind Your Outliers! Investigating the Negative Impact of Outliers on Active Learning for Visual Question Answering

Karamcheti, Siddharth, Krishna, Ranjay, Fei-Fei, Li, Manning, Christopher D.

arXiv.org Artificial IntelligenceJul-5-2021

Active learning promises to alleviate the massive data needs of supervised machine learning: it has successfully improved sample efficiency by an order of magnitude on traditional tasks like topic classification and object recognition. However, we uncover a striking contrast to this promise: across 5 models and 4 datasets on the task of visual question answering, a wide variety of active learning approaches fail to outperform random selection. To understand this discrepancy, we profile 8 active learning methods on a per-example basis, and identify the problem as collective outliers -- groups of examples that active learning methods prefer to acquire but models fail to learn (e.g., questions that ask about text in images or require external knowledge). Through systematic ablation experiments and qualitative visualizations, we verify that collective outliers are a general phenomenon responsible for degrading pool-based active learning. Notably, we show that active learning sample efficiency increases significantly as the number of collective outliers in the active learning pool decreases. We conclude with a discussion and prescriptive recommendations for mitigating the effects of these outliers in future work.

dataset, learning, training example number, (11 more...)

arXiv.org Artificial Intelligence

2107.02331

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > California > San Diego County > San Diego (0.04)

Genre: Research Report (0.83)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
(2 more...)

Add feedback