AITopics | concept frequency

concept frequency

715b78ccfb6f4cada5528ac9b5278def-Paper-Conference.pdf

Neural Information Processing Systems | Feb-15-2026, 19:20:06 GMT

large language model, machine learning, natural language, (16 more...)

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Kentucky (0.04)
(5 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Law (0.67)
Information Technology > Services (0.46)
Transportation > Passenger (0.46)
Transportation > Ground (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(4 more...)

715b78ccfb6f4cada5528ac9b5278def-Paper-Conference.pdf

Neural Information Processing Systems | Oct-10-2025, 05:52:51 GMT

concept frequency, dataset, experiment, (9 more...)

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Kentucky (0.04)
(5 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Transportation > Passenger (0.67)
Health & Medicine (0.67)
Law (0.67)
(2 more...)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(4 more...)

No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance

Udandarao, Vishaal, Prabhu, Ameya, Ghosh, Adhiraj, Sharma, Yash, Torr, Philip H. S., Bibi, Adel, Albanie, Samuel, Bethge, Matthias

arXiv.org Artificial Intelligence | Apr-8-2024

Web-crawled pretraining datasets underlie the impressive "zero-shot" evaluation performance of multimodal models, such as CLIP for classification/retrieval and Stable-Diffusion for image generation. However, it is unclear how meaningful the notion of "zero-shot" generalization is for such multimodal models, as it is not known to what extent their pretraining datasets encompass the downstream concepts targeted during "zero-shot" evaluation. In this work, we ask: How is the performance of multimodal models on downstream concepts influenced by the frequency of these concepts in their pretraining datasets? We comprehensively investigate this question across 34 models and five standard pretraining datasets (CC-3M, CC-12M, YFCC-15M, LAION-400M, LAION-Aesthetics), generating over 300GB of data artifacts. We consistently find that, far from exhibiting "zero-shot" generalization, multimodal models require exponentially more data to achieve linear improvements in downstream "zero-shot" performance, following a sample-inefficient log-linear scaling trend. This trend persists even when controlling for sample-level similarity between pretraining and downstream datasets, and when testing on purely synthetic data distributions. Furthermore, upon benchmarking models on long-tailed data sampled based on our analysis, we demonstrate that multimodal models across the board perform poorly. We contribute this long-tail test set as the "Let it Wag!" benchmark to further research in this direction. Taken together, our study reveals an exponential need for training data, which implies that the key to "zero-shot" generalization capabilities under large-scale training paradigms remains to be found.
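The log-linear trend described in the abstract can be illustrated with a minimal sketch. It assumes the relationship takes the form score = a + b * log10(frequency). The function names (`fit_log_linear`, `frequency_needed`) and all the numbers below are hypothetical, chosen only for illustration. They are not the paper's code or results.

```python
import math

def fit_log_linear(frequencies, scores):
    """Ordinary least-squares fit of score = a + b * log10(frequency)."""
    xs = [math.log10(f) for f in frequencies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(scores) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    b = sxy / sxx            # score gained per 10x increase in frequency
    a = mean_y - b * mean_x  # score extrapolated to frequency = 1
    return a, b

def frequency_needed(target_score, a, b):
    """Invert the fitted line: frequency at which the fit predicts target_score."""
    return 10 ** ((target_score - a) / b)

# Synthetic, made-up data for illustration only (NOT results from the paper):
# each 10x increase in concept frequency adds the same 0.10 to the score.
frequencies = [10, 100, 1_000, 10_000, 100_000]
scores = [0.10, 0.20, 0.30, 0.40, 0.50]

a, b = fit_log_linear(frequencies, scores)
print(f"intercept a = {a:.3f}, slope b = {b:.3f} per decade of frequency")
print(f"frequency predicted for score 0.60: {frequency_needed(0.60, a, b):,.0f}")
```

Under this assumed form, every fixed increment in score costs a constant multiplicative factor in concept frequency, which is the sense in which linear gains require exponentially more data.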

arxiv preprint arxiv, concept frequency, dataset, (7 more...)

2404.04125

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
(3 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine (0.46)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

A proposed new metric for the conceptual diversity of a text

Dönmez, İlknur (PhD), Haklıdır, Mehmet (PhD)

arXiv.org Artificial Intelligence | Dec-27-2023

A word may contain one or more hidden concepts. While the word "animal" evokes many images in our minds and encapsulates many concepts (birds, dogs, cats, crocodiles, etc.), the word "parrot" evokes a single image (a colored bird with a short, hooked beak and the ability to mimic sounds). In spoken or written texts, we use some words in a general sense and others in a detailed way to point to a specific object. Until now, no standard and precise technique has existed for determining a text's conceptual diversity value. This research contributes to the natural language processing field of AI by offering a standardized method and a generic metric for evaluating and comparing concept diversity across different texts and domains. It also contributes to the field of semantic research of languages. For example, the sentence "He discovered an unknown entity." has a high conceptual diversity score (16.6801), whereas the sentence "The endoplasmic reticulum forms a series of flattened sacs within the cytoplasm of eukaryotic cells." has a low conceptual diversity score (3.9068).

conceptual diversity, diversity, metric, (15 more...)

2312.16548

Country:

Asia > Middle East > Republic of Türkiye (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
