AITopics | random topic

Collaborating Authors

random topic

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

The Price of Format: Diversity Collapse in LLMs

Yun, Longfei, An, Chenyang, Wang, Zilong, Peng, Letian, Shang, Jingbo

arXiv.org Artificial IntelligenceMay-27-2025

Instruction-tuned large language models (LLMs) employ structured templates, such as role markers and special tokens, to enforce format consistency during inference. However, we identify a critical limitation of such formatting: it induces a phenomenon we term diversity collapse, where the model generates semantically similar outputs for open-ended inputs, undermining creativity and variability. We systematically evaluate this effect across tasks like story completion and free-form generation, finding that (1) diversity collapse persists even under high-temperature sampling, and (2) structural tokens in templates significantly constrain the model's output space. To contextualize these findings, we fine-tune the same model using a range of structured prompts and then evaluate them across three axes: downstream task performance, alignment behavior, and output diversity. Our analysis shows that format consistency between fine-tuning and inference is crucial for structure-sensitive tasks (e.g., GSM8K, IFEval), but has marginal influence on knowledge-heavy tasks (e.g., MMLU, WebQuestions). In contrast, output diversity is primarily governed by the presence or absence of structural tokens, with minimal formatting yielding the most diverse outputs. These findings reveal that current prompting conventions, while beneficial for alignment, may inadvertently suppress output diversity, underscoring the need for diversity-aware prompt design and instruction tuning.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2505.18949

Country: North America > United States (0.68)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

TopicsRanksDC: Distance-based Topic Ranking applied on Two-Class Data

Yousef, Malik, Qundus, Jamal Al, Peikert, Silvio, Paschke, Adrian

arXiv.org Artificial IntelligenceMay-17-2021

In this paper, we introduce a novel approach named TopicsRanksDC for topics ranking based on the distance between two clusters that are generated by each topic. We assume that our data consists of text documents that are associated with two-classes. Our approach ranks each topic contained in these text documents by its significance for separating the two-classes. Firstly, the algorithm detects topics using Latent Dirichlet Allocation (LDA). The words defining each topic are represented as two clusters, where each one is associated with one of the classes. We compute four distance metrics, Single Linkage, Complete Linkage, Average Linkage and distance between the centroid. We compare the results of LDA topics and random topics. The results show that the rank for LDA topics is much higher than random topics. The results of TopicsRanksDC tool are promising for future work to enable search engines to suggest related topics.

distance-based topic ranking, random topic, topicsranksdc, (14 more...)

arXiv.org Artificial Intelligence

2105.07826

Country:

Europe > Germany > Berlin (0.04)
Asia > Middle East > Jordan (0.04)
Asia > Middle East > Israel (0.04)

Genre: Research Report > New Finding (0.66)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.67)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.67)

Add feedback

053r - Chasing convex bodies and other random topics with Dr. Sébastien Bubeck

#artificialintelligenceDec-31-2019, 11:23:08 GMT

Dr. Sébastien Bubeck is a mathematician and a senior researcher in the Machine Learning and Optimization group at Microsoft Research. He's also a self-proclaimed "bandit" who claims that, despite all the buzz around AI, it's still a science in its infancy. That's why he's devoted his career to advancing the mathematical foundations behind the machine learning algorithms behind AI. Today, Dr. Bubeck explains the difficulty of the multi-armed bandit problem in the context of a parameter- and data-rich online world. He also discusses a host of topics from randomness and convex optimization to metrical task systems and log n competitiveness to the surprising connection between Gaussian kernels and what he calls some of the most beautiful objects in mathematics.

bastien bubeck, data mining, machine learning, (4 more...)

#artificialintelligence

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback