AITopics | diversity term

diversity term

Diverse Ensemble Evolution: Curriculum Data-Model Marriage

Neural Information Processing Systems, Mar-16-2026, 19:33:22 GMT

We study a new method ("Diverse Ensemble Evolution (DivE$^2$)") to train an ensemble of machine learning models that assigns data to models at each training epoch based on each model's current expertise and an intra- and inter-model diversity reward. DivE$^2$ schedules, over the course of training epochs, the relative importance of these characteristics; it starts by selecting easy samples for each model, and then gradually adjusts towards the models having specialized and complementary expertise on subsets of the training data, thereby encouraging high accuracy of the ensemble. We utilize an intra-model diversity term on data assigned to each model, and an inter-model diversity term on data assigned to pairs of models, to penalize both within-model and cross-model redundancy. We formulate the data-model marriage problem as a generalized bipartite matching, represented as submodular maximization subject to two matroid constraints. DivE$^2$ solves a sequence of continuous-combinatorial optimizations with slowly varying objectives and constraints. The combinatorial part handles the data-model marriage while the continuous part updates model parameters based on the assignments. In experiments, DivE$^2$ outperforms other ensemble training methods under a variety of model aggregation techniques, while also maintaining competitive efficiency.
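The combinatorial step described in the abstract above can be illustrated with a small sketch. This is not the authors' algorithm: it is a plain greedy heuristic, assuming a hypothetical per-pair `reward` score (for example a constant minus the training loss), a facility-location function as the intra-model diversity term, and two cardinality caps standing in for the two matroid constraints. The inter-model diversity term is left out for brevity, and all names are illustrative.

```python
from itertools import product


def facility_location(selected, similarity):
    """Facility-location diversity: how well `selected` covers all samples."""
    if not selected:
        return 0.0
    n = len(similarity)
    return sum(max(similarity[i][j] for j in selected) for i in range(n))


def greedy_assign(reward, similarity, per_sample_k, per_model_b, lam):
    """Greedily pick (sample, model) edges under two cardinality caps.

    reward[i][m]  : hypothetical non-negative expertise score of model m on sample i
    per_sample_k  : each sample goes to at most this many models
    per_model_b   : each model receives at most this many samples
    lam           : weight of the intra-model diversity term
    """
    n, num_models = len(reward), len(reward[0])
    assigned = [set() for _ in range(num_models)]  # samples held by each model
    sample_degree = [0] * n                        # models each sample is assigned to
    while True:
        best_gain, best_edge = 0.0, None
        for i, m in product(range(n), range(num_models)):
            if i in assigned[m]:
                continue
            if sample_degree[i] >= per_sample_k or len(assigned[m]) >= per_model_b:
                continue
            before = facility_location(assigned[m], similarity)
            after = facility_location(assigned[m] | {i}, similarity)
            gain = reward[i][m] + lam * (after - before)
            if gain > best_gain:
                best_gain, best_edge = gain, (i, m)
        if best_edge is None:  # no feasible edge with positive gain remains
            return assigned
        i, m = best_edge
        assigned[m].add(i)
        sample_degree[i] += 1


# Toy data (assumed): samples 0/1 are near-duplicates, as are samples 2/3.
reward = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
similarity = [
    [1.0, 0.9, 0.0, 0.0],
    [0.9, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.9],
    [0.0, 0.0, 0.9, 1.0],
]
by_expertise = greedy_assign(reward, similarity, 1, 2, lam=0.0)
with_diversity = greedy_assign(reward, similarity, 1, 2, lam=1.0)
```

With `lam=0.0` the assignment follows expertise alone; raising `lam` trades some expertise for coverage of dissimilar samples within each model, which is the kind of schedule the abstract describes varying over epochs.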

artificial intelligence, machine learning, proceedings, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Diverse Ensemble Evolution: Curriculum Data-Model Marriage

Neural Information Processing Systems, Nov-20-2025, 22:00:56 GMT

curriculum data-model marriage, diverse ensemble evolution, name change, (4 more...)

Neural Information Processing Systems

Industry: Education > Educational Setting > Religious School (0.43)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems, Oct-3-2025, 00:31:22 GMT

Hence, many segmentations will only consist of background, such as in Figure 1.

diversity function, diversity term, segmentation, (14 more...)

Neural Information Processing Systems

Country: North America > Canada > Quebec > Montreal (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.97)

Diversity of Transformer Layers: One Aspect of Parameter Scaling Laws

Kamigaito, Hidetaka, Zhang, Ying, Kwon, Jingun, Hayashi, Katsuhiko, Okumura, Manabu, Watanabe, Taro

arXiv.org Artificial Intelligence, Jun-10-2025

Transformers deliver outstanding performance across a wide range of tasks and are now a dominant backbone architecture for large language models (LLMs). Their task-solving performance improves as parameter size increases, as shown in recent studies on parameter scaling laws. Although recent mechanistic-interpretability studies have deepened our understanding of the internal behavior of Transformers by analyzing their residual stream, the relationship between these internal mechanisms and the parameter scaling laws remains unclear. To bridge this gap, we focus on layers and their size, which mainly determine the parameter size of Transformers. To this end, we first theoretically investigate the layers within the residual stream through a bias-diversity decomposition. The decomposition separates (i) bias, the error of each layer's output from the ground truth, and (ii) diversity, which indicates how much the outputs of each layer differ from each other. Analyzing Transformers under this theory reveals that performance improves when individual layers make predictions close to the correct answer and remain mutually diverse. We show that diversity becomes especially critical when individual layers' outputs are far from the ground truth. Finally, we introduce an information-theoretic diversity and present our main finding: adding layers enhances performance only when those layers behave differently, i.e., are diverse. We also reveal that the performance gains from increasing the number of layers exhibit submodularity: marginal improvements diminish as more layers are added, mirroring the logarithmic convergence predicted by the parameter scaling laws. Experiments on multiple semantic-understanding tasks with various LLMs empirically confirm the theoretical properties derived in this study.
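To make the bias/diversity split concrete, here is a minimal sketch using the classical ambiguity decomposition for squared error with an averaged prediction. Treating each "layer output" as a scalar and averaging them is an assumption made for illustration; the paper's own decomposition and its information-theoretic diversity may take a different form.

```python
def ambiguity_decomposition(outputs, target):
    """Return (ensemble_error, mean_bias, diversity) for scalar outputs.

    ensemble_error : squared error of the averaged output
    mean_bias      : average squared error of the individual outputs
    diversity      : average squared deviation of outputs from their mean
    """
    n = len(outputs)
    mean_output = sum(outputs) / n
    ensemble_error = (mean_output - target) ** 2
    mean_bias = sum((o - target) ** 2 for o in outputs) / n
    diversity = sum((o - mean_output) ** 2 for o in outputs) / n
    return ensemble_error, mean_bias, diversity


# Hypothetical outputs of three layers for a target of 1.0.
err, bias, div = ambiguity_decomposition([0.2, 0.9, 1.4], target=1.0)
# The identity being illustrated: ensemble error = mean bias - diversity.
gap = abs(err - (bias - div))
```

Because diversity is subtracted, two sets of outputs with the same average individual error can give different combined errors: the more the outputs disagree, the lower the error of their average.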

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2505.24009

Country: North America > United States > Minnesota (0.28)

Genre:

Research Report > New Finding (0.66)
Research Report > Experimental Study (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Reviews: Diverse Ensemble Evolution: Curriculum Data-Model Marriage

Neural Information Processing Systems, Oct-7-2024, 07:38:55 GMT

This paper proposes a new technique for training ensembles of predictors for supervised-learning tasks. The main insight is to train individual members of the ensemble so that they specialize on different parts of the dataset, reducing redundancy among members and making better use of each member's capacity. The hope is that ensembles formed from such predictors will perform better than those built with traditional ensembling techniques. The proposed technique explicitly enforces diversity in two ways: (1) inter-model diversity, which makes individual models (predictors) different from each other, and (2) intra-model diversity, which makes each predictor choose data points that are not all similar to each other, so that it does not specialize in a very narrow region of the data distribution. This is posed as a bipartite graph matching problem that aims to find a matching between samples and models by selecting edges with the smallest total edge cost (this is converted to a maximization problem by subtracting each cost from the highest constant cost an edge can have). To avoid degenerate assignments, another matching constraint is introduced that also restricts the number of samples selected by each model.
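The cost-to-reward inversion mentioned in the review can be checked on a toy instance. The sketch below is a brute-force search, assuming every sample is assigned to exactly one model and each model takes at most `cap` samples; the cost matrix and all names are made up for illustration and are not taken from the paper.

```python
from itertools import product


def best_assignment(score, cap, maximize):
    """Exhaustively search sample-to-model assignments under a per-model cap."""
    n, num_models = len(score), len(score[0])
    best, best_total = None, None
    for assign in product(range(num_models), repeat=n):
        # Skip assignments that give any model more than `cap` samples.
        if any(assign.count(m) > cap for m in range(num_models)):
            continue
        total = sum(score[i][assign[i]] for i in range(n))
        better = (
            best_total is None
            or (total > best_total if maximize else total < best_total)
        )
        if better:
            best, best_total = assign, total
    return best


# Hypothetical edge costs: rows are samples, columns are models.
cost = [[1, 4], [2, 3], [5, 1], [4, 2]]
top = max(max(row) for row in cost)
reward = [[top - c for c in row] for row in cost]  # inverted costs

min_cost_match = best_assignment(cost, cap=2, maximize=False)
max_reward_match = best_assignment(reward, cap=2, maximize=True)
```

The two searches agree here because every feasible assignment uses the same number of edges, so the total reward is a constant minus the total cost. Exhaustive search is only viable for tiny instances and is used here purely to verify the equivalence.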

curriculum data-model marriage, dataset, diverse ensemble evolution, (8 more...)

Neural Information Processing Systems

Industry: Education > Educational Setting > Religious School (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.37)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.32)

Ensemble of Loss Functions to Improve Generalizability of Deep Metric Learning methods

Zabihzadeh, Davood

arXiv.org Artificial Intelligence, Jul-2-2021

Deep Metric Learning (DML) learns a non-linear semantic embedding from input data that brings similar pairs together while keeping dissimilar data away from each other. To this end, many different methods have been proposed in the last decade, with promising results in various applications. The success of a DML algorithm greatly depends on its loss function. However, no loss function is perfect, and each deals with only some aspects of an optimal similarity embedding. Besides, the generalizability of DML to unseen categories during the test stage is an important matter that is not considered by existing loss functions. To address these challenges, we propose novel approaches to combine different losses built on top of a shared deep feature extractor. The proposed ensemble of losses forces the deep model to extract features that are consistent with all losses. Since the selected losses are diverse and each emphasizes different aspects of an optimal semantic embedding, our effective combining methods yield a considerable improvement over any individual loss and generalize well on unseen categories. Here, there is no limitation in choosing loss functions, and our methods can work with any set of existing ones. Besides, they can optimize each loss function as well as its weight in an end-to-end paradigm with no need to adjust any hyper-parameter. We evaluate our methods on some popular datasets from the machine vision domain in conventional Zero-Shot-Learning (ZSL) settings. The results are very encouraging and show that our methods outperform all baseline losses by a large margin on all datasets.
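As a rough illustration of combining metric-learning losses on one shared embedding, the sketch below sums a contrastive loss and a triplet loss with fixed weights. The choice of these two losses, the margins, and the fixed weights are assumptions for the example; the abstract states that the paper learns the weights end-to-end, which this sketch does not attempt.

```python
import math


def dist(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(a, b, same, margin=1.0):
    """Pull same-class pairs together; push different-class pairs past the margin."""
    d = dist(a, b)
    return d ** 2 if same else max(0.0, margin - d) ** 2


def triplet_loss(anchor, pos, neg, margin=1.0):
    """Require the negative to be at least `margin` farther away than the positive."""
    return max(0.0, dist(anchor, pos) - dist(anchor, neg) + margin)


def combined_loss(anchor, pos, neg, weights=(0.5, 0.5)):
    """Weighted sum of both losses computed on the same embeddings."""
    l_con = contrastive_loss(anchor, pos, True) + contrastive_loss(anchor, neg, False)
    l_tri = triplet_loss(anchor, pos, neg)
    return weights[0] * l_con + weights[1] * l_tri


# A well-separated embedding versus one with positive and negative swapped.
good = combined_loss((0.0, 0.0), (0.0, 0.0), (3.0, 4.0))
bad = combined_loss((0.0, 0.0), (3.0, 4.0), (0.0, 0.0))
```

In a real training loop the embeddings would come from the shared feature extractor and the combined value would be backpropagated through it, so that the extractor has to satisfy both losses at once.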

dataset, loss function, metric learning, (16 more...)

arXiv.org Artificial Intelligence

2107.0113

Country: Asia > Middle East > Iran (0.04)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Diverse Ensemble Evolution: Curriculum Data-Model Marriage

Zhou, Tianyi, Wang, Shengjie, Bilmes, Jeff A.

Neural Information Processing Systems, Feb-14-2020, 17:40:59 GMT

We study a new method ("Diverse Ensemble Evolution (DivE$^2$)") to train an ensemble of machine learning models that assigns data to models at each training epoch based on each model's current expertise and an intra- and inter-model diversity reward. DivE$^2$ schedules, over the course of training epochs, the relative importance of these characteristics; it starts by selecting easy samples for each model, and then gradually adjusts towards the models having specialized and complementary expertise on subsets of the training data, thereby encouraging high accuracy of the ensemble. We utilize an intra-model diversity term on data assigned to each model, and an inter-model diversity term on data assigned to pairs of models, to penalize both within-model and cross-model redundancy. We formulate the data-model marriage problem as a generalized bipartite matching, represented as submodular maximization subject to two matroid constraints. DivE$^2$ solves a sequence of continuous-combinatorial optimizations with slowly varying objectives and constraints.

curriculum data-model marriage, diverse ensemble evolution, diversity term, (2 more...)

Neural Information Processing Systems

Industry: Education > Educational Setting > Religious School (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Improving sample diversity of a pre-trained, class-conditional GAN by changing its class embeddings

Li, Qi, Mai, Long, Nguyen, Anh

arXiv.org Machine Learning, Oct-10-2019

Mode collapse is a well-known issue with Generative Adversarial Networks (GANs) and is a byproduct of unstable GAN training. We propose to improve the sample diversity of a pre-trained class-conditional generator by modifying its class embeddings in the direction of maximizing the log probability outputs of a classifier pre-trained on the same dataset. We improved the sample diversity of state-of-the-art ImageNet BigGANs at both 128x128 and 256x256 resolutions. By replacing the embeddings, we can also synthesize plausible images for Places365 using a BigGAN pre-trained on ImageNet.

biggan sample, diversity, sample diversity, (15 more...)

arXiv.org Machine Learning

1910.0476

Country:

North America > United States > California > Los Angeles County > Long Beach (0.04)
Europe > France (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Transportation > Ground (0.46)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Vision (0.94)

Re-ranking Based Diversification: A Unifying View

Parambath, Shameem A Puthiya

arXiv.org Machine Learning, Jun-26-2019

We analyze different re-ranking algorithms for diversification and show that the majority of them are based on maximizing submodular/modular functions from the class of parameterized concave/linear over modular functions. We study the optimality of such algorithms in terms of the "total curvature". We also show that by adjusting the hyperparameter of the concave/linear composition to trade off relevance and diversity, if any, one is in fact tuning the "total curvature" of the function for the relevance-diversity trade-off.
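For readers unfamiliar with re-ranking for diversification, the following sketch shows Maximal Marginal Relevance (MMR), one well-known greedy re-ranker with a single relevance/diversity hyperparameter. It is offered only as an example of the family of algorithms discussed; whether the paper analyzes MMR in exactly this form is an assumption, and the scores below are made up.

```python
def mmr_rerank(relevance, similarity, k, lam=0.7):
    """Greedy MMR: pick k items, trading relevance against redundancy.

    relevance[d]     : relevance score of item d
    similarity[d][s] : similarity between items d and s
    lam              : 1.0 means pure relevance, lower values favor diversity
    """
    selected = []
    candidates = list(range(len(relevance)))
    while candidates and len(selected) < k:

        def score(d):
            # Redundancy is the similarity to the closest already-selected item.
            redundancy = max((similarity[d][s] for s in selected), default=0.0)
            return lam * relevance[d] - (1.0 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Items 0 and 1 are near-duplicates; item 2 is less relevant but different.
relevance = [0.9, 0.85, 0.5]
similarity = [
    [1.0, 0.95, 0.1],
    [0.95, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
diverse_top2 = mmr_rerank(relevance, similarity, k=2, lam=0.5)
relevant_top2 = mmr_rerank(relevance, similarity, k=2, lam=1.0)
```

Changing `lam` here plays the role of the trade-off hyperparameter mentioned in the abstract: at 1.0 the near-duplicate is kept, while at 0.5 the dissimilar item replaces it.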

algorithm, artificial intelligence, machine learning, (13 more...)

arXiv.org Machine Learning

1906.11285

Country: Asia > Middle East > Qatar (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.94)

Diversity-Inducing Policy Gradient: Using Maximum Mean Discrepancy to Find a Set of Diverse Policies

Masood, Muhammad A., Doshi-Velez, Finale

arXiv.org Machine Learning, May-31-2019

Standard reinforcement learning methods aim to master one way of solving a task whereas there may exist multiple near-optimal policies. Being able to identify this collection of near-optimal policies can allow a domain expert to efficiently explore the space of reasonable solutions. Unfortunately, existing approaches that quantify uncertainty over policies are not ultimately relevant to finding policies with qualitatively distinct behaviors. In this work, we formalize the difference between policies as a difference between the distribution of trajectories induced by each policy, which encourages diversity with respect to both state visitation and action choices. We derive a gradient-based optimization technique that can be combined with existing policy gradient methods to now identify diverse collections of well-performing policies. We demonstrate our approach on benchmarks and a healthcare task.
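The Maximum Mean Discrepancy named in the title can be estimated from samples, as in the sketch below. It computes the standard biased estimate of squared MMD with a Gaussian (RBF) kernel. Representing each trajectory as a fixed-length feature vector, and the kernel bandwidth, are assumptions for illustration; the paper's trajectory representation and kernel may differ.

```python
import math


def rbf(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between two samples."""
    kxx = sum(rbf(a, b, bandwidth) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(rbf(a, b, bandwidth) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(rbf(a, b, bandwidth) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2.0 * kxy


# Hypothetical trajectory features sampled from two policies.
policy_a = [(0.0, 0.0), (0.1, 0.0)]
policy_b = [(3.0, 3.0), (3.1, 3.0)]
same_policy_gap = mmd_squared(policy_a, policy_a)
different_policy_gap = mmd_squared(policy_a, policy_b)
```

A diversity-inducing objective could add a term like `different_policy_gap` to the usual return so that a new policy is rewarded for inducing trajectories unlike those of earlier policies; how the paper weights and differentiates such a term is not shown here.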

machine learning, reinforcement learning, trajectory, (15 more...)

arXiv.org Machine Learning

1906.00088

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
