AITopics

Country:

North America > United States (0.15)
Europe > Germany > Lower Saxony (0.04)
Asia > Taiwan (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
(2 more...)

Neural Information Processing Systems, Feb-10-2026, 16:15:33 GMT

Appendix

We trained ResNet-50 for 1.1M iterations. We used an SGD optimizer with a learning rate of 0.03, a batch size of 32, 0.9

artificial intelligence, machine learning, probability cover, (15 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.35)

Neural Information Processing Systems, Feb-9-2026, 11:03:15 GMT

2b09bb02b90584e2be94ff3ae09289bc-Supplemental-Conference.pdf

budget, feature space, typiclust, (16 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Ono, Yuta, Nakamura, Hiroshi, Takase, Hideki

Exploring the Possibility of TypiClust for Low-Budget Federated Active Learning

arXiv.org Artificial Intelligence, Nov-20-2025

Federated Active Learning (FAL) seeks to reduce the burden of annotation under the realistic constraints of federated learning by leveraging Active Learning (AL). Because FAL settings make ground-truth labels more expensive to obtain, FAL strategies are needed that work well in low-budget regimes, where the amount of annotation is very limited. In this work, we investigate the effectiveness of TypiClust, a successful low-budget AL strategy, in low-budget FAL settings. Our empirical results show that TypiClust works well even in low-budget FAL settings, in contrast to the relatively low performance of other methods, although these settings present additional challenges compared to AL, such as data heterogeneity. In addition, we show that FAL settings cause distribution shifts in terms of typicality, but that TypiClust is not very vulnerable to these shifts. We also analyze the sensitivity of TypiClust to feature extraction methods; the analysis suggests a way to perform FAL even in limited-data situations.
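The abstract refers to typicality without defining it. As a rough illustration, the sketch below assumes the definition commonly attributed to the original TypiClust work: a point's typicality is the inverse of its mean distance to its k nearest neighbours, and one most-typical point is queried per cluster. The function names, the choice of k, and the precomputed `cluster_ids` are assumptions made for this example; the federated aspects studied in the paper are not modelled here.

```python
import math


def typicality(points, k):
    """Inverse of the mean Euclidean distance to the k nearest neighbours.

    Assumption: neighbours are searched over all points, not per cluster.
    """
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        mean_knn = sum(dists[:k]) / k
        scores.append(1.0 / (mean_knn + 1e-12))  # epsilon avoids division by zero
    return scores


def pick_most_typical(points, cluster_ids, k=2):
    """Return a dict mapping each cluster id to the index of its most typical point."""
    scores = typicality(points, k)
    best = {}
    for idx, c in enumerate(cluster_ids):
        if c not in best or scores[idx] > scores[best[c]]:
            best[c] = idx
    return best


# Toy data: two well-separated groups, each with one point near its centre.
points = [(0, 0), (0, 1), (1, 0), (0.4, 0.4),
          (10, 10), (10, 11), (11, 10), (10.4, 10.4)]
cluster_ids = [0, 0, 0, 0, 1, 1, 1, 1]  # hypothetical clustering result
queries = pick_most_typical(points, cluster_ids, k=2)
```

In this toy setup the point closest to the centre of each group is selected, which matches the intuition that typical points sit in dense regions of the feature space.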

artificial intelligence, deep learning, machine learning, (15 more...)

doi: 10.1109/COMPSAC65507.2025.00087

2505.19404

Country: Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.15)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Neural Information Processing Systems, Aug-16-2025, 21:00:32 GMT

8c64bc3f7796d31caa7c3e6b969bf7da-Supplemental-Conference.pdf

artificial intelligence, machine learning, max probability cover, (17 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Artificial Intelligence, Jun-4-2025

No Free Lunch in Active Learning: LLM Embedding Quality Dictates Query Strategy Success

Rauch, Lukas, Wirth, Moritz, Huseljic, Denis, Herde, Marek, Sick, Bernhard, Aßenmacher, Matthias

The advent of large language models (LLMs) capable of producing general-purpose representations lets us revisit the practicality of deep active learning (AL): By leveraging frozen LLM embeddings, we can mitigate the computational costs of iteratively fine-tuning large backbones. This study establishes a benchmark and systematically investigates the influence of LLM embedding quality on query strategies in deep AL. We employ five top-performing models from the massive text embedding benchmark (MTEB) leaderboard and two baselines for ten diverse text classification tasks. Our findings reveal key insights: First, initializing the labeled pool using diversity-based sampling synergizes with high-quality embeddings, boosting performance in early AL iterations. Second, the choice of the optimal query strategy is sensitive to embedding quality. While the computationally inexpensive Margin sampling can achieve performance spikes on specific datasets, we find that strategies like Badge exhibit greater robustness across tasks. Importantly, their effectiveness is often enhanced when paired with higher-quality embeddings. Our results emphasize the need for context-specific evaluation of AL strategies, as performance heavily depends on embedding quality and the target task.
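For readers unfamiliar with the Margin strategy mentioned above, here is a minimal sketch of margin sampling in its usual form: for each unlabeled sample, take the gap between the two highest predicted class probabilities and query the samples with the smallest gap. The function names and the toy probabilities are assumptions for illustration; this is not the benchmark's implementation.

```python
def margin_scores(probs):
    """Gap between the largest and second-largest class probability per sample."""
    margins = []
    for row in probs:
        top = sorted(row, reverse=True)
        margins.append(top[0] - top[1])
    return margins


def margin_query(probs, batch_size):
    """Indices of the `batch_size` samples with the smallest margin."""
    margins = margin_scores(probs)
    order = sorted(range(len(probs)), key=lambda i: margins[i])
    return order[:batch_size]


# Toy predicted probabilities for three unlabeled samples (hypothetical values).
probs = [
    [0.90, 0.05, 0.05],  # confident prediction, large margin
    [0.40, 0.35, 0.25],  # ambiguous, smallest margin
    [0.50, 0.10, 0.40],  # fairly ambiguous
]
selected = margin_query(probs, batch_size=2)
```

The strategy is cheap because it needs only one forward pass per candidate, which is consistent with the abstract's description of it as computationally inexpensive.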

large language model, machine learning, query strategy, (17 more...)

2506.01992

Country: Europe (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Werner, Thorben, Burchert, Johannes, Stubbemann, Maximilian, Schmidt-Thieme, Lars

A Cross-Domain Benchmark for Active Learning

arXiv.org Artificial Intelligence, Aug-1-2024

Active Learning (AL) deals with identifying the most informative samples for labeling to reduce data annotation costs for supervised learning tasks. AL research suffers from the fact that performance lifts reported in the literature generalize poorly and that experiments are repeated only a small number of times. To overcome these obstacles, we propose CDALBench, the first active learning benchmark which includes tasks in computer vision, natural language processing and tabular learning. Furthermore, by providing an efficient, greedy oracle, CDALBench can be evaluated with 50 runs for each experiment. We show that both the cross-domain character and a large number of repetitions are crucial for sophisticated evaluation of AL research. Concretely, we show that the superiority of specific methods varies across domains, making it important to evaluate Active Learning with a cross-domain benchmark. Additionally, we show that having a large number of runs is crucial. With only three runs, as is often done in the literature, the apparent superiority of specific methods can vary strongly with the specific runs. This effect is so strong that, depending on the seed, even a well-established method's performance can be either significantly better or significantly worse than random for the same dataset.

auc, dataset, query size, (16 more...)

2408.00426

Country:

Europe > Germany > Lower Saxony (0.04)
Asia > Taiwan (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Bae, Wonho, Noh, Junhyug, Sutherland, Danica J.

Generalized Coverage for More Robust Low-Budget Active Learning

arXiv.org Artificial Intelligence, Jul-16-2024

The ProbCover method of Yehuda et al. is a well-motivated algorithm for active learning in low-budget regimes, which attempts to "cover" the data distribution with balls of a given radius at selected data points. We demonstrate, however, that the performance of this algorithm is extremely sensitive to the choice of this radius hyper-parameter, and that tuning it is quite difficult, with the original heuristic frequently failing. We thus introduce (and theoretically motivate) a generalized notion of "coverage," including ProbCover's objective as a special case, but also allowing smoother notions that are far more robust to hyper-parameter choice. We propose an efficient greedy method to optimize this coverage, generalizing ProbCover's algorithm; due to its close connection to kernel herding, we call it "MaxHerding." The objective can also be optimized non-greedily through a variant of k-medoids, clarifying the relationship to other low-budget active learning methods. In comprehensive experiments, MaxHerding surpasses existing active learning methods across multiple low-budget image classification benchmarks, and does so with less computational cost than most competitive methods.
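To make the ball-coverage idea concrete, the sketch below shows a plain greedy selection in the spirit of the hard-radius objective described above: at each step it picks the point whose ball of the given radius covers the most not-yet-covered points. It is an illustrative simplification under assumed names (`greedy_ball_cover`, `radius`, `budget`), not the authors' ProbCover or MaxHerding code, and it does not implement the smoother kernel-based coverage that the paper proposes.

```python
import math


def greedy_ball_cover(points, radius, budget):
    """Greedily pick up to `budget` centres to maximise the number of covered points.

    A point is covered when it lies within `radius` of a selected centre.
    Returns the selected indices and the fraction of points covered.
    """
    n = len(points)
    # neighbours[i]: indices within `radius` of point i (including i itself)
    neighbours = [
        {j for j in range(n) if math.dist(points[i], points[j]) <= radius}
        for i in range(n)
    ]
    covered = set()
    selected = []
    for _ in range(budget):
        gains = [len(neighbours[i] - covered) for i in range(n)]
        best = max(range(n), key=lambda i: gains[i])  # first index wins ties
        if gains[best] == 0:
            break  # every point is already covered
        selected.append(best)
        covered |= neighbours[best]
    return selected, len(covered) / n


# Toy feature vectors (hypothetical): a group of three, a pair, and an outlier.
points = [(0, 0), (0.5, 0), (1, 0), (5, 5), (5.5, 5), (20, 20)]
selected, coverage = greedy_ball_cover(points, radius=0.6, budget=2)
```

Rerunning the toy example with a much smaller radius makes every ball cover only its own centre, so the greedy gains all tie and the selection carries no information; this gives a small-scale feel for the radius sensitivity that the abstract criticises.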

algorithm, learning, maxherding, (14 more...)

2407.12212

Country:

North America > United States > New York (0.04)
North America > Canada > British Columbia (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
(5 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Ono, Yuta, Aczel, Till, Estermann, Benjamin, Wattenhofer, Roger

SUPClust: Active Learning at the Boundaries

arXiv.org Artificial Intelligence, Mar-6-2024

Active learning is a machine learning paradigm designed to optimize model performance in a setting where labeled data is expensive to acquire. In this work, we propose a novel active learning method called SUPClust that seeks to identify points at the decision boundary between classes. By targeting these points, SUPClust aims to gather the information most useful for refining the model's prediction of complex decision regions. We demonstrate experimentally that labeling these points leads to strong model performance. This improvement is observed even in scenarios characterized by strong class imbalance.

boundary, decision boundary, supclust, (16 more...)

2403.03741

Country:

North America > United States > Florida > Miami-Dade County > Miami (0.04)
North America > Canada > Ontario > Toronto (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)
(4 more...)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)