AITopics | active sampling

Collaborating Authors

active sampling

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Informative Perturbation Selection for Uncertainty-Aware Post-hoc Explanations

Chugh, Sumedha, Prasad, Ranjitha, Shah, Nazreen

arXiv.org Machine LearningMar-18-2026

Trust and ethical concerns due to the widespread deployment of opaque machine learning (ML) models motivating the need for reliable model explanations. Post-hoc model-agnostic explanation methods addresses this challenge by learning a surrogate model that approximates the behavior of the deployed black-box ML model in the locality of a sample of interest. In post-hoc scenarios, neither the underlying model parameters nor the training are available, and hence, this local neighborhood must be constructed by generating perturbed inputs in the neighborhood of the sample of interest, and its corresponding model predictions. We propose \emph{Expected Active Gain for Local Explanations} (\texttt{EAGLE}), a post-hoc model-agnostic explanation framework that formulates perturbation selection as an information-theoretic active learning problem. By adaptively sampling perturbations that maximize the expected information gain, \texttt{EAGLE} efficiently learns a linear surrogate explainable model while producing feature importance scores along with the uncertainty/confidence estimates. Theoretically, we establish that cumulative information gain scales as $\mathcal{O}(d \log t)$, where $d$ is the feature dimension and $t$ represents the number of samples, and that the sample complexity grows linearly with $d$ and logarithmically with the confidence parameter $1/δ$. Empirical results on tabular and image datasets corroborate our theoretical findings and demonstrate that \texttt{EAGLE} improves explanation reproducibility across runs, achieves higher neighborhood stability, and improves perturbation sample quality as compared to state-of-the-art baselines such as Tilia, US-LIME, GLIME and BayesLIME.

artificial intelligence, machine learning, perturbation, (13 more...)

arXiv.org Machine Learning

2603.14894

Country: Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (0.64)

Industry:

Education (0.68)
Law (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

Transfer Learning with Active Sampling for Rapid Training and Calibration in BCI-P300 Across Health States and Multi-centre Data

Flores, Christian, Contreras, Marcelo, Macedo, Ichiro, Andreu-Perez, Javier

arXiv.org Artificial IntelligenceDec-14-2024

Machine learning and deep learning advancements have boosted Brain-Computer Interface (BCI) performance, but their wide-scale applicability is limited due to factors like individual health, hardware variations, and cultural differences affecting neural data. Studies often focus on uniform single-site experiments in uniform settings, leading to high performance that may not translate well to real-world diversity. Deep learning models aim to enhance BCI classification accuracy, and transfer learning has been suggested to adapt models to individual neural patterns using a base model trained on others' data. This approach promises better generalizability and reduced overfitting, yet challenges remain in handling diverse and imbalanced datasets from different equipment, subjects, multiple centres in different countries, and both healthy and patient populations for effective model transfer and tuning. In a setting characterized by maximal heterogeneity, we proposed P300 wave detection in BCIs employing a convolutional neural network fitted with adaptive transfer learning based on Poison Sampling Disk (PDS) called Active Sampling (AS), which flexibly adjusts the transition from source data to the target domain. Our results reported for subject adaptive with 40% of adaptive fine-tuning that the averaged classification accuracy improved by 5.36% and standard deviation reduced by 12.22% using two distinct, internationally replicated datasets. These results outperformed in classification accuracy, computational time, and training efficiency, mainly due to the proposed Active Sampling (AS) method for transfer learning.

active sampling, artificial intelligence, machine learning, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/TNSRE.2024.3420960

2412.17833

Country:

South America (0.46)
North America (0.28)

Genre:

Research Report > New Finding (0.88)
Research Report > Experimental Study (0.68)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Personalized Federated Learning via Active Sampling

Jung, Alexander, SarcheshmehPour, Yasmin, Mohammadi, Amirhossein

arXiv.org Artificial IntelligenceSep-8-2024

Consider a collection of data generators which could represent, e.g., humans equipped with a smart-phone or wearables. We want to train a personalized (or tailored) model for each data generator even if they provide only small local datasets. The available local datasets might fail to provide sufficient statistical power to train high-dimensional models (such as deep neural networks) effectively. One possible solution is to identify similar data generators and pool their local datasets to obtain a sufficiently large training set. This paper proposes a novel method for sequentially identifying similar (or relevant) data generators. Our method is similar in spirit to active sampling methods but does not require exchange of raw data. Indeed, our method evaluates the relevance of a data generator by evaluating the effect of a gradient step using its local dataset. This evaluation can be performed in a privacy-friendly fashion without sharing raw data. We extend this method to non-parametric models by a suitable generalization of the gradient step to update a hypothesis using the local dataset provided by a data generator.

active sampling, personalized federated learning

arXiv.org Artificial Intelligence

2409.02064

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.53)

Add feedback

Sensitivity Analysis for Active Sampling, with Applications to the Simulation of Analog Circuits

Chhaibi, Reda, Gamboa, Fabrice, Oger, Christophe, Oliveira, Vinicius, Pellegrini, Clément, Remot, Damien

arXiv.org Machine LearningMay-13-2024

We propose an active sampling flow, with the use-case of simulating the impact of combined variations on analog circuits. In such a context, given the large number of parameters, it is difficult to fit a surrogate model and to efficiently explore the space of design features. By combining a drastic dimension reduction using sensitivity analysis and Bayesian surrogate modeling, we obtain a flexible active sampling flow. On synthetic and real datasets, this flow outperforms the usual Monte-Carlo sampling which often forms the foundation of design space exploration.

dataset, selection, variation, (13 more...)

arXiv.org Machine Learning

2405.07971

Country:

Europe > France > Occitanie > Haute-Garonne > Toulouse (0.07)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (0.34)

Add feedback

Bucketized Active Sampling for Learning ACOPF

Klamkin, Michael, Tanneau, Mathieu, Mak, Terrence W. K., Van Hentenryck, Pascal

arXiv.org Artificial IntelligenceOct-2-2023

This paper considers optimization proxies for Optimal Power Flow (OPF), i.e., machine-learning models that approximate the input/output relationship of OPF. Recent work has focused on showing that such proxies can be of high fidelity. However, their training requires significant data, each instance necessitating the (offline) solving of an OPF for a sample of the input distribution. To meet the requirements of market-clearing applications, this paper proposes Bucketized Active Sampling (BAS), a novel active learning framework that aims at training the best possible OPF proxy within a time limit. BAS partitions the input distribution into buckets and uses an acquisition function to determine where to sample next. By applying the same partitioning to the validation set, BAS leverages labeled validation samples in the selection of unlabeled samples. BAS also relies on an adaptive learning rate that increases and decreases over time. Experimental results demonstrate the benefits of BAS.

acopf, bas, international conference, (13 more...)

arXiv.org Artificial Intelligence

2208.07497

Country:

Europe > France (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Energy > Power Industry (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

GitRanking: A Ranking of GitHub Topics for Software Classification using Active Sampling

Sas, Cezar, Capiluppi, Andrea, Di Sipio, Claudio, Di Rocco, Juri, Di Ruscio, Davide

arXiv.org Artificial IntelligenceMay-19-2022

GitHub is the world's largest host of source code, with more than 150M repositories. However, most of these repositories are not labeled or inadequately so, making it harder for users to find relevant projects. There have been various proposals for software application domain classification over the past years. However, these approaches lack a well-defined taxonomy that is hierarchical, grounded in a knowledge base, and free of irrelevant terms. This work proposes GitRanking, a framework for creating a classification ranked into discrete levels based on how general or specific their meaning is. We collected 121K topics from GitHub and considered $60\%$ of the most frequent ones for the ranking. GitRanking 1) uses active sampling to ensure a minimal number of required annotations; and 2) links each topic to Wikidata, reducing ambiguities and improving the reusability of the taxonomy. Our results show that developers, when annotating their projects, avoid using terms with a high degree of specificity. This makes the finding and discovery of their projects more challenging for other users. Furthermore, we show that GitRanking can effectively rank terms according to their general or specific meaning. This ranking would be an essential asset for developers to build upon, allowing them to complement their annotations with more precise topics. Finally, we show that GitRanking is a dynamically extensible method: it can currently accept further terms to be ranked with a minimum number of annotations ($\sim$ 15). This paper is the first collective attempt to build a ground-up taxonomy of software domains.

annotation, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1002/spe.3238

2205.09379

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
(20 more...)

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Data Science (0.93)
(3 more...)

Add feedback

Marginal and Joint Cross-Entropies & Predictives for Online Bayesian Inference, Active Learning, and Active Sampling

Kirsch, Andreas, Kossen, Jannik, Gal, Yarin

arXiv.org Machine LearningMay-18-2022

Principled Bayesian deep learning (BDL) does not live up to its potential when we only focus on marginal predictive distributions (marginal predictives). Recent works have highlighted the importance of joint predictives for (Bayesian) sequential decision making from a theoretical and synthetic perspective. We provide additional practical arguments grounded in real-world applications for focusing on joint predictives: we discuss online Bayesian inference, which would allow us to make predictions while taking into account additional data without retraining, and we propose new challenging evaluation settings using active learning and active sampling. These settings are motivated by an examination of marginal and joint predictives, their respective cross-entropies, and their place in offline and online learning. They are more realistic than previously suggested ones, building on work by Wen et al. (2021) and Osband et al. (2022), and focus on evaluating the performance of approximate BNNs in an online supervised setting. Initial experiments, however, raise questions on the feasibility of these ideas in high-dimensional parameter spaces with current BDL inference techniques, and we suggest experiments that might help shed further light on the practicality of current research for these problems. Importantly, our work highlights previously unidentified gaps in current research and the need for better approximate joint predictives.

artificial intelligence, joint cross-entropy & predictive, machine learning, (3 more...)

arXiv.org Machine Learning

2205.08766

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.60)

Add feedback