Werner, Thorben


Bayesian Active Learning By Distribution Disagreement

arXiv.org Artificial Intelligence

The ever-growing need for data in machine learning science and applications has fueled a long history of Active Learning (AL) research, as AL can reduce the amount of annotation necessary to train strong models. However, most research has been done for classification problems, as it is generally easier to derive uncertainty quantification (UQ) from classification output without changing the model or training procedure. This is far less common for regression models, with few historic exceptions like Gaussian Processes, which leaves regression problems under-researched in the AL literature. In this paper, we focus specifically on regression and on recent models with UQ built into the architecture. Recently, two main approaches to UQ for regression problems have been researched: firstly, Gaussian neural networks (GNN) [6, 14], which use a neural network to parametrize μ and σ parameters of a Gaussian predictive distribution, and secondly, Normalizing Flows [16, 4], which parametrize a free-form predictive distribution with invertible transformations in order to model more complex target distributions. Their predictive distributions allow these models not only to be trained via Negative Log Likelihood (NLL), but also to draw samples from the predictive distribution and to compute the log likelihood of any given point y. Recent works [2, 1] have investigated the potential of uncertainty quantification with normalizing flows in synthetic experiments with a known ground-truth uncertainty. Intuitively, a predictive distribution should inherently allow for a good uncertainty quantification (e.g.
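To make the GNN ingredient concrete, here is a minimal sketch (assuming PyTorch; the class, head names, and sizes are illustrative, not taken from the paper) of a network whose two heads parametrize μ and σ of a Gaussian predictive distribution, trained via NLL. A normalizing flow would replace the Gaussian with a chain of invertible transformations, but it exposes the same interface: sampling plus log-likelihood evaluation.

```python
import torch
import torch.nn as nn

class GaussianNN(nn.Module):
    # Shared body, two heads: one for the mean, one for the (log-)scale
    # of a Gaussian predictive distribution over the target y.
    def __init__(self, in_dim: int, hidden: int = 64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.mu_head = nn.Linear(hidden, 1)
        self.log_sigma_head = nn.Linear(hidden, 1)

    def forward(self, x: torch.Tensor) -> torch.distributions.Normal:
        h = self.body(x)
        mu = self.mu_head(h)
        sigma = self.log_sigma_head(h).exp()  # exp keeps sigma positive
        return torch.distributions.Normal(mu, sigma)

model = GaussianNN(in_dim=8)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

x, y = torch.randn(32, 8), torch.randn(32, 1)  # toy batch
dist = model(x)
loss = -dist.log_prob(y).mean()  # NLL training objective
opt.zero_grad(); loss.backward(); opt.step()

# The predictive distribution supports both sampling and density evaluation:
samples = dist.sample((100,))              # draws from p(y | x)
log_p = dist.log_prob(torch.zeros(32, 1))  # log likelihood of any point y
```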


Are EEG Sequences Time Series? EEG Classification with Time Series Models and Joint Subject Training

arXiv.org Artificial Intelligence

As with most other data domains, EEG data analysis relies on rich domain-specific preprocessing. Beyond such preprocessing, machine learners would hope to treat EEG data like any other time series data. For EEG classification, however, many models have been developed with layer types and architectures that we typically do not see in time series classification. Furthermore, a separate model is typically learned for each individual subject, not one model for all of them. In this paper, we systematically study the differences between EEG classification models and generic time series classification models. We describe three different model setups to deal with EEG data from different subjects: subject-specific models (most EEG literature), subject-agnostic models, and subject-conditional models. In experiments on three datasets, we demonstrate that off-the-shelf time series classification models trained per subject perform close to EEG classification models, but do not quite reach the performance of domain-specific modeling. Additionally, we combine time series models with subject embeddings to train one joint subject-conditional classifier on all subjects. The resulting models are competitive with dedicated EEG models on 2 out of 3 datasets, even outperforming all EEG methods on one of them.
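The subject-conditional setup can be sketched as follows (assuming PyTorch; the encoder, names, and tensor shapes are illustrative stand-ins, not the paper's architecture): a learned per-subject embedding is concatenated with features from a shared time series encoder, so one joint model serves all subjects.

```python
import torch
import torch.nn as nn

class SubjectConditionalClassifier(nn.Module):
    # One joint model for all subjects, conditioned on a learned subject embedding.
    def __init__(self, n_subjects: int, n_classes: int, feat_dim: int = 128, emb_dim: int = 16):
        super().__init__()
        self.subject_emb = nn.Embedding(n_subjects, emb_dim)
        # Stand-in for any off-the-shelf time series encoder producing feat_dim features.
        self.encoder = nn.Sequential(nn.Flatten(), nn.LazyLinear(feat_dim), nn.ReLU())
        self.head = nn.Linear(feat_dim + emb_dim, n_classes)

    def forward(self, eeg: torch.Tensor, subject_id: torch.Tensor) -> torch.Tensor:
        z = self.encoder(eeg)                        # shared time series features
        s = self.subject_emb(subject_id)             # per-subject conditioning vector
        return self.head(torch.cat([z, s], dim=-1))  # subject-conditional logits

# One batch can mix trials from several subjects:
model = SubjectConditionalClassifier(n_subjects=10, n_classes=4)
eeg = torch.randn(32, 22, 250)            # (batch, channels, time) -- illustrative shape
subject_id = torch.randint(0, 10, (32,))
logits = model(eeg, subject_id)
```

Setting the embedding to a constant recovers the subject-agnostic setup, while training on a single subject's data recovers the subject-specific one.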


Towards Comparable Active Learning

arXiv.org Machine Learning

Active Learning has received significant attention in the field of machine learning for its potential to select the most informative samples for labeling, thereby reducing data annotation costs. However, we show that the lifts reported in recent literature generalize poorly to other domains, leading to an inconclusive landscape in Active Learning research. Furthermore, we highlight overlooked problems in reproducing AL experiments that can lead to unfair comparisons and increased variance in the results. This paper addresses these issues by providing an Active Learning framework for a fair comparison of algorithms across different tasks and domains, as well as a fast and performant oracle algorithm for evaluation. To the best of our knowledge, we propose the first AL benchmark that tests algorithms in 3 major domains: Tabular, Image, and Text. We report empirical results for 6 widely used algorithms on 7 real-world and 2 synthetic datasets and aggregate them into a domain-specific ranking of AL algorithms.
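As a point of reference for the setting being benchmarked, here is a minimal sketch of a generic pool-based AL loop with entropy-based uncertainty sampling (assuming numpy and scikit-learn; the data, model, and query strategy are illustrative and are not the paper's framework or oracle algorithm):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy pool: 1000 points, 20 features, a linearly separable label.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = (X[:, 0] > 0).astype(int)

labeled = list(rng.choice(len(X), size=20, replace=False))  # small seed set
pool = [i for i in range(len(X)) if i not in labeled]

for _ in range(5):  # five acquisition rounds
    clf = RandomForestClassifier(n_estimators=50, random_state=0)
    clf.fit(X[labeled], y[labeled])
    proba = clf.predict_proba(X[pool])
    entropy = -(proba * np.log(proba + 1e-12)).sum(axis=1)
    # Query the most uncertain pool sample and "annotate" it.
    query = pool.pop(int(entropy.argmax()))
    labeled.append(query)
```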