AITopics

2503.16531

Genre: Research Report > New Finding (0.67)

Industry:

Health & Medicine > Therapeutic Area (0.69)
Health & Medicine > Health Care Technology > Medical Record (0.56)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.68)

arXiv.org Artificial Intelligence Mar-18-2024

TuneTables: Context Optimization for Scalable Prior-Data Fitted Networks

Feuer, Benjamin, Schirrmeister, Robin Tibor, Cherepanova, Valeriia, Hegde, Chinmay, Hutter, Frank, Goldblum, Micah, Cohen, Niv, White, Colin

While tabular classification has traditionally relied on from-scratch training, a recent breakthrough called prior-data fitted networks (PFNs) challenges this approach. Similar to large language models, PFNs make use of pretraining and in-context learning to achieve strong performance on new tasks in a single forward pass. However, current PFNs have limitations that prohibit their widespread adoption. Notably, TabPFN achieves very strong performance on small tabular datasets but is not designed to make predictions for datasets of size larger than 1000. In this work, we overcome these limitations and substantially improve the performance of PFNs by developing context optimization techniques for PFNs. Specifically, we propose TuneTables, a novel prompt-tuning strategy that compresses large datasets into a smaller learned context. TuneTables scales TabPFN to be competitive with state-of-the-art tabular classification methods on larger datasets, while having a substantially lower inference time than TabPFN. Furthermore, we show that TuneTables can be used as an interpretability tool and can even be used to mitigate biases by optimizing a fairness objective.

data mining, large language model, machine learning, (21 more...)

2402.11137

Country: North America > United States (1.00)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area (0.95)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.67)
(2 more...)

arXiv.org Artificial Intelligence Aug-1-2023

Deep Riemannian Networks for EEG Decoding

Wilson, Daniel, Schirrmeister, Robin Tibor, Gemein, Lukas Alexander Wilhelm, Ball, Tonio

State-of-the-art performance in electroencephalography (EEG) decoding tasks is currently often achieved with either Deep-Learning (DL) or Riemannian-Geometry-based decoders (RBDs). Recently, there is growing interest in Deep Riemannian Networks (DRNs) possibly combining the advantages of both previous classes of methods. However, there is still a range of topics where additional insight is needed to pave the way for a more widespread application of DRNs in EEG. These include architecture design questions such as network size and end-to-end ability. How these factors affect model performance has not been explored. Additionally, it is not clear how the data within these networks is transformed, and whether this would correlate with traditional EEG decoding. Our study aims to lay the groundwork in the area of these topics through the analysis of DRNs for EEG with a wide range of hyperparameters. Networks were tested on two public EEG datasets and compared with state-of-the-art ConvNets. Here we propose end-to-end EEG SPDNet (EE(G)-SPDNet), and we show that this wide, end-to-end DRN can outperform the ConvNets, and in doing so use physiologically plausible frequency regions. We also show that the end-to-end approach learns more complex filters than traditional band-pass filters targeting the classical alpha, beta, and gamma frequency bands of the EEG, and that performance can benefit from channel-specific filtering approaches. Additionally, architectural analysis revealed areas for further improvement due to the possible loss of Riemannian-specific information throughout the network. Our study thus shows how to design and train DRNs to infer task-related information from the raw EEG without the need for handcrafted filterbanks and highlights the potential of end-to-end DRNs such as EE(G)-SPDNet for high-performance EEG decoding.

artificial intelligence, machine learning, matrix, (18 more...)

2212.10426

Country:

Europe > Germany > Baden-Württemberg (0.14)
North America > United States > Texas (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Health Care Technology (0.87)
Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (0.93)
Information Technology > Artificial Intelligence > Vision (0.93)
Information Technology > Software (0.92)

arXiv.org Artificial Intelligence Jul-16-2022

On the Importance of Hyperparameters and Data Augmentation for Self-Supervised Learning

Wagner, Diane, Ferreira, Fabio, Stoll, Danny, Schirrmeister, Robin Tibor, Müller, Samuel, Hutter, Frank

Self-Supervised Learning (SSL) has become a very active area of Deep Learning research where it is heavily used as a pre-training method for classification and other tasks. However, the rapid pace of advancements in this area comes at a price: training pipelines vary significantly across papers, which presents a potentially crucial confounding factor. Here, we show that, indeed, the choice of hyperparameters and data augmentation strategies can have a dramatic impact on performance. To shed light on these neglected factors and help maximize the power of SSL, we hyperparameterize these components and optimize them with Bayesian optimization, showing improvements across multiple datasets for the SimSiam SSL approach. Realizing the importance of data augmentations for SSL, we also introduce a new automated data augmentation algorithm, GroupAugment, which considers groups of augmentations and optimizes the sampling across groups. In contrast to algorithms designed for supervised learning, GroupAugment achieved consistently high linear evaluation accuracy across all datasets we considered. Overall, our results indicate the importance and likely underestimated role of data augmentation for SSL.

artificial intelligence, inductive learning, machine learning, (17 more...)

2207.07875

Country:

Europe > Germany (0.28)
North America > United States (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.54)

arXiv.org Machine Learning Nov-2-2020

Understanding Anomaly Detection with Deep Invertible Networks through Hierarchies of Distributions and Features

Schirrmeister, Robin Tibor, Zhou, Yuxuan, Ball, Tonio, Zhang, Dan

Deep generative networks trained via maximum likelihood on a natural image dataset like CIFAR10 often assign high likelihoods to images from datasets with different objects (e.g., SVHN). We refine previous investigations of this failure at anomaly detection for invertible generative networks and provide a clear explanation of it as a combination of model bias and domain prior: Convolutional networks learn similar low-level feature distributions when trained on any natural image dataset and these low-level features dominate the likelihood. Hence, when the discriminative features between inliers and outliers are at a high level, e.g., object shapes, anomaly detection becomes particularly challenging. To remove the negative impact of model bias and domain prior on detecting high-level differences, we propose two methods. The first uses the log likelihood ratios of two identical models, one trained on the in-distribution data (e.g., CIFAR10) and the other one on a more general distribution of images (e.g., 80 Million Tiny Images). We also derive a novel outlier loss for the in-distribution network on samples from the more general distribution to further improve the performance. Second, using a multi-scale model like Glow, we show that low-level features are mainly captured at early scales. Therefore, using only the likelihood contribution of the final scale performs remarkably well for detecting high-level feature differences between the out-of-distribution and the in-distribution data. This method is especially useful if one does not have access to a suitable general distribution. Overall, our methods achieve strong anomaly detection performance in the unsupervised setting, and only slightly underperform state-of-the-art classifier-based methods in the supervised setting. Code can be found at https://github.com/boschresearch/hierarchical_anomaly_detection.
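
The log likelihood ratio idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation (see the repository linked above for that): the two 1-D Gaussians below are hypothetical stand-ins for the in-distribution and general-distribution models, and the function names are assumptions made for illustration only.

```python
import math


def gaussian_log_likelihood(x, mean, std):
    """Log density of a 1-D Gaussian; a toy stand-in for a trained generative model."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def log_likelihood_ratio(x, in_dist_model, general_model):
    """Anomaly score log p_in(x) - log p_general(x); higher means more inlier-like."""
    return in_dist_model(x) - general_model(x)


# Hypothetical models: a narrow in-distribution density and a broad general one.
def in_dist(x):
    return gaussian_log_likelihood(x, mean=0.0, std=1.0)


def general(x):
    return gaussian_log_likelihood(x, mean=0.0, std=3.0)


inlier_score = log_likelihood_ratio(0.5, in_dist, general)
outlier_score = log_likelihood_ratio(4.0, in_dist, general)
print(inlier_score, outlier_score)
```

In this toy setup a point near the in-distribution mode receives a positive score, while a point that only the broad model explains well receives a negative one, so thresholding the ratio separates the two.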

deep learning, likelihood, neural network, (24 more...)

2006.10848

Country:

Europe > Germany (0.14)
North America > Canada (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine (1.00)

arXiv.org Machine Learning Jul-17-2019

Deep Invertible Networks for EEG-based brain-signal decoding

Schirrmeister, Robin Tibor, Ball, Tonio

Deep-learning-based brain-signal decoding has recently achieved competitive accuracies compared with traditional feature-based decoding approaches. For example, they have been used to decode movement-related EEG signals with accuracies at least as good as well-established movement-decoding approaches (Schirrmeister et al., 2017a) and have been applied to error or event-related-based decoding (Lawhern et al., 2018, Völker et al., 2018) as well as automatic diagnosis of pathologies (Schirrmeister et.

deep learning, invertible network, neural network, (20 more...)

1907.07746

Country: Europe > Germany (0.15)

Genre: Research Report (0.40)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.72)
Health & Medicine > Health Care Technology (0.72)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

arXiv.org Artificial Intelligence Jun-12-2018

Acting Thoughts: Towards a Mobile Robotic Service Assistant for Users with Limited Communication Skills

Burget, Felix, Fiederer, Lukas Dominique Josef, Kuhner, Daniel, Völker, Martin, Aldinger, Johannes, Schirrmeister, Robin Tibor, Do, Chau, Boedecker, Joschka, Nebel, Bernhard, Ball, Tonio, Burgard, Wolfram

As autonomous service robots become more affordable and thus available also for the general public, there is a growing need for user-friendly interfaces to control the robotic system. Currently available control modalities typically expect users to be able to express their desire through either touch, speech or gesture commands. While this requirement is fulfilled for the majority of users, paralyzed users may not be able to use such systems. In this paper, we present a novel framework that allows these users to interact with a robotic service assistant in a closed-loop fashion, using only thoughts. The brain-computer interface (BCI) system is composed of several interacting components, i.e., non-invasive neuronal signal recording and decoding, high-level task planning, motion and manipulation planning as well as environment perception. In various experiments, we demonstrate its applicability and robustness in real-world scenarios, considering fetch-and-carry tasks and tasks involving human-robot interaction. As our results demonstrate, our system is capable of adapting to frequent changes in the environment and reliably completing given tasks within a reasonable amount of time. Combined with high-level planning and autonomous robotic systems, interesting new perspectives open up for non-invasive BCI-based human-robot interactions.

convnet, neural network, planning & scheduling, (19 more...)

doi: 10.1109/ECMR.2017.8098658

1707.06633

Country: Europe > Germany > Baden-Württemberg (0.14)

Genre: Research Report > New Finding (0.68)

Industry: Health & Medicine > Therapeutic Area (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.94)
Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)

arXiv.org Machine Learning Jun-6-2018

Generative Reversible Networks

Schirrmeister, Robin Tibor, Chrabąszcz, Patryk, Hutter, Frank, Ball, Tonio

Generative models with an encoding component such as autoencoders currently receive great interest. However, training of autoencoders is typically complicated by the need to train a separate encoder and decoder model that have to be enforced to be reciprocal to each other. Here, we propose to use the by-design reversible neural networks (RevNets) as a new class of generative models. We investigate the generative performance of RevNets on the CelebA dataset, showing that generative RevNets can generate coherent faces with similar quality to Variational Autoencoders. This first attempt to use RevNets as a generative model slightly underperformed relative to recent advanced generative models using an autoencoder component on CelebA, but this gap may diminish with further optimization of the training setup of generative RevNets. In addition to the experiments on CelebA, we show a proof-of-principle experiment on the MNIST dataset suggesting that adversary-free trained RevNets can discover meaningful latent dimensions without pre-specifying the number of dimensions of the latent sampling distribution. In summary, this study shows that RevNets enable generative applications with an encoding component while overcoming the need to train a separate encoder and decoder model.

deep learning, dimension, neural network, (19 more...)

1806.0161

Country: North America > United States (0.28)

Genre: Research Report (0.85)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Machine Learning Jun-5-2018

EEG-GAN: Generative adversarial networks for electroencephalographic (EEG) brain signals

Hartmann, Kay Gregor, Schirrmeister, Robin Tibor, Ball, Tonio

Generative adversarial networks (GANs) have recently been highly successful in generative applications involving images and are starting to be applied to time series data. Here we describe EEG-GAN as a framework to generate electroencephalographic (EEG) brain signals. We introduce a modification to the improved training of Wasserstein GANs to stabilize training and investigate a range of architectural choices critical for time series generation (most notably up- and down-sampling). For evaluation we consider and compare different metrics such as Inception score, Fréchet inception distance and sliced Wasserstein distance, together showing that our EEG-GAN framework generated naturalistic EEG examples. It thus opens up a range of new generative application scenarios in the neuroscientific and neurological context, such as data augmentation in brain-computer interfacing tasks, EEG super-sampling, or restoration of corrupted data segments. The possibility to generate signals of a certain class and/or with specific properties may also open a new avenue for research into the underlying structure of brain signals.

generative adversarial network, neural network, neurology, (18 more...)

1806.01875

Country: Europe > Germany (0.14)

Genre: Research Report (0.67)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Health Care Technology (0.92)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Machine Learning Jan-11-2018

Deep learning with convolutional neural networks for decoding and visualization of EEG pathology

Schirrmeister, Robin Tibor, Gemein, Lukas, Eggensperger, Katharina, Hutter, Frank, Ball, Tonio

We apply convolutional neural networks (ConvNets) to the task of distinguishing pathological from normal EEG recordings in the Temple University Hospital EEG Abnormal Corpus. We use two basic, shallow and deep ConvNet architectures recently shown to decode task-related information from EEG at least as well as established algorithms designed for this purpose. In decoding EEG pathology, both ConvNets reached substantially better accuracies (about 6% better, ~85% vs. ~79%) than the only published result for this dataset, and were still better when using only 1 minute of each recording for training and only six seconds of each recording for testing. We used automated methods to optimize architectural hyperparameters and found intriguingly different ConvNet architectures, e.g., with max pooling as the only nonlinearity. Visualizations of the ConvNet decoding behavior showed that they used spectral power changes in the delta (0-4 Hz) and theta (4-8 Hz) frequency range, possibly alongside other features, consistent with expectations derived from spectral analysis of the EEG data and from the textual medical reports. Analysis of the textual medical reports also highlighted the potential for accuracy increases by integrating contextual information, such as the age of subjects. In summary, the ConvNets and visualization techniques used in this study constitute a next step towards clinically useful automated EEG diagnosis and establish a new baseline for future work on this topic.

convnet, deep learning, neural network, (20 more...)

1708.08012

Country: Europe > Germany (0.15)

Genre: Research Report > New Finding (0.35)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Diagnostic Medicine (1.00)
Health & Medicine > Health Care Technology (0.86)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)