Tschannen, Michael
Scaling Vision Transformers to 22 Billion Parameters
Dehghani, Mostafa, Djolonga, Josip, Mustafa, Basil, Padlewski, Piotr, Heek, Jonathan, Gilmer, Justin, Steiner, Andreas, Caron, Mathilde, Geirhos, Robert, Alabdulmohsin, Ibrahim, Jenatton, Rodolphe, Beyer, Lucas, Tschannen, Michael, Arnab, Anurag, Wang, Xiao, Riquelme, Carlos, Minderer, Matthias, Puigcerver, Joan, Evci, Utku, Kumar, Manoj, van Steenkiste, Sjoerd, Elsayed, Gamaleldin F., Mahendran, Aravindh, Yu, Fisher, Oliver, Avital, Huot, Fantine, Bastings, Jasmijn, Collier, Mark Patrick, Gritsenko, Alexey, Birodkar, Vighnesh, Vasconcelos, Cristina, Tay, Yi, Mensink, Thomas, Kolesnikov, Alexander, Pavetić, Filip, Tran, Dustin, Kipf, Thomas, Lučić, Mario, Zhai, Xiaohua, Keysers, Daniel, Harmsen, Jeremiah, Houlsby, Neil
The scaling of Transformers has driven breakthrough capabilities for language models. At present, the largest large language models (LLMs) contain upwards of 100B parameters. Vision Transformers (ViT) have introduced the same architecture to image and video modelling, but these have not yet been successfully scaled to nearly the same degree; the largest dense ViT contains 4B parameters (Chen et al., 2022). We present a recipe for highly efficient and stable training of a 22B-parameter ViT (ViT-22B) and perform a wide variety of experiments on the resulting model. When evaluated on downstream tasks (often with a lightweight linear model on frozen features), ViT-22B demonstrates increasing performance with scale. We further observe other interesting benefits of scale, including an improved tradeoff between fairness and performance, state-of-the-art alignment to human visual perception in terms of shape/texture bias, and improved robustness. ViT-22B demonstrates the potential for "LLM-like" scaling in vision, and provides key steps towards getting there.
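As a minimal sketch of the linear-probe protocol mentioned above (training only a lightweight linear classifier on features from a frozen backbone), the snippet below uses random placeholder arrays in place of actual ViT-22B features and labels; the shapes and the use of scikit-learn are illustrative assumptions, not details from the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Placeholder features standing in for embeddings extracted by a frozen backbone
# (e.g. ViT-22B); rows are images, columns are embedding dimensions.
rng = np.random.default_rng(0)
train_feats = rng.normal(size=(1000, 1024))    # hypothetical frozen features
train_labels = rng.integers(0, 10, size=1000)  # hypothetical labels
test_feats = rng.normal(size=(200, 1024))
test_labels = rng.integers(0, 10, size=200)

# Linear probe: only this lightweight classifier is trained; the backbone stays frozen.
probe = LogisticRegression(max_iter=1000)
probe.fit(train_feats, train_labels)
print("linear-probe accuracy:", probe.score(test_feats, test_labels))
```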
The Visual Task Adaptation Benchmark
Zhai, Xiaohua, Puigcerver, Joan, Kolesnikov, Alexander, Ruyssen, Pierre, Riquelme, Carlos, Lucic, Mario, Djolonga, Josip, Pinto, Andre Susano, Neumann, Maxim, Dosovitskiy, Alexey, Beyer, Lucas, Bachem, Olivier, Tschannen, Michael, Michalski, Marcin, Bousquet, Olivier, Gelly, Sylvain, Houlsby, Neil
Representation learning promises to unlock deep learning for the long tail of vision tasks without expensive labelled datasets. Yet, the absence of a unified yardstick to evaluate general visual representations hinders progress. Many sub-fields promise representations, but each has different evaluation protocols that are either too constrained (linear classification), limited in scope (ImageNet, CIFAR, Pascal-VOC), or only loosely related to representation quality (generation). We present the Visual Task Adaptation Benchmark (VTAB): a diverse, realistic, and challenging benchmark to evaluate representations. VTAB embodies one principle: good representations adapt to unseen tasks with few examples. We run a large VTAB study of popular algorithms, answering questions like: How effective are ImageNet representations on non-standard datasets? Is self-supervision useful if one already has labels?
On Mutual Information Maximization for Representation Learning
Tschannen, Michael, Djolonga, Josip, Rubenstein, Paul K., Gelly, Sylvain, Lucic, Mario
Many recent methods for unsupervised or self-supervised representation learning train feature extractors by maximizing an estimate of the mutual information (MI) between different views of the data. This comes with several immediate problems: For example, MI is notoriously hard to estimate, and using it as an objective for representation learning may lead to highly entangled representations due to its invariance under arbitrary invertible transformations. Nevertheless, these methods have been repeatedly shown to excel in practice. In this paper we argue, and provide empirical evidence, that the success of these methods might be only loosely attributed to the properties of MI, and that they strongly depend on the inductive bias in both the choice of feature extractor architectures and the parametrization of the employed MI estimators. Finally, we establish a connection to deep metric learning and argue that this interpretation may be a plausible explanation for the success of the recently introduced methods.
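As a concrete instance of the family of objectives discussed above, the following is a small sketch of an InfoNCE-style estimator, one of the commonly used MI lower bounds computed between embeddings of two views of the same data; the function name, temperature, and array shapes are illustrative assumptions rather than details from the paper.

```python
import numpy as np

def infonce_lower_bound(z1, z2, temperature=0.1):
    """InfoNCE-style MI lower bound between two batches of view embeddings.

    z1, z2: arrays of shape (N, D), where row i of z1 and row i of z2 are
    embeddings of two views of the same example (the positive pair).
    """
    # Cosine similarities between all pairs of embeddings across the batch.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature                        # (N, N)
    # Row-wise log-softmax with the diagonal (matching views) as the positive class.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    n = z1.shape[0]
    return np.log(n) + np.mean(np.diag(log_probs))          # <= I(view1; view2)

# Toy usage with random arrays standing in for a learned encoder's outputs.
rng = np.random.default_rng(0)
z = rng.normal(size=(128, 64))
print(infonce_lower_bound(z + 0.1 * rng.normal(size=z.shape), z))
```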
Disentangling Factors of Variation Using Few Labels
Locatello, Francesco, Tschannen, Michael, Bauer, Stefan, Rätsch, Gunnar, Schölkopf, Bernhard, Bachem, Olivier
Learning disentangled representations is considered a cornerstone problem in representation learning. Recently, Locatello et al. (2019) demonstrated that unsupervised disentanglement learning without inductive biases is theoretically impossible and that existing inductive biases and unsupervised methods do not allow one to consistently learn disentangled representations. However, in many practical settings, one might have access to a very limited amount of supervision, for example through manual labeling of training examples. In this paper, we investigate the impact of such supervision on state-of-the-art disentanglement methods and perform a large-scale study, training over 29,000 models under well-defined and reproducible experimental conditions. We first observe that a very limited number of labeled examples (0.01-0.5% of the data set) is sufficient to perform model selection on state-of-the-art unsupervised models. Yet, if one has access to labels for supervised model selection, this raises the natural question of whether they should also be incorporated into the training process. As a case study, we test the benefit of introducing (very limited) supervision into existing state-of-the-art unsupervised disentanglement methods, exploiting both the values of the labels and the ordinal information that can be deduced from them. Overall, we empirically validate that with very little and potentially imprecise supervision it is possible to reliably learn disentangled representations.
High-Fidelity Image Generation With Fewer Labels
Lucic, Mario, Tschannen, Michael, Ritter, Marvin, Zhai, Xiaohua, Bachem, Olivier, Gelly, Sylvain
Deep generative models are becoming a cornerstone of modern machine learning. Recent work on conditional generative adversarial networks has shown that learning complex, high-dimensional distributions over natural images is within reach. While the latest models are able to generate high-fidelity, diverse natural images at high resolution, they rely on a vast quantity of labeled data. In this work we demonstrate how one can benefit from recent work on self- and semi-supervised learning to outperform the state of the art (SOTA) in both unsupervised ImageNet synthesis and the conditional setting. In particular, the proposed approach is able to match the sample quality (as measured by FID) of the current state-of-the-art conditional model BigGAN on ImageNet using only 10% of the labels, and to outperform it using 20% of the labels.
Deep Generative Models for Distribution-Preserving Lossy Compression
Tschannen, Michael, Agustsson, Eirikur, Lucic, Mario
We propose and study the problem of distribution-preserving lossy compression. Motivated by recent advances in extreme image compression which make it possible to maintain artifact-free reconstructions even at very low bitrates, we propose to optimize the rate-distortion tradeoff under the constraint that the reconstructed samples follow the distribution of the training data. The resulting compression system recovers both ends of the spectrum: at zero bitrate it learns a generative model of the data, and at high enough bitrates it achieves perfect reconstruction. Furthermore, for intermediate bitrates it smoothly interpolates between learning a generative model of the training data and perfectly reconstructing the training samples. We study several methods to approximately solve the proposed optimization problem, including a novel combination of Wasserstein GAN and Wasserstein Autoencoder, and present an extensive theoretical and empirical characterization of the proposed compression systems.
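Schematically, the constrained objective described above can be written as follows; the notation is generic rather than taken verbatim from the paper, with X the source, E a rate-limited encoder, D a possibly stochastic decoder, and d a distortion measure.

```latex
% Distribution-preserving lossy compression (schematic formulation):
% minimize expected distortion at rate R, subject to the reconstruction
% having the same distribution as the data.
\min_{E,\, D} \;\; \mathbb{E}\!\left[ d\!\left( X, \hat{X} \right) \right]
\quad \text{s.t.} \quad
\hat{X} \sim D\!\left( \cdot \mid E(X) \right), \qquad
H\!\left( E(X) \right) \le R, \qquad
P_{\hat{X}} = P_X .
```

At R = 0 the distribution constraint forces the (stochastic) decoder to act as a generative model of the data, while for sufficiently large R the distortion term drives the reconstruction toward X, recovering the two ends of the spectrum described in the abstract.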
Recent Advances in Autoencoder-Based Representation Learning
Tschannen, Michael, Bachem, Olivier, Lucic, Mario
Learning useful representations with little or no supervision is a key challenge in artificial intelligence. We provide an in-depth review of recent advances in representation learning with a focus on autoencoder-based models. To organize these results we make use of meta-priors believed useful for downstream tasks, such as disentanglement and hierarchical organization of features. In particular, we uncover three main mechanisms to enforce such properties, namely (i) regularizing the (approximate or aggregate) posterior distribution, (ii) factorizing the encoding and decoding distributions, or (iii) introducing a structured prior distribution. While there are some promising results, implicit or explicit supervision remains a key enabler and all current methods use strong inductive biases and modeling assumptions. Finally, we provide an analysis of autoencoder-based representation learning through the lens of rate-distortion theory and identify a clear tradeoff between the amount of prior knowledge available about the downstream task and how useful the representation is for that task.
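The mechanisms listed above, in particular (i), can be summarized by a regularized reconstruction objective of the following schematic form; the weights and divergences here are generic placeholders rather than any single method's exact loss.

```latex
% Schematic regularized autoencoder objective: reconstruction term plus weighted
% regularizers on the approximate posterior q_phi(z|x) and on the aggregate
% posterior q_phi(z) = E_{p(x)}[q_phi(z|x)].
\max_{\theta,\, \phi} \;\;
\mathbb{E}_{p(x)}\, \mathbb{E}_{q_\phi(z \mid x)}\!\left[ \log p_\theta(x \mid z) \right]
\;-\; \lambda_1\, \mathbb{E}_{p(x)}\!\left[ \mathrm{KL}\!\left( q_\phi(z \mid x) \,\middle\|\, p(z) \right) \right]
\;-\; \lambda_2\, D\!\left( q_\phi(z) \,\middle\|\, p(z) \right)
```

Choosing lambda_1 > 1 and lambda_2 = 0 gives a beta-VAE-style penalty on the approximate posterior, whereas lambda_1 = 0 and lambda_2 > 0 regularizes only the aggregate posterior; using a structured prior p(z) corresponds to mechanism (iii).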
Born Again Neural Networks
Furlanello, Tommaso, Lipton, Zachary C., Tschannen, Michael, Itti, Laurent, Anandkumar, Anima
Knowledge distillation (KD) consists of transferring knowledge from one machine learning model (the teacher) to another (the student). Commonly, the teacher is a high-capacity model with formidable performance, while the student is more compact. By transferring knowledge, one hopes to benefit from the student's compactness without sacrificing too much performance. We study KD from a new perspective: rather than compressing models, we train students parameterized identically to their teachers. Surprisingly, these Born-Again Networks (BANs) outperform their teachers significantly, both on computer vision and language modeling tasks. Our experiments with BANs based on DenseNets demonstrate state-of-the-art performance on the CIFAR-10 (3.5%) and CIFAR-100 (15.5%) datasets, by validation error. Additional experiments explore two distillation objectives: (i) Confidence-Weighted by Teacher Max (CWTM) and (ii) Dark Knowledge with Permuted Predictions (DKPP). Both methods elucidate the essential components of KD, demonstrating the role of the teacher outputs on both predicted and non-predicted classes. We present experiments with students of various capacities, focusing on the under-explored case where students overpower teachers. Our experiments show significant advantages from transferring knowledge between DenseNets and ResNets in either direction.
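As a rough sketch of the distillation setup described above (not the paper's exact training code), the loss below combines the usual cross-entropy on the labels with a term matching the student's temperature-softened predictions to the teacher's; for a Born-Again Network the student is simply a fresh copy of the teacher's architecture. The temperature, weighting, and the tiny linear models are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.5):
    """Standard KD loss: cross-entropy on the hard labels plus a KL term that
    matches the student's temperature-softened predictions to the teacher's."""
    hard = F.cross_entropy(student_logits, targets)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients are comparable to the hard-label term
    return alpha * hard + (1.0 - alpha) * soft

# Toy usage: for a Born-Again Network the student shares the teacher's architecture.
teacher = torch.nn.Linear(32, 10)  # stand-in for a trained teacher network
student = torch.nn.Linear(32, 10)  # identically parameterized student
x, y = torch.randn(8, 32), torch.randint(0, 10, (8,))
with torch.no_grad():
    t_logits = teacher(x)          # teacher provides soft targets only
loss = distillation_loss(student(x), t_logits, y)
loss.backward()
```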