Collaborating Authors

 Simon-Gabriel, Carl-Johann


Robust NAS under adversarial training: benchmark, theory, and beyond

arXiv.org Machine Learning

Recent developments in neural architecture search (NAS) emphasize the importance of finding architectures that are robust against malicious data. However, there is a notable absence of benchmark evaluations and theoretical guarantees for searching for such robust architectures, especially when adversarial training is considered. In this work, we address these two challenges and make two contributions. First, we release a comprehensive dataset that encompasses both the clean accuracy and the robust accuracy of a vast array of adversarially trained networks from the NAS-Bench-201 search space on image datasets. Second, leveraging the neural tangent kernel (NTK) from deep learning theory, we establish a generalization theory for searching architectures in terms of clean accuracy and robust accuracy under multi-objective adversarial training. We believe that our benchmark and theoretical insights will significantly benefit the NAS community through reliable reproducibility, efficient assessment, and a theoretical foundation, particularly in the pursuit of robust architectures.
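
The theory concerns multi-objective adversarial training. As a concrete reference point, a minimal PyTorch-style sketch of one such training step might look as follows; the PGD settings and the weight `lam` are illustrative assumptions, not the paper's configuration:

```python
# Minimal sketch of one multi-objective adversarial training step:
# the loss mixes a clean and an adversarial term, as in the abstract.
# PGD settings and `lam` are illustrative assumptions, not the paper's.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Standard L-infinity PGD attack; returns an adversarial version of x."""
    x_adv = x.detach() + torch.empty_like(x).uniform_(-eps, eps)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv + alpha * grad.sign()
        x_adv = x + (x_adv - x).clamp(-eps, eps)  # project back to the eps-ball
        x_adv = x_adv.clamp(0, 1)                 # keep valid pixel range
    return x_adv.detach()

def multi_objective_step(model, optimizer, x, y, lam=0.5):
    """One training step on (1 - lam) * clean loss + lam * robust loss."""
    x_adv = pgd_attack(model, x, y)
    optimizer.zero_grad()
    loss = (1 - lam) * F.cross_entropy(model(x), y) \
         + lam * F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```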


Targeted Separation and Convergence with Kernel Discrepancies

arXiv.org Machine Learning

Maximum mean discrepancies (MMDs) like the kernel Stein discrepancy (KSD) have grown central to a wide range of applications, including hypothesis testing, sampler selection, distribution approximation, and variational inference. In each setting, these kernel-based discrepancy measures are required to (i) separate a target P from other probability measures or even (ii) control weak convergence to P. In this article we derive new sufficient and necessary conditions to ensure (i) and (ii). For MMDs on separable metric spaces, we characterize those kernels that separate Bochner embeddable measures and introduce simple conditions for separating all measures with unbounded kernels and for controlling convergence with bounded kernels. We use these results on $\mathbb{R}^d$ to substantially broaden the known conditions for KSD separation and convergence control and to develop the first KSDs known to exactly metrize weak convergence to P. Along the way, we highlight the implications of our results for hypothesis testing, measuring and improving sample quality, and sampling with Stein variational gradient descent.
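
For orientation, the two discrepancies can be written as follows (standard definitions in our notation, with $p$ the density of the target $P$):

\[
\mathrm{MMD}_k(P,Q)\;=\;\sup_{\|f\|_{\mathcal{H}_k}\le 1}\bigl|\mathbb{E}_{P}f-\mathbb{E}_{Q}f\bigr|,
\]

and the (Langevin) KSD is the MMD based on the Stein kernel

\[
k_P(x,y)\;=\;\nabla\log p(x)^{\!\top}\nabla\log p(y)\,k(x,y)
\;+\;\nabla\log p(x)^{\!\top}\nabla_{y}k(x,y)
\;+\;\nabla\log p(y)^{\!\top}\nabla_{x}k(x,y)
\;+\;\nabla_{x}\!\cdot\!\nabla_{y}\,k(x,y).
\]

Under regularity conditions the Stein construction makes the embedding of $P$ vanish, so $\mathrm{KSD}(Q)=\mathrm{MMD}_{k_P}(Q,P)$ can be estimated from samples of $Q$ and the score $\nabla\log p$ alone.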


Bridging the Gap to Real-World Object-Centric Learning

arXiv.org Artificial Intelligence

Humans naturally decompose their environment into entities at the appropriate level of abstraction to act in the world. Allowing machine learning algorithms to derive this decomposition in an unsupervised way has become an important line of research. However, current methods are restricted to simulated data or require additional information in the form of motion or depth in order to successfully discover objects. In this work, we overcome this limitation by showing that reconstructing features from models trained in a self-supervised manner is a sufficient training signal for object-centric representations to arise in a fully unsupervised way. Our approach, DINOSAUR, significantly outperforms existing image-based object-centric learning models on simulated data and is the first unsupervised object-centric model that scales to real-world datasets such as COCO and PASCAL VOC. DINOSAUR is conceptually simple and shows competitive performance compared to more involved pipelines from the computer vision literature.
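
A minimal sketch of this training signal, with hypothetical module names (`encoder` for a frozen self-supervised ViT such as DINO, `slot_attention`, `decoder`); the paper's actual architecture differs in detail:

```python
# Minimal sketch of the feature-reconstruction signal described above.
# `encoder`, `slot_attention`, and `decoder` are hypothetical module
# names, not the paper's code.
import torch
import torch.nn.functional as F

def dinosaur_loss(image, encoder, slot_attention, decoder):
    with torch.no_grad():
        feats = encoder(image)        # (B, N_tokens, D): frozen SSL features
    slots = slot_attention(feats)     # (B, K, D_slot): object slots
    recon = decoder(slots)            # (B, N_tokens, D): predicted features
    return F.mse_loss(recon, feats)   # reconstruct features, not pixels
```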


PopSkipJump: Decision-Based Attack for Probabilistic Classifiers

arXiv.org Machine Learning

Most current classifiers are vulnerable to adversarial examples, small input perturbations that change the classification output. Many existing attack algorithms cover various settings, from white-box to black-box classifiers, but typically assume that the classifier's answers are deterministic, and they often fail when they are not. We therefore propose a new adversarial decision-based attack specifically designed for classifiers with probabilistic outputs. It is based on the HopSkipJump attack by Chen et al. (2019, arXiv:1904.02144v5), a strong and query-efficient decision-based attack originally designed for deterministic classifiers. Our P(robabilisticH)opSkipJump attack adapts its number of queries to maintain HopSkipJump's original output quality across various noise levels, while converging to its query efficiency as the noise level decreases. We test our attack on various noise models, including state-of-the-art off-the-shelf randomized defenses, and show that they offer almost no extra robustness to decision-based attacks. Code is available at https://github.com/cjsg/PopSkipJump.
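
The key difficulty is that a probabilistic classifier answers decision queries with noisy labels. One illustrative way to adapt the number of queries (not the paper's exact adaptation rule) is to repeat each query and majority-vote until the vote is statistically clear:

```python
# Illustrative sketch, not the paper's exact scheme: replace a single
# noisy decision query by repeated queries and a majority vote, adding
# queries until the vote is statistically clear.
import math

def noisy_decision(query, x, target_label, max_queries=1000, conf=0.99):
    """`query(x)` returns a random label; decide if P(label == target) > 1/2."""
    hits, n = 0, 0
    p_hat = 0.5
    while n < max_queries:
        hits += sum(query(x) == target_label for _ in range(10))
        n += 10
        p_hat = hits / n
        # Hoeffding bound: |p_hat - p| <= margin with probability >= conf
        margin = math.sqrt(math.log(2 / (1 - conf)) / (2 * n))
        if abs(p_hat - 0.5) > margin:   # 1/2 lies outside the interval: stop
            break
    return p_hat > 0.5
```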


Metrizing Weak Convergence with Maximum Mean Discrepancies

arXiv.org Machine Learning

Theorem 12 of Simon-Gabriel & Sch\"olkopf (JMLR, 2018) seemed to close a 40-year-old quest to characterize the maximum mean discrepancies (MMDs) that metrize the weak convergence of probability measures. We prove, however, that the theorem is incorrect and provide a correction. We show that, on a locally compact, non-compact, Hausdorff space, the MMD of a bounded continuous Borel measurable kernel $k$ whose RKHS functions vanish at infinity metrizes the weak convergence of probability measures if and only if $k$ is continuous and integrally strictly positive definite (ISPD) over all signed, finite, regular Borel measures. We also show that, contrary to the claim of the aforementioned Theorem 12, there exist both bounded continuous ISPD kernels that do not metrize weak convergence and bounded continuous non-ISPD kernels that do metrize it.
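
Concretely, writing $\mu_P:=\int k(x,\cdot)\,\mathrm{d}P(x)$ for the kernel mean embedding, the two notions at play are (standard definitions, stated in our notation):

\[
\mathrm{MMD}_k(P,Q)\;=\;\|\mu_P-\mu_Q\|_{\mathcal{H}_k},
\qquad
k\ \text{ISPD}\;\iff\;\iint k(x,y)\,\mathrm{d}\mu(x)\,\mathrm{d}\mu(y)>0
\]

for every non-zero signed, finite, regular Borel measure $\mu$. The theorem characterizes when $\mathrm{MMD}_k(P_n,P)\to 0$ is equivalent to the weak convergence $P_n\Rightarrow P$.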


Adversarial Vulnerability of Neural Networks Increases With Input Dimension

arXiv.org Machine Learning

Over the past four years, neural networks have proven vulnerable to adversarial images: targeted but imperceptible image perturbations that lead to drastically different predictions. We show that adversarial vulnerability increases with the gradients of the training objective when seen as a function of the inputs. For most current network architectures, we prove that the $\ell_1$-norm of these gradients grows as the square root of the input size. These networks therefore become increasingly vulnerable with growing image size. Over the course of our analysis we rediscover and generalize double-backpropagation, a technique that penalizes large gradients in the loss surface to reduce adversarial vulnerability and increase generalization performance. We show that this regularization scheme is equivalent, at first order, to training with adversarial noise. Finally, we demonstrate that replacing strided layers by average-pooling layers decreases adversarial vulnerability. Our proofs rely on the network's weight distribution at initialization, but extensive experiments confirm their conclusions after training.
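
A minimal PyTorch-style sketch of the double-backpropagation regularizer; the penalty weight `eps` stands in for the adversarial-noise size in the first-order equivalence, and its value here is illustrative:

```python
# Sketch of double-backpropagation, the regularizer rediscovered in the
# paper: penalize the input-gradient of the loss. `eps` is illustrative.
import torch
import torch.nn.functional as F

def double_backprop_loss(model, x, y, eps=0.1):
    x = x.detach().clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    # create_graph=True so the penalty itself can be backpropagated through
    grad_x, = torch.autograd.grad(loss, x, create_graph=True)
    # l1-norm of the input gradient, matching the abstract's growth result
    penalty = grad_x.flatten(1).norm(p=1, dim=1).mean()
    return loss + eps * penalty
```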


AdaGAN: Boosting Generative Models

arXiv.org Machine Learning

Generative Adversarial Networks (GANs) (Goodfellow et al., 2014) are an effective method for training generative models of complex data such as natural images. However, they are notoriously hard to train and can suffer from the problem of missing modes, where the model is unable to produce examples in certain regions of the space. We propose an iterative procedure, called AdaGAN, where at every step we add a new component to a mixture model by running a GAN algorithm on a reweighted sample. This is inspired by boosting algorithms, where many potentially weak individual predictors are greedily aggregated to form a strong composite predictor. We prove that such an incremental procedure converges to the true distribution in a finite number of steps if each step is optimal, and converges at an exponential rate otherwise. We also illustrate experimentally that this procedure addresses the problem of missing modes.
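
A skeleton of the boosting-style loop, under loudly labeled assumptions: `train_gan`, `update_weights`, and the mixing weight `beta` are hypothetical placeholders rather than the paper's exact reweighting and mixing rules:

```python
# Illustrative skeleton of the AdaGAN loop. `train_gan`, `update_weights`,
# and `beta` are placeholders; `data` is assumed to be a NumPy array.
import numpy as np

def adagan(data, n_steps, train_gan, update_weights, beta=0.5):
    n = len(data)
    w = np.full(n, 1.0 / n)                  # data weights, uniform at first
    mixture = []                             # list of (mixture weight, generator)
    for t in range(n_steps):
        idx = np.random.choice(n, size=n, p=w)   # reweighted sample
        g = train_gan(data[idx])                 # fit a new component
        if mixture:
            mixture = [(wt * (1 - beta), gen) for wt, gen in mixture]
            mixture.append((beta, g))
        else:
            mixture.append((1.0, g))
        w = update_weights(w, mixture, data)     # upweight missed regions
    return mixture
```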


From optimal transport to generative modeling: the VEGAN cookbook

arXiv.org Machine Learning

We study unsupervised generative modeling in terms of the optimal transport (OT) problem between the true (but unknown) data distribution $P_X$ and the latent-variable model distribution $P_G$. We show that the OT problem can be equivalently written in terms of probabilistic encoders, which are constrained to match the posterior and prior distributions over the latent space. When relaxed, this constrained optimization problem leads to a penalized optimal transport (POT) objective, which can be efficiently minimized using stochastic gradient descent by sampling from $P_X$ and $P_G$. We show that POT for the 2-Wasserstein distance coincides with the objective heuristically employed in adversarial auto-encoders (AAE) (Makhzani et al., 2016), which provides the first theoretical justification for AAEs known to the authors. We also compare POT to other popular techniques, such as variational auto-encoders (VAE) (Kingma and Welling, 2014). Our theoretical results include (a) a better understanding of the commonly observed blurriness of images generated by VAEs, and (b) establishing a duality between the Wasserstein GAN (Arjovsky and Bottou, 2017) and POT for the 1-Wasserstein distance.
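
Schematically, the relaxed objective takes the form below; the divergence $D$ and weight $\lambda$ are generic placeholders, and the notation follows what later became standard for such auto-encoders rather than the paper's exact symbols:

\[
\mathrm{POT}(P_X,P_G)\;=\;\inf_{Q(Z\mid X)}\;\mathbb{E}_{X\sim P_X}\,\mathbb{E}_{Z\sim Q(Z\mid X)}\bigl[c\bigl(X,G(Z)\bigr)\bigr]\;+\;\lambda\,D\bigl(Q_Z,P_Z\bigr),
\]

where $c$ is the transport cost, $G$ the generator, $Q_Z$ the aggregated posterior, and $P_Z$ the prior over the latent space; the hard constraint $Q_Z=P_Z$ is recovered as $\lambda\to\infty$, and taking $c$ to be the squared Euclidean distance gives the 2-Wasserstein case that matches the AAE objective.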


Consistent Kernel Mean Estimation for Functions of Random Variables

Neural Information Processing Systems

We provide a theoretical foundation for non-parametric estimation of functions of random variables using kernel mean embeddings. We show that for any continuous function f, consistent estimators of the mean embedding of a random variable X lead to consistent estimators of the mean embedding of f(X). For Matérn kernels and sufficiently smooth functions we also provide rates of convergence. Our results extend to functions of multiple random variables. If the variables are dependent, we require an estimator of the mean embedding of their joint distribution as a starting point; if they are independent, it is sufficient to have separate estimators of the mean embeddings of their marginal distributions. In either case, our results cover both mean embeddings based on i.i.d. samples as well as "reduced set" expansions in terms of dependent expansion points. The latter serves as a justification for using such expansions to limit memory resources when applying the approach as a basis for probabilistic programming.
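
A small sketch of the core construction, in hypothetical code: given a (possibly reduced-set) expansion $\sum_i w_i k(x_i,\cdot)$ estimating the mean embedding of $X$, applying $f$ to the expansion points estimates the mean embedding of $f(X)$:

```python
# Sketch of the paper's core construction; all names here are illustrative.
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    return np.exp(-np.sum((a - b) ** 2) / (2 * sigma ** 2))

def pushforward_expansion(points, weights, f):
    """Map an expansion {(w_i, x_i)} of mu_X to one of mu_{f(X)}."""
    return np.array([f(x) for x in points]), weights

def embed_at(t, points, weights, k=gaussian_kernel):
    """Evaluate the estimated embedding sum_i w_i k(x_i, t) at the point t."""
    return sum(w * k(x, t) for w, x in zip(weights, points))

# Usage: estimate the mean embedding of f(X) = X**2 from samples of X
xs = np.random.randn(100, 1)
fx, w = pushforward_expansion(xs, np.full(100, 1 / 100), lambda x: x ** 2)
value = embed_at(np.array([0.5]), fx, w)
```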

