AITopics | Mitliagkas, Ioannis

Collaborating Authors

Mitliagkas, Ioannis

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

A Unified Approach to Reinforcement Learning, Quantal Response Equilibria, and Two-Player Zero-Sum Games

Sokota, Samuel, D'Orazio, Ryan, Kolter, J. Zico, Loizou, Nicolas, Lanctot, Marc, Mitliagkas, Ioannis, Brown, Noam, Kroer, Christian

arXiv.org Artificial IntelligenceApr-11-2023

This work studies an algorithm, which we call magnetic mirror descent, that is inspired by mirror descent and the non-Euclidean proximal gradient algorithm. Our contribution is demonstrating the virtues of magnetic mirror descent as both an equilibrium solver and as an approach to reinforcement learning in two-player zero-sum games. These virtues include: 1) Being the first quantal response equilibria solver to achieve linear convergence for extensive-form games with first order feedback; 2) Being the first standard reinforcement learning algorithm to achieve empirically competitive results with CFR in tabular settings; 3) Achieving favorable performance in 3x3 Dark Hex and Phantom Tic-Tac-Toe as a self-play deep reinforcement learning algorithm.

machine learning, mmd, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

2206.05825

Country:

Europe (0.67)
North America > United States (0.46)
North America > Canada (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Leisure & Entertainment > Games (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Neural Networks Efficiently Learn Low-Dimensional Representations with SGD

Mousavi-Hosseini, Alireza, Park, Sejun, Girotti, Manuela, Mitliagkas, Ioannis, Erdogdu, Murat A.

arXiv.org Artificial IntelligenceMar-15-2023

We study the problem of training a two-layer neural network (NN) of arbitrary width using stochastic gradient descent (SGD) where the input $\boldsymbol{x}\in \mathbb{R}^d$ is Gaussian and the target $y \in \mathbb{R}$ follows a multiple-index model, i.e., $y=g(\langle\boldsymbol{u_1},\boldsymbol{x}\rangle,...,\langle\boldsymbol{u_k},\boldsymbol{x}\rangle)$ with a noisy link function $g$. We prove that the first-layer weights of the NN converge to the $k$-dimensional principal subspace spanned by the vectors $\boldsymbol{u_1},...,\boldsymbol{u_k}$ of the true model, when online SGD with weight decay is used for training. This phenomenon has several important consequences when $k \ll d$. First, by employing uniform convergence on this smaller subspace, we establish a generalization error bound of $O(\sqrt{{kd}/{T}})$ after $T$ iterations of SGD, which is independent of the width of the NN. We further demonstrate that, SGD-trained ReLU NNs can learn a single-index target of the form $y=f(\langle\boldsymbol{u},\boldsymbol{x}\rangle) + \epsilon$ by recovering the principal direction, with a sample complexity linear in $d$ (up to log factors), where $f$ is a monotonic function with at most polynomial growth, and $\epsilon$ is the noise. This is in contrast to the known $d^{\Omega(p)}$ sample requirement to learn any degree $p$ polynomial in the kernel regime, and it shows that NNs trained with SGD can outperform the neural tangent kernel at initialization. Finally, we also provide compressibility guarantees for NNs using the approximate low-rank structure produced by SGD.

artificial intelligence, machine learning, probability, (14 more...)

arXiv.org Artificial Intelligence

2209.14863

Country: North America > Canada (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.69)

Add feedback

Optimal transport meets noisy label robust loss and MixUp regularization for domain adaptation

Fatras, Kilian, Naganuma, Hiroki, Mitliagkas, Ioannis

arXiv.org Machine LearningJun-22-2022

It is common in computer vision to be confronted with domain shift: images which have the same class but different acquisition conditions. In domain adaptation (DA), one wants to classify unlabeled target images using source labeled images. Unfortunately, deep neural networks trained on a source training set perform poorly on target images which do not belong to the training domain. One strategy to improve these performances is to align the source and target image distributions in an embedded space using optimal transport (OT). However OT can cause negative transfer, i.e. aligning samples with different labels, which leads to overfitting especially in the presence of label shift between domains. In this work, we mitigate negative alignment by explaining it as a noisy label assignment to target images. We then mitigate its effect by appropriate regularization. We propose to couple the MixUp regularization \citep{zhang2018mixup} with a loss that is robust to noisy labels in order to improve domain adaptation performance. We show in an extensive ablation study that a combination of the two techniques is critical to achieve improved performance. Finally, we evaluate our method, called \textsc{mixunbot}, on several benchmarks and real-world DA problems.

artificial intelligence, domain adaptation, machine learning, (19 more...)

arXiv.org Machine Learning

2206.1118

Country:

Europe (0.67)
North America > Canada > Quebec > Montreal (0.28)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Convergence Analysis and Implicit Regularization of Feedback Alignment for Deep Linear Networks

Girotti, Manuela, Mitliagkas, Ioannis, Gidel, Gauthier

arXiv.org Machine LearningOct-20-2021

We theoretically analyze the Feedback Alignment (FA) algorithm, an efficient alternative to backpropagation for training neural networks. We provide convergence guarantees with rates for deep linear networks for both continuous and discrete dynamics. Additionally, we study incremental learning phenomena for shallow linear networks. Interestingly, certain specific initializations imply that negligible components are learned before the principal ones, thus potentially negatively affecting the effectiveness of such a learning algorithm; a phenomenon we classify as implicit anti-regularization. We also provide initialization schemes where the components of the problem are approximately learned by decreasing order of importance, thus providing a form of implicit regularization.

artificial intelligence, machine learning, neural network, (17 more...)

arXiv.org Machine Learning

2110.10815

Country: North America > Canada > Quebec > Montreal (0.14)

Genre: Research Report (0.81)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Stochastic Gradient Descent-Ascent and Consensus Optimization for Smooth Games: Convergence Analysis under Expected Co-coercivity

Loizou, Nicolas, Berard, Hugo, Gidel, Gauthier, Mitliagkas, Ioannis, Lacoste-Julien, Simon

arXiv.org Machine LearningJun-30-2021

Two of the most prominent algorithms for solving unconstrained smooth games are the classical stochastic gradient descent-ascent (SGDA) and the recently introduced stochastic consensus optimization (SCO) (Mescheder et al., 2017). SGDA is known to converge to a stationary point for specific classes of games, but current convergence analyses require a bounded variance assumption. SCO is used successfully for solving large-scale adversarial problems, but its convergence guarantees are limited to its deterministic variant. In this work, we introduce the expected co-coercivity condition, explain its benefits, and provide the first last-iterate convergence guarantees of SGDA and SCO under this condition for solving a class of stochastic variational inequality problems that are potentially non-monotone. We prove linear convergence of both methods to a neighborhood of the solution when they use constant step-size, and we propose insightful stepsize-switching rules to guarantee convergence to the exact solution. In addition, our convergence guarantees hold under the arbitrary sampling paradigm, and as such, we give insights into the complexity of minibatching.

artificial intelligence, convergence, machine learning, (16 more...)

arXiv.org Machine Learning

2107.00052

Country:

North America > Canada (0.14)
Asia (0.14)

Genre:

Research Report (0.64)
Overview (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Add feedback

Invariance Principle Meets Information Bottleneck for Out-of-Distribution Generalization

Ahuja, Kartik, Caballero, Ethan, Zhang, Dinghuai, Bengio, Yoshua, Mitliagkas, Ioannis, Rish, Irina

arXiv.org Machine LearningJun-11-2021

The invariance principle from causality is at the heart of notable approaches such as invariant risk minimization (IRM) that seek to address out-of-distribution (OOD) generalization failures. Despite the promising theory, invariance principle-based approaches fail in common classification tasks, where invariant (causal) features capture all the information about the label. Are these failures due to the methods failing to capture the invariance? Or is the invariance principle itself insufficient? To answer these questions, we revisit the fundamental assumptions in linear regression tasks, where invariance-based approaches were shown to provably generalize OOD. In contrast to the linear regression tasks, we show that for linear classification tasks we need much stronger restrictions on the distribution shifts, or otherwise OOD generalization is impossible. Furthermore, even with appropriate restrictions on distribution shifts in place, we show that the invariance principle alone is insufficient. We prove that a form of the information bottleneck constraint along with invariance helps address key failures when invariant features capture all the information about the label and also retains the existing success when they do not. We propose an approach that incorporates both of these principles and demonstrate its effectiveness in several experiments.

deep learning, invariant feature, neural network, (21 more...)

arXiv.org Machine Learning

2106.06607

Country:

North America > Canada > Quebec (0.14)
North America > United States > California (0.14)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Gotta Go Fast When Generating Data with Score-Based Models

Jolicoeur-Martineau, Alexia, Li, Ke, Piché-Taillefer, Rémi, Kachman, Tal, Mitliagkas, Ioannis

arXiv.org Machine LearningMay-28-2021

Score-based (denoising diffusion) generative models have recently gained a lot of success in generating realistic and diverse data. These approaches define a forward diffusion process for transforming data to noise and generate data by reversing it (thereby going from noise to data). Unfortunately, current score-based models generate data very slowly due to the sheer number of score network evaluations required by numerical SDE solvers. In this work, we aim to accelerate this process by devising a more efficient SDE solver. Existing approaches rely on the Euler-Maruyama (EM) solver, which uses a fixed step size. We found that naively replacing it with other SDE solvers fares poorly - they either result in low-quality samples or become slower than EM. To get around this issue, we carefully devise an SDE solver with adaptive step sizes tailored to score-based generative models piece by piece. Our solver requires only two score function evaluations, rarely rejects samples, and leads to high-quality samples. Our approach generates data 2 to 10 times faster than EM while achieving better or equal sample quality. For high-resolution images, our method leads to significantly higher quality samples than all other methods tested. Our SDE solver has the benefit of requiring no step size tuning.

artificial intelligence, dynamic-step extrapolation, neural network, (16 more...)

arXiv.org Machine Learning

2105.1408

Country: North America > Canada > Quebec (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Vision (0.89)

Add feedback

A Study of Condition Numbers for First-Order Optimization

Guille-Escuret, Charles, Goujaud, Baptiste, Girotti, Manuela, Mitliagkas, Ioannis

arXiv.org Artificial IntelligenceDec-25-2020

The study of first-order optimization algorithms (FOA) typically starts with assumptions on the objective functions, most commonly smoothness and strong convexity. These metrics are used to tune the hyperparameters of FOA. We introduce a class of perturbations quantified via a new norm, called *-norm. We show that adding a small perturbation to the objective function has an equivalently small impact on the behavior of any FOA, which suggests that it should have a minor impact on the tuning of the algorithm. However, we show that smoothness and strong convexity can be heavily impacted by arbitrarily small perturbations, leading to excessively conservative tunings and convergence issues. In view of these observations, we propose a notion of continuity of the metrics, which is essential for a robust tuning strategy. Since smoothness and strong convexity are not continuous, we propose a comprehensive study of existing alternative metrics which we prove to be continuous. We describe their mutual relations and provide their guaranteed convergence rates for the Gradient Descent algorithm accordingly tuned. Finally we discuss how our work impacts the theoretical understanding of FOA and their performances.

artificial intelligence, convexity, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2012.05782

Country: North America > Canada (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Add feedback

In Search of Robust Measures of Generalization

Dziugaite, Gintare Karolina, Drouin, Alexandre, Neal, Brady, Rajkumar, Nitarshan, Caballero, Ethan, Wang, Linbo, Mitliagkas, Ioannis, Roy, Daniel M.

arXiv.org Machine LearningOct-22-2020

One of the principal scientific challenges in deep learning is explaining generalization, i.e., why the particular way the community now trains networks to achieve small training error also leads to small error on held-out data from the same population. It is widely appreciated that some worst-case theories -- such as those based on the VC dimension of the class of predictors induced by modern neural network architectures -- are unable to explain empirical performance. A large volume of work aims to close this gap, primarily by developing bounds on generalization error, optimization error, and excess risk. When evaluated empirically, however, most of these bounds are numerically vacuous. Focusing on generalization bounds, this work addresses the question of how to evaluate such bounds empirically. Jiang et al. (2020) recently described a large-scale empirical study aimed at uncovering potential causal relationships between bounds/measures and generalization. Building on their study, we highlight where their proposed methods can obscure failures and successes of generalization measures in explaining generalization. We argue that generalization measures should instead be evaluated within the framework of distributional robustness.

deep learning, generalization, neural network, (21 more...)

arXiv.org Machine Learning

2010.11924

Country: North America > Canada > Ontario (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

Adversarial score matching and improved sampling for image generation

Jolicoeur-Martineau, Alexia, Piché-Taillefer, Rémi, Combes, Rémi Tachet des, Mitliagkas, Ioannis

arXiv.org Machine LearningOct-10-2020

Denoising Score Matching with Annealed Langevin Sampling (DSM-ALS) has recently found success in generative modeling. The approach works by first training a neural network to estimate the score of a distribution, and then using Langevin dynamics to sample from the data distribution assumed by the score network. Despite the convincing visual quality of samples, this method appears to perform worse than Generative Adversarial Networks (GANs) under the Fr\'echet Inception Distance, a standard metric for generative models. We show that this apparent gap vanishes when denoising the final Langevin samples using the score network. In addition, we propose two improvements to DSM-ALS: 1) Consistent Annealed Sampling as a more stable alternative to Annealed Langevin Sampling, and 2) a hybrid training formulation, composed of both Denoising Score Matching and adversarial objectives. By combining these two techniques and exploring different network architectures, we elevate score matching methods and obtain results competitive with state-of-the-art image generation on CIFAR-10.

artificial intelligence, neural network, song and ermon, (15 more...)

arXiv.org Machine Learning

2009.05475

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback