Cevher, Volkan
Benign Overfitting in Deep Neural Networks under Lazy Training
Zhu, Zhenyu, Liu, Fanghui, Chrysos, Grigorios G, Locatello, Francesco, Cevher, Volkan
This paper focuses on over-parameterized deep neural networks (DNNs) with ReLU activation functions and proves that, when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification while attaining (nearly) zero training error under the lazy training regime. To this end, we unify the three interrelated concepts of over-parameterization, benign overfitting, and the Lipschitz constant of DNNs. Our results indicate that interpolating with smoother functions leads to better generalization. Furthermore, we investigate the special case in which DNNs interpolate smooth ground-truth functions under the Neural Tangent Kernel (NTK) regime and study the resulting generalization. Our result demonstrates that the generalization error converges to a constant order that depends only on the label noise and initialization noise, which theoretically verifies benign overfitting. Our analysis provides a tight lower bound on the normalized margin under non-smooth activation functions, as well as on the minimum eigenvalue of the NTK in the high-dimensional setting, both of which are of independent interest in learning theory.
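For reference, the two objects this analysis revolves around admit standard definitions (textbook statements, not results of this paper): the empirical NTK of a network f(x; θ) at initialization θ₀, and the lazy-training (linearized) approximation of the network around θ₀.

```latex
% Standard definitions (not specific to this paper): the NTK of a network
% f(x;\theta) at initialization \theta_0, and the lazy-training (linearized)
% approximation of the network around \theta_0.
\Theta(x, x') \;=\; \big\langle \nabla_\theta f(x;\theta_0),\, \nabla_\theta f(x';\theta_0) \big\rangle,
\qquad
f(x;\theta) \;\approx\; f(x;\theta_0) \;+\; \nabla_\theta f(x;\theta_0)^{\top} (\theta - \theta_0).
```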
DiGress: Discrete Denoising diffusion for graph generation
Vignac, Clement, Krawczuk, Igor, Siraudin, Antoine, Wang, Bohan, Cevher, Volkan, Frossard, Pascal
This work introduces DiGress, a discrete denoising diffusion model for generating graphs with categorical node and edge attributes. The model applies a discrete diffusion process that progressively adds noise to a graph by editing its node and edge categories; a graph transformer network is trained to revert this process, simplifying the problem of distribution learning over graphs into a sequence of node and edge classification tasks. We further improve sample quality by introducing a Markovian noise model that preserves the marginal distribution of node and edge types during diffusion, and by incorporating auxiliary graph-theoretic features. A procedure for conditioning the generation on graph-level features is also proposed. DiGress achieves state-of-the-art performance on molecular and non-molecular datasets, with up to 3x validity improvement on a planar graph dataset. It is also the first model to scale to the large GuacaMol dataset containing 1.3M drug-like molecules without the use of molecule-specific representations. At a high level, diffusion models are trained to denoise diffusion trajectories, and produce new samples by sampling noise and recursively denoising it. They have been used successfully in a variety of settings, outperforming all other methods on image and video generation (Dhariwal & Nichol, 2021; Ho et al., 2022). These successes raise hope for building powerful models for graph generation, a task with diverse applications such as molecule design (Liu et al., 2018), traffic modeling (Yu & Gu, 2019), and code completion (Brockschmidt et al., 2019). However, generating graphs remains challenging due to their unordered nature and sparsity properties.
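As a concrete illustration of a marginal-preserving discrete noise model (a minimal sketch under the assumption of a single-step transition matrix Q = αI + (1−α)1mᵀ; the exact matrices, schedules, and function names below are not DiGress's implementation), one can interpolate between the identity and a rank-one matrix built from the empirical marginal m of node or edge types, so that m is stationary under the noising step:

```python
import numpy as np

def marginal_transition(alpha: float, m: np.ndarray) -> np.ndarray:
    """Transition matrix Q = alpha * I + (1 - alpha) * 1 m^T.

    Rows sum to one, and m^T Q = m^T, so the marginal distribution m of
    node/edge categories is preserved by the noising step (illustrative
    sketch of a marginal noise model, not DiGress's exact schedule).
    """
    k = m.shape[0]
    return alpha * np.eye(k) + (1.0 - alpha) * np.ones((k, 1)) @ m[None, :]

# Example: noise one-hot node types x (n, k) for a single diffusion step.
rng = np.random.default_rng(0)
m = np.array([0.7, 0.2, 0.1])                # empirical marginal of 3 node types
Q = marginal_transition(alpha=0.9, m=m)
x = np.eye(3)[rng.integers(0, 3, size=5)]    # 5 nodes, one-hot categories
probs = x @ Q                                # categorical distribution per node
noisy = np.array([rng.choice(3, p=p) for p in probs])
```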
Regularization of polynomial networks for image recognition
Chrysos, Grigorios G, Wang, Bohan, Deng, Jiankang, Cevher, Volkan
Deep Neural Networks (DNNs) have obtained impressive performance across tasks; however, they remain black boxes that are, for instance, hard to analyze theoretically. At the same time, Polynomial Networks (PNs) have emerged as an alternative with promising performance and improved interpretability, but have yet to reach the performance of the powerful DNN baselines. In this work, we aim to close this performance gap. We introduce a class of PNs that are able to reach the performance of ResNet across a range of six benchmarks. We demonstrate that strong regularization is critical and conduct an extensive study of the exact regularization schemes required to match performance. To further motivate the regularization schemes, we introduce D-PolyNets, which achieve a higher degree of expansion than previously proposed polynomial networks. D-PolyNets are more parameter-efficient while achieving performance similar to other polynomial networks. We expect that our new models can lead to an understanding of the role of elementwise activation functions (which are no longer required for training PNs). The source code is available at https://github.com/grigorisg9gr/regularized_polynomials.
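A minimal sketch of the kind of building block PNs use: a Hadamard-product recursion that raises the polynomial degree of the input without any elementwise activation. The class name, dimensions, and parameterization below are illustrative, in the spirit of polynomial networks generally, and do not reproduce the exact D-PolyNets architecture or its regularization:

```python
import torch
import torch.nn as nn

class PolyBlock(nn.Module):
    """Degree-N polynomial of the input via a Hadamard-product recursion:
    x_{k+1} = (U_k z) * x_k + x_k, with no elementwise activations.
    Generic sketch, not the exact D-PolyNets architecture."""

    def __init__(self, dim_in: int, dim_hidden: int, degree: int = 3):
        super().__init__()
        self.embed = nn.Linear(dim_in, dim_hidden)
        self.factors = nn.ModuleList(
            nn.Linear(dim_in, dim_hidden) for _ in range(degree - 1)
        )
        self.head = nn.Linear(dim_hidden, dim_hidden)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        x = self.embed(z)                # degree 1 in z
        for U in self.factors:
            x = U(z) * x + x             # each product raises the degree by one
        return self.head(x)

block = PolyBlock(dim_in=32, dim_hidden=64, degree=3)
out = block(torch.randn(8, 32))          # output shape (8, 64)
```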
No-Regret Learning in Games with Noisy Feedback: Faster Rates and Adaptivity via Learning Rate Separation
Hsieh, Yu-Guan, Antonakopoulos, Kimon, Cevher, Volkan, Mertikopoulos, Panayotis
We examine the problem of regret minimization when the learner is involved in a continuous game with other optimizing agents: in this case, if all players follow a no-regret algorithm, it is possible to achieve significantly lower regret than in fully adversarial environments. We study this problem in the context of variationally stable games (a class of continuous games that includes all convex-concave and monotone games), and when the players only have access to noisy estimates of their individual payoff gradients. If the noise is additive, the game-theoretic and purely adversarial settings enjoy similar regret guarantees; however, if the noise is multiplicative, we show that the learners can, in fact, achieve constant regret. We achieve this faster rate via an optimistic gradient scheme with learning rate separation -- that is, the method's extrapolation and update steps are tuned to different schedules, depending on the noise profile. Subsequently, to eliminate the need for delicate hyperparameter tuning, we propose a fully adaptive method that attains nearly the same guarantees as its non-adapted counterpart, while operating without knowledge of either the game or the noise profile.
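A schematic of an optimistic gradient step with separated learning rates, from the point of view of a single player with a noisy gradient oracle (the oracle name grad, the schedules eta/gamma, and the toy problem are assumptions for illustration, not the paper's tuning):

```python
import numpy as np

def og_learning_rate_separation(grad, x0, eta, gamma, T, rng=np.random.default_rng(0)):
    """Optimistic gradient with learning rate separation (sketch).

    The extrapolation step reuses the previous (noisy) gradient with step
    eta(t); the update step uses a different step gamma(t). Illustrative
    schedules only, not the paper's exact choices.
    """
    x = np.asarray(x0, dtype=float)
    g_prev = np.zeros_like(x)
    for t in range(1, T + 1):
        x_lead = x - eta(t) * g_prev      # extrapolation (leading state)
        g = grad(x_lead, rng)             # noisy feedback at the leading state
        x = x - gamma(t) * g              # update step
        g_prev = g
    return x

# Toy usage: a strongly monotone gradient field with multiplicative noise.
grad = lambda x, rng: (1.0 + 0.1 * rng.standard_normal()) * x
x_final = og_learning_rate_separation(grad, x0=[1.0, -2.0],
                                      eta=lambda t: 0.5, gamma=lambda t: 0.1, T=200)
```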
Escaping limit cycles: Global convergence for constrained nonconvex-nonconcave minimax problems
Pethick, Thomas, Latafat, Puya, Patrinos, Panagiotis, Fercoq, Olivier, Cevher, Volkan
This paper introduces a new extragradient-type algorithm for a class of nonconvex-nonconcave minimax problems. It is well known that finding a local solution for general minimax problems is computationally intractable. This observation has recently motivated the study of structures sufficient for convergence of first-order methods in the more general setting of variational inequalities when the so-called weak Minty variational inequality (MVI) holds. This problem class captures non-trivial structures, as we demonstrate with examples for which a large family of existing algorithms provably converges to limit cycles. Our results require a less restrictive parameter range in the weak MVI than previously known, thus extending the applicability of our scheme. The proposed algorithm applies to constrained and regularized problems and incorporates an adaptive stepsize that allows for potentially larger steps. Our scheme converges globally even in settings where the underlying operator exhibits limit cycles.
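For intuition, extragradient-type updates analyzed under weak MVI typically pair a full extrapolation step with a damped update step. The sketch below is a generic EG+-style iteration with a projection handling constraints; the constant stepsize gamma, the damping alpha, and the operator/projection names are placeholders and do not reproduce the paper's adaptive stepsize rule:

```python
import numpy as np

def eg_plus(F, project, z0, gamma=0.5, alpha=0.5, T=1000):
    """Extragradient-type iteration for a (possibly nonmonotone) operator F.

    z_bar = P(z - gamma * F(z))              # full extrapolation step
    z     = P(z - alpha * gamma * F(z_bar))  # damped update step
    Generic EG+-style sketch; the paper's adaptive stepsize is replaced
    by a constant, and `project` models the constraint set.
    """
    z = np.asarray(z0, dtype=float)
    for _ in range(T):
        z_bar = project(z - gamma * F(z))
        z = project(z - alpha * gamma * F(z_bar))
    return z

# Toy usage: unconstrained bilinear saddle point min_x max_y x*y, F(z) = (y, -x);
# the iterates spiral toward the equilibrium (0, 0).
F = lambda z: np.array([z[1], -z[0]])
z_final = eg_plus(F, project=lambda z: z, z0=[1.0, 1.0])
```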
Revisiting adversarial training for the worst-performing class
Pethick, Thomas, Chrysos, Grigorios G., Cevher, Volkan
Despite progress in adversarial training (AT), there is a substantial gap between the top-performing and worst-performing classes in many datasets. For example, on CIFAR10, the accuracies for the best and worst classes are 74% and 23%, respectively. We argue that this gap can be reduced by explicitly optimizing for the worst-performing class, resulting in a min-max-max optimization formulation. Our method, called class focused online learning (CFOL), includes high-probability convergence guarantees for the worst-class loss and can be easily integrated into existing training setups with minimal computational overhead. We demonstrate an improvement to 32% in the worst-class accuracy on CIFAR10, and we observe consistent behavior across CIFAR100 and STL10. Our study highlights the importance of moving beyond average accuracy, which is particularly relevant in safety-critical applications.
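A minimal sketch of the class-focused sampling idea: an EXP3-style adversarial bandit over class labels decides which class the next adversarial training step focuses on, and classes with higher observed loss are sampled more often. The class name, learning rate, mixing parameter, and update rule below are illustrative and do not reproduce the exact CFOL procedure:

```python
import numpy as np

class ClassSampler:
    """EXP3-style adversarial bandit over class labels (illustrative sketch)."""

    def __init__(self, num_classes: int, lr: float = 0.1, mix: float = 0.1):
        self.w = np.zeros(num_classes)
        self.lr, self.mix = lr, mix

    def probs(self) -> np.ndarray:
        p = np.exp(self.w - self.w.max())
        p /= p.sum()
        k = len(self.w)
        return (1 - self.mix) * p + self.mix / k   # mix with uniform for exploration

    def sample(self, rng) -> int:
        return int(rng.choice(len(self.w), p=self.probs()))

    def update(self, cls: int, loss: float) -> None:
        # importance-weighted update: higher loss -> larger weight -> sampled more often
        self.w[cls] += self.lr * loss / self.probs()[cls]

# Usage inside a training loop: sample a class, take an AT step on examples of
# that class, then feed the observed adversarial loss back to the sampler.
rng = np.random.default_rng(0)
sampler = ClassSampler(num_classes=10)
cls = sampler.sample(rng)
adv_loss = 1.3                                     # placeholder for the batch's adversarial loss
sampler.update(cls, adv_loss)
```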
Solving stochastic weak Minty variational inequalities without increasing batch size
Pethick, Thomas, Fercoq, Olivier, Latafat, Puya, Patrinos, Panagiotis, Cevher, Volkan
This paper introduces a family of stochastic extragradient-type algorithms for a class of nonconvex-nonconcave problems characterized by the weak Minty variational inequality (MVI). Unlike existing results on extragradient methods in the monotone setting, employing diminishing stepsizes is no longer possible in the weak MVI setting. This has led to approaches such as increasing the batch size per iteration, which can however be prohibitively expensive. In contrast, our proposed methods involve two stepsizes and only require one additional oracle evaluation per iteration. We show that it is possible to keep one fixed stepsize while only the second stepsize is taken to be diminishing, making the scheme interesting even in the monotone setting. Almost sure convergence is established, and we provide a unified analysis for this family of schemes, which contains a nonlinear generalization of the celebrated primal-dual hybrid gradient algorithm. Stochastic first-order methods have been at the core of the current success in deep learning applications. These methods are by now mostly well understood for minimization problems; this is even the case in the nonconvex setting, where there exist matching upper and lower bounds on the complexity of finding an approximately stationary point (Arjevani et al., 2019). The picture becomes less clear when moving beyond minimization into nonconvex-nonconcave minimax problems -- or, more generally, nonmonotone variational inequalities. Even in the deterministic case, finding a stationary point is in general intractable (Daskalakis et al., 2021; Hirsch & Vavasis, 1987). This is in stark contrast with minimization, where only global optimality is NP-hard. An interesting nonmonotone class for which we do have efficient algorithms is characterized by the so-called weak Minty variational inequality (MVI) (Diakonikolas et al., 2021). This problem class captures nontrivial structures such as attracting limit cycles and is governed by a parameter ρ whose negativity increases the degree of nonmonotonicity. Existing guarantees under weak MVIs require the stepsize γ to be large relative to |ρ|; in other words, it seems that we need to take γ large to guarantee convergence for a large class. This reliance on a large stepsize is at the core of why the community has struggled to provide stochastic variants for weak MVIs. The only known results effectively increase the batch size at every iteration (Diakonikolas et al., 2021). Pethick et al. (2022) proposed SEG+, which attempts to tackle the noise by only diminishing the second stepsize.
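A schematic of the two-stepsize structure described above: the extrapolation stepsize γ stays fixed (as weak MVI requires a large stepsize), while only the update stepsize α(t) is diminishing, so no growing batch sizes are needed. This is a simplified sketch with placeholder oracle and schedules; the paper's actual scheme additionally involves the extra oracle evaluation and corrections not shown here:

```python
import numpy as np

def two_stepsize_seg(F_noisy, z0, gamma=0.5, alpha=lambda t: 1.0 / np.sqrt(t),
                     T=2000, rng=np.random.default_rng(0)):
    """Stochastic extragradient with two stepsizes (simplified sketch).

    The exploration stepsize gamma is kept fixed, and only the update
    stepsize alpha(t) is diminishing. Schedules and the oracle F_noisy
    are illustrative placeholders, not the paper's exact algorithm.
    """
    z = np.asarray(z0, dtype=float)
    for t in range(1, T + 1):
        z_bar = z - gamma * F_noisy(z, rng)              # fixed exploration step
        z = z - alpha(t) * gamma * F_noisy(z_bar, rng)   # diminishing update step
    return z

# Toy usage: monotone operator F(z) = z with additive Gaussian noise.
F_noisy = lambda z, rng: z + 0.1 * rng.standard_normal(z.shape)
z_final = two_stepsize_seg(F_noisy, z0=[2.0, -1.0])
```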
Robustness in deep learning: The good (width), the bad (depth), and the ugly (initialization)
Zhu, Zhenyu, Liu, Fanghui, Chrysos, Grigorios G, Cevher, Volkan
We study the average robustness notion in deep neural networks in (selected) wide and narrow, deep and shallow, as well as lazy and non-lazy training settings. We prove that, in the under-parameterized setting, width has a negative effect on robustness, while it improves robustness in the over-parameterized setting. The effect of depth closely depends on the initialization and the training mode. In particular, with LeCun initialization, depth helps robustness in the lazy training regime; in contrast, with Neural Tangent Kernel (NTK) and He initialization, depth hurts robustness. Moreover, under the non-lazy training regime, we demonstrate how the width of a two-layer ReLU network benefits robustness. Our theoretical developments improve the results of [Huang et al. NeurIPS21; Wu et al. NeurIPS21] and are consistent with [Bubeck and Sellke NeurIPS21; Bubeck et al. COLT21].
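For context, one common way to formalize an average (perturbation-based) robustness notion, which Lipschitz-style analyses relate to width, depth, and initialization, is the expected output perturbation under random input noise. This is a standard formulation, not necessarily the paper's exact definition:

```latex
% One standard average-robustness proxy (not necessarily the paper's exact definition):
% expected output perturbation under Gaussian input noise, controlled by the
% (local) Lipschitz constant of the network f.
\mathcal{R}(f) \;=\; \mathbb{E}_{x \sim \mathcal{D}}\, \mathbb{E}_{\delta \sim \mathcal{N}(0,\sigma^2 I)}
\big\| f(x+\delta) - f(x) \big\|_2
\;\le\; \mathrm{Lip}(f)\, \mathbb{E}\,\|\delta\|_2 .
```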
Min-Max Optimization Made Simple: Approximating the Proximal Point Method via Contraction Maps
Cevher, Volkan, Piliouras, Georgios, Sim, Ryann, Skoulakis, Stratis
In this paper, we present a first-order method that admits near-optimal convergence rates for convex/concave min-max problems while requiring a simple and intuitive analysis. Similar to the seminal work of Nemirovski and the recent approach of Piliouras et al. in normal-form games, our work is based on the fact that the update rule of the Proximal Point (PP) method can be approximated up to accuracy $\epsilon$ with only $O(\log 1/\epsilon)$ additional gradient calls through the iterations of a contraction map. Combining the analysis of the PP method with an error-propagation analysis, we establish that the resulting first-order method, called Clairvoyant Extra Gradient, admits near-optimal time-average convergence for general domains and last-iterate convergence in the unconstrained case.
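A minimal sketch of the idea: the implicit proximal-point update z⁺ = z − γF(z⁺) is the fixed point of the map w ↦ z − γF(w), which is a contraction when γ is small relative to the Lipschitz constant of F, so a few inner iterations approximate it to accuracy ε. The stepsize and the fixed inner iteration count below are illustrative, not the paper's constants:

```python
import numpy as np

def clairvoyant_extragradient(F, z0, gamma=0.1, T=500, inner=10):
    """Approximate proximal-point steps via a contraction map (sketch).

    Each outer step approximates the implicit update z+ = z - gamma*F(z+)
    by iterating the contraction w <- z - gamma*F(w) (a logarithmic number
    of times suffices; a fixed `inner` count is used here for simplicity),
    then updates z with the gradient at that "clairvoyant" point.
    """
    z = np.asarray(z0, dtype=float)
    for _ in range(T):
        w = z.copy()
        for _ in range(inner):            # fixed-point iteration of the contraction
            w = z - gamma * F(w)
        z = z - gamma * F(w)              # extra-gradient-style update at the clairvoyant point
    return z

# Toy usage: bilinear saddle point min_x max_y x*y, F(z) = (y, -x);
# the iterates approach the equilibrium (0, 0).
F = lambda z: np.array([z[1], -z[0]])
print(clairvoyant_extragradient(F, z0=[1.0, 1.0]))
```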
Extra-Newton: A First Approach to Noise-Adaptive Accelerated Second-Order Methods
Antonakopoulos, Kimon, Kavis, Ali, Cevher, Volkan
This work proposes a universal and adaptive second-order method for minimizing second-order smooth, convex functions. Our algorithm achieves $O(\sigma / \sqrt{T})$ convergence when the oracle feedback is stochastic with variance $\sigma^2$, and improves its convergence to $O(1/T^3)$ with deterministic oracles, where $T$ is the number of iterations. Our method also interpolates between these rates without knowing the nature of the oracle a priori, which is enabled by a parameter-free adaptive step-size that is oblivious to the smoothness modulus, the variance bound, and the diameter of the constrained set. To our knowledge, this is the first universal algorithm with such global guarantees within the second-order optimization literature.