Yurtsever, Alp
Provable Reduction in Communication Rounds for Non-Smooth Convex Federated Learning
Palenzuela, Karlo, Dadras, Ali, Yurtsever, Alp, Löfstedt, Tommy
Multiple local steps are key to communication-efficient federated learning. However, theoretical guarantees for such algorithms, without assumptions that bound data heterogeneity, have been lacking for general non-smooth convex problems. A typical FL algorithm consists of two main phases: local training and aggregation; Scaffold (Karimireddy et al., 2020) and Scaffnew (Mishchenko et al., 2022) stand out as notable examples. We explore the following natural question in this work: Can multiple local steps provably reduce communication rounds in the non-smooth convex setting?
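For intuition, here is a minimal sketch of the generic local-training/aggregation template the abstract refers to, with several local gradient steps between communication rounds. This is a FedAvg-style loop on an illustrative least-squares problem, not the paper's algorithm; the client data, step size, and iteration counts are assumptions for the sketch.

```python
import numpy as np

# FedAvg-style sketch of the local-training/aggregation template:
# each round, every client runs several local gradient steps on its
# own (heterogeneous) least-squares problem, then the server averages.
# All problem data and hyperparameters below are illustrative.
rng = np.random.default_rng(0)
n_clients, dim, local_steps, rounds, lr = 5, 10, 20, 50, 0.01

A = [rng.standard_normal((30, dim)) for _ in range(n_clients)]  # client data
b = [rng.standard_normal(30) for _ in range(n_clients)]

x_global = np.zeros(dim)
for _ in range(rounds):                       # one communication round
    updates = []
    for Ai, bi in zip(A, b):                  # local training phase
        x = x_global.copy()
        for _ in range(local_steps):          # multiple local steps
            x -= lr * Ai.T @ (Ai @ x - bi)    # gradient of 0.5*||Ai x - bi||^2
        updates.append(x)
    x_global = np.mean(updates, axis=0)       # aggregation phase
```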
Revisiting Frank-Wolfe for Structured Nonconvex Optimization
Maskan, Hoomaan, Hou, Yikun, Sra, Suvrit, Yurtsever, Alp
We introduce a new projection-free (Frank-Wolfe) method for optimizing structured nonconvex functions that are expressed as a difference of two convex functions. This problem class subsumes smooth nonconvex minimization, positioning our method as a promising alternative to the classical Frank-Wolfe algorithm. Difference-of-convex (DC) decompositions are not unique; by carefully selecting a decomposition, we can better exploit the problem structure, improve computational efficiency, and adapt to the underlying problem geometry to find better local solutions. We prove that the proposed method achieves a first-order stationary point in $O(1/\epsilon^2)$ iterations, matching the complexity of the standard Frank-Wolfe algorithm for smooth nonconvex minimization in general. Specific decompositions can, for instance, yield a gradient-efficient variant that requires only $O(1/\epsilon)$ calls to the gradient oracle. Finally, we present numerical experiments demonstrating the effectiveness of the proposed method compared to the standard Frank-Wolfe algorithm.
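As a point of reference, the following sketch runs a plain Frank-Wolfe iteration on a toy DC objective $f = g - h$ over the probability simplex. It illustrates the projection-free template only; the decomposition-exploiting method of the paper is more refined, and the choice of $g$, $h$, and domain here is an illustrative assumption.

```python
import numpy as np

# Plain Frank-Wolfe on a toy DC objective f(x) = g(x) - h(x) over the
# probability simplex; g(x) = 0.5*x'Qx (convex) and h(x) = 0.5*(c'x)^2
# (convex), so f is a difference of convex functions. Illustrative only.
rng = np.random.default_rng(1)
dim = 20
Q = rng.standard_normal((dim, dim)); Q = Q.T @ Q
c = rng.standard_normal(dim)

def grad_f(x):
    return Q @ x - (c @ x) * c                # grad g(x) - grad h(x)

x = np.ones(dim) / dim                        # start at the simplex center
for k in range(200):
    g = grad_f(x)
    s = np.zeros(dim); s[np.argmin(g)] = 1.0  # linear minimization oracle on simplex
    gamma = 2.0 / (k + 2)                     # standard open-loop step size
    x = (1 - gamma) * x + gamma * s           # projection-free update
```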
Implicit Bias in Matrix Factorization and its Explicit Realization in a New Architecture
Hou, Yikun, Sra, Suvrit, Yurtsever, Alp
Gradient descent for matrix factorization is known to exhibit an implicit bias toward approximately low-rank solutions. While existing theories often assume the boundedness of iterates, empirically the bias persists even with unbounded sequences. We thus hypothesize that the implicit bias is driven by divergent dynamics markedly different from the convergent dynamics of data fitting. Using this perspective, we introduce a new factorization model: $X\approx UDV^\top$, where $U$ and $V$ are constrained within norm balls, while $D$ is a diagonal factor allowing the model to span the entire search space. Our experiments reveal that this model exhibits a strong implicit bias regardless of initialization and step size, yielding truly (rather than approximately) low-rank solutions. Furthermore, drawing parallels between matrix factorization and neural networks, we propose a novel neural network model featuring constrained layers and diagonal components. This model achieves strong performance across various regression and classification tasks while finding low-rank solutions, resulting in efficient and lightweight networks.
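A minimal sketch of the $X\approx UDV^\top$ model, assuming Frobenius-norm balls for the constraints: alternating projected gradient steps on the squared error, with the diagonal factor left unconstrained. The radii, step size, and target matrix are illustrative; the paper may use different norms and schedules.

```python
import numpy as np

# Alternating projected gradient steps on 0.5*||U diag(d) V' - X||_F^2
# with U and V kept inside Frobenius-norm balls and d unconstrained.
# Norm choice, radii, and step size are assumptions for this sketch.
rng = np.random.default_rng(2)
m, n, r, radius, lr = 40, 30, 10, 1.0, 0.01
X = rng.standard_normal((m, 3)) @ rng.standard_normal((3, n))  # low-rank target

U = rng.standard_normal((m, r))
V = rng.standard_normal((n, r))
d = np.ones(r)

def proj_ball(M, rad):                        # projection onto a Frobenius-norm ball
    nrm = np.linalg.norm(M)
    return M if nrm <= rad else M * (rad / nrm)

for _ in range(2000):
    R = U @ np.diag(d) @ V.T - X              # residual
    U = proj_ball(U - lr * R @ V @ np.diag(d), radius)
    R = U @ np.diag(d) @ V.T - X              # refresh residual after updating U
    V = proj_ball(V - lr * R.T @ U @ np.diag(d), radius)
    d -= lr * np.diag(U.T @ (U @ np.diag(d) @ V.T - X) @ V)  # unconstrained diagonal
```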
Convex Formulations for Training Two-Layer ReLU Neural Networks
Prakhya, Karthik, Birdal, Tolga, Yurtsever, Alp
Solving non-convex, NP-hard optimization problems is crucial for training machine learning models, including neural networks. However, non-convexity often leads to black-box machine learning models with unclear inner workings. While convex formulations have been used for verifying neural network robustness, their application to training neural networks remains less explored. In response, we reformulate the problem of training infinite-width two-layer ReLU networks as a convex completely positive program in a finite-dimensional (lifted) space. Despite the convexity, solving this problem remains NP-hard due to the complete positivity constraint. To overcome this, we introduce a semidefinite relaxation that can be solved in polynomial time. We then experimentally evaluate the tightness of this relaxation, demonstrating its competitive performance in test accuracy across a range of classification tasks.
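To illustrate the relaxation step, the sketch below solves a toy problem in which a completely positive constraint (NP-hard to certify) is relaxed to the doubly nonnegative cone, i.e., positive semidefinite plus entrywise nonnegative, which standard SDP solvers handle in polynomial time. The objective and trace normalization are illustrative assumptions; the paper's lifted ReLU formulation is considerably more structured.

```python
import cvxpy as cp
import numpy as np

# Toy illustration of the relaxation step: membership in the completely
# positive cone (M = B B' with B >= 0) is NP-hard to certify, so it is
# relaxed to the doubly nonnegative cone: M PSD and entrywise >= 0.
# The objective and trace normalization are illustrative assumptions.
rng = np.random.default_rng(3)
n = 5
C = rng.standard_normal((n, n)); C = (C + C.T) / 2

M = cp.Variable((n, n), symmetric=True)
constraints = [M >> 0,                        # positive semidefinite
               M >= 0,                        # entrywise nonnegative
               cp.trace(M) == 1]              # normalization
prob = cp.Problem(cp.Minimize(cp.trace(C @ M)), constraints)
prob.solve()                                  # polynomial-time SDP solve
```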
A Variational Perspective on High-Resolution ODEs
Maskan, Hoomaan, Zygalakis, Konstantinos C., Yurtsever, Alp
We consider unconstrained minimization of smooth convex functions. We propose a novel variational perspective using the forced Euler-Lagrange equation that allows for studying high-resolution ODEs. Through this, we obtain a faster convergence rate for gradient norm minimization using Nesterov's accelerated gradient method. Additionally, we show that Nesterov's method can be interpreted as a rate-matching discretization of an appropriately chosen high-resolution ODE. Finally, using the results from the new variational perspective, we propose a stochastic method for noisy gradients. Several numerical experiments compare our stochastic algorithm with state-of-the-art methods and illustrate its behavior.
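For concreteness, here is Nesterov's accelerated gradient method in its standard form, the scheme the abstract interprets as a rate-matching discretization of a high-resolution ODE. The quadratic test objective and the $k/(k+3)$ momentum schedule are common textbook choices, used here as assumptions for the sketch.

```python
import numpy as np

# Nesterov's accelerated gradient method in its standard form, on a
# smooth convex quadratic. The test objective and the k/(k+3) momentum
# schedule are common textbook choices, used here as assumptions.
rng = np.random.default_rng(4)
dim = 50
B = rng.standard_normal((dim, dim)); Q = B.T @ B
L = np.linalg.eigvalsh(Q).max()               # smoothness constant of f(x) = 0.5*x'Qx

grad = lambda x: Q @ x
x = y = rng.standard_normal(dim)
x_prev = x.copy()
for k in range(300):
    x_prev, x = x, y - grad(y) / L            # gradient step at the extrapolated point
    y = x + (k / (k + 3)) * (x - x_prev)      # momentum (extrapolation) step
```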
Stochastic Three-Composite Convex Minimization
Yurtsever, Alp, Vu, Bang Cong, Cevher, Volkan
We propose a stochastic optimization method for minimizing the sum of three convex functions, one of which has a Lipschitz continuous gradient as well as restricted strong convexity. Our approach is most suitable in settings where it is computationally advantageous to process the smooth term in the decomposition with its stochastic gradient estimate and the other two functions separately with their proximal operators, such as doubly regularized empirical risk minimization problems. We prove a convergence characterization of the proposed algorithm in expectation under standard assumptions on the stochastic gradient estimate of the smooth term. Our method operates in the primal space and can be considered a stochastic extension of the three-operator splitting method. Finally, numerical evidence supports the effectiveness of our method on real-world problems.
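A minimal sketch of a three-operator-splitting update of the kind the abstract describes, with the smooth term processed through a single-sample stochastic gradient and the other two terms through their proximal operators. The specific choices, an $\ell_1$ regularizer, a nonnegativity constraint, and a least-squares smooth term, are illustrative assumptions rather than the paper's setup.

```python
import numpy as np

# Three-operator-splitting sketch for f(x) + g(x) + h(x): the smooth
# term h (least squares) enters via a single-sample stochastic gradient,
# while f (l1 penalty) and g (nonnegativity) enter via their proximal
# operators. Regularizers, data, and step size are illustrative.
rng = np.random.default_rng(5)
n_samples, dim, gamma, lam = 200, 20, 0.01, 0.1
A = rng.standard_normal((n_samples, dim))
b = rng.standard_normal(n_samples)

prox_l1 = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0)  # prox of t*||.||_1
prox_pos = lambda v: np.maximum(v, 0)                             # prox of indicator{x >= 0}

z = np.zeros(dim)
for _ in range(3000):
    x_g = prox_pos(z)
    i = rng.integers(n_samples)
    stoch_grad = A[i] * (A[i] @ x_g - b[i])   # unbiased for grad of (1/2n)*||Ax - b||^2
    x_f = prox_l1(2 * x_g - z - gamma * stoch_grad, gamma * lam)
    z = z + x_f - x_g                         # splitting correction
```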
Stochastic Conditional Gradient Method for Composite Convex Minimization
Locatello, Francesco, Yurtsever, Alp, Fercoq, Olivier, Cevher, Volkan
In this paper, we propose the first practical algorithm to minimize stochastic composite optimization problems over compact convex sets. This template allows for affine constraints and therefore covers stochastic semidefinite programs (SDPs), which are widely applicable in both machine learning and statistics. In this setup, stochastic algorithms with convergence guarantees are either not known or not tractable. We tackle this general problem and propose a convergent, easy-to-implement, and tractable algorithm. We prove an $\mathcal{O}(k^{-1/3})$ convergence rate in expectation on the objective residual and $\mathcal{O}(k^{-5/12})$ in expectation on the feasibility gap. These rates are achieved without increasing the batch size, which can be as small as a single sample. We present extensive empirical evidence demonstrating the superiority of our algorithm on a broad range of applications, including optimization of stochastic SDPs.
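The sketch below shows one ingredient such methods rely on: a stochastic Frank-Wolfe iteration whose gradient estimate is a running average of single-sample gradients, which is what allows convergence without growing the batch size. The simplex domain and step-size schedules are illustrative assumptions, and the sketch omits the paper's handling of affine constraints.

```python
import numpy as np

# Stochastic Frank-Wolfe with a running average of single-sample
# gradients; the averaging is what removes the need for growing batch
# sizes. Domain (simplex), objective, and schedules are illustrative,
# and the affine-constraint machinery of the paper is omitted.
rng = np.random.default_rng(6)
n_samples, dim = 500, 30
A = rng.standard_normal((n_samples, dim))
b = rng.standard_normal(n_samples)

x = np.ones(dim) / dim
d = np.zeros(dim)                             # averaged gradient estimate
for k in range(1, 1001):
    i = rng.integers(n_samples)
    g = A[i] * (A[i] @ x - b[i])              # single-sample stochastic gradient
    rho = k ** (-2 / 3)
    d = (1 - rho) * d + rho * g               # bias-reducing gradient averaging
    s = np.zeros(dim); s[np.argmin(d)] = 1.0  # linear minimization oracle
    x = x + (1.0 / k) * (s - x)               # Frank-Wolfe step, stays in simplex
```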
Online Adaptive Methods, Universality and Acceleration
Levy, Kfir Y., Yurtsever, Alp, Cevher, Volkan
We present a novel method for convex unconstrained optimization that, without any modifications, ensures: (i) an accelerated convergence rate for smooth objectives, (ii) the standard convergence rate in the general (non-smooth) setting, and (iii) the standard convergence rate in the stochastic optimization setting. To the best of our knowledge, this is the first method that simultaneously applies to all of the above settings. At the heart of our method is an adaptive learning rate rule that employs importance weights, in the spirit of adaptive online learning algorithms (Duchi et al., 2011; Levy, 2017), combined with an update that linearly couples two sequences, in the spirit of Allen-Zhu and Orecchia (2017). An empirical examination of our method demonstrates its applicability to the above-mentioned scenarios and corroborates our theoretical findings.
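A loose sketch in the spirit of the method: an adaptive learning rate driven by importance-weighted gradient norms, combined with a linear coupling of two sequences. The weight schedule, diameter bound, and test objective are assumptions for illustration, and the projection of the aggressive sequence onto a bounded domain is omitted; consult the paper for the exact update.

```python
import numpy as np

# Sketch in the spirit of the method: linear coupling of a
# gradient-descent sequence y and an aggressive sequence z, with an
# AdaGrad-style step size built from importance-weighted gradient
# norms. Weights, diameter bound, and objective are assumptions, and
# the projection of z onto a bounded domain is omitted.
rng = np.random.default_rng(7)
dim = 40
B = rng.standard_normal((dim, dim)); Q = B.T @ B
grad = lambda x: Q @ x                        # smooth convex test objective

R = 10.0                                      # assumed bound on the domain diameter
y = z = rng.standard_normal(dim)
G2 = 0.0                                      # accumulated weighted gradient norms
for t in range(1, 501):
    alpha = (t + 1) / 2                       # importance weight, grows linearly
    x = z / alpha + (1 - 1 / alpha) * y       # linear coupling of the two sequences
    g = grad(x)
    G2 += (alpha * np.linalg.norm(g)) ** 2
    eta = 2 * R / np.sqrt(1 + G2)             # adaptive learning rate
    y = x - eta * g                           # gradient-descent sequence
    z = z - alpha * eta * g                   # aggressive (weighted) sequence
```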