Bartan, Burak
Randomized Polar Codes for Anytime Distributed Machine Learning
Bartan, Burak, Pilanci, Mert
We present a novel distributed computing framework that is robust to slow compute nodes and is capable of both approximate and exact computation of linear operations. The proposed mechanism integrates the concepts of randomized sketching and polar codes in the context of coded computation. We propose a sequential decoding algorithm designed to handle real-valued data while maintaining low computational complexity for recovery. Additionally, we provide an anytime estimator that can generate provably accurate estimates even when the set of available node outputs is not decodable. We demonstrate the potential applications of this framework in various contexts, such as large-scale matrix multiplication and black-box optimization. We present the implementation of these methods on a serverless cloud computing system and provide numerical results demonstrating their scalability in practice, including ImageNet-scale computations.
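To make the coded-computation idea concrete, here is a minimal NumPy sketch of distributed matrix-vector multiplication with redundant coded blocks. It uses a generic random linear code and direct linear-system decoding, not the polar-code construction, sequential decoder, or anytime estimator of the paper; all sizes and worker indices are invented for illustration.

    import numpy as np

    # Minimal sketch: coded distributed matrix-vector multiplication with a
    # generic random linear code (NOT the paper's polar-code construction).
    rng = np.random.default_rng(0)
    n, d = 1000, 200
    k, r = 10, 4                              # k data blocks, r redundant coded blocks
    A = rng.standard_normal((n, d))
    x = rng.standard_normal(d)

    blocks = np.split(A, k)                   # row blocks A_1, ..., A_k
    G = rng.standard_normal((r, k))           # random encoding matrix
    coded = [sum(G[i, j] * blocks[j] for j in range(k)) for i in range(r)]
    E = np.vstack([np.eye(k), G])             # full (k + r) x k encoding matrix

    # Each of the k + r workers multiplies its (possibly coded) block with x.
    outputs = [B @ x for B in blocks + coded]

    # Suppose some workers straggle; decode exactly from any k returned outputs.
    received = [0, 1, 2, 3, 5, 6, 7, 8, 10, 12]
    Y = np.vstack([outputs[i] for i in received])
    Z = np.linalg.solve(E[received], Y)       # recovers each A_j @ x
    assert np.allclose(Z.reshape(-1), A @ x)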
Moccasin: Efficient Tensor Rematerialization for Neural Networks
Bartan, Burak, Li, Haoming, Teague, Harris, Lott, Christopher, Dilkina, Bistra
The deployment and training of neural networks on edge computing devices pose many challenges. The limited memory of edge devices is often one of the biggest factors restricting the deployment of large neural network models. Tensor rematerialization, or recompute, is a way to address the high memory requirements of neural network training and inference. In this paper, we consider the problem of minimizing the execution time of compute graphs subject to a memory budget. In particular, we develop a new constraint programming formulation called \textsc{Moccasin} with only $O(n)$ integer variables, where $n$ is the number of nodes in the compute graph. This is a significant improvement over recent works that propose formulations with $O(n^2)$ Boolean variables. We present numerical studies showing that our approach is up to an order of magnitude faster than recent work, especially for large-scale graphs.
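As a rough illustration of the constraint-programming angle (not the actual \textsc{Moccasin} formulation, and without modeling recomputation at all), the toy CP-SAT sketch below schedules a small, invented four-node compute graph under a peak-memory budget, using only a handful of integer and interval variables per node.

    from ortools.sat.python import cp_model

    # Toy sketch: schedule a tiny compute graph under a memory budget with CP-SAT.
    # This only illustrates the constraint-programming flavor; it is NOT the
    # Moccasin formulation and does not model tensor recomputation.
    durations = {0: 2, 1: 3, 2: 2, 3: 1}            # invented node durations
    sizes     = {0: 2, 1: 2, 2: 3, 3: 1}            # invented output tensor sizes
    edges     = [(0, 1), (0, 2), (1, 3), (2, 3)]    # producer -> consumer
    budget    = 7
    horizon   = sum(durations.values())

    model = cp_model.CpModel()
    start = {v: model.NewIntVar(0, horizon, f"s{v}") for v in durations}
    end   = {v: model.NewIntVar(0, horizon, f"e{v}") for v in durations}
    run   = {v: model.NewIntervalVar(start[v], durations[v], end[v], f"run{v}")
             for v in durations}

    # A tensor occupies memory from its producer's start to its last consumer's end.
    life, demand = [], []
    for v in durations:
        consumers = [c for (p, c) in edges if p == v]
        last = model.NewIntVar(0, horizon, f"last{v}")
        for c in consumers:
            model.Add(start[c] >= end[v])           # precedence
            model.Add(last >= end[c])
        model.Add(last >= end[v])
        dur = model.NewIntVar(0, horizon, f"dur{v}")
        model.Add(dur == last - start[v])
        life.append(model.NewIntervalVar(start[v], dur, last, f"life{v}"))
        demand.append(sizes[v])

    model.AddCumulative(life, demand, budget)       # peak memory <= budget

    makespan = model.NewIntVar(0, horizon, "makespan")
    model.AddMaxEquality(makespan, list(end.values()))
    model.Minimize(makespan)

    solver = cp_model.CpSolver()
    if solver.Solve(model) in (cp_model.OPTIMAL, cp_model.FEASIBLE):
        print({v: solver.Value(start[v]) for v in durations})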
Hidden Convexity of Wasserstein GANs: Interpretable Generative Models with Closed-Form Solutions
Sahiner, Arda, Ergen, Tolga, Ozturkler, Batu, Bartan, Burak, Pauly, John, Mardani, Morteza, Pilanci, Mert
Generative Adversarial Networks (GANs) are commonly used for modeling complex distributions of data. Both the generators and discriminators of GANs are often modeled by neural networks, posing a non-transparent optimization problem which is non-convex and non-concave over the generator and discriminator, respectively. Such networks are often heuristically optimized with gradient descent-ascent (GDA), but it is unclear whether the optimization problem contains any saddle points, or whether heuristic methods can find them in practice. In this work, we analyze the training of Wasserstein GANs with two-layer neural network discriminators through the lens of convex duality, and for a variety of generators expose the conditions under which Wasserstein GANs can be solved exactly with convex optimization approaches, or can be represented as convex-concave games. Using this convex duality interpretation, we further demonstrate the impact of different activation functions of the discriminator. Our observations are verified with numerical results demonstrating the power of the convex interpretation, including applications to progressive training of convex architectures corresponding to linear generators and quadratic-activation discriminators for CelebA image generation.
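As a toy illustration of the closed-form flavor for linear generators paired with quadratic-activation discriminators (a simplified reading, not the paper's derivation): a purely quadratic discriminator can only distinguish second moments, so a linear generator that matches the top of the data covariance can be written down directly from an eigendecomposition.

    import numpy as np

    # Toy sketch: with a quadratic-activation discriminator, the generator only
    # needs to match second moments, which a linear generator G z (z ~ N(0, I_k))
    # can do in closed form via the top-k eigenpairs of the data covariance.
    rng = np.random.default_rng(0)
    d, k, n = 10, 3, 5000

    X = rng.standard_normal((n, d)) @ rng.standard_normal((d, d))  # toy data
    Sigma = X.T @ X / n                                            # data covariance

    evals, evecs = np.linalg.eigh(Sigma)
    top = np.argsort(evals)[::-1][:k]
    G = evecs[:, top] * np.sqrt(evals[top])   # closed-form linear generator

    # Generated samples x = G z have covariance G G^T, the best rank-k match.
    Z = rng.standard_normal((n, k))
    X_gen = Z @ G.T
    print(np.linalg.norm(X_gen.T @ X_gen / n - Sigma, "fro"))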
Training Quantized Neural Networks to Global Optimality via Semidefinite Programming
Bartan, Burak, Pilanci, Mert
In this paper, we focus on training quantized neural networks to obtain efficient machine learning models. We consider the combinatorial, non-convex optimization problem of minimizing the empirical error with respect to quantized weights. We focus on polynomial activation functions, where the training problem is still non-trivial to solve. Recent work has shown that two-layer neural networks with ReLU [35, 36] and leaky ReLU activations [25] can be trained via convex optimization in polynomial time with respect to the number of samples and neurons. Moreover, degree-two polynomial activations can be trained to global optimality in polynomial time with respect to all problem dimensions using semidefinite programming [7]. In this work, we take a similar convex duality approach that involves semidefinite programming. However, our method and theoretical analysis are substantially different since we consider quantized weights, which leads to a discrete non-convex optimization problem. The fact that the first-layer weights are discrete renders the problem combinatorial and NP-hard, and thus we cannot hope to obtain a result similar to those in [7] or [35].
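For intuition about the kind of semidefinite lifting involved, the CVXPY snippet below shows the textbook SDP relaxation of a $\pm 1$-constrained quadratic objective; it is only an analogy for handling discrete weights, not the training formulation developed in the paper.

    import cvxpy as cp
    import numpy as np

    # Standard SDP lifting of a +-1-constrained quadratic objective, shown only
    # to illustrate semidefinite relaxations of discrete problems.
    rng = np.random.default_rng(0)
    n = 8
    Q = rng.standard_normal((n, n))
    Q = (Q + Q.T) / 2

    # Lift w in {-1, +1}^n to W = w w^T, then relax to W PSD with unit diagonal.
    W = cp.Variable((n, n), PSD=True)
    prob = cp.Problem(cp.Minimize(cp.trace(Q @ W)), [cp.diag(W) == 1])
    prob.solve()

    # Round a feasible +-1 vector from the top eigenvector of the relaxed solution.
    evals, evecs = np.linalg.eigh(W.value)
    w_hat = np.sign(evecs[:, -1])
    print(prob.value, w_hat @ Q @ w_hat)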
Neural Spectrahedra and Semidefinite Lifts: Global Convex Optimization of Polynomial Activation Neural Networks in Fully Polynomial-Time
Bartan, Burak, Pilanci, Mert
The training of two-layer neural networks with nonlinear activation functions is an important non-convex optimization problem with numerous applications and promising performance in layerwise deep learning. In this paper, we develop exact convex optimization formulations for two-layer neural networks with second-degree polynomial activations based on semidefinite programming. Remarkably, we show that the semidefinite lift is always exact, and therefore the computational complexity of global optimization is polynomial in the input dimension and sample size for all input data. The developed convex formulations are proven to achieve the same global optimal solution set as their non-convex counterparts. More specifically, the globally optimal two-layer neural network with polynomial activations can be found by solving a semidefinite program (SDP) and decomposing the solution using a procedure we call Neural Decomposition. Moreover, the choice of regularizer plays a crucial role in the computational tractability of neural network training. We show that the standard weight decay regularization formulation is NP-hard, whereas other simple convex penalties render the problem tractable in polynomial time via convex programming. We extend the results beyond the fully connected architecture to different neural network architectures including networks with vector outputs and convolutional architectures with pooling. We provide extensive numerical simulations showing that the standard backpropagation approach often fails to achieve the global optimum of the training loss. The proposed approach achieves better test accuracy significantly faster than the standard backpropagation procedure.
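A minimal CVXPY sketch of the lifting idea for the pure-square activation $\sigma(t)=t^2$, under simplifying assumptions (no bias or linear terms, squared loss, a trace penalty standing in for the regularizer): the network output $x \mapsto \sum_j a_j (w_j^\top x)^2$ equals $x^\top Z x$, so fitting $Z = Z_+ - Z_-$ with both parts positive semidefinite is convex, and an eigendecomposition of $Z$ recovers a two-layer network, loosely mirroring the Neural Decomposition step.

    import cvxpy as cp
    import numpy as np

    # Simplified sketch of the semidefinite lift for activation sigma(t) = t^2.
    rng = np.random.default_rng(0)
    n, d = 200, 5
    X = rng.standard_normal((n, d))
    y = np.sum(X[:, :2], axis=1) ** 2 - 0.5 * X[:, 2] ** 2   # planted quadratic target

    Zp = cp.Variable((d, d), PSD=True)     # "positive" neurons
    Zm = cp.Variable((d, d), PSD=True)     # "negative" neurons
    preds = cp.sum(cp.multiply(X @ (Zp - Zm), X), axis=1)    # x_i^T Z x_i
    loss = cp.sum_squares(preds - y) / n + 1e-3 * (cp.trace(Zp) + cp.trace(Zm))
    cp.Problem(cp.Minimize(loss)).solve()

    # Decompose Z into neurons: eigenvectors as first-layer weights,
    # eigenvalues as second-layer coefficients (simplified decomposition).
    Z = Zp.value - Zm.value
    evals, evecs = np.linalg.eigh(Z)
    neurons = [(evecs[:, j], evals[j]) for j in range(d) if abs(evals[j]) > 1e-4]
    print(len(neurons), "recovered neurons")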
Debiasing Distributed Second Order Optimization with Surrogate Sketching and Scaled Regularization
Dereziński, Michał, Bartan, Burak, Pilanci, Mert, Mahoney, Michael W.
In distributed second order optimization, a standard strategy is to average many local estimates, each of which is based on a small sketch or batch of the data. However, the local estimates on each machine are typically biased, relative to the full solution on all of the data, and this can limit the effectiveness of averaging. Here, we introduce a new technique for debiasing the local estimates, which leads to both theoretical and empirical improvements in the convergence rate of distributed second order methods. Our technique has two novel components: (1) modifying standard sketching techniques to obtain what we call a surrogate sketch; and (2) carefully scaling the global regularization parameter for local computations. Our surrogate sketches are based on determinantal point processes, a family of distributions for which the bias of an estimate of the inverse Hessian can be computed exactly. Based on this computation, we show that when the objective being minimized is $l_2$-regularized with parameter $\lambda$ and individual machines are each given a sketch of size $m$, then to eliminate the bias, local estimates should be computed using a shrunk regularization parameter given by $\lambda^{\prime}=\lambda\cdot(1-\frac{d_{\lambda}}{m})$, where $d_{\lambda}$ is the $\lambda$-effective dimension of the Hessian (or, for quadratic problems, the data matrix).
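The debiasing rule itself is simple to compute; below is a short NumPy sketch (with made-up problem sizes) of the $\lambda$-effective dimension and the shrunk local regularizer $\lambda^{\prime}$ for a quadratic problem with data matrix $A$.

    import numpy as np

    # Sketch of the debiasing rule from the abstract: compute the lambda-effective
    # dimension d_lambda of the data matrix and shrink the local regularizer to
    # lambda' = lambda * (1 - d_lambda / m) before each machine solves its sketch.
    rng = np.random.default_rng(0)
    n, d, m, lam = 2000, 50, 200, 1.0        # made-up sizes: n samples, sketch size m

    A = rng.standard_normal((n, d))
    sigma = np.linalg.svd(A, compute_uv=False)
    d_lam = np.sum(sigma**2 / (sigma**2 + lam))   # lambda-effective dimension
    lam_local = lam * (1 - d_lam / m)             # shrunk local regularization
    print(d_lam, lam_local)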
Convex Relaxations of Convolutional Neural Nets
Bartan, Burak, Pilanci, Mert
We propose convex relaxations for convolutional neural nets with one hidden layer where the output weights are fixed. For convex activation functions such as rectified linear units, the relaxations are convex second order cone programs which can be solved very efficiently. We prove that the relaxation recovers the global minimum under a planted model assumption, given sufficiently many training samples from a Gaussian distribution. We also identify a phase transition phenomenon in recovering the global minimum for the relaxation.