Yehudai, Gilad
Compositional Reasoning with Transformers, RNNs, and Chain of Thought
Yehudai, Gilad, Amsel, Noah, Bruna, Joan
We study and compare the expressive power of transformers, RNNs, and transformers with chain of thought tokens on a simple and natural class of problems we term Compositional Reasoning Questions (CRQs). This family captures problems like evaluating Boolean formulas and multi-step word problems. Under standard hardness assumptions from circuit complexity and communication complexity, we prove that none of these three architectures is capable of solving CRQs unless some hyperparameter (depth, embedding dimension, and number of chain of thought tokens, respectively) grows with the size of the input. We also provide a construction for each architecture that solves CRQs. For transformers, our construction uses depth that is logarithmic in the problem size. For RNNs, logarithmic embedding dimension is necessary and sufficient, so long as the inputs are provided in a certain order (otherwise, a linear dimension is necessary). For transformers with chain of thought, our construction uses $n$ CoT tokens. These results show that, while CRQs are inherently hard, there are several different ways for language models to overcome this hardness. Even for a single class of problems, each architecture has strengths and weaknesses, and none is strictly better than the others.
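To make the problem family concrete, here is a minimal sketch of one hypothetical CRQ-style instance: evaluating a nested Boolean formula, where each sub-answer feeds into the next level of the computation. The tuple encoding and helper name below are illustrative assumptions of ours, not the paper's formal CRQ definition.

```python
# Minimal sketch of a CRQ-style instance (illustrative encoding, not the paper's
# formal definition): evaluate a nested Boolean formula by composing sub-results.

def eval_formula(node):
    """Recursively evaluate a Boolean formula given as nested tuples."""
    if isinstance(node, bool):            # leaf: a Boolean constant
        return node
    op, *args = node
    if op == "NOT":
        return not eval_formula(args[0])
    if op == "AND":
        return all(eval_formula(a) for a in args)
    if op == "OR":
        return any(eval_formula(a) for a in args)
    raise ValueError(f"unknown operator: {op}")

# (True AND (NOT False)) OR False  ->  True
print(eval_formula(("OR", ("AND", True, ("NOT", False)), False)))
```

Intuitively, answering the outer question requires first resolving the nested sub-questions; this compositional structure is what the term CRQ refers to.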
Depth-Width Tradeoffs in Algorithmic Reasoning of Graph Tasks with Transformers
Yehudai, Gilad, Sanford, Clayton, Bechler-Speicher, Maya, Fischer, Orr, Gilad-Bachrach, Ran, Globerson, Amir
Transformers have revolutionized the field of machine learning. In particular, they can be used to solve complex algorithmic problems, including graph-based tasks. For such algorithmic tasks, a key question is the minimal size of transformer needed to implement them. Recent work has begun to explore this question for graph-based tasks, showing that for sub-linear embedding dimension (i.e., model width), logarithmic depth suffices. An open question, which we address here, is what happens if the width is allowed to grow linearly. We analyze this setting and provide the surprising result that with linear width, constant depth suffices for solving a host of graph-based problems. This suggests that a moderate increase in width can allow much shallower models, which are advantageous in terms of inference time. For other problems, we show that quadratic width is required. Our results demonstrate the complex and intriguing landscape of transformer implementations of graph-based algorithms. We support our theoretical results with empirical evaluations.
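As a rough illustration of the tradeoff (our own back-of-the-envelope arithmetic, assuming a transformer layer of width $d$ has on the order of $d^2$ parameters and that inference latency scales with depth; this calculation is not taken from the paper):

```latex
% Back-of-the-envelope only; assumes \Theta(d^2) parameters per layer of width d
% and latency proportional to the depth L. Not a calculation from the paper.
\begin{align*}
\text{sub-linear width:} \quad & d = n^{1-\varepsilon},\ L = \Theta(\log n)
  \;\Rightarrow\; \Theta\!\bigl(n^{2-2\varepsilon}\log n\bigr)\ \text{parameters},\ \Theta(\log n)\ \text{sequential layers},\\
\text{linear width:} \quad & d = \Theta(n),\ L = \Theta(1)
  \;\Rightarrow\; \Theta\!\bigl(n^{2}\bigr)\ \text{parameters},\ \Theta(1)\ \text{sequential layers}.
\end{align*}
```

Under these assumptions, linear width spends more parameters per layer but far fewer sequential layers, which is the sense in which shallower models help inference time.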
Logarithmic Width Suffices for Robust Memorization
Egosi, Amitsour, Yehudai, Gilad, Shamir, Ohad
The ability of neural networks to memorize labeled datasets is a central question in the study of their expressive power. Given some input domain $X$, output domain $Y$, and dataset size $N$, we say that a network memorizes datasets of size $N$ if for every labeled dataset $D \subseteq X \times Y$ with $|D| = N$, we can find parameters such that the resulting network $f: X \to Y$ perfectly fits the dataset (that is, $f(x) = y$ for every labeled pair $(x, y) \in D$). The main question here - which has been studied in many recent works (see Section 2 for details) - is to characterize the size/architecture of the networks that have enough expressive power to memorize any dataset of a given size $N$. However, merely fitting a given dataset is not enough for most tasks, and a desirable property of trained networks is that they remain robust to noise and minor modifications of the data. This robustness property allows neural networks to generalize from observed data points to unseen ones. Furthermore, neural networks have been shown to be vulnerable to adversarial attacks [Szegedy et al., 2013, Carlini and Wagner, 2017, Papernot et al., 2017, Athalye et al., 2018] in the form of slightly perturbed examples, where (in the context of visual data) the perturbation is often imperceptible to the human eye. Moreover, existing constructions of memorizing networks are often quite delicate, and not at all robust to such perturbations. This motivates the question of characterizing the networks that have enough capacity to robustly memorize a dataset.
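For concreteness, one natural way to formalize the robust variant of this memorization property is sketched below; this is our hedged paraphrase, and the paper's exact definition (in particular how the radius $r$ relates to the separation of the data and which norm is used) may differ.

```latex
% Plain memorization, as defined above, for D \subseteq X \times Y with |D| = N:
\exists\, \theta \ \ \forall (x, y) \in D: \quad f_{\theta}(x) = y.

% A natural robust variant (our illustrative formalization, not necessarily the paper's):
% the label must be preserved on a whole ball of radius r > 0 around each training point.
\exists\, \theta \ \ \forall (x, y) \in D \ \ \forall x' \ \text{with} \ \|x' - x\|_2 \le r: \quad f_{\theta}(x') = y.
```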
On the Reconstruction of Training Data from Group Invariant Networks
Elbaz, Ran, Yehudai, Gilad, Galun, Meirav, Maron, Haggai
Reconstructing training data from trained neural networks is an active area of research with significant implications for privacy and explainability. Recent advances have demonstrated the feasibility of this process for several data types. However, reconstructing data from group-invariant neural networks poses distinct challenges that remain largely unexplored. This paper addresses this gap by first formulating the problem and discussing some of its basic properties. We then provide an experimental evaluation demonstrating that conventional reconstruction techniques are inadequate in this scenario. Specifically, we observe that the resulting data reconstructions gravitate toward symmetric inputs on which the group acts trivially, leading to poor-quality results. Finally, we propose two novel methods aiming to improve reconstruction in this setup and present promising preliminary experimental results. Our work sheds light on the complexities of reconstructing data from group invariant neural networks and offers potential avenues for future research in this domain.
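To see why reconstructions drift toward symmetric inputs, it helps to recall that a group-invariant network cannot distinguish inputs in the same group orbit. The toy sketch below (a small permutation-invariant, DeepSets-style network with arbitrary weights; an illustration of ours, not a model from the paper) makes this concrete: any reconstruction loss defined through such a network is constant on orbits, which is one intuition for why reconstructions gravitate toward points on which the group acts trivially.

```python
# Toy illustration (not the paper's setup): a permutation-invariant network maps
# every reordering of its input set to the same output, so any loss defined
# through the network is constant on group orbits.
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 16))   # shared per-element weights
W2 = rng.standard_normal((16, 1))   # readout weights

def invariant_net(x):
    """x: (n_elements, 4). Sum-pooling over elements gives permutation invariance."""
    h = np.maximum(x @ W1, 0.0)     # per-element ReLU features
    return (h.sum(axis=0) @ W2).item()

x = rng.standard_normal((5, 4))
x_perm = x[rng.permutation(5)]      # same orbit under the permutation group

print(invariant_net(x), invariant_net(x_perm))   # equal up to floating-point rounding
```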
MALT Powers Up Adversarial Attacks
Melamed, Odelia, Yehudai, Gilad, Shamir, Adi
Neural networks are widely known to be susceptible to adversarial perturbations (Szegedy et al. [2013]), which are typically imperceptible to humans. Many papers have shown how to construct such attacks, where adding a small perturbation to the input significantly changes the output of the model (Carlini and Wagner [2017], Papernot et al. [2017], Athalye et al. [2018]). To protect against these attacks, researchers have tried to develop more robust models using several different techniques, such as adversarial training with different attacks (Madry et al. [2017], Papernot et al. [2016], Liu et al. [2023], Wang et al. [2023]). The current state-of-the-art adversarial attack, known as AutoAttack (Croce and Hein [2020b]), combines several parameter-free attacks, some targeted and some untargeted. AutoAttack currently leads the RobustBench benchmark (Croce et al. [2020]), the standard benchmark for adversarial robustness. Notably, the targeted attacks used in AutoAttack pick the adversarial target classes according to the model's confidence levels and attack only the top nine classes, even though CIFAR-100 and ImageNet have many more possible target classes. The reason for attacking only a limited number of classes, rather than all possible ones, is computational: each such attack has a significant running time. Why adversarial examples exist in the first place remains a hot topic of debate, in particular whether it is due to the highly non-linear landscape of neural networks or rather to their local linearity properties.
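For reference, the confidence-based target selection described above (as used by AutoAttack's targeted components) can be sketched in a few lines; the model interface, shapes, and the cutoff k = 9 are illustrative assumptions, and this is not the MALT target ordering introduced in the paper.

```python
# Sketch of confidence-based target selection for targeted attacks
# (illustrative; not the MALT ordering proposed in the paper).
import torch

def select_target_classes(model, x, k=9):
    """Return, per input, the k most confident classes other than the predicted one."""
    with torch.no_grad():
        logits = model(x)                       # (batch, num_classes)
    order = logits.argsort(dim=1, descending=True)
    # order[:, 0] is the predicted class; the next k entries are the attack targets.
    return order[:, 1:k + 1]

# Usage: for each column t of select_target_classes(model, x), run a targeted
# attack pushing x toward class t. The cost scales linearly with k, which is why
# only a handful of classes are attacked even on CIFAR-100 / ImageNet.
```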
RedEx: Beyond Fixed Representation Methods via Convex Optimization
Daniely, Amit, Schain, Mariano, Yehudai, Gilad
Optimizing neural networks is a difficult task that is still not well understood. On the other hand, fixed representation methods such as kernels and random features have provable optimization guarantees but inferior performance due to their inherent inability to learn representations. In this paper, we aim to bridge this gap by presenting a novel architecture called RedEx (Reduced Expander Extractor) that is as expressive as neural networks and can also be trained in a layer-wise fashion via a convex program with semi-definite constraints and optimization guarantees. We also show that RedEx provably surpasses fixed representation methods, in the sense that it can efficiently learn a family of target functions which fixed representation methods cannot.
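For context on the baseline being surpassed, here is a minimal random-features sketch, the prototypical fixed representation method: the representation is sampled once and never trained, and only a convex problem on top of it is solved. This is the standard baseline, not RedEx, and the toy target and constants are our own assumptions.

```python
# Standard random-features baseline (for context; this is NOT RedEx):
# the feature map is fixed at random, and only a convex ridge-regression
# problem on top of it is solved.
import numpy as np

rng = np.random.default_rng(0)
n, d, m, lam = 200, 10, 512, 1e-2

X = rng.standard_normal((n, d))
y = np.sign(X[:, 0] * X[:, 1])                 # a simple nonlinear toy target

W = rng.standard_normal((d, m)) / np.sqrt(d)   # fixed (untrained) representation
Phi = np.maximum(X @ W, 0.0)                   # random ReLU features

# Convex step: ridge regression on the fixed features.
a = np.linalg.solve(Phi.T @ Phi + lam * np.eye(m), Phi.T @ y)
print("train accuracy:", np.mean(np.sign(Phi @ a) == y))
```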
Locally Optimal Descent for Dynamic Stepsize Scheduling
Yehudai, Gilad, Cohen, Alon, Daniely, Amit, Drori, Yoel, Koren, Tomer, Schain, Mariano
Stochastic gradient-based optimization methods such as SGD and Adam (Kingma & Ba, 2014) are the main workhorse behind modern machine learning. Such methods sequentially apply stochastic gradient steps to update the trained model, and their performance crucially depends on the choice of a learning rate sequence, or schedule, used throughout this process to determine the magnitude of the sequential updates. Effectively tuning the learning rate schedule is widely considered a tedious task requiring extensive, sometimes prohibitive, hyper-parameter search, resulting in significant excess engineering time and compute usage in ML training. A prominent approach to addressing this issue gave rise to a plethora of adaptive optimization methods (most notably Duchi et al., 2011 and Kingma & Ba, 2014), in which the learning rate is automatically tuned during the optimization process based on previously received stochastic gradients. In some important applications these methods provide superior convergence performance, and their theoretical guarantees match the state of the art in the stochastic convex and (smooth) non-convex optimization settings (Li & Orabona, 2019; Ward et al., 2020; Attia & Koren, 2023). However, despite the adaptivity incorporated into these methods, auxiliary learning rate schedules are often still required to attain their best performance (e.g., Loshchilov & Hutter, 2016), and the nuisance of laborious and extensive manual tuning remains relevant for these methods as well.
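As a concrete reminder of what such adaptivity looks like, below is a minimal AdaGrad-style update in the spirit of Duchi et al. (2011), where the effective per-coordinate step size is determined by the accumulated squared gradients; the toy quadratic objective and all constants are illustrative assumptions.

```python
# Minimal AdaGrad-style update (in the spirit of Duchi et al., 2011): the effective
# per-coordinate step size shrinks with the accumulated squared gradients seen so far.
# The quadratic objective and constants below are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
w = np.zeros(5)
G = np.zeros(5)                      # running sum of squared gradients
eta, eps = 0.5, 1e-8
w_star = np.arange(5, dtype=float)   # minimizer of the toy objective

for t in range(500):
    g = (w - w_star) + 0.1 * rng.standard_normal(5)   # stochastic gradient of 0.5*||w - w_star||^2
    G += g ** 2
    w -= eta * g / (np.sqrt(G) + eps)                 # per-coordinate adaptive step

print(np.round(w, 2))   # close to w_star without hand-tuning a schedule
```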
Adversarial Examples Exist in Two-Layer ReLU Networks for Low Dimensional Linear Subspaces
Melamed, Odelia, Yehudai, Gilad, Vardi, Gal
Despite a great deal of research, it is still not well-understood why trained neural networks are highly vulnerable to adversarial examples. In this work we focus on two-layer neural networks trained using data which lie on a low dimensional linear subspace. We show that standard gradient methods lead to non-robust neural networks, namely, networks which have large gradients in directions orthogonal to the data subspace, and are susceptible to small adversarial $L_2$-perturbations in these directions. Moreover, we show that decreasing the initialization scale of the training algorithm, or adding $L_2$ regularization, can make the trained network more robust to adversarial perturbations orthogonal to the data.
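The phenomenon can be probed directly with a small experiment of the following form (an illustrative sketch: the dimensions, architecture, loss, and training budget are our assumptions, not the paper's exact setting): train a two-layer ReLU network on data confined to a low-dimensional subspace and compare the input-gradient norm inside versus orthogonal to that subspace.

```python
# Illustrative probe (not the paper's exact setting): train a 2-layer ReLU net on data
# lying in a low-dimensional subspace, then compare input-gradient norms on vs. off
# the subspace. Large off-subspace gradients give cheap adversarial directions.
import torch

torch.manual_seed(0)
d, k, n = 50, 5, 500                       # ambient dim, subspace dim, sample size
B, _ = torch.linalg.qr(torch.randn(d, k))  # orthonormal basis of the data subspace
Z = torch.randn(n, k)
X = Z @ B.T                                # inputs lie exactly in span(B)
y = torch.sign(Z[:, 0]).float()            # labels depend only on subspace coordinates

net = torch.nn.Sequential(torch.nn.Linear(d, 200), torch.nn.ReLU(), torch.nn.Linear(200, 1))
opt = torch.optim.SGD(net.parameters(), lr=0.05)
for _ in range(2000):
    opt.zero_grad()
    torch.nn.functional.soft_margin_loss(net(X).squeeze(1), y).backward()
    opt.step()

x = X[:1].clone().requires_grad_(True)
net(x).sum().backward()
g = x.grad.squeeze(0)
g_on = B @ (B.T @ g)                       # component inside the data subspace
print("on-subspace grad norm: ", g_on.norm().item())
print("off-subspace grad norm:", (g - g_on).norm().item())
```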
Deconstructing Data Reconstruction: Multiclass, Weight Decay and General Losses
Buzaglo, Gon, Haim, Niv, Yehudai, Gilad, Vardi, Gal, Oz, Yakir, Nikankin, Yaniv, Irani, Michal
Memorization of training data is an active research area, yet our understanding of the inner workings of neural networks is still in its infancy. Recently, Haim et al. [2022] proposed a scheme to reconstruct training samples from multilayer perceptron binary classifiers, effectively demonstrating that a large portion of training samples are encoded in the parameters of such networks. In this work, we extend their findings in several directions, including reconstruction from multiclass and convolutional neural networks. We derive a more general reconstruction scheme which is applicable to a wider range of loss functions such as regression losses. Moreover, we study the various factors that contribute to networks' susceptibility to such reconstruction schemes. Intriguingly, we observe that using weight decay during training increases reconstructability both in terms of quantity and quality. Additionally, we examine the influence of the number of neurons relative to the number of training samples on the reconstructability.
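For context, the starting point of Haim et al. [2022] can be summarized roughly as follows (our paraphrase; the exact objective, regularizers, and handling of biases differ): gradient flow on a homogeneous binary classifier converges in direction to a KKT point of the margin-maximization problem, so the trained parameters approximately satisfy a stationarity condition, and candidate training samples $x_i$ with coefficients $\lambda_i \ge 0$ are recovered by minimizing the violation of that condition:

```latex
% Rough paraphrase of the reconstruction objective built on in this work
% (Haim et al. [2022]); the exact formulation and regularizers differ.
\min_{\{x_i\},\ \{\lambda_i \ge 0\}} \;
  \Bigl\| \theta \;-\; \sum_{i} \lambda_i\, y_i\, \nabla_{\theta} f(\theta; x_i) \Bigr\|_2^2 .
```

The more general scheme described in the abstract extends this idea to multiclass outputs and to other losses, such as regression losses.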
From Tempered to Benign Overfitting in ReLU Neural Networks
Kornowski, Guy, Yehudai, Gilad, Shamir, Ohad
Overparameterized neural networks (NNs) are observed to generalize well even when trained to perfectly fit noisy data. This phenomenon has motivated a large body of work on "benign overfitting", where interpolating predictors achieve near-optimal performance. Recently, it was conjectured and empirically observed that the behavior of NNs is often better described as "tempered overfitting", where the performance is non-optimal yet also non-trivial, and degrades as a function of the noise level. However, a theoretical justification of this claim for non-linear NNs has been lacking so far. In this work, we provide several results that aim to bridge these complementary views. We study a simple classification setting with 2-layer ReLU NNs, and prove that under various assumptions, the type of overfitting transitions from tempered in the extreme case of one-dimensional data to benign in high dimensions. Thus, we show that the input dimension plays a crucial role in the type of overfitting in this setting, which we also validate empirically for intermediate dimensions. Overall, our results shed light on the intricate connections between the dimension, sample size, architecture and training algorithm on the one hand, and the type of resulting overfitting on the other.
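A toy version of the kind of experiment alluded to above is sketched below; the dimensions, sample size, noise level, and training budget are our own (much smaller) assumptions, so it only suggests, rather than reproduces, the transition studied in the paper.

```python
# Toy sketch (not the paper's setup): fit noisy labels with a 2-layer ReLU net in
# low vs. high input dimension and compare clean test error with the noise level.
# Qualitatively, tempered overfitting tracks the noise level, while benign
# overfitting approaches the error of the optimal predictor; this tiny run is only suggestive.
import torch

def run(d, n=50, noise=0.2, steps=5000):
    torch.manual_seed(0)
    X = torch.randn(n, d)
    y_clean = torch.sign(X[:, 0])
    flips = (torch.rand(n) < noise).float()
    y = y_clean * (1 - 2 * flips)                     # flip roughly a `noise` fraction of labels
    net = torch.nn.Sequential(torch.nn.Linear(d, 500), torch.nn.ReLU(), torch.nn.Linear(500, 1))
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for _ in range(steps):                            # train (close) to interpolation
        opt.zero_grad()
        torch.nn.functional.soft_margin_loss(net(X).squeeze(1), y).backward()
        opt.step()
    X_test = torch.randn(2000, d)
    pred = torch.sign(net(X_test).squeeze(1))
    return (pred != torch.sign(X_test[:, 0])).float().mean().item()

print("clean test error, d = 1:  ", run(1))
print("clean test error, d = 200:", run(200))
```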