
Collaborating Authors: Vardi, Gal


A Theory of Learning with Autoregressive Chain of Thought

arXiv.org Machine Learning

For a given base class of sequence-to-next-token generators, we consider learning prompt-to-answer mappings obtained by iterating a fixed, time-invariant generator for multiple steps, thus generating a chain-of-thought, and then taking the final token as the answer. We formalize the learning problems both when the chain-of-thought is observed and when training only on prompt-answer pairs, with the chain-of-thought latent. We analyze the sample and computational complexity both in terms of general properties of the base class (e.g. its VC dimension) and for specific base classes such as linear thresholds. We present a simple base class that allows for universal representability and computationally tractable chain-of-thought learning. Central to our development is that time invariance allows for sample complexity that is independent of the length of the chain-of-thought. Attention arises naturally in our construction.
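As a purely illustrative picture of this setup (not the paper's construction), the sketch below iterates a hypothetical time-invariant next-token generator for a fixed number of steps and returns the final token as the answer; the linear-threshold generator is an arbitrary stand-in for a member of the base class.

```python
import numpy as np

def chain_of_thought(generator, prompt, num_steps):
    """Iterate a fixed, time-invariant sequence-to-next-token generator,
    appending each new token (the chain of thought); the answer is the last token."""
    seq = list(prompt)
    for _ in range(num_steps):
        seq.append(generator(seq))   # the same generator is applied at every step
    return seq[-1], seq

# Hypothetical base-class member: a linear threshold over the last two tokens.
def threshold_generator(seq):
    w = np.array([1.0, -1.0])
    return int(w @ np.array(seq[-2:], dtype=float) > 0)

answer, chain = chain_of_thought(threshold_generator, prompt=[1, 0, 1], num_steps=5)
print("answer:", answer, "chain:", chain)
```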


Trained Transformer Classifiers Generalize and Exhibit Benign Overfitting In-Context

arXiv.org Machine Learning

Transformers have the capacity to act as supervised learning algorithms: by properly encoding a set of labeled training ("in-context") examples and an unlabeled test example into an input sequence of vectors of the same dimension, the forward pass of the transformer can produce predictions for that unlabeled test example. A line of recent work has shown that when linear transformers are pre-trained on random instances for linear regression tasks, these trained transformers make predictions using an algorithm similar to that of ordinary least squares. In this work, we investigate the behavior of linear transformers trained on random linear classification tasks. Via an analysis of the implicit regularization of gradient descent, we characterize how many pre-training tasks and in-context examples are needed for the trained transformer to generalize well at test-time. We further show that in some settings, these trained transformers can exhibit "benign overfitting in-context": when in-context examples are corrupted by label flipping noise, the transformer memorizes all of its in-context examples (including those with noisy labels) yet still generalizes near-optimally for clean test examples.
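The toy sketch below (an assumption-laden stand-in, not the paper's trained model) skips the token encoding and goes straight to the kind of function a pre-trained linear transformer's forward pass can compute on an in-context classification task: the test point is scored against a label-weighted aggregate of the context examples, mimicking a one-step least-squares-like estimator; the identity matrix stands in for the pre-trained weights.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 20

# In-context examples (x_i, y_i) drawn from a random linear classification task.
w_star = rng.normal(size=d); w_star /= np.linalg.norm(w_star)
X = rng.normal(size=(n, d))
y = np.sign(X @ w_star)
x_test = rng.normal(size=d)

# Illustrative linear-attention predictor: with a suitably pre-trained weight
# matrix W (here simply the identity as a stand-in), the forward pass reduces to
# scoring the test point against the label-weighted average of context examples.
W = np.eye(d)
pred = np.sign(x_test @ W @ (X.T @ y) / n)
print("prediction:", pred, "true label:", np.sign(x_test @ w_star))
```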


Flavors of Margin: Implicit Bias of Steepest Descent in Homogeneous Neural Networks

arXiv.org Machine Learning

We study the implicit bias of the general family of steepest descent algorithms, which includes gradient descent, sign descent and coordinate descent, in deep homogeneous neural networks. We prove that an algorithm-dependent geometric margin starts increasing once the networks reach perfect training accuracy and characterize the late-stage bias of the algorithms. In particular, we define a generalized notion of stationarity for optimization problems and show that the algorithms progressively reduce a (generalized) Bregman divergence, which quantifies proximity to such stationary points of a margin-maximization problem. We then experimentally zoom into the trajectories of neural networks optimized with various steepest descent algorithms, highlighting connections to the implicit bias of Adam.
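To make the family concrete, here is a minimal sketch of steepest-descent steps with respect to different norms (l2 gives gradient descent, l-infinity gives sign descent, l1 gives coordinate descent) on a small separable logistic-loss problem, so one can watch the algorithm-dependent normalized margin after the data are fit; the toy data and step sizes are illustrative, not the paper's setting.

```python
import numpy as np

def steepest_descent_step(w, grad, lr, norm="l2"):
    """One steepest-descent step w.r.t. a chosen norm.
    l2 -> gradient descent, linf -> sign descent, l1 -> coordinate descent."""
    if norm == "l2":
        return w - lr * grad
    if norm == "linf":
        return w - lr * np.sign(grad)
    if norm == "l1":
        step = np.zeros_like(w)
        i = np.argmax(np.abs(grad))          # update only the steepest coordinate
        step[i] = np.sign(grad[i])
        return w - lr * step
    raise ValueError(norm)

# Toy separable logistic-loss problem; watch the margin grow after interpolation.
X = np.array([[2.0, 1.0], [1.5, -0.5], [-1.0, -2.0]])
y = np.array([1.0, 1.0, -1.0])
w = np.zeros(2)
for _ in range(2000):
    margins = y * (X @ w)
    grad = -(X * (y / (1 + np.exp(margins)))[:, None]).sum(axis=0)
    w = steepest_descent_step(w, grad, lr=0.01, norm="linf")
print("normalized l-inf margin:", (y * (X @ w)).min() / np.linalg.norm(w, ord=np.inf))
```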


Provable Tempered Overfitting of Minimal Nets and Typical Nets

arXiv.org Machine Learning

We study the overfitting behavior of fully connected deep neural networks (NNs) with binary weights, fitted to perfectly classify a noisy training set. We consider interpolation using both the smallest NN (having the minimal number of weights) and a random interpolating NN. For both learning rules, we prove that overfitting is tempered. Our analysis rests on a new bound on the size of a threshold circuit consistent with a partial function. To the best of our knowledge, ours are the first theoretical results on benign or tempered overfitting that: (1) apply to deep NNs, and (2) do not require a very high or very low input dimension.
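The benign/tempered/catastrophic trichotomy itself can be illustrated with any interpolating rule. The sketch below uses a 1-nearest-neighbor predictor, a deliberate stand-in and not the paper's binary-weight threshold networks, fitted to noise-corrupted labels; its test error settles strictly between the noise level and chance, which is what "tempered" means here.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n, p = 3, 500, 0.2            # input dim, training-set size, label-flip rate

def noisy_sample(n):
    X = rng.normal(size=(n, d))
    y = np.sign(X[:, 0])                     # clean labels
    y[rng.random(n) < p] *= -1               # flip a p-fraction of them
    return X, y

X_train, y_train = noisy_sample(n)

# 1-nearest-neighbor interpolates every (noisy) training label.
def predict(X):
    dists = ((X[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    return y_train[np.argmin(dists, axis=1)]

X_test, y_test = noisy_sample(2000)
err = np.mean(predict(X_test) != y_test)
print(f"test error {err:.3f}: above the noise level {p}, well below chance 0.5 (tempered)")
```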


Benign Overfitting in Single-Head Attention

arXiv.org Machine Learning

The phenomenon of benign overfitting, where a trained neural network perfectly fits noisy training data but still achieves near-optimal test performance, has been extensively studied in recent years for linear models and fully-connected/convolutional networks. In this work, we study benign overfitting in a single-head softmax attention model, which is the fundamental building block of Transformers. We prove that under appropriate conditions, the model exhibits benign overfitting in a classification setting already after two steps of gradient descent. Moreover, we show conditions where a minimum-norm/maximum-margin interpolator exhibits benign overfitting. We study how the overfitting behavior depends on the signal-to-noise ratio (SNR) of the data distribution, namely, the ratio between norms of signal and noise tokens, and prove that a sufficiently large SNR is both necessary and sufficient for benign overfitting.
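Below is a simplified sketch of a single-head softmax attention classifier on inputs made of one signal token and one noise token, with the SNR controlled by the norm of the signal tokens. The data model, query vector, and head are illustrative assumptions chosen so the example runs end-to-end; they are not the paper's trained parameters.

```python
import numpy as np

rng = np.random.default_rng(3)
d = 20
mu = np.zeros(d); mu[0] = 5.0     # class +1 signal token; its norm sets the SNR
nu = np.zeros(d); nu[1] = 5.0     # class -1 signal token

def sample(n):
    """Each input is two tokens: a class-dependent signal token and a noise token."""
    y = rng.choice([-1.0, 1.0], size=n)
    signal = np.where(y[:, None] > 0, mu, nu)
    noise = rng.normal(size=(n, d))           # unit-scale noise token
    return np.stack([signal, noise], axis=1), y   # shape (n, 2, d)

def attention_classifier(X, p, v):
    """Single-head softmax attention: tokens are mixed by softmax(X p),
    then the pooled token is scored by the linear head v."""
    a = np.exp(X @ p)
    a /= a.sum(axis=1, keepdims=True)
    pooled = (a[:, :, None] * X).sum(axis=1)
    return np.sign(pooled @ v)

# Illustrative "trained" parameters: the query attends to the signal tokens of
# both classes, and the head separates the two signal directions.
p = (mu + nu) / np.linalg.norm(mu + nu)
v = mu - nu
X_te, y_te = sample(2000)
print("test accuracy:", np.mean(attention_classifier(X_te, p, v) == y_te))
```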


Provable Privacy Attacks on Trained Shallow Neural Networks

arXiv.org Artificial Intelligence

We study what provable privacy attacks can be shown on trained, 2-layer ReLU neural networks. We explore two types of attacks: data reconstruction attacks and membership inference attacks. We prove that theoretical results on the implicit bias of 2-layer neural networks can be used to provably reconstruct a set of points of which at least a constant fraction are training points in a univariate setting, and can also be used to identify, with high probability, whether a given point was used in the training set in a high-dimensional setting. To the best of our knowledge, our work is the first to show provable vulnerabilities in this setting.
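For orientation, here is a generic loss-thresholding membership-inference probe on a small two-layer ReLU network. It is only meant to illustrate the attack type; the paper's provable attacks instead exploit the implicit bias of training toward margin-maximizing (KKT) points. All hyperparameters below are arbitrary, and only the first layer is trained for simplicity.

```python
import numpy as np

rng = np.random.default_rng(4)
d, m, n = 10, 50, 30

# Small training set with random labels, fit by full-batch GD on the logistic loss.
X = rng.normal(size=(n, d)); y = rng.choice([-1.0, 1.0], size=n)
W = rng.normal(size=(m, d)) * 0.5
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)

def f(X):  # two-layer ReLU network output
    return np.maximum(X @ W.T, 0) @ a

lr = 0.1
for _ in range(3000):
    g = -y / (1 + np.exp(y * f(X)))                 # d(logistic loss)/d(output)
    act = (X @ W.T > 0).astype(float)
    W -= lr / n * ((g[:, None] * act * a[None, :]).T @ X)

# Membership inference by loss thresholding: training points tend to end up with
# smaller loss than fresh points from the same distribution.
X_out = rng.normal(size=(n, d)); y_out = rng.choice([-1.0, 1.0], size=n)
loss_in = np.log1p(np.exp(-y * f(X)))
loss_out = np.log1p(np.exp(-y_out * f(X_out)))
tau = np.median(np.concatenate([loss_in, loss_out]))
tpr, fpr = np.mean(loss_in < tau), np.mean(loss_out < tau)
print(f"membership TPR {tpr:.2f} vs FPR {fpr:.2f} at the median-loss threshold")
```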


Adversarial Examples Exist in Two-Layer ReLU Networks for Low Dimensional Linear Subspaces

arXiv.org Machine Learning

Despite a great deal of research, it is still not well-understood why trained neural networks are highly vulnerable to adversarial examples. In this work we focus on two-layer neural networks trained using data which lie on a low dimensional linear subspace. We show that standard gradient methods lead to non-robust neural networks, namely, networks which have large gradients in directions orthogonal to the data subspace, and are susceptible to small adversarial $L_2$-perturbations in these directions. Moreover, we show that decreasing the initialization scale of the training algorithm, or adding $L_2$ regularization, can make the trained network more robust to adversarial perturbations orthogonal to the data.
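The sketch below reproduces the flavor of the phenomenon in a toy setting: a two-layer ReLU network is trained on data confined to a low-dimensional subspace, the input gradient at a test point is split into its in-subspace and orthogonal parts, and the output is probed with a perturbation orthogonal to the data subspace. The architecture, initialization scale, and perturbation size are arbitrary assumptions, not the paper's regime.

```python
import numpy as np

rng = np.random.default_rng(5)
D, k, m, n = 30, 2, 100, 200    # ambient dim, data-subspace dim, width, samples

# Data lies on a k-dimensional linear subspace of R^D.
B = np.linalg.qr(rng.normal(size=(D, k)))[0]       # orthonormal basis of the subspace
Z = rng.normal(size=(n, k))
X = Z @ B.T
y = np.sign(Z[:, 0])

# Two-layer ReLU network; train the first layer with full-batch GD on logistic loss.
W = rng.normal(size=(m, D)) * 0.1
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)
f = lambda X: np.maximum(X @ W.T, 0) @ a
lr = 0.5
for _ in range(2000):
    g = -y / (1 + np.exp(y * f(X)))
    act = (X @ W.T > 0).astype(float)
    W -= lr / n * ((g[:, None] * act * a[None, :]).T @ X)

# Input gradient at a training point, split into in-subspace and orthogonal parts.
x = X[0]
grad = ((x @ W.T > 0).astype(float) * a) @ W        # d f / d x
P = B @ B.T                                          # projector onto the data subspace
g_perp = (np.eye(D) - P) @ grad
print("orthogonal / total input-gradient norm:", np.linalg.norm(g_perp) / np.linalg.norm(grad))

# Probe robustness: perturb orthogonally to the data subspace and watch the output.
eps = 1.0
x_adv = x - eps * np.sign(f(x[None])[0]) * g_perp / (np.linalg.norm(g_perp) + 1e-12)
print("output before / after orthogonal perturbation:", f(x[None])[0], f(x_adv[None])[0])
```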


Deconstructing Data Reconstruction: Multiclass, Weight Decay and General Losses

arXiv.org Artificial Intelligence

Memorization of training data is an active research area, yet our understanding of the inner workings of neural networks is still in its infancy. Recently, Haim et al. [2022] proposed a scheme to reconstruct training samples from multilayer perceptron binary classifiers, effectively demonstrating that a large portion of training samples are encoded in the parameters of such networks. In this work, we extend their findings in several directions, including reconstruction from multiclass and convolutional neural networks. We derive a more general reconstruction scheme which is applicable to a wider range of loss functions such as regression losses. Moreover, we study the various factors that contribute to networks' susceptibility to such reconstruction schemes. Intriguingly, we observe that using weight decay during training increases reconstructability both in terms of quantity and quality. Additionally, we examine the influence of the number of neurons relative to the number of training samples on the reconstructability.
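For a rough sense of what a reconstruction scheme of this kind optimizes, the sketch below writes down a KKT-stationarity residual for a two-layer ReLU network: if training converged toward a margin-maximization KKT point, the first-layer weights should be expressible as a nonnegative combination of per-sample gradients, and reconstruction searches for candidate points that make this residual small. This is a simplified stand-in, not the paper's exact objective (which also covers multiclass outputs, convolutional networks, and general losses).

```python
import numpy as np

def kkt_residual(W, a, X_cand, y_cand, lam):
    """Stationarity residual || W - sum_i lam_i * y_i * d f(x_i)/dW ||_F for a
    two-layer ReLU net f(x) = a^T relu(W x); a small residual means the candidate
    points (X_cand, y_cand, lam) explain the trained first-layer weights."""
    act = (X_cand @ W.T > 0).astype(float)                       # ReLU activation pattern
    grads = (act * a[None, :])[:, :, None] * X_cand[:, None, :]  # d f / dW per sample
    recon = (lam * y_cand)[:, None, None] * grads
    return np.linalg.norm(W - recon.sum(axis=0))

# Toy usage with random placeholders; in an actual attack, X_cand and lam would be
# optimized (e.g. by gradient descent) to drive this residual toward zero.
rng = np.random.default_rng(6)
d, m, n = 5, 8, 6
W = rng.normal(size=(m, d)); a = rng.choice([-1.0, 1.0], size=m)
X_cand = rng.normal(size=(n, d)); y_cand = rng.choice([-1.0, 1.0], size=n)
lam = rng.random(n)
print("KKT residual of the random candidates:", kkt_residual(W, a, X_cand, y_cand, lam))
```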


The Double-Edged Sword of Implicit Bias: Generalization vs. Robustness in ReLU Networks

arXiv.org Machine Learning

In this work, we study the implications of the implicit bias of gradient flow on generalization and adversarial robustness in ReLU networks. We focus on a setting where the data consists of clusters and the correlations between cluster means are small, and show that in two-layer ReLU networks gradient flow is biased towards solutions that generalize well, but are highly vulnerable to adversarial examples. Our results hold even in cases where the network has many more parameters than training examples. Despite the potential for harmful overfitting in such overparameterized settings, we prove that the implicit bias of gradient flow prevents it. However, the implicit bias also leads to non-robust solutions (susceptible to small adversarial $\ell_2$-perturbations), even though robust networks that fit the data exist.
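To make the "robust networks that fit the data exist" part concrete, here is a toy version of the clustered data model (nearly orthogonal cluster means, one label per cluster) together with a nearest-cluster-mean rule standing in for such a robust predictor; its certified $\ell_2$ radius can be computed exactly. This only illustrates the data model and the existence claim, not the gradient-flow solution the paper analyzes.

```python
import numpy as np

rng = np.random.default_rng(7)
d, k, n, eps = 200, 4, 100, 1.0       # dimension, clusters, samples, L2 attack budget

# Clusters with orthogonal means; each cluster carries a fixed label.
means = 5.0 * np.eye(d)[:k]
labels = np.array([1.0, -1.0, 1.0, -1.0])

def sample(n):
    c = rng.integers(k, size=n)
    return means[c] + 0.5 * rng.normal(size=(n, d)), labels[c]

X, y = sample(n)

# A robust predictor that fits the data: assign the label of the nearest cluster mean.
def nearest_mean_predict(X):
    d2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(-1)
    return labels[np.argmin(d2, axis=1)]

def boundary_distance(x, yx):
    """Distance from x to the nearest bisecting hyperplane between its closest
    same-label mean and any opposite-label mean (a certified L2 radius)."""
    d2 = ((x - means) ** 2).sum(-1)
    a = means[labels == yx][np.argmin(d2[labels == yx])]
    return min(abs(((x - b) @ (x - b) - (x - a) @ (x - a)) / (2 * np.linalg.norm(a - b)))
               for b in means[labels != yx])

fit_acc = np.mean(nearest_mean_predict(X) == y)
robust_acc = np.mean([nearest_mean_predict(x[None])[0] == yx and boundary_distance(x, yx) > eps
                      for x, yx in zip(X, y)])
print(f"nearest-mean rule: fits the sample with acc {fit_acc:.2f}; "
      f"certified robust acc at eps={eps}: {robust_acc:.2f}")
```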


Most Neural Networks Are Almost Learnable

arXiv.org Machine Learning

One of the greatest mysteries surrounding deep learning is the discrepancy between its phenomenal capabilities in practice and the fact that, despite a great deal of research, polynomial-time algorithms for learning deep models are known only for very restrictive cases. Indeed, state-of-the-art results are only capable of dealing with two-layer networks under assumptions on the input distribution and the network's weights. Furthermore, theoretical study shows that even with very naive architectures, learning neural networks is worst-case computationally intractable. In this paper, we contrast this theoretical state of affairs and show that, perhaps surprisingly, even though constant-depth networks are completely out of reach from a worst-case perspective, most of them are not as hard as one would imagine. That is, they are distribution-free learnable in polynomial time up to any desired constant accuracy. This is the first polynomial-time approximation scheme (PTAS) for learning neural networks of depth greater than 2 (see the related work section for more details). Moreover, we show that the standard SGD algorithm on a ReLU network can be used as a PTAS for learning random networks. The question of whether learning random networks can be done efficiently was posed by Daniely et al. [15], and our work provides a positive result in that respect.
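Since the positive result is algorithmic in spirit (SGD on a ReLU network used to learn random targets), here is a crude teacher-student sketch: a random two-layer ReLU teacher generates labels and an overparameterized student is fit on the square loss. Depths, widths, and step sizes are arbitrary stand-ins, and full-batch gradient descent replaces SGD for brevity.

```python
import numpy as np

rng = np.random.default_rng(8)
d, m_teacher, m_student, n = 10, 5, 100, 1000

# Random teacher network ("most networks" refers to networks with random weights).
Wt = rng.normal(size=(m_teacher, d)) / np.sqrt(d)
at = rng.normal(size=m_teacher) / np.sqrt(m_teacher)
teacher = lambda X: np.maximum(X @ Wt.T, 0) @ at

X = rng.normal(size=(n, d))
y = teacher(X)

# Overparameterized student fit by full-batch gradient descent on the square loss.
Ws = rng.normal(size=(m_student, d)) / np.sqrt(d)
a = rng.normal(size=m_student) / np.sqrt(m_student)
lr = 0.05
for _ in range(3000):
    H = np.maximum(X @ Ws.T, 0)              # hidden activations
    r = H @ a - y                            # residuals
    grad_a = (H.T @ r) / n
    grad_W = ((r[:, None] * (H > 0) * a[None, :]).T @ X) / n
    a -= lr * grad_a
    Ws -= lr * grad_W

X_test = rng.normal(size=(5000, d))
y_test = teacher(X_test)
mse = np.mean((np.maximum(X_test @ Ws.T, 0) @ a - y_test) ** 2)
print(f"student test MSE {mse:.4f} vs. teacher output variance {y_test.var():.4f}")
```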