AITopics | Evron, Itay

Evron, Itay

Better Rates for Random Task Orderings in Continual Linear Models

Evron, Itay, Levinstein, Ran, Schliserman, Matan, Sherman, Uri, Koren, Tomer, Soudry, Daniel, Srebro, Nathan

arXiv.org Machine Learning, Apr-6-2025

We study the common continual learning setup where an overparameterized model is sequentially fitted to a set of jointly realizable tasks. We analyze the forgetting, i.e., loss on previously seen tasks, after $k$ iterations. For linear models, we prove that fitting a task is equivalent to a single stochastic gradient descent (SGD) step on a modified objective. We develop novel last-iterate SGD upper bounds in the realizable least squares setup, and apply them to derive new results for continual learning. Focusing on random orderings over $T$ tasks, we establish universal forgetting rates, whereas existing rates depend on the problem dimensionality or complexity. Specifically, in continual regression with replacement, we improve the best existing rate from $O((d-r)/k)$ to $O(\min(k^{-1/4}, \sqrt{d-r}/k, \sqrt{Tr}/k))$, where $d$ is the dimensionality and $r$ the average task rank. Furthermore, we establish the first rates for random task orderings without replacement. The obtained rate of $O(\min(T^{-1/4}, (d-r)/T))$ proves for the first time that randomization alone, with no task repetition, can prevent catastrophic forgetting in sufficiently long task sequences. Finally, we prove a similar $O(k^{-1/4})$ universal rate for the forgetting in continual linear classification on separable data. Our universal rates apply for broader projection methods, such as block Kaczmarz and POCS, illuminating their loss convergence under i.i.d. and one-pass orderings.
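As a rough illustration of the setup in this abstract, the sketch below simulates continual linear regression under a random task ordering with replacement. It assumes (the abstract does not spell this out) that fitting a task to convergence from the previous iterate amounts to projecting onto that task's solution set via the pseudoinverse, and it simplifies forgetting to the average squared loss over all tasks rather than only the seen ones. The names `fit_task` and `forgetting` and all sizes are hypothetical.

```python
import numpy as np

def fit_task(w, X, y):
    """Assumed update: move w by the smallest amount so that X @ w == y."""
    return w + np.linalg.pinv(X) @ (y - X @ w)

def forgetting(w, tasks):
    """Average squared loss of w over the given tasks (a simplification)."""
    return float(np.mean([np.mean((X @ w - y) ** 2) for X, y in tasks]))

rng = np.random.default_rng(0)
d, T, n = 50, 10, 3                 # dimension, number of tasks, samples per task
w_star = rng.standard_normal(d)     # shared solution, so tasks are jointly realizable

tasks = []
for _ in range(T):
    X = rng.standard_normal((n, d))
    tasks.append((X, X @ w_star))

w = np.zeros(d)
history = []
for k in range(200):
    X, y = tasks[rng.integers(T)]   # random ordering with replacement
    w = fit_task(w, X, y)
    history.append(forgetting(w, tasks))
```

Plotting `history` against the iteration index gives an empirical curve that could be compared with the rates quoted above; the sketch itself makes no claim about matching them.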

artificial intelligence, machine learning

arXiv.org Machine Learning

2504.04579

Country:

Europe (0.28)
Asia (0.28)
North America > United States (0.27)

Genre: Research Report > New Finding (0.87)

Industry: Education (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)

Provable Tempered Overfitting of Minimal Nets and Typical Nets

Harel, Itamar, Hoza, William M., Vardi, Gal, Evron, Itay, Srebro, Nathan, Soudry, Daniel

arXiv.org Machine Learning, Oct-24-2024

We study the overfitting behavior of fully connected deep Neural Networks (NNs) with binary weights fitted to perfectly classify a noisy training set. We consider interpolation using both the smallest NN (having the minimal number of weights) and a random interpolating NN. For both learning rules, we prove overfitting is tempered. Our analysis rests on a new bound on the size of a threshold circuit consistent with a partial function. To the best of our knowledge, ours are the first theoretical results on benign or tempered overfitting that: (1) apply to deep NNs, and (2) do not require a very high or very low input dimension.

artificial intelligence, machine learning, threshold network

arXiv.org Machine Learning

2410.19092

Country:

North America > United States (0.28)
Europe (0.28)

Genre:

Research Report (0.64)
Workflow (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

The Joint Effect of Task Similarity and Overparameterization on Catastrophic Forgetting -- An Analytical Model

Goldfarb, Daniel, Evron, Itay, Weinberger, Nir, Soudry, Daniel, Hand, Paul

arXiv.org Artificial Intelligence, Jan-24-2024

In continual learning, catastrophic forgetting is affected by multiple aspects of the tasks. Previous works have analyzed separately how forgetting is affected by either task similarity or overparameterization. In contrast, our paper examines how task similarity and overparameterization jointly affect forgetting in an analyzable model. Specifically, we focus on two-task continual linear regression, where the second task is a random orthogonal transformation of an arbitrary first task (an abstraction of random permutation tasks). We derive an exact analytical expression for the expected forgetting and uncover a nuanced pattern. In highly overparameterized models, intermediate task similarity causes the most forgetting. However, near the interpolation threshold, forgetting decreases monotonically with the expected task similarity. We validate our findings with linear regression on synthetic data, and with neural networks on established permutation task benchmarks.
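The two-task setup described here can be sketched numerically. The code below is only an illustration under stated assumptions: the second task is taken to be the first task's inputs multiplied by a random orthogonal matrix with the labels unchanged, each task is fitted with a minimum-change (pseudoinverse) update, and forgetting is the first task's squared loss after the second fit. The helper names `random_orthogonal` and `two_task_forgetting` are hypothetical and do not come from the paper.

```python
import numpy as np

def random_orthogonal(d, rng):
    """Sample an orthogonal matrix from the QR factorization of a Gaussian matrix."""
    Q, R = np.linalg.qr(rng.standard_normal((d, d)))
    return Q * np.sign(np.diag(R))   # flip column signs; Q stays orthogonal

def two_task_forgetting(X1, y, O):
    """Loss on task 1 after fitting task 1 and then task 2 = (X1 @ O.T, y)."""
    X2 = X1 @ O.T
    w1 = np.linalg.pinv(X1) @ y                     # minimum-norm fit of task 1
    w2 = w1 + np.linalg.pinv(X2) @ (y - X2 @ w1)    # minimum-change fit of task 2
    return float(np.mean((X1 @ w2 - y) ** 2))

rng = np.random.default_rng(0)
d, n = 20, 5                          # overparameterized: d > n
X1 = rng.standard_normal((n, d))
y = X1 @ rng.standard_normal(d)
O = random_orthogonal(d, rng)
f = two_task_forgetting(X1, y, O)
```

Averaging `f` over many draws of `O` for different ratios of `d` to `n` would give an empirical estimate of the expected forgetting that the abstract characterizes analytically.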

artificial intelligence, conference paper, machine learning

arXiv.org Artificial Intelligence

2401.12617

Country: Europe (0.27)

Genre: Research Report > New Finding (0.65)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.54)

Continual Learning in Linear Classification on Separable Data

Evron, Itay, Moroshko, Edward, Buzaglo, Gon, Khriesh, Maroun, Marjieh, Badea, Srebro, Nathan, Soudry, Daniel

arXiv.org Artificial Intelligence, Jun-6-2023

We analyze continual learning on a sequence of separable linear classification tasks with binary labels. We show theoretically that learning with weak regularization reduces to solving a sequential max-margin problem, corresponding to a special case of the Projection Onto Convex ...

We theoretically study the continual learning of a linear classification model on separable data with binary classes. Even though this is a fundamental setup to consider, there are still very few analytic results on it, since most of the continual learning theory thus far has focused on regression settings (e.g., Bennani et al. (2020); Doan et al. (2021); Asanuma et al. (2021); Lee et al. (2021); Evron et al. (2022); Goldfarb & Hand (2023); Li et al. (2023)).

artificial intelligence, linear classification, machine learning

arXiv.org Artificial Intelligence

2306.03534

Country:

North America > United States (0.67)
Asia (0.67)
Europe (0.45)

Genre: Research Report > New Finding (0.46)

Industry: Education > Educational Setting (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.92)

The Role of Codeword-to-Class Assignments in Error-Correcting Codes: An Empirical Study

Evron, Itay, Onn, Ophir, Orzech, Tamar Weiss, Azeroual, Hai, Soudry, Daniel

arXiv.org Artificial Intelligence, Feb-10-2023

Error-correcting codes (ECC) are used to reduce multiclass classification tasks to multiple binary classification subproblems. In ECC, classes are represented by the rows of a binary matrix, corresponding to codewords in a codebook. Codebooks are commonly either predefined or problem dependent. Given predefined codebooks, codeword-to-class assignments are traditionally overlooked, and codewords are implicitly assigned to classes arbitrarily. Our paper shows that these assignments play a major role in the performance of ECC. Specifically, we examine similarity-preserving assignments, where similar codewords are assigned to similar classes. Addressing a controversy in existing literature, our extensive experiments confirm that similarity-preserving assignments induce easier subproblems and are superior to other assignment policies in terms of their generalization performance. We find that similarity-preserving assignments make predefined codebooks become problem-dependent, without altering other favorable codebook properties. Finally, we show that our findings can improve predefined codebooks dedicated to extreme classification.
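To make the reduction concrete, here is a minimal sketch of how a predefined codebook and a codeword-to-class assignment turn multiclass labels into binary subproblems and back. The tiny codebook, the convention that `assignment[c]` holds the index of the codeword given to class `c`, and the Hamming decoder are all illustrative assumptions rather than the paper's experimental setup.

```python
import numpy as np

# Hypothetical predefined codebook: 4 codewords (rows), 3 binary subproblems (columns).
codebook = np.array([[0, 0, 0],
                     [0, 1, 1],
                     [1, 0, 1],
                     [1, 1, 0]])

def binary_labels(y, codebook, assignment):
    """Map class labels y to one binary target per subproblem."""
    return codebook[assignment[y]]

def decode(bits, codebook, assignment):
    """Predict the class whose assigned codeword is nearest in Hamming distance."""
    class_codes = codebook[assignment]            # row c is the codeword of class c
    dists = (bits[:, None, :] != class_codes[None, :, :]).sum(axis=2)
    return dists.argmin(axis=1)

y = np.array([0, 1, 2, 3, 1])
assignment = np.array([2, 0, 3, 1])               # one arbitrary assignment
targets = binary_labels(y, codebook, assignment)  # shape (5, 3)
recovered = decode(targets, codebook, assignment)
```

Changing `assignment` permutes which classes share bits in each column, which is the design choice the abstract argues affects how hard the binary subproblems are.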

artificial intelligence, assignment, machine learning

arXiv.org Artificial Intelligence

2302.05334

Country: Europe (0.28)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

How do infinite width bounded norm networks look in function space?

Savarese, Pedro, Evron, Itay, Soudry, Daniel, Srebro, Nathan

arXiv.org Machine Learning, Feb-13-2019

We consider the question of what functions can be captured by ReLU networks with an unbounded number of units (infinite width), but where the overall network Euclidean norm (sum of squares of all weights in the system, except for an unregularized bias term for each unit) is bounded; or equivalently what is the minimal norm required to approximate a given function. For functions $f : \mathbb R \rightarrow \mathbb R$ and a single hidden layer, we show that the minimal network norm for representing $f$ is $\max(\int |f''(x)| dx, |f'(-\infty) + f'(+\infty)|)$, and hence the minimal norm fit for a sample is given by a linear spline interpolation.
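For a piecewise-linear function the expression in this abstract is easy to evaluate by hand: $\int |f''(x)| dx$ becomes the sum of absolute slope changes at the knots. The sketch below does this for the linear spline through a set of sample points, assuming the spline is extended beyond the outermost knots with its first and last slopes. The function name `spline_norm_cost` is hypothetical.

```python
import numpy as np

def spline_norm_cost(x, y):
    """Evaluate max(int |f''|, |f'(-inf) + f'(+inf)|) for a linear spline.

    Assumes x is strictly increasing with at least two points, and that the
    spline continues outside [x[0], x[-1]] with its first and last slopes.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slopes = np.diff(y) / np.diff(x)
    total_variation = np.abs(np.diff(slopes)).sum()   # sum of |slope changes|
    boundary = abs(slopes[0] + slopes[-1])            # |f'(-inf) + f'(+inf)|
    return float(max(total_variation, boundary))

print(spline_norm_cost([-1, 0, 1], [1, 0, 1]))  # f(x) = |x|: prints 2.0
print(spline_norm_cost([0, 1, 2], [0, 1, 2]))   # f(x) = x:   prints 2.0
```

In the first example the cost comes entirely from the kink at zero, while in the second it comes entirely from the boundary term, which shows why the formula needs both arguments of the max.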

artificial intelligence, neural network, relu network

arXiv.org Machine Learning

1902.0504

Country:

North America > United States (0.14)
Asia > Middle East > Israel (0.14)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Efficient Loss-Based Decoding on Graphs for Extreme Classification

Evron, Itay, Moroshko, Edward, Crammer, Koby

Neural Information Processing Systems, Dec-31-2018

In extreme classification problems, learning algorithms are required to map instances to labels from an extremely large label set. We build on a recent extreme classification framework with logarithmic time and space (LTLS), and on a general approach for error correcting output coding (ECOC) with loss-based decoding, and introduce a flexible and efficient approach accompanied by theoretical bounds. Our framework employs output codes induced by graphs, for which we show how to perform efficient loss-based decoding to potentially improve accuracy. In addition, our framework offers a tradeoff between accuracy, model size and prediction time. We show how to find the sweet spot of this tradeoff using only the training data. Our experimental study demonstrates the validity of our assumptions and claims, and shows that our method is competitive with state-of-the-art algorithms.
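Loss-based decoding for ECOC can be illustrated in a few lines. The sketch below is a generic, brute-force version that scores every class explicitly; it is not the graph-based decoder of this paper, whose point is to avoid that enumeration. The ±1 codebook, the exponential loss, and the name `loss_based_decode` are assumptions chosen for illustration.

```python
import numpy as np

def loss_based_decode(scores, codebook, loss=lambda z: np.exp(-z)):
    """Pick, per example, the class whose codeword has the smallest total loss.

    scores:   (n_examples, n_bits) real-valued outputs of the binary predictors
    codebook: (n_classes, n_bits) matrix with entries in {-1, +1}
    """
    margins = scores[:, None, :] * codebook[None, :, :]   # (n, classes, bits)
    return loss(margins).sum(axis=2).argmin(axis=1)

# Hypothetical 4-class codebook with 3 binary subproblems.
codebook = np.array([[ 1,  1,  1],
                     [ 1, -1, -1],
                     [-1,  1, -1],
                     [-1, -1,  1]])

# Confident, noise-free scores for classes 0..3 should decode to themselves.
scores = 2.0 * codebook
predictions = loss_based_decode(scores, codebook)
```

Unlike Hamming decoding, this uses the magnitude of each binary predictor's score, so a low-confidence bit contributes less to the decision than a high-confidence one.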

accuracy, artificial intelligence, machine learning

Neural Information Processing Systems

Country:

North America > Canada > Quebec (0.14)
Oceania > Australia > New South Wales > Sydney (0.14)
North America > United States > Virginia (0.14)

Genre:

Research Report > Experimental Study (0.66)
Research Report > New Finding (0.48)

Industry: Education (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Efficient Loss-Based Decoding On Graphs For Extreme Classification

Evron, Itay, Moroshko, Edward, Crammer, Koby

arXiv.org Machine Learning, Mar-8-2018

In extreme classification problems, learning algorithms are required to map instances to labels from an extremely large label set. We build on a recent extreme classification framework with logarithmic time and space, and on a general approach for error correcting output coding (ECOC), and introduce a flexible and efficient approach accompanied by bounds. Our framework employs output codes induced by graphs, and offers a tradeoff between accuracy and model size. We show how to find the sweet spot of this tradeoff using only the training data. Our experimental study demonstrates the validity of our assumptions and claims, and shows the superiority of our method compared with state-of-the-art algorithms.

artificial intelligence, graph, machine learning

arXiv.org Machine Learning

1803.03319

Country:

Asia > Middle East > Israel (0.14)
North America > United States (0.14)

Genre:

Research Report > Experimental Study (0.66)
Research Report > New Finding (0.48)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
