Kanter, Ido
Unified CNNs and transformers underlying learning mechanism reveals multi-head attention modus vivendi
Koresh, Ella, Gross, Ronit D., Meir, Yuval, Tzach, Yarden, Halevi, Tal, Kanter, Ido
Convolutional neural networks (CNNs) evaluate short-range correlations in input images which progress along the layers, whereas vision transformer (ViT) architectures evaluate long-range correlations, using repeated transformer encoders composed of fully connected layers. Both are designed to solve complex classification tasks but from different perspectives. This study demonstrates that CNNs and ViT architectures stem from a unified underlying learning mechanism, which quantitatively measures the single-nodal performance (SNP) of each node in feedforward (FF) and multi-head attention (MHA) subblocks. Each node identifies small clusters of possible output labels, with additional noise represented as labels outside these clusters. These features are progressively sharpened along the transformer encoders, enhancing the signal-to-noise ratio. This unified underlying learning mechanism leads to two main findings. First, it enables an efficient applied nodal diagonal connection (ANDC) pruning technique without affecting the accuracy. Second, based on the SNP, spontaneous symmetry breaking occurs among the MHA heads, such that each head focuses its attention on a subset of labels through cooperation among its SNPs. Consequently, each head becomes an expert in recognizing its designated labels, representing a quantitative MHA modus vivendi mechanism. These results are based on a compact convolutional transformer architecture trained on the CIFAR-100 and Flowers-102 datasets and call for their extension to other architectures and applications, such as natural language processing.
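The SNP measurement can be illustrated with a short sketch. The snippet below is only a simplified stand-in for the procedure described in the paper: it assumes per-node activations and labels have already been collected, and the cluster size, the signal-to-noise score, and the function name snp_scores are illustrative choices rather than the paper's definitions.

    # Illustrative sketch (not the paper's exact procedure): estimate a crude
    # single-nodal performance (SNP) score from recorded activations.
    import numpy as np

    def snp_scores(activations, labels, n_labels, cluster_size=5):
        """activations: (n_samples, n_nodes) outputs of one FF or MHA subblock.
        labels: (n_samples,) integer class labels (every label assumed present).
        Returns each node's preferred label cluster and a crude signal-to-noise ratio."""
        n_nodes = activations.shape[1]
        # Mean activation of every node conditioned on each output label.
        mean_per_label = np.zeros((n_labels, n_nodes))
        for c in range(n_labels):
            mean_per_label[c] = activations[labels == c].mean(axis=0)
        clusters, snr = [], []
        for j in range(n_nodes):
            order = np.argsort(mean_per_label[:, j])[::-1]
            cluster = order[:cluster_size]               # labels the node "identifies"
            signal = mean_per_label[cluster, j].mean()
            noise = mean_per_label[order[cluster_size:], j].mean()
            clusters.append(cluster)
            snr.append(signal / (abs(noise) + 1e-9))     # crude signal-to-noise score
        return clusters, np.array(snr)

    # Example with synthetic activations over a 100-label task (illustrative only).
    rng = np.random.default_rng(0)
    acts = rng.random((1000, 16))
    labels = np.arange(1000) % 100
    clusters, snr = snp_scores(acts, labels, n_labels=100)
    print(clusters[0], snr[0])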
Advanced deep architecture pruning using single filter performance
Tzach, Yarden, Meir, Yuval, Gross, Ronit D., Tevet, Ofek, Koresh, Ella, Kanter, Ido
Pruning the parameters and structure of neural networks reduces the computational complexity, energy consumption, and latency during inference. Recently, a novel underlying mechanism for successful deep learning (DL) was presented, based on a method that quantitatively measures the single filter performance in each layer of a DL architecture, yielding a new comprehensive picture of how deep learning works. Herein, we demonstrate how this understanding paves the path to highly dilute the convolutional layers of deep architectures without affecting their overall accuracy, using applied filter cluster connections (AFCC). AFCC is exemplified on VGG-11 and EfficientNet-B0 architectures trained on CIFAR-100, where its pruning outperforms other techniques at the same pruning magnitude. Additionally, this technique is broadened to single nodal performance and to the extensive pruning of fully connected layers, suggesting a possible implementation to considerably reduce the complexity of over-parameterized AI tasks.
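A minimal sketch of the pruning idea follows. It assumes each filter has already been assigned a small cluster of preferred labels (for example, via a single-filter-performance measurement) and zeroes the convolution kernels connecting filters whose clusters do not overlap; the exact AFCC criterion is defined in the paper, and the function prune_cross_filter_kernels and its inputs are illustrative.

    # Illustrative sketch only: dilute a Conv2d layer by zeroing kernels between
    # filters whose preferred-label clusters are disjoint. This is a simplified
    # stand-in for the AFCC criterion described in the paper.
    import torch
    import torch.nn as nn

    def prune_cross_filter_kernels(conv: nn.Conv2d, in_clusters, out_clusters):
        """in_clusters[i] / out_clusters[o]: sets of labels preferred by input filter i
        and output filter o (obtained beforehand from a filter-performance measurement)."""
        mask = torch.zeros_like(conv.weight)             # shape: (out_ch, in_ch, k, k)
        for o, out_c in enumerate(out_clusters):
            for i, in_c in enumerate(in_clusters):
                if out_c & in_c:                         # keep kernel only if clusters overlap
                    mask[o, i] = 1.0
        with torch.no_grad():
            conv.weight.mul_(mask)                       # zero the pruned kernels
        return mask

    # Example: a 4 -> 6 convolution whose filters were each assigned a few preferred labels.
    conv = nn.Conv2d(4, 6, kernel_size=3, padding=1)
    in_clusters  = [{0, 1}, {1, 2}, {3, 4}, {5, 6}]
    out_clusters = [{0}, {1, 2}, {2, 3}, {4}, {5}, {6, 7}]
    mask = prune_cross_filter_kernels(conv, in_clusters, out_clusters)
    print("kept kernels:", int(mask[:, :, 0, 0].sum()), "of", mask.shape[0] * mask.shape[1])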
Role of Delay in Brain Dynamics
Meir, Yuval, Tevet, Ofek, Tzach, Yarden, Hodassman, Shiri, Kanter, Ido
Significant variations of delays among connecting neurons cause an inevitable disadvantage of asynchronous brain dynamics compared to synchronous deep learning. However, this study demonstrates that this disadvantage can be converted into a computational advantage using a network with a single output and M multiple delays between successive layers, thereby generating a number of time-series outputs that grows polynomially with M. The proposed role of delay in brain dynamics (RoDiB) model is capable of learning an increasing number of classified labels using a fixed architecture, and overcomes the inflexibility of the brain to update the learning architecture using additional neurons and connections. Moreover, the achievable accuracies of the RoDiB system are comparable with those of its counterpart tunable single delay architectures with M outputs. Further, the accuracies are significantly enhanced when the number of output labels exceeds its fully connected input size. The results are mainly obtained using simulations of VGG-6 on CIFAR datasets and also include multiple label inputs. However, currently only a small fraction of the abundant number of RoDiB outputs is utilized, thereby suggesting its potential for advanced computational power yet to be discovered.
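The readout idea can be conveyed with a toy sketch, which is not the paper's exact RoDiB construction: a single output node receives M delayed branches, so sampling that one output at M consecutive time steps yields an M-dimensional vector from a fixed architecture. All sizes, the per-branch weights W_out, and the transfer function are assumptions made for illustration.

    # Toy sketch (not the exact RoDiB model): one output node with M delayed input
    # branches; reading that single output at M consecutive time steps gives an
    # M-dimensional readout without adding output neurons.
    import numpy as np

    rng = np.random.default_rng(0)
    H, M, T = 32, 4, 8                       # hidden size, number of delays, time steps

    W_out = rng.normal(size=(M, H))          # one weight set per delay branch (assumption)
    delays = np.arange(M)                    # branch d is delayed by d time steps

    def single_output_time_series(hidden_seq):
        """hidden_seq: (T, H) hidden activations over time.
        Returns the single output's value at each time step."""
        out = np.zeros(T)
        for t in range(T):
            for d in range(M):
                if t - delays[d] >= 0:       # branch d sees the hidden state delayed by d
                    out[t] += W_out[d] @ hidden_seq[t - delays[d]]
        return np.tanh(out)

    hidden_seq = rng.normal(size=(T, H))
    readout = single_output_time_series(hidden_seq)[:M]   # M time samples from one output
    print(readout)                           # used as an M-dimensional classification vector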
Statistical Mechanics of Learning via Reverberation in Bidirectional Associative Memories
Centonze, Martino Salomone, Kanter, Ido, Barra, Adriano
We study bi-directional associative neural networks that, exposed to noisy examples of an extensive number of random archetypes, learn the latter (with or without the presence of a teacher) when the supplied information is enough: in this setting, learning is heteroassociative -- involving couples of patterns -- and it is achieved by reverberating the information extracted from the examples through the layers of the network. By adapting Guerra's interpolation technique, we provide a full statistical mechanical picture of supervised and unsupervised learning processes (at the replica symmetric level of description), analytically obtaining phase diagrams, thresholds for learning, and a picture of the ground state in full agreement with Monte Carlo simulations and signal-to-noise outcomes. In the large dataset limit, the Kosko storage prescription, as well as its statistical mechanical picture provided by Kurchan, Peliti, and Saber in the eighties, is fully recovered. Computational advantages in dealing with information reverberation, rather than storage, are discussed for natural test cases. In particular, we show how this network admits an integral representation in terms of two coupled restricted Boltzmann machines, whose hidden layers are entirely built of grand-mother neurons, to prove that by coupling solely these grand-mother neurons we can correlate the patterns they are related to: it is thus possible to recover Pavlov's Classical Conditioning by adding just one synapse among the correct grand-mother neurons (hence saving an extensive number of these links for further information storage w.r.t. the classical autoassociative setting).
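For reference, the Kosko storage prescription and the bidirectional ("reverberating") recall mentioned above can be written in a few lines; this is the textbook BAM, not the paper's supervised or unsupervised learning-from-examples protocol, and the sizes and noise level below are arbitrary.

    # Textbook Kosko BAM storage and bidirectional ("reverberating") recall, included as
    # a minimal reference point; the paper's learning-from-examples setting is richer.
    import numpy as np

    rng = np.random.default_rng(1)
    N, M, K = 200, 150, 5                               # layer sizes and number of pattern pairs

    xi  = rng.choice([-1, 1], size=(K, N))              # archetypes on the first layer
    eta = rng.choice([-1, 1], size=(K, M))              # paired archetypes on the second layer

    W = eta.T @ xi                                      # Kosko prescription: W = sum_mu eta^mu (xi^mu)^T

    def reverberate(x, steps=10):
        """Bidirectional recall: alternate sign(W x) and sign(W^T y)."""
        for _ in range(steps):
            y = np.sign(W @ x)
            x = np.sign(W.T @ y)
        return x, y

    # Start from a noisy version of xi^0 and recover the associated pair (xi^0, eta^0).
    noisy = xi[0] * rng.choice([1, -1], size=N, p=[0.85, 0.15])
    x, y = reverberate(noisy)
    print((x == xi[0]).mean(), (y == eta[0]).mean())    # overlaps with the stored pair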
Efficient shallow learning as an alternative to deep learning
Meir, Yuval, Tevet, Ofek, Tzach, Yarden, Hodassman, Shiri, Gross, Ronit D., Kanter, Ido
The realization of complex classification tasks requires training of deep learning (DL) architectures consisting of tens or even hundreds of convolutional and fully connected hidden layers, which is far from the reality of the human brain. According to the DL rationale, the first convolutional layer reveals localized patterns in the input, and the following layers reveal progressively larger-scale patterns, until a class of inputs is reliably characterized. Here, we demonstrate that with a fixed ratio between the depths of the first and second convolutional layers, the error rates of the generalized shallow LeNet architecture, consisting of only five layers, decay as a power law with the number of filters in the first convolutional layer. The extrapolation of this power law indicates that the generalized LeNet can achieve small error rates that were previously obtained for the CIFAR-10 database using DL architectures. A power law with a similar exponent also characterizes the generalized VGG-16 architecture. However, this results in a significantly increased number of operations required to achieve a given error rate with respect to LeNet. This power-law phenomenon governs various generalized LeNet and VGG-16 architectures, hinting at its universal behavior and suggesting a quantitative hierarchical time-space complexity among machine learning architectures. Additionally, a conservation law along the convolutional layers, namely the square root of their size times their depth, is found to asymptotically minimize error rates. The efficient shallow learning that is demonstrated in this study calls for further quantitative examination using various databases and architectures and its accelerated implementation using future dedicated hardware developments.
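The power-law analysis can be sketched as a log-log fit followed by extrapolation; the filter counts and error rates below are placeholders rather than the paper's measurements.

    # Sketch of the power-law analysis described above: fit error ~ A * d1**(-rho) on a
    # log-log scale and extrapolate. The numbers are placeholders, not the paper's data.
    import numpy as np

    d1     = np.array([8, 16, 32, 64, 128, 256])               # filters in the first conv layer
    errors = np.array([0.42, 0.35, 0.29, 0.24, 0.20, 0.165])   # illustrative test errors

    slope, logA = np.polyfit(np.log(d1), np.log(errors), 1)
    rho = -slope                                               # error ~ exp(logA) * d1**(-rho)
    print(f"fitted exponent rho ~ {rho:.3f}")

    d1_large = 4096                                            # extrapolate to a wider first layer
    print("extrapolated error:", np.exp(logA) * d1_large ** (-rho))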
Power-law Scaling to Assist with Key Challenges in Artificial Intelligence
Meir, Yuval, Sardi, Shira, Hodassman, Shiri, Kisos, Karin, Ben-Noam, Itamar, Goldental, Amir, Kanter, Ido
Power-law scaling, a central concept in critical phenomena, is found to be useful in deep learning, where optimized test errors on handwritten digit examples converge as a power law to zero with database size. For rapid decision making with one training epoch, in which each example is presented only once to the trained network, the power-law exponent increases with the number of hidden layers. For the largest dataset, the obtained test error was estimated to be in the proximity of state-of-the-art algorithms for large epoch numbers. Power-law scaling assists with key challenges found in current artificial intelligence applications and facilitates an a priori estimation of the dataset size required to achieve a desired test accuracy. It establishes a benchmark for measuring training complexity and a quantitative hierarchy of machine learning tasks and algorithms.
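The a priori dataset-size estimation amounts to fitting a power law epsilon(N) = A * N^(-beta) and inverting it for a target test error; the values of N and epsilon below are placeholders, not the paper's data.

    # Sketch of the a priori dataset-size estimation: fit epsilon(N) = A * N**(-beta)
    # and solve for the N that reaches a target test error. Placeholder values only.
    import numpy as np

    N   = np.array([1e3, 3e3, 1e4, 3e4, 6e4])              # training-set sizes
    eps = np.array([0.12, 0.085, 0.055, 0.038, 0.030])      # illustrative test errors

    slope, logA = np.polyfit(np.log(N), np.log(eps), 1)
    beta = -slope                                           # epsilon ~ exp(logA) * N**(-beta)

    target = 0.02                                           # desired test error
    N_needed = (np.exp(logA) / target) ** (1.0 / beta)      # invert the power law for N
    print(f"beta ~ {beta:.3f}, estimated examples needed ~ {N_needed:.0f}")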
Brain inspired neuronal silencing mechanism to enable reliable sequence identification
Hodassman, Shiri, Meir, Yuval, Kisos, Karin, Ben-Noam, Itamar, Tugendhaft, Yael, Goldental, Amir, Vardi, Roni, Kanter, Ido
Real-time sequence identification is a core use-case of artificial neural networks (ANNs), ranging from recognizing temporal events to identifying verification codes. Existing methods apply recurrent neural networks, which suffer from training difficulties; however, performing this function without feedback loops remains a challenge. Here, we present an experimental neuronal long-term plasticity mechanism for high-precision feedforward sequence identification networks (ID-nets) without feedback loops, wherein input objects have a given order and timing. This mechanism temporarily silences neurons following their recent spiking activity. Therefore, transitory objects act on different dynamically created feedforward sub-networks. ID-nets are demonstrated to reliably identify 10 handwritten digit sequences, and are generalized to deep convolutional ANNs with continuous activation nodes trained on image sequences. Counterintuitively, their classification performance, even with a limited number of training examples, is high for sequences but low for individual objects. ID-nets are also implemented for writer-dependent recognition, and suggested as a cryptographic tool for encrypted authentication. The presented mechanism opens new horizons for advanced ANN algorithms.
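A toy sketch of the silencing idea is given below: a unit that has just fired is muted for a short refractory window, so consecutive items in a sequence are processed by different feedforward sub-networks. The threshold, window length, and random weights are illustrative assumptions, not the paper's experimentally derived plasticity rule.

    # Toy sketch of the silencing idea: a unit that just fired is muted for the next
    # `silence_steps` inputs, so successive items in a sequence meet different sub-networks.
    import numpy as np

    rng = np.random.default_rng(2)
    n_in, n_hidden, silence_steps = 20, 50, 2

    W = rng.normal(size=(n_hidden, n_in))
    silenced_until = np.zeros(n_hidden, dtype=int)          # time step until which each unit is muted

    def step(x, t, threshold=2.0):
        active = t >= silenced_until                        # units currently allowed to fire
        h = np.where(active, W @ x, 0.0)
        fired = h > threshold
        silenced_until[fired] = t + 1 + silence_steps       # mute recently fired units
        return fired

    sequence = rng.normal(size=(5, n_in))                   # five consecutive input objects
    for t, x in enumerate(sequence):
        print(t, np.flatnonzero(step(x, t))[:8])            # a different sub-network per position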
Supervised Hebbian Learning
Alemanno, Francesco, Aquaro, Miriam, Kanter, Ido, Barra, Adriano, Agliari, Elena
In the neural network literature, Hebbian learning traditionally refers to the procedure by which the Hopfield model and its generalizations store archetypes (i.e., definite patterns that are experienced just once to form the synaptic matrix). However, the term "learning" in machine learning refers to the ability of the machine to extract features from the supplied dataset (e.g., made of blurred examples of these archetypes), in order to make its own representation of the unavailable archetypes. Here, given a sample of examples, we define a supervised learning protocol by which the Hopfield network can infer the archetypes, and we detect the correct control parameters (including the size and quality of the dataset) to depict a phase diagram for the system performance. We also prove that, for structureless datasets, the Hopfield model equipped with this supervised learning rule is equivalent to a restricted Boltzmann machine, and this suggests an optimal and interpretable training routine. Finally, this approach is generalized to structured datasets: we highlight a quasi-ultrametric organization (reminiscent of replica symmetry breaking) in the analyzed datasets and, consequently, we introduce an additional "replica hidden layer" for its (partial) disentanglement, which is shown to improve MNIST classification from 75% to 95%, and to offer a new perspective on deep architectures.
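One simple reading of such a supervised protocol, offered here only as a sketch with simplified normalizations, is that the teacher groups the noisy examples by archetype, the examples in each group are averaged, and the Hebbian outer-product rule is applied to these averages; standard Hopfield dynamics then retrieve the inferred archetypes.

    # Sketch of a supervised Hebbian rule: average the teacher-grouped noisy examples per
    # archetype and build the couplings from the averages. Normalizations are simplified.
    import numpy as np

    rng = np.random.default_rng(3)
    N, K, M, r = 400, 3, 60, 0.8                          # neurons, archetypes, examples per archetype, example quality

    xi = rng.choice([-1, 1], size=(K, N))                 # unavailable archetypes
    # Noisy examples: each bit agrees with its archetype with probability (1 + r) / 2.
    chi = np.where(rng.random((K, M, N)) < (1 + r) / 2, 1, -1)
    examples = xi[:, None, :] * chi

    means = examples.mean(axis=1)                         # teacher-supervised class averages
    J = (means.T @ means) / N                             # Hebbian coupling built on the averages
    np.fill_diagonal(J, 0.0)

    def relax(sigma, steps=20):
        for _ in range(steps):
            sigma = np.sign(J @ sigma)
        return sigma

    # The network infers archetype 0 from a fresh noisy example it never stored explicitly.
    probe = examples[0, 0]
    print("overlap with archetype:", (relax(probe.copy()) * xi[0]).mean())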
Synchronization of neural networks by mutual learning and its application to cryptography
Klein, Einat, Mislovaty, Rachel, Kanter, Ido, Ruttor, Andreas, Kinzel, Wolfgang
Two neural networks that are trained on their mutual output synchronize to an identical time-dependent weight vector. This novel phenomenon can be used to create a secure cryptographic secret key over a public channel. Several models for this cryptographic system have been suggested and tested for their security under different sophisticated attack strategies. The most promising models are networks that involve chaos synchronization. The synchronization process of mutual learning is described analytically using statistical physics methods.
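The concrete model most commonly used for this key-exchange scheme is the tree parity machine (TPM); the sketch below shows two TPMs synchronizing by mutual learning with a bounded Hebbian-like update. The parameters K, N, L and the specific update variant are standard textbook choices and are not necessarily those analyzed in the paper.

    # Sketch of mutual learning with two tree parity machines (TPMs): both machines see the
    # same public random input, exchange their binary outputs, and update only when they agree.
    import numpy as np

    rng = np.random.default_rng(4)
    K, N, L = 3, 100, 3                                   # hidden units, inputs per unit, weight bound

    def tpm_output(w, x):
        sigma = np.sign(np.sum(w * x, axis=1))
        sigma[sigma == 0] = -1                            # break ties deterministically
        return sigma, int(np.prod(sigma))

    def hebbian_update(w, x, sigma, tau):
        for k in range(K):
            if sigma[k] == tau:                           # only hidden units that agree with the output learn
                w[k] = np.clip(w[k] + tau * x[k], -L, L)

    wA = rng.integers(-L, L + 1, size=(K, N))
    wB = rng.integers(-L, L + 1, size=(K, N))

    steps = 0
    while not np.array_equal(wA, wB):
        x = rng.choice([-1, 1], size=(K, N))              # public random input
        sA, tauA = tpm_output(wA, x)
        sB, tauB = tpm_output(wB, x)
        if tauA == tauB:                                  # update only on agreeing outputs
            hebbian_update(wA, x, sA, tauA)
            hebbian_update(wB, x, sB, tauB)
        steps += 1

    print("synchronized after", steps, "exchanged inputs; the identical weights serve as the key")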
Analytical Study of the Interplay between Architecture and Predictability
Priel, Avner, Kanter, Ido, Kessler, David A.
We study model feed forward networks as time series predictors in the stationary limit. The focus is on complex, yet non-chaotic, behavior. The main question we address is whether the asymptotic behavior is governed by the architecture, regardless of the details of the weights. We find hierarchies among classes of architectures with respect to the attractor dimension of the long-term sequence they are capable of generating; a larger number of hidden units can generate higher-dimensional attractors. In the case of a perceptron, we develop the stationary solution for general weights and show that the flow is typically one dimensional.
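The setting can be summarized with a minimal sketch: a perceptron fed by its own past outputs acts as a closed-loop sequence generator, and the long-time (stationary) part of the generated sequence is the object of the analysis. The gain, window size, and random weights below are illustrative choices.

    # Minimal sketch of the setting: a perceptron driven by its own past outputs generates
    # a sequence, and its long-time (stationary) behavior is what the analysis characterizes.
    import numpy as np

    rng = np.random.default_rng(5)
    N, beta, T = 30, 4.0, 5000                            # input window, gain, generated length

    w = rng.normal(size=N) / np.sqrt(N)                   # fixed general weights
    s = list(rng.normal(size=N))                          # initial window of past outputs

    for _ in range(T):
        window = np.array(s[-N:])                         # most recent N outputs
        s.append(np.tanh(beta * w @ window[::-1]))        # closed-loop prediction becomes the next output

    stationary = np.array(s[-200:])                       # tail of the sequence (samples of the attractor)
    print(stationary[:10])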