AITopics | Yahav, Eran

Yahav, Eran

On the Expressivity Role of LayerNorm in Transformers' Attention

Brody, Shaked, Alon, Uri, Yahav, Eran

arXiv.org Artificial Intelligence, May-11-2023

Layer Normalization (LayerNorm) is an inherent component in all Transformer-based models. In this paper, we show that LayerNorm is crucial to the expressivity of the multi-head attention layer that follows it. This is in contrast to the common belief that LayerNorm's only role is to normalize the activations during the forward pass, and their gradients during the backward pass. We consider a geometric interpretation of LayerNorm and show that it consists of two components: (a) projection of the input vectors onto a $(d-1)$-dimensional space that is orthogonal to the $\left[1,1,...,1\right]$ vector, and (b) scaling of all vectors to the same norm of $\sqrt{d}$. We show that each of these components is important for the attention layer that follows it in Transformers: (a) projection allows the attention mechanism to create an attention query that attends to all keys equally, so the attention layer does not need to learn this operation itself; and (b) scaling allows each key to potentially receive the highest attention, and prevents keys from being "un-select-able". We show empirically that Transformers do indeed benefit from these properties of LayerNorm in general language modeling and even in computing simple functions such as "majority". Our code is available at https://github.com/tech-srl/layer_norm_expressivity_role.
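The two-step geometric view in the abstract above can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes LayerNorm without the learned gain and bias and without the usual epsilon term, and the function names `layer_norm` and `project_then_scale` are hypothetical.

```python
import math

def layer_norm(x):
    """Textbook LayerNorm, assuming no learned gain/bias and no epsilon."""
    d = len(x)
    mean = sum(x) / d
    var = sum((v - mean) ** 2 for v in x) / d
    return [(v - mean) / math.sqrt(var) for v in x]

def project_then_scale(x):
    """The same operation written as (a) projection and (b) scaling."""
    d = len(x)
    mean = sum(x) / d
    # (a) Remove the component along [1, 1, ..., 1]; the result sums to zero.
    projected = [v - mean for v in x]
    # (b) Rescale so that the vector has norm sqrt(d).
    norm = math.sqrt(sum(v * v for v in projected))
    return [math.sqrt(d) * v / norm for v in projected]

x = [2.0, -1.0, 0.5, 4.0]
y = project_then_scale(x)

print("sum of outputs:", sum(y))                         # close to 0
print("norm of output:", math.sqrt(sum(v * v for v in y)))  # close to sqrt(4) = 2
print("matches layer_norm:",
      all(abs(a - b) < 1e-9 for a, b in zip(y, layer_norm(x))))
```

The two functions agree because the standard deviation of the centered vector equals its norm divided by $\sqrt{d}$, so dividing by the standard deviation is the same as scaling to norm $\sqrt{d}$.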

layernorm, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2305.02582

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)


Diffusing Graph Attention

Glickman, Daniel, Yahav, Eran

arXiv.org Artificial Intelligence, Mar-1-2023

The dominant paradigm for machine learning on graphs uses Message Passing Graph Neural Networks (MP-GNNs), in which node representations are updated by aggregating information in their local neighborhood. Recently, there has been a growing number of attempts to adapt the Transformer architecture to graphs in an effort to solve some known limitations of MP-GNNs. A challenging aspect of designing Graph Transformers is integrating the arbitrary graph structure into the architecture. We propose Graph Diffuser (GD) to address this challenge. GD learns to extract structural and positional relationships between distant nodes in the graph, which it then uses to direct the Transformer's attention and node representation. We demonstrate that existing GNNs and Graph Transformers struggle to capture long-range interactions, and we show how Graph Diffuser captures them while admitting intuitive visualizations. Experiments on eight benchmarks show Graph Diffuser to be a highly competitive model, outperforming the state-of-the-art in a diverse set of domains. Graph Neural Networks have seen increasing popularity as a versatile tool for graph representation learning, with applications in a wide variety of domains such as protein design (e.g., Ingraham et al. (2019)) and drug development (e.g., Gaudelet et al. (2020)).

artificial intelligence, machine learning, natural language, (13 more...)

arXiv.org Artificial Intelligence

2303.00613

Genre: Research Report (0.83)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.89)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)


On the Bottleneck of Graph Neural Networks and its Practical Implications

Alon, Uri, Yahav, Eran

arXiv.org Machine Learning, Jun-9-2020

Graph neural networks (GNNs) have been shown to learn effectively from highly structured data containing elements (nodes) with relationships (edges) between them. GNN variants differ in how each node in the graph absorbs the information flowing from its neighbor nodes. In this paper, we highlight an inherent problem in GNNs: the mechanism of propagating information between neighbors creates a bottleneck when every node aggregates messages from its neighbors. This bottleneck causes the over-squashing of exponentially-growing information into fixed-size vectors. As a result, the graph fails to propagate messages flowing from distant nodes and performs poorly when the prediction task depends on long-range information. We demonstrate that the bottleneck hinders popular GNNs from fitting the training data. We show that GNNs that absorb incoming edges equally, like GCN and GIN, are more susceptible to over-squashing than other GNN types. We further show that existing, extensively tuned GNN-based models suffer from over-squashing and that breaking the bottleneck improves state-of-the-art results without any hyperparameter tuning or additional weights.

health & medicine, neural network, node, (19 more...)

arXiv.org Machine Learning

2006.05205

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)


Structural Language Models for Any-Code Generation

Alon, Uri, Sadaka, Roy, Levy, Omer, Yahav, Eran

arXiv.org Machine Learning, Sep-30-2019

We address the problem of Any-Code Generation (AnyGen): generating code without any restriction on the vocabulary or structure. The state-of-the-art in this problem is the sequence-to-sequence (seq2seq) approach, which treats code as a sequence and does not leverage any structural information. We introduce a new approach to AnyGen, structural language modeling (SLM), that leverages the strict syntax of programming languages to model a code snippet as a tree. SLM estimates the probability of the program's abstract syntax tree (AST) by decomposing it into a product of conditional probabilities over its nodes. We present a neural model that computes these conditional probabilities by considering all AST paths leading to a target node. Unlike previous structural techniques that have severely restricted the kinds of expressions that can be generated, our approach can generate arbitrary expressions in any programming language. Our model significantly outperforms both seq2seq and a variety of existing structured approaches in generating Java and C# code. We make our code, datasets, and models available online.
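The decomposition described in the abstract above can be written out as a chain-rule factorization. The notation here is assumed for illustration and is not taken from the paper: $\mathcal{A}$ is the AST, $a_1, \dots, a_{|\mathcal{A}|}$ are its nodes in some fixed generation order, and $a_{<t}$ denotes the nodes generated before $a_t$.

$$P(\mathcal{A}) = \prod_{t=1}^{|\mathcal{A}|} P\left(a_t \mid a_{<t}\right)$$

Per the abstract, the neural model estimates each factor from the AST paths that lead to the target node $a_t$.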

deep learning, neural network, node, (17 more...)

arXiv.org Machine Learning

1910.00577

Country:

Europe > Portugal (0.14)
Europe > Germany (0.14)
Europe > Belgium (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Automatic Programming (0.62)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.53)


Neural Reverse Engineering of Stripped Binaries

David, Yaniv, Alon, Uri, Yahav, Eran

arXiv.org Machine Learning, Feb-25-2019

We address the problem of predicting procedure names in stripped executables which contain no debug information. Predicting procedure names can dramatically ease the task of reverse engineering, saving precious time and human effort. We present a novel approach that leverages static analysis of binaries with encoder-decoder-based neural networks. The main idea is to use static analysis to obtain enriched representations of API call sites; encode a set of sequences of these call sites; and finally, attend to the encoded sequences while decoding the target name token-by-token. We evaluate our model by predicting procedure names over 60,000 procedures in 10,000 stripped executables. Our model achieves 81.70 precision and 80.12 recall in predicting procedure names within GNU packages, and 55.48 precision and 51.31 recall in a diverse, cross-package dataset. Compared to previous approaches, the predictions made by our model are much more accurate and informative.

deep learning, neural network, sequence, (19 more...)

arXiv.org Machine Learning

1902.09122

Country: North America > United States > New York (0.29)

Genre: Research Report (0.84)

Industry: Information Technology (0.46)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


code2seq: Generating Sequences from Structured Representations of Code

Alon, Uri, Levy, Omer, Yahav, Eran

arXiv.org Machine Learning, Aug-3-2018

The ability to generate natural language sequences from source code snippets can be used for code summarization, documentation, and retrieval. Sequence-to-sequence (seq2seq) models, adopted from neural machine translation (NMT), have achieved state-of-the-art performance on these tasks by treating source code as a sequence of tokens. We present code2seq: an alternative approach that leverages the syntactic structure of programming languages to better encode source code. Our model represents a code snippet as the set of paths in its abstract syntax tree (AST) and uses attention to select the relevant paths during decoding, much like contemporary NMT models. We demonstrate the effectiveness of our approach for two tasks, two programming languages, and four datasets of up to 16M examples. Our model significantly outperforms previous models that were specifically designed for programming languages, as well as general state-of-the-art NMT models.

deep learning, neural network, sequence, (22 more...)

arXiv.org Machine Learning

1808.014

Country:

Europe (0.47)
North America > United States > New York (0.30)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)


On the Practical Computational Power of Finite Precision RNNs for Language Recognition

Weiss, Gail, Goldberg, Yoav, Yahav, Eran

arXiv.org Machine Learning, May-13-2018

While Recurrent Neural Networks (RNNs) are known to be Turing complete, this result relies on infinite precision in the states and unbounded computation time. We consider the case of RNNs with finite precision whose computation time is linear in the input length. Under these limitations, we show that different RNN variants have different computational power. In particular, we show that the LSTM and the Elman-RNN with ReLU activation are strictly stronger than the RNN with a squashing activation and the GRU. This is because LSTMs and ReLU-RNNs can easily implement counting behavior. We show empirically that the LSTM does indeed learn to effectively use the counting mechanism.
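The counting argument above can be illustrated with a toy example. This is a hand-set, one-unit sketch under assumed weights, not the construction from the paper: the recurrent update is `h_t = activation(h_{t-1} + x_t)`, with `x_t = 1` for the symbol `a` and `0` otherwise, and the helper name `run_counter` is hypothetical.

```python
import math

def relu(z):
    """Unbounded activation: passes non-negative values through unchanged."""
    return max(0.0, z)

def run_counter(sequence, activation):
    """Run the one-unit update h_t = activation(h_{t-1} + x_t) over a string."""
    h = 0.0
    for symbol in sequence:
        x = 1.0 if symbol == "a" else 0.0
        h = activation(h + x)
    return h

# With ReLU, the state grows by exactly one per 'a', so it acts as a counter.
relu_50 = run_counter("a" * 50, relu)
relu_51 = run_counter("a" * 51, relu)

# With tanh (a squashing activation), the state stays below 1 and saturates,
# so 50 and 51 occurrences become practically indistinguishable.
tanh_50 = run_counter("a" * 50, math.tanh)
tanh_51 = run_counter("a" * 51, math.tanh)

print("ReLU states:", relu_50, relu_51)
print("tanh states:", tanh_50, tanh_51)
```

The contrast is the point of the sketch: the unbounded state separates the two inputs by a full unit, while the squashed state converges to a fixed point and loses the count.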

deep learning, dimension, neural network, (18 more...)

arXiv.org Machine Learning

1805.04908

Country:

North America > United States > Texas (0.14)
Asia > Middle East > Qatar (0.14)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


code2vec: Learning Distributed Representations of Code

Alon, Uri, Zilberstein, Meital, Levy, Omer, Yahav, Eran

arXiv.org Machine Learning, Mar-26-2018

We present a neural model for representing snippets of code as continuous distributed vectors. The main idea is to represent code as a collection of paths in its abstract syntax tree, and aggregate these paths, in a smart and scalable way, into a single fixed-length "code vector", which can be used to predict semantic properties of the snippet. We demonstrate the effectiveness of our approach by using it to predict a method's name from the vector representation of its body. We evaluate our approach by training a model on a dataset of 14M methods. We show that code vectors trained on this dataset can predict method names from files that were completely unobserved during training. Furthermore, we show that our model learns useful method name vectors that capture semantic similarities, combinations, and analogies. Compared to previous techniques evaluated on the same dataset, our approach obtains a relative improvement of over 75%, being the first to successfully predict method names based on a large, cross-project corpus.
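To make "a collection of paths in its abstract syntax tree" concrete, here is a small sketch that enumerates leaf-to-leaf paths in a toy tree. It is an illustration under stated assumptions, not the authors' extractor: the tree for `x = y + 1` is written by hand, the node labels are invented, and `leaf_routes` and `path_between` are hypothetical helper names. The aggregation of paths into a single vector is not shown.

```python
from itertools import combinations

# Hypothetical toy AST for `x = y + 1`, as (label, children) tuples.
TREE = ("Assign", [
    ("Name:x", []),
    ("BinOp:+", [("Name:y", []), ("Num:1", [])]),
])

def leaf_routes(node, index=0, prefix=()):
    """Yield, for each leaf, its route from the root as (child_index, label) steps."""
    label, children = node
    route = prefix + ((index, label),)
    if not children:
        yield route
    for i, child in enumerate(children):
        yield from leaf_routes(child, i, route)

def path_between(route_a, route_b):
    """Walk up from leaf A to the lowest common ancestor, then down to leaf B."""
    k = 0
    while k < min(len(route_a), len(route_b)) and route_a[k] == route_b[k]:
        k += 1
    up = [label for _, label in reversed(route_a[k:])]
    top = route_a[k - 1][1]  # the root is always shared, so k >= 1
    down = [label for _, label in route_b[k:]]
    return up + [top] + down

paths = [path_between(a, b) for a, b in combinations(leaf_routes(TREE), 2)]
for p in paths:
    print(" -> ".join(p))
```

Each printed line is one path between two leaves; a snippet with n leaves yields n(n-1)/2 such paths before any filtering or sampling.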

deep learning, neural network, representation, (22 more...)

arXiv.org Machine Learning

1803.09473

Country:

Europe (0.67)
North America > United States > New York > New York County > New York City (0.15)

Genre: Research Report (0.64)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.88)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.88)
