Unlocking State-Tracking in Linear RNNs Through Negative Eigenvalues
Grazzi, Riccardo, Siems, Julien, Franke, Jörg K. H., Zela, Arber, Hutter, Frank, Pontil, Massimiliano
Linear Recurrent Neural Networks (LRNNs) such as Mamba, RWKV, GLA, mLSTM, and DeltaNet have emerged as efficient alternatives to Transformers in large language modeling, offering linear scaling with sequence length and improved training efficiency. However, LRNNs struggle to perform state-tracking, which may impair performance in tasks such as code evaluation or tracking a chess game. Even parity, the simplest state-tracking task, which non-linear RNNs like LSTM handle effectively, cannot be solved by current LRNNs. Recently, Sarrof et al. (2024) demonstrated that the failure of LRNNs like Mamba to solve parity stems from restricting the value range of their diagonal state-transition matrices to $[0, 1]$ and that incorporating negative values can resolve this issue. We extend this result to non-diagonal LRNNs, which have recently shown promise in models such as DeltaNet. We prove that finite-precision LRNNs with state-transition matrices having only positive eigenvalues cannot solve parity, while complex eigenvalues are needed to count modulo $3$. Notably, we also prove that LRNNs can learn any regular language when their state-transition matrices are products of identity minus vector outer product matrices, each with eigenvalues in the range $[-1, 1]$. Our empirical results confirm that extending the eigenvalue range of models like Mamba and DeltaNet to include negative values not only enables them to solve parity but consistently improves their performance on state-tracking tasks. Furthermore, pre-training LRNNs with an extended eigenvalue range for language modeling achieves comparable performance and stability while showing promise on code and math data. Our work enhances the expressivity of modern LRNNs, broadening their applicability without changing the cost of training or inference.
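The role of negative eigenvalues in the parity argument can be illustrated with a minimal sketch (an illustrative toy, not the paper's implementation): a scalar linear recurrence whose input-dependent transition may take the value $-1$ flips its state on every $1$-bit and thereby tracks parity exactly, whereas transitions confined to $[0, 1]$ can only shrink or preserve the state and can never flip it.

```python
def parity_lrnn(bits):
    """Track parity with a scalar linear recurrence h_t = a(x_t) * h_{t-1}.

    State +1 encodes even parity, -1 encodes odd parity. The transition
    a(x_t) is an input-dependent eigenvalue in [-1, 1]; allowing the
    negative value -1 is what makes the state flip possible.
    """
    h = 1.0
    for x in bits:
        a = -1.0 if x == 1 else 1.0  # eigenvalue in [-1, 1], negative on a 1-bit
        h = a * h                    # purely linear update, no non-linearity
    return 0 if h > 0 else 1

# If eigenvalues were restricted to [0, 1] (e.g. a = 1 for every input),
# the state could never change sign, so parity would be unreachable.
```

Running the recurrence on `[1, 0, 1, 1]` (three ones) yields odd parity, while `[1, 1]` yields even parity.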
The Power of Linear Recurrent Neural Networks
Stolzenburg, Frieder, Litz, Sandra, Michael, Olivia, Obst, Oliver
Recurrent neural networks are a powerful means to cope with time series. We show how autoregressive linear, i.e., linearly activated recurrent neural networks (LRNNs) can approximate any time-dependent function f(t) given by a number of function values. The approximation can effectively be learned by simply solving a linear equation system; no backpropagation or similar methods are needed. Furthermore, and this is probably the main contribution of this article, the size of an LRNN can be reduced significantly in one step after inspecting the spectrum of the network transition matrix, i.e., its eigenvalues, by taking only the most relevant components. Therefore, in contrast to other approaches, we do not only learn network weights but also the network architecture. LRNNs have interesting properties: They end up in ellipse trajectories in the long run and allow the prediction of further values and compact representations of functions. We demonstrate this by several experiments, among them multiple superimposed oscillators (MSO), robotic soccer, and predicting stock prices. LRNNs outperform the previous state-of-the-art for the MSO task with a minimal number of units.
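One way to realize the abstract's central claim in code (a hedged sketch, assuming an ordinary autoregressive formulation, not the authors' exact procedure) is to learn the linear predictor by solving a single least-squares system over lagged function values, with no backpropagation involved:

```python
import numpy as np

def fit_lrnn(series, order):
    """Fit a linear autoregressive model by one least-squares solve.

    Builds rows of `order` consecutive lagged values and regresses the
    next value onto them; the solution plays the role of the learned
    linear recurrence weights.
    """
    X = np.array([series[i:i + order] for i in range(len(series) - order)])
    y = np.array(series[order:])
    w, *_ = np.linalg.lstsq(X, y, rcond=None)  # single linear solve, no backprop
    return w

def predict(series, w, steps):
    """Roll the learned linear recurrence forward to predict further values."""
    window = list(series[-len(w):])
    out = []
    for _ in range(steps):
        nxt = float(np.dot(w, window))
        out.append(nxt)
        window = window[1:] + [nxt]
    return out
```

For a series obeying an exact linear recurrence, e.g. the ramp $x_t = 2x_{t-1} - x_{t-2}$, the solver recovers the weights $(-1, 2)$ and extrapolates the ramp exactly. The paper's reduction step would then inspect the eigenvalues of the resulting transition matrix and keep only the dominant components.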
Beyond Graph Neural Networks with Lifted Relational Neural Networks
Sourek, Gustav, Zelezny, Filip, Kuzelka, Ondrej
We demonstrate a declarative differentiable programming framework based on the language of Lifted Relational Neural Networks, where small parameterized logic programs are used to encode relational learning scenarios. When presented with relational data, such as various forms of graphs, the program interpreter dynamically unfolds differentiable computational graphs to be used for the program parameter optimization by standard means. Following from the used declarative Datalog abstraction, this results in compact and elegant learning programs, in contrast with the existing procedural approaches operating directly on the computational graph level. We illustrate how this idea can be used for an efficient encoding of a diverse range of existing advanced neural architectures, with a particular focus on Graph Neural Networks (GNNs). Additionally, we show how the contemporary GNN models can be easily extended towards higher relational expressiveness. In the experiments, we demonstrate correctness and computational efficiency through comparison against specialized GNN deep learning frameworks, while shedding some light on the learning performance of existing GNN models.
Lifted Relational Neural Networks: Efficient Learning of Latent Relational Structures
Sourek, Gustav, Aschenbrenner, Vojtech, Zelezny, Filip, Schockaert, Steven, Kuzelka, Ondrej
We propose a method to combine the interpretability and expressive power of first-order logic with the effectiveness of neural network learning. In particular, we introduce a lifted framework in which first-order rules are used to describe the structure of a given problem setting. These rules are then used as a template for constructing a number of neural networks, one for each training and testing example. As the different networks corresponding to different examples share their weights, these weights can be efficiently learned using stochastic gradient descent. Our framework provides a flexible way for implementing and combining a wide variety of modelling constructs. In particular, the use of first-order logic allows for a declarative specification of latent relational structures, which can then be efficiently discovered in a given data set using neural network learning. Experiments on 78 relational learning benchmarks clearly demonstrate the effectiveness of the framework.
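The template-and-weight-sharing idea can be sketched in a few lines (a toy illustration under assumed names, not the authors' framework): a single first-order rule is unfolded into a separate computation over each relational example, yet every unfolding reads the same rule weight, so gradients from all examples would update one shared parameter.

```python
import math

# Shared weight for a hypothetical rule h(X) :- edge(X, Y), feat(Y).
# Every example's unfolded network reuses this one parameter.
W_RULE = 0.5

def unfold(example):
    """Unfold the rule template over one relational example.

    `example` maps each node to (list of neighbors, feature value).
    The rule fires once per ground substitution edge(node, nb); the
    fired bodies are summed and passed through an activation.
    """
    scores = {}
    for node, (neighbors, _) in example.items():
        body = sum(W_RULE * example[nb][1] for nb in neighbors)
        scores[node] = math.tanh(body)  # aggregation + non-linearity
    return scores
```

Two graphs of different size and shape each get their own unfolded network from `unfold`, but both depend on the single `W_RULE`, which is what makes stochastic gradient descent across structurally varied examples possible.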
Stacked Structure Learning for Lifted Relational Neural Networks
Sourek, Gustav, Svatos, Martin, Zelezny, Filip, Schockaert, Steven, Kuzelka, Ondrej
Lifted Relational Neural Networks (LRNNs [15]) are weighted sets of first-order rules, which are used to construct feed-forward neural networks from relational structures. A central characteristic of LRNNs is that a different neural network is constructed for each learning example, but crucially, the weights of these different neural networks are shared. This allows LRNNs to use neural networks for learning in relational domains, despite the fact that training examples may vary considerably in size and structure. In previous work, LRNNs have been learned from handcrafted rules. In such cases, only the weights of the first-order rules have to be learned from training data, which can be accomplished using a variant of back-propagation. The use of handcrafted rules offers a natural way to incorporate domain knowledge in the learning process. In some applications, however, (sufficient) domain knowledge is lacking and both the rules and their weights have to be learned from data. To this end, in this paper we introduce a structure learning method for LRNNs. Our proposed structure learning method proceeds in an iterative fashion.
Lifted Relational Neural Networks
Sourek, Gustav, Aschenbrenner, Vojtech, Zelezny, Filip, Kuzelka, Ondrej
We propose a method combining relational-logic representations with neural network learning. A general lifted architecture, possibly reflecting some background domain knowledge, is described through relational rules which may be handcrafted or learned. The relational rule-set serves as a template for unfolding possibly deep neural networks whose structures also reflect the structures of given training or testing relational examples. Different networks corresponding to different examples share their weights, which co-evolve during training by stochastic gradient descent algorithm. The framework allows for hierarchical relational modeling constructs and learning of latent relational concepts through shared hidden layers weights corresponding to the rules. Discovery of notable relational concepts and experiments on 78 relational learning benchmarks demonstrate favorable performance of the method.