Unlocking State-Tracking in Linear RNNs Through Negative Eigenvalues
Grazzi, Riccardo, Siems, Julien, Franke, Jörg K. H., Zela, Arber, Hutter, Frank, Pontil, Massimiliano
Linear Recurrent Neural Networks (LRNNs) such as Mamba, RWKV, GLA, mLSTM, and DeltaNet have emerged as efficient alternatives to Transformers in large language modeling, offering linear scaling with sequence length and improved training efficiency. However, LRNNs struggle to perform state-tracking, which may impair performance in tasks such as code evaluation or tracking a chess game. Even parity, the simplest state-tracking task, which non-linear RNNs like LSTM handle effectively, cannot be solved by current LRNNs. Recently, Sarrof et al. (2024) demonstrated that the failure of LRNNs like Mamba to solve parity stems from restricting the value range of their diagonal state-transition matrices to $[0, 1]$ and that incorporating negative values can resolve this issue. We extend this result to non-diagonal LRNNs, which have recently shown promise in models such as DeltaNet. We prove that finite-precision LRNNs with state-transition matrices having only positive eigenvalues cannot solve parity, while complex eigenvalues are needed to count modulo $3$. Notably, we also prove that LRNNs can learn any regular language when their state-transition matrices are products of identity minus vector outer product matrices, each with eigenvalues in the range $[-1, 1]$. Our empirical results confirm that extending the eigenvalue range of models like Mamba and DeltaNet to include negative values not only enables them to solve parity but consistently improves their performance on state-tracking tasks. Furthermore, pre-training LRNNs with an extended eigenvalue range for language modeling achieves comparable performance and stability while showing promise on code and math data. Our work enhances the expressivity of modern LRNNs, broadening their applicability without changing the cost of training or inference.
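The role of negative eigenvalues in the parity argument can be illustrated with a minimal sketch (an illustrative toy, not the paper's implementation): a scalar linear recurrence whose input-dependent transition may take the value $-1$ flips its state on every $1$-bit and thereby tracks parity exactly, whereas transitions confined to $[0, 1]$ can only shrink or preserve the state and can never flip it.

```python
def parity_lrnn(bits):
    """Track parity with a scalar linear recurrence h_t = a(x_t) * h_{t-1}.

    State +1 encodes even parity, -1 encodes odd parity. The transition
    a(x_t) is an input-dependent eigenvalue in [-1, 1]; allowing the
    negative value -1 is what makes the state flip possible.
    """
    h = 1.0
    for x in bits:
        a = -1.0 if x == 1 else 1.0  # eigenvalue in [-1, 1], negative on a 1-bit
        h = a * h                    # purely linear update, no non-linearity
    return 0 if h > 0 else 1

# If eigenvalues were restricted to [0, 1] (e.g. a = 1 for every input),
# the state could never change sign, so parity would be unreachable.
```

Running the recurrence on `[1, 0, 1, 1]` (three ones) yields odd parity, while `[1, 1]` yields even parity.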
The Power of Linear Recurrent Neural Networks
Stolzenburg, Frieder, Litz, Sandra, Michael, Olivia, Obst, Oliver
Recurrent neural networks are a powerful means to cope with time series. We show how autoregressive linear, i.e., linearly activated recurrent neural networks (LRNNs) can approximate any time-dependent function f(t) given by a number of function values. The approximation can effectively be learned by simply solving a linear equation system; no backpropagation or similar methods are needed. Furthermore, and this is probably the main contribution of this article, the size of an LRNN can be reduced significantly in one step after inspecting the spectrum of the network transition matrix, i.e., its eigenvalues, by taking only the most relevant components. Therefore, in contrast to other approaches, we do not only learn network weights but also the network architecture. LRNNs have interesting properties: They end up in ellipse trajectories in the long run and allow the prediction of further values and compact representations of functions. We demonstrate this by several experiments, among them multiple superimposed oscillators (MSO), robotic soccer, and predicting stock prices. LRNNs outperform the previous state-of-the-art for the MSO task with a minimal number of units.
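One way to realize the abstract's central claim in code (a hedged sketch, assuming an ordinary autoregressive formulation, not the authors' exact procedure) is to learn the linear predictor by solving a single least-squares system over lagged function values, with no backpropagation involved:

```python
import numpy as np

def fit_lrnn(series, order):
    """Fit a linear autoregressive model by one least-squares solve.

    Builds rows of `order` consecutive lagged values and regresses the
    next value onto them; the solution plays the role of the learned
    linear recurrence weights.
    """
    X = np.array([series[i:i + order] for i in range(len(series) - order)])
    y = np.array(series[order:])
    w, *_ = np.linalg.lstsq(X, y, rcond=None)  # single linear solve, no backprop
    return w

def predict(series, w, steps):
    """Roll the learned linear recurrence forward to predict further values."""
    window = list(series[-len(w):])
    out = []
    for _ in range(steps):
        nxt = float(np.dot(w, window))
        out.append(nxt)
        window = window[1:] + [nxt]
    return out
```

For a series obeying an exact linear recurrence, e.g. the ramp $x_t = 2x_{t-1} - x_{t-2}$, the solver recovers the weights $(-1, 2)$ and extrapolates the ramp exactly. The paper's reduction step would then inspect the eigenvalues of the resulting transition matrix and keep only the dominant components.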
Beyond Graph Neural Networks with Lifted Relational Neural Networks
Sourek, Gustav, Zelezny, Filip, Kuzelka, Ondrej
We demonstrate a declarative differentiable programming framework based on the language of Lifted Relational Neural Networks, where small parameterized logic programs are used to encode relational learning scenarios. When presented with relational data, such as various forms of graphs, the program interpreter dynamically unfolds differentiable computational graphs to be used for the program parameter optimization by standard means. Following from the used declarative Datalog abstraction, this results in compact and elegant learning programs, in contrast with the existing procedural approaches operating directly on the computational graph level. We illustrate how this idea can be used for an efficient encoding of a diverse range of existing advanced neural architectures, with a particular focus on Graph Neural Networks (GNNs). Additionally, we show how the contemporary GNN models can be easily extended towards higher relational expressiveness. In the experiments, we demonstrate correctness and computational efficiency through comparison against specialized GNN deep learning frameworks, while shedding some light on the learning performance of existing GNN models.
Lifted Relational Neural Networks: Efficient Learning of Latent Relational Structures
Sourek, Gustav, Aschenbrenner, Vojtech, Zelezny, Filip, Schockaert, Steven, Kuzelka, Ondrej
We propose a method to combine the interpretability and expressive power of first-order logic with the effectiveness of neural network learning. In particular, we introduce a lifted framework in which first-order rules are used to describe the structure of a given problem setting. These rules are then used as a template for constructing a number of neural networks, one for each training and testing example. As the different networks corresponding to different examples share their weights, these weights can be efficiently learned using stochastic gradient descent. Our framework provides a flexible way for implementing and combining a wide variety of modelling constructs. In particular, the use of first-order logic allows for a declarative specification of latent relational structures, which can then be efficiently discovered in a given data set using neural network learning. Experiments on 78 relational learning benchmarks clearly demonstrate the effectiveness of the framework.
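The template-and-weight-sharing idea can be sketched in a few lines (a toy illustration under assumed names, not the authors' framework): a single first-order rule is unfolded into a separate computation over each relational example, yet every unfolding reads the same rule weight, so gradients from all examples would update one shared parameter.

```python
import math

# Shared weight for a hypothetical rule h(X) :- edge(X, Y), feat(Y).
# Every example's unfolded network reuses this one parameter.
W_RULE = 0.5

def unfold(example):
    """Unfold the rule template over one relational example.

    `example` maps each node to (list of neighbors, feature value).
    The rule fires once per ground substitution edge(node, nb); the
    fired bodies are summed and passed through an activation.
    """
    scores = {}
    for node, (neighbors, _) in example.items():
        body = sum(W_RULE * example[nb][1] for nb in neighbors)
        scores[node] = math.tanh(body)  # aggregation + non-linearity
    return scores
```

Two graphs of different size and shape each get their own unfolded network from `unfold`, but both depend on the single `W_RULE`, which is what makes stochastic gradient descent across structurally varied examples possible.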
Stacked Structure Learning for Lifted Relational Neural Networks
Sourek, Gustav, Svatos, Martin, Zelezny, Filip, Schockaert, Steven, Kuzelka, Ondrej
Lifted Relational Neural Networks (LRNNs [15]) are weighted sets of first-order rules, which are used to construct feed-forward neural networks from relational structures. A central characteristic of LRNNs is that a different neural network is constructed for each learning example, but crucially, the weights of these different neural networks are shared. This allows LRNNs to use neural networks for learning in relational domains, despite the fact that training examples may vary considerably in size and structure. In previous work, LRNNs have been learned from handcrafted rules. In such cases, only the weights of the first-order rules have to be learned from training data, which can be accomplished using a variant of back-propagation. The use of handcrafted rules offers a natural way to incorporate domain knowledge in the learning process. In some applications, however, (sufficient) domain knowledge is lacking and both the rules and their weights have to be learned from data. To this end, in this paper we introduce a structure learning method for LRNNs. Our proposed structure learning method proceeds in an iterative fashion.
Lifted Relational Neural Networks
Sourek, Gustav, Aschenbrenner, Vojtech, Zelezny, Filip, Kuzelka, Ondrej
We propose a method combining relational-logic representations with neural network learning. A general lifted architecture, possibly reflecting some background domain knowledge, is described through relational rules which may be handcrafted or learned. The relational rule-set serves as a template for unfolding possibly deep neural networks whose structures also reflect the structures of given training or testing relational examples. Different networks corresponding to different examples share their weights, which co-evolve during training by stochastic gradient descent algorithm. The framework allows for hierarchical relational modeling constructs and learning of latent relational concepts through shared hidden layers weights corresponding to the rules. Discovery of notable relational concepts and experiments on 78 relational learning benchmarks demonstrate favorable performance of the method.