
Collaborating Authors: Mali, Ankur


On the Computational Complexity and Formal Hierarchy of Second Order Recurrent Neural Networks

arXiv.org Artificial Intelligence

Artificial neural networks (ANNs) with recurrence and self-attention have been shown to be Turing-complete (TC). However, existing work has shown that these ANNs require multiple turns or unbounded computation time, even with unbounded precision in weights, in order to recognize TC grammars. Moreover, under constraints such as fixed- or bounded-precision neurons and bounded time, ANNs without memory have been shown to struggle to recognize even context-free languages. In this work, we extend the theoretical foundation for the $2^{nd}$-order recurrent network ($2^{nd}$ RNN) and prove that there exists a class of $2^{nd}$ RNNs that is Turing-complete with bounded time. This model can directly encode a transition table into its recurrent weights, enabling bounded-time computation, and it is interpretable by design. We also demonstrate that $2^{nd}$-order RNNs, without memory, under bounded weight and time constraints, outperform modern-day models such as vanilla RNNs and gated recurrent units at recognizing regular grammars. We provide an upper bound and a stability analysis on the maximum number of neurons required by $2^{nd}$-order RNNs to recognize any class of regular grammar. Extensive experiments on the Tomita grammars support our findings, demonstrating the importance of tensor connections in crafting computationally efficient RNNs. Finally, we show that $2^{nd}$-order RNNs are also interpretable by extraction and can extract state machines with higher success rates than first-order RNNs. Our results extend the theoretical foundations of RNNs and offer promising avenues for future explainable AI research.
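To make the tensor (second-order) recurrence concrete, the following is a minimal NumPy sketch of how a DFA transition table can be written directly into a third-order weight tensor so that the network simulates the automaton over one-hot states and symbols. The gain `H`, the toy parity automaton, and all names here are illustrative assumptions, not the paper's exact construction.

```python
# Minimal sketch (not the paper's exact construction): a second-order RNN whose
# 3rd-order weight tensor W[j, i, k] encodes a DFA transition delta(state i, symbol k) -> state j.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def build_second_order_weights(delta, n_states, n_symbols, H=10.0):
    """W[j, i, k] = +H if delta maps (i, k) to j, else -H (H is a hypothetical gain)."""
    W = -H * np.ones((n_states, n_states, n_symbols))
    for (i, k), j in delta.items():
        W[j, i, k] = H
    return W

def run(W, start_state, symbols, n_states, n_symbols):
    h = np.eye(n_states)[start_state]            # one-hot current state
    for k in symbols:
        x = np.eye(n_symbols)[k]                 # one-hot input symbol
        # second-order (tensor) recurrence: preact_j = sum_{i,k} W[j, i, k] * h_i * x_k
        h = sigmoid(np.einsum('jik,i,k->j', W, h, x))
    return int(np.argmax(h))                     # read off the (near-)one-hot state

# Toy example: parity of 1s over the alphabet {0, 1}; state 0 = "even so far".
delta = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
W = build_second_order_weights(delta, n_states=2, n_symbols=2)
print(run(W, start_state=0, symbols=[1, 1, 0], n_states=2, n_symbols=2))  # -> 0 (even number of 1s)
```

Because each weight corresponds to exactly one transition rule, reading the rules back out of the trained tensor (extraction) is direct, which is the sense in which such networks are interpretable by design.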


Brain-Inspired Computational Intelligence via Predictive Coding

arXiv.org Artificial Intelligence

Artificial intelligence (AI) is rapidly becoming one of the key technologies of this century. The majority of results in AI thus far have been achieved using deep neural networks trained with the error backpropagation learning algorithm. However, the ubiquitous adoption of this approach has highlighted some important limitations such as substantial computational cost, difficulty in quantifying uncertainty, lack of robustness, unreliability, and biological implausibility. It is possible that addressing these limitations may require schemes that are inspired and guided by neuroscience theories. One such theory, called predictive coding (PC), has shown promising performance in machine intelligence tasks, exhibiting exciting properties that make it potentially valuable for the machine learning community: PC can model information processing in different brain areas, can be used in cognitive control and robotics, and has a solid mathematical grounding in variational inference, offering a powerful inversion scheme for a specific class of continuous-state generative models. With the hope of foregrounding research in this direction, we survey the literature that has contributed to this perspective, highlighting the many ways that PC might play a role in the future of machine learning and computational intelligence at large.
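As a concrete, heavily simplified illustration of the iterative inference and local learning that PC describes, the sketch below implements a small Rao-and-Ballard-style hierarchy with linear top-down predictions. The layer sizes, step sizes, and single relaxation loop are assumptions for illustration only, not any specific model from the survey.

```python
# A minimal sketch of hierarchical predictive coding: clamp data at the bottom,
# relax latent states to reduce per-layer prediction errors, then apply local weight updates.
import numpy as np

rng = np.random.default_rng(0)
dims = [8, 6, 4]                       # layer 0 = observed data, layers 1..2 = latents
W = [0.1 * rng.standard_normal((dims[l], dims[l + 1])) for l in range(len(dims) - 1)]

def infer_and_learn(x, W, T=50, lr_z=0.1, lr_w=0.01):
    z = [x] + [np.zeros(d) for d in dims[1:]]            # clamp data at the bottom layer
    for _ in range(T):                                   # iterative (relaxation) inference
        mu = [W[l] @ z[l + 1] for l in range(len(W))]    # top-down predictions
        e = [z[l] - mu[l] for l in range(len(W))]        # prediction errors, one per predicted layer
        for l in range(1, len(z)):                       # update latent states only
            top_err = e[l] if l < len(e) else 0.0        # error on this layer's own state (if any)
            z[l] += lr_z * (W[l - 1].T @ e[l - 1] - top_err)
    for l in range(len(W)):                              # local, Hebbian-like weight update
        W[l] += lr_w * np.outer(e[l], z[l + 1])
    return z, e

x = rng.standard_normal(dims[0])
z, e = infer_and_learn(x, W)
print([np.linalg.norm(err) for err in e])                # prediction-error norms after one step
```

Note that every update uses only quantities available at the layer it modifies, which is the locality property that makes PC attractive as a brain-inspired alternative to backprop.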


The Predictive Forward-Forward Algorithm

arXiv.org Artificial Intelligence

The algorithm known as backpropagation of errors [59, 32], or "backprop" for short, has long faced criticism concerning its neurobiological plausibility [10, 14, 56, 35, 15]. Despite powering the tremendous progress and success behind deep learning and its ever-growing myriad of promising applications [57, 12], it is improbable that backprop is a viable model of learning in the brain, such as in cortical regions. Notably, there are both practical and biophysical issues [15, 35], and, among these, there is a lack of evidence that: 1) neural activities are explicitly stored to be used later for synaptic adjustment, 2) error derivatives are backpropagated along a global feedback pathway to generate teaching signals, 3) error signals move back along the same neural pathways used to forward propagate information, and 4) inference and learning are locked into a largely sequential (rather than massively parallel) schedule. Furthermore, when processing temporal data, it is certainly not the case that the neural circuitry of the brain is unfolded backward through time to adjust synapses [42] (as in backprop through time). Recently, there has been growing interest in brain-inspired computing, which focuses on developing algorithms and computational models that attempt to circumvent or resolve critical issues such as those highlighted above. Among the most powerful and promising of these is predictive coding (PC) [18, 48, 13, 4, 51, 41], and among the most recent is the forward-forward (FF) algorithm [19]. These alternatives offer different means of conducting credit assignment with performance similar to backprop, yet are more plausibly consistent with how real biological neurons learn (see Figure 1 for a graphical depiction and comparison of the respective credit assignment setups). This paper proposes a novel model and learning process, the predictive forward-forward (PFF) process, which generalizes and combines FF and PC into a robust stochastic neural system that simultaneously learns a representation and a generative model in a biologically plausible fashion. Like the FF algorithm, the PFF procedure offers a promising model of biological neural circuits, a potential candidate system for low-power analog and neuromorphic hardware, and a backprop alternative worthy of future investigation and study.
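For context, the layer-local "goodness" objective of FF, which PFF builds upon, can be sketched as follows. The layer sizes, threshold, learning rate, and synthetic positive/negative data below are illustrative assumptions; this is not the PFF model itself.

```python
# A minimal sketch of forward-forward (FF) local learning: each layer has its own
# goodness objective on positive vs. negative samples, with no backpropagated error.
import numpy as np

rng = np.random.default_rng(0)

class FFLayer:
    def __init__(self, n_in, n_out, theta=2.0, lr=0.03):
        self.W = 0.1 * rng.standard_normal((n_out, n_in))
        self.theta, self.lr = theta, lr

    def forward(self, x):
        return np.maximum(0.0, self.W @ x)                   # ReLU activities

    def local_update(self, x, positive):
        h = self.forward(x)
        goodness = np.sum(h ** 2)                             # layer-local goodness
        p = 1.0 / (1.0 + np.exp(-(goodness - self.theta)))    # P(sample is "positive")
        sign = (1.0 - p) if positive else -p                  # push goodness up or down
        # dG/dW = 2 * h * x^T (inactive ReLU units contribute zero); purely local update
        self.W += self.lr * sign * 2.0 * np.outer(h, x)
        # pass forward length-normalized activity so later layers cannot rely on raw magnitude
        return h / (np.linalg.norm(h) + 1e-8)

layers = [FFLayer(16, 32), FFLayer(32, 32)]
for step in range(100):
    x_pos = rng.standard_normal(16) + 1.0                     # stand-in "positive" data
    x_neg = rng.standard_normal(16) - 1.0                     # stand-in "negative" data
    for x, is_pos in [(x_pos, True), (x_neg, False)]:
        for layer in layers:
            x = layer.local_update(x, positive=is_pos)
```

Each layer's update depends only on its own input and output, which is why FF (and, by extension, PFF) avoids the global feedback pathway and sequential backward sweep listed among backprop's issues above.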


Like a bilingual baby: The advantage of visually grounding a bilingual language model

arXiv.org Artificial Intelligence

Unlike most neural language models, humans learn language in a rich, multi-sensory and, often, multilingual environment. Current language models typically fail to fully capture the complexities of multilingual language use. We train an LSTM language model on images and captions in English and Spanish from MS-COCO-ES. We find that visual grounding improves the model's understanding of semantic similarity both within and across languages and reduces perplexity. However, we find no significant advantage of visual grounding for abstract words. Our results provide additional evidence of the advantages of visually grounded language models and point to the need for more naturalistic language data from multilingual speakers and multilingual datasets with perceptual grounding.


Convolutional Neural Generative Coding: Scaling Predictive Coding to Natural Images

arXiv.org Artificial Intelligence

The algorithm known as backpropagation of errors [65, 29] (or backprop) has served as a crucial element behind the tremendous progress made in recent machine learning research, progress that has been accelerated by advances in computational hardware as well as the increasing availability of vast quantities of data. Nevertheless, despite reaching or surpassing human-level performance on many different tasks, ranging from computer vision [18] to game-playing [60], the field still has a long way to go toward developing artificial general intelligence. In order to increase task-level performance, the size of deep networks has grown greatly over the years, up to hundreds of billions of synaptic parameters, as seen in modern-day transformer networks [12]. However, this trend has started to raise concerns related to energy consumption [49] and to whether such large systems can attain the flexible generalization ability of the human brain [5]. Furthermore, backprop itself imposes additional limitations beyond its long-argued biological implausibility [11, 15, 59], such as its dependence on a global error feedback pathway for determining each neuron's individual contribution to a deep network's overall performance [34], resulting in sequential, non-local backward updates that make parallelization difficult (in strong contrast to how learning occurs in the brain [24, 47, 46]).


Provably Stable Interpretable Encodings of Context Free Grammars in RNNs with a Differentiable Stack

arXiv.org Machine Learning

Given a collection of strings belonging to a context-free grammar (CFG) and another collection of strings not belonging to the CFG, how might one infer the grammar? This is the problem of grammatical inference. Since CFGs are exactly the languages recognized by pushdown automata (PDA), it suffices to determine the state transition rules and stack action rules of the corresponding PDA. One approach would be to train a recurrent neural network (RNN) to classify the sample data and attempt to extract these PDA rules. But neural networks are not a priori aware of the structure of a PDA and would likely require many samples to infer this structure. Furthermore, extracting the PDA rules from the RNN is nontrivial. We build an RNN specifically structured like a PDA, where weights correspond directly to the PDA rules. This requires a stack architecture that is differentiable (to enable gradient-based learning) and stable (an unstable stack will show deteriorating performance on longer strings). We propose a stack architecture that is differentiable and that provably exhibits orbital stability. Using this stack, we construct a neural network that provably approximates a PDA for strings of arbitrary length. Moreover, our model and method of proof can easily be generalized to other state machines, such as a Turing machine.
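By way of illustration, one common way to make a stack differentiable is to blend push, pop, and no-op actions continuously over a soft memory, as in the sketch below. This is a generic construction for intuition only; it is not the specific stack, read operation, or stability argument developed in the paper.

```python
# A generic soft-stack sketch: continuous push/pop/no-op weights mix shifted copies of memory,
# so the whole structure stays differentiable with respect to the action weights.
import numpy as np

class SoftStack:
    def __init__(self, depth, width):
        self.mem = np.zeros((depth, width))    # stack cells, row 0 = top
        self.strength = np.zeros(depth)        # how "present" each cell is

    def step(self, push, pop, value):
        """push + pop + no_op = 1; value is the vector to (softly) push."""
        no_op = 1.0 - push - pop
        popped = np.roll(self.mem, -1, axis=0);  popped[-1] = 0.0   # shift cells toward the top
        pushed = np.roll(self.mem,  1, axis=0);  pushed[0] = value  # shift cells down, insert value
        self.mem = pop * popped + push * pushed + no_op * self.mem
        s_pop = np.roll(self.strength, -1);  s_pop[-1] = 0.0
        s_push = np.roll(self.strength, 1);  s_push[0] = 1.0
        self.strength = pop * s_pop + push * s_push + no_op * self.strength
        return self.strength[0] * self.mem[0]   # soft read of the top element

stack = SoftStack(depth=8, width=4)
a, b = np.array([1., 0., 0., 0.]), np.array([0., 1., 0., 0.])
stack.step(push=1.0, pop=0.0, value=a)                      # push a
stack.step(push=1.0, pop=0.0, value=b)                      # push b
print(stack.step(push=0.0, pop=1.0, value=np.zeros(4)))     # pop -> top reads back ~a
```

With crisp (0/1) actions this behaves exactly like a discrete stack; the stability question the paper addresses concerns what happens when actions and contents are only approximately crisp over long strings.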


Lifelong Neural Predictive Coding: Sparsity Yields Less Forgetting when Learning Cumulatively

arXiv.org Machine Learning

In lifelong learning systems, especially those based on artificial neural networks, one of the biggest obstacles is the severe inability to retain old knowledge as new information is encountered. This phenomenon is known as catastrophic forgetting. In this paper, we present a new connectionist model, the Sequential Neural Coding Network, and its learning procedure, grounded in the neurocognitive theory of predictive coding. The architecture experiences significantly less forgetting as compared to standard neural models and outperforms a variety of previously proposed remedies and methods when trained across multiple task datasets in a stream-like fashion. The promising performance demonstrated in our experiments offers motivation that directly incorporating mechanisms prominent in real neuronal systems, such as competition, sparse activation patterns, and iterative input processing, can create viable pathways for tackling the challenge of lifelong machine learning.


Biologically Motivated Algorithms for Propagating Local Target Representations

arXiv.org Machine Learning

Finding biologically plausible alternatives to back-propagation of errors is a fundamentally important challenge in artificial neural network research. In this paper, we propose a simple learning algorithm called error-driven Local Representation Alignment (LRA-E), which has strong connections to predictive coding, a theory that offers a mechanistic way of describing neurocomputational machinery. In addition, we propose an improved variant of Difference Target Propagation, another procedure from the same family of algorithms as Local Representation Alignment. We compare our learning procedures to several other biologically motivated algorithms, including two feedback alignment algorithms and Equilibrium Propagation. On two benchmark datasets, we find that both of our proposed learning algorithms yield stable performance and strong generalization in comparison to competing back-propagation alternatives when training deeper, highly nonlinear networks, with LRA-E performing the best overall.


Visually Grounded, Situated Learning in Neural Models

arXiv.org Artificial Intelligence

The theory of situated cognition postulates that language is inseparable from its physical context: words, phrases, and sentences must be learned in the context of the objects or concepts to which they refer. Yet statistical language models are trained on words alone. This makes it impossible for such models to connect to the real world, the world described in the sentences presented to them. In this paper, we examine the generalization ability of neural language models trained with a visual context. We propose a multimodal connectionist language architecture based on the Differential State Framework, which outperforms its equivalent trained on language alone, even when no visual context is available at test time. The superior performance of language models trained with a visual context is robust across different languages and models.


Conducting Credit Assignment by Aligning Local Representations

arXiv.org Machine Learning

The use of back-propagation and its variants to train deep networks is often problematic for new users: issues such as exploding gradients, vanishing gradients, and high sensitivity to weight initialization strategies can make networks difficult to train. In this paper, we present Local Representation Alignment (LRA), a training procedure that is much less sensitive to bad initializations, does not require modifications to the network architecture, and can be adapted to networks with highly nonlinear and discrete-valued activation functions. Furthermore, we show that one variation of LRA can start from a null initialization of the network weights and still successfully train networks with a wide variety of nonlinearities, including tanh, ReLU-6, softplus, signum, and others that are more biologically plausible. Experiments on MNIST and Fashion MNIST validate the performance of the algorithm and show that LRA can train networks robustly and effectively, succeeding even when back-propagation fails and outperforming alternative learning algorithms such as target propagation and feedback alignment.
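As a rough illustration of what layer-local, target-driven credit assignment can look like, the sketch below computes a local target for each hidden layer from the mismatch just above it and applies a delta-rule update using only locally available quantities. This is a generic sketch in the spirit of LRA, not the paper's exact update equations; the feedback matrices `E`, layer sizes, and step sizes are assumptions for illustration.

```python
# A generic sketch of layer-local, target-driven updates: each layer is nudged toward a
# local target derived from the error above it, and weights are adjusted with a local delta rule.
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

dims = [10, 32, 32, 1]
W = [0.1 * rng.standard_normal((dims[i + 1], dims[i])) for i in range(len(dims) - 1)]
E = [0.1 * rng.standard_normal((dims[i], dims[i + 1])) for i in range(len(dims) - 1)]  # feedback weights

def local_target_step(x, y, W, E, beta=0.1, lr=0.01):
    h = [x]
    for Wl in W:                                  # forward pass, recording each layer's activity
        h.append(relu(Wl @ h[-1]))
    e = h[-1] - y                                 # top-level mismatch
    for l in reversed(range(len(W))):
        target = h[l + 1] - beta * e              # local target: nudge toward reducing the error above
        local_err = h[l + 1] - target             # = beta * e, a purely local quantity
        W[l] -= lr * np.outer(local_err, h[l])    # local delta-rule weight update
        e = E[l] @ local_err                      # carry a feedback signal down to the next layer

x = rng.standard_normal(10)
y = np.array([1.0])
for _ in range(200):
    local_target_step(x, y, W, E)
```

Because each layer's update uses only its own input, activity, and a locally delivered error signal, such schemes avoid the global backward pass that makes back-propagation sensitive to depth and initialization.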