Sengupta, Biswa
Learning Multi-Agent Communication with Contrastive Learning
Lo, Yat Long, Sengupta, Biswa, Foerster, Jakob, Noukhovitch, Michael
Communication is a powerful tool for coordination in multi-agent RL. But inducing an effective, common language is a difficult challenge, particularly in the decentralized setting. In this work, we introduce an alternative perspective where communicative messages sent between agents are considered as different incomplete views of the environment state. By examining the relationship between messages sent and received, we propose to learn to communicate using contrastive learning to maximize the mutual information between messages of a given trajectory. In communication-essential environments, our method outperforms previous work in both performance and learning speed. Using qualitative metrics and representation probing, we show that our method induces more symmetric communication and captures global state information from the environment. Overall, we show the power of contrastive learning and the importance of leveraging messages as encodings for effective communication.
How Robust are Deep Neural Networks?
Sengupta, Biswa, Friston, Karl J.
Convolutional and Recurrent, deep neural networks have been successful in machine learning systems for computer vision, reinforcement learning, and other allied fields. However, the robustness of such neural networks is seldom apprised, especially after high classification accuracy has been attained. In this paper, we evaluate the robustness of three recurrent neural networks to tiny perturbations, on three widely used datasets, to argue that high accuracy does not always mean a stable and a robust (to bounded perturbations, adversarial attacks, etc.) system. Especially, normalizing the spectrum of the discrete recurrent network to bound the spectrum (using power method, Rayleigh quotient, etc.) on a unit disk produces stable, albeit highly non-robust neural networks. Furthermore, using the $\epsilon$-pseudo-spectrum, we show that training of recurrent networks, say using gradient-based methods, often result in non-normal matrices that may or may not be diagonalizable. Therefore, the open problem lies in constructing methods that optimize not only for accuracy but also for the stability and the robustness of the underlying neural network, a criterion that is distinct from the other.
StackSeq2Seq: Dual Encoder Seq2Seq Recurrent Networks
Bay, Alessandro, Sengupta, Biswa
A widely studied non-deterministic polynomial time (NP) hard problem lies in finding a route between the two nodes of a graph. Often meta-heuristics algorithms such as $A^{*}$ are employed on graphs with a large number of nodes. Here, we propose a deep recurrent neural network architecture based on the Sequence-2-Sequence (Seq2Seq) model, widely used, for instance in text translation. Particularly, we illustrate that utilising a context vector that has been learned from two different recurrent networks enables increased accuracies in learning the shortest route of a graph. Additionally, we show that one can boost the performance of the Seq2Seq network by smoothing the loss function using a homotopy continuation of the decoder's loss function.
GeoSeq2Seq: Information Geometric Sequence-to-Sequence Networks
Bay, Alessandro, Sengupta, Biswa
The Fisher information metric is an important foundation of information geometry, wherein it allows us to approximate the local geometry of a probability distribution. Recurrent neural networks such as the Sequence-to-Sequence (Seq2Seq) networks that have lately been used to yield state-of-the-art performance on speech translation or image captioning have so far ignored the geometry of the latent embedding, that they iteratively learn. We propose the information geometric Seq2Seq (GeoSeq2Seq) network which abridges the gap between deep recurrent neural networks and information geometry. Specifically, the latent embedding offered by a recurrent network is encoded as a Fisher kernel of a parametric Gaussian Mixture Model, a formalism common in computer vision. We utilise such a network to predict the shortest routes between two nodes of a graph by learning the adjacency matrix using the GeoSeq2Seq formalism; our results show that for such a problem the probabilistic representation of the latent embedding supersedes the non-probabilistic embedding by 10-15\%.
Multi-kernel learning of deep convolutional features for action recognition
Sengupta, Biswa, Qian, Yu
Image understanding using deep convolutional network has reached human-level performance, yet a closely related problem of video understanding especially, action recognition has not reached the requisite level of maturity. We combine multi-kernels based support-vector-machines (SVM) with a multi-stream deep convolutional neural network to achieve close to state-of-the-art performance on a 51-class activity recognition problem (HMDB-51 dataset); this specific dataset has proved to be particularly challenging for deep neural networks due to the heterogeneity in camera viewpoints, video quality, etc. The resulting architecture is named pillar networks as each (very) deep neural network acts as a pillar for the hierarchical classifiers. In addition, we illustrate that hand-crafted features such as improved dense trajectories (iDT) and Multi-skip Feature Stacking (MIFS), as additional pillars, can further supplement the performance.
Bayesian Belief Updating of Spatiotemporal Seizure Dynamics
Cooray, Gerald K, Rosch, Richard, Baldeweg, Torsten, Lemieux, Louis, Friston, Karl, Sengupta, Biswa
Epileptic seizure activity shows complicated dynamics in both space and time. To understand the evolution and propagation of seizures spatially extended sets of data need to be analysed. We have previously described an efficient filtering scheme using variational Laplace that can be used in the Dynamic Causal Modelling (DCM) framework [Friston, 2003] to estimate the temporal dynamics of seizures recorded using either invasive or non-invasive electrical recordings (EEG/ECoG). Spatiotemporal dynamics are modelled using a partial differential equation -- in contrast to the ordinary differential equation used in our previous work on temporal estimation of seizure dynamics [Cooray, 2016]. We provide the requisite theoretical background for the method and test the ensuing scheme on simulated seizure activity data and empirical invasive ECoG data. The method provides a framework to assimilate the spatial and temporal dynamics of seizure activity, an aspect of great physiological and clinical importance.
Approximating meta-heuristics with homotopic recurrent neural networks
Bay, Alessandro, Sengupta, Biswa
Much combinatorial optimisation problems constitute a non-polynomial (NP) hard optimisation problem, i.e., they can not be solved in polynomial time. One such problem is finding the shortest route between two nodes on a graph. Meta-heuristic algorithms such as $A^{*}$ along with mixed-integer programming (MIP) methods are often employed for these problems. Our work demonstrates that it is possible to approximate solutions generated by a meta-heuristic algorithm using a deep recurrent neural network. We compare different methodologies based on reinforcement learning (RL) and recurrent neural networks (RNN) to gauge their respective quality of approximation. We show the viability of recurrent neural network solutions on a graph that has over 300 nodes and argue that a sequence-to-sequence network rather than other recurrent networks has improved approximation quality. Additionally, we argue that homotopy continuation -- that increases chances of hitting an extremum -- further improves the estimate generated by a vanilla RNN.
Pillar Networks++: Distributed non-parametric deep and wide networks
Sengupta, Biswa, Qian, Yu
In recent work, it was shown that combining multi-kernel based support vector machines (SVMs) can lead to near state-of-the-art performance on an action recognition dataset (HMDB-51 dataset). This was 0.4\% lower than frameworks that used hand-crafted features in addition to the deep convolutional feature extractors. In the present work, we show that combining distributed Gaussian Processes with multi-stream deep convolutional neural networks (CNN) alleviate the need to augment a neural network with hand-crafted features. In contrast to prior work, we treat each deep neural convolutional network as an expert wherein the individual predictions (and their respective uncertainties) are combined into a Product of Experts (PoE) framework.
Energetically Optimal Action Potentials
Stemmler, Martin B., Sengupta, Biswa, Laughlin, Simon, Niven, Jeremy
Most action potentials in the nervous system take on the form of strong, rapid, and brief voltage deflections known as spikes, in stark contrast to other action potentials, such as in the heart, that are characterized by broad voltage plateaus. We derive the shape of the neuronal action potential from first principles, by postulating that action potential generation is strongly constrained by the brain's need to minimize energy expenditure. For a given height of an action potential, the least energy is consumed when the underlying currents obey the bang-bang principle: the currents giving rise to the spike should be intense, yet short-lived, yielding spikes with sharp onsets and offsets. Energy optimality predicts features in the biophysics that are not per se required for producing the characteristic neuronal action potential: sodium currents should be extraordinarily powerful and inactivate with voltage; both potassium and sodium currents should have kinetics that have a bell-shaped voltage-dependence; and the cooperative action of multiple `gates' should start the flow of current.