Bilinear model





Revisiting Bi-Linear State Transitions in Recurrent Neural Networks

Ebrahimi, M. Reza, Memisevic, Roland

arXiv.org Artificial Intelligence

The role of hidden units in recurrent neural networks is typically seen as modeling memory, with research focusing on enhancing information retention through gating mechanisms. A less explored perspective views hidden units as active participants in the computation performed by the network, rather than passive memory stores. In this work, we revisit bilinear operations, which involve multiplicative interactions between hidden units and input embeddings. We demonstrate theoretically and empirically that they constitute a natural inductive bias for representing the evolution of hidden states in state tracking tasks. These are the simplest type of tasks that require hidden units to actively contribute to the behavior of the network. We also show that bilinear state updates form a natural hierarchy corresponding to state tracking tasks of increasing complexity, with popular linear recurrent networks such as Mamba residing at the lowest-complexity center of that hierarchy.
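The bilinear state update the abstract refers to can be sketched in a few lines of numpy; the tensor shape and tanh nonlinearity here are illustrative assumptions, not the paper's exact parameterization:

```python
import numpy as np

def bilinear_step(W, x, h):
    """One bilinear state update: the next value of each hidden unit
    mixes every (input, hidden) pair through a third-order tensor W:
    h_next[k] = tanh(sum_ij W[k, i, j] * x[i] * h[j])."""
    return np.tanh(np.einsum('kij,i,j->k', W, x, h))

rng = np.random.default_rng(0)
d_in, d_h = 4, 8
W = rng.normal(scale=0.5, size=(d_h, d_in, d_h))
h = rng.normal(size=d_h)                 # hidden state participates in the computation
for x in rng.normal(size=(5, d_in)):     # unroll over a short input sequence
    h = bilinear_step(W, x, h)
```

Because the input multiplies the hidden state rather than being added to it, each input effectively selects a linear map to apply to the state, which is what makes this update suited to state tracking.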


Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems

This work addresses the question of how to improve the invariance properties of Convolutional Neural Networks. It introduces the so-called spatial transformer, a layer that performs an adaptive warping of incoming feature maps, thus generalizing the recent attention mechanisms for images. The resulting model requires no extra supervision and is trained end-to-end using backpropagation, leading to state-of-the-art results on several classification tasks. The paper is clearly written and its main contribution, the spatial transformer layer, is valuable for its novelty, simplicity and effectiveness. The related work section covers most relevant literature, except perhaps recent works that combine deformable parts models with CNNs (see for example "Deformable Part Models are Convolutional Neural Networks" and "End-to-End Integration of a Convolution Network, Deformable Parts Model and Non-Maximum Suppression", both at CVPR 2015), since they also incorporate an inference over deformation or registration parameters, as in the spatial transformer case.
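The warping the review describes can be sketched roughly in numpy: generate a sampling grid from an affine matrix, then sample the input at those coordinates. This uses nearest-neighbour sampling to keep the sketch short (the paper uses a differentiable bilinear sampler), and all names and shapes are illustrative:

```python
import numpy as np

def affine_grid(theta, H, W):
    """Map output pixel coordinates (normalized to [-1, 1]) through a
    2x3 affine matrix theta to source sampling coordinates."""
    ys, xs = np.meshgrid(np.linspace(-1, 1, H),
                         np.linspace(-1, 1, W), indexing='ij')
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(H * W)])  # (3, H*W)
    src = theta @ coords                                         # (2, H*W)
    return src[0].reshape(H, W), src[1].reshape(H, W)            # x-map, y-map

def sample_nearest(img, xmap, ymap):
    """Sample img at normalized coordinates by nearest neighbour."""
    H, W = img.shape
    xi = np.clip(np.round((xmap + 1) * (W - 1) / 2), 0, W - 1).astype(int)
    yi = np.clip(np.round((ymap + 1) * (H - 1) / 2), 0, H - 1).astype(int)
    return img[yi, xi]

img = np.arange(16.0).reshape(4, 4)
identity = np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0]])
warped = sample_nearest(img, *affine_grid(identity, 4, 4))
# the identity transform leaves the image unchanged
```

In the actual layer, theta is predicted per input by a small localization network, so the warp adapts to each feature map.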


Multimodal Learning and Reasoning for Visual Question Answering

Ilija Ilievski, Jiashi Feng

Neural Information Processing Systems

Reasoning about entities and their relationships from multimodal data is a key goal of Artificial General Intelligence. The visual question answering (VQA) problem is an excellent way to test such reasoning capabilities of an AI model and its multimodal representation learning. However, current VQA models are oversimplified deep neural networks, composed of a long short-term memory (LSTM) unit for question comprehension and a convolutional neural network (CNN) for learning a single image representation. We argue that the single visual representation contains only limited and overly general information about the image contents and thus limits the model's reasoning capabilities. In this work we introduce a modular neural network model that learns a multimodal and multifaceted representation of the image and the question. The proposed model learns to use the multimodal representation to reason about the image entities and achieves a new state-of-the-art performance on both VQA benchmark datasets, VQA v1.0 and v2.0, by a wide margin.


Model-Free Robust $\phi$-Divergence Reinforcement Learning Using Both Offline and Online Data

Panaganti, Kishan, Wierman, Adam, Mazumdar, Eric

arXiv.org Machine Learning

The robust $\phi$-regularized Markov Decision Process (RRMDP) framework focuses on designing control policies that are robust against parameter uncertainties due to mismatches between the simulator (nominal) model and real-world settings. This work makes two important contributions. First, we propose a model-free algorithm called Robust $\phi$-regularized fitted Q-iteration (RPQ) for learning an $\epsilon$-optimal robust policy that uses only the historical data collected by rolling out a behavior policy (satisfying a robust exploration requirement) on the nominal model. To the best of our knowledge, we provide the first unified analysis for a class of $\phi$-divergences achieving robust optimal policies in high-dimensional systems with general function approximation. Second, we introduce the hybrid robust $\phi$-regularized reinforcement learning framework to learn an optimal robust policy using both historical data and online sampling. Within this framework, we propose a model-free algorithm called Hybrid robust Total-variation-regularized Q-iteration (HyTQ: pronounced height-Q). To the best of our knowledge, we provide the first improved out-of-data-distribution assumption in large-scale problems with general function approximation under the hybrid robust $\phi$-regularized reinforcement learning framework. Finally, we provide theoretical guarantees on the performance of the policies learned by our algorithms on systems with arbitrarily large state spaces.


A Deep Architecture for Matching Short Texts

Neural Information Processing Systems

Many machine learning problems can be interpreted as learning for matching two types of objects (e.g., images and captions, users and products, queries and documents, etc.). The matching level of two objects is usually measured as the inner product in a certain feature space, while the modeling effort focuses on mapping of objects from the original space to the feature space. This schema, although proven successful on a range of matching tasks, is insufficient for capturing the rich structure in the matching process of more complicated objects. In this paper, we propose a new deep architecture to more effectively model the complicated matching relations between two objects from heterogeneous domains. More specifically, we apply this model to matching tasks in natural language, e.g., finding sensible responses for a tweet, or relevant answers to a given question. This new architecture naturally combines the localness and hierarchy intrinsic to the natural language problems, and therefore greatly improves upon the state-of-the-art models.
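The inner-product schema that the abstract contrasts with can be sketched in a few lines; the linear encoders `Wx` and `Wy` below are hypothetical stand-ins for whatever learned mappings take each object type into the shared feature space:

```python
import numpy as np

rng = np.random.default_rng(1)
d_x, d_y, d_feat = 10, 12, 6
Wx = rng.normal(size=(d_feat, d_x))   # hypothetical encoder for object type 1
Wy = rng.normal(size=(d_feat, d_y))   # hypothetical encoder for object type 2

def match_score(x, y):
    """Classic matching schema: project both objects into a shared
    feature space, then score them with the inner product there."""
    return float(np.dot(Wx @ x, Wy @ y))

# Rank candidate objects of type 2 against a query of type 1.
x = rng.normal(size=d_x)
candidates = rng.normal(size=(5, d_y))
scores = [match_score(x, y) for y in candidates]
best = int(np.argmax(scores))
```

The paper's point is that a single inner product over independently computed features cannot express interactions between local parts of the two objects, which is what its deeper matching architecture adds.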


Separating Style and Content

Neural Information Processing Systems

We seek to analyze and manipulate two factors, which we call style and content, underlying a set of observations. We fit training data with bilinear models which explicitly represent the two-factor structure. These models can adapt easily during testing to new styles or content, allowing us to solve three general tasks: extrapolation of a new style to unobserved content; classification of content observed in a new style; and translation of new content observed in a new style. Significant performance improvement on a benchmark speech dataset shows the benefits of our approach.
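A minimal numpy sketch of such a symmetric bilinear two-factor model; the dimensions and the `render` name are illustrative, not from the paper:

```python
import numpy as np

def render(W, style, content):
    """Symmetric bilinear model: each observation dimension k is a
    bilinear form in the style vector a and content vector b:
    y[k] = sum_ij W[k, i, j] * a[i] * b[j]."""
    return np.einsum('kij,i,j->k', W, style, content)

rng = np.random.default_rng(2)
d_obs, d_style, d_content = 5, 3, 4
W = rng.normal(size=(d_obs, d_style, d_content))
a = rng.normal(size=d_style)      # e.g. a speaker's style
b = rng.normal(size=d_content)    # e.g. a vowel's content
y = render(W, a, b)
```

Holding one factor fixed, the model is linear in the other, which is what makes adapting to a new style (or new content) at test time a simple least-squares problem.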


A Bilinear Model for Sparse Coding

Neural Information Processing Systems

Recent algorithms for sparse coding and independent component analysis (ICA) have demonstrated how localized features can be learned from natural images. However, these approaches do not take image transformations into account. As a result, they produce image codes that are redundant because the same feature is learned at multiple locations. We describe an algorithm for sparse coding based on a bilinear generative model of images. By explicitly modeling the interaction between image features and their transformations, the bilinear approach helps reduce redundancy in the image code and provides a basis for transformation-invariant vision. We also explore an extension of the model that can capture spatial relationships between the independent features of an object, thereby providing a new framework for parts-based object recognition.