sequence learning
Temporal Latent Bottleneck: Synthesis of Fast and Slow Processing Mechanisms in Sequence Learning
Recurrent neural networks have a strong inductive bias towards learning temporally compressed representations, as the entire history of a sequence is represented by a single vector. By contrast, Transformers have little inductive bias towards learning temporally compressed representations, as they allow for attention over all previously computed elements in a sequence. Having a more compressed representation of a sequence may be beneficial for generalization, as a high-level representation may be more easily re-used and re-purposed and will contain fewer irrelevant details. At the same time, excessive compression of representations comes at the cost of expressiveness. We propose a solution which divides computation into two streams. A slow stream that is recurrent in nature aims to learn a specialized and compressed representation, by forcing chunks of $K$ time steps into a single representation which is divided into multiple vectors. At the same time, a fast stream is parameterized as a Transformer to process chunks consisting of $K$ time-steps conditioned on the information in the slow-stream. In the proposed approach we hope to gain the expressiveness of the Transformer, while encouraging better compression and structuring of representations in the slow stream. We show the benefits of the proposed method in terms of improved sample efficiency and generalization performance as compared to various competitive baselines for visual perception and sequential decision making tasks.
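The two-stream control flow described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the averaging functions below are toy stand-ins (assumptions) for the Transformer fast stream and the recurrent slow stream, and the names `fast_stream`, `slow_stream_update`, and `run_two_streams` are hypothetical. The sketch only shows that the fast stream sees every time step of a chunk while the slow state is updated once per chunk of $K$ steps.

```python
def split_into_chunks(sequence, k):
    """Split a sequence into consecutive chunks of at most k time steps."""
    return [sequence[i:i + k] for i in range(0, len(sequence), k)]


def fast_stream(chunk, slow_state):
    # Toy stand-in (assumption) for the Transformer: every step in the chunk
    # is processed while conditioned on a summary of the slow state.
    context = sum(slow_state) / len(slow_state)
    return [x + context for x in chunk]


def slow_stream_update(slow_state, chunk_outputs):
    # Toy stand-in (assumption) for the recurrent update: the whole chunk is
    # compressed into one summary, which is written into several slots
    # (the "multiple vectors" of the abstract, here scalars) at different rates.
    summary = sum(chunk_outputs) / len(chunk_outputs)
    new_state = []
    for j, slot in enumerate(slow_state):
        rate = 1.0 / (j + 2)
        new_state.append((1 - rate) * slot + rate * summary)
    return new_state


def run_two_streams(sequence, k, num_slots):
    slow_state = [0.0] * num_slots
    outputs = []
    num_slow_updates = 0
    for chunk in split_into_chunks(sequence, k):
        chunk_outputs = fast_stream(chunk, slow_state)   # runs on every step
        outputs.extend(chunk_outputs)
        slow_state = slow_stream_update(slow_state, chunk_outputs)  # once per chunk
        num_slow_updates += 1
    return outputs, slow_state, num_slow_updates
```

For a sequence of length T, the slow state in this sketch is updated only ceil(T / K) times, which is where the temporal compression comes from.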
Sequence to Sequence Learning with Neural Networks
Deep Neural Networks (DNNs) are powerful models that have achieved excellent performance on difficult learning tasks. Although DNNs work well whenever large labeled training sets are available, they cannot be used to map sequences to sequences. In this paper, we present a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure. Our method uses a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector. Our main result is that on an English to French translation task from the WMT-14 dataset, the translations produced by the LSTM achieve a BLEU score of 34.8 on the entire test set, where the LSTM's BLEU score was penalized on out-of-vocabulary words.
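The encoder-decoder structure described in this abstract can be sketched as follows. This is only a structural illustration under stated assumptions: a plain tanh recurrent cell stands in for the multilayered LSTM, the weights are random and untrained, and the vocabulary size, hidden size, and `BOS`/`EOS` token ids are hypothetical. It shows that the encoder maps an input of any length to a vector of fixed dimensionality, and that the decoder generates tokens from that vector alone.

```python
import math
import random

VOCAB = 6    # hypothetical vocabulary size
HIDDEN = 8   # hypothetical size of the fixed-dimensional vector
BOS, EOS = 0, 1  # hypothetical begin/end-of-sequence token ids


def make_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]


def rnn_step(w_in, w_h, x, h):
    # Plain tanh recurrence, used here instead of an LSTM cell (assumption).
    return [math.tanh(a + b) for a, b in zip(matvec(w_in, x), matvec(w_h, h))]


def init_params(rng):
    return {
        "enc_in": make_matrix(HIDDEN, VOCAB, rng),
        "enc_h": make_matrix(HIDDEN, HIDDEN, rng),
        "dec_in": make_matrix(HIDDEN, VOCAB, rng),
        "dec_h": make_matrix(HIDDEN, HIDDEN, rng),
        "out": make_matrix(VOCAB, HIDDEN, rng),
    }


def encode(tokens, params):
    """Read the whole input and return one fixed-size vector."""
    h = [0.0] * HIDDEN
    for tok in tokens:
        h = rnn_step(params["enc_in"], params["enc_h"], one_hot(tok, VOCAB), h)
    return h


def decode(context, params, max_len):
    """Greedily generate tokens starting from the encoder's vector."""
    h, tok, output = context, BOS, []
    for _ in range(max_len):
        h = rnn_step(params["dec_in"], params["dec_h"], one_hot(tok, VOCAB), h)
        logits = matvec(params["out"], h)
        tok = max(range(VOCAB), key=logits.__getitem__)
        if tok == EOS:
            break
        output.append(tok)
    return output
```

Because the weights are untrained, the decoded tokens carry no meaning; the point is the data flow from input sequence to fixed vector to output sequence.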
Reviews: Wider and Deeper, Cheaper and Faster: Tensorized LSTMs for Sequence Learning
This paper proposes Tensorized LSTMs for efficient sequence learning. It represents hidden layers as tensors, and employs cross-layer memory cell convolution for efficiency and effectiveness. The model is clearly formulated. Experimental results show the utility of the proposed method. Although the paper is well written, I still have some questions/confusion as follows.
Hyperdimensional Vector Tsetlin Machines with Applications to Sequence Learning and Generation
A large part of designing any data learning agent lies in feature extraction: how features of the underlying data are computed and represented. The best feature-extraction processes typically draw on expert knowledge of the underlying data to expose the most relevant features, reduce noise, and extract as much independent information from the data as possible. For many types of datasets, this can be challenging due to factors such as incoherence, abstractness, or the sheer amount of noise present in the data. In designing features for Tsetlin machines, one must booleanize (or binarize) the underlying data, which can be challenging in the presence of noise. Furthermore, for notoriously complex high-dimensional data like noisy sequences, graphs, images, signal spectra, and natural language, creating encodings that are also interpretable for human reasoning in any post-hoc process can be difficult, because the resulting logical AND expressions must both take advantage of the relevant information in the data and be accurate enough to compete with other machine learning models. In this paper, we explore using Hyperdimensional Vector Computing (HV computing, or simply HVC) as an input layer to a novel Tsetlin machine architecture and apply it to learning, classifying, predicting, and generating sequences. Here, we argue that HVC can provide a robust layer of feature extraction due to its many computational advantages. This approach was first introduced in [1]; here, we streamline it to focus on sequences while further leveraging other attributes of HVC, such as N-Gram sequence encoding and associative memory, and combining them with Tsetlin machines (TMs) to create a powerful hybrid methodology that remains minimalist in the memory size of the overall model.
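The N-Gram sequence encoding mentioned in this abstract can be illustrated with a generic hyperdimensional-computing sketch. It is not the paper's encoder: the dimensionality `DIM`, the use of binary vectors with XOR binding, cyclic rotation for position, and majority-vote bundling are common textbook choices assumed here, and all function names are hypothetical.

```python
import random

DIM = 2048  # hypothetical hypervector dimensionality


def random_hv(rng):
    """Draw a random binary hypervector."""
    return [rng.randint(0, 1) for _ in range(DIM)]


def rotate(hv, k):
    """Cyclic shift by k positions; encodes position inside an n-gram."""
    k %= len(hv)
    return hv[-k:] + hv[:-k] if k else list(hv)


def bind(a, b):
    """Bind two hypervectors with element-wise XOR."""
    return [x ^ y for x, y in zip(a, b)]


def ngram_hv(symbols, item_memory):
    # Earlier symbols are rotated more, so "ab" and "ba" map to different vectors.
    n = len(symbols)
    out = [0] * DIM  # all zeros is the identity for XOR
    for pos, sym in enumerate(symbols):
        out = bind(out, rotate(item_memory[sym], n - 1 - pos))
    return out


def encode_sequence(seq, item_memory, n=3):
    """Bundle all n-gram vectors of the sequence by bitwise majority vote."""
    counts = [0] * DIM
    num_grams = 0
    for i in range(len(seq) - n + 1):
        gram = ngram_hv(seq[i:i + n], item_memory)
        for j, bit in enumerate(gram):
            counts[j] += bit
        num_grams += 1
    return [1 if 2 * c > num_grams else 0 for c in counts]


def similarity(a, b):
    """1.0 for identical vectors, about 0.5 for unrelated random ones."""
    return 1.0 - sum(x != y for x, y in zip(a, b)) / len(a)
```

An associative memory, in this setting, would store one such vector per class and classify a new sequence by picking the stored vector with the highest `similarity`.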
MeshAnything V2: Artist-Created Mesh Generation With Adjacent Mesh Tokenization
Chen, Yiwen, Wang, Yikai, Luo, Yihao, Wang, Zhengyi, Chen, Zilong, Zhu, Jun, Zhang, Chi, Lin, Guosheng
We introduce MeshAnything V2, an autoregressive transformer that generates Artist-Created Meshes (AM) aligned to given shapes. It can be integrated with various 3D asset production pipelines to achieve high-quality, highly controllable AM generation. MeshAnything V2 surpasses previous methods in both efficiency and performance using models of the same size. These improvements are due to our newly proposed mesh tokenization method: Adjacent Mesh Tokenization (AMT). Unlike previous methods, which represent each face with three vertices, AMT uses a single vertex whenever possible. Compared to previous methods, AMT requires about half the token sequence length to represent the same mesh on average. Furthermore, the token sequences from AMT are more compact and well-structured, fundamentally benefiting AM generation. Our extensive experiments show that AMT significantly improves the efficiency and performance of AM generation. Project Page: https://buaacyw.github.io/meshanything-v2/
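The idea of using a single vertex per face whenever possible can be illustrated with a simplified sketch. This is not the authors' tokenizer: it assumes faces are non-degenerate triangles given as vertex-id triples in a fixed order, it only reuses the last two emitted vertices (much like a triangle strip), and the restart marker `"&"` and the function names are hypothetical.

```python
RESTART = "&"  # hypothetical marker: the next face is written with all 3 vertices


def amt_encode(faces):
    """Emit one vertex for a face that contains the last two emitted vertices."""
    tokens = []
    last_edge = None  # the two most recently emitted vertices
    for face in faces:
        if last_edge is not None and set(last_edge) <= set(face):
            # Adjacent face: only the vertex not on the shared edge is new.
            (new_vertex,) = set(face) - set(last_edge)
            tokens.append(new_vertex)
            last_edge = (last_edge[1], new_vertex)
        else:
            # Not adjacent: fall back to writing the full face.
            if last_edge is not None:
                tokens.append(RESTART)
            tokens.extend(face)
            last_edge = (face[1], face[2])
    return tokens


def amt_decode(tokens):
    """Invert amt_encode, recovering one triangle per emitted face."""
    faces = []
    last_edge = None
    i = 0
    while i < len(tokens):
        if last_edge is None or tokens[i] == RESTART:
            if tokens[i] == RESTART:
                i += 1
            a, b, c = tokens[i:i + 3]
            i += 3
        else:
            a, b = last_edge
            c = tokens[i]
            i += 1
        faces.append((a, b, c))
        last_edge = (b, c)
    return faces
```

In this sketch, a run of mutually adjacent faces costs three tokens for the first face and one token for each following face, instead of three tokens per face. How much shorter the sequence gets depends on the order in which faces are visited, which this sketch does not optimize.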
On learning spatial sequences with the movement of attention
In this paper we start with a simple question: how is it possible that humans can recognize different movements over the skin with only prior visual experience of them? Or, in general, what representation of spatial sequences is invariant to scale, rotation, and translation across different modalities? To answer, we rethink the mathematical representation of spatial sequences, argue against the minimum description length principle, and focus on the movements of attention. We advance the idea that spatial sequences must be represented on different levels of abstraction; this adds redundancy but is necessary for recognition and generalization. To address the open question of how these abstractions are formed, we propose two hypotheses: the first invites exploring selectionism learning instead of finding parameters in some models; the second proposes finding new data structures, not neural network architectures, to efficiently store and operate over redundant features to be further selected. Movements of attention are central to human cognition, and the lessons should be applied to new, better learning algorithms.
Sequence to Sequence (Seq2Seq) models
When learning about time-series models, we might have come across various sequence learning problems, such as stock market prediction, storytelling, and autocomplete, that can be learnt by traditional neural networks like the Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM). But will these Artificial Neural Networks (ANNs) still work when the problem gets more complicated? Seq2Seq models are a special class of models that make minimal assumptions about sequence structure and can be used to solve complex sequence problems. Let's discuss the Seq2Seq models on the following topics. Which of these tasks do you think could be solved by a single neural network?