convolutional lstm
Attention in Convolutional LSTM for Gesture Recognition
Convolutional long short-term memory (LSTM) networks have been widely used for action/gesture recognition, and different attention mechanisms have also been embedded into the LSTM or the convolutional LSTM (ConvLSTM) networks. Based on the previous gesture recognition architectures which combine the three-dimensional convolution neural network (3DCNN) and ConvLSTM, this paper explores the effects of attention mechanism in ConvLSTM. Several variants of ConvLSTM are evaluated: (a) Removing the convolutional structures of the three gates in ConvLSTM, (b) Applying the attention mechanism on the input of ConvLSTM, (c) Reconstructing the input and (d) output gates respectively with the modified channel-wise attention mechanism. The evaluation results demonstrate that the spatial convolutions in the three gates scarcely contribute to the spatiotemporal feature fusion, and the attention mechanisms embedded into the input and output gates cannot improve the feature fusion. In other words, ConvLSTM mainly contributes to the temporal fusion along with the recurrent steps to learn the long-term spatiotemporal features, when taking as input the spatial or spatiotemporal features. On this basis, a new variant of LSTM is derived, in which the convolutional structures are only embedded into the input-to-state transition of LSTM. The code of the LSTM variants is publicly available.
Reviews: PredRNN: Recurrent Neural Networks for Predictive Learning using Spatiotemporal LSTMs
Here, the authors focus on the maximum likelihood output, but it would be helpful for comparison with prior work to also report the likelihood. Additionally, the computational complexity is mentioned as an advantage of this model, but no detailed analysis or comparison is performed so its hard to know how this compares computational complexity with prior work. Minor notational suggestion: It might be easier for the reader to follow if you use M instead of C for the cell state in equation 3 so that the connection with equation 4 is clearer.
Reviews: Attention in Convolutional LSTM for Gesture Recognition
SUMMARY: This paper proposes a pipeline combining Res3D-3DCNN, convLSTM, MobileNet-CNN hybrid architecture for performing gesture recognition. In particular, it explores the integration of pooling and neural attention mechanisms in ConvLSTM cells. Four convLSTM variants are compared, which place pooling and attention mechanisms at different locations inside the cell. The pooling leads to a somewhat novel LSTM/convLSTM hybrid architecture. Finally, an attention-inspired gating mechanism is proposed, with some differences to the formulation in [5].
Grid Frequency Forecasting in University Campuses using Convolutional LSTM
The modern power grid is facing increasing complexities, primarily stemming from the integration of renewable energy sources and evolving consumption patterns. This paper introduces an innovative methodology that harnesses Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks to establish robust time series forecasting models for grid frequency. These models effectively capture the spatiotemporal intricacies inherent in grid frequency data, significantly enhancing prediction accuracy and bolstering power grid reliability. The research explores the potential and development of individualized Convolutional LSTM (ConvLSTM) models for buildings within a university campus, enabling them to be independently trained and evaluated for each building. Individual ConvLSTM models are trained on power consumption data for each campus building and forecast the grid frequency based on historical trends. The results convincingly demonstrate the superiority of the proposed models over traditional forecasting techniques, as evidenced by performance metrics such as Mean Square Error (MSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE). Additionally, an Ensemble Model is formulated to aggregate insights from the building-specific models, delivering comprehensive forecasts for the entire campus. This approach ensures the privacy and security of power consumption data specific to each building.
Improving Knot Prediction in Wood Logs with Longitudinal Feature Propagation
Khazem, Salim, Fix, Jeremy, Pradalier, Cédric
The quality of a wood log in the wood industry depends heavily on the presence of both outer and inner defects, including inner knots that are a result of the growth of tree branches. Today, locating the inner knots require the use of expensive equipment such as X-ray scanners. In this paper, we address the task of predicting the location of inner defects from the outer shape of the logs. The dataset is built by extracting both the contours and the knots with X-ray measurements. We propose to solve this binary segmentation task by leveraging convolutional recurrent neural networks. Once the neural network is trained, inference can be performed from the outer shape measured with cheap devices such as laser profilers. We demonstrate the effectiveness of our approach on fir and spruce tree species and perform ablation on the recurrence to demonstrate its importance.
- Europe > France (0.05)
- Europe > Germany > North Rhine-Westphalia > Upper Bavaria > Munich (0.04)
Automatic Remaining Useful Life Estimation Framework with Embedded Convolutional LSTM as the Backbone
Zhou, Yexu, Gao, Yuting, Huang, Yiran, Hefenbrock, Michael, Riedel, Till, Beigl, Michael
An essential task in predictive maintenance is the prediction of the Remaining Useful Life (RUL) through the analysis of multivariate time series. Using the sliding window method, Convolutional Neural Network (CNN) and conventional Recurrent Neural Network (RNN) approaches have produced impressive results on this matter, due to their ability to learn optimized features. However, sequence information is only partially modeled by CNN approaches. Due to the flatten mechanism in conventional RNNs, like Long Short Term Memories (LSTM), the temporal information within the window is not fully preserved. To exploit the multi-level temporal information, many approaches are proposed which combine CNN and RNN models. In this work, we propose a new LSTM variant called embedded convolutional LSTM (ECLSTM). In ECLSTM a group of different 1D convolutions is embedded into the LSTM structure. Through this, the temporal information is preserved between and within windows. Since the hyper-parameters of models require careful tuning, we also propose an automated prediction framework based on the Bayesian optimization with hyperband optimizer, which allows for efficient optimization of the network architecture. Finally, we show the superiority of our proposed ECLSTM approach over the state-of-the-art approaches on several widely used benchmark data sets for RUL Estimation.
- North America > United States (0.14)
- Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)
- Asia > China > Shandong Province > Qingdao (0.04)
Attention in Convolutional LSTM for Gesture Recognition
Zhang, Liang, Zhu, Guangming, Mei, Lin, Shen, Peiyi, Shah, Syed Afaq Ali, Bennamoun, Mohammed
Convolutional long short-term memory (LSTM) networks have been widely used for action/gesture recognition, and different attention mechanisms have also been embedded into the LSTM or the convolutional LSTM (ConvLSTM) networks. Based on the previous gesture recognition architectures which combine the three-dimensional convolution neural network (3DCNN) and ConvLSTM, this paper explores the effects of attention mechanism in ConvLSTM. Several variants of ConvLSTM are evaluated: (a) Removing the convolutional structures of the three gates in ConvLSTM, (b) Applying the attention mechanism on the input of ConvLSTM, (c) Reconstructing the input and (d) output gates respectively with the modified channel-wise attention mechanism. The evaluation results demonstrate that the spatial convolutions in the three gates scarcely contribute to the spatiotemporal feature fusion, and the attention mechanisms embedded into the input and output gates cannot improve the feature fusion. In other words, ConvLSTM mainly contributes to the temporal fusion along with the recurrent steps to learn the long-term spatiotemporal features, when taking as input the spatial or spatiotemporal features.
Inception-inspired LSTM for Next-frame Video Prediction
Hosseini, Matin, Maida, Anthony S., Hosseini, Majid, Raju, Gottumukkala
The problem of video frame prediction has received much interest due to its relevance to many computer vision applications such as autonomous vehicles or robotics. Supervised methods for video frame prediction rely on labeled data, which may not always be available. In this paper, we provide a novel unsupervised deep-learning method called Inception-based LSTM for video frame prediction. The general idea of inception networks is to implement wider networks instead of deeper networks. This network design was shown to improve the performance of image classification. The proposed method is evaluated on both Inception-v1 and Inception-v2 structures. The proposed Inception LSTM methods are compared with convolutional LSTM when applied using PredNet predictive coding framework for both the KITTI and KTH data sets. We observed that the Inception based LSTM outperforms the convolutional LSTM. Also, Inception LSTM has better prediction performance compared to Inception v2 LSTM. However, Inception v2 LSTM has a lower computational cost compared to Inception LSTM.
- North America > United States > Louisiana (0.05)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
A CNN-RNN Architecture for Multi-Label Weather Recognition
Zhao, Bin, Li, Xuelong, Lu, Xiaoqiang, Wang, Zhigang
Weather Recognition plays an important role in our daily lives and many computer vision applications. However, recognizing the weather conditions from a single image remains challenging and has not been studied thoroughly. Generally, most previous works treat weather recognition as a single-label classification task, namely, determining whether an image belongs to a specific weather class or not. This treatment is not always appropriate, since more than one weather conditions may appear simultaneously in a single image. To address this problem, we make the first attempt to view weather recognition as a multi-label classification task, i.e., assigning an image more than one labels according to the displayed weather conditions. Specifically, a CNN-RNN based multi-label classification approach is proposed in this paper. The convolutional neural network (CNN) is extended with a channel-wise attention model to extract the most correlated visual features. The Recurrent Neural Network (RNN) further processes the features and excavates the dependencies among weather classes. Finally, the weather labels are predicted step by step. Besides, we construct two datasets for the weather recognition task and explore the relationships among different weather conditions. Experimental results demonstrate the superiority and effectiveness of the proposed approach. The new constructed datasets will be available at https://github.com/wzgwzg/Multi-Label-Weather-Recognition.
- Asia > China > Shaanxi Province > Xi'an (0.04)
- Oceania > Australia (0.04)
- North America > United States > New Jersey > Atlantic County > Atlantic City (0.04)
- North America > United States > California > Santa Clara County > Mountain View (0.04)
Convolutional LSTMs for Cloud-Robust Segmentation of Remote Sensing Imagery
Dynamic spatiotemporal processes on the Earth can be observed by an increasing number of optical Earth observation satellites that measure spectral reflectance at multiple spectral bands in regular intervals. Clouds partially covering the surface is an omnipresent challenge for the majority of remote sensing approaches that are not robust regarding cloud coverage. In these approaches, clouds are typically handled by cherry-picking cloud-free observations or by pre-classification of cloudy pixels and subsequent masking. In this work, we demonstrate the robustness of a straightforward convolutional long short-term memory network for vegetation classification using all available cloudy and non-cloudy satellite observations. We visualize the internal gate activations within the recurrent cells and find that, in some cells, modulation and input gates close on cloudy pixels. This indicates that the network has internalized a cloud-filtering mechanism without being specifically trained on cloud labels. The robustness regarding clouds is further demonstrated by experiments on sequences with varying degrees of cloud coverage where our network achieved similar accuracies on all cloudy and non-cloudy datasets. Overall, our results question the necessity of sophisticated pre-processing pipelines if robust classification methods are utilized.
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.05)
- Asia > Uzbekistan (0.04)
- Asia > Central Asia (0.04)