
Preventing Gradient Explosions in Gated Recurrent Units

Neural Information Processing Systems

A gated recurrent unit (GRU) is a successful recurrent neural network architecture for time-series data. The GRU is typically trained using a gradient-based method, which is subject to the exploding gradient problem, in which the gradient increases significantly. This problem is caused by an abrupt change in the dynamics of the GRU due to a small variation in the parameters. In this paper, we find a condition under which the dynamics of the GRU change drastically and propose a learning method to address the exploding gradient problem. Our method constrains the dynamics of the GRU so that they do not change drastically. We evaluated our method in experiments on language modeling and polyphonic music modeling. Our experiments showed that our method can prevent the exploding gradient problem and improve modeling accuracy.
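The mechanism behind the explosion can be seen in a stripped-down linear analogue of a recurrent cell (an illustrative sketch, not the paper's constrained training method): backpropagation through time multiplies per-step Jacobians, so the gradient norm grows or shrinks geometrically with the spectral radius of the recurrent weight matrix.

```python
import numpy as np

def gradient_norm_through_time(rho, steps=50):
    """Spectral norm of d h_T / d h_0 for the linear recurrence
    h_t = W h_{t-1}, with W a 2x2 matrix of spectral radius rho.
    A toy analogue of backpropagation through a recurrent cell."""
    W = rho * np.eye(2)
    J = np.eye(2)
    for _ in range(steps):
        J = W @ J              # chain rule: the Jacobian is a product of W's
    return np.linalg.norm(J, 2)

# Spectral radius above 1 -> the gradient explodes; below 1 -> it vanishes.
exploding = gradient_norm_through_time(1.1)   # ~ 1.1**50, about 117
vanishing = gradient_norm_through_time(0.9)   # ~ 0.9**50, about 0.005
```

A nonlinear GRU is not exactly this product, but the same geometric growth governs its worst case, which is why small parameter changes that push the dynamics across a stability boundary cause abrupt gradient blow-ups.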








Applying Time Series Deep Learning Models to Forecast the Growth of Perennial Ryegrass in Ireland

Onibonoje, Oluwadurotimi, Ngo, Vuong M., McCarren, Andrew, Ruelle, Elodie, O'Brien, Bernadette, Roantree, Mark

arXiv.org Artificial Intelligence

Grasslands, constituting the world's second-largest terrestrial carbon sink, play a crucial role in biodiversity and the regulation of the carbon cycle. Currently, the Irish dairy sector, a significant economic contributor, grapples with challenges related to profitability and sustainability. Presently, grass growth forecasting relies on impractical mechanistic models. In response, we propose deep learning models tailored for univariate datasets, presenting cost-effective alternatives. Notably, a temporal convolutional network designed for forecasting Perennial Ryegrass growth in Cork exhibits high performance, leveraging historical grass height data with an RMSE of 2.74 and an MAE of 3.46. Validation across a comprehensive dataset spanning 1,757 weeks over 34 years provides insights into optimal model configurations. This study enhances our understanding of model behavior, thereby improving reliability in grass growth forecasting and contributing to the advancement of sustainable dairy farming practices. Introduction Grasslands stand as the world's largest terrestrial ecosystem, serving as a pivotal source of sustenance for livestock. Tackling the escalating demand for meat and dairy products in an environmentally sustainable manner presents a formidable challenge. Encompassing 31.5% of the Earth's landmass (Latham et al., 2014), grasslands rank among the most prevalent and widespread vegetation types.
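For reference, the two reported error metrics are computed as follows; the sketch below uses NumPy and hypothetical grass-height values, not the study's data.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large deviations quadratically."""
    d = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.sqrt(np.mean(d ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error: average magnitude of the forecast error."""
    d = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.mean(np.abs(d)))

# Hypothetical weekly grass-height observations vs. model forecasts (mm).
observed = [52.0, 48.5, 61.2, 57.0]
forecast = [50.5, 49.0, 58.9, 59.1]
print(mae(observed, forecast))    # 1.6
print(rmse(observed, forecast))   # ~1.75; RMSE >= MAE always holds
```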


GroupSHAP-Guided Integration of Financial News Keywords and Technical Indicators for Stock Price Prediction

Kim, Minjoo, Kim, Jinwoong, Park, Sangjin

arXiv.org Artificial Intelligence

Recent advances in finance-specific language models such as FinBERT have enabled the quantification of public sentiment into index-based measures, yet compressing diverse linguistic signals into single metrics overlooks contextual nuances and limits interpretability. To address this limitation, explainable AI techniques, particularly SHAP (SHapley Additive Explanations), have been employed to identify influential features. However, SHAP's computational cost grows exponentially with input features, making it impractical for large-scale text-based financial data. This study introduces a GRU-based forecasting framework enhanced with GroupSHAP, which quantifies contributions of semantically related keyword groups rather than individual tokens, substantially reducing computational burden while preserving interpretability. We employed FinBERT to embed news articles from 2015 to 2024, clustered them into coherent semantic groups, and applied GroupSHAP to measure each group's contribution to stock price movements. The resulting group-level SHAP variables across multiple topics were used as input features for the prediction model. Empirical results from one-day-ahead forecasting of the S&P 500 index throughout 2024 demonstrate that our approach achieves a 32.2% reduction in MAE and a 40.5% reduction in RMSE compared with benchmark models without the GroupSHAP mechanism. This research presents the first application of GroupSHAP in news-driven financial forecasting, showing that grouped sentiment representations simultaneously enhance interpretability and predictive performance.
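The computational argument for grouping can be made concrete: with keyword groups as the Shapley "players", exact attribution enumerates 2^G coalitions of groups instead of 2^F feature subsets. The sketch below is a generic exact group-Shapley routine under a baseline-replacement value function; the linear model, group assignments, and values are illustrative assumptions, not the paper's FinBERT/GRU pipeline.

```python
from itertools import combinations
from math import factorial

def group_shap(model, x, baseline, groups):
    """Exact Shapley values with feature *groups* as players.
    Cost scales with 2**len(groups) rather than 2**len(x)."""
    n = len(groups)
    phi = [0.0] * n

    def value(coalition):
        # Features in coalition groups take their observed value;
        # all other features are held at the baseline.
        z = list(baseline)
        for g in coalition:
            for i in groups[g]:
                z[i] = x[i]
        return model(z)

    for g in range(n):
        others = [j for j in range(n) if j != g]
        for k in range(n):
            for S in combinations(others, k):
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[g] += w * (value(S + (g,)) - value(S))
    return phi

# Illustrative linear model over 3 token features in 2 semantic groups
# (hypothetical weights; e.g. "rates" tokens vs. "earnings" tokens).
model = lambda z: 2.0 * z[0] + 3.0 * z[1] - 1.0 * z[2]
phi = group_shap(model, x=[1.0, 1.0, 1.0],
                 baseline=[0.0, 0.0, 0.0], groups=[[0, 1], [2]])
# phi sums to model(x) - model(baseline), as Shapley efficiency requires.
```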


Hybrid Vision Servoing with Deep Alignment and GRU-Based Occlusion Recovery

Lee, Jee Won, Lim, Hansol, Yang, Sooyeun, Choi, Jongseong Brad

arXiv.org Artificial Intelligence

Traditional robotic controllers have long relied on proprioceptive sensors such as joint encoders, inertial measurement units, and force-torque sensors to estimate position and motion, but these often suffer from drift, calibration errors, and limited environmental awareness [1]. Image-based visual servoing has therefore been widely adopted for high-precision robotic assembly, aerial vehicle stabilization, and minimally invasive surgery, where direct visual feedback can compensate for model uncertainties and encoder inaccuracies [2] [3]. In these closed-loop systems, perception must deliver sub-pixel localization accuracy at control rates above 30 Hz while tolerating partial or full occlusions, illumination shifts, and motion blur to maintain loop stability and precision [4]. Even millimeter-level tracking errors can accumulate into significant actuation drift, undermining safety and performance in sub-millimeter surgical targeting or centimeter-scale drone landing [5] [6]. Early IBVS methods emerged in the early 1990s to simplify robot control by directly mapping image features to velocity commands, establishing the foundation for image-space loop closure [2]. Handcrafted detectors such as SIFT [7], which identifies scale-invariant keypoints, SURF [8], which accelerates detection using integral images, and ORB [9], which offers an efficient binary alternative, were paired with RANSAC [10] to filter out mismatches. However, these sparse approaches struggled when keypoints were lost to occlusion or blur. To achieve denser alignment, the Lucas-Kanade algorithm was introduced to iteratively minimize photometric error over image patches and enable smooth sub-pixel registration [11].
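The photometric-error idea behind Lucas-Kanade can be shown in one dimension: iterate a Gauss-Newton step on the residual between a template and a resampled signal until the sub-pixel shift converges. This is a translation-only sketch on a hypothetical smooth signal, not the paper's vision-servoing implementation.

```python
import numpy as np

def lk_shift(template, image, iters=20):
    """Estimate a 1-D sub-pixel shift d such that image(x + d) ~ template(x)
    by iteratively minimizing the photometric error (translation-only LK)."""
    x = np.arange(len(template), dtype=float)
    d = 0.0
    for _ in range(iters):
        warped = np.interp(x + d, x, image)   # resample at shifted positions
        grad = np.gradient(warped)            # signal gradient at current warp
        err = template - warped               # photometric residual
        denom = np.sum(grad * grad)
        if denom < 1e-12:
            break
        d += np.sum(grad * err) / denom       # Gauss-Newton update of the shift
    return d

# Hypothetical smooth signal and a copy translated by 0.7 samples.
x = np.arange(64, dtype=float)
signal = np.exp(-0.5 * ((x - 32.0) / 6.0) ** 2)
shifted = np.exp(-0.5 * ((x - 32.7) / 6.0) ** 2)
estimate = lk_shift(signal, shifted)          # converges near 0.7
```

The same update generalizes to 2-D patches by stacking image gradients into a Jacobian, which is the dense registration step the abstract refers to.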


Accuracy, Memory Efficiency and Generalization: A Comparative Study on Liquid Neural Networks and Recurrent Neural Networks

Zong, Shilong, Bierly, Alex, Boker, Almuatazbellah, Eldardiry, Hoda

arXiv.org Artificial Intelligence

This review aims to conduct a comparative analysis of liquid neural networks (LNNs) and traditional recurrent neural networks (RNNs) and their variants, such as long short-term memory networks (LSTMs) and gated recurrent units (GRUs). The core dimensions of the analysis include model accuracy, memory efficiency, and generalization ability. By systematically reviewing existing research, this paper explores the basic principles, mathematical models, key characteristics, and inherent challenges of these neural network architectures in processing sequential data. Research findings reveal that LNNs, as emerging, biologically inspired, continuous-time dynamic neural networks, demonstrate significant potential in handling noisy, non-stationary data and achieving out-of-distribution (OOD) generalization. Additionally, some LNN variants outperform traditional RNNs in terms of parameter efficiency and computational speed. However, RNNs remain a cornerstone of sequence modeling due to their mature ecosystem and successful applications across various tasks. This review identifies the commonalities and differences between LNNs and RNNs, summarizes their respective shortcomings and challenges, and points out valuable directions for future research, particularly emphasizing the importance of improving the scalability of LNNs to promote their application in broader and more complex scenarios.