Goto

Collaborating Authors

 highway network




Stable Weight Updating: A Key to Reliable PDE Solutions Using Deep Learning

Noorizadegan, A., Cavoretto, R., Young, D. L., Chen, C. S.

arXiv.org Artificial Intelligence

Background: Deep learning techniques, particularly neural networks, have revolutionized computational physics, offering powerful tools for solving complex partial differential equations (PDEs). However, ensuring stability and efficiency remains a challenge, especially in scenarios involving nonlinear and time-dependent equations. Methodology: This paper introduces novel residual-based architectures, namely the Simple Highway Network and the Squared Residual Network, designed to enhance stability and accuracy in physics-informed neural networks (PINNs). These architectures augment traditional neural networks by incorporating residual connections, which facilitate smoother weight updates and improve backpropagation efficiency. Results: Through extensive numerical experiments across various examples including linear and nonlinear, time-dependent and independent PDEs we demonstrate the efficacy of the proposed architectures. The Squared Residual Network, in particular, exhibits robust performance, achieving enhanced stability and accuracy compared to conventional neural networks. These findings underscore the potential of residual-based architectures in advancing deep learning for PDEs and computational physics applications.


Highway Networks for Improved Surface Reconstruction: The Role of Residuals and Weight Updates

Noorizadegan, A., Hon, Y. C., Young, D. L., Chen, C. S.

arXiv.org Artificial Intelligence

Surface reconstruction from point clouds is a fundamental challenge in computer graphics and medical imaging. In this paper, we explore the application of advanced neural network architectures for the accurate and efficient reconstruction of surfaces from data points. We introduce a novel variant of the Highway network (Hw) called Square-Highway (SqrHw) within the context of multilayer perceptrons and investigate its performance alongside plain neural networks and a simplified Hw in various numerical examples. These examples include the reconstruction of simple and complex surfaces, such as spheres, human hands, and intricate models like the Stanford Bunny. We analyze the impact of factors such as the number of hidden layers, interior and exterior points, and data distribution on surface reconstruction quality. Our results show that the proposed SqrHw architecture outperforms other neural network configurations, achieving faster convergence and higher-quality surface reconstructions. Additionally, we demonstrate the SqrHw's ability to predict surfaces over missing data, a valuable feature for challenging applications like medical imaging. Furthermore, our study delves into further details, demonstrating that the proposed method based on highway networks yields more stable weight norms and backpropagation gradients compared to the Plain Network architecture. This research not only advances the field of computer graphics but also holds utility for other purposes such as function interpolation and physics-informed neural networks, which integrate multilayer perceptrons into their algorithms.


Highway Value Iteration Networks

Wang, Yuhui, Li, Weida, Faccio, Francesco, Wu, Qingyuan, Schmidhuber, Jürgen

arXiv.org Artificial Intelligence

Value iteration networks (VINs) enable end-to-end learning for planning tasks by employing a differentiable "planning module" that approximates the value iteration algorithm. However, long-term planning remains a challenge because training very deep VINs is difficult. To address this problem, we embed highway value iteration -- a recent algorithm designed to facilitate long-term credit assignment -- into the structure of VINs. This improvement augments the "planning module" of the VIN with three additional components: 1) an "aggregate gate," which constructs skip connections to improve information flow across many layers; 2) an "exploration module," crafted to increase the diversity of information and gradient flow in spatial dimensions; 3) a "filter gate" designed to ensure safe exploration. The resulting novel highway VIN can be trained effectively with hundreds of layers using standard backpropagation. In long-term planning tasks requiring hundreds of planning steps, deep highway VINs outperform both traditional VINs and several advanced, very deep NNs.


Training Very Deep Networks

Neural Information Processing Systems

Theoretical and empirical evidence indicates that the depth of neural networks is crucial for their success. However, training becomes more difficult as depth increases, and training of very deep networks remains an open problem. Here we introduce a new architecture designed to overcome this. Our so-called highway networks allow unimpeded information flow across many layers on information highways. They are inspired by Long Short-Term Memory recurrent networks and use adaptive gating units to regulate the information flow. Even with hundreds of layers, highway networks can be trained directly through simple gradient descent. This enables the study of extremely deep and efficient architectures.


Distil the informative essence of loop detector data set: Is network-level traffic forecasting hungry for more data?

Li, Guopeng, Knoop, Victor L., C., J. W., Lint, van

arXiv.org Artificial Intelligence

Network-level traffic condition forecasting has been intensively studied for decades. Although prediction accuracy has been continuously improved with emerging deep learning models and ever-expanding traffic data, traffic forecasting still faces many challenges in practice. These challenges include the robustness of data-driven models, the inherent unpredictability of traffic dynamics, and whether further improvement of traffic forecasting requires more sensor data. In this paper, we focus on this latter question and particularly on data from loop detectors. To answer this, we propose an uncertainty-aware traffic forecasting framework to explore how many samples of loop data are truly effective for training forecasting models. Firstly, the model design combines traffic flow theory with graph neural networks, ensuring the robustness of prediction and uncertainty quantification. Secondly, evidential learning is employed to quantify different sources of uncertainty in a single pass. The estimated uncertainty is used to "distil" the essence of the dataset that sufficiently covers the information content. Results from a case study of a highway network around Amsterdam show that, from 2018 to 2021, more than 80\% of the data during daytime can be removed. The remaining 20\% samples have equal prediction power for training models. This result suggests that indeed large traffic datasets can be subdivided into significantly smaller but equally informative datasets. From these findings, we conclude that the proposed methodology proves valuable in evaluating large traffic datasets' true information content. Further extensions, such as extracting smaller, spatially non-redundant datasets, are possible with this method.


Multi-graph Spatio-temporal Graph Convolutional Network for Traffic Flow Prediction

Ding, Weilong, Zhang, Tianpu, Wang, Jianwu, Zhao, Zhuofeng

arXiv.org Artificial Intelligence

Inter-city highway transportation is significant for urban life. As one of the key functions in intelligent transportation system (ITS), traffic evaluation always plays significant role nowadays, and daily traffic flow prediction still faces challenges at network-wide toll stations. On the one hand, the data imbalance in practice among various locations deteriorates the performance of prediction. On the other hand, complex correlative spatio-temporal factors cannot be comprehensively employed in long-term duration. In this paper, a prediction method is proposed for daily traffic flow in highway domain through spatio-temporal deep learning. In our method, data normalization strategy is used to deal with data imbalance, due to long-tail distribution of traffic flow at network-wide toll stations. And then, based on graph convolutional network, we construct networks in distinct semantics to capture spatio-temporal features. Beside that, meteorology and calendar features are used by our model in the full connection stage to extra external characteristics of traffic flow. By extensive experiments and case studies in one Chinese provincial highway, our method shows clear improvement in predictive accuracy than baselines and practical benefits in business.


Machine learning for option pricing: an empirical investigation of network architectures

Van Mieghem, Laurens, Papapantoleon, Antonis, Papazoglou-Hennig, Jonas

arXiv.org Artificial Intelligence

The majority of articles in this literature considers a (plain) feed forward neural network architecture in order to connect the neurons used for learning the function mapping inputs to outputs. In this article, motivated by methods in image classification and recent advances in machine learning methods for PDEs, we investigate empirically whether and how the choice of network architecture affects the accuracy and training time of a machine learning algorithm. We find that for option pricing problems, where we focus on the Black-Scholes and the Heston model, the generalized highway network architecture outperforms all other variants, when considering the mean squared error and the training time as criteria. Moreover, for the computation of the implied volatility, after a necessary transformation, a variant of the DGM architecture outperforms all other variants, when considering again the mean squared error and the training time as criteria. Machine learning has taken the field of mathematical finance by a storm, and there are numerous applications of machine learning in finance by now. Concrete applications include, for example, the computation of option prices and implied volatilities as well as the calibration of financial models, see e.g. Buehler, Gonon, Teichmann, and Wood [10], portfolio selection and optimization, see e.g. A comprehensive overview of applications of machine learning in mathematical finance appears in the recent volume of Capponi and Lehalle [11], while an exhaustive overview focusing on pricing and hedging appears in Ruf and Wang [35]. We are interested in the computation of option prices and implied volatilities using machine learning methods, and thus, implicitly, in model calibration as well. More specifically, we consider the supervised learning problem of learning the price of an option or the implied volatility given appropriate input data (model parameters) and corresponding output data (option prices or implied volatilities). The majority of articles in this literature, see e.g. Cuchiero et al. [12], Horvath et al. [28], Liu et al. [30], consider a (plain) feed forward neural network architecture in order to connect the neurons used for learning the function mapping inputs to outputs. In this article, motivated by methods in image classification, see e.g. Option pricing, implied volatility, supervised learning, residual networks, highway networks, DGM networks. AP gratefully acknowledges the financial support from the Hellenic Foundation for Research and Innovation Grant No. HFRI-FM17-2152. JPH gratefully acknowledges the hospitality at the Financial Engineering & Mathematical Optimization Lab of the NTUA where this project was initiated. More specifically, next to the classical feed forward neural network or multilayer perceptron (MLP) architecture, we consider residual neural networks, highway networks and generalized highway networks.


From Audio to Symbolic Encoding

Yuan, Shenli, Kong, Lingjie, Guo, Jiushuang

arXiv.org Artificial Intelligence

Automatic music transcription (AMT) aims to convert raw audio to symbolic music representation. As a fundamental problem of music information retrieval (MIR), AMT is considered a difficult task even for trained human experts due to overlap of multiple harmonics in the acoustic signal. On the other hand, speech recognition, as one of the most popular tasks in natural language processing, aims to translate human spoken language to texts. Based on the similar nature of AMT and speech recognition (as they both deal with tasks of translating audio signal to symbolic encoding), this paper investigated whether a generic neural network architecture could possibly work on both tasks. In this paper, we introduced our new neural network architecture built on top of the current state-of-the-art Onsets and Frames, and compared the performances of its multiple variations on AMT task. We also tested our architecture with the task of speech recognition. For AMT, our models were able to produce better results compared to the model trained using the state-of-art architecture; however, although similar architecture was able to be trained on the speech recognition task, it did not generate very ideal result compared to other task-specific models.