Country
Contracting Implicit Recurrent Neural Networks: Stable Models with Improved Trainability
Revay, Max, Manchester, Ian R.
Australia Abstract Stability of recurrent models is closely linked with trainability, generalizability and in some applications, safety. Methods that train stable recurrent neural networks, however, do so at a significant cost to expressibility. We propose an implicit model structure that allows for a convex parametriza-tion of stable models using contraction analysis of nonlinear systems. Using these stability conditions we propose a new approach to model initialization and then provide a number of empirical results comparing the performance of our proposed model set to previous stable RNNs and vanilla RNNs. By carefully controlling stability in the model, we observe a significant increase in the speed of training and model performance. Keywords: System Identification, Contraction, Stability, Recurrent Neural Network, V anishing Gradient, Exploding Gradient, Nonlinear Systems, Echo State Network Notation Most of our notation is standard.
Estimation of Spectral Risk Measures
Pandey, Ajay Kumar, A., Prashanth L., Bhat, Sanjay P.
We consider the problem of estimating a spectral risk measure (SRM) from i.i.d. samples, and propose a novel method that is based on numerical integration. We show that our SRM estimate concentrates exponentially, when the underlying distribution has bounded support. Further, we also consider the case when the underlying distribution is either Gaussian or exponential, and derive a concentration bound for our estimation scheme. We validate the theoretical findings on a synthetic setup, and in a vehicular traffic routing application.
Blang: Bayesian declarative modelling of arbitrary data structures
Bouchard-Côté, Alexandre, Chern, Kevin, Cubranic, Davor, Hosseini, Sahand, Hume, Justin, Lepur, Matteo, Ouyang, Zihui, Sgarbi, Giorgio
Consider a Bayesian inference problem where a variable of interest does not take values in a Euclidean space. These "non-standard" data structures are in reality fairly common. They are frequently used in problems involving latent discrete factor models, networks, and domain specific problems such as sequence alignments and reconstructions, pedigrees, and phylogenies. In principle, Bayesian inference should be particularly well-suited in such scenarios, as the Bayesian paradigm provides a principled way to obtain confidence assessment for random variables of any type. However, much of the recent work on making Bayesian analysis more accessible and computationally efficient has focused on inference in Euclidean spaces. In this paper, we introduce Blang, a domain specific language (DSL) and library aimed at bridging this gap. Blang allows users to perform Bayesian analysis on arbitrary data types while using a declarative syntax similar to BUGS. Blang is augmented with intuitive language additions to invent data types of the user's choosing. To perform inference at scale on such arbitrary state spaces, Blang leverages recent advances in parallelizable, non-reversible Markov chain Monte Carlo methods.
Evaluation of Multi-Slice Inputs to Convolutional Neural Networks for Medical Image Segmentation
Vu, Minh H., Grimbergen, Guus, Nyholm, Tufve, Löfstedt, Tommy
-- When using Convolutional Neural Networks (CNNs) for segmentation of organs and lesions in medical images, the conventional approach is to work with inputs and outputs either as single slice (2D) or whole volumes (3D). One common alternative, in this study denoted as pseudo-3D, is to use a stack of adjacent slices as input and produce a prediction for at least the central slice. This approach gives the network the possibility to capture 3D spatial information, with only a minor additional computational cost. In this study, we systematically evaluate the segmentation performance and computational costs of this pseudo-3D approach as a function of the number of input slices, and compare the results to conventional end-to-end 2D and 3D CNNs. The standard pseudo-3D method regards the neighboring slices as multiple input image channels. We additionally evaluate a simple approach where the input stack is a volumetric input that is repeatably convolved in 3D to obtain a 2D feature map. This 2D map is in turn fed into a standard 2D network. We conducted experiments using two different CNN backbone architectures and on five diverse data sets covering different anatomical regions, imaging modalities, and segmentation tasks. We found that while both pseudo-3D methods can process a large number of slices at once and still be computationally much more efficient than fully 3D CNNs, a significant improvement over a regular 2D CNN was only observed for one of the five data sets. An analysis of the structural properties of the segmentation masks revealed no relations to the segmentation performance with respect to the number of input slices. The conclusion is therefore that in the general case, multi-slice inputs appear to not significantly improve segmentation results over using 2D or 3D CNNs. Segmentation of organs and pathologies are common activities for radiologists and routine work for radiation oncologists. Nowadays manual annotation of such regions of interest is aided by various software toolkits for image enhancement, automated contouring, and structure analysis in all fields on image-guided radiotherapy [1, 2, 3]. Over the recent years, deep learning (DL) has emerged as a very powerful concept in the field of medical image analysis.
Applying Deep Learning to Detect Traffic Accidents in Real Time Using Spatiotemporal Sequential Data
Parsa, Amir Bahador, Chauhan, Rishabh Singh, Taghipour, Homa, Derrible, Sybil, Mohammadian, Abolfazl
Accident detection is a vital part of traffic safety . Many road users suffer from traffic accidents, as well as their consequences such as delay, congestion, air pollu tion, and so on . In this study, we utilize two advanced deep learning techniques, Long Short - Term Memory (LSTM) and Gated Recurrent Units (GRU s), to detect traffic accidents in Chicago . These two techniques are selected because they are known to perform we ll with sequential data (i.e., time series). The full dataset consists of 241 accident and 6, 038 non - accident cases selected from Chicago expressway, and it includes traffic spatiotemporal data, weather condition data, and congestion status data . Moreover, b ecause the dataset is imbala nced (i.e., the dataset contains many more non - accident cases t han accident cases), Synthetic Minority Over - sampling Technique (SMOTE) is employed . Overall, the two models perform significantly well, both with an Area Under Curve (AUC) of 0.85. Nonetheless, the GRU model is observed to perform slightly better than LSTM model with respect to detection rate . The performance of both models is similar in terms of false alarm rate.
Direct and indirect reinforcement learning
Guan, Yang, Li, Shengbo Eben, Duan, Jingliang, Li, Jie, Ren, Yangang, Cheng, Bo
Reinforcement learning (RL) algorithms have been successfully applied to a range of challenging sequential decision making and control tasks. In this paper, we classify RL into direct and indirect methods according to how they seek optimal policy of the Markov Decision Process (MDP) problem. The former solves optimal policy by directly maximizing an objective function using gradient descent method, in which the objective function is usually the expectation of accumulative future rewards. The latter indirectly finds the optimal policy by solving the Bellman equation, which is the sufficient and necessary condition from Bellman's principle of optimality. We take vanilla policy gradient and approximate policy iteration to study their internal relationship, and reveal that both direct and indirect methods can be unified in actor-critic architecture and are equivalent if we always choose stationary state distribution of current policy as initial state distribution of MDP. Finally, we classify the current mainstream RL algorithms and compare the differences between other criteria including value-based and policy-based, model-based and model-free.
Parameterized Indexed Value Function for Efficient Exploration in Reinforcement Learning
Tan, Tian, Xiong, Zhihan, Dwaracherla, Vikranth R.
It is well known that quantifying uncertainty in the action-value estimates is crucial for efficient exploration in reinforcement learning. Ensemble sampling offers a relatively computationally tractable way of doing this using randomized value functions. However, it still requires a huge amount of computational resources for complex problems. In this paper, we present an alternative, computationally efficient way to induce exploration using index sampling. We use an indexed value function to represent uncertainty in our action-value estimates. We first present an algorithm to learn parameterized indexed value function through a distributional version of temporal difference in a tabular setting and prove its regret bound. Then, in a computational point of view, we propose a dual-network architecture, Parameterized Indexed Networks (PINs), comprising one mean network and one uncertainty network to learn the indexed value function. Finally, we show the efficacy of PINs through computational experiments.
Teaching Responsible Data Science: Charting New Pedagogical Territory
Stoyanovich, Julia, Lewis, Armanda
Although numerous ethics courses are available, with many focusing specifically on technology and computer ethics, pedagogical approaches employed in these courses rely exclusively on texts rather than on software development or data analysis. Technical students often consider these courses unimportant and a distraction from the "real" material. To develop instructional materials and methodologies that are thoughtful and engaging, we must strive for balance: between texts and coding, between critique and solution, and between cutting-edge research and practical applicability. Finding such balance is particularly difficult in the nascent field of responsible data science (RDS), where we are only starting to understand how to interface between the intrinsically different methodologies of engineering and social sciences. In this paper we recount a recent experience in developing and teaching an RDS course to graduate and advanced undergraduate students in data science. We then dive into an area that is critically important to RDS -- transparency and interpretability of machine-assisted decision-making, and tie this area to the needs of emerging RDS curricula. Recounting our own experience, and leveraging literature on pedagogical methods in data science and beyond, we propose the notion of an "object-to-interpret-with". We link this notion to "nutritional labels" -- a family of interpretability tools that are gaining popularity in RDS research and practice. With this work we aim to contribute to the nascent area of RDS education, and to inspire others in the community to come together to develop a deeper theoretical understanding of the pedagogical needs of RDS, and contribute concrete educational materials and methodologies that others can use. All course materials are publicly available at https://dataresponsibly.github.io/courses.
Bringing Belief Base Change into Dynamic Epistemic Logic
AGM's belief revision is one of the main paradigms in the study of belief change operations. In this context, belief bases (prioritised bases) have been primarily used to specify the agent's belief state. While the connection of iterated AGM-like operations and their encoding in dynamic epistemic logics have been studied before, few works considered how well-known postulates from iterated belief revision theory can be characterised by means of belief bases and their counterpart in dynamic epistemic logic. Particularly, it has been shown that some postulates can be characterised through transformations in priority graphs, while others may not be represented that way. This work investigates changes in the semantics of Dynamic Preference Logic that give rise to an appropriate syntactic representation for its models that allow us to represent and reason about iterated belief base change in this logic.
Tag-less Back-Translation
Abdulmumin, Idris, Galadanci, Bashir Shehu, Garba, Aliyu
An effective method to generate a large number of parallel sentences for training improved neural machine translation (NMT) systems is the use of back-translations of the target-side monolingual data. Tagging, or using gates, has been used to enable translation models to distinguish between synthetic and natural data. This improves standard back-translation and also enables the use of iterative back-translation on language pairs that underperformed using standard back-translation. This work presents a simplified approach of differentiating between the two data using pretraining and finetuning. The approach - tag-less back-translation - trains the model on the synthetic data and finetunes it on the natural data. Preliminary experiments have shown the approach to continuously outperform the tagging approach on low resource English-Vietnamese neural machine translation. While the need for tagging (noising) the dataset has been removed, the approach outperformed the tagged back-translation approach by an average of 0.4 BLEU.