Dasgupta, Sakyasingha
Standing on the Shoulders of Giants: Hardware and Neural Architecture Co-Search with Hot Start
Jiang, Weiwen, Yang, Lei, Dasgupta, Sakyasingha, Hu, Jingtong, Shi, Yiyu
Hardware and neural architecture co-search that automatically generates Artificial Intelligence (AI) solutions from a given dataset is promising to promote AI democratization; however, the amount of time that is required by current co-search frameworks is in the order of hundreds of GPU hours for one target hardware. This inhibits the use of such frameworks on commodity hardware. The root cause of the low efficiency in existing co-search frameworks is the fact that they start from a "cold" state (i.e., search from scratch). In this paper, we propose a novel framework, namely HotNAS, that starts from a "hot" state based on a set of existing pre-trained models (a.k.a. model zoo) to avoid lengthy training time. As such, the search time can be reduced from 200 GPU hours to less than 3 GPU hours. In HotNAS, in addition to hardware design space and neural architecture search space, we further integrate a compression space to conduct model compressing during the co-search, which creates new opportunities to reduce latency but also brings challenges. One of the key challenges is that all of the above search spaces are coupled with each other, e.g., compression may not work without hardware design support. To tackle this issue, HotNAS builds a chain of tools to design hardware to support compression, based on which a global optimizer is developed to automatically co-search all the involved search spaces. Experiments on ImageNet dataset and Xilinx FPGA show that, within the timing constraint of 5ms, neural architectures generated by HotNAS can achieve up to 5.79% Top-1 and 3.97% Top-5 accuracy gain, compared with the existing ones.
Continual Learning via Online Leverage Score Sampling
Teng, Dan, Dasgupta, Sakyasingha
In order to mimic the human ability of continual acquisition and transfer of knowledge across various tasks, a learning system needs the capability for continual learning, effectively utilizing the previously acquired skills. As such, the key challenge is to transfer and generalize the knowledge learned from one task to other tasks, avoiding forgetting and interference of previous knowledge and improving the overall performance. In this paper, within the continual learning paradigm, we introduce a method that effectively forgets the less useful data samples continuously and allows beneficial information to be kept for training of the subsequent tasks, in an online manner. The method uses statistical leverage score information to measure the importance of the data samples in every task and adopts frequent directions approach to enable a continual or life-long learning property. This effectively maintains a constant training size across all tasks. We first provide mathematical intuition for the method and then demonstrate its effectiveness in avoiding catastrophic forgetting and computational efficiency on continual learning of classification tasks when compared with the existing state-of-the-art techniques.
Model-based Deep Reinforcement Learning for Dynamic Portfolio Optimization
Yu, Pengqian, Lee, Joon Sern, Kulyatin, Ilya, Shi, Zekun, Dasgupta, Sakyasingha
Dynamic portfolio optimization is the process of sequentially allocating wealth to a collection of assets in some consecutive trading periods, based on investors' return-risk profile. Automating this process with machine learning remains a challenging problem. Here, we design a deep reinforcement learning (RL) architecture with an autonomous trading agent such that, investment decisions and actions are made periodically, based on a global objective, with autonomy. In particular, without relying on a purely model-free RL agent, we train our trading agent using a novel RL architecture consisting of an infused prediction module (IPM), a generative adversarial data augmentation module (DAM) and a behavior cloning module (BCM). Our model-based approach works with both on-policy or off-policy RL algorithms. We further design the back-testing and execution engine which interact with the RL agent in real time. Using historical {\em real} financial market data, we simulate trading with practical constraints, and demonstrate that our proposed model is robust, profitable and risk-sensitive, as compared to baseline trading strategies and model-free RL agents from prior work.
Internal Model from Observations for Reward Shaping
Kimura, Daiki, Chaudhury, Subhajit, Tachibana, Ryuki, Dasgupta, Sakyasingha
Reinforcement learning methods require careful design involving a reward function to obtain the desired action policy for a given task. In the absence of hand-crafted reward functions, prior work on the topic has proposed several methods for reward estimation by using expert state trajectories and action pairs. However, there are cases where complete or good action information cannot be obtained from expert demonstrations. We propose a novel reinforcement learning method in which the agent learns an internal model of observation on the basis of expert-demonstrated state trajectories to estimate rewards without completely learning the dynamics of the external environment from state-action pairs. The internal model is obtained in the form of a predictive model for the given expert state distribution. During reinforcement learning, the agent predicts the reward as a function of the difference between the actual state and the state predicted by the internal model. We conducted multiple experiments in environments of varying complexity, including the Super Mario Bros and Flappy Bird games. We show our method successfully trains good policies directly from expert game-play videos.
Object Detection using Domain Randomization and Generative Adversarial Refinement of Synthetic Images
Nogue, Fernando Camaro, Huie, Andrew, Dasgupta, Sakyasingha
In this work, we present an application of domain randomization and generative adversarial networks (GAN) to train a near real-time object detector for industrial electric parts, entirely in a simulated environment. Large scale availability of labelled real world data is typically rare and difficult to obtain in many industrial settings. As such here, only a few hundred of unlabelled real images are used to train a Cyclic-GAN network, in combination with various degree of domain randomization procedures. We demonstrate that this enables robust translation of synthetic images to the real world domain. We show that a combination of the original synthetic (simulation) and GAN translated images, when used for training a Mask-RCNN object detection network achieves greater than 0.95 mean average precision in detecting and classifying a collection of industrial electric parts. We evaluate the performance across different combinations of training data.
Dynamic Boltzmann Machines for Second Order Moments and Generalized Gaussian Distributions
Raymond, Rudy, Osogami, Takayuki, Dasgupta, Sakyasingha
Dynamic Boltzmann Machine (DyBM) has been shown highly efficient to predict time-series data. Gaussian DyBM is a DyBM that assumes the predicted data is generated by a Gaussian distribution whose first-order moment (mean) dynamically changes over time but its second-order moment (variance) is fixed. However, in many financial applications, the assumption is quite limiting in two aspects. First, even when the data follows a Gaussian distribution, its variance may change over time. Such variance is also related to important temporal economic indicators such as the market volatility. Second, financial time-series data often requires learning datasets generated by the generalized Gaussian distribution with an additional shape parameter that is important to approximate heavy-tailed distributions. Addressing those aspects, we show how to extend DyBM that results in significant performance improvement in predicting financial time-series data.
Nonlinear Dynamic Boltzmann Machines for Time-Series Prediction
Dasgupta, Sakyasingha (IBM Research - Tokyo) | Osogami, Takayuki (IBM Research - Tokyo)
The dynamic Boltzmann machine (DyBM) has been proposed as a stochastic generative model of multi-dimensional time series, with an exact, learning rule that maximizes the log-likelihood of a given time series. The DyBM, however, is defined only for binary valued data, without any nonlinear hidden units. Here, in our first contribution, we extend the DyBM to deal with real valued data. We present a formulation called Gaussian DyBM, that can be seen as an extension of a vector autoregressive (VAR) model. This uses, in addition to standard (explanatory) variables, components that captures long term dependencies in the time series. In our second contribution, we extend the Gaussian DyBM model with a recurrent neural network (RNN) that controls the bias input to the DyBM units. We derive a stochastic gradient update rule such that, the output weights from the RNN can also be trained online along with other DyBM parameters. Furthermore, this acts as nonlinear hidden layer extending the capacity of DyBM and allows it to model nonlinear components in a given time-series. Numerical experiments with synthetic datasets show that the RNN-Gaussian DyBM improves predictive accuracy upon standard VAR by up to 35%. On real multi-dimensional time-series prediction, consisting of high nonlinearity and non-stationarity, we demonstrate that this nonlinear DyBM model achieves significant improvement upon state of the art baseline methods like VAR and long short-term memory (LSTM) networks at a reduced computational cost.
Regularized Dynamic Boltzmann Machine with Delay Pruning for Unsupervised Learning of Temporal Sequences
Dasgupta, Sakyasingha, Yoshizumi, Takayuki, Osogami, Takayuki
We introduce Delay Pruning, a simple yet powerful technique to regularize dynamic Boltzmann machines (DyBM). The recently introduced DyBM provides a particularly structured Boltzmann machine, as a generative model of a multi-dimensional time-series. This Boltzmann machine can have infinitely many layers of units but allows exact inference and learning based on its biologically motivated structure. DyBM uses the idea of conduction delays in the form of fixed length first-in first-out (FIFO) queues, with a neuron connected to another via this FIFO queue, and spikes from a pre-synaptic neuron travel along the queue to the post-synaptic neuron with a constant period of delay. Here, we present Delay Pruning as a mechanism to prune the lengths of the FIFO queues (making them zero) by setting some delay lengths to one with a fixed probability, and finally selecting the best performing model with fixed delays. The uniqueness of structure and a non-sampling based learning rule in DyBM, make the application of previously proposed regularization techniques like Dropout or DropConnect difficult, leading to poor generalization. First, we evaluate the performance of Delay Pruning to let DyBM learn a multidimensional temporal sequence generated by a Markov chain. Finally, we show the effectiveness of delay pruning in learning high dimensional sequences using the moving MNIST dataset, and compare it with Dropout and DropConnect methods.
Multiple chaotic central pattern generators with learning for legged locomotion and malfunction compensation
Ren, Guanjiao, Chen, Weihai, Dasgupta, Sakyasingha, Kolodziejski, Christoph, Wörgötter, Florentin, Manoonpong, Poramate
An originally chaotic system can be controlled into various periodic dynamics. When it is implemented into a legged robot's locomotion control as a central pattern generator (CPG), sophisticated gait patterns arise so that the robot can perform various walking behaviors. However, such a single chaotic CPG controller has difficulties dealing with leg malfunction. Specifically, in the scenarios presented here, its movement permanently deviates from the desired trajectory. To address this problem, we extend the single chaotic CPG to multiple CPGs with learning. The learning mechanism is based on a simulated annealing algorithm. In a normal situation, the CPGs synchronize and their dynamics are identical. With leg malfunction or disability, the CPGs lose synchronization leading to independent dynamics. In this case, the learning mechanism is applied to automatically adjust the remaining legs' oscillation frequencies so that the robot adapts its locomotion to deal with the malfunction. As a consequence, the trajectory produced by the multiple chaotic CPGs resembles the original trajectory far better than the one produced by only a single CPG. The performance of the system is evaluated first in a physical simulation of a quadruped as well as a hexapod robot and finally in a real six-legged walking machine called AMOSII. The experimental results presented here reveal that using multiple CPGs with learning is an effective approach for adaptive locomotion generation where, for instance, different body parts have to perform independent movements for malfunction compensation.