online sequential
An FPGA-Based On-Device Reinforcement Learning Approach using Online Sequential Learning
Watanabe, Hirohisa, Tsukada, Mineto, Matsutani, Hiroki
DQN (Deep Q-Network) is a method to perform Q-learning for reinforcement learning using deep neural networks. DQNs require a large buffer and batch processing for experience replay and rely on backpropagation-based iterative optimization, making them difficult to implement on resource-limited edge devices. In this paper, we propose a lightweight on-device reinforcement learning approach for low-cost FPGA devices. It exploits a recently proposed neural-network-based on-device learning approach that does not rely on backpropagation but instead uses an OS-ELM (Online Sequential Extreme Learning Machine) based training algorithm. In addition, we propose a combination of L2 regularization and spectral normalization for on-device reinforcement learning so that the output values of the neural network are kept within a certain range and the reinforcement learning becomes stable. The proposed reinforcement learning approach is designed for the Xilinx PYNQ-Z1 board as a low-cost FPGA platform. The evaluation results using OpenAI Gym demonstrate that the proposed algorithm and its FPGA implementation without data transfer overhead complete a CartPole-v0 task 29.76x and 125.88x faster, respectively, than a conventional DQN-based approach when the number of hidden-layer nodes is 64.
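To make the idea of backpropagation-free training in the abstract above more concrete, the following is a minimal sketch of an OS-ELM style update with L2-regularized initialization. It is not the authors' implementation: the class name TinyOSELM, the tanh activation, the one-sample-at-a-time update, the hidden-layer size, and the toy sine-regression task are all assumptions chosen for illustration, and spectral normalization and the Q-learning loop are not shown.

```python
import math
import random


class TinyOSELM:
    """Illustrative single-hidden-layer OS-ELM, updated one sample at a time.

    Hypothetical helper for this sketch; not taken from the paper.
    """

    def __init__(self, n_in, n_hidden, n_out, l2=0.1, seed=0):
        rng = random.Random(seed)
        # Input weights and biases are random and never trained (the ELM idea).
        self.alpha = [[rng.uniform(-1, 1) for _ in range(n_hidden)]
                      for _ in range(n_in)]
        self.bias = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        # Output weights start at zero. P starts as (l2 * I)^-1, which
        # corresponds to an L2-regularized least-squares problem (assumption).
        self.beta = [[0.0] * n_out for _ in range(n_hidden)]
        self.P = [[(1.0 / l2 if i == j else 0.0) for j in range(n_hidden)]
                  for i in range(n_hidden)]

    def hidden(self, x):
        return [math.tanh(sum(x[i] * self.alpha[i][j] for i in range(len(x)))
                          + self.bias[j])
                for j in range(len(self.bias))]

    def predict(self, x):
        h = self.hidden(x)
        return [sum(h[j] * self.beta[j][k] for j in range(len(h)))
                for k in range(len(self.beta[0]))]

    def update(self, x, t):
        """One sequential update; no backpropagation and no matrix inverse."""
        h = self.hidden(x)
        n = len(h)
        # Prediction error before the update: t - h^T beta.
        err = [t[k] - sum(h[j] * self.beta[j][k] for j in range(n))
               for k in range(len(t))]
        Ph = [sum(self.P[i][j] * h[j] for j in range(n)) for i in range(n)]
        denom = 1.0 + sum(h[i] * Ph[i] for i in range(n))
        # P <- P - (P h h^T P) / (1 + h^T P h); P is symmetric.
        for i in range(n):
            for j in range(n):
                self.P[i][j] -= Ph[i] * Ph[j] / denom
        # beta <- beta + P_new h err^T, where P_new h equals P h / denom.
        for i in range(n):
            for k in range(len(t)):
                self.beta[i][k] += (Ph[i] / denom) * err[k]


def mse(model, data):
    return sum((model.predict(x)[0] - t[0]) ** 2 for x, t in data) / len(data)


# Toy regression task (assumption): learn sin(x) on [-2, 2].
xs = [-2.0 + 4.0 * i / 49 for i in range(50)]
data = [([x], [math.sin(x)]) for x in xs]

model = TinyOSELM(n_in=1, n_hidden=16, n_out=1, l2=0.1, seed=0)
mse_before = mse(model, data)
for x, t in data:
    model.update(x, t)
mse_after = mse(model, data)
print(mse_before, mse_after)
```

Only the output weights beta and the matrix P change during training, and each update uses a fixed number of multiply-accumulate operations. That is one plausible reason such a scheme suits a small FPGA, although the paper's actual hardware design may differ from this sketch.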
AdaBoost-assisted Extreme Learning Machine for Efficient Online Sequential Classification
Chen, Yi-Ta, Chuang, Yu-Chuan, Wu, An-Yeu
In this paper, we propose an AdaBoost-assisted extreme learning machine for efficient online sequential classification (AOS-ELM). To achieve better accuracy in online sequential learning scenarios, we use the cost-sensitive AdaBoost algorithm, which diversifies the weak classifiers, and add a forgetting mechanism, which stabilizes performance during the training procedure. Hence, AOS-ELM adapts better to sequentially arriving data than other voting-based methods. The experimental results show that AOS-ELM achieves 94.41% accuracy on the MNIST dataset, which is the theoretical accuracy bound achieved by the original batch learning algorithm, AdaBoost-ELM. Moreover, with the forgetting mechanism, the standard deviation of accuracy during the online sequential learning process is reduced by 8.26x.