Dynamic Tsetlin Machine Accelerators for On-Chip Training at the Edge using FPGAs

Mao, Gang, Rahman, Tousif, Maheshwari, Sidharth, Pattison, Bob, Shao, Zhuang, Shafik, Rishad, Yakovlev, Alex

arXiv.org Artificial Intelligence 

Abstract—The increased demand for data privacy and security in machine learning (ML) applications has lent impetus to effective edge training on Internet-of-Things (IoT) nodes. Edge training aims to deliver speed, energy efficiency, and adaptability within the resource constraints of the nodes. This paper presents a Dynamic Tsetlin Machine (DTM) training accelerator as an alternative to DNN implementations. Underpinned by the Vanilla and Coalesced Tsetlin Machine algorithms, the dynamic aspect of the accelerator design allows for run-time reconfiguration targeting different datasets, model architectures, and model sizes without resynthesis. This makes the DTM well suited to multivariate sensor-based edge tasks. Compared to DNNs, the DTM trains with fewer multiply-accumulate operations and no derivative computation. It is a data-centric ML algorithm that learns by aligning Tsetlin automata with input data to form logical propositions, enabling efficient Lookup-Table (LUT) mapping and frugal Block RAM usage in FPGA training implementations. The proposed accelerator offers 2.54x more Giga operations per second per Watt (GOP/s per W) and uses 6x less power than the next-best comparable design.

Index Terms—Edge Training, Coalesced Tsetlin Machines, Dynamic Tsetlin Machines, Embedded FPGA, Machine Learning Accelerator, On-Chip Learning, Logic-based learning.

Machine Learning (ML) offers a generalized approach to developing autonomous applications from "Internet-of-Things" (IoT) sensor data. Placing ML execution units in close proximity to the sensor, at the so-called edge, enables faster task execution with high data security and privacy. However, sensor degradation and environmental factors may require recalibration [1] or user-personalized on-field training [2] to ensure continued functionality. Implementing solutions to these challenges is nontrivial. It requires finding the right balance between achieving the appropriate learning efficacy for the ML problem and the restrictive compute/memory resources available on the platforms [3]. For ML inference tasks on edge nodes, these challenges have been widely explored, e.g., quantization [4], sparsity-based compression, and pruning for the most commonly used Deep Neural Network (DNN) models [3], [5], [6].

This work was supported by EPSRC EP/X036006/1 Scalability Oriented Novel Network of Event Triggered Systems (SONNETS) project and by EPSRC EP/X039943/1 UKRI-RCN: Exploiting the dynamics of self-timed machine learning hardware (ESTEEM) project.
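To make the abstract's claim of derivative-free, data-centric learning concrete, the sketch below models the basic building block of a Tsetlin Machine: a single two-action Tsetlin automaton whose integer state is nudged by reward and penalty feedback. This is a minimal illustration only, not the paper's DTM accelerator design; the class and method names are illustrative assumptions.

```python
class TsetlinAutomaton:
    """Minimal two-action Tsetlin automaton with 2*n states.

    States 1..n select action 0 (e.g. 'exclude' a literal); states
    n+1..2n select action 1 (e.g. 'include' it). Reward feedback
    pushes the state deeper into the current action's half, while
    penalty feedback pushes it toward the boundary and can flip
    the action. Learning is purely integer increments/decrements:
    no multiply-accumulates and no derivatives are involved.
    """

    def __init__(self, n_states_per_action=100):
        self.n = n_states_per_action
        # Start at the boundary, on the 'exclude' side.
        self.state = self.n

    def action(self):
        return 0 if self.state <= self.n else 1

    def reward(self):
        # Reinforce the currently selected action.
        if self.action() == 0:
            self.state = max(1, self.state - 1)
        else:
            self.state = min(2 * self.n, self.state + 1)

    def penalize(self):
        # Weaken the currently selected action; may flip it.
        if self.action() == 0:
            self.state += 1
        else:
            self.state -= 1
```

Because each automaton is just a bounded counter, a bank of them maps naturally onto LUTs and narrow RAM words on an FPGA, which is the hardware-friendliness the abstract alludes to.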