Goto

Collaborating Authors

 Genov, Roman


Seizure Detection and Prediction by Parallel Memristive Convolutional Neural Networks

arXiv.org Artificial Intelligence

During the past two decades, epileptic seizure detection and prediction algorithms have evolved rapidly. However, despite significant performance improvements, their hardware implementation using conventional technologies, such as Complementary Metal-Oxide-Semiconductor (CMOS), in power and area-constrained settings remains a challenging task; especially when many recording channels are used. In this paper, we propose a novel low-latency parallel Convolutional Neural Network (CNN) architecture that has between 2-2,800x fewer network parameters compared to SOTA CNN architectures and achieves 5-fold cross validation accuracy of 99.84% for epileptic seizure detection, and 99.01% and 97.54% for epileptic seizure prediction, when evaluated using the University of Bonn Electroencephalogram (EEG), CHB-MIT and SWEC-ETHZ seizure datasets, respectively. We subsequently implement our network onto analog crossbar arrays comprising Resistive Random-Access Memory (RRAM) devices, and provide a comprehensive benchmark by simulating, laying out, and determining hardware requirements of the CNN component of our system. To the best of our knowledge, we are the first to parallelize the execution of convolution layer kernels on separate analog crossbars to enable 2 orders of magnitude reduction in latency compared to SOTA hybrid Memristive-CMOS DL accelerators. Furthermore, we investigate the effects of non-idealities on our system and investigate Quantization Aware Training (QAT) to mitigate the performance degradation due to low ADC/DAC resolution. Finally, we propose a stuck weight offsetting methodology to mitigate performance degradation due to stuck RON/ROFF memristor weights, recovering up to 32% accuracy, without requiring retraining. The CNN component of our platform is estimated to consume approximately 2.791W of power while occupying an area of 31.255mm$^2$ in a 22nm FDSOI CMOS process.


Toward A Formalized Approach for Spike Sorting Algorithms and Hardware Evaluation

arXiv.org Artificial Intelligence

Spike sorting algorithms are used to separate extracellular recordings of neuronal populations into single-unit spike activities. The development of customized hardware implementing spike sorting algorithms is burgeoning. However, there is a lack of a systematic approach and a set of standardized evaluation criteria to facilitate direct comparison of both software and hardware implementations. In this paper, we formalize a set of standardized criteria and a publicly available synthetic dataset entitled Synthetic Simulations Of Extracellular Recordings (SSOER), which was constructed by aggregating existing synthetic datasets with varying Signal-To-Noise Ratios (SNRs). Furthermore, we present a benchmark for future comparison, and use our criteria to evaluate a simulated Resistive Random-Access Memory (RRAM) In-Memory Computing (IMC) system using the Discrete Wavelet Transform (DWT) for feature extraction. Our system consumes approximately (per channel) 10.72mW and occupies an area of 0.66mm$^2$ in a 22nm FDSOI Complementary Metal-Oxide-Semiconductor (CMOS) process.


Design Space Exploration of Dense and Sparse Mapping Schemes for RRAM Architectures

arXiv.org Artificial Intelligence

The impact of device and circuit-level effects in mixed-signal Resistive Random Access Memory (RRAM) accelerators typically manifest as performance degradation of Deep Learning (DL) algorithms, but the degree of impact varies based on algorithmic features. These include network architecture, capacity, weight distribution, and the type of inter-layer connections. Techniques are continuously emerging to efficiently train sparse neural networks, which may have activation sparsity, quantization, and memristive noise. In this paper, we present an extended Design Space Exploration (DSE) methodology to quantify the benefits and limitations of dense and sparse mapping schemes for a variety of network architectures. While sparsity of connectivity promotes less power consumption and is often optimized for extracting localized features, its performance on tiled RRAM arrays may be more susceptible to noise due to under-parameterization, when compared to dense mapping schemes. Moreover, we present a case study quantifying and formalizing the trade-offs of typical non-idealities introduced into 1-Transistor-1-Resistor (1T1R) tiled memristive architectures and the size of modular crossbar tiles using the CIFAR-10 dataset.


Node-Centric Graph Learning from Data for Brain State Identification

arXiv.org Machine Learning

Data-driven graph learning models a network by determining the strength of connections between its nodes. The data refers to a graph signal which associates a value with each graph node. Existing graph learning methods either use simplified models for the graph signal, or they are prohibitively expensive in terms of computational and memory requirements. This is particularly true when the number of nodes is high or there are temporal changes in the network. In order to consider richer models with a reasonable computational tractability, we introduce a graph learning method based on representation learning on graphs. Representation learning generates an embedding for each graph node, taking the information from neighbouring nodes into account. Our graph learning method further modifies the embeddings to compute the graph similarity matrix. In this work, graph learning is used to examine brain networks for brain state identification. We infer time-varying brain graphs from an extensive dataset of intracranial electroencephalographic (iEEG) signals from ten patients. We then apply the graphs as input to a classifier to distinguish seizure vs. non-seizure brain states. Using the binary classification metric of area under the receiver operating characteristic curve (AUC), this approach yields an average of 9.13 percent improvement when compared to two widely used brain network modeling methods.


Stochastic Mixed-Signal VLSI Architecture for High-Dimensional Kernel Machines

Neural Information Processing Systems

A mixed-signal paradigm is presented for high-resolution parallel innerproduct computationin very high dimensions, suitable for efficient implementation ofkernels in image processing. At the core of the externally digital architecture is a high-density, low-power analog array performing binary-binary partial matrix-vector multiplication. Full digital resolution is maintained even with low-resolution analog-to-digital conversion, owing torandom statistics in the analog summation of binary products. A random modulation scheme produces near-Bernoulli statistics even for highly correlated inputs. The approach is validated with real image data, and with experimental results from a CID/DRAM analog array prototype in 0.5