
Quantifying Memory Utilization with Effective State-Size

Parnichkun, Rom N., Tumma, Neehal, Thomas, Armin W., Moro, Alessandro, An, Qi, Suzuki, Taiji, Yamashita, Atsushi, Poli, Michael, Massaroli, Stefano

arXiv.org Artificial Intelligence

The need to develop a general framework for architecture analysis is becoming increasingly important, given the expanding design space of sequence models. To this end, we draw insights from classical signal processing and control theory to develop a quantitative measure of memory utilization: the internal mechanisms through which a model stores past information to produce future outputs. This metric, which we call effective state-size (ESS), is tailored to the fundamental class of systems with input-invariant and input-varying linear operators, encompassing a variety of computational units such as variants of attention, convolutions, and recurrences. Unlike prior work on memory utilization, which either relies on raw operator visualizations (e.g. attention maps) or simply the total memory capacity (i.e. cache size) of a model, our metrics provide highly interpretable and actionable measurements. In particular, we show how ESS can be leveraged to improve initialization strategies, inform novel regularizers, and advance the performance-efficiency frontier through model distillation. Furthermore, we demonstrate that the effect of context delimiters (such as end-of-speech tokens) on ESS highlights cross-architectural differences in how large language models utilize their available memory to recall information. Overall, we find that ESS provides valuable insights into the dynamics that dictate memory utilization, enabling the design of more efficient and effective sequence models.
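A minimal sketch of the idea behind such a metric, assuming (as the control-theoretic framing suggests, though the abstract does not spell out the formula) that the effective state-size at step i is measured as the rank of the submatrix of the causal sequence operator that maps inputs before step i to outputs at step i onward; the function name and tolerance are illustrative choices, not the paper's API:

```python
import numpy as np

def effective_state_size(T: np.ndarray, i: int, tol: float = 1e-8) -> int:
    """ESS at step i: rank of the block of the causal operator T that
    carries information from inputs 0..i-1 into outputs i..L-1."""
    block = T[i:, :i]
    if block.size == 0:
        return 0
    return int(np.linalg.matrix_rank(block, tol=tol))

# Example: a causal running-average operator over a length-6 sequence.
L = 6
T = np.tril(np.ones((L, L))) / np.arange(1, L + 1)[:, None]

ess = [effective_state_size(T, i) for i in range(1, L)]
# Every row of each off-diagonal block is a scaled copy of the same
# pattern, so the running average only ever "remembers" one number:
# its ESS is 1 at every step, far below its cache size.
```

By contrast, a strictly diagonal operator (one that ignores the past entirely) has ESS 0 everywhere, which matches the intuition that rank, not raw cache size, captures memory utilization.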


Learning Broken Symmetries with Approximate Invariance

Nabat, Seth, Ghosh, Aishik, Witkowski, Edmund, Kasieczka, Gregor, Whiteson, Daniel

arXiv.org Artificial Intelligence

Recognizing symmetries in data allows for significant boosts in neural network training, which is especially important where training data are limited. In many cases, however, the exact underlying symmetry is present only in an idealized dataset, and is broken in actual data, due to asymmetries in the detector, or varying response resolution as a function of particle momentum. Standard approaches, such as data augmentation or equivariant networks, fail to represent the nature of the full, broken symmetry, effectively overconstraining the response of the neural network. We propose a learning model which balances the generality and asymptotic performance of unconstrained networks with the rapid learning of constrained networks. This is achieved through a dual-subnet structure, where one network is constrained by the symmetry and the other is not, along with a learned symmetry factor. In a simplified toy example that demonstrates violation of Lorentz invariance, our model learns as rapidly as symmetry-constrained networks but escapes their performance limitations.
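The dual-subnet structure can be sketched as a learned convex blend of a symmetry-constrained branch and an unconstrained branch. This is an illustrative toy (rotation invariance in 2D rather than Lorentz invariance, with untrained random weights); the subnet forms, names, and the sigmoid parameterization of the symmetry factor are assumptions, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
d, h = 2, 4
w_c = rng.normal(size=(1, h))   # weights of the constrained subnet
w_f = rng.normal(size=(d, h))   # weights of the free subnet
alpha_logit = 0.0               # learned symmetry factor (trainable)

def f_constrained(x):
    # Constrained subnet: sees only a rotation invariant of the input
    # (its norm), so it is exactly symmetric by construction.
    r = np.linalg.norm(x, axis=-1, keepdims=True)
    return np.tanh(r @ w_c)

def f_free(x):
    # Unconstrained subnet: sees raw coordinates, so it can model
    # symmetry-breaking effects (e.g. detector asymmetries).
    return np.tanh(x @ w_f)

def model(x):
    # Sigmoid keeps the symmetry factor in (0, 1); training can push it
    # toward 1 (fully symmetric) or 0 (fully free) as the data demand.
    alpha = 1.0 / (1.0 + np.exp(-alpha_logit))
    return alpha * f_constrained(x) + (1.0 - alpha) * f_free(x)
```

Early in training the constrained branch provides a strong inductive bias; as the symmetry factor adapts, the free branch can absorb whatever residual asymmetry the data contain.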


Towards Efficient Deployment of Hybrid SNNs on Neuromorphic and Edge AI Hardware

Seekings, James, Chandarana, Peyton, Ardakani, Mahsa, Mohammadi, MohammadReza, Zand, Ramtin

arXiv.org Artificial Intelligence

This paper explores the synergistic potential of neuromorphic and edge computing to create a versatile machine learning (ML) system tailored for processing data captured by dynamic vision sensors. We construct and train hybrid models, blending spiking neural networks (SNNs) and artificial neural networks (ANNs) using PyTorch and Lava frameworks. Our hybrid architecture integrates an SNN for temporal feature extraction and an ANN for classification. We delve into the challenges of deploying such hybrid structures on hardware. Specifically, we deploy individual components on Intel's Neuromorphic Processor Loihi (for SNN) and Jetson Nano (for ANN). We also propose an accumulator circuit to transfer data from the spiking to the non-spiking domain. Furthermore, we conduct comprehensive performance analyses of hybrid SNN-ANN models on a heterogeneous system of neuromorphic and edge AI hardware, evaluating accuracy, latency, power, and energy consumption. Our findings demonstrate that the hybrid spiking networks surpass the baseline ANN model across all metrics and outperform the baseline SNN model in accuracy and latency.
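The accumulator between the spiking and non-spiking stages can be sketched in software as a windowed spike counter that converts binary spike trains into rate-coded activations for the ANN. This is a minimal functional sketch, not the paper's hardware circuit; the window size and normalization are illustrative assumptions:

```python
import numpy as np

def accumulate_spikes(spikes: np.ndarray, window: int) -> np.ndarray:
    """Convert a binary spike train of shape (T, C) into rate-coded
    values of shape (T // window, C) by counting spikes per window,
    mimicking an accumulator bridging the SNN and ANN stages."""
    T, C = spikes.shape
    T_trim = (T // window) * window          # drop any incomplete window
    windows = spikes[:T_trim].reshape(-1, window, C)
    return windows.sum(axis=1) / window      # spike rate in [0, 1]

# Two channels of spikes over six timesteps, accumulated in windows of 3.
spikes = np.array([[1, 0], [0, 1], [1, 1],
                   [0, 0], [1, 0], [1, 1]], dtype=float)
rates = accumulate_spikes(spikes, window=3)
```

The resulting dense rate tensor is exactly the kind of real-valued input a conventional ANN classifier on an edge device (such as the Jetson Nano stage described above) expects.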


Neuromimetic metaplasticity for adaptive continual learning

Cho, Suhee, Lee, Hyeonsu, Baek, Seungdae, Paik, Se-Bum

arXiv.org Artificial Intelligence

Conventional intelligent systems based on deep neural network (DNN) models encounter challenges in achieving human-like continual learning due to catastrophic forgetting. Here, we propose a metaplasticity model inspired by human working memory, enabling DNNs to perform catastrophic forgetting-free continual learning without any pre- or post-processing. A key aspect of our approach involves implementing distinct types of synapses, ranging from stable to flexible, and randomly intermixing them to train synaptic connections with different degrees of flexibility. This strategy allowed the network to successfully learn a continuous stream of information, even under unexpected changes in input length. The model achieved a balanced tradeoff between memory capacity and performance without requiring additional training or structural modifications, dynamically allocating memory resources to retain both old and new information. Furthermore, the model demonstrated robustness against data poisoning attacks by selectively filtering out erroneous memories, leveraging the Hebb repetition effect to reinforce the retention of significant data.
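The stable-to-flexible synapse mixture can be sketched as a per-synapse plasticity coefficient drawn at initialization: stable synapses barely move under gradient updates (protecting old memories), while flexible ones adapt quickly to new data. The two-level coefficient values and the plain gradient step are illustrative assumptions, not the paper's exact rule:

```python
import numpy as np

rng = np.random.default_rng(42)
n_synapses = 8

# Randomly intermix stable (tiny learning rate) and flexible (large
# learning rate) synapses, fixed at initialization.
flexibility = rng.choice([0.01, 1.0], size=n_synapses, p=[0.5, 0.5])
w = np.zeros(n_synapses)

def update(w: np.ndarray, grad: np.ndarray,
           flexibility: np.ndarray) -> np.ndarray:
    # Per-synapse gradient step: stable synapses resist change while
    # flexible synapses absorb new information.
    return w - flexibility * grad

grad = np.ones(n_synapses)   # a uniform gradient from some new task
w_new = update(w, grad, flexibility)
```

Under a stream of tasks, the stable subset retains earlier information while the flexible subset tracks the current task, giving the capacity/performance tradeoff the abstract describes without any replay or architectural growth.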


Lyft says its future lies in a hybrid network of autonomous and driver rides

Engadget

Lyft drivers don't have to worry about being fully replaced by the company's autonomous vehicles just yet. Company president John Zimmer told CNBC that Lyft intends to operate a hybrid network at first, with a fleet that's largely composed of non-autonomous cars. "[J]ust like what happened with phones, you didn't have 3G go to 4G go to 5G on separate networks," Zimmer explained. And similar to when LTE was new and mobile users mostly had to connect to the internet via 3G, Lyft passengers will also largely have to rely on rideshare drivers. Zimmer envisions a network wherein autonomous vehicles will only be taking five percent of all trips at first, with rideshare drivers taking the lion's share of the rides booked through the platform.


Neural Architecture Dilation for Adversarial Robustness

Li, Yanxi, Yang, Zhaohui, Wang, Yunhe, Xu, Chang

arXiv.org Artificial Intelligence

With the tremendous advances in the architecture and scale of convolutional neural networks (CNNs) over the past few decades, they can easily reach or even exceed the performance of humans in certain tasks. However, a recently discovered shortcoming of CNNs is that they are vulnerable to adversarial attacks. Although the adversarial robustness of CNNs can be improved by adversarial training, there is a trade-off between standard accuracy and adversarial robustness. From the neural architecture perspective, this paper aims to improve the adversarial robustness of backbone CNNs that already achieve satisfactory accuracy. Under a minimal computational overhead, the introduction of a dilation architecture is expected to preserve the standard performance of the backbone CNN while pursuing adversarial robustness. Theoretical analyses on the standard and adversarial error bounds naturally motivate the proposed neural architecture dilation algorithm. Experimental results on real-world datasets and benchmark neural networks demonstrate the effectiveness of the proposed algorithm in balancing accuracy and adversarial robustness.