Raychowdhury, Arijit
FAVbot: An Autonomous Target Tracking Micro-Robot with Frequency Actuation Control
Hao, Zhijian, Lele, Ashwin, Fang, Yan, Raychowdhury, Arijit, Ansari, Azadeh
Robotic autonomy at the centimeter scale requires compact, miniaturization-friendly actuation integrated with sensing and neural-network processing within a tiny form factor. Such systems have advanced significantly in recent years in fields such as healthcare, manufacturing, and post-disaster rescue. System design at this scale places stringent constraints on the power consumption of both the sensory front-end and the actuation back-end, and on the weight of the electronic assembly, for robust operation. In this paper, we introduce FAVbot, the first autonomous mobile micro-robotic system to combine a novel actuation mechanism with convolutional neural network (CNN)-based computer vision, all within a compact 3-cm form factor. The actuation mechanism exploits mechanical resonance to achieve frequency-controlled steering with a single piezoelectric actuator. Experimental results demonstrate the effectiveness of FAVbot's frequency-controlled actuation, which offers a diverse selection of resonance modes with distinct motion characteristics. The actuation system is complemented by a vision front-end in which a camera and a microcontroller support object detection for closed-loop control and autonomous target tracking, enabling adaptive navigation in dynamic environments. This work contributes to the evolving landscape of neural-network-enabled micro-robotic systems, demonstrating the smallest autonomous robot built with a controllable multi-directional single-actuator mechanism.
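A minimal sketch of the closed-loop steering idea, assuming a hypothetical table of resonance modes and drive frequencies (the values below are illustrative, not the paper's): the detected target's horizontal offset in the camera frame selects which resonance mode, i.e. which drive frequency, is applied to the single piezoelectric actuator.

```python
# Hypothetical mode table: each resonance mode of the single actuator maps to a
# drive frequency and a nominal motion direction. Values are placeholders.
RESONANCE_MODES = {
    "forward":    {"freq_hz": 12_000},   # assumed mode: straight motion
    "turn_left":  {"freq_hz": 15_500},   # assumed mode: yaw left
    "turn_right": {"freq_hz": 18_200},   # assumed mode: yaw right
}

def select_drive_frequency(bbox_center_x, frame_width, deadband=0.1):
    """Map the detected target's normalized horizontal offset to a drive frequency."""
    offset = (bbox_center_x / frame_width) - 0.5      # range -0.5 .. +0.5
    if offset < -deadband:
        mode = "turn_left"
    elif offset > deadband:
        mode = "turn_right"
    else:
        mode = "forward"
    return RESONANCE_MODES[mode]["freq_hz"]

# Example: target detected left of center in a 96-pixel-wide frame -> steer left.
print(select_drive_frequency(bbox_center_x=20, frame_width=96))   # 15500
```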
VAP: The Vulnerability-Adaptive Protection Paradigm Toward Reliable Autonomous Machines
Wan, Zishen, Gan, Yiming, Yu, Bo, Liu, Shaoshan, Raychowdhury, Arijit, Zhu, Yuhao
The next ubiquitous computing platform, following personal computers and smartphones, is poised to be inherently autonomous, encompassing technologies like drones, robots, and self-driving cars. Ensuring reliability for these autonomous machines is critical. However, current resiliency solutions make fundamental trade-offs between reliability and cost, resulting in significant overhead in performance, energy consumption, and chip area. This is due to the "one-size-fits-all" approach commonly used, where the same protection scheme is applied throughout the entire software computing stack. This paper presents the key insight that to achieve high protection coverage with minimal cost, we must leverage the inherent variations in robustness across different layers of the autonomous machine software stack. Specifically, we demonstrate that various nodes in this complex stack exhibit different levels of robustness against hardware faults. Our findings reveal that the front-end of an autonomous machine's software stack tends to be more robust, whereas the back-end is generally more vulnerable. Building on these inherent robustness differences, we propose a Vulnerability-Adaptive Protection (VAP) design paradigm. In this paradigm, the allocation of protection resources - whether spatially (e.g., through modular redundancy) or temporally (e.g., via re-execution) - is made inversely proportional to the inherent robustness of tasks or algorithms within the autonomous machine system. Experimental results show that VAP provides high protection coverage while maintaining low overhead in both autonomous vehicle and drone systems.
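The core allocation rule can be illustrated with a small sketch: protection effort (here, a re-execution budget) is split in inverse proportion to each stage's robustness, so the more vulnerable back-end receives most of the protection. Stage names and robustness scores below are hypothetical placeholders, not measurements from the paper.

```python
def allocate_protection(robustness, budget):
    """Split a fixed re-execution budget so that less robust stages receive more of it."""
    vulnerability = {stage: 1.0 - score for stage, score in robustness.items()}
    total = sum(vulnerability.values())
    return {stage: round(budget * v / total) for stage, v in vulnerability.items()}

stack_robustness = {            # hypothetical robustness to hardware faults (1.0 = most robust)
    "perception_frontend": 0.9,
    "localization":        0.7,
    "planning_backend":    0.4,
    "control_backend":     0.3,
}
print(allocate_protection(stack_robustness, budget=10))
# The less robust back-end stages receive most of the re-execution budget.
```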
Towards Efficient Neuro-Symbolic AI: From Workload Characterization to Hardware Architecture
Wan, Zishen, Liu, Che-Kai, Yang, Hanchen, Raj, Ritik, Li, Chaojian, You, Haoran, Fu, Yonggan, Wan, Cheng, Li, Sixu, Kim, Youbin, Samajdar, Ananda, Lin, Yingyan Celine, Ibrahim, Mohamed, Rabaey, Jan M., Krishna, Tushar, Raychowdhury, Arijit
The remarkable advancements in artificial intelligence (AI), primarily driven by deep neural networks, are facing challenges surrounding unsustainable computational trajectories, limited robustness, and a lack of explainability. To develop next-generation cognitive AI systems, neuro-symbolic AI emerges as a promising paradigm, fusing neural and symbolic approaches to enhance interpretability, robustness, and trustworthiness, while facilitating learning from much less data. Recent neuro-symbolic systems have demonstrated great potential in collaborative human-AI scenarios with reasoning and cognitive capabilities. In this paper, we aim to understand the workload characteristics and potential architectures for neuro-symbolic AI. We first systematically categorize neuro-symbolic AI algorithms, and then experimentally evaluate and analyze them in terms of runtime, memory, computational operators, sparsity, and system characteristics on CPUs, GPUs, and edge SoCs. Our studies reveal that neuro-symbolic models suffer from inefficiencies on off-the-shelf hardware, due to the memory-bound nature of vector-symbolic and logical operations, complex flow control, data dependencies, sparsity variations, and limited scalability. Based on profiling insights, we suggest cross-layer optimization solutions and present a hardware acceleration case study for vector-symbolic architecture to improve the performance, efficiency, and scalability of neuro-symbolic computing. Finally, we discuss the challenges and potential future directions of neuro-symbolic AI from both system and architectural perspectives.
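The memory-bound vector-symbolic operations highlighted by the profiling can be illustrated with a tiny example: binding two high-dimensional bipolar hypervectors by element-wise multiplication and probing similarity with a dot product. The dimensionality and bipolar (MAP-style) encoding are generic VSA conventions, not a specific model from the paper.

```python
import numpy as np

D = 10_000                                   # hypervector dimensionality
rng = np.random.default_rng(0)
color = rng.choice([-1, 1], size=D)          # random bipolar hypervector for "red"
shape = rng.choice([-1, 1], size=D)          # random bipolar hypervector for "circle"

bound = color * shape                        # bind the attribute pair into one vector
unbound = bound * shape                      # unbinding with "shape" recovers "color"

similarity = unbound @ color / D             # ~1.0 indicates successful recovery
print(f"similarity after unbinding: {similarity:.3f}")
```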
H3DFact: Heterogeneous 3D Integrated CIM for Factorization with Holographic Perceptual Representations
Wan, Zishen, Liu, Che-Kai, Ibrahim, Mohamed, Yang, Hanchen, Spetalnick, Samuel, Krishna, Tushar, Raychowdhury, Arijit
Disentangling attributes of various sensory signals is central to human-like perception and reasoning and a critical task for higher-order cognitive and neuro-symbolic AI systems. An elegant approach to represent this intricate factorization is via high-dimensional holographic vectors drawing on brain-inspired vector symbolic architectures. However, holographic factorization involves iterative computation with high-dimensional matrix-vector multiplications and suffers from non-convergence problems. In this paper, we present H3DFact, a heterogeneous 3D integrated in-memory compute engine capable of efficiently factorizing high-dimensional holographic representations. H3DFact exploits the computation-in-superposition capability of holographic vectors and the intrinsic stochasticity associated with memristive 3D compute-in-memory. Evaluated on large-scale factorization and perceptual problems, H3DFact demonstrates superior factorization accuracy and operational capacity, by up to five orders of magnitude, with 5.5x compute density, 1.2x energy efficiency improvement, and a 5.9x smaller silicon footprint compared to iso-capacity 2D designs.
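The iterative holographic factorization that H3DFact targets can be sketched, in simplified form, as a resonator-style loop: each factor estimate is refined by unbinding the composite vector with the other estimate and projecting onto its codebook. Codebook sizes, dimensionality, and iteration count below are illustrative, and the stochastic 3D compute-in-memory aspects are not modeled.

```python
import numpy as np

D, M = 2_000, 32                                    # hypervector dimension, codebook size
rng = np.random.default_rng(1)
A = rng.choice([-1, 1], size=(M, D))                # codebook for factor 1 (rows are codevectors)
B = rng.choice([-1, 1], size=(M, D))                # codebook for factor 2
composite = A[3] * B[17]                            # bound vector to be factorized

a_hat = rng.choice([-1, 1], size=D)                 # random bipolar initial estimates
b_hat = rng.choice([-1, 1], size=D)
for _ in range(50):
    a_hat = np.sign(A.T @ (A @ (composite * b_hat)))   # unbind with b_hat, project onto codebook A
    b_hat = np.sign(B.T @ (B @ (composite * a_hat)))   # unbind with a_hat, project onto codebook B

# Once converged, the most similar codevectors should be the true factors (indices 3 and 17).
print(int(np.argmax(A @ a_hat)), int(np.argmax(B @ b_hat)))
```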
Towards Cognitive AI Systems: a Survey and Prospective on Neuro-Symbolic AI
Wan, Zishen, Liu, Che-Kai, Yang, Hanchen, Li, Chaojian, You, Haoran, Fu, Yonggan, Wan, Cheng, Krishna, Tushar, Lin, Yingyan, Raychowdhury, Arijit
The remarkable advancements in artificial intelligence (AI), primarily driven by deep neural networks, have significantly impacted various aspects of our lives. However, the current challenges surrounding unsustainable computational trajectories, limited robustness, and a lack of explainability call for the development of next-generation AI systems. Neuro-symbolic AI (NSAI) emerges as a promising paradigm, fusing neural, symbolic, and probabilistic approaches to enhance interpretability, robustness, and trustworthiness while facilitating learning from much less data. Recent NSAI systems have demonstrated great potential in collaborative human-AI scenarios with reasoning and cognitive capabilities. In this paper, we provide a systematic review of recent progress in NSAI and analyze the performance characteristics and computational operators of NSAI models. Furthermore, we discuss the challenges and potential future directions of NSAI from both system and architectural perspectives.
BERRY: Bit Error Robustness for Energy-Efficient Reinforcement Learning-Based Autonomous Systems
Wan, Zishen, Chandramoorthy, Nandhini, Swaminathan, Karthik, Chen, Pin-Yu, Reddi, Vijay Janapa, Raychowdhury, Arijit
Autonomous systems, such as Unmanned Aerial Vehicles (UAVs), are expected to run complex reinforcement learning (RL) models to execute fully autonomous position-navigation-time tasks within stringent onboard weight and power constraints. We observe that reducing the onboard operating voltage can benefit the energy efficiency of both the computation and the flight mission; however, it can also result in on-chip bit failures that are detrimental to mission safety and performance. To this end, we propose BERRY, a robust learning framework to improve bit error robustness and energy efficiency for RL-enabled autonomous systems. BERRY supports robust learning, both offline and on board the UAV, and for the first time demonstrates the practicality of robust low-voltage operation on UAVs, which leads to high energy savings in both compute-level operation and system-level quality-of-flight. We perform extensive experiments on 72 autonomous navigation scenarios and demonstrate that BERRY generalizes well across environments, UAVs, autonomy policies, operating voltages and fault patterns, and consistently improves robustness, efficiency and mission performance, achieving up to 15.62% reduction in flight energy, 18.51% increase in the number of successful missions, and 3.43x processing energy reduction.
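A simplified view of the kind of fault model behind robust low-voltage operation is sketched below: random bit flips are injected into quantized (int8) policy weights to emulate SRAM bit errors at reduced supply voltage. The bit-error rate and the weight tensor are placeholders, not BERRY's actual fault characterization.

```python
import numpy as np

def inject_bit_flips(weights_q, ber, rng):
    """Flip each bit of each int8 weight independently with probability `ber`."""
    w = weights_q.astype(np.uint8)                   # work on the two's-complement bit patterns
    for bit in range(8):
        mask = rng.random(w.shape) < ber             # weights whose bit `bit` is hit
        w[mask] ^= np.uint8(1 << bit)                # toggle that bit
    return w.astype(np.int8)

rng = np.random.default_rng(0)
weights = rng.integers(-128, 128, size=(64, 32), dtype=np.int8)   # stand-in policy weights
faulty = inject_bit_flips(weights, ber=1e-3, rng=rng)
print("perturbed weights:", int(np.count_nonzero(faulty != weights)))
```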
Real-Time Fully Unsupervised Domain Adaptation for Lane Detection in Autonomous Driving
Bhardwaj, Kshitij, Wan, Zishen, Raychowdhury, Arijit, Goldhahn, Ryan
While deep neural networks are being utilized heavily for autonomous driving, they need to be adapted to new, unseen environmental conditions for which they were not trained. We focus on a safety-critical application of lane detection and propose a lightweight, fully unsupervised, real-time adaptation approach that adapts only the batch-normalization parameters of the model, with hyperparameters set using only unlabeled target data. We demonstrate this technique on an Nvidia Jetson Orin, showing that inference, followed by on-device adaptation, on each incoming 1280x720 image can be performed under a tight real-time constraint of 30 FPS, while achieving accuracy (92.19% on average) similar to that of a state-of-the-art semi-supervised adaptation approach.
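A simplified sketch of BN-only test-time adaptation, assuming that only batch-normalization statistics are refreshed from unlabeled target frames while all other weights stay frozen; this is a generic BN-adaptation baseline illustrating the idea, not the paper's exact update rule, and the model below is a stand-in for a lane-detection network.

```python
import torch
import torch.nn as nn

model = nn.Sequential(                       # placeholder for a lane-detection network
    nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
    nn.Conv2d(16, 1, 1),
)

model.eval()                                 # keep the rest of the network in inference mode
for p in model.parameters():
    p.requires_grad_(False)                  # no gradient-based weight updates
for m in model.modules():
    if isinstance(m, nn.BatchNorm2d):
        m.train()                            # BN layers keep updating their running statistics

with torch.no_grad():
    for _ in range(8):                       # stream of unlabeled target-domain frames
        frame = torch.rand(1, 3, 720, 1280)  # stand-in for an incoming 1280x720 camera image
        _ = model(frame)                     # forward pass adapts the BN statistics on-device
```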
Non-Uniform Interpolation in Integrated Gradients for Low-Latency Explainable-AI
Bhat, Ashwin, Raychowdhury, Arijit
There has been a surge in Explainable-AI (XAI) methods that provide insights into the workings of Deep Neural Network (DNN) models. Integrated Gradients (IG) is a popular XAI algorithm that attributes relevance scores to input features commensurate with their contribution to the model's output. However, it requires multiple forward and backward passes through the model. Thus, compared to a single forward-pass inference, there is a significant computational overhead to generate the explanation which hinders real-time XAI. This work addresses the aforementioned issue by accelerating IG with a hardware-aware algorithm optimization. We propose a novel non-uniform interpolation scheme to compute the IG attribution scores which replaces the baseline uniform interpolation. Our algorithm significantly reduces the total interpolation steps required without adversely impacting convergence. Experiments on the ImageNet dataset using a pre-trained InceptionV3 model demonstrate 2.6-3.6x performance speedup on GPU systems for iso-convergence. This includes the minimal 0.2-3.2% latency overhead introduced by the pre-processing stage of computing the non-uniform interpolation step-sizes.
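The non-uniform scheme can be illustrated with a small sketch: Integrated Gradients is approximated by a Riemann sum over an arbitrary interpolation grid, with step widths taken from the spacing between consecutive alpha values. The quadratically spaced grid below is a stand-in for the paper's hardware-aware step-size computation, and the model is a toy placeholder.

```python
import torch

def integrated_gradients(model, x, baseline, alphas):
    """Riemann-sum IG over an arbitrary (possibly non-uniform) alpha grid."""
    widths = torch.diff(alphas, prepend=torch.zeros(1))   # non-uniform step sizes
    total = torch.zeros_like(x)
    for alpha, width in zip(alphas, widths):
        point = (baseline + alpha * (x - baseline)).requires_grad_(True)
        model(point).sum().backward()                      # gradient of the output w.r.t. the point
        total += width * point.grad
    return (x - baseline) * total                          # attribution per input feature

model = torch.nn.Sequential(torch.nn.Linear(4, 8), torch.nn.Tanh(), torch.nn.Linear(8, 1))
x, baseline = torch.rand(4), torch.zeros(4)
alphas = torch.linspace(0, 1, 16) ** 2                     # non-uniform: denser steps near the baseline
print(integrated_gradients(model, x, baseline, alphas))
```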
MAVFI: An End-to-End Fault Analysis Framework with Anomaly Detection and Recovery for Micro Aerial Vehicles
Hsiao, Yu-Shun, Wan, Zishen, Jia, Tianyu, Ghosal, Radhika, Mahmoud, Abdulrahman, Raychowdhury, Arijit, Brooks, David, Wei, Gu-Yeon, Reddi, Vijay Janapa
Safety and resilience are critical for autonomous unmanned aerial vehicles (UAVs). We introduce MAVFI, the micro aerial vehicles (MAVs) resilience analysis methodology to assess the effect of silent data corruption (SDC) on UAVs' mission metrics, such as flight time and success rate, for accurately measuring system resilience. To enhance the safety and resilience of robot systems bound by size, weight, and power (SWaP), we offer two low-overhead anomaly-based SDC detection and recovery algorithms based on Gaussian statistical models and autoencoder neural networks. Our anomaly error protection techniques are validated in numerous simulated environments. We demonstrate that the autoencoder-based technique can recover up to 100% of failure cases in our studied scenarios with a computational overhead of no more than 0.0062%. Our application-aware resilience analysis framework, MAVFI, can be utilized to comprehensively test the resilience of other Robot Operating System (ROS)-based applications and is publicly available at https://github.com/harvard-edge/MAVBench/tree/mavfi.
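The autoencoder-based detection idea can be sketched as follows: a small autoencoder (an untrained placeholder here, standing in for one trained on nominal flight states) scores each incoming state vector by reconstruction error, flags silent data corruption when the error exceeds a threshold, and recovers by falling back to the last known-good value. Dimensions and the threshold are illustrative, not MAVFI's configuration.

```python
import torch
import torch.nn as nn

autoencoder = nn.Sequential(nn.Linear(12, 4), nn.ReLU(), nn.Linear(4, 12))  # placeholder model

def detect_and_recover(state, last_good, threshold=0.5):
    """Return the (possibly recovered) state and whether an anomaly was flagged."""
    with torch.no_grad():
        error = torch.mean((autoencoder(state) - state) ** 2).item()  # reconstruction error
    if error > threshold:
        return last_good, True          # silent data corruption suspected: fall back
    return state, False

nominal = torch.zeros(12)               # stand-in for a nominal flight-state vector
corrupted = nominal.clone()
corrupted[3] = 1e3                      # emulate a bit flip blowing up one feature
print(detect_and_recover(corrupted, last_good=nominal))
```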
A Survey of FPGA-Based Robotic Computing
Wan, Zishen, Yu, Bo, Li, Thomas Yuang, Tang, Jie, Zhu, Yuhao, Wang, Yu, Raychowdhury, Arijit, Liu, Shaoshan
Recent research on robotics has shown significant progress, spanning algorithms, mechanics, and hardware architectures. Robots, including manipulators, legged robots, drones, and autonomous vehicles, are now widely applied in diverse scenarios. However, the high computational and data complexity of robotic algorithms poses great challenges to their application. On the one hand, CPUs are flexible enough to handle multiple robotic tasks, and GPUs offer higher computational capacity and easy-to-use development frameworks, so both have been widely adopted in many applications. On the other hand, FPGA-based robotic accelerators are becoming increasingly competitive alternatives, especially in latency-critical and power-limited scenarios. With specially designed hardware logic and algorithm kernels, FPGA-based accelerators can surpass CPUs and GPUs in performance and energy efficiency. In this paper, we give an overview of previous work on FPGA-based robotic accelerators covering different stages of the robotic system pipeline. An analysis of software and hardware optimization techniques and main technical issues is presented, along with some commercial and space applications, to serve as a guide for future work.