aspen
ASPEN: Breaking Operator Barriers for Efficient Parallelization of Deep Neural Networks
Modern Deep Neural Network (DNN) frameworks use tensor operators as the main building blocks of DNNs. However, we observe that operator-based construction of DNNs incurs significant drawbacks in parallelism in the form of synchronization barriers. Synchronization barriers of operators confine the scope of parallel computation to each operator and obscure the rich parallel computation opportunities that exist across operators. To this end, we present ASPEN, a novel parallel computation solution for DNNs that achieves fine-grained dynamic execution of DNNs, which (1) removes the operator barriers and expresses DNNs as dataflow graphs of fine-grained tiles to expose the parallel computation opportunities across operators, and (2) exploits these opportunities by dynamically locating and scheduling them at runtime. This novel approach of ASPEN enables opportunistic parallelism, a new class of parallelism for DNNs that is unavailable in the existing operator-based approaches. ASPEN also achieves high resource utilization and memory reuse by letting each resource asynchronously traverse depthwise in the DNN graph to its full computing potential. We discuss the challenges of our approach and their solutions, and show that our proof-of-concept implementation of ASPEN on CPU outperforms the state-of-the-art inference systems TorchScript and TVM by up to 3.2$\times$ and 4.3$\times$, respectively.
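The idea of scheduling tiles as soon as their own inputs are ready, rather than waiting at an operator barrier, can be illustrated with a minimal sketch. This is not the ASPEN implementation: the function `schedule_tiles`, the tile names, and the two-layer graph are hypothetical, and a single-threaded last-in-first-out ready queue stands in for the asynchronous, depthwise traversal the abstract describes.

```python
from collections import deque


def schedule_tiles(deps):
    """Return one valid execution order for tiles given {tile: set(parents)}.

    A tile becomes ready once its own parent tiles have finished, even if
    other tiles of the parents' operator (layer) are still pending.
    """
    remaining = {tile: len(parents) for tile, parents in deps.items()}
    children = {tile: [] for tile in deps}
    for tile, parents in deps.items():
        for parent in parents:
            children[parent].append(tile)

    # Tiles with no parents are ready immediately.
    ready = deque(sorted(t for t, n in remaining.items() if n == 0))
    order = []
    while ready:
        tile = ready.pop()  # LIFO: prefer newly unlocked tiles (go deeper first)
        order.append(tile)
        for child in sorted(children[tile]):
            remaining[child] -= 1
            if remaining[child] == 0:
                ready.append(child)
    return order


# Hypothetical graph: operator "a" has tiles a0..a2, operator "b" has b0 and b1.
deps = {
    "a0": set(), "a1": set(), "a2": set(),
    "b0": {"a0", "a1"},
    "b1": {"a1", "a2"},
}
order = schedule_tiles(deps)
print(order)
```

In this toy graph, tile `b1` of the second operator runs before tile `a0` of the first operator has executed, which an operator-level barrier would not allow.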
Adaptive Spiking with Plasticity for Energy Aware Neuromorphic Systems
Calle-Ortiz, Eduardo, Guan, Hui, Ganesan, Deepak, Nguyen, Phuc
This paper presents ASPEN, a novel energy-aware technique for neuromorphic systems that could unleash the future of intelligent, always-on, ultra-low-power, and low-burden wearables. Our main research objectives are to explore the feasibility of neuromorphic computing for wearables, identify open research directions, and demonstrate the feasibility of developing an adaptive spiking technique for energy-aware computation, which can be game-changing for resource-constrained devices in always-on applications. As neuromorphic computing systems operate based on spike events, their energy consumption is closely related to spiking activity, i.e., each spike incurs computational and power costs; consequently, minimizing the number of spikes is a critical strategy for operating under constrained energy budgets. To support this goal, ASPEN applies stochastic perturbations to the neuronal threshold during training. These perturbations not only enhance the network's robustness across varying thresholds, which can be controlled at inference time, but also act as a regularizer that improves generalization, reduces spiking activity, and enables energy control without the need for complex retraining or pruning. More specifically, ASPEN adaptively adjusts intrinsic neuronal parameters as a lightweight and scalable technique for dynamic energy control without reconfiguring the entire model. Our evaluation on a neuromorphic emulator and hardware shows that ASPEN significantly reduces spike counts and energy consumption while maintaining accuracy comparable to state-of-the-art methods.
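The link between the neuronal threshold and spiking activity can be sketched with a toy leaky integrate-and-fire neuron. This is an illustration under assumptions, not the authors' method: the neuron model, the leak factor, the Gaussian noise in `perturbed_threshold`, and all constants are hypothetical choices.

```python
import random


def lif_spike_count(inputs, threshold, leak=0.9):
    """Count spikes of a simple leaky integrate-and-fire neuron."""
    potential = 0.0
    spikes = 0
    for current in inputs:
        potential = leak * potential + current
        if potential >= threshold:
            spikes += 1
            potential = 0.0  # reset the membrane potential after a spike
    return spikes


def perturbed_threshold(base, scale, rng):
    """Sample a noisy threshold for one training step (assumed Gaussian noise)."""
    return max(1e-3, base + rng.gauss(0.0, scale))


rng = random.Random(0)
inputs = [0.5] * 100  # constant input current, chosen only for illustration

low = lif_spike_count(inputs, threshold=1.0)
high = lif_spike_count(inputs, threshold=2.0)
noisy = perturbed_threshold(1.0, 0.2, rng)
print(low, high, noisy)
```

For this constant input, the higher threshold produces fewer spikes, which is the kind of inference-time knob for energy control that the abstract describes.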
Steve Jobs Knew the Moment the Future Had Arrived. It's Calling Again
Steve Jobs is 28 years old, and seems a little nervous as he starts his speech to a group of designers gathered under a large tent in Aspen, Colorado. He fiddles with his bow tie and soon removes his suit jacket, dropping it to the floor when he finds no other place to set it down. It is 1983, and he's about to ask designers for their help in improving the look of the coming wave of personal computers. But first he will tell them that those computers will shatter the lives they have led to date. "How many of you are 36 years … older than 36?" he asks.
ASPEN: High-Throughput LoRA Fine-Tuning of Large Language Models with a Single GPU
Ye, Zhengmao, Li, Dengchun, Tian, Jingqi, Lan, Tingfeng, Zuo, Jie, Duan, Lei, Lu, Hui, Jiang, Yexi, Sha, Jian, Zhang, Ke, Tang, Mingjie
Transformer-based large language models (LLMs) have demonstrated outstanding performance across diverse domains, particularly when fine-tuned for specific domains. Recent studies suggest that the resources required for fine-tuning LLMs can be economized through parameter-efficient methods such as Low-Rank Adaptation (LoRA). While LoRA effectively reduces computational burdens and resource demands, it currently supports only a single-job fine-tuning setup. In this paper, we present ASPEN, a high-throughput framework for fine-tuning LLMs. ASPEN efficiently trains multiple jobs on a single GPU using the LoRA method, leveraging a shared pre-trained model and adaptive scheduling. ASPEN is compatible with transformer-based language models such as LLaMA and ChatGLM. Experiments show that ASPEN saves 53% of GPU memory when training multiple LLaMA-7B models on an NVIDIA A100 80GB GPU and boosts training throughput by about 17% compared to existing methods when training with various pre-trained models on different GPUs. The adaptive scheduling algorithm reduces turnaround time by 24% and end-to-end training latency by 12%, while prioritizing jobs and preventing out-of-memory issues.
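Sharing one frozen pre-trained weight across several LoRA jobs can be sketched as follows. This is a minimal illustration of the standard LoRA forward pass, h = W x + scale * B (A x), not the ASPEN framework itself: the names `LoRAJob` and `forward`, the 2x2 weight, and the adapter values are hypothetical.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


class LoRAJob:
    """Per-job low-rank adapter: only A and B would be trained."""

    def __init__(self, a, b, scale=1.0):
        self.a = a          # shape r x d_in
        self.b = b          # shape d_out x r
        self.scale = scale  # plays the role of alpha / r in LoRA


def forward(shared_w, job, x):
    """Compute W x + scale * B (A x), reusing the same frozen W for every job."""
    base = matvec(shared_w, x)
    delta = matvec(job.b, matvec(job.a, x))
    return [h + job.scale * d for h, d in zip(base, delta)]


# One frozen base weight shared by all jobs (illustrative 2x2 identity).
shared_w = [[1.0, 0.0], [0.0, 1.0]]

# Two hypothetical jobs with rank-1 adapters; job1 has B = 0 (the usual LoRA init).
job1 = LoRAJob(a=[[1.0, 1.0]], b=[[0.0], [0.0]])
job2 = LoRAJob(a=[[1.0, 1.0]], b=[[0.5], [-0.5]])

x = [2.0, 3.0]
out1 = forward(shared_w, job1, x)
out2 = forward(shared_w, job2, x)
print(out1, out2)
```

Each job stores only its small A and B matrices, while the base weight exists once, which is the source of the memory savings that sharing a pre-trained model makes possible.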
Careers - Airgility, Inc.
Aspen Avionics is aligning its graphical user interface development with Airgility, and we will in turn, over time, begin to align our algorithms and AI development with their avionics products. Because the vision is to cross-align the two companies' capabilities, the position can be held remotely, but it is highly preferable that new hires be able to be physically present at Airgility in College Park (MD) or in Albuquerque (NM). Physical presence will give the Software Engineer(s) access to the robotics work performed at Airgility. Future development that integrates Airgility's work into Aspen's avionics products will therefore likely have a smoother workflow, since a portion of Aspen's engineering team is deployed within Airgility. If the hire is located at Airgility, travel to Albuquerque (NM) is required.
How AI Influences Children
Aspen is disappointed as he replies, "I don't want to go to bed" and looks pleadingly at his dad. His dad shrugs and says it's not up to him. The virtual voice persists: "I need you to cooperate" and starts counting down from 10. By six, Aspen gives in and retires to his room. Aspen's father then explains to his guests how the virtual assistant, Lady, has helped him 'disrupt fatherhood', where he gets to be the good cop, and the Lady gets all the bad rap.
Leveraging Multiple Artificial Intelligence Techniques to Improve the Responsiveness in Operations Planning: ASPEN for Orbital Express
The challenging timeline for DARPA's Orbital Express mission demanded a flexible, responsive, and (above all) safe approach to mission planning. Mission planning for space is challenging because of the mixture of goals and constraints. Every space mission tries to squeeze all of the capacity possible out of the spacecraft. For Orbital Express, this means performing as many experiments as possible, while still keeping the spacecraft safe. Keeping the spacecraft safe can be very challenging because we need to maintain the correct thermal environment (or batteries might freeze), we need to avoid pointing cameras and sensitive sensors at the sun, we need to keep the spacecraft batteries charged, and we need to keep the two spacecraft from colliding... made more difficult as only one of the spacecraft had thrusters.