Goto

Collaborating Authors

 hardware platform


Rapid Model Architecture Adaption for Meta-Learning

Neural Information Processing Systems

Network Architecture Search (NAS) methods have recently gathered much attention. They design networks with better performance and use a much shorter search time compared to traditional manual tuning. Despite their efficiency in model deployments, most NAS algorithms target a single task on a fixed hardware system. However, real-life few-shot learning environments often cover a great number of tasks ($T$) and deployments on a wide variety of hardware platforms ($H$). The combinatorial search complexity $T \times H$ creates a fundamental search efficiency challenge if one naively applies existing NAS methods to these scenarios. To overcome this issue, we show, for the first time, how to rapidly adapt model architectures to new tasks in a \emph{many-task many-hardware} few-shot learning setup by integrating Model Agnostic Meta Learning (MAML) into the NAS flow. The proposed NAS method (H-Meta-NAS) is hardware-aware and performs optimisation in the MAML framework. MetaNAS shows a Pareto dominance compared to a variety of NAS and manual baselines in popular few-shot learning benchmarks with various hardware platforms and constraints. In particular, on the 5-way 1-shot Mini-ImageNet classification task, the proposed method outperforms the best manual baseline by a large margin ($5.21\%$ in accuracy) using $60\%$ less computation.




A Modular and Scalable System Architecture for Heterogeneous UAV Swarms Using ROS 2 and PX4-Autopilot

Pommeranz, Robert, Tebbe, Kevin, Heynicke, Ralf, Scholl, Gerd

arXiv.org Artificial Intelligence

In this paper a modular and scalable architecture for heterogeneous swarm-based Counter Unmanned Aerial Systems (C-UASs) built on PX4-Autopilot and Robot Operating System 2 (ROS 2) framework is presented. The proposed architecture emphasizes seamless integration of hardware components by introducing independent ROS 2 nodes for each component of a Unmanned Aerial Vehicle (UAV). Communication between swarm participants is abstracted in software, allowing the use of various technologies without architectural changes. Key functionalities are supported, e.g. leader following and formation flight to maneuver the swarm. The system also allows computer vision algorithms to be integrated for the detection and tracking of UAVs. Additionally, a ground station control is integrated for the coordination of swarm operations. Swarm-based Unmanned Aerial System (UAS) architecture is verified within a Gazebo simulation environment but also in real-world demonstrations.


Deploying Atmospheric and Oceanic AI Models on Chinese Hardware and Framework: Migration Strategies, Performance Optimization and Analysis

Sun, Yuze, Luo, Wentao, Xiang, Yanfei, Pan, Jiancheng, Li, Jiahao, Zhang, Quan, Huang, Xiaomeng

arXiv.org Artificial Intelligence

With the growing role of artificial intelligence in climate and weather research, efficient model training and inference are in high demand. Current models like FourCastNet and AI-GOMS depend heavily on GPUs, limiting hardware independence, especially for Chinese domestic hardware and frameworks. To address this issue, we present a framework for migrating large-scale atmospheric and oceanic models from PyTorch to MindSpore and optimizing for Chinese chips, and evaluating their performance against GPUs. The framework focuses on software-hardware adaptation, memory optimization, and parallelism. Furthermore, the model's performance is evaluated across multiple metrics, including training speed, inference speed, model accuracy, and energy efficiency, with comparisons against GPU-based implementations. Experimental results demonstrate that the migration and optimization process preserves the models' original accuracy while significantly reducing system dependencies and improving operational efficiency by leveraging Chinese chips as a viable alternative for scientific computing. This work provides valuable insights and practical guidance for leveraging Chinese domestic chips and frameworks in atmospheric and oceanic AI model development, offering a pathway toward greater technological independence.


RAM-NAS: Resource-aware Multiobjective Neural Architecture Search Method for Robot Vision Tasks

Mao, Shouren, Qin, Minghao, Dong, Wei, Liu, Huajian, Gao, Yongzhuo

arXiv.org Artificial Intelligence

Neural architecture search (NAS) has shown great promise in automatically designing lightweight models. However, conventional approaches are insufficient in training the supernet and pay little attention to actual robot hardware resources. To meet such challenges, we propose RAM-NAS, a resource-aware multi-objective NAS method that focuses on improving the supernet pretrain and resource-awareness on robot hardware devices. We introduce the concept of subnets mutual distillation, which refers to mutually distilling all subnets sampled by the sandwich rule. Additionally, we utilize the Decoupled Knowledge Distillation (DKD) loss to enhance logits distillation performance. To expedite the search process with consideration for hardware resources, we used data from three types of robotic edge hardware to train Latency Surrogate predictors. These predictors facilitated the estimation of hardware inference latency during the search phase, enabling a unified multi-objective evolutionary search to balance model accuracy and latency trade-offs. Our discovered model family, RAM-NAS models, can achieve top-1 accuracy ranging from 76.7% to 81.4% on ImageNet. In addition, the resource-aware multi-objective NAS we employ significantly reduces the model's inference latency on edge hardware for robots. We conducted experiments on downstream tasks to verify the scalability of our methods. The inference time for detection and segmentation is reduced on all three hardware types compared to MobileNetv3-based methods. Our work fills the gap in NAS for robot hardware resource-aware.


Generating Energy-Efficient Code via Large-Language Models -- Where are we now?

Apsan, Radu, Stoico, Vincenzo, Albonico, Michel, Dhar, Rudra, Vaidhyanathan, Karthik, Malavolta, Ivano

arXiv.org Artificial Intelligence

Context. The rise of Large Language Models (LLMs) has led to their widespread adoption in development pipelines. Goal. We empirically assess the energy efficiency of Python code generated by LLMs against human-written code and code developed by a Green software expert. Method. We test 363 solutions to 9 coding problems from the EvoEval benchmark using 6 widespread LLMs with 4 prompting techniques, and comparing them to human-developed solutions. Energy consumption is measured on three different hardware platforms: a server, a PC, and a Raspberry Pi for a total of ~881h (36.7 days). Results. Human solutions are 16% more energy-efficient on the server and 3% on the Raspberry Pi, while LLMs outperform human developers by 25% on the PC. Prompting does not consistently lead to energy savings, where the most energy-efficient prompts vary by hardware platform. The code developed by a Green software expert is consistently more energy-efficient by at least 17% to 30% against all LLMs on all hardware platforms. Conclusions. Even though LLMs exhibit relatively good code generation capabilities, no LLM-generated code was more energy-efficient than that of an experienced Green software developer, suggesting that as of today there is still a great need of human expertise for developing energy-efficient Python code.




MiCo: End-to-End Mixed Precision Neural Network Co-Exploration Framework for Edge AI

Jiang, Zijun, Lyu, Yangdi

arXiv.org Artificial Intelligence

--Quantized Neural Networks (QNN) with extremely low-bitwidth data have proven promising in efficient storage and computation on edge devices. T o further reduce the accuracy drop while increasing speedup, layer-wise mixed-precision quantization (MPQ) becomes a popular solution. However, existing algorithms for exploring MPQ schemes are limited in flexibility and efficiency. Comprehending the complex impacts of different MPQ schemes on post-training quantization and quantization-aware training results is a challenge for conventional methods. Furthermore, an end-to-end framework for the optimization and deployment of MPQ models is missing in existing work. In this paper, we propose the MiCo framework, a holistic MPQ exploration and deployment framework for edge AI applications. The framework adopts a novel optimization algorithm to search for optimal quantization schemes with the highest accuracies while meeting latency constraints. Hardware-aware latency models are built for different hardware targets to enable fast explorations. After the exploration, the framework enables direct deployment from PyT orch MPQ models to bare-metal C codes, leading to end-to-end speedup with minimal accuracy drops. Tiny machine learning (ML) and edge artificial intelligence (AI) are becoming increasingly important and valuable in today's AI ecosystem. However, deploying AI models on edge devices is challenging due to the tight resource constraints.