AITopics | cgra4ml

Collaborating Authors

cgra4ml

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Real-Time Semantic Segmentation on FPGA for Autonomous Vehicles Using LMIINet with the CGRA4ML Framework

Hosseini, Amir Mohammad Khadem, Mirzakuchaki, Sattar

arXiv.org Artificial IntelligenceOct-28-2025

Semantic segmentation has emerged as a fundamental problem in computer vision, gaining particular importance in real-time applications such as autonomous driving. The main challenge is achieving high accuracy while operating under computational and hardware constraints. In this research, we present an FPGA-based implementation of real-time semantic segmentation leveraging the lightweight LMIINet architecture and the Coarse-Grained Reconfigurable Array for Machine Learning (CGRA4ML) hardware framework. The model was trained using Quantization-Aware Training (QAT) with 8-bit precision on the Cityscapes dataset, reducing memory footprint by a factor of four while enabling efficient fixed-point computations. Necessary modifications were applied to adapt the model to CGRA4ML constraints, including simplifying skip connections, employing hardware-friendly operations such as depthwise-separable and 1A-1 convolutions, and redesigning parts of the Flatten Transformer. Our implementation achieves approximately 90% pixel accuracy and 45% mean Intersection-over-Union (mIoU), operating in real-time at 20 frames per second (FPS) with 50.1 ms latency on the ZCU104 FPGA board. The results demonstrate the potential of CGRA4ML, with its flexibility in mapping modern layers and off-chip memory utilization for skip connections, provides a path for implementing advanced semantic segmentation networks on FPGA for real-time applications to outperform traditional GPU solutions in terms of power efficiency while maintaining competitive accuracy. The code for this project is publicly available at https://github.com/STAmirr/ cgra4ml_semantic_segmentation

machine learning, real time system, segmentation, (20 more...)

arXiv.org Artificial Intelligence

2510.22243

Country:

Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Asia > Middle East > Iran > Tehran Province > Tehran (0.04)

Genre: Research Report > New Finding (0.34)

Industry:

Transportation > Ground > Road (0.49)
Automobiles & Trucks (0.49)
Information Technology > Robotics & Automation (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Architecture > Real Time Systems (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.72)

Add feedback

CGRA4ML: A Framework to Implement Modern Neural Networks for Scientific Edge Computing

Abarajithan, G, Ma, Zhenghua, Li, Zepeng, Koparkar, Shrideep, Munasinghe, Ravidu, Restuccia, Francesco, Kastner, Ryan

arXiv.org Artificial IntelligenceAug-28-2024

Scientific edge computing increasingly relies on hardware-accelerated neural networks to implement complex, near-sensor processing at extremely high throughputs and low latencies. Existing frameworks like HLS4ML are effective for smaller models, but struggle with larger, modern neural networks due to their requirement of spatially implementing the neural network layers and storing all weights in on-chip memory. CGRA4ML is an open-source, modular framework designed to bridge the gap between neural network model complexity and extreme performance requirements. CGRA4ML extends the capabilities of HLS4ML by allowing off-chip data storage and supporting a broader range of neural network architectures, including models like ResNet, PointNet, and transformers. Unlike HLS4ML, CGRA4ML generates SystemVerilog RTL, making it more suitable for targeting ASIC and FPGA design flows. We demonstrate the effectiveness of our framework by implementing and scaling larger models that were previously unattainable with HLS4ML, showcasing its adaptability and efficiency in handling complex computations. CGRA4ML also introduces an extensive verification framework, with a generated runtime firmware that enables its integration into different SoC platforms. CGRA4ML's minimal and modular infrastructure of Python API, SystemVerilog hardware, Tcl toolflows, and C runtime, facilitates easy integration and experimentation, allowing scientists to focus on innovation rather than the intricacies of hardware design and optimization.

cgra4ml, hl 4, neural network, (16 more...)

arXiv.org Artificial Intelligence

2408.15561

Country:

North America > United States > Illinois > Kane County > Batavia (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
Asia > Sri Lanka (0.04)

Genre: Research Report (0.64)

Industry:

Health & Medicine (0.68)
Semiconductors & Electronics (0.48)
Information Technology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback