AITopics | pragma


Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Towards a Comprehensive Benchmark for High-Level Synthesis Targeted to FPGAs

Neural Information Processing Systems, Dec-26-2025, 07:57:39 GMT

comprehensive benchmark, high-level synthesis targeted, name change, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.36)


PRAGMA: A Profiling-Reasoned Multi-Agent Framework for Automatic Kernel Optimization

Lei, Kelun, Yang, Hailong, Zhang, Huaitao, You, Xin, Zhang, Kaige, Luan, Zhongzhi, Liu, Yi, Qian, Depei

arXiv.org Artificial Intelligence, Nov-25-2025

Abstract: Designing high-performance kernels requires expert-level tuning and a deep understanding of hardware characteristics. Recent advances in large language models (LLMs) have enabled automated kernel generation, yet most existing systems rely solely on correctness or execution-time feedback and cannot reason about low-level performance bottlenecks. In this paper, we introduce PRAGMA, a profile-guided AI kernel generation framework that integrates execution feedback and fine-grained hardware profiling into the reasoning loop. PRAGMA enables LLMs to identify performance bottlenecks, preserve historical best versions, and iteratively refine code quality. Results show that PRAGMA consistently outperforms baseline N-PRAGMA without profiling enabled and achieves average speedups of 2.81× and 2.30× over Torch on CPU and GPU platforms, respectively. Optimizing computational kernels is fundamental to achieving high performance in modern AI and HPC systems. Traditionally, reaching near-peak efficiency has required extensive manual tuning and deep expertise in architecture-specific optimization, making the development and maintenance of high-performance kernels both labor-intensive and error-prone.
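The refinement cycle described in this abstract (profile, propose a new version, measure it, keep the best one seen so far) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not PRAGMA's implementation: `refine`, `measure`, and `profile` are hypothetical callables standing in for an LLM call, a timing harness, and a hardware profiler.

```python
from typing import Callable, Tuple


def optimize_kernel(
    initial: str,
    refine: Callable[[str, str], str],  # hypothetical: (best code, profiler report) -> new code
    measure: Callable[[str], float],    # hypothetical: code -> runtime in seconds
    profile: Callable[[str], str],      # hypothetical: code -> profiler report text
    rounds: int = 5,
) -> Tuple[str, float]:
    """Iteratively refine a kernel, always keeping the fastest version seen."""
    best_code, best_time = initial, measure(initial)
    for _ in range(rounds):
        # Profile the current best version and ask for a refinement of it.
        report = profile(best_code)
        candidate = refine(best_code, report)
        candidate_time = measure(candidate)
        # A slower candidate is discarded, so a regression never replaces the best.
        if candidate_time < best_time:
            best_code, best_time = candidate, candidate_time
    return best_code, best_time
```

Refining from the best version rather than the latest one is the design choice that makes "preserving historical best versions" matter: a bad round costs one iteration but does not derail later ones.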

artificial intelligence, large language model, natural language, (17 more...)

arXiv.org Artificial Intelligence

2511.06345

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)


Towards a Comprehensive Benchmark for High-Level Synthesis Targeted to FPGAs

Neural Information Processing Systems, Oct-9-2025, 01:05:30 GMT

Compiler directives in the form of pragmas play a crucial role in modifying the microarchitecture within the HLS framework. However, the number of possible microarchitectures grows exponentially with the number of pragmas.

artificial intelligence, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Europe (0.04)

Industry:

Semiconductors & Electronics (0.47)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)


VibeCodeHPC: An Agent-Based Iterative Prompting Auto-Tuner for HPC Code Generation Using LLMs

Hayashi, Shun-ichiro, Morita, Koki, Mukunoki, Daichi, Hoshino, Tetsuya, Katagiri, Takahiro

arXiv.org Artificial Intelligence, Oct-2-2025

We propose VibeCodeHPC, an automatic tuning system for HPC programs based on multi-agent LLMs for code generation. VibeCodeHPC tunes programs through multi-agent role allocation and iterative prompt refinement. We describe the system configuration with four roles: Project Manager (PM), System Engineer (SE), Programmer (PG), and Continuous Delivery (CD). We introduce dynamic agent deployment and activity monitoring functions to facilitate effective multi-agent collaboration. In our case study, we convert and optimize CPU-based matrix-matrix multiplication code written in C to GPU code using CUDA. The multi-agent configuration of VibeCodeHPC achieved higher-quality code generation per unit time compared to a solo-agent configuration. Additionally, the dynamic agent deployment and activity monitoring capabilities facilitated more effective identification of requirement violations and other issues.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2510.00031

Genre: Research Report > New Finding (0.46)

Industry: Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)


ACCeLLiuM: Supervised Fine-Tuning for Automated OpenACC Pragma Generation

Jhaveri, Samyak, Klotzmann, Vanessa, Lopes, Crista

arXiv.org Artificial Intelligence, Sep-29-2025

The increasing ubiquity of GPUs is accompanied by the increasing complexity of their hardware and parallel programming frameworks. Directive-based parallel programming standards like OpenACC simplify GPU programming to some extent by abstracting away low-level complexities, but a fair amount of expertise is still required to use those directives effectively. We introduce ACCeLLiuM, two open-weights Large Language Models specifically fine-tuned for generating expert OpenACC directives for data-parallel loops, along with the supervised fine-tuning dataset that was used to train them. The ACCeLLiuM SFT dataset contains 4,033 OpenACC pragma-loop pairs mined from public GitHub C/C++ repositories, with 3,223 pairs for training and 810 for testing. Experimental evaluations show a pronounced performance gap in generating correct OpenACC pragmas between base LLMs and our fine-tuned versions. On the held-out test set, base LLMs fail to consistently generate valid pragmas, whereas LLMs fine-tuned on the ACCeLLiuM dataset generate valid pragmas with the correct directive type for 87% of the data-parallel loops, and exact pragmas (including directives, clauses, clause order, and clause variables) for 50% of the cases. Even when not exact, generated pragmas frequently incorporate the correct clauses in a different order than the ground-truth label, or include additional clauses that enable finer control over parallel execution, data movement, and concurrency, offering practical value beyond strict string-matching. By publicly releasing the code, models, and dataset as ACCeLLiuM, we hope to establish a reproducible benchmark for LLM-powered OpenACC pragma generation, and lower the barrier to automated GPU offloading of serially written programs.
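The difference between an exact pragma match and a match that ignores clause order can be made concrete with a small comparison helper. The sketch below is an illustration only, not the evaluation code released with ACCeLLiuM. It assumes a short, hypothetical list of directive words and does not handle nested parentheses inside clause arguments.

```python
import re

# Simplifying assumption: only these words can form the directive name.
DIRECTIVE_WORDS = {"parallel", "kernels", "serial", "loop", "data"}
# A token is a word, optionally followed by one parenthesised argument list.
TOKEN = re.compile(r"[A-Za-z_]\w*(?:\s*\([^()]*\))?")


def parse_pragma(pragma: str):
    """Return (directive, frozenset of clauses) for one '#pragma acc' line."""
    body = re.sub(r"^\s*#\s*pragma\s+acc\b", "", pragma)
    tokens = [re.sub(r"\s+", "", t) for t in TOKEN.findall(body)]
    split = 0
    # Leading directive words form the directive; everything after is a clause.
    while split < len(tokens) and tokens[split] in DIRECTIVE_WORDS:
        split += 1
    return " ".join(tokens[:split]), frozenset(tokens[split:])


def exact_match(generated: str, reference: str) -> bool:
    """Strict string match after collapsing runs of whitespace."""
    return " ".join(generated.split()) == " ".join(reference.split())


def same_clauses(generated: str, reference: str) -> bool:
    """Match on directive and clause set, ignoring clause order."""
    return parse_pragma(generated) == parse_pragma(reference)


reference = "#pragma acc parallel loop copyin(a[0:n]) copyout(b[0:n])"
generated = "#pragma acc parallel loop copyout(b[0:n]) copyin(a[0:n])"
print(exact_match(generated, reference), same_clauses(generated, reference))
```

Here the generated pragma fails the strict comparison but passes the order-insensitive one, which is the kind of near miss the abstract describes as still having practical value.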

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2509.2038

Country: North America > United States > California > Orange County > Irvine (0.14)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Towards Robust Agentic CUDA Kernel Benchmarking, Verification, and Optimization

Lange, Robert Tjarko, Sun, Qi, Prasad, Aaditya, Faldor, Maxence, Tang, Yujin, Ha, David

arXiv.org Artificial Intelligence, Sep-19-2025

Recent advances in large language models (LLMs) demonstrate their effectiveness in scaling test-time compute for software engineering tasks. However, these approaches often focus on high-level solutions, with limited attention to optimizing low-level CUDA kernel implementations. Additionally, existing kernel generation benchmarks suffer from exploitable loopholes and insufficient diversity in testing conditions, hindering true generalization assessment. To address these limitations, we introduce robust-kbench, a new benchmark for rigorous evaluation of kernel performance and correctness across varied scenarios. Furthermore, we present a comprehensive agentic framework that automates CUDA kernel discovery, verification, and optimization. This pipeline enables frontier LLMs to translate torch code to CUDA kernels and iteratively improve their runtime within our robust evaluation setting. Our sequential workflow first translates PyTorch code into equivalent CUDA kernels. It then optimizes their runtime using a novel evolutionary meta-generation procedure tailored to the CUDA ecosystem, guided by LLM-based verifiers for correctness and efficient filtering. Evaluated on robust-kbench, our approach produces CUDA kernels outperforming torch implementations for practical applications, including forward and backward passes. It can fuse operations and deploy various runtime optimization strategies. The verifier workflow accurately classifies incorrect kernels, enhancing hardware verification efficiency.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2509.14279

Genre:

Research Report (0.81)
Workflow (0.54)

Industry: Information Technology > Hardware (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


ForgeHLS: A Large-Scale, Open-Source Dataset for High-Level Synthesis

Peng, Zedong, Li, Zeju, Gao, Mingzhe, Xu, Qiang, Zhang, Chen, Zhao, Jieru

arXiv.org Artificial Intelligence, Aug-5-2025

High-Level Synthesis (HLS) plays a crucial role in modern hardware design by transforming high-level code into optimized hardware implementations. However, progress in applying machine learning (ML) to HLS optimization has been hindered by a shortage of sufficiently large and diverse datasets. To bridge this gap, we introduce ForgeHLS, a large-scale, open-source dataset explicitly designed for ML-driven HLS research. ForgeHLS comprises over 400k diverse designs generated from 846 kernels covering a broad range of application domains, consuming over 200k CPU hours during dataset construction. Each kernel includes systematically automated pragma insertions (loop unrolling, pipelining, array partitioning), combined with extensive design space exploration using Bayesian optimization. Compared to existing datasets, ForgeHLS significantly enhances scale, diversity, and design coverage. We further define and evaluate representative downstream tasks in Quality of Result (QoR) prediction and automated pragma exploration, clearly demonstrating the utility of ForgeHLS for developing and improving ML-based HLS optimization methodologies. The dataset and code are public at https://github.com/zedong-peng/ForgeHLS.
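Automated pragma insertion of the kind mentioned in this abstract amounts to rewriting the kernel source with directive lines placed at chosen loops. The sketch below is a minimal illustration, not the ForgeHLS tooling. It assumes that loops carry a C label such as `L0:` and that pragmas use the Vitis HLS spelling; the `vadd` kernel and the chosen pragma values are hypothetical.

```python
def insert_pragmas(source: str, pragmas: dict) -> str:
    """Insert pragma lines directly after each labelled loop header.

    `pragmas` maps a loop label (for example "L0") to a list of pragma strings.
    A loop header is recognised by a line that starts with "<label>:", which is
    a convention assumed for this sketch.
    """
    out = []
    for line in source.splitlines():
        out.append(line)
        label = line.strip().split(":", 1)[0]
        if ":" in line and label in pragmas:
            # Indent the pragmas one level deeper than the loop header.
            indent = " " * (len(line) - len(line.lstrip()) + 4)
            out.extend(indent + p for p in pragmas[label])
    return "\n".join(out)


# Hypothetical kernel used only to demonstrate the helper.
kernel = """void vadd(int a[64], int b[64], int c[64]) {
    L0: for (int i = 0; i < 64; i++) {
        c[i] = a[i] + b[i];
    }
}"""

variant = insert_pragmas(
    kernel, {"L0": ["#pragma HLS pipeline II=1", "#pragma HLS unroll factor=4"]}
)
print(variant)
```

Generating one such variant per configuration, and synthesizing each, is what turns a single kernel into many dataset designs.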

large language model, machine learning, programming language, (19 more...)

arXiv.org Artificial Intelligence

2507.03255

Country: Asia > China (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Software > Programming Languages (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)


LIFT: LLM-Based Pragma Insertion for HLS via GNN Supervised Fine-Tuning

Prakriya, Neha, Ding, Zijian, Sun, Yizhou, Cong, Jason

arXiv.org Artificial Intelligence, May-1-2025

FPGAs are increasingly adopted in datacenter environments for their reconfigurability and energy efficiency. High-Level Synthesis (HLS) tools have eased FPGA programming by raising the abstraction level from RTL to untimed C/C++, yet attaining high performance still demands expert knowledge and iterative manual insertion of optimization pragmas to modify the microarchitecture. To address this challenge, we propose LIFT, a large language model (LLM)-based coding assistant for HLS that automatically generates performance-critical pragmas given a C/C++ design. On average, LIFT produces designs that improve performance by 3.52× and 2.16× over the prior state-of-the-art AutoDSE and HARP, respectively, and by 66× over GPT-4o. Data center applications require high-performance, low-power, scalable, and reconfigurable hardware. With the end of Dennard's scaling [1], these requirements are becoming increasingly critical to address. FPGAs emerge as a powerful solution and in recent years have been adopted by major cloud providers such as AWS, Microsoft, and Alibaba in their servers. Despite their potential, FPGAs remain challenging to program and deploy efficiently. High-Level Synthesis (HLS) tools such as Vitis HLS [2], Merlin [3], and Intel HLS [4] aim to bridge this gap by raising the abstraction level from low-level RTL to C/C++.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2504.21187

Country: North America > United States > California (0.28)

Genre: Research Report (0.65)

Industry: Information Technology > Services (0.54)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)


Towards a Comprehensive Benchmark for High-Level Synthesis Targeted to FPGAs

Neural Information Processing Systems, Jan-19-2025, 14:30:54 GMT

High-level synthesis (HLS) aims to raise the abstraction layer in hardware design, enabling the design of domain-specific accelerators (DSAs) like field-programmable gate arrays (FPGAs) using C/C++ instead of hardware description languages (HDLs). Compiler directives in the form of pragmas play a crucial role in modifying the microarchitecture within the HLS framework. However, the space of possible microarchitectures grows exponentially with the number of pragmas. To accelerate this process, machine learning models have been used to predict design quality in milliseconds. However, existing open-source datasets for training such models are limited in terms of design complexity and available optimizations. The benchmark presented in this work contains more complex programs with a wider range of optimization pragmas, making it a comprehensive dataset for training and evaluating design quality prediction models.
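The exponential growth of the design space follows from the fact that each pragma site contributes an independent choice, so the number of configurations is the product of the option counts. The sketch below illustrates this with a hypothetical two-loop kernel. The site names and option lists are assumptions chosen for illustration, not taken from the benchmark.

```python
from itertools import product

# Hypothetical pragma sites and option lists, chosen only for illustration.
design_space = {
    "L0.unroll": [1, 2, 4, 8],
    "L0.pipeline": ["off", "on"],
    "L1.unroll": [1, 2, 4, 8],
    "L1.pipeline": ["off", "on"],
}


def count_configs(space: dict) -> int:
    """Number of distinct pragma configurations (product of option counts)."""
    total = 1
    for options in space.values():
        total *= len(options)
    return total


def enumerate_configs(space: dict):
    """Yield every configuration as a {site: option} dict."""
    sites = list(space)
    for combo in product(*(space[s] for s in sites)):
        yield dict(zip(sites, combo))


print(count_configs(design_space))  # 4 * 2 * 4 * 2 = 64
```

Every additional site with k options multiplies the count by k, so even this small example already has 64 points. That growth is why exhaustive synthesis becomes impractical and fast learned predictors of design quality are attractive.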

comprehensive benchmark, fpga, high-level synthesis targeted, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.79)
