Khailany, Brucek
GauRast: Enhancing GPU Triangle Rasterizers to Accelerate 3D Gaussian Splatting
Li, Sixu, Keller, Ben, Lin, Yingyan Celine, Khailany, Brucek
3D intelligence leverages rich 3D features to enhance understanding of and interaction within complex environments, and it stands as a promising frontier in AI, with 3D rendering fundamental to many downstream applications. As Fei-Fei Li, co-founder of ImageNet, emphasized, "...we need spatially intelligent AI that can model the world and reason about objects, places, and interactions in 3D space and time." This underscores the importance of 3D intelligence for applications such as autonomous driving [39], robotics [32], and augmented/virtual reality (AR/VR) [4]. These applications increasingly demand 3D processing on mobile and embedded platforms, yet 3DGS achieves only 2-5 FPS on such platforms [22] with commonly used real-world, large-scale datasets [3], falling short of the performance requirements of most practical applications. This gap poses challenges for deploying advanced 3D intelligence in resource-constrained environments and highlights the need for efficient 3DGS acceleration. Previous efforts to accelerate 3DGS rely on dedicated accelerators that require substantial integration overhead and hardware costs.
BCQ: Block Clustered Quantization for 4-bit (W4A4) LLM Inference
Elangovan, Reena, Sakr, Charbel, Raghunathan, Anand, Khailany, Brucek
Post-training quantization (PTQ) is a promising approach to reducing the storage and computational requirements of large language models (LLMs) without additional training cost. Recent PTQ studies have primarily focused on quantizing only weights to sub-8-bits while maintaining activations at 8-bits or higher. Accurate sub-8-bit quantization for both weights and activations without relying on quantization-aware training remains a significant challenge. We propose a novel quantization method called block clustered quantization (BCQ) wherein each operand tensor is decomposed into blocks (a block is a group of contiguous scalars), blocks are clustered based on their statistics, and a dedicated optimal quantization codebook is designed for each cluster. As a specific embodiment of this approach, we propose a PTQ algorithm called Locally-Optimal BCQ (LO-BCQ) that iterates between the steps of block clustering and codebook design to greedily minimize the quantization mean squared error. When weight and activation scalars are encoded to W4A4 format (with 0.5 bits of overhead for storing scaling factors and codebook selectors), we advance the current state-of-the-art by demonstrating <1% loss in inference accuracy across several LLMs and downstream tasks.
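The alternating loop at the heart of LO-BCQ is easy to sketch. Below is a minimal NumPy illustration, assuming a max-|.| statistic for the initial clustering, Lloyd's algorithm for per-cluster codebook design, and no per-block scale factors — all simplifications of the paper's actual algorithm, with the block size and cluster count chosen arbitrarily:

```python
import numpy as np

def lloyd_codebook(samples, k=16, iters=20):
    """Fit a k-entry scalar codebook to `samples` with Lloyd's algorithm."""
    cb = np.quantile(samples, np.linspace(0, 1, k))        # spread initial levels
    for _ in range(iters):
        idx = np.abs(samples[:, None] - cb[None, :]).argmin(1)
        for j in range(k):
            if np.any(idx == j):
                cb[j] = samples[idx == j].mean()           # centroid update
    return cb

def block_mse(block, cb):
    """MSE of quantizing one block's scalars to their nearest codebook entries."""
    q = cb[np.abs(block[:, None] - cb[None, :]).argmin(1)]
    return float(((block - q) ** 2).sum())

def lo_bcq(tensor, block_size=16, n_clusters=4, bits=4, outer_iters=5):
    """Alternate (1) per-cluster codebook design and (2) block re-clustering,
    greedily reducing total quantization MSE."""
    blocks = tensor.reshape(-1, block_size)
    stat = np.abs(blocks).max(axis=1)                      # simple block statistic
    edges = np.quantile(stat, np.linspace(0, 1, n_clusters + 1)[1:-1])
    assign = np.digitize(stat, edges)                      # initial clustering
    for _ in range(outer_iters):
        codebooks = []
        for c in range(n_clusters):
            data = blocks[assign == c].ravel()
            if data.size == 0:                             # guard empty clusters
                data = blocks.ravel()
            codebooks.append(lloyd_codebook(data, 2 ** bits))
        mse = np.array([[block_mse(b, cb) for cb in codebooks] for b in blocks])
        assign = mse.argmin(axis=1)                        # lowest-MSE codebook wins
    return assign, codebooks

rng = np.random.default_rng(0)
assign, codebooks = lo_bcq(rng.normal(size=(256, 64)))
print(np.bincount(assign, minlength=4))                    # blocks per cluster
```

Each outer iteration can only lower the total MSE: step (1) fits the best codebook for the current clusters, and step (2) moves each block to the codebook that quantizes it best, matching the greedy minimization the abstract describes.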
SQ-DM: Accelerating Diffusion Models with Aggressive Quantization and Temporal Sparsity
Fan, Zichen, Dai, Steve, Venkatesan, Rangharajan, Sylvester, Dennis, Khailany, Brucek
Diffusion models have gained significant popularity in image generation tasks. However, generating high-quality content remains notably slow because it requires running model inference over many time steps. To accelerate these models, we propose to aggressively quantize both weights and activations, while simultaneously promoting significant activation sparsity. We further observe that this sparsity pattern varies among different channels and evolves across time steps. To support this quantization and sparsity scheme, we present a novel diffusion model accelerator featuring a heterogeneous mixed-precision dense-sparse architecture, channel-last address mapping, and a time-step-aware sparsity detector for efficient handling of the sparsity pattern. Our 4-bit quantization technique demonstrates superior generation quality compared to existing 4-bit methods. Our custom accelerator achieves 6.91x speed-up and 51.5% energy reduction compared to traditional dense accelerators.
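A toy trace of the observation that motivates the design — per-channel activation sparsity that drifts across denoising steps — assuming post-ReLU activations and a hypothetical 50% routing threshold; none of this is the accelerator's actual detector logic:

```python
import numpy as np

def channel_sparsity(act):
    """Fraction of exactly-zero activations per channel; act has shape [C, H, W]."""
    return (act == 0).mean(axis=(1, 2))

rng = np.random.default_rng(0)
steps, C = 8, 4
for t in range(steps):
    # Toy post-ReLU activations whose statistics drift with the denoising step
    # and differ per channel (the per-channel offsets are arbitrary).
    loc = (t / steps - 0.5) + np.linspace(-0.5, 0.5, C)[:, None, None]
    act = np.maximum(rng.normal(loc=loc, scale=1.0, size=(C, 16, 16)), 0)
    s = channel_sparsity(act)
    # A time-step-aware detector could route these channels to a sparse engine
    # and the rest to the dense datapath for this step.
    sparse_ch = np.flatnonzero(s > 0.5)
    print(f"step {t}: per-channel sparsity {np.round(s, 2)}, sparse path -> {sparse_ch}")
```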
ESPACE: Dimensionality Reduction of Activations for Model Compression
Sakr, Charbel, Khailany, Brucek
We propose ESPACE, an LLM compression technique based on dimensionality reduction of activations. Unlike prior works on weight-centric tensor decomposition, ESPACE projects activations onto a pre-calibrated set of principal components. The activation-centrality of the approach enables retraining LLMs with no loss of expressivity; while at inference, weight decomposition is obtained as a byproduct of matrix multiplication associativity. Theoretical results on the construction of projection matrices with optimal computational accuracy are provided. Experimentally, we find ESPACE enables 50% compression of GPT3, Llama2, and Nemotron4 models with small accuracy degradation, as low as a 0.18 perplexity increase on GPT3-22B. At lower compression rates of 20% to 40%, ESPACE drives GPT3 models to outperform their baseline, by up to a 0.38 decrease in perplexity for GPT3-8B. ESPACE also reduces GEMM execution time and prefill inference latency on existing hardware. Comparison with related works on compressing Llama2-7B via matrix factorization shows that ESPACE is a first step in advancing the state-of-the-art in tensor decomposition compression of LLMs.
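The inference-time trick the abstract mentions — weight decomposition as a byproduct of matrix multiplication associativity — fits in a few lines of NumPy. This sketch uses plain PCA on synthetic activations with an artificially decaying spectrum; ESPACE's construction of projection matrices with optimal computational accuracy is more involved:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m, k = 64, 512, 512, 256          # tokens, hidden dim, output dim, kept components

# Synthetic activations with a decaying spectrum stand in for calibration data.
mix = rng.normal(size=(d, d)) * 0.9 ** np.arange(d)
X_cal = rng.normal(size=(4096, d)) @ mix

# Calibrate a projection onto the top-k principal directions of the activations.
_, _, Vt = np.linalg.svd(X_cal, full_matrices=False)
P = Vt[:k].T                            # [d, k], orthonormal columns

W = rng.normal(size=(d, m))
Wp = P.T @ W                            # folded offline into a smaller weight

X = rng.normal(size=(n, d)) @ mix       # inference activations, same statistics
Y_full = X @ W                          # original GEMM
Y_proj = (X @ P) @ Wp                   # project activations, then smaller GEMM

err = np.linalg.norm(Y_full - Y_proj) / np.linalg.norm(Y_full)
print(f"relative output error: {err:.2e}")
print(f"weight storage: {Wp.size} vs {W.size} entries "
      f"(plus a shared {P.size}-entry projection)")
```

Because X @ (P @ P.T @ W) equals (X @ P) @ (P.T @ W), the projected weight Wp = P.T @ W can be precomputed offline, and a single projection P can be shared across all weight matrices consuming the same activations.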
VerilogEval: Evaluating Large Language Models for Verilog Code Generation
Liu, Mingjie, Pinckney, Nathaniel, Khailany, Brucek, Ren, Haoxing
The increasing popularity of large language models (LLMs) has paved the way for their application in diverse domains. This paper proposes a benchmarking framework tailored specifically for evaluating LLM performance in the context of Verilog code generation for hardware design and verification. We present a comprehensive evaluation dataset consisting of 156 problems from the Verilog instructional website HDLBits. The evaluation set consists of a diverse set of Verilog code generation tasks, ranging from simple combinational circuits to complex finite state machines. The Verilog code completions can be automatically tested for functional correctness by comparing the transient simulation outputs of the generated design with a golden solution. We also demonstrate that the Verilog code generation capability of pretrained language models could be improved with supervised fine-tuning by bootstrapping with LLM generated synthetic problem-code pairs.
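A minimal harness in the spirit of this comparison-based scoring, using Icarus Verilog as a stand-in simulator; the file names and the exact pass/fail protocol are illustrative rather than VerilogEval's own:

```python
import pathlib
import subprocess
import tempfile

def passes_testbench(candidate_verilog: str, testbench: str, golden_output: str) -> bool:
    """Compile a candidate module against a testbench with Icarus Verilog and
    compare its printed simulation output to the golden solution's output."""
    with tempfile.TemporaryDirectory() as tmpdir:
        tmp = pathlib.Path(tmpdir)
        (tmp / "design.v").write_text(candidate_verilog)
        (tmp / "tb.v").write_text(testbench)
        build = subprocess.run(
            ["iverilog", "-o", str(tmp / "sim"), str(tmp / "tb.v"), str(tmp / "design.v")],
            capture_output=True, text=True)
        if build.returncode != 0:          # a compile failure counts as incorrect
            return False
        run = subprocess.run(["vvp", str(tmp / "sim")], capture_output=True, text=True)
        return run.stdout.strip() == golden_output.strip()
```

Calling this over many sampled completions per problem yields the pass rates (e.g., pass@k) typically reported for such benchmarks.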
ChipNeMo: Domain-Adapted LLMs for Chip Design
Liu, Mingjie, Ene, Teodor-Dumitru, Kirby, Robert, Cheng, Chris, Pinckney, Nathaniel, Liang, Rongjian, Alben, Jonah, Anand, Himyanshu, Banerjee, Sanmitra, Bayraktaroglu, Ismet, Bhaskaran, Bonita, Catanzaro, Bryan, Chaudhuri, Arjun, Clay, Sharon, Dally, Bill, Dang, Laura, Deshpande, Parikshit, Dhodhi, Siddhanth, Halepete, Sameer, Hill, Eric, Hu, Jiashang, Jain, Sumit, Khailany, Brucek, Kokai, George, Kunal, Kishor, Li, Xiaowei, Lind, Charley, Liu, Hao, Oberman, Stuart, Omar, Sujeet, Pratty, Sreedhar, Raiman, Jonathan, Sarkar, Ambar, Shao, Zhengjiang, Sun, Hanfei, Suthar, Pratik P, Tej, Varun, Turner, Walker, Xu, Kaizhe, Ren, Haoxing
ChipNeMo aims to explore the applications of large language models (LLMs) for industrial chip design. Instead of directly deploying off-the-shelf commercial or open-source LLMs, we instead adopt the following domain adaptation techniques: custom tokenizers, domain-adaptive continued pretraining, supervised fine-tuning (SFT) with domain-specific instructions, and domain-adapted retrieval models. We evaluate these methods on three selected LLM applications for chip design: an engineering assistant chatbot, EDA script generation, and bug summarization and analysis. Our results show that these domain adaptation techniques enable significant LLM performance improvements over general-purpose base models across the three evaluated applications, enabling up to 5x model size reduction with similar or better performance on a range of design tasks. Our findings also indicate that there's still room for improvement between our current results and ideal outcomes. We believe that further investigation of domain-adapted LLM approaches will help close this gap in the future.
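Of the four techniques, tokenizer adaptation is the simplest to illustrate. The sketch below merely appends domain tokens to an off-the-shelf tokenizer and resizes the embedding table — a rough stand-in, since ChipNeMo trains custom tokenizers rather than appending tokens; the model name and token list are placeholders:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Placeholder base model; ChipNeMo adapts its own foundation models.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Hypothetical chip-design jargon that a general tokenizer fragments badly.
domain_tokens = ["endmodule", "always_ff", "uvm_sequence"]
added = tok.add_tokens([t for t in domain_tokens if t not in tok.get_vocab()])
model.resize_token_embeddings(len(tok))   # new embedding rows, trained during DAPT
print(f"added {added} domain tokens; continue pretraining on domain data next")
```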
HEAT: Hardware-Efficient Automatic Tensor Decomposition for Transformer Compression
Gu, Jiaqi, Keller, Ben, Kossaifi, Jean, Anandkumar, Anima, Khailany, Brucek, Pan, David Z.
Transformers have attained superior performance in natural language processing and computer vision. Their self-attention and feedforward layers are overparameterized, limiting inference speed and energy efficiency. Tensor decomposition is a promising technique to reduce parameter redundancy by leveraging tensor algebraic properties to express the parameters in a factorized form. Prior efforts used manual or heuristic factorization settings without hardware-aware customization, resulting in poor hardware efficiencies and large performance degradation. In this work, we propose a hardware-aware tensor decomposition framework, dubbed HEAT, that enables efficient exploration of the exponential space of possible decompositions and automates the choice of tensorization shape and decomposition rank with hardware-aware co-optimization. We jointly investigate tensor contraction path optimizations and a fused Einsum mapping strategy to bridge the gap between theoretical benefits and real hardware efficiency improvement. Our two-stage knowledge distillation flow resolves the trainability bottleneck and thus significantly boosts the final accuracy of factorized Transformers. Overall, we experimentally show that our hardware-aware factorized BERT variants reduce the energy-delay product by 5.7x with less than 1.1% accuracy loss and achieve a better efficiency-accuracy Pareto frontier than hand-tuned and heuristic baselines.
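A small NumPy example of why contraction paths matter for factorized layers: a 512x512 weight is tensorized and replaced by two factors, and an optimized einsum contracts the input through the factors without ever materializing the full weight. The tensorization shape (16, 32) x (32, 16) and rank 8 are arbitrary here; choosing them well is exactly what HEAT automates:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 16, 32))        # batch of inputs reshaped to (16, 32)
A = rng.normal(size=(16, 32, 8))         # input-side factor, rank 8
B = rng.normal(size=(8, 32, 16))         # output-side factor

# An optimized contraction path fuses x with A first, so the full
# 262144-entry weight is never formed (the factors hold only 8192 entries).
y = np.einsum("bij,ijr,rkl->bkl", x, A, B, optimize="optimal")
print(y.shape)                            # (64, 32, 16), i.e. 64 x 512 outputs
```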
An Adversarial Active Sampling-based Data Augmentation Framework for Manufacturable Chip Design
Liu, Mingjie, Yang, Haoyu, Li, Zongyi, Sastry, Kumara, Mukhopadhyay, Saumyadip, Dogru, Selim, Anandkumar, Anima, Pan, David Z., Khailany, Brucek, Ren, Haoxing
Lithography modeling is a crucial problem in chip design to ensure a chip design mask is manufacturable. It requires rigorous simulations of optical and chemical models that are computationally expensive. Recent developments in machine learning have provided alternative solutions in replacing the time-consuming lithography simulations with deep neural networks. However, the considerable accuracy drop still impedes its industrial adoption. Most importantly, the quality and quantity of the training dataset directly affect the model performance. To tackle this problem, we propose a litho-aware data augmentation (LADA) framework to resolve the dilemma of limited data and improve the machine learning model performance. First, we pretrain the neural networks for lithography modeling and a gradient-friendly StyleGAN2 generator. We then perform adversarial active sampling to generate informative, in-distribution synthetic mask designs. These synthetic mask images augment the original limited training dataset, which is then used to finetune the lithography model for improved performance. Experimental results demonstrate that LADA successfully exploits the neural network capacity by narrowing the performance gap between the training and testing data instances.
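A sketch of an adversarial active sampling loop with tiny stand-in networks: the generator's latent is ascended toward masks where a learned litho model disagrees most with a frozen reference. The networks, the disagreement objective, and all dimensions here are hypothetical; LADA's StyleGAN2 generator and sampling criterion are more involved:

```python
import torch
from torch import nn

# Hypothetical stand-ins: a latent->mask generator and two litho models whose
# disagreement marks informative regions of the design space.
generator = nn.Sequential(nn.Linear(32, 16 * 16), nn.Sigmoid())
litho_model = nn.Sequential(nn.Flatten(), nn.Linear(16 * 16, 16 * 16), nn.Sigmoid())
reference = nn.Sequential(nn.Flatten(), nn.Linear(16 * 16, 16 * 16), nn.Sigmoid())
for net in (generator, litho_model, reference):
    net.requires_grad_(False)                    # only the latent is optimized

def adversarial_active_sample(steps=50, lr=0.05):
    """Gradient-ascend the latent toward a mask that maximizes model disagreement."""
    z = torch.randn(1, 32, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        mask = generator(z).view(1, 16, 16)      # synthetic in-distribution mask
        loss = -nn.functional.mse_loss(litho_model(mask), reference(mask))
        opt.zero_grad()
        loss.backward()
        opt.step()
    return generator(z).detach().view(16, 16)    # informative candidate to label

mask = adversarial_active_sample()
print(mask.shape)
```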
Large Scale Mask Optimization Via Convolutional Fourier Neural Operator and Litho-Guided Self Training
Yang, Haoyu, Li, Zongyi, Sastry, Kumara, Mukhopadhyay, Saumyadip, Anandkumar, Anima, Khailany, Brucek, Singh, Vivek, Ren, Haoxing
Machine learning techniques have been extensively studied for mask optimization problems, aiming at better mask printability, shorter turnaround time, better mask manufacturability, and so on. However, most of this research focuses on generating initial solutions for small design regions. To further realize the potential of machine learning techniques on mask optimization tasks, we present a Convolutional Fourier Neural Operator (CFNO) that can efficiently learn layout tile dependencies and hence promises stitch-less large-scale mask optimization with limited intervention from legacy tools. We discover the possibility of litho-guided self-training (LGST) through a trained machine learning model when solving non-convex optimization problems, which allows iterative model and dataset updates and brings significant model performance improvements. Experimental results show that, for the first time, our machine learning-based framework outperforms state-of-the-art academic numerical mask optimizers with an order-of-magnitude speedup.
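For reference, the Fourier-layer building block underlying neural operators like the CFNO can be sketched compactly in PyTorch: transform to the frequency domain, mix a truncated set of low-frequency modes with learned complex weights, and transform back. This is the generic FNO spectral convolution (keeping only the corner modes for brevity), not the paper's convolutional variant or its layout-tile handling:

```python
import torch

class SpectralConv2d(torch.nn.Module):
    """One Fourier layer: FFT -> learned mixing of low-frequency modes -> iFFT."""
    def __init__(self, channels, modes):
        super().__init__()
        self.modes = modes
        scale = 1.0 / channels
        self.w = torch.nn.Parameter(
            scale * torch.randn(channels, channels, modes, modes, dtype=torch.cfloat))

    def forward(self, x):                        # x: [B, C, H, W]
        xf = torch.fft.rfft2(x)                  # frequency domain
        out = torch.zeros_like(xf)
        m = self.modes                           # keep and mix only low modes
        out[:, :, :m, :m] = torch.einsum("bixy,ioxy->boxy", xf[:, :, :m, :m], self.w)
        return torch.fft.irfft2(out, s=x.shape[-2:])

layer = SpectralConv2d(channels=8, modes=12)
print(layer(torch.randn(2, 8, 64, 64)).shape)    # torch.Size([2, 8, 64, 64])
```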