AITopics | Huang, Shan

Collaborating Authors

Huang, Shan

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, here is the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Enhancing High-Quality Code Generation in Large Language Models with Comparative Prefix-Tuning

Jiang, Yuan, Zhang, Yujian, Lu, Liang, Treude, Christoph, Su, Xiaohong, Huang, Shan, Wang, Tiantian

arXiv.org Artificial Intelligence, Mar-19-2025

Large Language Models (LLMs) have been widely adopted in commercial code completion engines, significantly enhancing coding efficiency and productivity. However, LLMs may generate code with quality issues that violate coding standards and best practices, such as poor code style and low maintainability, even when the code is functionally correct. Developers must then spend additional effort improving the code, potentially negating the efficiency gains provided by LLMs. To address this problem, we propose a novel comparative prefix-tuning method for controllable high-quality code generation. Our method introduces a single, property-specific prefix that is prepended to the activations of the LLM, serving as a lightweight alternative to fine-tuning. Unlike existing methods that require training multiple prefixes, our approach trains only one prefix and leverages pairs of high-quality and low-quality code samples, introducing a sequence-level ranking loss to guide the model's training. This comparative approach enables the model to better understand the differences between high-quality and low-quality code, focusing on aspects that impact code quality. Additionally, we design a data construction pipeline to collect and annotate pairs of high-quality and low-quality code, facilitating effective training. Extensive experiments on the Code Llama 7B model demonstrate that our method improves code quality by over 100% in certain task categories while maintaining functional correctness. We also conduct ablation studies and generalization experiments, confirming the effectiveness of our method's components and its strong generalization capability.
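
The abstract names a sequence-level ranking loss over pairs of high- and low-quality code but does not give its form. A minimal sketch, assuming a margin-based hinge loss on length-normalized sequence log-likelihoods (a common choice for pairwise ranking, not necessarily the paper's exact loss). The inputs are hypothetical per-token log-probabilities that the prefix-conditioned model assigns to each sample, and `margin` is an assumed hyperparameter:

```python
from typing import Sequence


def sequence_score(token_logprobs: Sequence[float]) -> float:
    """Length-normalized log-likelihood of one code sample under the model."""
    if not token_logprobs:
        raise ValueError("empty sequence")
    return sum(token_logprobs) / len(token_logprobs)


def pairwise_ranking_loss(high: Sequence[float], low: Sequence[float],
                          margin: float = 1.0) -> float:
    """Hinge loss on one (high-quality, low-quality) pair: zero once the
    high-quality sample outscores the low-quality one by at least `margin`."""
    gap = sequence_score(high) - sequence_score(low)
    return max(0.0, margin - gap)
```

In prefix-tuning the base model weights stay frozen, so in a real training loop this term would be computed in an autodiff framework and only the prefix parameters would receive its gradient; how the paper weights it against the usual language-modeling loss is not stated here.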

large language model, machine learning, natural language

arXiv.org Artificial Intelligence

2503.0902

Country: Asia > China (0.28)

Genre: Research Report > New Finding (0.92)

Industry: Education (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Speculative Decoding for Verilog: Speed and Quality, All in One

Xu, Changran, Liu, Yi, Zhou, Yunhao, Huang, Shan, Xu, Ningyi, Xu, Qiang

arXiv.org Artificial Intelligence, Mar-18-2025

The rapid advancement of large language models (LLMs) has revolutionized code generation tasks across various programming languages. However, the unique characteristics of some programming languages, particularly those like Verilog that have specialized syntax and are underrepresented in training datasets, pose significant challenges for conventional tokenization and decoding approaches. In this paper, we introduce a novel application of speculative decoding for Verilog code generation, showing that it can improve both inference speed and output quality, effectively achieving speed and quality all in one. Unlike standard LLM tokenization schemes, which often fragment meaningful code structures, our approach aligns decoding stops with syntactically significant tokens, making it easier for models to learn the token distribution. This refinement addresses inherent tokenization issues and enhances the model's ability to capture Verilog's logical constructs. Our experimental results show that our method achieves up to a 5.05x speedup in Verilog code generation and increases pass@10 functional accuracy on RTLLM by up to 17.19% compared to conventional training strategies. These findings highlight speculative decoding as a promising approach to bridge the quality gap in code generation for specialized programming languages.
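
The abstract builds on speculative decoding without restating how it works. Below is a minimal sketch of the generic greedy form: a cheap draft model proposes up to `k` tokens, the target model verifies them, and the agreeing prefix is kept together with the target's own token at the first disagreement. Both "models" are plain Python functions from a token prefix to the greedy next token, an assumption made for illustration. The sketch does not model the paper's Verilog-specific contribution of aligning decoding stops with syntactically significant tokens.

```python
from typing import Callable, List

# A "model" here is any function that maps a token prefix to its greedy next token.
NextToken = Callable[[List[int]], int]


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[int],
                       max_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: same output as greedy decoding with `target`."""
    out = list(prompt)
    goal = len(prompt) + max_new
    while len(out) < goal:
        # 1. The cheap draft model proposes up to k tokens autoregressively.
        proposal: List[int] = []
        ctx = list(out)
        for _ in range(min(k, goal - len(out))):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # 2. The target model verifies each position. A real system scores all
        #    k positions in one batched forward pass; here it is a plain loop.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(out + accepted)
            accepted.append(expected)  # equals tok whenever the draft was right
            if expected != tok:
                break  # first mismatch: keep the target's token, drop the rest
        else:
            # Every draft token matched, so the verification pass also yields
            # one extra token from the target.
            accepted.append(target(out + accepted))
        out.extend(accepted)
    return out[:goal]  # the bonus token can overshoot the budget by one
```

Because every kept token is exactly what the target would have produced, the output matches plain greedy decoding with the target; the speedup in practice comes from replacing several sequential target calls with one batched verification pass whenever the draft guesses well.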

code generation, verilog, verilog code generation

arXiv.org Artificial Intelligence

2503.14153

Country: Asia > China (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

DeepGate4: Efficient and Effective Representation Learning for Circuit Design at Scale

Zheng, Ziyang, Huang, Shan, Zhong, Jianyuan, Shi, Zhengyuan, Dai, Guohao, Xu, Ningyi, Xu, Qiang

arXiv.org Artificial Intelligence, Feb-10-2025

Circuit representation learning has become pivotal in electronic design automation (EDA), enabling critical tasks such as testability analysis, logic reasoning, power estimation, and SAT solving. However, existing models struggle to scale to large circuits because of limitations such as over-squashing in graph neural networks and the quadratic complexity of transformer-based models. To address these issues, we introduce DeepGate4, a scalable and efficient graph transformer specifically designed for large-scale circuits. DeepGate4 incorporates several key innovations: (1) an update strategy tailored for circuit graphs, which reduces memory complexity to sub-linear and is adaptable to any graph transformer; (2) a GAT-based sparse transformer with global and local structural encodings for And-Inverter Graphs (AIGs); and (3) an inference-acceleration CUDA kernel that fully exploits the unique sparsity patterns of AIGs. Our extensive experiments on the ITC99 and EPFL benchmarks show that DeepGate4 significantly surpasses state-of-the-art methods, achieving 15.5% and 31.1% performance improvements over the next-best models. Furthermore, the Fused-DeepGate4 variant reduces runtime by 35.1% and memory usage by 46.8%, making it highly efficient for large-scale circuit analysis. These results demonstrate the potential of DeepGate4 to handle complex EDA tasks while offering superior scalability and efficiency.
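
An AIG is a directed acyclic graph of primary inputs and two-input AND nodes, with inversions marked on edges. A common way to exploit this structure is to compute each node's logic level: all of a node's inputs lie on lower levels, so a level can be processed once every level below it is done. The minimal sketch below, using a hypothetical dict-based netlist format, only computes that level schedule; it illustrates AIG structure and is not DeepGate4's update strategy, which the abstract does not detail.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def aig_levels(fanins: Dict[str, Tuple[str, ...]]) -> Dict[int, List[str]]:
    """Group AIG nodes by logic level.

    `fanins` maps every node to its fan-in nodes: an empty tuple for a primary
    input, two fan-ins for an AND node (inversions live on edges and do not
    affect levels). Level 0 holds primary inputs; every other node sits one
    level above its deepest fan-in.
    """
    fanouts: Dict[str, List[str]] = defaultdict(list)
    pending = {}  # node -> number of fan-ins not yet assigned a level
    for node, ins in fanins.items():
        pending[node] = len(ins)
        for src in ins:
            fanouts[src].append(node)

    level = {n: 0 for n, k in pending.items() if k == 0}
    frontier = list(level)
    while frontier:  # Kahn-style topological sweep, one level per round
        nxt = []
        for src in frontier:
            for dst in fanouts[src]:
                pending[dst] -= 1
                if pending[dst] == 0:
                    level[dst] = 1 + max(level[s] for s in fanins[dst])
                    nxt.append(dst)
        frontier = nxt

    if len(level) != len(fanins):
        raise ValueError("graph has a cycle or a fan-in missing from `fanins`")
    groups: Dict[int, List[str]] = defaultdict(list)
    for node, lv in level.items():
        groups[lv].append(node)
    return {lv: sorted(nodes) for lv, nodes in sorted(groups.items())}
```

For inputs `a`, `b`, `c` with `g1 = AND(a, b)`, `g3 = AND(a, c)` and `g2 = AND(g1, c)`, the schedule is `{0: ['a', 'b', 'c'], 1: ['g1', 'g3'], 2: ['g2']}`.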

artificial intelligence, machine learning, transformer

arXiv.org Artificial Intelligence

2502.01681

Country: Asia > China (0.28)

Genre:

Research Report > Promising Solution (0.48)
Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective

Li, Jinhao, Xu, Jiaming, Huang, Shan, Chen, Yonghua, Li, Wen, Liu, Jun, Lian, Yaoxiu, Pan, Jiayi, Ding, Li, Zhou, Hao, Wang, Yu, Dai, Guohao

arXiv.org Artificial Intelligence, Oct-14-2024

Large Language Models (LLMs) have demonstrated remarkable capabilities across various fields, from natural language understanding to text generation. Compared with non-generative LLMs such as BERT and DeBERTa, generative LLMs such as the GPT and Llama series are currently the main focus because of their superior algorithmic performance. Advances in generative LLMs are closely intertwined with the development of hardware capabilities, and different hardware platforms have distinct characteristics that can help improve LLM inference performance. This paper therefore comprehensively surveys efficient generative LLM inference on different hardware platforms. First, we provide an overview of the algorithm architecture of mainstream generative LLMs and delve into the inference process. Then, we summarize optimization methods for different platforms, such as CPU, GPU, FPGA, ASIC, and PIM/NDP, and provide inference results for generative LLMs. Furthermore, we perform a qualitative and quantitative comparison of inference performance with batch sizes 1 and 8 on different hardware platforms, considering hardware power consumption, absolute inference speed (tokens/s), and energy efficiency (tokens/J). We compare the performance of the same optimization methods across hardware platforms, the performance of different hardware platforms, and the performance of different methods on the same hardware platform. By integrating software optimization methods and hardware platforms, the survey provides a systematic and comprehensive summary of existing inference acceleration work, which can point to future trends and potential developments of generative LLMs and hardware technology for edge-side scenarios.
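
The energy-efficiency metric follows directly from the other two quantities: one watt is one joule per second, so dividing speed in tokens/s by power in watts gives tokens/J. A small sketch of that arithmetic; the `InferenceResult` type and `most_efficient` helper are hypothetical names, not taken from the survey.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InferenceResult:
    platform: str
    tokens_per_s: float  # absolute inference speed
    power_w: float       # average power draw while generating, in watts

    @property
    def tokens_per_j(self) -> float:
        # 1 W = 1 J/s, so (tokens/s) / (J/s) = tokens/J
        return self.tokens_per_s / self.power_w


def most_efficient(results: List[InferenceResult]) -> InferenceResult:
    """Pick the platform with the highest energy efficiency (tokens/J)."""
    return max(results, key=lambda r: r.tokens_per_j)
```

With illustrative numbers (not results from the survey), a GPU running at 100 tokens/s and 300 W yields about 0.33 tokens/J, while an FPGA running at 20 tokens/s and 25 W yields 0.8 tokens/J. The faster platform is not necessarily the more energy-efficient one, which is why the survey reports both metrics.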

arxiv preprint arxiv, large language model, machine learning

arXiv.org Artificial Intelligence

2410.04466

Country:

North America > United States (0.14)
Asia (0.14)

Genre: Overview (1.00)

Industry:

Semiconductors & Electronics (1.00)
Information Technology (0.94)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Stephanie: Step-by-Step Dialogues for Mimicking Human Interactions in Social Conversations

Yang, Hao, Lu, Hongyuan, Zeng, Xinhua, Liu, Yang, Zhang, Xiang, Yang, Haoran, Zhang, Yumeng, Huang, Shan, Wei, Yiran, Lam, Wai

arXiv.org Artificial Intelligence, Jul-12-2024

In the rapidly evolving field of natural language processing, dialogue systems primarily employ a single-step dialogue paradigm. Although this paradigm is efficient, it lacks the depth and fluidity of human interactions and does not appear natural. We introduce a novel Step-by-Step Dialogue Paradigm (Stephanie), designed to mimic the ongoing, dynamic nature of human conversations. By employing a dual learning strategy and a further-split post-editing method, we generated a high-quality step-by-step dialogue dataset and used it to fine-tune existing large language models, enabling them to conduct step-by-step dialogues. We present Stephanie in detail and conduct tailored automatic and human evaluations to assess its effectiveness compared with the traditional single-step dialogue paradigm. We will release the code, Stephanie datasets, and Stephanie LLMs to facilitate future chatbot research.

large language model, machine learning, natural language

arXiv.org Artificial Intelligence

2407.04093

Country:

North America > United States (0.28)
North America > Canada > Ontario > Toronto (0.14)
Europe > Middle East > Malta (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Enabling Fast 2-bit LLM on GPUs: Memory Alignment and Asynchronous Dequantization

Li, Jinhao, Li, Shiyao, Xu, Jiaming, Huang, Shan, Lian, Yaoxiu, Liu, Jun, Wang, Yu, Dai, Guohao

arXiv.org Artificial Intelligence, Dec-13-2023

Large language models (LLMs) have demonstrated impressive abilities in various domains, but their inference cost is high. State-of-the-art methods apply 2-bit quantization to mainstream LLMs. However, challenges remain: (1) Non-negligible accuracy loss with 2-bit quantization. Weights are quantized in groups, and some groups have large weight ranges, resulting in large quantization errors and non-negligible accuracy loss (e.g., >3% for Llama2-7b with 2-bit quantization in GPTQ and Greenbit). (2) Limited accuracy improvement from adding 4-bit weights. Adding 10% more average bits through extra 4-bit weights yields less than 0.5% accuracy improvement on a quantized Llama2-7b. (3) Time-consuming dequantization on GPUs. Dequantization operations account for more than 50% of execution time, hindering the potential to reduce LLM inference cost. To tackle these challenges, we propose the following techniques: (1) We quantize only a small fraction of groups, those with the larger range, using 4 bits, taking memory alignment on GPUs into account. (2) We design asynchronous dequantization on GPUs, leading to up to 3.92X speedup. We conduct extensive experiments on different model sizes. We achieve 2.85 bits per weight and an end-to-end speedup of 1.74X for Llama2-7b over the original model, and we reduce both runtime cost and hardware cost by up to 2.70X and 2.81X while requiring fewer GPUs.
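
The first technique, giving 4 bits only to the groups with the widest value range, can be sketched as follows. This is a minimal illustration, assuming asymmetric min/max uniform quantization per group (the abstract does not specify the quantizer); it ignores GPU memory alignment and asynchronous dequantization entirely. With equal-sized groups, the average code width is 2 + 2f bits, where f is the share of 4-bit groups. The sketch does not try to reproduce the 2.85-bit figure above, whose breakdown is not given.

```python
from typing import List, Tuple

Quantized = Tuple[List[int], float, float]  # (codes, scale, minimum)


def quantize_group(w: List[float], bits: int) -> Quantized:
    """Asymmetric uniform quantization of one weight group to `bits` bits."""
    lo, hi = min(w), max(w)
    levels = (1 << bits) - 1           # largest code: 3 for 2 bits, 15 for 4 bits
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((x - lo) / scale) for x in w]
    return codes, scale, lo


def dequantize_group(codes: List[int], scale: float, lo: float) -> List[float]:
    return [lo + c * scale for c in codes]


def mixed_quantize(weights: List[float], group_size: int,
                   frac_4bit: float) -> Tuple[List[Tuple[int, Quantized]], float]:
    """Quantize groups to 2 bits, except the `frac_4bit` share of groups with
    the largest value range, which get 4 bits. Returns per-group results and
    the average bits per weight (codes only; scales and minimums excluded)."""
    groups = [weights[i:i + group_size] for i in range(0, len(weights), group_size)]
    by_range = sorted(range(len(groups)),
                      key=lambda g: max(groups[g]) - min(groups[g]), reverse=True)
    wide = set(by_range[:round(frac_4bit * len(groups))])

    out: List[Tuple[int, Quantized]] = []
    total_bits = 0
    for g, grp in enumerate(groups):
        bits = 4 if g in wide else 2
        out.append((bits, quantize_group(grp, bits)))
        total_bits += bits * len(grp)
    return out, total_bits / len(weights)
```

Spending the extra bits on wide-range groups targets exactly the groups whose 2-bit step size, and therefore rounding error, would be largest.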

large language model, machine learning, quantization

arXiv.org Artificial Intelligence

2311.16442

Genre: Research Report > Promising Solution (0.55)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

Contrastive Credibility Propagation for Reliable Semi-Supervised Learning

Kutt, Brody, Ramteke, Pralay, Mignot, Xavier, Toman, Pamela, Ramanan, Nandini, Chhetri, Sujit Rokka, Huang, Shan, Du, Min, Hewlett, William

arXiv.org Artificial Intelligence, Aug-29-2023

A fundamental goal of semi-supervised learning (SSL) is to ensure that the use of unlabeled data results in a classifier that outperforms a baseline trained only on labeled data (supervised baseline). However, this is often not the case (Oliver et al. 2018). The problem is often overlooked because SSL algorithms are frequently evaluated only on clean and balanced datasets where the sole experimental variable is the number of given labels. Worse, in the pursuit of maximizing label efficiency, many modern SSL algorithms such as (Berthelot et al. 2019; Sohn et al. 2020; Zheng et al. 2022; Li, Xiong, and Hoi 2021) and others rely on a mechanism that directly encourages the marginal distribution of label predictions to be close to the marginal distribution of ground truth labels (known as distribution …

Consequently, such systems necessitate external components like Out-of-Distribution (OOD) detectors to prevent failures, albeit at the cost of increased complexity. Instead of maximizing the robustness to any one data variable, we strive to build an SSL algorithm that is robust to all data variables, i.e., one that can match or outperform a supervised baseline. To address this challenge, we first hypothesize that sensitivity to pseudo-label errors is the root cause of all failures. This rationale is based on the simple fact that a hypothetical SSL algorithm consisting of a pseudo-labeler with a rejection option and a means to build a classifier could always match or outperform its supervised baseline if the pseudo-labeler made no mistakes. Such a pseudo-labeler is unrealistic, of course. Instead, we build into our solution means to work around those inevitable errors.
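
The argument above rests on a pseudo-labeler with a rejection option. A minimal sketch of the simplest such labeler, assuming a fixed confidence threshold on predicted class probabilities. This is a generic technique, not the paper's contrastive credibility propagation, and `pseudo_label` and `training_set` are hypothetical names:

```python
from typing import List, Optional, Sequence, Tuple, TypeVar

X = TypeVar("X")


def pseudo_label(probs: Sequence[Sequence[float]], threshold: float) -> List[Optional[int]]:
    """Argmax class per unlabeled sample, or None (reject) when the top
    predicted probability falls below `threshold`."""
    labels: List[Optional[int]] = []
    for p in probs:
        best = max(range(len(p)), key=lambda i: p[i])
        labels.append(best if p[best] >= threshold else None)
    return labels


def training_set(labeled: Sequence[Tuple[X, int]], unlabeled: Sequence[X],
                 labels: Sequence[Optional[int]]) -> List[Tuple[X, int]]:
    """Labeled pairs plus only the unlabeled samples that were not rejected."""
    return list(labeled) + [(x, y) for x, y in zip(unlabeled, labels) if y is not None]
```

A threshold above 1.0 rejects every unlabeled sample, so the training set collapses to the labeled data alone, which is exactly the supervised baseline. That fallback is why a labeler that can abstain, and never mislabels what it accepts, cannot do worse than the baseline. In practice, the errors among accepted pseudo-labels are what break this guarantee.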

artificial intelligence, iteration, machine learning

arXiv.org Artificial Intelligence

2211.09929

Country:

North America > United States > Michigan (0.14)
North America > Canada > Ontario > Toronto (0.14)
Asia > Middle East > Qatar (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.71)

TencentPretrain: A Scalable and Flexible Toolkit for Pre-training Models of Different Modalities

Zhao, Zhe, Li, Yudong, Hou, Cheng, Zhao, Jing, Tian, Rong, Liu, Weijie, Chen, Yiren, Sun, Ningyuan, Liu, Haoyan, Mao, Weiquan, Guo, Han, Guo, Weigang, Wu, Taiqiang, Zhu, Tao, Shi, Wenhang, Chen, Chen, Huang, Shan, Chen, Sihong, Liu, Liqun, Li, Feifei, Chen, Xiaoshuai, Sun, Xingwu, Kang, Zhanhui, Du, Xiaoyong, Shen, Linlin, Yan, Kimmo

arXiv.org Artificial Intelligence, Jul-11-2023

Recently, the success of pre-training in the text domain has been fully extended to vision, audio, and cross-modal scenarios. Pre-training models proposed for different modalities show a growing homogeneity in their model structures, which makes it possible to implement them within a uniform framework. In this paper, we present TencentPretrain, a toolkit supporting pre-training models of different modalities. The core feature of TencentPretrain is its modular design. The toolkit uniformly divides pre-training models into five components: embedding, encoder, target embedding, decoder, and target. Because almost all common modules are provided for each component, users can choose the desired modules from different components to build a complete pre-training model. The modular design enables users to efficiently reproduce existing pre-training models or build brand-new ones. We test the toolkit on text, vision, and audio benchmarks and show that it can match the performance of the original implementations.
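
To make the five-component decomposition concrete, here is a minimal sketch of composing a model by choosing one module per component from a registry. Everything in it is a hypothetical illustration, not TencentPretrain's actual API or configuration format: `REGISTRY`, `register`, `build`, module names such as `word_pos` or `mlm`, and the rule that encoder-only models leave `target_embedding` and `decoder` empty. Strings stand in for real network modules.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# The five component slots named in the abstract, in data-flow order.
COMPONENTS = ("embedding", "encoder", "target_embedding", "decoder", "target")
# Assumption: encoder-only models (BERT-style) leave these two slots empty.
OPTIONAL = {"target_embedding", "decoder"}

# Hypothetical registry: slot -> module name -> factory.
REGISTRY: Dict[str, Dict[str, Callable[[], str]]] = {c: {} for c in COMPONENTS}


def register(slot: str, name: str):
    """Decorator that adds a module factory to one component slot."""
    def wrap(factory: Callable[[], str]) -> Callable[[], str]:
        REGISTRY[slot][name] = factory
        return factory
    return wrap


@register("embedding", "word_pos")
def _word_pos() -> str:
    return "word+position embedding"

@register("embedding", "patch")
def _patch() -> str:
    return "image patch embedding"

@register("encoder", "transformer")
def _encoder() -> str:
    return "transformer encoder"

@register("target_embedding", "word_pos")
def _target_word_pos() -> str:
    return "target word+position embedding"

@register("decoder", "transformer")
def _decoder() -> str:
    return "transformer decoder"

@register("target", "mlm")
def _mlm() -> str:
    return "masked language-model head"

@register("target", "lm")
def _lm() -> str:
    return "language-model head"

@register("target", "cls")
def _cls() -> str:
    return "classification head"


@dataclass
class Model:
    parts: Dict[str, Optional[str]]  # slot -> instantiated module (None if unused)


def build(config: Dict[str, str]) -> Model:
    """Assemble a model by picking one registered module per slot."""
    parts: Dict[str, Optional[str]] = {}
    for slot in COMPONENTS:
        name = config.get(slot)
        if name is None:
            if slot not in OPTIONAL:
                raise ValueError(f"component '{slot}' is required")
            parts[slot] = None
        else:
            parts[slot] = REGISTRY[slot][name]()  # KeyError for an unknown name
    if (parts["decoder"] is None) != (parts["target_embedding"] is None):
        raise ValueError("decoder and target_embedding must be chosen together")
    return Model(parts)
```

Swapping only the embedding from `word_pos` to `patch` (and the target head from `mlm` to `cls`) reuses the same encoder entry for an image model. This is the kind of reuse the modular design aims at.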

artificial intelligence, machine learning, tencentpretrain

arXiv.org Artificial Intelligence

2212.06385

Country: Asia > China (0.14)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

TransMRSR: Transformer-based Self-Distilled Generative Prior for Brain MRI Super-Resolution

Huang, Shan, Liu, Xiaohong, Tan, Tao, Hu, Menghan, Wei, Xiaoer, Chen, Tingli, Sheng, Bin

arXiv.org Artificial Intelligence, Jun-11-2023

Magnetic resonance images (MRI) are often acquired with low through-plane resolution to reduce scan time and cost. The poor resolution in one orientation does not meet the high-resolution requirements of early brain-disease diagnosis and morphometric studies. Common single-image super-resolution (SISR) solutions face two main challenges: (1) combining local detail with global anatomical structure; and (2) large-scale restoration when reconstructing thick-slice MRI into high-resolution (HR) isotropic data. To address these problems, we propose TransMRSR, a novel two-stage network for brain MRI SR that uses convolutional blocks to extract local information and transformer blocks to capture long-range dependencies. TransMRSR consists of three modules: shallow local feature extraction, deep non-local feature capture, and HR image reconstruction. In the first stage, we perform a generative task to encapsulate diverse priors into a generative adversarial network (GAN), which serves as the decoder sub-module of the deep non-local feature capture module. The pre-trained GAN is then used in the second stage for the SR task. We further eliminate the potential latent-space shift caused by the two-stage training strategy through the self-distilled truncation trick. Extensive experiments show that our method outperforms other SISR methods on both public and private datasets. Code is released at https://github.com/goddesshs/TransMRSR.git
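
For reference, here is the standard truncation trick as introduced for StyleGAN. The name of the self-distilled variant suggests it adapts this form, but the abstract does not give its formula. The standard trick computes the mean intermediate latent produced by the mapping network f and pulls each latent toward it by a factor ψ (typically ψ < 1), trading sample diversity for fidelity:

```latex
\bar{\mathbf{w}} = \mathbb{E}_{\mathbf{z} \sim P(\mathbf{z})}\big[f(\mathbf{z})\big],
\qquad
\mathbf{w}' = \bar{\mathbf{w}} + \psi\,(\mathbf{w} - \bar{\mathbf{w}})
```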

artificial intelligence, machine learning, proceedings

arXiv.org Artificial Intelligence

2306.06669

Country:

Asia > China (0.16)
Europe > Spain (0.14)
Europe > Netherlands (0.14)
Europe > France (0.14)

Genre: Research Report (0.50)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

A Novel Interpretable and Generalizable Re-synchronization Model for Cued Speech based on a Multi-Cuer Corpus

Gao, Lufei, Huang, Shan, Liu, Li

arXiv.org Artificial Intelligence, Jun-5-2023

Cued Speech (CS) is a multi-modal visual coding system that combines lip reading with several hand cues at the phonetic level to make spoken language visible to the hearing impaired. Previous studies addressed the asynchrony between lip and hand movements with a cuer-dependent piecewise linear model for English and French CS (a cuer is a person who performs Cued Speech). In this work, we propose three statistical measures on the lip stream to build an interpretable and generalizable model for predicting hand preceding time (HPT), which becomes cuer-independent through proper normalization. In particular, we build the first Mandarin CS corpus, comprising annotated videos from five speakers: three normal-hearing and two hearing-impaired individuals. Using this corpus, we show that the hand preceding phenomenon exists in Mandarin CS production, with significant differences between normal-hearing and hearing-impaired people. Extensive experiments demonstrate that our model outperforms the baseline and previous state-of-the-art methods.

artificial intelligence, machine learning, recognition

arXiv.org Artificial Intelligence

2306.02596

Country: Asia > China > Guangdong Province (0.14)

Genre: Research Report > Promising Solution (0.34)

Industry: Health & Medicine (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
