Goto

Collaborating Authors

 Li, Jinhao


SoftmAP: Software-Hardware Co-design for Integer-Only Softmax on Associative Processors

arXiv.org Artificial Intelligence

Abstract--Recent research efforts focus on reducing the computational and memory overheads of Large Language Models (LLMs) to make them feasible on resource-constrained devices. Despite advancements in compression techniques, non-linear operators like Softmax and Layernorm remain bottlenecks due to their sensitivity to quantization. We propose SoftmAP, a softwarehardware co-design methodology that implements an integeronly low-precision Softmax using In-Memory Compute (IMC) hardware. Our method achieves up to three orders of magnitude improvement in the energy-delay product compared to A100 and RTX3090 GPUs, making LLMs more deployable without compromising performance. Softmax contributes up to 38% of the run time for longer sequence lengths.


Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have demonstrated remarkable capabilities across various fields, from natural language understanding to text generation. Compared to non-generative LLMs like BERT and DeBERTa, generative LLMs like GPT series and Llama series are currently the main focus due to their superior algorithmic performance. The advancements in generative LLMs are closely intertwined with the development of hardware capabilities. Various hardware platforms exhibit distinct hardware characteristics, which can help improve LLM inference performance. Therefore, this paper comprehensively surveys efficient generative LLM inference on different hardware platforms. First, we provide an overview of the algorithm architecture of mainstream generative LLMs and delve into the inference process. Then, we summarize different optimization methods for different platforms such as CPU, GPU, FPGA, ASIC, and PIM/NDP, and provide inference results for generative LLMs. Furthermore, we perform a qualitative and quantitative comparison of inference performance with batch sizes 1 and 8 on different hardware platforms by considering hardware power consumption, absolute inference speed (tokens/s), and energy efficiency (tokens/J). We compare the performance of the same optimization methods across different hardware platforms, the performance across different hardware platforms, and the performance of different methods on the same hardware platform. This provides a systematic and comprehensive summary of existing inference acceleration work by integrating software optimization methods and hardware platforms, which can point to the future trends and potential developments of generative LLMs and hardware technology for edge-side scenarios.


S2O: An Integrated Driving Decision-making Performance Evaluation Method Bridging Subjective Feeling to Objective Evaluation

arXiv.org Artificial Intelligence

Autonomous driving decision-making is one of the critical modules towards intelligent transportation systems, and how to evaluate the driving performance comprehensively and precisely is a crucial challenge. A biased evaluation misleads and hinders decision-making modification and development. Current planning evaluation metrics include deviation from the real driver trajectory and objective driving experience indicators. The former category does not necessarily indicate good driving performance since human drivers also make errors and has been proven to be ineffective in interactive close-loop systems. On the other hand, existing objective driving experience models only consider limited factors, lacking comprehensiveness. And the integration mechanism of various factors relies on intuitive experience, lacking precision. In this research, we propose S2O, a novel integrated decision-making evaluation method bridging subjective human feeling to objective evaluation. First, modified fundamental models of four kinds of driving factors which are safety, time efficiency, comfort, and energy efficiency are established to cover common driving factors. Then based on the analysis of human rating distribution regularity, a segmental linear fitting model in conjunction with a complementary SVM segment classifier is designed to express human's subjective rating by objective driving factor terms. Experiments are conducted on the D2E dataset, which includes approximately 1,000 driving cases and 40,000 human rating scores. Results show that S2O achieves a mean absolute error of 4.58 to ground truth under a percentage scale. Compared with baselines, the evaluation error is reduced by 32.55%. Implementation on the SUMO platform proves the real-time efficiency of online evaluation, and validation on performance evaluation of three autonomous driving planning algorithms proves the feasibility.


Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models

arXiv.org Artificial Intelligence

It has recently been discovered that using a pre-trained vision-language model (VLM), e.g., CLIP, to align a whole query image with several finer text descriptions generated by a large language model can significantly enhance zero-shot performance. However, in this paper, we empirically find that the finer descriptions tend to align more effectively with local areas of the query image rather than the whole image, and then we theoretically validate this finding. Thus, we present a method called weighted visual-text cross alignment (WCA). This method begins with a localized visual prompting technique, designed to identify local visual areas within the query image. The local visual areas are then cross-aligned with the finer descriptions by creating a similarity matrix using the pre-trained VLM. To determine how well a query image aligns with each category, we develop a score function based on the weighted similarities in this matrix. Extensive experiments demonstrate that our method significantly improves zero-shot performance across various datasets, achieving results that are even comparable to few-shot learning methods.


D2E-An Autonomous Decision-making Dataset involving Driver States and Human Evaluation

arXiv.org Artificial Intelligence

With the advancement of deep learning technology, data-driven methods are increasingly used in the decision-making of autonomous driving, and the quality of datasets greatly influenced the model performance. Although current datasets have made significant progress in the collection of vehicle and environment data, emphasis on human-end data including the driver states and human evaluation is not sufficient. In addition, existing datasets consist mostly of simple scenarios such as car following, resulting in low interaction levels. In this paper, we introduce the Driver to Evaluation dataset (D2E), an autonomous decision-making dataset that contains data on driver states, vehicle states, environmental situations, and evaluation scores from human reviewers, covering a comprehensive process of vehicle decision-making. Apart from regular agents and surrounding environment information, we not only collect driver factor data including first-person view videos, physiological signals, and eye attention data, but also provide subjective rating scores from 40 human volunteers. The dataset is mixed of driving simulator scenes and real-road ones. High-interaction situations are designed and filtered to ensure behavior diversity. Through data organization, analysis, and preprocessing, D2E contains over 1100 segments of interactive driving case data covering from human driver factor to evaluation results, supporting the development of data-driven decision-making related algorithms.


Temporal-Aware Deep Reinforcement Learning for Energy Storage Bidding in Energy and Contingency Reserve Markets

arXiv.org Artificial Intelligence

The battery energy storage system (BESS) has immense potential for enhancing grid reliability and security through its participation in the electricity market. BESS often seeks various revenue streams by taking part in multiple markets to unlock its full potential, but effective algorithms for joint-market participation under price uncertainties are insufficiently explored in the existing research. To bridge this gap, we develop a novel BESS joint bidding strategy that utilizes deep reinforcement learning (DRL) to bid in the spot and contingency frequency control ancillary services (FCAS) markets. Our approach leverages a transformer-based temporal feature extractor to effectively respond to price fluctuations in seven markets simultaneously and helps DRL learn the best BESS bidding strategy in joint-market participation. Additionally, unlike conventional "black-box" DRL model, our approach is more interpretable and provides valuable insights into the temporal bidding behavior of BESS in the dynamic electricity market. We validate our method using realistic market prices from the Australian National Electricity Market. The results show that our strategy outperforms benchmarks, including both optimization-based and other DRL-based strategies, by substantial margins. Our findings further suggest that effective temporal-aware bidding can significantly increase profits in the spot and contingency FCAS markets compared to individual market participation.


Attentive Convolutional Deep Reinforcement Learning for Optimizing Solar-Storage Systems in Real-Time Electricity Markets

arXiv.org Artificial Intelligence

This paper studies the synergy of solar-battery energy storage system (BESS) and develops a viable strategy for the BESS to unlock its economic potential by serving as a backup to reduce solar curtailments while also participating in the electricity market. We model the real-time bidding of the solar-battery system as two Markov decision processes for the solar farm and the BESS, respectively. We develop a novel deep reinforcement learning (DRL) algorithm to solve the problem by leveraging attention mechanism (AC) and multi-grained feature convolution to process DRL input for better bidding decisions. Simulation results demonstrate that our AC-DRL outperforms two optimization-based and one DRL-based benchmarks by generating 23%, 20%, and 11% higher revenue, as well as improving curtailment responses. The excess solar generation can effectively charge the BESS to bid in the market, significantly reducing solar curtailments by 76% and creating synergy for the solar-battery system to be more viable.


Enabling Fast 2-bit LLM on GPUs: Memory Alignment and Asynchronous Dequantization

arXiv.org Artificial Intelligence

Large language models (LLMs) have demonstrated impressive abilities in various domains while the inference cost is expensive. The state-of-the-art methods use 2-bit quantization for mainstream LLMs. However, challenges still exist: (1) Nonnegligible accuracy loss for 2-bit quantization. Weights are quantized by groups, while the ranges of weights are large in some groups, resulting in large quantization errors and nonnegligible accuracy loss (e.g. >3% for Llama2-7b with 2-bit quantization in GPTQ and Greenbit). (2) Limited accuracy improvement by adding 4-bit weights. Increasing 10% extra average bit more 4-bit weights only leads to <0.5% accuracy improvement on a quantized Llama2-7b. (3) Time-consuming dequantization operations on GPUs. The dequantization operations lead to >50% execution time, hindering the potential of reducing LLM inference cost. To tackle these challenges, we propose the following techniques: (1) We only quantize a small fraction of groups with the larger range using 4-bit with memory alignment consideration on GPUs.(2) We design the asynchronous dequantization on GPUs, leading to up to 3.92X speedup. We conduct extensive experiments on different model sizes. We achieve 2.85-bit for each weight and the end-to-end speedup for Llama2-7b is 1.74X over the original model, and we reduce both runtime cost and hardware cost by up to 2.70X and 2.81X with less GPU requirements.


Deep Reinforcement Learning for Wind and Energy Storage Coordination in Wholesale Energy and Ancillary Service Markets

arXiv.org Artificial Intelligence

Wind energy has been increasingly adopted to mitigate climate change. However, the variability of wind energy causes wind curtailment, resulting in considerable economic losses for wind farm owners. Wind curtailment can be reduced using battery energy storage systems (BESS) as onsite backup sources. Yet, this auxiliary role may significantly weaken the economic potential of BESS in energy trading. Ideal BESS scheduling should balance onsite wind curtailment reduction and market bidding, but practical implementation is challenging due to coordination complexity and the stochastic nature of energy prices and wind generation. We investigate the joint-market bidding strategy of a co-located wind-battery system in the spot and Regulation Frequency Control Ancillary Service markets. We propose a novel deep reinforcement learning-based approach that decouples the system's market participation into two related Markov decision processes for each facility, enabling the BESS to absorb onsite wind curtailment while performing joint-market bidding to maximize overall operational revenues. Using realistic wind farm data, we validated the coordinated bidding strategy, with outcomes surpassing the optimization-based benchmark in terms of higher revenue by approximately 25\% and more wind curtailment reduction by 2.3 times. Our results show that joint-market bidding can significantly improve the financial performance of wind-battery systems compared to participating in each market separately. Simulations also show that using curtailed wind generation as a power source for charging the BESS can lead to additional financial gains. The successful implementation of our algorithm would encourage co-location of generation and storage assets to unlock wider system benefits.


Optimal Energy Storage Scheduling for Wind Curtailment Reduction and Energy Arbitrage: A Deep Reinforcement Learning Approach

arXiv.org Artificial Intelligence

Wind energy has been rapidly gaining popularity as a means for combating climate change. However, the variable nature of wind generation can undermine system reliability and lead to wind curtailment, causing substantial economic losses to wind power producers. Battery energy storage systems (BESS) that serve as onsite backup sources are among the solutions to mitigate wind curtailment. However, such an auxiliary role of the BESS might severely weaken its economic viability. This paper addresses the issue by proposing joint wind curtailment reduction and energy arbitrage for the BESS. We decouple the market participation of the co-located wind-battery system and develop a joint-bidding framework for the wind farm and BESS. It is challenging to optimize the joint-bidding because of the stochasticity of energy prices and wind generation. Therefore, we leverage deep reinforcement learning to maximize the overall revenue from the spot market while unlocking the BESS's potential in concurrently reducing wind curtailment and conducting energy arbitrage. We validate the proposed strategy using realistic wind farm data and demonstrate that our joint-bidding strategy responds better to wind curtailment and generates higher revenues than the optimization-based benchmark. Our simulations also reveal that the extra wind generation used to be curtailed can be an effective power source to charge the BESS, resulting in additional financial returns.