Ding, Li
Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective
Li, Jinhao, Xu, Jiaming, Huang, Shan, Chen, Yonghua, Li, Wen, Liu, Jun, Lian, Yaoxiu, Pan, Jiayi, Ding, Li, Zhou, Hao, Wang, Yu, Dai, Guohao
Large Language Models (LLMs) have demonstrated remarkable capabilities across various fields, from natural language understanding to text generation. Compared to non-generative LLMs like BERT and DeBERTa, generative LLMs like the GPT series and Llama series are currently the main focus due to their superior algorithmic performance. The advancements in generative LLMs are closely intertwined with the development of hardware capabilities, and different hardware platforms exhibit distinct characteristics that can be exploited to improve LLM inference performance. Therefore, this paper comprehensively surveys efficient generative LLM inference on different hardware platforms. First, we provide an overview of the algorithmic architecture of mainstream generative LLMs and delve into the inference process. Then, we summarize optimization methods for different platforms, including CPU, GPU, FPGA, ASIC, and PIM/NDP, and provide inference results for generative LLMs. Furthermore, we perform a qualitative and quantitative comparison of inference performance at batch sizes 1 and 8 across hardware platforms, considering hardware power consumption, absolute inference speed (tokens/s), and energy efficiency (tokens/J). We compare the same optimization methods across different hardware platforms, overall performance across platforms, and different methods on the same platform. By integrating software optimization methods and hardware platforms, this survey provides a systematic summary of existing inference acceleration work and points to future trends and potential developments of generative LLMs and hardware technology for edge-side scenarios.
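As a quick reference for the survey's two quantitative metrics, the sketch below shows how absolute inference speed (tokens/s) and energy efficiency (tokens/J) are related; the numbers are illustrative placeholders, not measurements from the paper.

```python
# Minimal sketch of the two efficiency metrics used in the survey's
# comparisons. The variable values are placeholders, not measured results.

def throughput_tokens_per_s(num_tokens: int, elapsed_s: float) -> float:
    """Absolute inference speed: generated tokens per second."""
    return num_tokens / elapsed_s

def energy_efficiency_tokens_per_j(num_tokens: int, elapsed_s: float,
                                   avg_power_w: float) -> float:
    """Energy efficiency: tokens per joule (1 W = 1 J/s)."""
    return num_tokens / (elapsed_s * avg_power_w)

# Example with placeholder numbers: 256 tokens in 4.0 s at 50 W average power.
tps = throughput_tokens_per_s(256, 4.0)               # 64.0 tokens/s
tpj = energy_efficiency_tokens_per_j(256, 4.0, 50.0)  # 1.28 tokens/J
print(f"{tps:.1f} tokens/s, {tpj:.2f} tokens/J")
```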
Pareto-Optimal Learning from Preferences with Hidden Context
Boldi, Ryan, Ding, Li, Spector, Lee, Niekum, Scott
Ensuring AI models align with human values is essential for their safety and functionality. Reinforcement learning from human feedback (RLHF) uses human preferences to achieve this alignment. However, preferences sourced from diverse populations can result in point estimates of human values that are sub-optimal or unfair to specific groups. We propose Pareto Optimal Preference Learning (POPL), which frames discrepant group preferences as objectives with potential trade-offs, aiming for policies that are Pareto-optimal on the preference dataset. POPL utilizes lexicase selection, an iterative process for selecting diverse and Pareto-optimal solutions. Our empirical evaluations demonstrate that POPL surpasses baseline methods in learning sets of reward functions, effectively catering to distinct groups without access to the number of groups or group membership labels. Furthermore, we illustrate that POPL can serve as a foundation for techniques that optimize specific notions of group fairness, ensuring inclusive and equitable AI model alignment.
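To make the Pareto-optimality objective concrete, here is a minimal sketch of the non-dominance criterion that POPL targets over per-group preference scores; it illustrates the objective only, not the paper's lexicase-based selection procedure, and all names and values are illustrative.

```python
import numpy as np

def pareto_front(scores: np.ndarray) -> np.ndarray:
    """scores[i, g] = score of candidate i under group g's preferences
    (higher is better). Returns indices of non-dominated candidates."""
    n = scores.shape[0]
    keep = []
    for i in range(n):
        # Candidate i is dominated if some j is at least as good on every
        # group objective and strictly better on at least one.
        dominated = any(
            np.all(scores[j] >= scores[i]) and np.any(scores[j] > scores[i])
            for j in range(n) if j != i
        )
        if not dominated:
            keep.append(i)
    return np.array(keep)

# Three candidate reward functions scored by two preference groups.
scores = np.array([[0.9, 0.2], [0.5, 0.5], [0.4, 0.4]])
print(pareto_front(scores))  # [0 1]: candidate 2 is dominated by candidate 1
```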
DALex: Lexicase-like Selection via Diverse Aggregation
Ni, Andrew, Ding, Li, Spector, Lee
Lexicase selection has been shown to provide advantages over other selection algorithms in several areas of evolutionary computation and machine learning. In its standard form, lexicase selection filters a population or other collection based on randomly ordered training cases that are considered one at a time. This iterated filtering process can be time-consuming, particularly in settings with large numbers of training cases. In this paper, we propose a new method that is nearly equivalent to lexicase selection in terms of the individuals that it selects, but which does so significantly more quickly. The new method, called DALex (for Diversely Aggregated Lexicase), selects the best individual with respect to a weighted sum of training case errors, where the weights are randomly sampled. This allows us to formulate the core computation required for selection as matrix multiplication instead of recursive loops of comparisons, which in turn allows us to take advantage of optimized and parallel algorithms designed for matrix multiplication for speedup. Furthermore, we show that we can interpolate between the behavior of lexicase selection and its "relaxed" variants, such as epsilon or batch lexicase selection, by adjusting a single hyperparameter, named "particularity pressure," which represents the importance granted to each individual training case. Results on program synthesis, deep learning, symbolic regression, and learning classifier systems demonstrate that DALex achieves significant speedups over lexicase selection and its relaxed variants while maintaining almost identical problem-solving performance. Under a fixed computational budget, these savings free up resources that can be directed towards increasing population size or the number of generations, enabling the potential for solving more difficult problems.
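A minimal sketch of the core computation described above: scoring a whole population against randomly weighted training cases with a single matrix multiplication. The normal-then-softmax weighting controlled by `particularity_pressure` follows the abstract's description; the paper's exact distributional choices may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def dalex_select(errors: np.ndarray, n_parents: int,
                 particularity_pressure: float = 20.0) -> np.ndarray:
    """errors[i, c] = error of individual i on training case c (lower is
    better). Returns indices of the selected parents."""
    n_cases = errors.shape[1]
    # One random weight vector per selection event: a large particularity
    # pressure makes a single case dominate (lexicase-like behavior),
    # while small values approach a plain average of case errors.
    raw = rng.normal(0.0, particularity_pressure, size=(n_parents, n_cases))
    raw -= raw.max(axis=1, keepdims=True)  # numerical stability for softmax
    weights = np.exp(raw) / np.exp(raw).sum(axis=1, keepdims=True)
    scores = weights @ errors.T            # (n_parents, pop_size) in one matmul
    return scores.argmin(axis=1)

pop_errors = rng.random((100, 50))  # 100 individuals, 50 training cases
parents = dalex_select(pop_errors, n_parents=10)
```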
Optimizing Neural Networks with Gradient Lexicase Selection
Ding, Li, Spector, Lee
One potential drawback of using aggregated performance measurement in machine learning is that models may learn to accept higher errors on some training cases as compromises for lower errors on others, with the lower errors actually being instances of overfitting. This can lead to both stagnation at local optima and poor generalization. Lexicase selection is an uncompromising method developed in evolutionary computation, which selects models on the basis of sequences of individual training case errors instead of using aggregated metrics such as loss and accuracy. In this paper, we investigate how lexicase selection, in its general form, can be integrated into the context of deep learning to enhance generalization. We propose Gradient Lexicase Selection, an optimization framework that combines gradient descent and lexicase selection in an evolutionary fashion. Our experimental results demonstrate that the proposed method improves the generalization performance of various widely-used deep neural network architectures across three image classification benchmarks. Additionally, qualitative analysis suggests that our method assists networks in learning more diverse representations.
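For reference, here is a minimal sketch of standard lexicase selection, the uncompromising filter that Gradient Lexicase Selection combines with gradient descent; it is a generic illustration of the selection step, not the paper's full training framework.

```python
import random

def lexicase_select(case_errors, rng=random.Random(0)):
    """case_errors[i][c] = error of candidate i on training case c.
    Filters candidates one randomly ordered case at a time, keeping only
    those with the best (lowest) error on each case."""
    candidates = list(range(len(case_errors)))
    cases = list(range(len(case_errors[0])))
    rng.shuffle(cases)
    for c in cases:
        if len(candidates) == 1:
            break
        best = min(case_errors[i][c] for i in candidates)
        candidates = [i for i in candidates if case_errors[i][c] == best]
    return rng.choice(candidates)
```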
Quality Diversity through Human Feedback
Ding, Li, Zhang, Jenny, Clune, Jeff, Spector, Lee, Lehman, Joel
Reinforcement Learning from Human Feedback (RLHF) has shown potential in qualitative tasks where clear objectives are lacking. However, its effectiveness is not fully realized when it is conceptualized merely as a tool to optimize average human preferences, especially in generative tasks that demand diverse model responses. Meanwhile, Quality Diversity (QD) algorithms excel at identifying diverse and high-quality solutions but often rely on manually crafted diversity metrics. This paper introduces Quality Diversity through Human Feedback (QDHF), a novel approach integrating human feedback into the QD framework. QDHF infers diversity metrics from human judgments of similarity among solutions, thereby enhancing the applicability and effectiveness of QD algorithms. Our empirical studies show that QDHF significantly outperforms state-of-the-art methods in automatic diversity discovery and matches the efficacy of using manually crafted metrics for QD on standard benchmarks in robotics and reinforcement learning. Notably, in a latent space illumination task, QDHF substantially enhances the diversity in images generated by a diffusion model and is more favorably received in user studies. We conclude by analyzing QDHF's scalability and the quality of its derived diversity metrics, emphasizing its potential to improve exploration and diversity in complex, open-ended optimization tasks. Source code is available on GitHub: https://github.com/ld-ing/qdhf.
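A minimal sketch of the central idea, assuming human similarity judgments arrive as (anchor, positive, negative) triplets: learn a latent projection whose distances agree with those judgments, then use the latent coordinates as diversity descriptors for QD. The dimensions and data here are placeholders; see the linked repository for the actual implementation.

```python
import torch

torch.manual_seed(0)
feat_dim, latent_dim = 32, 2
proj = torch.nn.Linear(feat_dim, latent_dim, bias=False)
opt = torch.optim.Adam(proj.parameters(), lr=1e-2)
loss_fn = torch.nn.TripletMarginLoss(margin=1.0)

# Placeholder tensors standing in for solution features grouped by human
# judgments: each anchor is judged more similar to its positive than to
# its negative.
anchors, positives, negatives = (torch.randn(256, feat_dim) for _ in range(3))

for _ in range(200):
    opt.zero_grad()
    # Pull judged-similar pairs together in latent space, push dissimilar apart.
    loss = loss_fn(proj(anchors), proj(positives), proj(negatives))
    loss.backward()
    opt.step()

descriptors = proj(anchors).detach()  # per-solution diversity measures for QD
```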
Ever Evolving Evaluator (EV3): Towards Flexible and Reliable Meta-Optimization for Knowledge Distillation
Ding, Li, Zoghi, Masrour, Tennenholtz, Guy, Karimzadehgan, Maryam
We introduce EV3, a novel meta-optimization framework designed to efficiently train scalable machine learning models through an intuitive explore-assess-adapt protocol. In each iteration of EV3, we explore various model parameter updates, assess them using pertinent evaluation methods, and then adapt the model based on the optimal updates and its progress history. EV3 offers substantial flexibility without imposing stringent constraints like differentiability on the key objectives relevant to the tasks of interest, allowing for exploratory updates with intentionally biased gradients and through a diversity of losses and optimizers. Additionally, the assessment phase provides reliable safety controls to ensure robust generalization, and can dynamically prioritize tasks in scenarios with multiple objectives. With inspiration drawn from evolutionary algorithms, meta-learning, and neural architecture search, we investigate an application of EV3 to knowledge distillation. Our experimental results illustrate EV3's capability to safely explore the modeling landscape, while hinting at its potential applicability across numerous domains due to its inherent flexibility and adaptability.
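A minimal sketch of the explore-assess-adapt protocol as described above; `propose_updates`, `evaluate`, and `apply` are hypothetical stand-ins for task-specific components, and the real framework's interfaces may differ.

```python
def ev3_step(model, propose_updates, evaluate, apply):
    # Explore: generate candidate parameter updates, e.g., from different
    # losses, optimizers, or intentionally biased gradients.
    candidates = propose_updates(model)
    # Assess: score each candidate with the (possibly non-differentiable)
    # evaluation relevant to the task.
    scored = [(evaluate(apply(model, u)), u) for u in candidates]
    best_score, best_update = max(scored, key=lambda s: s[0])
    # Adapt: only commit the update if it does not regress the model,
    # acting as a safety control for robust generalization.
    if best_score >= evaluate(model):
        model = apply(model, best_update)
    return model
```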
Particularity
Spector, Lee, Ding, Li, Boldi, Ryan
We describe a design principle for adaptive systems under which adaptation is driven by particular challenges that the environment poses, as opposed to average or otherwise aggregated measures of performance over many challenges. We trace the development of this "particularity" approach from the use of lexicase selection in genetic programming to "particularist" approaches to other forms of machine learning and to the design of adaptive systems more generally.
Probabilistic Lexicase Selection
Ding, Li, Pantridge, Edward, Spector, Lee
Lexicase selection is a widely used parent selection algorithm in genetic programming, known for its success in various task domains such as program synthesis, symbolic regression, and machine learning. Due to its non-parametric and recursive nature, calculating the probability of each individual being selected by lexicase selection has been proven to be an NP-hard problem, which discourages deeper theoretical understanding and practical improvements to the algorithm. In this work, we introduce probabilistic lexicase selection (plexicase selection), a novel parent selection algorithm that efficiently approximates the probability distribution of lexicase selection. Our method not only demonstrates superior problem-solving capabilities as a semantic-aware selection method, but also benefits from having a probabilistic representation of the selection process for enhanced efficiency and flexibility. Experiments are conducted in two prevalent domains in genetic programming: program synthesis and symbolic regression, using standard benchmarks including PSB and SRBench. The empirical results show that plexicase selection achieves state-of-the-art problem-solving performance that is competitive with lexicase selection, and significantly outperforms lexicase selection in computational efficiency.
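For context, the distribution that plexicase approximates can be estimated by brute-force Monte Carlo, as in the sketch below; this is the expensive baseline that plexicase avoids, not the plexicase algorithm itself.

```python
import random
from collections import Counter

def lexicase_once(errors, rng):
    """One run of lexicase selection over errors[i][c] (lower is better)."""
    cands = list(range(len(errors)))
    cases = list(range(len(errors[0])))
    rng.shuffle(cases)
    for c in cases:
        best = min(errors[i][c] for i in cands)
        cands = [i for i in cands if errors[i][c] == best]
        if len(cands) == 1:
            break
    return rng.choice(cands)

def estimate_selection_probs(errors, n_samples=10000, seed=0):
    """Monte Carlo estimate of each individual's selection probability."""
    rng = random.Random(seed)
    counts = Counter(lexicase_once(errors, rng) for _ in range(n_samples))
    return {i: counts[i] / n_samples for i in range(len(errors))}

errors = [[0, 1], [1, 0], [1, 1]]  # individual 2 is never selected
print(estimate_selection_probs(errors))  # ~{0: 0.5, 1: 0.5, 2: 0.0}
```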
Arguing Machines: Human Supervision of Black Box AI Systems That Make Life-Critical Decisions
Fridman, Lex, Ding, Li, Jenik, Benedikt, Reimer, Bryan
We consider the paradigm of a black box AI system that makes life-critical decisions. We propose an "arguing machines" framework that pairs the primary AI system with a secondary one that is independently trained to perform the same task. We show that disagreement between the two systems, without any knowledge of underlying system design or operation, is sufficient to arbitrarily improve the accuracy of the overall decision pipeline given human supervision over disagreements. We demonstrate this system in two applications: (1) an illustrative example of image classification and (2) large-scale real-world semi-autonomous driving data. For the first application, we apply this framework to image classification, achieving a reduction from 8.0% to 2.8% top-5 error on ImageNet. For the second application, we apply this framework to Tesla Autopilot and demonstrate the ability to predict 90.4% of system disengagements that were labeled by human annotators as challenging and needing human supervision.
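A minimal sketch of the disagreement signal at the heart of the framework: two independently trained models perform the same task, and any disagreement between their outputs is escalated to a human supervisor. `primary`, `secondary`, and `ask_human` are hypothetical stand-ins, not the paper's implementation.

```python
def arbitrate(primary, secondary, x, ask_human):
    """Return the shared decision on agreement; defer to a human otherwise."""
    p, s = primary(x), secondary(x)
    if p == s:
        return p               # agreement: accept the shared decision
    return ask_human(x, p, s)  # disagreement: escalate to human supervision

# Toy usage with stub models and a placeholder human policy.
primary = lambda x: "left" if x < 0.5 else "right"
secondary = lambda x: "left" if x < 0.6 else "right"
human = lambda x, p, s: p  # placeholder: human resolves the disagreement
print(arbitrate(primary, secondary, 0.55, human))  # disagreement -> human
```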
Reports of the AAAI 2011 Fall Symposia
Blisard, Sam (Naval Research Laboratory) | Carmichael, Ted (University of North Carolina at Charlotte) | Ding, Li (University of Maryland, Baltimore County) | Finin, Tim (University of Maryland, Baltimore County) | Frost, Wende (Naval Research Laboratory) | Graesser, Arthur (University of Memphis) | Hadzikadic, Mirsad (University of North Carolina at Charlotte) | Kagal, Lalana (Massachusetts Institute of Technology) | Kruijff, Geert-Jan M. (German Research Center for Artificial Intelligence) | Langley, Pat (Arizona State University) | Lester, James (North Carolina State University) | McGuinness, Deborah L. (Rensselaer Polytechnic Institute) | Mostow, Jack (Carnegie Mellon University) | Papadakis, Panagiotis (University of Sapienza, Rome) | Pirri, Fiora (Sapienza University of Rome) | Prasad, Rashmi (University of Wisconsin-Milwaukee) | Stoyanchev, Svetlana (Columbia University) | Varakantham, Pradeep (Singapore Management University)
The Association for the Advancement of Artificial Intelligence was pleased to present the 2011 Fall Symposium Series, held Friday through Sunday, November 4–6, at the Westin Arlington Gateway in Arlington, Virginia. The titles of the seven symposia are as follows: (1) Advances in Cognitive Systems; (2) Building Representations of Common Ground with Intelligent Agents; (3) Complex Adaptive Systems: Energy, Information and Intelligence; (4) Multiagent Coordination under Uncertainty; (5) Open Government Knowledge: AI Opportunities and Challenges; (6) Question Generation; and (7) Robot-Human Teamwork in Dynamic Adverse Environment. The highlights of each symposium are presented in this report.