Overview
On the Workflows and Smells of Leaderboard Operations (LBOps): An Exploratory Study of Foundation Model Leaderboards
Zhao, Zhimin, Bangash, Abdul Ali, Côgo, Filipe Roseiro, Adams, Bram, Hassan, Ahmed E.
Foundation models (FM), such as large language models (LLMs), which are large-scale machine learning (ML) models, have demonstrated remarkable adaptability in various downstream software engineering (SE) tasks, such as code completion, code understanding, and software development. As a result, FM leaderboards, especially those hosted on cloud platforms, have become essential tools for SE teams to compare and select the best third-party FMs for their specific products and purposes. However, the lack of standardized guidelines for FM evaluation and comparison threatens the transparency of FM leaderboards and limits stakeholders' ability to perform effective FM selection. As a first step towards addressing this challenge, our research focuses on understanding how these FM leaderboards operate in real-world scenarios ("leaderboard operations") and identifying potential leaderboard pitfalls and areas for improvement ("leaderboard smells"). In this regard, we perform a multivocal literature review to collect up to 721 FM leaderboards, after which we examine their documentation and engage in direct communication with leaderboard operators to understand their workflow patterns. Using card sorting and negotiated agreement, we identify 5 unique workflow patterns and develop a domain model that outlines the essential components and their interaction within FM leaderboards. We then identify 8 unique types of leaderboard smells in LBOps. By mitigating these smells, SE teams can improve transparency, accountability, and collaboration in current LBOps practices, fostering a more robust and responsible ecosystem for FM comparison and selection.
Hamilton-Jacobi Reachability in Reinforcement Learning: A Survey
Ganai, Milan, Gao, Sicun, Herbert, Sylvia
Recent literature has proposed approaches that learn control policies with high performance while maintaining safety guarantees. Synthesizing Hamilton-Jacobi (HJ) reachable sets has become an effective tool for verifying safety and supervising the training of reinforcement learning-based control policies for complex, high-dimensional systems. Previously, HJ reachability was limited to verifying low-dimensional dynamical systems -- this is because the computational complexity of the dynamic programming approach it relied on grows exponentially with the number of system states. To address this limitation, in recent years, there have been methods that compute the reachability value function simultaneously with learning control policies to scale HJ reachability analysis while still maintaining a reliable estimate of the true reachable set. These HJ reachability approximations are used to improve the safety, and even reward performance, of learned control policies and can solve challenging tasks such as those with dynamic obstacles and/or with lidar-based or vision-based observations. In this survey paper, we review the recent developments in the field of HJ reachability estimation in reinforcement learning that would provide a foundational basis for further research into reliability in high-dimensional systems.
Learning Coordinated Maneuver in Adversarial Environments
Hu, Zechen, Limbu, Manshi, Shishika, Daigo, Xiao, Xuesu, Wang, Xuan
This paper aims to solve the coordination of a team of robots traversing a route in the presence of adversaries with random positions. Our goal is to minimize the overall cost of the team, which is determined by (i) the accumulated risk when robots stay in adversary-impacted zones and (ii) the mission completion time. During traversal, robots can reduce their speed and act as a `guard' (the slower, the better), which will decrease the risks certain adversary incurs. This leads to a trade-off between the robots' guarding behaviors and their travel speeds. The formulated problem is highly non-convex and cannot be efficiently solved by existing algorithms. Our approach includes a theoretical analysis of the robots' behaviors for the single-adversary case. As the scale of the problem expands, solving the optimal solution using optimization approaches is challenging, therefore, we employ reinforcement learning techniques by developing new encoding and policy-generating methods. Simulations demonstrate that our learning methods can efficiently produce team coordination behaviors. We discuss the reasoning behind these behaviors and explain why they reduce the overall team cost.
Inference Optimization of Foundation Models on AI Accelerators
Park, Youngsuk, Budhathoki, Kailash, Chen, Liangfu, Kübler, Jonas, Huang, Jiaji, Kleindessner, Matthäus, Huan, Jun, Cevher, Volkan, Wang, Yida, Karypis, George
Powerful foundation models, including large language models (LLMs), with Transformer architectures have ushered in a new era of Generative AI across various industries. Industry and research community have witnessed a large number of new applications, based on those foundation models. Such applications include question and answer, customer services, image and video generation, and code completions, among others. However, as the number of model parameters reaches to hundreds of billions, their deployment incurs prohibitive inference costs and high latency in real-world scenarios. As a result, the demand for cost-effective and fast inference using AI accelerators is ever more higher. To this end, our tutorial offers a comprehensive discussion on complementary inference optimization techniques using AI accelerators. Beginning with an overview of basic Transformer architectures and deep learning system frameworks, we deep dive into system optimization techniques for fast and memory-efficient attention computations and discuss how they can be implemented efficiently on AI accelerators. Next, we describe architectural elements that are key for fast transformer inference. Finally, we examine various model compression and fast decoding strategies in the same context.
Procedural Content Generation via Generative Artificial Intelligence
Mao, Xinyu, Yu, Wanli, Yamada, Kazunori D, Zielewski, Michael R.
The attempt to utilize machine learning in PCG has been made in the past. In this survey paper, we investigate how generative artificial intelligence (AI), which saw a significant increase in interest in the mid-2010s, is being used for PCG. We review applications of generative AI for the creation of various types of content, including terrains, items, and even storylines. While generative AI is effective for PCG, one significant issues it faces is that building high-performance generative AI requires vast amounts of training data. Because content generally highly customized, domain-specific training data is scarce, and straightforward approaches to generative AI models may not work well. For PCG research to advance further, issues related to limited training data must be overcome. Thus, we also give special consideration to research that addresses the challenges posed by limited training data.
A Review of Nine Physics Engines for Reinforcement Learning Research
Kaup, Michael, Wolff, Cornelius, Hwang, Hyerim, Mayer, Julius, Bruni, Elia
We present a review of popular simulation engines and frameworks used in reinforcement learning (RL) research, aiming to guide researchers in selecting tools for creating simulated physical environments for RL and training setups. It evaluates nine frameworks (Brax, Chrono, Gazebo, MuJoCo, ODE, PhysX, PyBullet, Webots, and Unity) based on their popularity, feature range, quality, usability, and RL capabilities. We highlight the challenges in selecting and utilizing physics engines for RL research, including the need for detailed comparisons and an understanding of each framework's capabilities. Key findings indicate MuJoCo as the leading framework due to its performance and flexibility, despite usability challenges. Unity is noted for its ease of use but lacks scalability and simulation fidelity. The study calls for further development to improve simulation engines' usability and performance and stresses the importance of transparency and reproducibility in RL research. This review contributes to the RL community by offering insights into the selection process for simulation engines, facilitating informed decision-making.
Transformer Circuit Faithfulness Metrics are not Robust
Miller, Joseph, Chughtai, Bilal, Saunders, William
Mechanistic interpretability work attempts to reverse engineer the learned algorithms present inside neural networks. One focus of this work has been to discover 'circuits' -- subgraphs of the full model that explain behaviour on specific tasks. But how do we measure the performance of such circuits? Prior work has attempted to measure circuit 'faithfulness' -- the degree to which the circuit replicates the performance of the full model. In this work, we survey many considerations for designing experiments that measure circuit faithfulness by ablating portions of the model's computation. Concerningly, we find existing methods are highly sensitive to seemingly insignificant changes in the ablation methodology. We conclude that existing circuit faithfulness scores reflect both the methodological choices of researchers as well as the actual components of the circuit - the task a circuit is required to perform depends on the ablation used to test it. The ultimate goal of mechanistic interpretability work is to understand neural networks, so we emphasize the need for more clarity in the precise claims being made about circuits. We open source a library at https://github.com/UFO-101/auto-circuit that includes highly efficient implementations of a wide range of ablation methodologies and circuit discovery algorithms.
Brain Tumor Segmentation in MRI Images with 3D U-Net and Contextual Transformer
Nguyen, Thien-Qua T., Nguyen, Hieu-Nghia, Bui, Thanh-Hieu, Nguyen-Tat, Thien B., Ngo, Vuong M.
This research presents an enhanced approach for precise segmentation of brain tumor masses in magnetic resonance imaging (MRI) using an advanced 3D-UNet model combined with a Context Transformer (CoT). By architectural expansion CoT, the proposed model extends its architecture to a 3D format, integrates it smoothly with the base model to utilize the complex contextual information found in MRI scans, emphasizing how elements rely on each other across an extended spatial range. The proposed model synchronizes tumor mass characteristics from CoT, mutually reinforcing feature extraction, facilitating the precise capture of detailed tumor mass structures, including location, size, and boundaries. Several experimental results present the outstanding segmentation performance of the proposed method in comparison to current state-of-the-art approaches, achieving Dice score of 82.0%, 81.5%, 89.0% for Enhancing Tumor, Tumor Core and Whole Tumor, respectively, on BraTS2019.
Toward accessible comics for blind and low vision readers
Rigaud, Christophe, Burie, Jean-Christophe, Petit, Samuel
This work explores how to fine-tune large language models using prompt engineering techniques with contextual information for generating an accurate text description of the full story, ready to be forwarded to off-the-shelve speech synthesis tools. We propose to use existing computer vision and optical character recognition techniques to build a grounded context from the comic strip image content, such as panels, characters, text, reading order and the association of bubbles and characters. Then we infer character identification and generate comic book script with context-aware panel description including character's appearance, posture, mood, dialogues etc. We believe that such enriched content description can be easily used to produce audiobook and eBook with various voices for characters, captions and playing sound effects. Keywords: comics understanding large language model prompt engineering character identification comic book script accessible comics.
Brief state of the art in social information mining: Practical application in analysis of trends in French legislative 2024
The analysis of social media information has undergone significant evolution in the last decade due to advancements in artificial intelligence (AI) and machine learning (ML). This paper provides an overview of the state-of-the-art techniques in social media mining, with a practical application in analyzing trends in the 2024 French legislative elections. We leverage natural language processing (NLP) tools to gauge public opinion by extracting and analyzing comments and reactions from the AgoraVox platform. The study reveals that the National Rally party, led by Marine Le Pen, maintains a high level of engagement on social media, outperforming traditional parties. This trend is corroborated by user interactions, indicating a strong digital presence. The results highlight the utility of advanced AI models, such as transformers and large language models (LLMs), in capturing nuanced public sentiments and predicting political leanings, demonstrating their potential in real-time reputation management and crisis response.