Anwar, Ali
Safety Aware Task Planning via Large Language Models in Robotics
Khan, Azal Ahmad, Andrev, Michael, Murtaza, Muhammad Ali, Aguilera, Sergio, Zhang, Rui, Ding, Jie, Hutchinson, Seth, Anwar, Ali
The integration of large language models (LLMs) into robotic task planning has unlocked better reasoning capabilities for complex, long-horizon workflows. However, ensuring safety in LLM-driven plans remains a critical challenge, as these models often prioritize task completion over risk mitigation. This paper introduces SAFER (Safety-Aware Framework for Execution in Robotics), a multi-LLM framework designed to embed safety awareness into robotic task planning. SAFER employs a Safety Agent that operates alongside the primary task planner, providing safety feedback. Additionally, we introduce LLM-as-a-Judge, a novel metric that leverages LLMs as evaluators to quantify safety violations within generated task plans. Our framework integrates safety feedback at multiple stages of execution, enabling real-time risk assessment, proactive error correction, and transparent safety evaluation. We also integrate a control framework based on Control Barrier Functions (CBFs) to provide safety guarantees within SAFER's task planning. We evaluate SAFER against state-of-the-art LLM planners on complex long-horizon tasks involving heterogeneous robotic agents, demonstrating its effectiveness in reducing safety violations while maintaining task efficiency. We also validate the task planner and safety planner through hardware experiments involving multiple robots and a human.
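The planner-plus-safety-agent loop described above can be pictured with a short sketch. The snippet below is illustrative only: call_llm is a hypothetical stand-in for an LLM client, not SAFER's interface, and the prompts and stopping rule are assumptions.

# Illustrative planner/safety-agent feedback loop (not SAFER's implementation).
def call_llm(role_prompt: str, user_prompt: str) -> str:
    """Hypothetical LLM call; replace with a real client."""
    raise NotImplementedError

def plan_with_safety_feedback(task: str, max_rounds: int = 3) -> str:
    plan = call_llm("You are a robot task planner.",
                    f"Produce a step-by-step plan for: {task}")
    for _ in range(max_rounds):
        # The safety agent reviews the current plan and lists violations.
        review = call_llm("You are a safety reviewer for robot task plans.",
                          f"Task: {task}\nPlan:\n{plan}\n"
                          "List any safety violations, or reply 'SAFE'.")
        if review.strip().upper().startswith("SAFE"):
            break
        # The planner revises the plan using the safety feedback.
        plan = call_llm("You are a robot task planner.",
                        f"Revise the plan to address these safety issues.\n"
                        f"Plan:\n{plan}\nIssues:\n{review}")
    return plan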
FLStore: Efficient Federated Learning Storage for non-training workloads
Khan, Ahmad Faraz, Fountain, Samuel, Abdelmoniem, Ahmed M., Butt, Ali R., Anwar, Ali
Federated Learning (FL) is an approach to privacy-preserving Machine Learning (ML) that enables model training across multiple clients without centralized data collection, with an aggregator server coordinating training, aggregating model updates, and storing metadata across rounds. In addition to training, a substantial part of FL systems consists of non-training workloads such as scheduling, personalization, clustering, debugging, and incentivization. Most existing systems rely on the aggregator to handle these workloads and use cloud services for data storage, which results in high latency and increased costs because non-training workloads depend on large volumes of metadata, including weight parameters from client updates, hyperparameters, and aggregated updates across rounds. We propose FLStore, a serverless framework for efficient FL non-training workloads and storage. FLStore unifies the data and compute planes on a serverless cache, enabling locality-aware execution via tailored caching policies to reduce latency and costs. In our evaluations, compared to a cloud object store based aggregator server, FLStore reduces per-request average latency by 71% and costs by 92.45%, with peak improvements of 99.7% and 98.8%, respectively. Compared to an in-memory cloud cache based aggregator server, FLStore reduces average latency by 64.6% and costs by 98.83%, with peak improvements of 98.8% and 99.6%, respectively. FLStore integrates seamlessly with existing FL frameworks with minimal modifications, while also being fault-tolerant and highly scalable.
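As a rough illustration of unifying the data and compute planes on a cache keyed by FL metadata, the sketch below shows a locality-aware, LRU-style cache indexed by (workload, round); the key structure and eviction policy are simplifying assumptions, not FLStore's actual design.

# Illustrative metadata cache for FL non-training workloads (simplified sketch).
from collections import OrderedDict

class MetadataCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.store = OrderedDict()            # (workload, round) -> metadata blob

    def get(self, workload: str, rnd: int):
        key = (workload, rnd)
        if key in self.store:
            self.store.move_to_end(key)       # mark as recently used
            return self.store[key]
        return None                           # miss: fall back to the object store

    def put(self, workload: str, rnd: int, blob) -> None:
        key = (workload, rnd)
        self.store[key] = blob
        self.store.move_to_end(key)
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)    # evict the least recently used entry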
LADs: Leveraging LLMs for AI-Driven DevOps
Khan, Ahmad Faraz, Khan, Azal Ahmad, Mohamed, Anas, Ali, Haider, Moolinti, Suchithra, Haroon, Sabaat, Tahir, Usman, Fazzini, Mattia, Butt, Ali R., Anwar, Ali
Automating cloud configuration and deployment remains a critical challenge due to evolving infrastructures, heterogeneous hardware, and fluctuating workloads. Existing solutions lack adaptability and require extensive manual tuning, leading to inefficiencies and misconfigurations. We introduce LADs, the first LLM-driven framework designed to tackle these challenges by ensuring robustness, adaptability, and efficiency in automated cloud management. Rather than merely applying existing techniques, LADs takes a principled approach to configuration optimization through an in-depth analysis of which optimizations work under which conditions. By leveraging Retrieval-Augmented Generation, Few-Shot Learning, Chain-of-Thought, and Feedback-Based Prompt Chaining, LADs generates accurate configurations and learns from deployment failures to iteratively refine system settings. Our findings reveal key insights into the trade-offs between performance, cost, and scalability, helping practitioners choose the right strategies for different deployment scenarios. For instance, we demonstrate how prompt chaining-based adaptive feedback loops enhance fault tolerance in multi-tenant environments and how structured log analysis with example shots improves configuration accuracy. Through extensive evaluations, we show that LADs reduces manual effort, optimizes resource utilization, and improves system reliability. By open-sourcing LADs, we aim to drive further innovation in AI-powered DevOps automation.
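A minimal sketch of the feedback-based prompt chaining loop is given below; generate_config, deploy, and summarize_logs are hypothetical helpers standing in for the LLM call, the deployment step, and structured log analysis, and the retry logic is an assumption rather than LADs' implementation.

# Sketch of a feedback-based prompt chain for configuration generation and repair.
def generate_config(spec: str, feedback: str = "") -> str:
    raise NotImplementedError  # prompt an LLM with the spec plus prior feedback

def deploy(config: str):
    raise NotImplementedError  # apply the config; return (success, logs)

def summarize_logs(logs: str) -> str:
    raise NotImplementedError  # structured log analysis feeding the next prompt

def configure_with_feedback(spec: str, max_attempts: int = 5):
    feedback = ""
    for _ in range(max_attempts):
        config = generate_config(spec, feedback)
        ok, logs = deploy(config)
        if ok:
            return config
        feedback = summarize_logs(logs)   # failure feedback drives the next attempt
    return None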
Probe Pruning: Accelerating LLMs through Dynamic Pruning via Model-Probing
Le, Qi, Diao, Enmao, Wang, Ziyan, Wang, Xinran, Ding, Jie, Yang, Li, Anwar, Ali
We introduce Probe Pruning (PP), a novel framework for online, dynamic, structured pruning of Large Language Models (LLMs) applied in a batch-wise manner. PP leverages the insight that not all samples and tokens contribute equally to the model's output, and probing a small portion of each batch effectively identifies crucial weights, enabling tailored dynamic pruning for different batches. It comprises three main stages: probing, history-informed pruning, and full inference. In the probing stage, PP selects a small yet crucial set of hidden states, based on residual importance, to run a few model layers ahead. During the history-informed pruning stage, PP strategically integrates the probing states with historical states. Subsequently, it structurally prunes weights based on the integrated states and the PP importance score, a metric developed specifically to assess the importance of each weight channel in maintaining performance. In the final stage, full inference is conducted on the remaining weights. A major advantage of PP is its compatibility with existing models, as it operates without requiring additional neural network modules or fine-tuning. Comprehensive evaluations of PP on LLaMA-2/3 and OPT models reveal that even minimal probing, using just 1.5% of FLOPs, can substantially enhance the efficiency of structured pruning of LLMs. For instance, when evaluated on LLaMA-2-7B with WikiText2, PP achieves a 2.56 times lower ratio of performance degradation per unit of runtime reduction compared to the state-of-the-art method at a 40% pruning ratio. Our code is available at https://github.com/Qi-Le1/Probe_Pruning.
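The probing and history-informed pruning stages can be illustrated with a small sketch. The fusion rule and tensor shapes below are assumptions for exposition and do not reproduce the paper's PP importance score.

# Illustrative channel importance from probe states blended with history,
# followed by structured pruning of the lowest-importance channels.
import torch

def channel_importance(probe_states: torch.Tensor,   # (tokens, channels)
                       history: torch.Tensor,        # (channels,)
                       alpha: float = 0.5) -> torch.Tensor:
    current = probe_states.abs().mean(dim=0)          # per-channel magnitude from the probe
    return alpha * current + (1.0 - alpha) * history  # blend with historical statistics

def prune_mask(importance: torch.Tensor, prune_ratio: float) -> torch.Tensor:
    k = int(importance.numel() * prune_ratio)
    idx = torch.argsort(importance)[:k]               # lowest-importance channels
    mask = torch.ones_like(importance, dtype=torch.bool)
    mask[idx] = False                                 # False marks a pruned channel
    return mask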
Evaluating Robustness of Reinforcement Learning Algorithms for Autonomous Shipping
Lesy, Bavo, Anwar, Ali, Mercelis, Siegfried
Recently, there has been growing interest in autonomous shipping due to its potential to improve maritime efficiency and safety. The use of advanced technologies, such as artificial intelligence, can address the current navigational and operational challenges in autonomous shipping. In particular, inland waterway transport (IWT) presents a unique set of challenges, such as crowded waterways and variable environmental conditions. In such dynamic settings, the reliability and robustness of autonomous shipping solutions are critical factors for ensuring safe operations. This paper examines the robustness of benchmark deep reinforcement learning (RL) algorithms, implemented for IWT within an autonomous shipping simulator, and their ability to generate effective motion planning policies. We demonstrate that a model-free approach can achieve an adequate policy in the simulator, successfully navigating port environments never encountered during training. We focus particularly on Soft Actor-Critic (SAC), which we show to be inherently more robust to environmental disturbances than MuZero, a state-of-the-art model-based RL algorithm. In this paper, we take a significant step towards developing robust, applied RL frameworks that can be generalized to various vessel types and navigate complex port and inland environments and scenarios.
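A simple way to picture the robustness evaluation is to roll out a trained policy while injecting observation disturbances, as sketched below; the Gymnasium-style env/policy interface and Gaussian noise model are assumptions, not the simulator used in the paper.

# Illustrative robustness check: evaluate a trained policy under observation noise.
import numpy as np

def evaluate_with_disturbance(env, policy, episodes: int = 10, noise_std: float = 0.05):
    returns = []
    for _ in range(episodes):
        obs, _ = env.reset()
        done, total = False, 0.0
        while not done:
            noisy_obs = obs + np.random.normal(0.0, noise_std, size=np.shape(obs))
            action = policy(noisy_obs)                 # act on the perturbed observation
            obs, reward, terminated, truncated, _ = env.step(action)
            total += reward
            done = terminated or truncated
        returns.append(total)
    return float(np.mean(returns)), float(np.std(returns))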
MAP: Multi-Human-Value Alignment Palette
Wang, Xinran, Le, Qi, Ahmed, Ammar, Diao, Enmao, Zhou, Yi, Baracaldo, Nathalie, Ding, Jie, Anwar, Ali
Ensuring that generative AI systems align with human values is essential but challenging, especially when considering multiple human values and their potential trade-offs. Since human values can be personalized and dynamically change over time, the desirable levels of value alignment vary across different ethnic groups, industry sectors, and user cohorts. Within existing frameworks, it is hard to define human values and align AI systems accordingly across different directions simultaneously, such as harmlessness, helpfulness, and positiveness. To address this, we develop a novel, first-principle approach called Multi-Human-Value Alignment Palette (MAP), which navigates the alignment across multiple human values in a structured and reliable way. MAP formulates the alignment problem as an optimization task with user-defined constraints, which define human value targets. It can be efficiently solved via a primal-dual approach, which determines whether a user-defined alignment target is achievable and how to achieve it. We conduct a detailed theoretical analysis of MAP by quantifying the trade-offs between values, the sensitivity to constraints, the fundamental connection between multi-value alignment and sequential alignment, and proving that linear weighted rewards are sufficient for multi-value alignment. Extensive experiments demonstrate MAP's ability to align multiple values in a principled manner while delivering strong empirical performance across various tasks.
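The primal-dual idea can be sketched in a few lines: dual ascent adjusts one multiplier per value until a reweighted candidate distribution meets the user-defined targets. The exponential tilting and update rule below are illustrative assumptions, not MAP's exact algorithm.

# Illustrative primal-dual search for per-value weights lambda such that
# reweighting candidates by exp(lambda . r) meets user-defined reward targets.
import numpy as np

def fit_value_weights(rewards: np.ndarray,   # (n_samples, n_values)
                      targets: np.ndarray,   # (n_values,) desired expected rewards
                      steps: int = 500, lr: float = 0.1) -> np.ndarray:
    lam = np.zeros(rewards.shape[1])
    for _ in range(steps):
        logits = rewards @ lam
        w = np.exp(logits - logits.max())
        w /= w.sum()                          # tilted distribution over candidates
        expected = w @ rewards                # expected reward vector under the tilt
        lam += lr * (targets - expected)      # dual ascent on constraint violations
        lam = np.maximum(lam, 0.0)            # keep multipliers non-negative
    return lam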
Data selection method for assessment of autonomous vehicles
Trinh, Linh, Anwar, Ali, Mercelis, Siegfried
As the popularity of autonomous vehicles has grown, many standards and regulators, such as ISO, NHTSA, and Euro NCAP, require safety validation to ensure a sufficient level of safety before deploying them in the real world. Manufacturers gather large amounts of public road data for this purpose. However, the majority of these validation activities are done manually by humans. Furthermore, the data used to validate each driving feature may differ. As a result, it is essential to have an efficient data selection method that can be used flexibly and dynamically for verification and validation while also accelerating the validation process. In this paper, we present a data selection method that is practical, flexible, and efficient for the assessment of autonomous vehicles. Our idea is to optimize the similarity between the metadata distribution of the selected data and a predefined metadata distribution that is expected for validation. Our experiments on the large-scale BDD100K dataset show that our method can perform data selection tasks efficiently. These results demonstrate that our method is highly reliable and can be used to select appropriate data for the validation of various safety functions.
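A toy version of the selection idea is to greedily pick samples so that the metadata histogram of the selected subset tracks the target distribution, as in the sketch below; the greedy loop and single-label metadata are simplifying assumptions, not the paper's method.

# Illustrative greedy selection that steers the selected subset's metadata
# distribution toward a target distribution.
from collections import Counter, defaultdict

def select_by_metadata(samples, target, budget):
    """samples: list of (sample_id, label); target: {label: desired fraction}."""
    pool = defaultdict(list)
    for sid, label in samples:
        pool[label].append(sid)
    selected, counts = [], Counter()
    while len(selected) < budget:
        # Pick the label currently most under-represented relative to the target.
        deficits = {
            lbl: target.get(lbl, 0.0) - counts[lbl] / max(len(selected), 1)
            for lbl in pool if pool[lbl]
        }
        if not deficits:
            break
        label = max(deficits, key=deficits.get)
        selected.append(pool[label].pop())
        counts[label] += 1
    return selected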
ColA: Collaborative Adaptation with Gradient Learning
Diao, Enmao, Le, Qi, Wu, Suya, Wang, Xinran, Anwar, Ali, Ding, Jie, Tarokh, Vahid
A primary function of back-propagation is to compute the gradients of both hidden representations and parameters for optimization with gradient descent. Training large models incurs high computational costs due to their vast parameter sizes. While Parameter-Efficient Fine-Tuning (PEFT) methods aim to train smaller auxiliary models to reduce memory and computational costs, they still incur computational overhead, especially in Fine-Tuning as a Service (FTaaS) for numerous users. We introduce Collaborative Adaptation (ColA) with Gradient Learning (GL), a parameter-free, model-agnostic fine-tuning approach that decouples the computation of the gradients of hidden representations and parameters. Compared to PEFT methods, ColA facilitates more cost-effective FTaaS by offloading the computation of the gradient to low-cost devices. We also provide a theoretical analysis of ColA and experimentally demonstrate that it performs on par with or better than existing PEFT methods on various benchmarks.
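One way to picture the decoupling is that the server supplies a hidden representation and its gradient to a low-cost device, which then updates a small adapter via the chain rule without running the large model. The adapter shape and surrogate-loss trick below are illustrative assumptions, not ColA's implementation.

# Illustrative device-side adapter update from server-provided hidden states
# and their gradients.
import torch

class Adapter(torch.nn.Module):
    """A small low-rank adapter whose output is added to the hidden state."""
    def __init__(self, dim: int, rank: int = 8):
        super().__init__()
        self.down = torch.nn.Linear(dim, rank, bias=False)
        self.up = torch.nn.Linear(rank, dim, bias=False)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return self.up(self.down(h))

def device_side_update(adapter: Adapter, hidden: torch.Tensor,
                       hidden_grad: torch.Tensor, lr: float = 1e-3) -> None:
    """hidden and hidden_grad come from the server's forward/backward pass."""
    opt = torch.optim.SGD(adapter.parameters(), lr=lr)
    out = adapter(hidden)
    # Vector-Jacobian trick: backprop of (out * dL/d_out).sum() yields dL/d_theta.
    surrogate = (out * hidden_grad.detach()).sum()
    opt.zero_grad()
    surrogate.backward()
    opt.step()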
Privacy-Aware Semantic Cache for Large Language Models
Gill, Waris, Elidrisi, Mohamed, Kalapatapu, Pallavi, Anwar, Ali, Gulzar, Muhammad Ali
Large Language Models (LLMs) like ChatGPT and Llama2 have revolutionized natural language processing and search engine dynamics. However, these models incur exceptionally high computational costs. For instance, GPT-3 consists of 175 billion parameters, and its inference demands billions of floating-point operations. Caching is a natural solution to reduce LLM inference costs on repeated queries, which constitute about 31% of the total queries. However, existing caching methods are incapable of finding semantic similarities among LLM queries, leading to unacceptable false hit-and-miss rates. This paper introduces MeanCache, a user-centric semantic cache for LLMs that identifies semantically similar queries to determine cache hit or miss. Using MeanCache, the response to a user's semantically similar query can be retrieved from a local cache rather than re-querying the LLM, thus reducing costs, service provider load, and environmental impact. Existing caching solutions for LLMs raise privacy and scalability concerns and perform wasteful query requests. MeanCache leverages Federated Learning (FL) to collaboratively train a query similarity model across LLM users without violating privacy. By placing a local cache on each user's device and using FL, MeanCache reduces latency and costs and enhances model performance, resulting in lower false hit rates. MeanCache compresses the embedding dimensions to minimize cache storage and also finds the optimal cosine similarity threshold. Our experiments, benchmarked against the state-of-the-art caching method, reveal that MeanCache attains an approximately 17% higher F-score and a 20% increase in precision during semantic cache hit-and-miss decisions. It also reduces the storage requirement by 83% and accelerates semantic cache hit-and-miss decisions by 11%.
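The core cache lookup can be sketched as an embedding store with a cosine-similarity threshold, as below; embedding generation is left abstract, and the linear scan and threshold handling are assumptions, not MeanCache's implementation.

# Illustrative semantic cache lookup with a cosine-similarity threshold.
import numpy as np

class SemanticCache:
    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold
        self.entries = []                      # list of (embedding, response)

    def lookup(self, query_emb: np.ndarray):
        q = query_emb / np.linalg.norm(query_emb)
        best_sim, best_resp = -1.0, None
        for emb, resp in self.entries:
            sim = float(q @ (emb / np.linalg.norm(emb)))
            if sim > best_sim:
                best_sim, best_resp = sim, resp
        # Return a cached response on a hit, None on a miss.
        return best_resp if best_sim >= self.threshold else None

    def insert(self, query_emb: np.ndarray, response: str) -> None:
        self.entries.append((query_emb, response))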
Everything You Always Wanted to Know About Storage Compressibility of Pre-Trained ML Models but Were Afraid to Ask
Su, Zhaoyuan, Ahmed, Ammar, Wang, Zirui, Anwar, Ali, Cheng, Yue
As the number of pre-trained machine learning (ML) models grows exponentially, data reduction tools are not catching up. Existing data reduction techniques are not specifically designed for pre-trained model (PTM) dataset files, largely due to a lack of understanding of the patterns and characteristics of these datasets, especially those relevant to data reduction and compressibility. This paper presents the first exhaustive analysis to date of the storage compressibility of PTM datasets. Our analysis spans different types of data reduction and compression techniques, from hash-based data deduplication and data similarity detection to dictionary-coding compression, and explores these techniques at three data granularity levels: model layers, model chunks, and model parameters. We draw new observations indicating that modern data reduction tools are not effective when handling PTM datasets, and that there is a pressing need for new compression methods that account for PTMs' data characteristics to achieve effective storage reduction. Motivated by our findings, we design ELF, a simple yet effective, error-bounded, lossy floating-point compression method. ELF transforms floating-point parameters such that the common exponent field of the transformed parameters can be completely eliminated to save storage space. We develop Elves, a compression framework that integrates ELF along with several other data reduction methods and uses the most effective method to compress PTMs that exhibit different patterns. Evaluation shows that Elves achieves an overall compression ratio of $1.52\times$, which is $1.31\times$, $1.32\times$, and $1.29\times$ higher than a general-purpose compressor (zstd), an error-bounded lossy compressor (SZ3), and uniform model quantization, respectively, with negligible model accuracy loss.
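The exponent-elimination idea can be illustrated with a toy transform: parameters assumed to lie in (-1, 1) are shifted into [1, 2), where every float32 shares the same exponent bits, so only the sign and a truncated mantissa need to be stored. This is a sketch of the general idea under simplifying assumptions, not the ELF implementation.

# Toy illustration of eliminating the shared exponent field of float32 parameters.
import numpy as np

def transform(params: np.ndarray, mantissa_bits: int = 12):
    signs = np.signbit(params)
    shifted = (np.abs(params) + 1.0).astype(np.float32)    # values now in [1, 2)
    bits = shifted.view(np.uint32)
    mantissa = (bits & 0x007FFFFF) >> (23 - mantissa_bits)  # keep top mantissa bits
    return signs, mantissa                                   # exponent (127) is implicit

def restore(signs, mantissa, mantissa_bits: int = 12):
    bits = (np.uint32(127) << 23) | (mantissa.astype(np.uint32) << (23 - mantissa_bits))
    values = bits.view(np.float32) - 1.0                     # undo the +1 shift
    return np.where(signs, -values, values)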