Overview
Large Language Models for Multi-Robot Systems: A Survey
Li, Peihan, An, Zijian, Abrar, Shams, Zhou, Lifeng
The rapid advancement of Large Language Models (LLMs) has opened new possibilities in Multi-Robot Systems (MRS), enabling enhanced communication, task planning, and human-robot interaction. Unlike traditional single-robot and multi-agent systems, MRS poses unique challenges, including coordination, scalability, and real-world adaptability. This survey provides the first comprehensive exploration of LLM integration into MRS. It systematically categorizes their applications across high-level task allocation, mid-level motion planning, low-level action generation, and human intervention. We highlight key applications in diverse domains, such as household robotics, construction, formation control, target tracking, and robot games, showcasing the versatility and transformative potential of LLMs in MRS. Furthermore, we examine the challenges that limit the adoption of LLMs in MRS, including mathematical reasoning limitations, hallucination, latency issues, and the need for robust benchmarking systems. Finally, we outline opportunities for future research, emphasizing advancements in fine-tuning, reasoning techniques, and task-specific models. This survey aims to guide researchers in advancing the intelligence and real-world deployment of MRS powered by LLMs. Given the fast-evolving nature of research in the field, we continually update the list of papers in an open-source GitHub repository.
Advancing General Multimodal Capability of Vision-language Models with Pyramid-descent Visual Position Encoding
Chen, Zhanpeng, Li, Mingxiao, Chen, Ziyang, Du, Nan, Li, Xiaolong, Zou, Yuexian
Vision-language Models (VLMs) have shown remarkable capabilities in advancing general artificial intelligence, yet the irrational encoding of visual positions persists in inhibiting the models' comprehensive perception performance across different levels of granularity. In this work, we propose Pyramid-descent Visual Position Encoding (PyPE), a novel approach designed to enhance the perception of visual tokens within VLMs. By assigning visual position indexes from the periphery to the center and expanding the central receptive field incrementally, PyPE addresses the limitations of traditional raster-scan methods and mitigates the long-term decay effects induced by Rotary Position Embedding (RoPE). Our method reduces the relative distance between interrelated visual elements and instruction tokens, promoting a more rational allocation of attention weights and allowing for a multi-granularity perception of visual elements and countering the over-reliance on anchor tokens. Extensive experimental evaluations demonstrate that PyPE consistently improves the general capabilities of VLMs across various sizes. Code is available at https://github.com/SakuraTroyChen/PyPE.
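The idea of indexing visual positions from the periphery toward the center can be illustrated with a small sketch. This is one plausible reading of the abstract, not the authors' implementation: the function name `ring_position_ids` and the choice of one shared index per concentric ring are assumptions made here for illustration only.

```python
def ring_position_ids(height, width):
    """Assign each cell of a height x width patch grid a ring index.

    Assumption (not taken from the paper): every patch on the same
    concentric ring shares one index, the outermost ring gets 0, and
    the index grows by 1 for each step toward the center.
    """
    return [
        [min(row, col, height - 1 - row, width - 1 - col) for col in range(width)]
        for row in range(height)
    ]


# Example: a 5 x 5 grid of visual patches.
grid = ring_position_ids(5, 5)
for row in grid:
    print(row)
```

Under this assumed scheme the number of distinct indices grows with the number of rings rather than with the number of patches, which is unlike a raster scan that gives every patch its own index.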
New Bounds for Sparse Variational Gaussian Processes
Sparse variational Gaussian processes (GPs) construct tractable posterior approximations to GP models. At the core of these methods is the assumption that the true posterior distribution over training function values ${\bf f}$ and inducing variables ${\bf u}$ is approximated by a variational distribution that incorporates the conditional GP prior $p({\bf f} | {\bf u})$ in its factorization. While this assumption is considered fundamental, we show that for model training we can relax it through the use of a more general variational distribution $q({\bf f} | {\bf u})$ that depends on $N$ extra parameters, where $N$ is the number of training examples. In GP regression, we can analytically optimize the evidence lower bound over the extra parameters and express a tractable collapsed bound that is tighter than the previous bound. The new bound is also amenable to stochastic optimization, and its implementation requires only minor modifications to existing sparse GP code. Further, we describe extensions to non-Gaussian likelihoods. On several datasets we demonstrate that our method can reduce bias when learning the hyperparameters and can lead to better predictive performance.
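For context, the collapsed bound most commonly used for sparse GP regression, which is presumably the "previous bound" the abstract compares against, can be written as follows. The symbols ${\bf y}$, $\sigma^2$, $M$, ${\bf K}_{NN}$, ${\bf K}_{NM}$, ${\bf K}_{MM}$ and ${\bf Q}_{NN}$ are notation assumed here and do not appear in the abstract:

$$\mathcal{F} = \log \mathcal{N}({\bf y} \mid {\bf 0}, {\bf Q}_{NN} + \sigma^2 {\bf I}) - \frac{1}{2\sigma^2} \mathrm{tr}({\bf K}_{NN} - {\bf Q}_{NN}), \qquad {\bf Q}_{NN} = {\bf K}_{NM} {\bf K}_{MM}^{-1} {\bf K}_{MN}.$$

Here ${\bf y}$ denotes the training targets, $\sigma^2$ the Gaussian noise variance, ${\bf K}_{NN}$ the kernel matrix on the $N$ training inputs, ${\bf K}_{MM}$ the kernel matrix on the $M$ inducing inputs, and ${\bf K}_{NM} = {\bf K}_{MN}^\top$ the cross-covariance between the two. The trace term is non-negative and vanishes when ${\bf Q}_{NN} = {\bf K}_{NN}$, in which case $\mathcal{F}$ equals the exact log marginal likelihood.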
WENDy for Nonlinear-in-Parameter ODEs
Rummel, Nic, Messenger, Daniel A., Becker, Stephen, Dukic, Vanja, Bortz, David M.
The Weak-form Estimation of Non-linear Dynamics (WENDy) algorithm is extended to accommodate systems of ordinary differential equations that are nonlinear-in-parameters (NiP). The extension rests on derived analytic expressions for a likelihood function, its gradient and its Hessian matrix. WENDy makes use of these to approximate a maximum likelihood estimator based on optimization routines suited for non-convex optimization problems. The resulting parameter estimation algorithm has better accuracy, a substantially larger domain of convergence, and is often orders of magnitude faster than the conventional output error least squares method (based on forward solvers). The WENDy.jl algorithm is efficiently implemented in Julia. We demonstrate the algorithm's ability to accommodate the weak form optimization for both additive normal and multiplicative log-normal noise, and present results on a suite of benchmark systems of ordinary differential equations. In order to demonstrate the practical benefits of our approach, we present extensive comparisons between our method and output error methods in terms of accuracy, precision, bias, and coverage.
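The weak-form idea behind this family of methods can be sketched on a deliberately simple case. The example below is not the WENDy algorithm and does not use WENDy.jl: it assumes a noise-free scalar ODE $\dot{u} = -p\,u$ (which is linear in its parameter), a hypothetical test function $\phi$ that vanishes at both ends of the time window, and the integration-by-parts identity $-\int \dot{\phi}\,u\,dt = \int \phi\,f(u,p)\,dt$. All function and variable names are illustrative assumptions.

```python
import math


def trapezoid(values, step):
    """Composite trapezoidal rule on a uniform grid."""
    return step * (sum(values) - 0.5 * (values[0] + values[-1]))


def weak_form_decay_rate(times, states):
    """Estimate p in du/dt = -p*u without differentiating the data.

    Uses the test function phi(s) = (1 - s^2)^2 with s in [-1, 1], which
    is zero at both ends of the window, so integration by parts gives
    integral(phi' * u) = p * integral(phi * u).
    """
    t_start, t_end = times[0], times[-1]
    step = times[1] - times[0]
    half_width = 0.5 * (t_end - t_start)
    phi_times_u = []
    dphi_times_u = []
    for t, u in zip(times, states):
        s = (t - t_start) / half_width - 1.0            # map window to [-1, 1]
        phi = (1.0 - s * s) ** 2
        dphi_dt = -4.0 * s * (1.0 - s * s) / half_width  # chain rule: ds/dt = 1/half_width
        phi_times_u.append(phi * u)
        dphi_times_u.append(dphi_dt * u)
    return trapezoid(dphi_times_u, step) / trapezoid(phi_times_u, step)


# Synthetic, noise-free data from u(t) = exp(-TRUE_RATE * t) on [0, 2].
TRUE_RATE = 2.0
times = [i * 0.005 for i in range(401)]
states = [math.exp(-TRUE_RATE * t) for t in times]
estimate = weak_form_decay_rate(times, states)
print(estimate)
```

Because the derivative is moved onto the test function, the estimate needs only integrals of the observed trajectory and no forward ODE solve. The nonlinear-in-parameter setting of the abstract replaces the closed-form ratio above with a numerical optimization.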
Knowledge-Guided Wasserstein Distributionally Robust Optimization
Wang, Zitao, Wang, Ziyuan, Liu, Molei, Si, Nian
Transfer learning is a popular strategy to leverage external knowledge and improve statistical efficiency, particularly with a limited target sample. We propose a novel knowledge-guided Wasserstein Distributionally Robust Optimization (KG-WDRO) framework that adaptively incorporates multiple sources of external knowledge to overcome the conservativeness of vanilla WDRO, which often results in overly pessimistic shrinkage toward zero. Our method constructs smaller Wasserstein ambiguity sets by controlling the transportation along directions informed by the source knowledge. This strategy can alleviate perturbations on the predictive projection of the covariates and protect against information loss. Theoretically, we establish the equivalence between our WDRO formulation and the knowledge-guided shrinkage estimation based on collinear similarity, ensuring tractability and geometrizing the feasible set. This also reveals a novel and general interpretation for recent shrinkage-based transfer learning approaches from the perspective of distributional robustness. In addition, our framework can adjust for scaling differences in the regression models between the source and target and accommodates general types of regularization such as lasso and ridge. Extensive simulations demonstrate the superior performance and adaptivity of KG-WDRO in enhancing small-sample transfer learning.
Safety at Scale: A Comprehensive Survey of Large Model Safety
Ma, Xingjun, Gao, Yifeng, Wang, Yixu, Wang, Ruofan, Wang, Xin, Sun, Ye, Ding, Yifan, Xu, Hengyuan, Chen, Yunhao, Zhao, Yunhan, Huang, Hanxun, Li, Yige, Zhang, Jiaming, Zheng, Xiang, Bai, Yang, Wu, Zuxuan, Qiu, Xipeng, Zhang, Jingfeng, Li, Yiming, Sun, Jun, Wang, Cong, Gu, Jindong, Wu, Baoyuan, Chen, Siheng, Zhang, Tianwei, Liu, Yang, Gong, Mingming, Liu, Tongliang, Pan, Shirui, Xie, Cihang, Pang, Tianyu, Dong, Yinpeng, Jia, Ruoxi, Zhang, Yang, Ma, Shiqing, Zhang, Xiangyu, Gong, Neil, Xiao, Chaowei, Erfani, Sarah, Li, Bo, Sugiyama, Masashi, Tao, Dacheng, Bailey, James, Jiang, Yu-Gang
The rapid advancement of large models, driven by their exceptional abilities in learning and generalization through large-scale pre-training, has reshaped the landscape of Artificial Intelligence (AI). These models are now foundational to a wide range of applications, including conversational AI, recommendation systems, autonomous driving, content generation, medical diagnostics, and scientific discovery. However, their widespread deployment also exposes them to significant safety risks, raising concerns about robustness, reliability, and ethical implications. This survey provides a systematic review of current safety research on large models, covering Vision Foundation Models (VFMs), Large Language Models (LLMs), Vision-Language Pre-training (VLP) models, Vision-Language Models (VLMs), Diffusion Models (DMs), and large-model-based Agents. Our contributions are summarized as follows: (1) We present a comprehensive taxonomy of safety threats to these models, including adversarial attacks, data poisoning, backdoor attacks, jailbreak and prompt injection attacks, energy-latency attacks, data and model extraction attacks, and emerging agent-specific threats. (2) We review the defense strategies proposed for each type of attack, where available, and summarize the commonly used datasets and benchmarks for safety research. (3) Building on this, we identify and discuss the open challenges in large model safety, emphasizing the need for comprehensive safety evaluations, scalable and effective defense mechanisms, and sustainable data practices. More importantly, we highlight the necessity of collective efforts from the research community and international collaboration. Our work can serve as a useful reference for researchers and practitioners, fostering the ongoing development of comprehensive defense systems and platforms to safeguard AI models.
Enhancing Human-Robot Collaboration through Existing Guidelines: A Case Study Approach
Matsubara, Yutaka, Morikawa, Akihisa, Mizuguchi, Daichi, Fujiwara, Kiyoshi
As AI systems become more prevalent, concerns about their development, operation, and societal impact intensify. Establishing ethical, social, and safety standards amidst evolving AI capabilities poses significant challenges. Global initiatives are underway to establish guidelines for AI system development and operation. With the increasing use of collaborative human-AI task execution, it's vital to continuously adapt AI systems to meet user and environmental needs. Failure to synchronize AI evolution with changes in users and the environment could result in ethical and safety issues. This paper evaluates the applicability of existing guidelines in human-robot collaborative systems, assesses their effectiveness, and discusses limitations. Through a case study, we examine whether our target system meets requirements outlined in existing guidelines and propose improvements to enhance human-robot interactions. Our contributions provide insights into interpreting and applying guidelines, offer concrete examples of system enhancement, and highlight their applicability and limitations. We believe these contributions will stimulate discussions and influence system assurance and certification in future AI-infused critical systems.
Maturity Framework for Enhancing Machine Learning Quality
Castelli, Angelantonio, Chouliaras, Georgios Christos, Goldenberg, Dmitri
With the rapid integration of Machine Learning (ML) in business applications and processes, it is crucial to ensure the quality, reliability and reproducibility of such systems. We suggest a methodical approach towards ML system quality assessment and introduce a structured maturity framework for governance of ML. We emphasize the importance of quality in ML and the need for rigorous assessment, driven by issues in ML governance and gaps in existing frameworks. Our primary contribution is a comprehensive open-sourced quality assessment method, validated with empirical evidence, accompanied by a systematic maturity framework tailored to ML systems. Drawing from applied experience at Booking.com, we discuss challenges and lessons learned during large-scale adoption within organizations. The study presents empirical findings, highlighting quality improvement trends and showcasing business outcomes. The maturity framework for ML systems aims to become a valuable resource to reshape industry standards and enable a structural approach to improve ML maturity in any organization.
Review for NeurIPS paper: Understanding Deep Architecture with Reasoning Layer
The analysis connects properties of the underlying algorithm to the performance of the deep learning models. In the learning-theoretic analysis, the local Rademacher complexity technique is used to obtain a tighter bound, which reveals a trade-off that depends on the number of layers. The theoretical findings are supported by numerical experiments. This paper addresses a new problem setting and provides a good first step. Although the problem setting is quite simple, this kind of study is expected to open up a new direction of research.
Entity Linking using LLMs for Automated Product Carbon Footprint Estimation
Castle, Steffen, Schneider, Julian Moreno, Hennig, Leonhard, Rehm, Georg
Growing concerns about climate change and sustainability are driving manufacturers to take significant steps toward reducing their carbon footprints. For these manufacturers, a first step towards this goal is to identify the environmental impact of the individual components of their products. We propose a system leveraging large language models (LLMs) to automatically map components from manufacturer Bills of Materials (BOMs) to Life Cycle Assessment (LCA) database entries by using LLMs to expand on available component information. Our approach reduces the need for manual data processing, paving the way for more accessible sustainability practices.