OSIL: Learning Offline Safe Imitation Policies with Safety Inferred from Non-preferred Trajectories

Burnwal, Returaj, Bhatt, Nirav Pravinbhai, Ravindran, Balaraman

arXiv.org Machine Learning

This work addresses the problem of offline safe imitation learning (IL), where the goal is to learn safe, reward-maximizing policies from demonstrations that carry no per-timestep safety cost or reward information. In many real-world domains, online learning in the environment can be risky, and specifying accurate safety costs can be difficult. However, it is often feasible to collect trajectories that reflect undesirable or unsafe behavior, implicitly conveying what the agent should avoid. We refer to these as non-preferred trajectories. We propose a novel offline safe IL algorithm, OSIL, that infers safety from non-preferred demonstrations. We formulate safe policy learning as a Constrained Markov Decision Process (CMDP). Instead of relying on explicit safety cost and reward annotations, OSIL reformulates the CMDP problem by deriving a lower bound on the reward-maximizing objective and learning a cost model that estimates the likelihood of non-preferred behavior. Our approach allows agents to learn safe, reward-maximizing behavior entirely from offline demonstrations. We empirically demonstrate that our approach learns safer policies that satisfy cost constraints without degrading reward performance, thus outperforming several baselines.
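To make the CMDP framing concrete, here is a minimal sketch of the standard Lagrangian relaxation of a constrained objective, with a learned per-step cost stood in by precomputed estimates. The function name, arguments, and toy numbers are illustrative assumptions; OSIL's actual lower-bound derivation and learned cost model are more involved than this.

```python
import numpy as np

def lagrangian_objective(rewards, cost_estimates, lam, cost_limit):
    """Penalized CMDP objective: trajectory return minus a
    multiplier-weighted constraint violation.

    cost_estimates is a hypothetical stand-in for a learned cost
    model's per-step likelihood that behavior is non-preferred.
    """
    j_r = float(np.sum(rewards))         # return along the trajectory
    j_c = float(np.sum(cost_estimates))  # accumulated estimated cost
    return j_r - lam * (j_c - cost_limit)

# Toy trajectory: 5 steps of reward 1.0, low estimated cost per step.
obj = lagrangian_objective(np.ones(5), np.full(5, 0.1),
                           lam=2.0, cost_limit=1.0)
```

When accumulated estimated cost stays under the limit (here 0.5 < 1.0), the penalty term rewards the policy; exceeding the limit subtracts from the objective in proportion to the multiplier.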


178b6cd141e003fa5ff808c7d7d8e2cc-Paper-Conference.pdf

Neural Information Processing Systems

A Stackelberg game [30, 31] is a strategic interaction between two utility-maximizing players in which one player (the leader) is able to commit to a (possibly mixed) strategy before the other player (the follower) takes an action.
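The leader-commitment idea can be illustrated with pure strategies in a small matrix game: enumerate each leader commitment, let the follower best-respond to it, and keep the commitment that maximizes the leader's payoff. This toy sketch (function name and payoff matrices are illustrative) ignores mixed strategies, which the paper's setting allows.

```python
import numpy as np

def stackelberg_pure(leader_u, follower_u):
    """Pure-strategy Stackelberg solution by enumeration.

    leader_u[a, b] / follower_u[a, b]: payoffs when the leader
    commits to action a and the follower responds with b.
    Returns (leader payoff, leader action, follower response).
    """
    best = None
    for a in range(leader_u.shape[0]):
        b = int(np.argmax(follower_u[a]))  # follower's best response
        if best is None or leader_u[a, b] > best[0]:
            best = (leader_u[a, b], a, b)
    return best

# Leader commits first; the follower observes the commitment.
res = stackelberg_pure(np.array([[2.0, 4.0], [1.0, 3.0]]),
                       np.array([[1.0, 0.0], [0.0, 1.0]]))
```

Note the order of optimization: the follower's argmax is taken per leader action, so commitment can steer the follower toward outcomes the leader prefers, which is the defining feature of the Stackelberg interaction.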


ACM SIGAI Autonomous Agents Award 2026 open for nominations

AIHub

Nominations are solicited for the 2026 ACM SIGAI Autonomous Agents Research Award. This award is made for excellence in research in the area of autonomous agents. It is intended to recognize researchers in autonomous agents whose current work is an important influence on the field. The award is an official ACM award, funded by an endowment created by ACM SIGAI from the proceeds of previous Autonomous Agents conferences. The recipient of the award will receive a monetary prize and a certificate, and will be invited to present a plenary talk at the AAMAS 2026 conference.


Enhancing the development of Cherenkov Telescope Array control software with Large Language Models

Kostunin, Dmitriy, Jones, Elisa, Sotnikov, Vladimir, Sotnikov, Valery, Golovachev, Sergo, Strube, Alexandre

arXiv.org Artificial Intelligence

We develop AI agents based on instruction-finetuned large language models (LLMs) to assist in the engineering and operation of the Cherenkov Telescope Array Observatory (CTAO) Control and Data Acquisition Software (ACADA). These agents align with project-specific documentation and codebases, understand contextual information, interact with external APIs, and communicate with users in natural language.


FLAD: Federated Learning for LLM-based Autonomous Driving in Vehicle-Edge-Cloud Networks

Xiang, Tianao, Zhi, Mingjian, Bi, Yuanguo, Cai, Lin, Chen, Yuhao

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have impressive data fusion and reasoning capabilities for autonomous driving (AD). However, training LLMs for AD faces significant challenges, including high computation and transmission costs and privacy concerns associated with sensitive driving data. Federated Learning (FL) is promising for enabling autonomous vehicles (AVs) to collaboratively train models without sharing raw data. We present Federated LLM-based Autonomous Driving (FLAD), an FL framework that leverages distributed multimodal sensory data across AVs in heterogeneous environments. FLAD has three key innovations: (1) a cloud-edge-vehicle collaborative architecture that reduces communication delay and preserves data privacy; (2) intelligent parallelized collaborative training with a communication scheduling mechanism that optimizes training efficiency, leveraging end devices that would otherwise have insufficient resources for model training; and (3) a knowledge distillation method that personalizes LLMs according to heterogeneous edge data. In addition, we prototype FLAD in a testbed with NVIDIA Jetsons, overcoming practical implementation challenges including CPU/GPU memory sharing in resource-constrained devices, dynamic model partitioning, and fault-tolerant training. Extensive experimental evaluation demonstrates that FLAD achieves superior end-to-end AD performance while efficiently utilizing distributed vehicular resources, opening up new possibilities for future collaborative AD model training and knowledge sharing.
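As a rough sketch of the server-side step in such a federated loop, here is plain FedAvg-style aggregation: client parameters averaged with weights proportional to local dataset size. This is a generic stand-in under stated assumptions, not FLAD's method; the paper's communication scheduling, dynamic model partitioning, and knowledge distillation are not modeled.

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """FedAvg-style aggregation: average client parameter vectors
    weighted by local dataset size. client_weights is a list of
    equally shaped 1-D parameter arrays, one per vehicle."""
    total = float(sum(client_sizes))
    stacked = np.stack(client_weights)                    # (n_clients, n_params)
    coeffs = np.array(client_sizes, dtype=float) / total  # mixing weights
    return coeffs @ stacked                               # weighted average

# Three vehicles with different local data volumes.
w = federated_average([np.array([1.0, 0.0]),
                       np.array([0.0, 1.0]),
                       np.array([1.0, 1.0])],
                      client_sizes=[2, 1, 1])
```

Weighting by dataset size keeps the aggregate unbiased toward data-rich clients, which matters in vehicular settings where per-vehicle data volumes vary widely.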