AITopics

2412.15862

Country: North America > United States > Massachusetts (0.28)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.48)
Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

arXiv.org Artificial IntelligenceDec-20-2024

Collaborative Gym: A Framework for Enabling and Evaluating Human-Agent Collaboration

Shao, Yijia, Samuel, Vinay, Jiang, Yucheng, Yang, John, Yang, Diyi

Recent advancements in language models (LMs) have sparked growing interest in developing LM agents. While fully autonomous agents could excel in many scenarios, numerous use cases inherently require them to collaborate with humans due to humans' latent preferences, domain expertise, or need for control. To facilitate the study of human-agent collaboration, we present Collaborative Gym (Co-Gym), a general framework enabling asynchronous, tripartite interaction among agents, humans, and task environments. We instantiate Co-Gym with three representative tasks in both simulated and real-world conditions, and propose an evaluation framework that assesses both the collaboration outcomes and processes. Our findings reveal that collaborative agents consistently outperform their fully autonomous counterparts in task performance within those delivered cases, achieving win rates of 86% in Travel Planning, 74% in Tabular Analysis, and 66% in Related Work when evaluated by real users. However, our study also highlights significant challenges in developing collaborative agents, requiring advancements in core aspects of intelligence -- communication capabilities, situational awareness, and balancing autonomy and human control.

artificial intelligence, deep learning, machine learning, (16 more...)

2412.15701

Country:

North America > United States (1.00)
Asia (0.93)

Genre: Research Report > New Finding (1.00)

Industry:

Government (1.00)
Consumer Products & Services > Travel (0.67)

Technology:

Information Technology > Communications > Collaboration (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Kapoor, Aditya, Swamy, Sushant, Tessera, Kale-ab, Baranwal, Mayank, Sun, Mingfei, Khadilkar, Harshad, Albrecht, Stefano V.

Agent-Temporal Credit Assignment for Optimal Policy Preservation in Sparse Multi-Agent Reinforcement Learning

In multi-agent environments, agents often struggle to learn optimal policies due to sparse or delayed global rewards, particularly in long-horizon tasks where it is challenging to evaluate actions at intermediate time steps. We introduce Temporal-Agent Reward Redistribution (TAR$^2$), a novel approach designed to address the agent-temporal credit assignment problem by redistributing sparse rewards both temporally and across agents. TAR$^2$ decomposes sparse global rewards into time-step-specific rewards and calculates agent-specific contributions to these rewards. We theoretically prove that TAR$^2$ is equivalent to potential-based reward shaping, ensuring that the optimal policy remains unchanged. Empirical results demonstrate that TAR$^2$ stabilizes and accelerates the learning process. Additionally, we show that when TAR$^2$ is integrated with single-agent reinforcement learning algorithms, it performs as well as or better than traditional multi-agent reinforcement learning methods.

credit assignment, machine learning, reinforcement learning, (13 more...)

2412.14779

Country:

Asia > India > Maharashtra > Mumbai (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre:

Overview (0.88)
Research Report > New Finding (0.34)

Industry: Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

TRAIL: Trust-Aware Client Scheduling for Semi-Decentralized Federated Learning

Hu, Gangqiang, Lu, Jianfeng, Han, Jianmin, Cao, Shuqin, Liu, Jing, Fu, Hao

Due to the sensitivity of data, Federated Learning (FL) is employed to enable distributed machine learning while safeguarding data privacy and accommodating the requirements of various devices. However, in the context of semi-decentralized FL, clients' communication and training states are dynamic. This variability arises from local training fluctuations, heterogeneous data distributions, and intermittent client participation. Most existing studies primarily focus on stable client states, neglecting the dynamic challenges inherent in real-world scenarios. To tackle this issue, we propose a TRust-Aware clIent scheduLing mechanism called TRAIL, which assesses client states and contributions, enhancing model training efficiency through selective client participation. We focus on a semi-decentralized FL framework where edge servers and clients train a shared global model using unreliable intra-cluster model aggregation and inter-cluster model consensus. First, we propose an adaptive hidden semi-Markov model to estimate clients' communication states and contributions. Next, we address a client-server association optimization problem to minimize global training loss. Using convergence analysis, we propose a greedy client scheduling algorithm. Finally, our experiments conducted on real-world datasets demonstrate that TRAIL outperforms state-of-the-art baselines, achieving an improvement of 8.7% in test accuracy and a reduction of 15.3% in training loss.

artificial intelligence, machine learning, mechanism, (16 more...)

2412.11448

Country:

Asia > China > Liaoning Province > Dalian (0.04)
Asia > China > Hubei Province > Wuhan (0.04)

Genre: Research Report (0.64)

Industry:

Energy (0.93)
Information Technology > Security & Privacy (0.86)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Online MDP with Transition Prototypes: A Robust Adaptive Approach

Sun, Shuo, Qi, Meng, Shen, Zuo-Jun Max

In this work, we consider an online robust Markov Decision Process (MDP) where we have the information of finitely many prototypes of the underlying transition kernel. We consider an adaptively updated ambiguity set of the prototypes and propose an algorithm that efficiently identifies the true underlying transition kernel while guaranteeing the performance of the corresponding robust policy. To be more specific, we provide a sublinear regret of the subsequent optimal robust policy. We also provide an early stopping mechanism and a worst-case performance bound of the value function. In numerical experiments, we demonstrate that our method outperforms existing approaches, particularly in the early stage with limited data. This work contributes to robust MDPs by considering possible prior information about the underlying transition probability and online learning, offering both theoretical insights and practical algorithms for improved decision-making under uncertainty.

artificial intelligence, machine learning, prototype, (15 more...)

2412.14075

Country:

Asia > China > Hong Kong (0.04)
North America > United States > New York > Tompkins County > Ithaca (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)

Genre: Research Report (0.64)

Industry:

Education > Educational Setting (0.48)
Health & Medicine > Therapeutic Area > Oncology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)

Bettini, Matteo, Kortvelesy, Ryan, Prorok, Amanda

Neural diversity is key to collective artificial learning

Many of the world's most pressing issues, such as climate change and global peace, require complex collective problem-solving skills. Recent studies indicate that diversity in individuals' behaviors is key to developing such skills and increasing collective performance. Yet behavioral diversity in collective artificial learning is understudied, with today's machine learning paradigms commonly favoring homogeneous agent strategies over heterogeneous ones, mainly due to computational considerations. In this work, we employ novel diversity measurement and control paradigms to study the impact of behavioral heterogeneity in several facets of collective artificial learning. Through experiments in team play and other cooperative tasks, we show the emergence of unbiased behavioral roles that improve team outcomes; how neural diversity synergizes with morphological diversity; how diverse agents are more effective at finding cooperative solutions in sparse reward settings; and how behaviorally heterogeneous teams learn and retain latent skills to overcome repeated disruptions. Overall, our results indicate that, by controlling diversity, we can obtain non-trivial benefits over homogeneous training paradigms, demonstrating that diversity is a fundamental component of collective artificial learning, an insight thus far overlooked.

agent, artificial intelligence, machine learning, (16 more...)

2412.16244

Genre: Research Report > New Finding (1.00)

Industry:

Leisure & Entertainment > Sports > Soccer (1.00)
Leisure & Entertainment > Games (1.00)
Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

VLM-RL: A Unified Vision Language Models and Reinforcement Learning Framework for Safe Autonomous Driving

Huang, Zilin, Sheng, Zihao, Qu, Yansong, You, Junwei, Chen, Sikai

In recent years, reinforcement learning (RL)-based methods for learning driving policies have gained increasing attention in the autonomous driving community and have achieved remarkable progress in various driving scenarios. However, traditional RL approaches rely on manually engineered rewards, which require extensive human effort and often lack generalizability. To address these limitations, we propose \textbf{VLM-RL}, a unified framework that integrates pre-trained Vision-Language Models (VLMs) with RL to generate reward signals using image observation and natural language goals. The core of VLM-RL is the contrasting language goal (CLG)-as-reward paradigm, which uses positive and negative language goals to generate semantic rewards. We further introduce a hierarchical reward synthesis approach that combines CLG-based semantic rewards with vehicle state information, improving reward stability and offering a more comprehensive reward signal. Additionally, a batch-processing technique is employed to optimize computational efficiency during training. Extensive experiments in the CARLA simulator demonstrate that VLM-RL outperforms state-of-the-art baselines, achieving a 10.5\% reduction in collision rate, a 104.6\% increase in route completion rate, and robust generalization to unseen driving scenarios. Furthermore, VLM-RL can seamlessly integrate almost any standard RL algorithms, potentially revolutionizing the existing RL paradigm that relies on manual reward engineering and enabling continuous performance improvements. The demo video and code can be accessed at: https://zilin-huang.github.io/VLM-RL-website.

machine learning, natural language, reinforcement learning, (21 more...)

2412.15544

Country: North America > United States > Wisconsin (0.28)

Genre: Research Report (1.00)

Industry:

Transportation > Ground > Road (1.00)
Automobiles & Trucks (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(3 more...)

Novelty-Guided Data Reuse for Efficient and Diversified Multi-Agent Reinforcement Learning

Chen, Yangkun, Yang, Kai, Tao, Jian, Lyu, Jiafei

Recently, deep Multi-Agent Reinforcement Learning (MARL) has demonstrated its potential to tackle complex cooperative tasks, pushing the boundaries of AI in collaborative environments. However, the efficiency of these systems is often compromised by inadequate sample utilization and a lack of diversity in learning strategies. To enhance MARL performance, we introduce a novel sample reuse approach that dynamically adjusts policy updates based on observation novelty. Specifically, we employ a Random Network Distillation (RND) network to gauge the novelty of each agent's current state, assigning additional sample update opportunities based on the uniqueness of the data. We name our method Multi-Agent Novelty-GuidEd sample Reuse (MANGER). This method increases sample efficiency and promotes exploration and diverse agent behaviors. Our evaluations confirm substantial improvements in MARL effectiveness in complex cooperative scenarios such as Google Research Football and super-hard StarCraft II micromanagement tasks.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

2412.15517

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games > Computer Games (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Kumawat, Hemant, Mukhopadhyay, Saibal

AdaCred: Adaptive Causal Decision Transformers with Feature Crediting

Reinforcement learning (RL) can be formulated as a sequence modeling problem, where models predict future actions based on historical state-action-reward sequences. Current approaches typically require long trajectory sequences to model the environment in offline RL settings. However, these models tend to over-rely on memorizing long-term representations, which impairs their ability to effectively attribute importance to trajectories and learned representations based on task-specific relevance. In this work, we introduce AdaCred, a novel approach that represents trajectories as causal graphs built from short-term action-reward-state sequences. Our model adaptively learns control policy by crediting and pruning low-importance representations, retaining only those most relevant for the downstream task. Our experiments demonstrate that AdaCred-based policies require shorter trajectory sequences and consistently outperform conventional methods in both offline reinforcement learning and imitation learning environments.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

2412.15427

Country: North America > United States (0.28)

Genre: Research Report > Promising Solution (0.34)

Industry: Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

arXiv.org Artificial IntelligenceDec-18-2024

Heterogeneous Multi-Agent Reinforcement Learning for Distributed Channel Access in WLANs

Yu, Jiaming, Liang, Le, Guo, Chongtao, Guo, Ziyang, Jin, Shi, Li, Geoffrey Ye

This paper investigates the use of multi-agent reinforcement learning (MARL) to address distributed channel access in wireless local area networks. In particular, we consider the challenging yet more practical case where the agents heterogeneously adopt value-based or policy-based reinforcement learning algorithms to train the model. We propose a heterogeneous MARL training framework, named QPMIX, which adopts a centralized training with distributed execution paradigm to enable heterogeneous agents to collaborate. Moreover, we theoretically prove the convergence of the proposed heterogeneous MARL method when using the linear value function approximation. Our method maximizes the network throughput and ensures fairness among stations, therefore, enhancing the overall network performance. Simulation results demonstrate that the proposed QPMIX algorithm improves throughput, mean delay, delay jitter, and collision rates compared with conventional carrier-sense multiple access with collision avoidance in the saturated traffic scenario. Furthermore, the QPMIX is shown to be robust in unsaturated and delay-sensitive traffic scenarios, and promotes cooperation among heterogeneous agents.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

2412.14218

Country:

Asia > China > Guangdong Province > Shenzhen (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)

Genre: Research Report (1.00)

Industry: Telecommunications > Networks (0.41)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.93)