AITopics | Agents

Hierarchically Structured Scheduling and Execution of Tasks in a Multi-Agent Environment

arXiv.org Machine Learning, Mar-6-2022

In a warehouse environment, tasks appear dynamically. Consequently, a task management system that matches them with the workforce too early (e.g., weeks in advance) is necessarily sub-optimal. Also, the rapidly increasing size of the action space of such a system poses a significant problem for traditional schedulers. Reinforcement learning, however, is suited to problems that require making sequential decisions towards a long-term, often remote, goal. In this work, we address a problem with a hierarchical structure: task scheduling by a centralised agent in a dynamic multi-agent warehouse environment, and the execution of one such schedule by decentralised agents with only partial observability thereof. We propose to use deep reinforcement learning to solve both the high-level scheduling problem and the low-level multi-agent problem of schedule execution. Finally, we also consider the case where centralisation is impossible at test time and workers must learn how to cooperate in executing the tasks in an environment with no schedule and only partial observability.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

arXiv.org Machine Learning

2203.03021

Country:

North America > United States > Utah (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Portugal > Lisbon > Lisbon (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report (0.65)
Instructional Material (0.48)

Industry:

Education (0.68)
Transportation (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.30)

How to Train your Decision-Making AIs

#artificialintelligence, Mar-5-2022, 10:25:27 GMT

The combination of deep learning and decision learning has led to several success stories in decision-making AI research, including AIs that can play a variety of games (Atari video games, board games, and the complex real-time strategy game StarCraft II), control robots (in simulation and in the real world), and even fly a weather balloon. These are examples of sequential decision tasks, in which the AI agent needs to make a sequence of decisions to achieve its goal. Today, the two main approaches for training such agents are reinforcement learning (RL) and imitation learning (IL). In reinforcement learning, humans provide rewards for completing discrete tasks, with the rewards typically being delayed and sparse. For example, 100 points are given for solving the first room of Montezuma's Revenge (Fig. 1). In the imitation learning setting, humans can transfer knowledge and skills through step-by-step action demonstrations (Fig. 2), and the agent then learns to mimic human actions.

agent, imitation, reinforcement, (15 more...)

#artificialintelligence

Country:

North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Fully Decentralized, Scalable Gaussian Processes for Multi-Agent Federated Learning

Kontoudis, George P., Stilwell, Daniel J.

arXiv.org Machine Learning, Mar-5-2022

In this paper, we propose decentralized and scalable algorithms for Gaussian process (GP) training and prediction in multi-agent systems. To decentralize the implementation of GP training optimization algorithms, we employ the alternating direction method of multipliers (ADMM). A closed-form solution of the decentralized proximal ADMM is provided for the case of GP hyper-parameter training with maximum likelihood estimation. Multiple aggregation techniques for GP prediction are decentralized with the use of iterative and consensus methods. In addition, we propose a covariance-based nearest neighbor selection strategy that enables a subset of agents to perform predictions. The efficacy of the proposed methods is illustrated with numerical experiments on synthetic and real data.

agent, gaussian process, scalable gaussian process, (16 more...)

arXiv.org Machine Learning

2203.02865

Country:

North America > United States > Virginia (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
(2 more...)

Genre: Research Report (0.49)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.54)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.34)

DEC-LOS-RRT: Decentralized Path Planning for Multi-robot Systems with Line-of-sight Constrained Communication

Tuck, Victoria, Pant, Yash Vardhan, Seshia, Sanjit A., Sastry, S. Shankar

arXiv.org Artificial Intelligence, Mar-4-2022

Decentralized planning for multi-agent systems, such as fleets of robots in a search-and-rescue operation, is often constrained by limitations on how agents can communicate with each other. One such limitation is the case when agents can communicate with each other only when they are in line-of-sight (LOS). Developing decentralized planning methods that guarantee safety is difficult in this case, as agents that are occluded from each other might not be able to communicate until it's too late to avoid a safety violation. In this paper, we develop a decentralized planning method that explicitly avoids situations where lack of visibility of other agents would lead to an unsafe situation. Building on top of an existing Rapidly-exploring Random Tree (RRT)-based approach, our method guarantees safety at each iteration. Simulation studies show the effectiveness of our method and compare the degradation in performance with respect to a clairvoyant decentralized planning algorithm where agents can communicate despite not being in LOS of each other.

agent, artificial intelligence, subgraph, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/CCTA48906.2021.9659247

2203.02609

Country:

Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
North America > United States > Alaska > Anchorage Municipality > Anchorage (0.04)

Genre: Research Report (0.50)

Industry: Transportation (0.47)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.69)

AutoDIME: Automatic Design of Interesting Multi-Agent Environments

Kanitscheider, Ingmar, Edwards, Harri

arXiv.org Machine Learning, Mar-4-2022

Designing a distribution of environments in which RL agents can learn interesting and useful skills is a challenging and poorly understood task; for multi-agent environments the difficulties are only exacerbated. One approach is to train a second RL agent, called a teacher, that samples environments conducive to the learning of student agents. However, most previous proposals for teacher rewards do not generalize straightforwardly to the multi-agent setting. We examine a set of intrinsic teacher rewards derived from prediction problems that can be applied in multi-agent settings and evaluate them in MuJoCo tasks such as multi-agent Hide and Seek [1] as well as a diagnostic single-agent maze task. Of the intrinsic rewards considered, we found value disagreement to be most consistent across tasks, leading to faster and more reliable emergence of advanced skills in Hide and Seek and the maze task. Another candidate intrinsic reward considered, value prediction error, also worked well in Hide and Seek but was susceptible to noisy-TV style distractions in stochastic environments. Policy disagreement performed well in the maze task but did not speed up learning in Hide and Seek. Our results suggest that intrinsic teacher rewards, and in particular value disagreement, are a promising approach for automating both single and multi-agent environment design.
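The abstract does not spell out how value disagreement is computed. As a rough illustration only, the sketch below assumes it is the spread (standard deviation) of value estimates produced by an ensemble of student value functions for the same environment, and it uses a greedy pick as a stand-in for the teacher's learned sampling policy. The environment names, the numbers, and the function `value_disagreement` are hypothetical.

```python
from statistics import pstdev


def value_disagreement(value_estimates):
    """Population standard deviation of an ensemble's value estimates.

    Assumption: disagreement is measured as the spread of the estimates;
    the abstract does not define the exact formula.
    """
    return pstdev(value_estimates)


# Hypothetical ensemble predictions, one list per candidate environment.
candidates = {
    "env_a": [0.50, 0.52, 0.49],  # low spread across ensemble members
    "env_b": [0.10, 0.80, 0.45],  # high spread across ensemble members
}

scores = {name: value_disagreement(v) for name, v in candidates.items()}

# Greedy stand-in for the teacher: prefer the environment with the
# largest disagreement score.
chosen = max(scores, key=scores.get)
print(chosen, scores)
```

In the setting the abstract describes, the disagreement score would serve as a reward signal for a teacher trained with RL rather than as a direct selection rule.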

agent, disagreement, teacher reward, (14 more...)

arXiv.org Machine Learning

2203.02481

Country: Europe > Middle East > Malta (0.04)

Genre: Research Report > New Finding (0.68)

Industry:

Leisure & Entertainment > Games (1.00)
Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Predicting Like A Pilot: Dataset and Method to Predict Socially-Aware Aircraft Trajectories in Non-Towered Terminal Airspace

Patrikar, Jay, Moon, Brady, Oh, Jean, Scherer, Sebastian

arXiv.org Artificial Intelligence, Mar-2-2022

Pilots operating aircraft in non-towered airspace rely on their situational awareness and prior knowledge to predict the future trajectories of other agents. These predictions are conditioned on the past trajectories of other agents, agent-agent social interactions, and environmental context such as airport location and weather. This paper provides a dataset, TrajAir, that captures this behaviour in a non-towered terminal airspace around a regional airport. We also present a baseline socially-aware trajectory prediction algorithm, TrajAirNet, that uses the dataset to predict the trajectories of all agents. The dataset was collected on 111 days over 8 months and contains ADS-B transponder data along with the corresponding METAR weather data. The data is processed so that it can be used as a benchmark alongside other publicly available social navigation datasets. To the best of the authors' knowledge, this is the first 3D social aerial navigation dataset, thus introducing social navigation for autonomous aviation. TrajAirNet combines state-of-the-art modules in social navigation to provide predictions in a static environment with a dynamic context. Both the TrajAir dataset and the TrajAirNet prediction algorithm are open-source. The dataset, codebase, and video are available at https://theairlab.org/trajair/, https://github.com/castacks/trajairnet, and https://youtu.be/elAQXrxB2gw respectively.

machine learning, prediction, trajectory, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/ICRA46639.2022.9811972

2109.15158

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.28)
North America > United States > Iowa > Story County > Ames (0.04)
Africa > Central African Republic > Ombella-M'Poko > Bimbo (0.04)

Genre: Research Report (0.64)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Air (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Aerospace & Defense (0.94)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Competitors-Aware Stochastic Lap Strategy Optimisation for Race Hybrid Vehicles

Braghin, Francesco, Paparusso, Luca, Riani, Manuel, Ruggeri, Fabio

arXiv.org Artificial Intelligence, Feb-28-2022

World Endurance Championship (WEC) racing events are characterised by a considerable performance gap among competitors. The fastest vehicle category, consisting of hybrid vehicles, has to respect energy usage constraints set by the technical regulations. In the absence of competitors, i.e. without traffic, the optimal energy usage strategy for lap time minimisation is typically computed by solving a constrained optimisation problem. To the best of our knowledge, the majority of state-of-the-art works neglect competitors. This leads to a mismatch with the real world, where traffic generates considerable time losses. To bridge this gap, we propose a new framework to compute, offline, optimal strategies for powertrain energy management that take competitors into account. Through analysis of the available data from previous events, statistics on the sector times and overtaking probabilities are extracted to encode the competitors' behaviour. Adopting a multi-agent model, the statistics are then used to generate realistic Monte Carlo (MC) simulations of their positions along the track. The simulator is then used to identify the optimal strategy as follows. We develop a longitudinal vehicle model for the ego-vehicle and implement an optimisation problem for lap time minimisation in the absence of traffic, based on Genetic Algorithms. Solving the optimisation problem for a variety of constraints generates a set of candidate optimal strategies. Stochastic Dynamic Programming is finally implemented to choose the best strategy considering competitors, whose motion is generated by the MC simulator. Our approach, validated on data from a real race stint, significantly reduces the lap time.

artificial intelligence, evolutionary algorithm, machine learning, (20 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/TVT.2022.3215171

2203.00084

Country:

Asia > Middle East > Bahrain (0.04)
North America > Aruba > Oranjestad (0.04)
Europe > Italy > Lombardy > Milan (0.04)
(2 more...)

Genre: Research Report (0.64)

Industry:

Transportation > Ground > Road (1.00)
Energy (1.00)
Automobiles & Trucks (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.49)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.35)

Provably Efficient Convergence of Primal-Dual Actor-Critic with Nonlinear Function Approximation

Dong, Jing, Shen, Li, Xu, Yinggan, Wang, Baoxiang

arXiv.org Machine Learning, Feb-28-2022

We study the convergence of the actor-critic algorithm with nonlinear function approximation under a nonconvex-nonconcave primal-dual formulation. Stochastic gradient descent ascent is applied with an adaptive proximal term for robust learning rates. We show the first efficient convergence result for primal-dual actor-critic, with a convergence rate of $\mathcal{O}\left(\sqrt{\frac{\ln \left(N d G^2 \right)}{N}}\right)$ under Markovian sampling, where $G$ is the element-wise maximum of the gradient, $N$ is the number of iterations, and $d$ is the dimension of the gradient. Our result requires only the Polyak-Łojasiewicz condition for the dual variables, which is easy to verify and applicable to a wide range of reinforcement learning (RL) scenarios. The algorithm and analysis are general enough to be applied to other RL settings, like multi-agent RL. Empirical results on OpenAI Gym continuous control tasks corroborate our theoretical findings.
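To get a feel for how the stated rate scales, the short sketch below evaluates the expression inside the big-O, sqrt(ln(N d G^2) / N), for a few iteration counts. The constant hidden by the big-O is ignored, and the values chosen for d and G are arbitrary illustrative assumptions, not figures from the paper; the function name `rate_bound` is hypothetical.

```python
import math


def rate_bound(n_iters, dim, grad_max):
    """Evaluate sqrt(ln(N * d * G^2) / N), dropping the big-O constant."""
    return math.sqrt(math.log(n_iters * dim * grad_max ** 2) / n_iters)


# Illustrative values only: d = 64 and G = 1.0 are assumptions.
for n in (10**3, 10**4, 10**5):
    print(n, rate_bound(n, dim=64, grad_max=1.0))
```

Because the logarithm grows slowly, the expression behaves close to 1/sqrt(N): multiplying N by 100 shrinks it by a little less than a factor of 10.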

algorithm, assumption 5, reinforcement, (13 more...)

arXiv.org Machine Learning

2202.13863

Country:

Asia > China > Hong Kong (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

Why teaching robots to play hide-and-seek could be the key to next-gen A.I.

#artificialintelligence, Feb-24-2022, 02:35:22 GMT

Artificial general intelligence, the idea of an intelligent A.I. agent that's able to understand and learn any intellectual task that humans can do, has long been a component of science fiction. As A.I. gets smarter and smarter -- especially with breakthroughs in machine learning tools that are able to rewrite their code to learn from new experiences -- it's increasingly a part of real artificial intelligence conversations as well. But how do we measure AGI when it does arrive? Over the years, researchers have laid out a number of possibilities. The most famous remains the Turing Test, in which a human judge interacts, sight unseen, with both humans and a machine, and must try to guess which is which.

agent, intelligence, play hide-and-seek, (12 more...)

#artificialintelligence

Industry: Leisure & Entertainment > Games > Computer Games (0.31)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.41)

Stanford AI Lab Papers and Talks at AAAI 2022

#artificialintelligence, Feb-22-2022, 20:32:27 GMT

The 36th AAAI Conference on Artificial Intelligence (AAAI 2022) is being hosted virtually from February 22nd to March 1st. We're excited to share all the work from SAIL that's being presented, and you'll find links to papers, videos and blogs below. Feel free to reach out to the contact authors directly to learn more about the work that's happening at Stanford. We look forward to seeing you at AAAI 2022.

aaai 2022, keyword, stanford, (7 more...)

#artificialintelligence

Country: North America > United States > California > Santa Clara County > Palo Alto (0.63)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.90)
