AITopics

2106.10015

Country:

Europe > Netherlands > North Brabant > Eindhoven (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine (1.00)
Education > Curriculum (1.00)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

He, Keyang, Doshi, Prashant, Banerjee, Bikramjit

Many Agent Reinforcement Learning Under Partial Observability

Recent renewed interest in multi-agent reinforcement learning (MARL) has generated an impressive array of techniques that leverage deep reinforcement learning, primarily actor-critic architectures, and can be applied to a limited range of settings in terms of observability and communication. However, a continuing limitation of much of this work is the curse of dimensionality when it comes to representations based on joint actions, which grow exponentially with the number of agents. In this paper, we squarely focus on this challenge of scalability. We apply the key insight of action anonymity, which leads to permutation invariance of joint actions, to two recently presented deep MARL algorithms, MADDPG and IA2C, and compare these instantiations to another recent technique that leverages action anonymity, viz., mean-field MARL. We show that our instantiations can learn the optimal behavior in a broader class of agent networks than the mean-field method, using a recently introduced pragmatic domain.
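The scalability argument in this abstract can be illustrated with a small sketch. It assumes every agent shares the same discrete action set and that only the number of agents choosing each action matters (action anonymity). The function names are hypothetical, and which agents are counted in a configuration is a modeling choice made here for illustration, not taken from the paper.

```python
from collections import Counter
from math import comb


def configuration(joint_action, num_actions):
    """Map a joint action (one action index per agent) to its count vector.

    Under action anonymity, two joint actions that are permutations of each
    other produce the same configuration.
    """
    counts = Counter(joint_action)
    return tuple(counts.get(a, 0) for a in range(num_actions))


def num_joint_actions(num_agents, num_actions):
    """Number of distinct joint actions: exponential in the number of agents."""
    return num_actions ** num_agents


def num_configurations(num_agents, num_actions):
    """Number of distinct count vectors (stars and bars): polynomial in agents."""
    return comb(num_agents + num_actions - 1, num_actions - 1)


if __name__ == "__main__":
    # Permuting which agent takes which action leaves the configuration unchanged.
    print(configuration((0, 1, 1), 2), configuration((1, 0, 1), 2))
    # Compare the two representation sizes as the population grows.
    for n in (5, 10, 20):
        print(n, num_joint_actions(n, 3), num_configurations(n, 3))
```

With a fixed number of actions, the count-vector representation grows polynomially with the number of agents while the joint-action representation grows exponentially, which is the gap the abstract refers to.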

agent, configuration, joint action, (16 more...)

2106.09825

Country: North America > United States > Mississippi (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Liyanage, Teshan, Fernando, Subha

Optimizing robotic swarm based construction tasks

Social insects such as ants, termites, and bees construct their colonies collaboratively and very efficiently. In these swarms, each insect contributes to the construction task individually, and the individuals exhibit redundant and parallel behavior. However, robotic adaptations of these swarm behaviors have not yet reached the real world at a scale large enough to be commonly used, owing to limitations in existing approaches to swarm robotic construction. This paper presents an approach that combines existing swarm construction approaches, resulting in a swarm robotic system capable of constructing a given two-dimensional shape in an optimized manner.

construction, construction task, robot, (12 more...)

2106.09749

Country:

Asia > Sri Lanka > Western Province > Colombo > Colombo (0.05)
North America > United States > Nevada > Clark County > Las Vegas (0.04)
Africa (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Hi-Phy: A Benchmark for Hierarchical Physical Reasoning

Xue, Cheng, Pinto, Vimukthini, Gamage, Chathura, Zhang, Peng, Renz, Jochen

Reasoning about the behaviour of physical objects is a key capability of agents operating in physical worlds. Humans are very experienced in physical reasoning while it remains a major challenge for AI. To facilitate research addressing this problem, several benchmarks have been proposed recently. However, these benchmarks do not enable us to measure an agent's granular physical reasoning capabilities when solving a complex reasoning task. In this paper, we propose a new benchmark for physical reasoning that allows us to test individual physical reasoning capabilities. Inspired by how humans acquire these capabilities, we propose a general hierarchy of physical reasoning capabilities with increasing complexity. Our benchmark tests capabilities according to this hierarchy through generated physical reasoning tasks in the video game Angry Birds. This benchmark enables us to conduct a comprehensive agent evaluation by measuring the agent's granular physical reasoning capabilities. We conduct an evaluation with human players, learning agents, and heuristic agents and determine their capabilities. Our evaluation shows that learning agents, with good local generalization ability, still struggle to learn the underlying physical reasoning capabilities and perform worse than current state-of-the-art heuristic agents and humans. We believe that this benchmark will encourage researchers to develop intelligent agents with advanced, human-like physical reasoning capabilities. URL: https://github.com/Cheng-Xue/Hi-Phy

agent, physical reasoning capability, reasoning capability, (15 more...)

2106.09692

Country:

Oceania > Australia > Australian Capital Territory > Canberra (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)

Multi-Agent Training beyond Zero-Sum with Correlated Equilibrium Meta-Solvers

Marris, Luke, Muller, Paul, Lanctot, Marc, Tuyls, Karl, Graepel, Thore

Two-player, constant-sum games are well studied in the literature, but there has been limited progress outside of this setting. We propose Joint Policy-Space Response Oracles (JPSRO), an algorithm for training agents in n-player, general-sum extensive form games, which provably converges to an equilibrium. We further suggest correlated equilibria (CE) as promising meta-solvers, and ...

Outside of normal form (NF) games, this problem setting arises in multi-agent training when dealing with empirical games (also called meta-games), where a game payoff tensor is populated with expected outcomes between agents playing an extensive form (EF) game, for example the StarCraft League (Vinyals et al., 2019) and Policy-Space Response Oracles (PSRO) (Lanctot et al., 2017), a recent variant of which reached state-of-the-art results in Stratego Barrage (McAleer et al., 2020).
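To make the notion of a correlated equilibrium concrete, the following minimal sketch checks the standard CE incentive constraints for a two-player normal-form game. It is not the JPSRO algorithm or its meta-solver. The function name, the payoff layout `payoffs[player][a0][a1]`, and the game of Chicken used as an example are all assumptions chosen for illustration.

```python
from itertools import product


def is_correlated_equilibrium(payoffs, dist, tol=1e-9):
    """Check the CE constraints for a two-player normal-form game.

    payoffs[i][a0][a1] is the payoff to player i; dist[a0][a1] is the joint
    probability that the mediator recommends (a0, a1). The distribution is a
    CE if no player gains by deviating from any recommended action.
    """
    n0, n1 = len(dist), len(dist[0])
    # Player 0: recommended `rec`, considers deviating to `dev`.
    for rec, dev in product(range(n0), repeat=2):
        gain = sum(
            dist[rec][a1] * (payoffs[0][dev][a1] - payoffs[0][rec][a1])
            for a1 in range(n1)
        )
        if gain > tol:
            return False
    # Player 1: same check over its own recommendations.
    for rec, dev in product(range(n1), repeat=2):
        gain = sum(
            dist[a0][rec] * (payoffs[1][a0][dev] - payoffs[1][a0][rec])
            for a0 in range(n0)
        )
        if gain > tol:
            return False
    return True


# Game of Chicken (illustrative payoffs): action 0 = Dare, action 1 = Chicken.
CHICKEN = [
    [[0, 7], [2, 6]],  # payoffs to player 0
    [[0, 2], [7, 6]],  # payoffs to player 1
]
# Equal weight on every outcome except (Dare, Dare).
THIRDS = [[0.0, 1 / 3], [1 / 3, 1 / 3]]
# Equal weight on all four outcomes.
UNIFORM = [[0.25, 0.25], [0.25, 0.25]]

if __name__ == "__main__":
    print(is_correlated_equilibrium(CHICKEN, THIRDS))
    print(is_correlated_equilibrium(CHICKEN, UNIFORM))
```

The constraints are linear in the joint distribution, so the set of correlated equilibria is a convex polytope; this check only verifies membership and does not select among equilibria.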

equilibrium, jpsro, multi-agent training, (14 more...)

2106.09435

Country:

North America > United States > California > Monterey County > Monterey (0.04)
North America > Puerto Rico > San Juan > San Juan (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Cooperative Multi-Agent Reinforcement Learning Based Distributed Dynamic Spectrum Access in Cognitive Radio Networks

Tan, Xiang, Zhou, Li, Wang, Haijun, Sun, Yuli, Zhao, Haitao, Seet, Boon-Chong, Wei, Jibo, Leung, Victor C. M.

This work has been submitted to the IEEE for possible publication.

Abstract: With the development of 5G and the Internet of Things, large numbers of wireless devices need to share limited spectrum resources. Dynamic spectrum access (DSA) is a promising paradigm to remedy the problem of inefficient spectrum utilization brought upon by the historical command-and-control approach to spectrum allocation. In this paper, we investigate the distributed DSA problem for multiple users in a typical multi-channel cognitive radio network. The problem is formulated as a decentralized partially observable Markov decision process (Dec-POMDP), and we propose a centralized off-line training and distributed on-line execution framework based on cooperative multi-agent reinforcement learning (MARL). We employ the deep recurrent Q-network (DRQN) to address the partial observability of the state for each cognitive user. The ultimate goal is to learn a cooperative strategy that maximizes the sum throughput of the cognitive radio network in a distributed fashion, without exchanging coordination information between cognitive users. From the simulation results, we can observe that the proposed algorithm converges fast and achieves almost optimal performance.

This work was supported in part by the National Natural Science Foundation of China under Grant 6193000305. X. Tan, L. Zhou, Y. Sun, H. Wang, H. Zhao and J. Wei are all with the College of Electronic Science and Technology, National University of Defense Technology, Changsha, 410073, China (E-mail: {tanxiang, zhouli2035, haijunwang14, sunyuli19, haitaozhao, wjbhw}@nudt.edu.cn). Boon-Chong Seet is with the Department of Electrical and Electronic Engineering, Auckland University of Technology, Auckland 1142, New Zealand (E-mail: boon-chong.seet@aut.ac.nz). Victor C. M. Leung is with Shenzhen University, Shenzhen, China and the University of British Columbia, Vancouver, Canada (E-mail: vleung@ieee.org).

The future network is evolving into the Internet of Everything.

algorithm, cognitive user, time slot, (16 more...)

2106.09274

Country:

Oceania > New Zealand > North Island > Auckland Region > Auckland (0.44)
Asia > China > Guangdong Province > Shenzhen (0.44)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.24)

Genre:

Research Report (1.00)
Overview (0.67)

Industry:

Media (0.68)
Leisure & Entertainment (0.46)
Information Technology > Smart Houses & Appliances (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Seeing Differently, Acting Similarly: Imitation Learning with Heterogeneous Observations

Cai, Xin-Qiang, Ding, Yao-Xiang, Chen, Zi-Xuan, Jiang, Yuan, Sugiyama, Masashi, Zhou, Zhi-Hua

In many real-world imitation learning tasks, the demonstrator and the learner have to act in different but full observation spaces. This situation generates significant obstacles for existing imitation learning approaches to work, even when they are combined with traditional space adaptation techniques. The main challenge lies in bridging expert's occupancy measures to learner's dynamically changing occupancy measures under the different observation spaces. In this work, we model the above learning problem as Heterogeneous Observations Imitation Learning (HOIL). We propose the Importance Weighting with REjection (IWRE) algorithm based on the techniques of importance-weighting, learning with rejection, and active querying to solve the key challenge of occupancy measure matching. Experimental results show that IWRE can successfully solve HOIL tasks, including the challenging task of transforming the vision-based demonstrations to random access memory (RAM)-based policies under the Atari domain.

demonstration, international conference, observation space, (16 more...)

2106.09256

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > United States > California > Los Angeles County > Long Beach (0.14)
Oceania > Australia > Queensland > Brisbane (0.04)

Genre: Research Report > New Finding (0.66)

Industry: Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

arXiv.org Artificial Intelligence, Jun-16-2021

A discrete optimisation approach for target path planning whilst evading sensors

Beasley, J. E.

In this paper we deal with a practical problem that arises in military situations. The problem is to plan a path for one (or more) agents to reach a target without being detected by enemy sensors. Agents are not passive, rather they can (within limits) initiate actions which aid evasion, namely knockout (completely disable sensors) and confusion (reduce sensor detection probabilities). Agent actions are path dependent and time limited. Here by path dependent we mean that an agent needs to be sufficiently close to a sensor to knock it out. By time limited we mean that a limit is imposed on how long a sensor is knocked out or confused before it reverts back to its original operating state. The approach adopted breaks the continuous space in which agents move into a discrete space. This enables the problem to be represented (formulated) mathematically as a zero-one integer program with linear constraints. The advantage of representing the problem in this manner is that powerful commercial software optimisation packages exist to solve the problem to proven global optimality. Computational results are presented for a number of randomly generated test problems.

agent, equation, sensor, (15 more...)

2106.08826

Country: Europe > United Kingdom (0.04)

Genre: Research Report (0.82)

Industry: Government > Military (0.68)

Technology:

Information Technology > Communications > Networks > Sensor Networks (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.47)

Kwon, Minae, Karamcheti, Siddharth, Cuellar, Mariano-Florentino, Sadigh, Dorsa

Targeted Data Acquisition for Evolving Negotiation Agents

arXiv.org Artificial Intelligence, Jun-16-2021

Successful negotiators must learn how to balance optimizing for self-interest and cooperation. Yet current artificial negotiation agents often heavily depend on the quality of the static datasets they were trained on, limiting their capacity to fashion an adaptive response balancing self-interest and cooperation. For this reason, we find that these agents can achieve either high utility or cooperation, but not both. To address this, we introduce a targeted data acquisition framework where we guide the exploration of a reinforcement learning agent using annotations from an expert oracle.

Consider a standard non-cooperative negotiation game (Deming et al., 1944; Nash, 1950; 1951) as shown in Figure 1 where two agents - Alice and Bob - are trying to agree on an allocation of shared resources. Both have high utility associated with the hats and balls, though Alice also cares about books. Effectively employing negotiation is crucial, and is the only way to reach an equitable outcome - dividing the hats and balls evenly, while giving Alice the book. Even where negotiating agents have incentives that make it challenging for them to cooperate, it would be difficult to imagine that negotiation could be useful to agents over time -- let alone society -- if agents were incapable of cooperating to achieve equitable ...

agent, bob, negotiation, (12 more...)

2106.07728

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > California > Santa Clara County > Stanford (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre:

Research Report (0.64)
Instructional Material > Course Syllabus & Notes (0.48)
Questionnaire & Opinion Survey (0.46)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Artificial Intelligence, Jun-16-2021

Learned Belief Search: Efficiently Improving Policies in Partially Observable Settings

Hu, Hengyuan, Lerer, Adam, Brown, Noam, Foerster, Jakob

Search is an important tool for computing effective policies in single- and multi-agent environments, and has been crucial for achieving superhuman performance in several benchmark fully and partially observable games. However, one major limitation of prior search approaches for partially observable environments is that the computational cost scales poorly with the amount of hidden information. In this paper we present Learned Belief Search (LBS), a computationally efficient search procedure for partially observable environments. Rather than maintaining an exact belief distribution, LBS uses an approximate auto-regressive counterfactual belief that is learned as a supervised task. In multi-agent settings, LBS uses a novel public-private model architecture for underlying policies in order to efficiently evaluate these policies during rollouts. In the benchmark domain of Hanabi, LBS can obtain 55% to 91% of the benefit of exact search while reducing compute requirements by 35.8× to 4.6×, allowing it to scale to larger settings that were inaccessible to previous search methods.

belief model, learned belief search, sparta, (16 more...)

2106.09086

Country: North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.87)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.68)