AITopics

2102.12307

Country:

North America > United States > Michigan (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Europe > Russia (0.04)
Asia > Russia (0.04)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.93)

Shelke, Omkar, Meisheri, Hardik, Khadilkar, Harshad

School of hard knocks: Curriculum analysis for Pommerman with a fixed computational budget

arXiv.org Artificial IntelligenceFeb-24-2021

Pommerman is a hybrid cooperative/adversarial multi-agent environment, with challenging characteristics in terms of partial observability, limited or no communication, sparse and delayed rewards, and restrictive computational time limits. This makes it a challenging environment for reinforcement learning (RL) approaches. In this paper, we focus on developing a curriculum for learning a robust and promising policy in a constrained computational budget of 100,000 games, starting from a fixed base policy (which is itself trained to imitate a noisy expert policy). All RL algorithms starting from the base policy use vanilla proximal-policy optimization (PPO) with the same reward function, and the only difference between their training is the mix and sequence of opponent policies. One expects that beginning training with simpler opponents and then gradually increasing the opponent difficulty will facilitate faster learning, leading to more robust policies compared against a baseline where all available opponent policies are introduced from the start. We test this hypothesis and show that within constrained computational budgets, it is in fact better to "learn in the school of hard knocks", i.e., against all available opponent policies nearly from the start. We also include ablation studies where we study the effect of modifying the base environment properties of ammo and bomb blast strength on the agent performance.

agent, curriculum, opponent, (13 more...)

2102.11762

Country: Asia > India > Maharashtra > Mumbai (0.04)

Genre: Research Report (0.82)

Industry:

Education (0.86)
Leisure & Entertainment > Games > Computer Games (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

#artificialintelligenceFeb-23-2021, 18:30:31 GMT

Why Scientists are Teaching Robots to Play Hide-and-Seek

Artificial general intelligence, the idea of an intelligent A.I. agent that's able to understand and learn any intellectual task that humans can do, has long been a component of science fiction. As A.I. gets smarter and smarter -- especially with breakthroughs in machine learning tools that are able to rewrite their code to learn from new experiences -- it's increasingly widely a part of real artificial intelligence conversations as well. But how do we measure AGI when it does arrive? Over the years, researchers have laid out a number of possibilities. The most famous remains the Turing Test, in which a human judge interacts, sight unseen, with both humans and a machine, and must try and guess which is which.

agent, intelligence, play hide-and-seek, (13 more...)

#artificialintelligence

Industry: Leisure & Entertainment > Games > Computer Games (0.31)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.41)

Chen, Shixiang, Garcia, Alfredo, Shahrampour, Shahin

On Distributed Non-convex Optimization: Projected Subgradient Method For Weakly Convex Problems in Networks

arXiv.org Machine LearningFeb-23-2021

The stochastic subgradient method is a widely-used algorithm for solving large-scale optimization problems arising in machine learning. Often these problems are neither smooth nor convex. Recently, Davis et al. [1-2] characterized the convergence of the stochastic subgradient method for the weakly convex case, which encompasses many important applications (e.g., robust phase retrieval, blind deconvolution, biconvex compressive sensing, and dictionary learning). In practice, distributed implementations of the projected stochastic subgradient method (stoDPSM) are used to speed-up risk minimization. In this paper, we propose a distributed implementation of the stochastic subgradient method with a theoretical guarantee. Specifically, we show the global convergence of stoDPSM using the Moreau envelope stationarity measure. Furthermore, under a so-called sharpness condition, we show that deterministic DPSM (with a proper initialization) converges linearly to the sharp minima, using geometrically diminishing step-size. We provide numerical experiments to support our theoretical analysis.

algorithm, dpsm, lemma ii, (14 more...)

arXiv.org Machine Learning

doi: 10.1109/TAC.2021.3056535

2004.13233

Country:

North America > United States > Texas > Brazos County > College Station (0.04)
North America > United States > New York (0.04)
North America > United States > Massachusetts (0.04)
(3 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.46)

Gandhi, Kanishk, Stojnic, Gala, Lake, Brenden M., Dillon, Moira R.

Baby Intuitions Benchmark (BIB): Discerning the goals, preferences, and actions of others

To achieve human-like common sense about everyday life, machine learning systems must understand and reason about the goals, preferences, and actions of others. Human infants intuitively achieve such common sense by making inferences about the underlying causes of other agents' actions. Directly informed by research on infant cognition, our benchmark BIB challenges machines to achieve generalizable, common-sense reasoning about other agents like human infants do. As in studies on infant cognition, moreover, we use a violation of expectation paradigm in which machines must predict the plausibility of an agent's behavior given a video sequence, making this benchmark appropriate for direct validation with human infants in future studies. We show that recently proposed, deep-learning-based agency reasoning models fail to show infant-like reasoning, leaving BIB an open challenge.

agent, familiarization, infant, (12 more...)

2102.11938

Country:

North America > United States > New York (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)

Models we Can Trust: Toward a Systematic Discipline of (Agent-Based) Model Interpretation and Validation

Istrate, Gabriel

We advocate the development of a discipline of interacting with and extracting information from models, both mathematical (e.g. game-theoretic ones) and computational (e.g. agent-based models). We outline some directions for the development of a such a discipline: - the development of logical frameworks for the systematic formal specification of stylized facts and social mechanisms in (mathematical and computational) social science. Such frameworks would bring to attention new issues, such as phase transitions, i.e. dramatical changes in the validity of the stylized facts beyond some critical values in parameter space. We argue that such statements are useful for those logical frameworks describing properties of ABM. - the adaptation of tools from the theory of reactive systems (such as bisimulation) to obtain practically relevant notions of two systems "having the same behavior". - the systematic development of an adversarial theory of model perturbations, that investigates the robustness of conclusions derived from models of social behavior to variations in several features of the social dynamics. These may include: activation order, the underlying social network, individual agent behavior.

artificial society and social simulation, mechanism, stylized fact, (10 more...)

2102.11615

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.05)
North America > United States > Virginia (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(6 more...)

Genre: Research Report (0.64)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Epidemiology (1.00)
Health & Medicine > Therapeutic Area > Immunology (0.68)
Government (0.68)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Rahman, Md. Musfiqur, Rasheed, Ayman, Khan, Md. Mosaddek, Javidian, Mohammad Ali, Jamshidi, Pooyan, Mamun-Or-Rashid, Md.

Accelerating Recursive Partition-Based Causal Structure Learning

Causal structure discovery from observational data is fundamental to the causal understanding of autonomous systems such as medical decision support systems, advertising campaigns and self-driving cars. This is essential to solve well-known causal decision making and prediction problems associated with those real-world applications. Recently, recursive causal discovery algorithms have gained particular attention among the research community due to their ability to provide good results by using Conditional Independent (CI) tests in smaller sub-problems. However, each of such algorithms needs a refinement function to remove undesired causal relations of the discovered graphs. Notably, with the increase of the problem size, the computation cost (i.e., the number of CI-tests) of the refinement function makes an algorithm expensive to deploy in practice. This paper proposes a generic causal structure refinement strategy that can locate the undesired relations with a small number of CI-tests, thus speeding up the algorithm for large and complex problems. We theoretically prove the correctness of our algorithm. We then empirically evaluate its performance against the state-of-the-art algorithms in terms of solution quality and completion time in synthetic and real datasets.

algorithm, ci-test, graph, (17 more...)

2102.11545

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Virginia > Arlington County > Arlington (0.04)
North America > United States > South Carolina (0.04)
(3 more...)

Genre:

Research Report > Experimental Study (0.46)
Research Report > New Finding (0.46)

Industry: Information Technology (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.82)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.66)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.54)

Sample-Efficient Learning of Stackelberg Equilibria in General-Sum Games

Bai, Yu, Jin, Chi, Wang, Huan, Xiong, Caiming

Real world applications such as economics and policy making often involve solving multi-agent games with two unique features: (1) The agents are inherently asymmetric and partitioned into leaders and followers; (2) The agents have different reward functions, thus the game is general-sum. The majority of existing results in this field focuses on either symmetric solution concepts (e.g. Nash equilibrium) or zero-sum games. It remains vastly open how to learn the Stackelberg equilibrium -- an asymmetric analog of the Nash equilibrium -- in general-sum games efficiently from samples. This paper initiates the theoretical study of sample-efficient learning of the Stackelberg equilibrium in two-player turn-based general-sum games. We identify a fundamental gap between the exact value of the Stackelberg equilibrium and its estimated version using finite samples, which can not be closed information-theoretically regardless of the algorithm. We then establish a positive result on sample-efficient learning of Stackelberg equilibrium with value optimal up to the gap identified above. We show that our sample complexity is tight with matching upper and lower bounds. Finally, we extend our learning results to the setting where the follower plays in a Markov Decision Process (MDP), and the setting where the leader and the follower act simultaneously.

best response, br 3, stackelberg equilibrium, (15 more...)

2102.11494

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Industry:

Government (0.92)
Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.46)

Schuderer, Andreas, Bromuri, Stefano, van Eekelen, Marko

Sim-Env: Decoupling OpenAI Gym Environments from Simulation Models

arXiv.org Artificial IntelligenceFeb-22-2021

Reinforcement learning (RL) is one of the most active fields of AI research. Despite the interest demonstrated by the research community in reinforcement learning, the development methodology still lags behind, with a severe lack of standard APIs to foster the development of RL applications. OpenAI Gym is probably the most used environment to develop RL applications and simulations, but most of the abstractions proposed in such a framework are still assuming a semi-structured methodology. This is particularly relevant for agent-based models whose purpose is to analyse adaptive behaviour displayed by self-learning agents in the simulation. In order to bridge this gap, we present a workflow and tools for the decoupled development and maintenance of multi-purpose agent-based models and derived single-purpose reinforcement learning environments, enabling the researcher to swap out environments with ones representing different perspectives or different reward models, all while keeping the underlying domain model intact and separate. The Sim-Env Python library generates OpenAI-Gym-compatible reinforcement learning environments that use existing or purposely created domain models as their simulation back-ends. Its design emphasizes ease-of-use, modularity and code separation.

machine learning, natural language, reinforcement learning, (15 more...)

doi: 10.1007/978-3-030-85739-4_39

2102.09824

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > District of Columbia > Washington (0.04)
Europe > Netherlands > Gelderland > Nijmegen (0.04)

Genre:

Workflow (0.68)
Research Report (0.51)

Industry:

Banking & Finance (0.47)
Leisure & Entertainment > Games (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.84)

Jericevich, Ivan, Sing, Dharmesh, Gebbie, Tim

CoinTossX: An open-source low-latency high-throughput matching engine

arXiv.org Artificial IntelligenceFeb-22-2021

We deploy and demonstrate the CoinTossX low-latency, high-throughput, open-source matching engine with orders sent using the Julia and Python languages. We show how this can be deployed for small-scale local desk-top testing and discuss a larger scale, but local hosting, with multiple traded instruments managed concurrently and managed by multiple clients. We then demonstrate a cloud based deployment using Microsoft Azure, with large-scale industrial and simulation research use cases in mind. The system is exposed and interacted with via sockets using UDP SBE message protocols and can be monitored using a simple web browser interface using HTTP. We give examples showing how orders can be be sent to the system and market data feeds monitored using the Julia and Python languages. The system is developed in Java with orders submitted as binary encodings (SBE) via UDP protocols using the Aeron Media Driver as the low-latency, high throughput message transport. The system separates the order-generation and simulation environments e.g. agent-based model simulation, from the matching of orders, data-feeds and various modularised components of the order-book system. This ensures a more natural and realistic asynchronicity between events generating orders, and the events associated with order-book dynamics and market data-feeds. We promote the use of Julia as the preferred order submission and simulation environment.

engine, simulation, trading gateway, (16 more...)

doi: 10.1016/j.softx.2022.101136

2102.10925

Country:

Europe > United Kingdom > England > Greater London > London > City of London (0.04)
Africa > South Africa > Western Cape > Cape Town (0.04)
Africa > South Africa > Gauteng > Johannesburg (0.04)
Asia > Vietnam > Long An Province (0.04)

Genre: Research Report (0.50)

Industry:

Information Technology > Services (1.00)
Banking & Finance > Trading (1.00)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Cloud Computing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Communications > Networks (0.93)