AITopics

2102.06148

Country:

Europe > Sweden > Stockholm > Stockholm (0.04)
Europe > Poland > Lublin Province > Lublin (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
Asia > China > Chongqing Province > Chongqing (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Game Theory (0.89)

Cen, Sarah H., Shah, Devavrat

Regret, stability, and fairness in matching markets with bandit learners

arXiv.org Machine Learning, Feb-11-2021

We consider the two-sided matching market with bandit learners. In the standard matching problem, users and providers are matched to ensure incentive compatibility via the notion of stability. However, contrary to the core assumption of the matching problem, users and providers do not know their true preferences a priori and must learn them. To address this assumption, recent works propose to blend the matching and multi-armed bandit problems. They establish that it is possible to assign matchings that are stable (i.e., incentive-compatible) at every time step while also allowing agents to learn enough so that the system converges to matchings that are stable under the agents' true preferences. However, while some agents may incur low regret under these matchings, others can incur high regret -- specifically, $\Omega(T)$ optimal regret where $T$ is the time horizon. In this work, we incorporate costs and transfers in the two-sided matching market with bandit learners in order to faithfully model competition between agents. We prove that, under our framework, it is possible to simultaneously guarantee four desiderata: (1) incentive compatibility, i.e., stability, (2) low regret, i.e., $O(\log(T))$ optimal regret, (3) fairness in the distribution of regret among agents, and (4) high social welfare.
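As a point of reference for the notion of stability used in this abstract, the following is a minimal sketch of classical user-proposing deferred acceptance. It assumes preferences are fully known and strict, and it includes no costs, transfers, or bandit learning, so it is not the framework proposed in the paper. The function names and the toy preference lists are illustrative assumptions.

```python
def deferred_acceptance(user_prefs, provider_prefs):
    """Return a stable matching {user: provider} (user-proposing deferred acceptance)."""
    # rank[p][u] = position of user u in provider p's list (lower is better).
    rank = {p: {u: i for i, u in enumerate(prefs)}
            for p, prefs in provider_prefs.items()}
    next_choice = {u: 0 for u in user_prefs}  # index of the next provider to try
    held = {}                                 # provider -> user tentatively accepted
    free = list(user_prefs)
    while free:
        u = free.pop()
        if next_choice[u] >= len(user_prefs[u]):
            continue  # u has exhausted its list and stays unmatched
        p = user_prefs[u][next_choice[u]]
        next_choice[u] += 1
        current = held.get(p)
        if current is None:
            held[p] = u
        elif rank[p][u] < rank[p][current]:
            held[p] = u           # p trades up; the displaced user proposes again
            free.append(current)
        else:
            free.append(u)        # p rejects u
    return {u: p for p, u in held.items()}


def is_stable(matching, user_prefs, provider_prefs):
    """True if no user-provider pair prefers each other to their assigned partners."""
    partner_of = {p: u for u, p in matching.items()}
    for u, prefs in user_prefs.items():
        for p in prefs:
            if matching.get(u) == p:
                break  # every later provider is worse for u than its match
            cur = partner_of.get(p)
            if cur is None or provider_prefs[p].index(u) < provider_prefs[p].index(cur):
                return False  # (u, p) is a blocking pair
    return True


# Toy example: both users prefer x, and x prefers b.
user_prefs = {"a": ["x", "y"], "b": ["x", "y"]}
provider_prefs = {"x": ["b", "a"], "y": ["a", "b"]}
matching = deferred_acceptance(user_prefs, provider_prefs)
```

In the toy example, user a ends up with its second choice, which illustrates how a stable matching can still leave some agents persistently worse off than others.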

agent, algorithm, social welfare, (15 more...)

arXiv.org Machine Learning

2102.06246

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > Kosovo > District of Gjilan > Kamenica (0.04)
North America > United States > New York > New York County > New York City (0.04)
(6 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.68)

arXiv.org Artificial Intelligence, Feb-11-2021

A Metamodel and Framework for Artificial General Intelligence From Theory to Practice

Latapie, Hugo, Kilic, Ozkan, Liu, Gaowen, Yan, Yan, Kompella, Ramana, Wang, Pei, Thorisson, Kristinn R., Lawrence, Adam, Sun, Yuhong, Srinivasa, Jayanth

This paper introduces a new metamodel-based knowledge representation that significantly improves autonomous learning and adaptation. While interest in hybrid machine learning / symbolic AI systems leveraging, for example, reasoning and knowledge graphs, is gaining popularity, we find there remains a need for both a clear definition of knowledge and a metamodel to guide the creation and manipulation of knowledge. Some of the benefits of the metamodel we introduce in this paper include a solution to the symbol grounding problem, cumulative learning, and federated learning. We have applied the metamodel to problems ranging from time series analysis and computer vision to natural language understanding, and have found that the metamodel enables a wide variety of learning mechanisms, ranging from machine learning to graph network analysis and learning by reasoning engines, to interoperate in a highly synergistic way. Our metamodel-based projects have consistently exhibited unprecedented accuracy, performance, and ability to generalize. This paper is inspired by the state-of-the-art approaches to AGI, recent AGI-aspiring work, the granular computing community, as well as Alfred Korzybski's general semantics. One surprising consequence of the metamodel is that it not only enables a new level of autonomous learning and optimal functioning for machine intelligences, but may also shed light on a path to better understanding how to improve human cognition.

abstraction, knowledge, metamodel, (14 more...)

2102.06112

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > New York (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(7 more...)

Genre: Research Report > Promising Solution (0.34)

Industry:

Health & Medicine (0.46)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.88)
(2 more...)

Agarwal, Mridul, Aggarwal, Vaneet, Azizzadenesheli, Kamyar

Multi-Agent Multi-Armed Bandits with Limited Communication

arXiv.org Artificial Intelligence, Feb-10-2021

We consider the problem where $N$ agents collaboratively interact with an instance of a stochastic $K$-arm bandit problem for $K \gg N$. The agents aim to simultaneously minimize the cumulative regret over all the agents for a total of $T$ time steps, the number of communication rounds, and the number of bits in each communication round. We present Limited Communication Collaboration - Upper Confidence Bound (LCC-UCB), a doubling-epoch based algorithm where each agent communicates only after the end of the epoch and shares the index of the best arm it knows. With our algorithm, LCC-UCB, each agent enjoys a regret of $\tilde{O}(\sqrt{(K/N + N)T})$, communicates for $O(\log T)$ steps and broadcasts $O(\log K)$ bits in each communication step. Finally, we empirically show that the LCC-UCB and the LCC-UCB-GRAPH algorithms perform well and outperform strategies that communicate through a central node. We consider a setup where $N$ agents, connected over a network, interact with a multi-armed bandit (MAB) environment (Lattimore and Szepesvári, 2020). The agents aim to collaborate with other agents in the network to minimize their regret. The agents also aim to reduce the number of messages and the size of messages communicated with others.
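Two ingredients named in this abstract, upper confidence bounds and doubling epochs, can be sketched as follows. This is a minimal illustration that assumes the textbook UCB1 index and epochs of length 1, 2, 4, and so on; it is not the LCC-UCB algorithm itself, whose exact index and epoch schedule are not given here. Both function names are hypothetical.

```python
import math


def ucb_index(mean, pulls, t):
    """Textbook UCB1 index; unpulled arms get +inf so they are tried first."""
    if pulls == 0:
        return math.inf
    return mean + math.sqrt(2.0 * math.log(t) / pulls)


def doubling_epoch_ends(horizon):
    """Time steps at which epochs of length 1, 2, 4, ... end, up to `horizon`."""
    ends, t, length = [], 0, 1
    while t + length <= horizon:
        t += length
        ends.append(t)
        length *= 2
    return ends


# If agents communicate only at epoch ends, the number of communication
# rounds grows logarithmically with the horizon.
rounds = doubling_epoch_ends(1000)
```

Because each epoch is twice as long as the previous one, a horizon of $T$ steps contains only $O(\log T)$ epoch boundaries, which is the intuition behind the logarithmic number of communication steps.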

agent, algorithm, epoch, (13 more...)

2102.08462

Country:

North America > United States > Indiana > Tippecanoe County > West Lafayette (0.04)
North America > United States > Indiana > Tippecanoe County > Lafayette (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.50)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Learning State Representations from Random Deep Action-conditional Predictions

Zheng, Zeyu, Veeriah, Vivek, Vuorio, Risto, Lewis, Richard, Singh, Satinder

In this work, we study auxiliary prediction tasks defined by temporal-difference networks (TD networks); these networks are a language for expressing a rich space of general value function (GVF) prediction targets that may be learned efficiently with TD. Through analysis in an illustrative domain we show the benefits to learning state representations of exploiting the full richness of TD networks, including both action-conditional predictions and temporally deep predictions. Our main (and perhaps surprising) result is that deep action-conditional TD networks with random structures that create random prediction-questions about random features yield state representations that are competitive with state-of-the-art hand-crafted value prediction and pixel control auxiliary tasks in both Atari games and DeepMind Lab tasks. We also show through stop-gradient experiments that learning the state representations solely via these unsupervised random TD network prediction tasks yields agents that outperform the end-to-end-trained actor-critic baseline.

machine learning, natural language, reinforcement learning, (19 more...)

2102.04897

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
North America > United States > New York > Richmond County > New York City (0.04)
North America > United States > New York > Queens County > New York City (0.04)
(11 more...)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Games > Computer Games (0.56)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Modeling the Interaction between Agents in Cooperative Multi-Agent Reinforcement Learning

Ma, Xiaoteng, Yang, Yiqin, Li, Chenghao, Lu, Yiwen, Zhao, Qianchuan, Jun, Yang

Value-based methods of multi-agent reinforcement learning (MARL), especially the value decomposition methods, have been demonstrated on a range of challenging cooperative tasks. However, current methods pay little attention to the interaction between agents, which is essential to teamwork in games or real life. This limits the efficiency of value-based MARL algorithms in two aspects: collaborative exploration and value function estimation. In this paper, we propose a novel cooperative MARL algorithm named interactive actor-critic (IAC), which models the interaction of agents from the perspectives of policy and value function. On the policy side, a multi-agent joint stochastic policy is introduced by adopting a collaborative exploration module, which is trained by maximizing the entropy-regularized expected return. On the value side, we use the shared attention mechanism to estimate the value function of each agent, which takes the impact of the teammates into consideration. At the implementation level, we extend the value decomposition methods to continuous control tasks and evaluate IAC on benchmark tasks including classic control and multi-agent particle environments. Experimental results indicate that our method outperforms the state-of-the-art approaches and achieves better performance in terms of cooperation.

action-value function, agent, learning, (15 more...)

2102.06042

Country:

Asia > China > Beijing > Beijing (0.05)
North America > United States > Illinois > Cook County > Chicago (0.04)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Ogunsina, Kolawole, Papamichalis, Marios, DeLaurentis, Daniel

Uncertainty Quantification and Propagation for Airline Disruption Management

Disruption management during the airline scheduling process can be compartmentalized into proactive and reactive processes depending upon the time of schedule execution. The state of the art for decision-making in airline disruption management involves a heuristic human-centric approach that does not categorically study uncertainty in proactive and reactive processes for managing airline schedule disruptions. Hence, this paper introduces an uncertainty transfer function model (UTFM) framework that characterizes uncertainty for proactive airline disruption management before schedule execution, reactive airline disruption management during schedule execution, and proactive airline disruption management after schedule execution to enable the construction of quantitative tools that can allow an intelligent agent to rationalize complex interactions and procedures for robust airline disruption management. Specifically, we use historical scheduling and operations data from a major U.S. airline to facilitate the development and assessment of the UTFM, defined by hidden Markov models (a special class of probabilistic graphical models) that can efficiently perform pattern learning and inference on portions of large data sets.
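To make the role of hidden Markov models in this abstract concrete, the following minimal sketch computes the likelihood of an observation sequence with the standard forward algorithm for a discrete HMM. The two-state model, its probabilities, and the function name are illustrative assumptions and are not taken from the UTFM or the airline data described in the paper.

```python
def forward_likelihood(init, trans, emit, observations):
    """P(observations) under a discrete HMM, computed by the forward algorithm.

    init[s]     : probability of starting in hidden state s
    trans[r][s] : probability of moving from state r to state s
    emit[s][o]  : probability of emitting symbol o in state s
    """
    n = len(init)
    # alpha[s] = P(observations so far, current hidden state = s)
    alpha = [init[s] * emit[s][observations[0]] for s in range(n)]
    for obs in observations[1:]:
        alpha = [
            emit[s][obs] * sum(alpha[r] * trans[r][s] for r in range(n))
            for s in range(n)
        ]
    return sum(alpha)


# Hypothetical two-state model (e.g. "nominal" vs. "disrupted" operations)
# with two observable symbols (0 = on time, 1 = delayed).
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
likelihood = forward_likelihood(init, trans, emit, [0, 1, 1])
```

The forward recursion costs time linear in the sequence length and quadratic in the number of hidden states, which is one reason HMM inference remains tractable on long operational records.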

airline disruption management, disruption management, utfm, (12 more...)

2102.05147

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > Florida > Hillsborough County > University (0.04)
Europe (0.04)

Genre: Research Report (0.82)

Industry:

Transportation > Passenger (1.00)
Transportation > Infrastructure & Services (1.00)
Transportation > Air (1.00)
Consumer Products & Services > Travel (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.95)

Cacciamani, Federico, Celli, Andrea, Ciccone, Marco, Gatti, Nicola

Multi-Agent Coordination in Adversarial Environments through Signal Mediated Strategies

Many real-world scenarios involve teams of agents that have to coordinate their actions to reach a shared goal. We focus on the setting in which a team of agents faces an opponent in a zero-sum, imperfect-information game. Team members can coordinate their strategies before the beginning of the game, but are unable to communicate during the playing phase of the game. This is the case, for example, in Bridge, collusion in poker, and collusion in bidding. In this setting, model-free RL methods are oftentimes unable to capture coordination because agents' policies are executed in a decentralized fashion. Our first contribution is a game-theoretic centralized training regimen to effectively perform trajectory sampling so as to foster team coordination. When team members can observe each other's actions, we show that this approach provably yields equilibrium strategies. Then, we introduce a signaling-based framework to represent team coordinated strategies given a buffer of past experiences. Each team member's policy is parametrized as a neural network whose output is conditioned on a suitable exogenous signal, drawn from a learned probability distribution. By combining these two elements, we empirically show convergence to coordinated equilibria in cases where previous state-of-the-art multi-agent RL algorithms did not.

coordination, team member, trajectory, (14 more...)

2102.05026

Country:

Europe > Sweden > Stockholm > Stockholm (0.04)
North America > United States > Texas (0.04)
North America > United States > Virginia > Arlington County > Arlington (0.04)
(5 more...)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Hammond, Lewis, Fox, James, Everitt, Tom, Abate, Alessandro, Wooldridge, Michael

Equilibrium Refinements for Multi-Agent Influence Diagrams: Theory and Practice

Multi-agent influence diagrams (MAIDs) are a popular form of graphical model that, for certain classes of games, have been shown to offer key complexity and explainability advantages over traditional extensive form game (EFG) representations. In this paper, we extend previous work on MAIDs by introducing the concept of a MAID subgame, as well as subgame perfect and trembling hand perfect equilibrium refinements. We then prove several equivalence results between MAIDs and EFGs. Finally, we describe an open source implementation for reasoning about MAIDs and computing their equilibria. Previous work on MAIDs has focussed on Nash equilibria as the core solution concept [20]. Whilst this is arguably the most important solution concept in non-cooperative game theory, if there are many Nash equilibria we often wish to remove some of those that are less 'rational'. Many refinements to the Nash equilibrium have been proposed [17], with two of the most important being subgame perfect Nash equilibria [26] and trembling hand perfect equilibria [27]. The first rules out 'non-credible' threats and the second requires that each player is still playing a best-response when ...

information, node, subgame, (16 more...)

2102.05008

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
(3 more...)

Genre: Research Report (0.40)

Industry:

Leisure & Entertainment > Games (0.66)
Government > Military (0.46)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.85)

Structured Diversification Emergence via Reinforced Organization Control and Hierarchical Consensus Learning

Li, Wenhao, Wang, Xiangfeng, Jin, Bo, Sheng, Junjie, Hua, Yun, Zha, Hongyuan

When solving a complex task, humans will spontaneously form teams to complete different parts of the whole task. Meanwhile, the cooperation between teammates will improve efficiency. However, for current cooperative MARL methods, the cooperation team is constructed through either heuristics or end-to-end blackbox optimization. In order to improve the efficiency of cooperation and exploration, we propose a structured diversification emergence MARL framework named Rochico, based on reinforced organization control and hierarchical consensus learning. Rochico first learns an adaptive grouping policy through the organization control module, which is established by independent multi-agent reinforcement learning. Further, the hierarchical consensus module, based on hierarchical intentions with a consensus constraint, is introduced after team formation. Simultaneously, utilizing the hierarchical consensus module and a self-supervised intrinsic reward enhanced decision module, the proposed cooperative MARL algorithm Rochico can output the final diversified multi-agent cooperative policy. All three modules are organically combined to promote the structured diversification emergence. Comparative experiments on four large-scale cooperation tasks show that Rochico is significantly better than the current SOTA algorithms in terms of exploration efficiency and cooperation strength.

agent, algorithm, intention, (12 more...)

2102.04775

Country:

Asia > China > Shanghai > Shanghai (0.05)
Asia > China > Guangdong Province > Shenzhen (0.04)
North America > United States > Virginia (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (0.40)

Industry: Leisure & Entertainment (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)