Agents
Strategic Decision-Making in the Presence of Information Asymmetry: Provably Efficient RL with Algorithmic Instruments
Yu, Mengxin, Yang, Zhuoran, Fan, Jianqing
We study offline reinforcement learning under a novel model called strategic MDP, which characterizes the strategic interactions between a principal and a sequence of myopic agents with private types. Due to the bilevel structure and private types, strategic MDP involves information asymmetry between the principal and the agents. We focus on the offline RL problem, where the goal is to learn the optimal policy of the principal concerning a target population of agents based on a pre-collected dataset that consists of historical interactions. The unobserved private types confound such a dataset as they affect both the rewards and observations received by the principal. We propose a novel algorithm, Pessimistic policy Learning with Algorithmic iNstruments (PLAN), which leverages the ideas of instrumental variable regression and the pessimism principle to learn a near-optimal principal's policy in the context of general function approximation. Our algorithm is based on the critical observation that the principal's actions serve as valid instrumental variables. In particular, under a partial coverage assumption on the offline dataset, we prove that PLAN outputs a $1 / \sqrt{K}$-optimal policy with $K$ being the number of collected trajectories. We further apply our framework to some special cases of strategic MDP, including strategic regression, strategic bandit, and noncompliance in recommendation systems.
Decentralized Collaborative Learning with Probabilistic Data Protection
Abstract--We discuss future directions of Blockchain as a collaborative value co-creation platform, in which network participants can gain extra insights that cannot be accessed when disconnected from the others. As such, we propose a decentralized machine learning framework that is carefully designed to respect the values of democracy, diversity, and privacy. Specifically, we propose a federated multi-task learning framework that integrates a privacy-preserving dynamic consensus algorithm. We show that a specific network topology called the expander graph dramatically improves the scalability of global consensus building. We conclude the paper by making some remarks on open problems.
Learning Correlated Equilibria in Mean-Field Games
Muller, Paul, Elie, Romuald, Rowland, Mark, Lauriere, Mathieu, Perolat, Julien, Perrin, Sarah, Geist, Matthieu, Piliouras, Georgios, Pietquin, Olivier, Tuyls, Karl
The designs of many large-scale systems today, from traffic routing environments to smart grids, rely on game-theoretic equilibrium concepts. However, as the size of an $N$-player game typically grows exponentially with $N$, standard game theoretic analysis becomes effectively infeasible beyond a low number of players. Recent approaches have gone around this limitation by instead considering Mean-Field games, an approximation of anonymous $N$-player games, where the number of players is infinite and the population's state distribution, instead of every individual player's state, is the object of interest. The practical computability of Mean-Field Nash equilibria, the most studied Mean-Field equilibrium to date, however, typically depends on beneficial non-generic structural properties such as monotonicity or contraction properties, which are required for known algorithms to converge. In this work, we provide an alternative route for studying Mean-Field games, by developing the concepts of Mean-Field correlated and coarse-correlated equilibria. We show that they can be efficiently learnt in \emph{all games}, without requiring any additional assumption on the structure of the game, using three classical algorithms. Furthermore, we establish correspondences between our notions and those already present in the literature, derive optimality bounds for the Mean-Field - $N$-player transition, and empirically demonstrate the convergence of these algorithms on simple games.
An Entropy-based Measure of Intelligence Degree of System Structures
In this paper, we investigate how to measure the intelligence of systems under specific structures. Two indicators are adopted to characterize the intelligence of a given structure, namely the function diversity of the structure, and the ability to generate order under specific environments. A measure of intelligence degree is proposed, with which the intelligence degree of several basic structures is calculated. It is shown that some structures are indeed "smarter" than the others under the proposed measure. The results add a possible way of revealing the evolution mechanism of natural life and constructing life-like structures with high intelligence degree.
The GENEA Challenge 2022: A large evaluation of data-driven co-speech gesture generation
Yoon, Youngwoo, Wolfert, Pieter, Kucherenko, Taras, Viegas, Carla, Nikolov, Teodor, Tsakov, Mihail, Henter, Gustav Eje
This paper reports on the second GENEA Challenge to benchmark data-driven automatic co-speech gesture generation. Participating teams used the same speech and motion dataset to build gesture-generation systems. Motion generated by all these systems was rendered to video using a standardised visualisation pipeline and evaluated in several large, crowdsourced user studies. Unlike when comparing different research papers, differences in results are here only due to differences between methods, enabling direct comparison between systems. This year's dataset was based on 18 hours of full-body motion capture, including fingers, of different persons engaging in dyadic conversation. Ten teams participated in the challenge across two tiers: full-body and upper-body gesticulation. For each tier we evaluated both the human-likeness of the gesture motion and its appropriateness for the specific speech signal. Our evaluations decouple human-likeness from gesture appropriateness, which previously was a major challenge in the field. The evaluation results are a revolution, and a revelation. Some synthetic conditions are rated as significantly more human-like than human motion capture. To the best of our knowledge, this has never been shown before on a high-fidelity avatar. On the other hand, all synthetic motion is found to be vastly less appropriate for the speech than the original motion-capture recordings. Additional material is available via the project website at https://youngwoo-yoon.github.io/GENEAchallenge2022/
Entropy Enhanced Multi-Agent Coordination Based on Hierarchical Graph Learning for Continuous Action Space
Chen, Yining, Wang, Ke, Song, Guanghua, Jiang, Xiaohong
In most existing studies on large-scale multi-agent coordination, the control methods aim to learn discrete policies for agents with finite choices. They rarely consider selecting actions directly from continuous action spaces to provide more accurate control, which makes them unsuitable for more complex tasks. To solve the control issue due to large-scale multi-agent systems with continuous action spaces, we propose a novel MARL coordination control method that derives stable continuous policies. By optimizing policies with maximum entropy learning, agents improve their exploration in execution and acquire an excellent performance after training. We also employ hierarchical graph attention networks (HGAT) and gated recurrent units (GRU) to improve the scalability and transferability of our method. The experiments show that our method consistently outperforms all baselines in large-scale multi-agent cooperative reconnaissance tasks.
Solving Royal Game of Ur Using Reinforcement Learning
Malhotra, Sidharth, Malik, Girik
Reinforcement Learning has recently surfaced as a very powerful tool to solve complex problems in the domain of board games, wherein an agent is generally required to learn complex strategies and moves based on its own experiences and rewards received. While RL has outperformed existing state-of-the-art methods used for playing simple video games and popular board games, it is yet to demonstrate its capability on ancient games. Here, we solve one such problem, where we train our agents using different methods namely Monte Carlo, Qlearning and Expected Sarsa to learn optimal policy to play the strategic Royal Game of Ur. The state space for our game is complex and large, but our agents show promising results at playing the game and learning important strategic moves. Although it is hard to conclude that when trained with limited resources which algorithm performs better overall, but Expected Sarsa shows promising results when it comes to fastest learning.
DIDER: Discovering Interpretable Dynamically Evolving Relations
Effective understanding of dynamically evolving multiagent interactions is crucial to capturing the underlying behavior of agents in social systems. It is usually challenging to observe these interactions directly, and therefore modeling the latent interactions is essential for realizing the complex behaviors. Recent work on Dynamic Neural Relational Inference (DNRI) captures explicit inter-agent interactions at every step. However, prediction at every step results in noisy interactions and lacks intrinsic interpretability without post-hoc inspection. Moreover, it requires access to ground truth annotations to analyze the predicted interactions, which are hard to obtain. This paper introduces DIDER, Discovering Interpretable Dynamically Evolving Relations, a generic end-to-end interaction modeling framework with intrinsic interpretability. DIDER discovers an interpretable sequence of inter-agent interactions by disentangling the task of latent interaction prediction into sub-interaction prediction and duration estimation. By imposing the consistency of a sub-interaction type over an extended time duration, the proposed framework achieves intrinsic interpretability without requiring any post-hoc inspection. We evaluate DIDER on both synthetic and real-world datasets. The experimental results demonstrate that modeling disentangled and interpretable dynamic relations improves performance on trajectory forecasting tasks.
Development of a CAV-based Intersection Control System and Corridor Level Impact Assessment
Mirbakhsh, Ardeshir, Lee, Joyoung, Besenski, Dejan
This paper presents a signal-free intersection control system for CAVs by combination of a pixel reservation algorithm and a Deep Reinforcement Learning (DRL) decision-making logic, followed by a corridor-level impact assessment of the proposed model. The pixel reservation algorithm detects potential colliding maneuvers and the DRL logic optimizes vehicles' movements to avoid collision and minimize the overall delay at the intersection. The proposed control system is called Decentralized Sparse Coordination System (DSCLS) since each vehicle has its own control logic and interacts with other vehicles in coordinated states only. Due to the chain impact of taking random actions in the DRL's training course, the trained model can deal with unprecedented volume conditions, which poses the main challenge in intersection management. The performance of the developed model is compared with conventional and CAV-based control systems, including fixed traffic lights, actuated traffic lights, and the Longest Queue First (LQF) control system under three volume regimes in a corridor of four intersections in VISSIM software. The simulation result revealed that the proposed model reduces delay by 50%, 29%, and 23% in moderate, high, and extreme volume regimes compared to the other CAV-based control system. Improvements in travel time, fuel consumption, emission, and Surrogate Safety Measures (SSM) are also noticeable.
Explainability in Mechanism Design: Recent Advances and the Road Ahead
Suryanarayana, Sharadhi Alape, Sarne, David, Kraus, Sarit
Designing and implementing explainable systems is seen as the next step towards increasing user trust in, acceptance of and reliance on Artificial Intelligence (AI) systems. While explaining choices made by black-box algorithms such as machine learning and deep learning has occupied most of the limelight, systems that attempt to explain decisions (even simple ones) in the context of social choice are steadily catching up. In this paper, we provide a comprehensive survey of explainability in mechanism design, a domain characterized by economically motivated agents and often having no single choice that maximizes all individual utility functions. We discuss the main properties and goals of explainability in mechanism design, distinguishing them from those of Explainable AI in general. This discussion is followed by a thorough review of the challenges one may face when working on Explainable Mechanism Design and propose a few solution concepts to those.