Agents
What Do We See in Them? Identifying Dimensions of Partner Models for Speech Interfaces Using a Psycholexical Approach
Doyle, Philip R, Clark, Leigh, Cowan, Benjamin R
Perceptions of system competence and communicative ability, termed partner models, play a significant role in speech interface interaction. Yet we do not know what the core dimensions of this concept are. Taking a psycholexical approach, our paper is the first to identify the key dimensions that define partner models in speech agent interaction. Through a repertory grid study (N=21), a review of key subjective questionnaires, an expert review of resulting word pairs and an online study of 356 users of speech interfaces, we identify three key dimensions that make up a user's partner model: 1) perceptions toward competence and capability; 2) assessment of human-likeness; and 3) a system's perceived cognitive flexibility. We discuss the implications for partner modelling as a concept, emphasising the importance of salience and the dynamic nature of these perceptions.
Towards Multi-agent Reinforcement Learning for Wireless Network Protocol Synthesis
Dutta, Hrishikesh, Biswas, Subir
This paper proposes a multi-agent reinforcement learning based medium access framework for wireless networks. The access problem is formulated as a Markov Decision Process (MDP), and solved using reinforcement learning with every network node acting as a distributed learning agent. The solution components are developed step by step, starting from a single-node access scenario in which a node agent incrementally learns to control MAC layer packet loads for reining in self-collisions. The strategy is then scaled up for multi-node fully-connected scenarios by using more elaborate reward structures. It also demonstrates preliminary feasibility for more general partially connected topologies. It is shown that by learning to adjust MAC layer transmission probabilities, the protocol is not only able to attain theoretical maximum throughput at an optimal load, but unlike classical approaches, it can also retain that maximum throughput at higher loading conditions. Additionally, the mechanism is agnostic to heterogeneous loading while preserving that feature. It is also shown that access priorities of the protocol across nodes can be parametrically adjusted. Finally, it is also shown that the online learning feature of reinforcement learning is able to make the protocol adapt to time-varying loading conditions.
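As a point of reference for the "theoretical maximum throughput at an optimal load" mentioned above, the following minimal sketch computes per-slot throughput for a simple random-access model. The abstract does not name the classical baseline, so the slotted, fully connected setting with identical per-node transmission probability is an assumption, and the function names are hypothetical. It is not the paper's learning method; it only shows the quantity such an agent would be trying to maximize.

```python
def slot_throughput(n_nodes: int, p: float) -> float:
    """Probability that exactly one of n_nodes transmits in a slot.

    Assumes (not from the abstract) a slotted, fully connected channel where
    every node transmits independently with the same probability p.
    """
    return n_nodes * p * (1.0 - p) ** (n_nodes - 1)


def best_transmit_probability(n_nodes: int, steps: int = 1000) -> float:
    """Grid-search the per-node probability that maximizes slot throughput."""
    grid = [i / steps for i in range(1, steps + 1)]
    return max(grid, key=lambda p: slot_throughput(n_nodes, p))


if __name__ == "__main__":
    n = 4
    p_star = best_transmit_probability(n)
    # Analytically the maximizer is p = 1/n under this model.
    print(p_star, slot_throughput(n, p_star))
```

Under this model, throughput falls off when nodes transmit more aggressively than 1/n, which is the high-load degradation the abstract says the learned protocol avoids.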
Guidance on the Assurance of Machine Learning in Autonomous Systems (AMLAS)
Hawkins, Richard, Paterson, Colin, Picardi, Chiara, Jia, Yan, Calinescu, Radu, Habli, Ibrahim
Machine Learning (ML) is now used in a range of systems with results that are reported to exceed, under certain conditions, human performance. Many of these systems, in domains such as healthcare, automotive and manufacturing, exhibit high degrees of autonomy and are safety critical. Establishing justified confidence in ML forms a core part of the safety case for these systems. In this document we introduce a methodology for the Assurance of Machine Learning for use in Autonomous Systems (AMLAS). AMLAS comprises a set of safety case patterns and a process for (1) systematically integrating safety assurance into the development of ML components and (2) for generating the evidence base for explicitly justifying the acceptable safety of these components when integrated into autonomous system applications. The material in this document is provided as guidance only. No responsibility for loss occasioned to any person acting or refraining from action as a result of this material or any comments made can be accepted by the authors or The University of York.
An Abstraction-based Method to Verify Multi-Agent Deep Reinforcement-Learning Behaviours
Mqirmi, Pierre El, Belardinelli, Francesco, León, Borja G.
Multi-agent reinforcement learning (RL) often struggles to ensure the safe behaviours of the learning agents, and therefore it is generally not adapted to safety-critical applications. To address this issue, we present a methodology that combines formal verification with (deep) RL algorithms to guarantee the satisfaction of formally-specified safety constraints both in training and testing. The approach we propose expresses the constraints to verify in Probabilistic Computation Tree Logic (PCTL) and builds an abstract representation of the system to reduce the complexity of the verification step. This abstract model allows for model checking techniques to identify a set of abstract policies that meet the safety constraints expressed in PCTL. Then, the agents' behaviours are restricted according to these safe abstract policies. We provide formal guarantees that by using this method, the actions of the agents always meet the safety constraints, and provide a procedure to generate an abstract model automatically. We empirically evaluate and show the effectiveness of our method in a multi-agent environment.
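For readers unfamiliar with PCTL, a safety constraint of the kind described above might look like the following. This is an illustrative formula only; the atomic proposition `collision` and the bound 0.99 are assumptions, not taken from the paper.

```latex
% "With probability at least 0.99, no collision ever occurs."
\mathrm{P}_{\geq 0.99}\bigl[\,\mathbf{G}\;\neg\,\mathit{collision}\,\bigr]
% Equivalent form using the "eventually" operator:
% \mathrm{P}_{\leq 0.01}\bigl[\,\mathbf{F}\;\mathit{collision}\,\bigr]
```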
Subdimensional Expansion for Multi-objective Multi-agent Path Finding
Ren, Zhongqiang, Rathinam, Sivakumar, Choset, Howie
Conventional multi-agent path planners typically determine a path that optimizes a single objective, such as path length. Many applications, however, may require multiple objectives, say time-to-completion and fuel use, to be simultaneously optimized in the planning process. Often, these criteria may not be readily compared and sometimes lie in competition with each other. Simply applying standard multi-objective search algorithms to multi-agent path finding may prove to be inefficient because the size of the space of possible solutions, i.e., the Pareto-optimal set, can grow exponentially with the number of agents (the dimension of the search space). This paper presents an approach that bypasses this so-called curse of dimensionality by leveraging our prior multi-agent work with a framework called subdimensional expansion. One instance of subdimensional expansion, applied to A*, is called M*; however, M* was limited to a single objective function. We combine principles of dominance and subdimensional expansion to create a new algorithm named multi-objective M* (MOM*), which dynamically couples agents for planning only when those agents have to "interact" with each other. MOM* computes the complete Pareto-optimal set for multiple agents efficiently and naturally trades off sub-optimal approximations of the Pareto-optimal set and computational efficiency. Our approach is able to find the complete Pareto-optimal set for problem instances with hundreds of solutions which the standard multi-objective A* algorithms could not find within a bounded time.
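The dominance principle referred to above can be sketched as follows. This is a generic Pareto filter over cost vectors where lower is better in every objective; it is not MOM* itself, and the names `dominates` and `pareto_front` are hypothetical.

```python
from typing import List, Tuple

Cost = Tuple[float, ...]  # one entry per objective, e.g. (time, fuel)


def dominates(a: Cost, b: Cost) -> bool:
    """True if a is no worse than b in every objective and better in at least one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(costs: List[Cost]) -> List[Cost]:
    """Return the non-dominated cost vectors, preserving first-seen order."""
    unique = list(dict.fromkeys(costs))  # drop exact duplicates
    return [c for c in unique if not any(dominates(o, c) for o in unique)]


if __name__ == "__main__":
    candidates = [(3, 5), (4, 4), (5, 3), (5, 5), (4, 4)]
    print(pareto_front(candidates))
```

In this example, `(5, 5)` is dropped because `(4, 4)` is better in both objectives, while the remaining three vectors trade one objective against the other and so all stay in the front.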
Risk Aware and Multi-Objective Decision Making with Distributional Monte Carlo Tree Search
Hayes, Conor F., Reymond, Mathieu, Roijers, Diederik M., Howley, Enda, Mannion, Patrick
In many risk-aware and multi-objective reinforcement learning settings, the utility of the user is derived from the single execution of a policy. In these settings, making decisions based on the average future returns is not suitable. For example, in a medical setting a patient may only have one opportunity to treat their illness. When making a decision, just the expected return -- known in reinforcement learning as the value -- cannot account for the potential range of adverse or positive outcomes a decision may have. Our key insight is that we should use the distribution over expected future returns differently to represent the critical information that the agent requires at decision time. In this paper, we propose Distributional Monte Carlo Tree Search, an algorithm that learns a posterior distribution over the utility of the different possible returns attainable from individual policy executions, resulting in good policies for both risk-aware and multi-objective settings. Moreover, our algorithm outperforms the state-of-the-art in multi-objective reinforcement learning for the expected utility of the returns.
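The point that the expected return alone can hide the range of outcomes can be shown with a small example. This is a sketch of the motivation only, not the Distributional Monte Carlo Tree Search algorithm; the two return distributions and the square-root utility are assumptions chosen for illustration.

```python
import math
from typing import Callable, List, Tuple

# A return distribution as (return, probability) pairs; values are illustrative.
Distribution = List[Tuple[float, float]]


def expected_return(dist: Distribution) -> float:
    """Mean return, i.e. the 'value' in standard reinforcement learning."""
    return sum(prob * ret for ret, prob in dist)


def expected_utility(dist: Distribution, utility: Callable[[float], float]) -> float:
    """Mean of the utility applied to each possible single-execution return."""
    return sum(prob * utility(ret) for ret, prob in dist)


safe_policy: Distribution = [(5.0, 1.0)]
risky_policy: Distribution = [(0.0, 0.5), (10.0, 0.5)]

if __name__ == "__main__":
    for name, dist in [("safe", safe_policy), ("risky", risky_policy)]:
        print(name, expected_return(dist), expected_utility(dist, math.sqrt))
```

Both policies have the same expected return, but a risk-averse (concave) utility ranks the safe policy higher, which is why a decision based only on the mean cannot distinguish them.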
A General Framework for the Logical Representation of Combinatorial Exchange Protocols
Mittelmann, Munyque, Bouveret, Sylvain, Perrussel, Laurent
The goal of this paper is to propose a framework for representing and reasoning about the rules governing a combinatorial exchange. Such a framework is of primary interest for building digital marketplaces based on auctions, a widely used mechanism for automated transactions. Combinatorial exchange is the most general case of auctions, mixing the double and combinatorial variants: agents bid to trade bundles of goods. Hence the framework should fulfill two requirements: (i) it should enable bidders to express their bids on combinations of goods and (ii) it should allow describing the rules governing a market, namely the legal bids and the allocation and payment rules. To do so, we define a logical language in the spirit of the Game Description Language: the Combinatorial Exchange Description Language is the first language for describing combinatorial exchange in a logical framework. The contribution is two-fold: first, we illustrate the general dimension by representing different kinds of protocols, and second, we show how to reason about auction properties in this machine-processable language.
The 4th International Workshop on Smart Simulation and Modelling for Complex Systems
Su, Xing, Kong, Yan, Li, Weihua
Computer-based modelling and simulation have become useful tools for helping humans understand systems in different domains, such as physics, astrophysics, chemistry, biology, economics, engineering and social science. A complex system features a large number of interacting components (agents, processes, etc.), whose aggregate activities are nonlinear and self-organized. Complex systems are hard to simulate or model using traditional computational approaches due to complex relationships among system components, distributed features of resources, and dynamics of environments. Meanwhile, smart systems such as multi-agent systems have demonstrated advantages and great potential in modelling and simulating complex systems.
Hybrid Information-driven Multi-agent Reinforcement Learning
Dawson, William A., Glatt, Ruben, Rusu, Edward, Soper, Braden C., Goldhahn, Ryan A.
Information theoretic sensor management approaches are an ideal solution to state estimation problems when considering the optimal control of multi-agent systems; however, they are too computationally intensive for large state spaces, especially when considering the limited computational resources typical of large-scale distributed multi-agent systems. Reinforcement learning (RL) is a promising alternative which can find approximate solutions to distributed optimal control problems that take into account the resource constraints inherent in many systems of distributed agents. However, RL training can be prohibitively inefficient, especially in low-information environments where agents receive little to no feedback in large portions of the state space. We propose a hybrid information-driven multi-agent reinforcement learning (MARL) approach that utilizes information theoretic models as heuristics to help the agents navigate large sparse state spaces, coupled with information based rewards in an RL framework to learn higher-level policies. This paper presents our ongoing work towards this objective. Our preliminary findings show that such an approach can result in a system of agents that are approximately three orders of magnitude more efficient at exploring a sparse state space than naive baseline metrics. While the work is still in its early stages, it provides a promising direction for future research.
Hybrid Beamforming for mmWave MU-MISO Systems Exploiting Multi-agent Deep Reinforcement Learning
Wang, Qisheng, Li, Xiao, Jin, Shi, Chen, Yijiain
In this letter, we investigate hybrid beamforming based on deep reinforcement learning (DRL) for millimeter-wave (mmWave) multi-user (MU) multiple-input-single-output (MISO) systems. A multi-agent DRL method is proposed to solve the exploration efficiency problem in DRL. In the proposed method, a prioritized replay buffer and a more informative reward are applied to accelerate convergence. Simulation results show that the proposed architecture achieves higher spectral efficiency and lower time consumption than the benchmarks, and is thus more suitable for practical applications. To obtain the hybrid precoding matrices, several iterative methods, such as [1]-[4], have been proposed for single-user and multi-user (MU) systems.