 Agents


Tradeoff-Focused Contrastive Explanation for MDP Planning

arXiv.org Artificial Intelligence

End-users' trust in automated agents is important as automated decision-making and planning is increasingly used in many aspects of people's lives. In real-world applications of planning, multiple optimization objectives are often involved. Thus, planning agents' decisions can involve complex tradeoffs among competing objectives. It can be difficult for the end-users to understand why an agent decides on a particular planning solution on the basis of its objective values. As a result, the users may not know whether the agent is making the right decisions, and may lack trust in it. In this work, we contribute an approach, based on contrastive explanation, that enables a multi-objective MDP planning agent to explain its decisions in a way that communicates its tradeoff rationale in terms of the domain-level concepts. We conduct a human subjects experiment to evaluate the effectiveness of our explanation approach in a mobile robot navigation domain. The results show that our approach significantly improves the users' understanding, and confidence in their understanding, of the tradeoff rationale of the planning agent.


Learning Agile Locomotion via Adversarial Training

arXiv.org Artificial Intelligence

Developing controllers for agile locomotion is a long-standing challenge for legged robots. Reinforcement learning (RL) and Evolution Strategy (ES) hold the promise of automating the design process of such controllers. However, dedicated and careful human effort is required to design training environments to promote agility. In this paper, we present a multi-agent learning system, in which a quadruped robot (protagonist) learns to chase another robot (adversary) while the latter learns to escape. We find that this adversarial training process not only encourages agile behaviors but also effectively alleviates the laborious environment design effort. In contrast to prior works that used only one adversary, we find that training an ensemble of adversaries, each of which specializes in a different escaping strategy, is essential for the protagonist to master agility. Through extensive experiments, we show that the locomotion controller learned with adversarial training significantly outperforms carefully designed baselines.


The Effects of Experience on Deception in Human-Agent Negotiation

Journal of Artificial Intelligence Research

Negotiation is the complex social process by which multiple parties come to mutual agreement over a series of issues. As such, it has proven to be a key challenge problem for designing adequately social AIs that can effectively navigate this space. AI agents that are capable of negotiating must be capable of realizing policies and strategies that govern offer acceptances, offer generation, preference elicitation, and more. But the next generation of agents must also adapt to reflect their users’ experiences.

The best human negotiators tend to have honed their craft through hours of practice and experience. But not all negotiators agree on which strategic tactics to use, and endorsement of deceptive tactics in particular is a controversial topic for many negotiators. We examine the ways in which deceptive tactics are used and endorsed in non-repeated human negotiation and show that prior experience plays a key role in governing what tactics are seen as acceptable or useful in negotiation. Previous work has indicated that people who negotiate through artificial agent representatives may be more inclined to fairness than those who negotiate directly. We present a series of three user studies that challenge this initial assumption and expand on this picture by examining the role of past experience.

This work constructs a new scale for measuring endorsement of manipulative negotiation tactics and introduces its use to artificial intelligence research. It continues by presenting the results of a series of three studies that examine how negotiating experience can change what negotiation tactics and strategies humans endorse. Study #1 looks at human endorsement of deceptive techniques based on prior negotiating experience as well as representative effects. Study #2 further characterizes the negativity of prior experience in relation to endorsement of deceptive techniques. Finally, in Study #3, we show that the lessons learned from the empirical observations in Studies #1 and #2 can in fact be induced: by designing agents that provide a specific type of negative experience, human endorsement of deception can be predictably manipulated.


Dynamic Discrete Choice Estimation with Partially Observable States and Hidden Dynamics

arXiv.org Machine Learning

Dynamic discrete choice models are used to estimate the intertemporal preferences of an agent as described by a reward function based upon observable histories of states and implemented actions. However, in many applications, such as reliability and healthcare, the system state is partially observable or hidden (e.g., the level of deterioration of an engine, the condition of a disease), and the decision maker only has access to information imperfectly correlated with the true value of the hidden state. In this paper, we consider the estimation of a dynamic discrete choice model with state variables and system dynamics that are hidden (or partially observed) to both the agent and the modeler, thus generalizing Rust's model to partially observable cases. We analyze the structural properties of the model and prove that this model is still identifiable if the cardinality of the state space, the discount factor, the distribution of random shocks, and the rewards for a given (reference) action are given. We analyze both theoretically and numerically the potential mis-specification errors that may be incurred when Rust's model is improperly used in partially observable settings. We further apply the developed model to a subset of Rust's dataset for bus engine mileage and replacement decisions. The results show that our model can improve model fit as measured by the $\log$-likelihood function by $17.7\%$ and the $\log$-likelihood ratio test shows that our model statistically outperforms Rust's model. Interestingly, our hidden state model also reveals an economically meaningful route assignment behavior in the dataset which was hitherto ignored, i.e. routes with lower mileage are assigned to buses believed to be in worse condition.
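
For reference, the standard likelihood-ratio statistic for comparing a restricted model against a more general model that nests it can be written as below. Treating Rust's model as the restricted model and the hidden-state model as the general one is an assumption made here for illustration; the abstract does not state the exact form of the test, and the symbols $\ell_0$, $\ell_1$, and $k$ are notation introduced here.

```latex
\Lambda = 2\,(\ell_1 - \ell_0),
\qquad
\Lambda \xrightarrow{\;d\;} \chi^2_k \quad \text{under } H_0
```

Here $\ell_0$ and $\ell_1$ are the maximized $\log$-likelihoods of the restricted and general models, and $k$ is the difference in their numbers of free parameters. The chi-square limit holds under the usual regularity conditions; a large $\Lambda$ is evidence against the restricted model.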


Agent-Based Modeling and Simulation with Swarm - Programmer Books

#artificialintelligence

Swarm-based multi-agent simulation leads to better modeling of tasks in biology, engineering, economics, art, and many other areas. It also facilitates an understanding of complicated phenomena that cannot be solved analytically. Agent-Based Modeling and Simulation with Swarm provides the methodology for a multi-agent-based modeling approach that integrates computational techniques such as artificial life, cellular automata, and bio-inspired optimization. Each chapter gives an overview of the problem, explores state-of-the-art technology in the field, and discusses multi-agent frameworks. The author describes step by step how to assemble algorithms for generating a simulation model, program, method for visualization, and further research tasks.


From Prediction to Prescription: Evolutionary Optimization of Non-Pharmaceutical Interventions in the COVID-19 Pandemic

arXiv.org Artificial Intelligence

Several models have been developed to predict how the COVID-19 pandemic spreads, and how it could be contained with non-pharmaceutical interventions (NPIs) such as social distancing restrictions and school and business closures. This paper demonstrates how evolutionary AI could be used to facilitate the next step, i.e. determining the most effective intervention strategies automatically. Through evolutionary surrogate-assisted prescription (ESP), it is possible to generate a large number of candidate strategies and evaluate them with predictive models. In principle, strategies can be customized for different countries and locales, and can balance the need to contain the pandemic against the need to minimize the economic impact of the interventions. While still limited by available data, early experiments suggest that workplace and school restrictions are the most important and need to be designed carefully. The paper also demonstrates that the results of lifting restrictions can be unreliable, and suggests creative ways in which restrictions can be implemented softly, e.g. by alternating them over time. As more data becomes available, the approach can be increasingly useful in dealing with COVID-19 as well as possible future pandemics.


Value-Decomposition Multi-Agent Actor-Critics

arXiv.org Artificial Intelligence

The exploitation of extra state information has been an active research area in multi-agent reinforcement learning (MARL). QMIX represents the joint action-value using a non-negative function approximator and achieves the best performance, by far, on a multi-agent benchmark, the StarCraft II micromanagement tasks. However, our experiments show that, in some cases, QMIX is incompatible with A2C, a training paradigm that promotes algorithm training efficiency. To obtain a reasonable trade-off between training efficiency and algorithm performance, we extend value-decomposition to actor-critics that are compatible with A2C and propose a novel actor-critic framework, value-decomposition actor-critics (VDACs). We evaluate VDACs on the testbed of StarCraft II micromanagement tasks and demonstrate that the proposed framework improves median performance over other actor-critic methods. Furthermore, we use a set of ablation experiments to identify the key factors that contribute to the performance of VDACs.
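
The idea of a non-negative mixing of per-agent values can be illustrated with a minimal sketch. This is a simplified linear stand-in, not the QMIX or VDAC architecture: QMIX produces its mixing weights with state-conditioned hypernetworks, whereas here the weights are fixed numbers. The function name `monotonic_mix` and its parameters are hypothetical and chosen only for this example.

```python
from typing import Sequence


def monotonic_mix(agent_values: Sequence[float],
                  weights: Sequence[float],
                  bias: float = 0.0) -> float:
    """Combine per-agent values into one joint value.

    Taking abs() of every weight keeps the mixing weights non-negative,
    so raising any single agent's value can never lower the joint value.
    (Hypothetical helper for illustration; fixed weights are an assumption.)
    """
    if len(agent_values) != len(weights):
        raise ValueError("need exactly one weight per agent")
    return sum(abs(w) * q for w, q in zip(weights, agent_values)) + bias


if __name__ == "__main__":
    before = monotonic_mix([1.0, 2.0], [-0.5, 2.0])
    after = monotonic_mix([1.5, 2.0], [-0.5, 2.0])  # agent 0 improves
    print(before, after)
```

Because the joint value is monotone in each agent's value, every agent can pick its action greedily from its own value and the resulting joint action still maximizes the mixed value. That is the property a non-negative approximator is meant to preserve.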


Towards Efficient Connected and Automated Driving System via Multi-agent Graph Reinforcement Learning

arXiv.org Machine Learning

Connected and automated vehicles (CAVs) have attracted increasing attention recently. Their fast actuation time gives them the potential to improve the efficiency and safety of the whole transportation system. Due to technical challenges, only a proportion of vehicles will be equipped with automation, while the rest will remain without it. Instead of learning a reliable behavior for the ego automated vehicle, we focus on how to improve the outcomes of the total transportation system by allowing each automated vehicle to learn to cooperate with the others and regulate human-driven traffic flow. One state-of-the-art approach uses reinforcement learning to learn an intelligent decision-making policy. However, a direct reinforcement learning framework cannot improve the performance of the whole system. In this article, we demonstrate that considering the problem in a multi-agent setting with a shared policy can help achieve better system performance than a non-shared policy in a single-agent setting. Furthermore, we find that applying an attention mechanism to interaction features can capture the interplay between agents and thereby boost cooperation. To the best of our knowledge, while previous automated driving studies mainly focus on enhancing an individual vehicle's driving performance, this work serves as a starting point for research on system-level multi-agent cooperation performance using graph information sharing. We conduct extensive experiments in car-following and unsignalized intersection settings. The results demonstrate that CAVs controlled by our method achieve the best performance against several state-of-the-art baselines.


Near-Optimal Reactive Synthesis Incorporating Runtime Information

arXiv.org Artificial Intelligence

We consider the problem of optimal reactive synthesis: computing a strategy that satisfies a mission specification in a dynamic environment and optimizes a performance metric. We incorporate task-critical information that is only available at runtime into the strategy synthesis in order to improve performance. Existing approaches to utilising such time-varying information require online re-synthesis, which is not computationally feasible in real-time applications. In this paper, we pre-synthesize a set of strategies corresponding to candidate instantiations (pre-specified representative information scenarios). We then propose a novel switching mechanism to dynamically switch between the strategies at runtime while guaranteeing all safety and liveness goals are met. We also characterize bounds on the performance suboptimality. We demonstrate our approach on two examples: robotic motion planning where the likelihood of the position of the robot's goal is updated in real-time, and an air traffic management problem for urban air mobility.


Predictability and Fairness in Social Sensing

arXiv.org Artificial Intelligence

In many applications, one may benefit from the collaborative collection of data for sensing a physical phenomenon, which is known as social sensing. We show how to make social sensing (1) predictable, in the sense of guaranteeing that the number of queries per participant will be independent of the initial state, in expectation, even when the population of participants varies over time, and (2) fair, in the sense of guaranteeing that the number of queries per participant will be equalised among the participants, in expectation, even when the population of participants varies over time. In a use case, we consider a large, high-density network of participating parked vehicles. When awoken by an administrative centre, this network proceeds to search for moving, missing entities of interest using RFID-based techniques. We regulate the number and geographical distribution of the parked vehicles that are "Switched On" and thus actively searching for the moving entity of interest. In doing so, we seek to conserve vehicular energy consumption while, at the same time, maintaining good geographical coverage of the city such that the moving entity of interest is likely to be located within an acceptable time frame. Which vehicle participants are "Switched On" at any point in time is determined periodically through the use of stochastic techniques. This is illustrated on the example of a missing Alzheimer's patient in Melbourne, Australia.
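
The fairness notion above, equal expected queries per participant, can be illustrated with a minimal sketch. It assumes the simplest possible stochastic rule: in each period every vehicle is switched on independently with the same probability. This is not the regulation scheme of the paper, which the abstract does not spell out; the names `sample_active` and `simulate`, the vehicle labels, and all parameters are hypothetical.

```python
import random
from collections import Counter
from typing import List, Sequence


def sample_active(participants: Sequence[str],
                  target_active: float,
                  rng: random.Random) -> List[str]:
    """Switch each participant on independently with equal probability.

    The probability is target_active / population size (capped at 1), so
    the expected number of active participants matches the target when
    the target does not exceed the population size.
    """
    n = len(participants)
    if n == 0:
        return []
    p = min(1.0, target_active / n)
    return [v for v in participants if rng.random() < p]


def simulate(participants: Sequence[str], rounds: int,
             target_active: float, seed: int = 0) -> Counter:
    """Count how often each participant is switched on over many periods."""
    rng = random.Random(seed)
    counts: Counter = Counter()
    for _ in range(rounds):
        counts.update(sample_active(participants, target_active, rng))
    return counts


if __name__ == "__main__":
    # Each of the four vehicles should be active in roughly half the periods.
    print(simulate(["v1", "v2", "v3", "v4"], rounds=2000, target_active=2))
```

Under this rule every vehicle has the same activation probability in every period, so the long-run counts are equal in expectation. Handling a time-varying population with guarantees, as the abstract describes, requires more than this sketch shows.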