Agents


Trustworthy Preference Completion in Social Choice

arXiv.org Artificial Intelligence

Because it is often impractical to ask agents to provide linear orders over all alternatives, preference completion is needed for the resulting partial rankings. Specifically, the personalized preference of each agent over all the alternatives can be estimated from the partial rankings that neighboring agents provide over subsets of alternatives. However, since agents' rankings are nondeterministic (agents may provide noisy rankings), it is important to conduct trustworthy preference completion. Hence, in this paper, a trust-based anchor-kNN algorithm is first proposed to find the $k$-nearest trustworthy neighbors of an agent using trust-oriented Kendall-Tau distances, which handles cases in which an agent exhibits irrational behavior or provides only noisy rankings. Then, for each pair of alternatives, a bijection is built from the ranking space to the preference space, and its certainty and conflict are evaluated with a well-established statistical measure, the Probability-Certainty Density Function. Finally, a common voting rule over the first $k$ trustworthy neighboring agents, based on certainty and conflict, is applied to conduct trustworthy preference completion. The properties of the proposed certainty and conflict measures have been studied empirically, and the proposed approach has been validated experimentally against state-of-the-art approaches on several data sets.
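
The pipeline in this abstract (neighbor search by Kendall-Tau distance, then a vote on each missing pair whose reliability is scored by certainty) can be sketched as below. This is a minimal illustration under our own assumptions, not the paper's algorithm: the "trust-oriented" weighting and the anchor mechanism are not reproduced (plain kNN on normalized Kendall-Tau distance over shared alternatives is used instead), the conflict measure is omitted, and certainty is assumed to be the usual Probability-Certainty Density Function formulation, i.e. half the L1 distance between the Beta(r + 1, s + 1) density for r supporting and s opposing votes and the uniform density. All function names and the toy data are hypothetical.

```python
import math
from itertools import combinations


def kendall_tau_distance(r1, r2):
    """Normalized Kendall-Tau distance restricted to the alternatives both rankings share.

    Rankings are lists ordered from most to least preferred. Returns math.inf when the
    rankings share fewer than two alternatives, so such agents are never picked as neighbors.
    """
    pos1 = {alt: i for i, alt in enumerate(r1)}
    pos2 = {alt: i for i, alt in enumerate(r2)}
    common = [alt for alt in r1 if alt in pos2]
    pairs = list(combinations(common, 2))
    if not pairs:
        return math.inf
    # A pair is discordant when the two rankings order it in opposite directions.
    discordant = sum(1 for a, b in pairs if (pos1[a] - pos1[b]) * (pos2[a] - pos2[b]) < 0)
    return discordant / len(pairs)


def k_nearest(target, rankings, k):
    """Names of the k agents closest to `target` (plain kNN, no trust weighting)."""
    scored = sorted((kendall_tau_distance(target, r), name) for name, r in rankings.items())
    return [name for dist, name in scored if dist != math.inf][:k]


def certainty(r, s, steps=20000):
    """Certainty of r positive and s negative observations: half the L1 distance between
    the Beta(r + 1, s + 1) density and the uniform density on [0, 1] (midpoint rule)."""
    log_norm = math.lgamma(r + s + 2) - math.lgamma(r + 1) - math.lgamma(s + 1)
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) / steps
        density = math.exp(log_norm + r * math.log(x) + s * math.log(1.0 - x))
        total += abs(density - 1.0)
    return 0.5 * total / steps


def complete_pair(a, b, neighbors, rankings):
    """Majority vote over neighbors that rank both a and b; returns (winner, certainty)."""
    r = s = 0
    for name in neighbors:
        ranking = rankings[name]
        if a in ranking and b in ranking:
            if ranking.index(a) < ranking.index(b):
                r += 1
            else:
                s += 1
    winner = a if r > s else (b if s > r else None)
    return winner, certainty(r, s)


# Hypothetical toy data: partial rankings from four agents.
rankings = {
    "u1": ["a", "b", "c", "d"],
    "u2": ["a", "c", "b"],
    "u3": ["d", "c", "b", "a"],
    "u4": ["b", "a", "d"],
}
target = ["a", "b", "c"]  # the target agent never ranked "d"
neighbors = k_nearest(target, rankings, k=2)
print(neighbors)  # ['u1', 'u2']
print(complete_pair("a", "d", neighbors, rankings))  # ('a', ~0.25): a single supporting vote
```

Only one neighbor ranks both "a" and "d", so the completed preference "a over d" comes with low certainty (about 0.25); more agreeing votes push certainty toward 1, which is the kind of signal a certainty-aware voting rule can exploit.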


Efficient Querying for Cooperative Probabilistic Commitments

arXiv.org Artificial Intelligence

Multiagent systems can use commitments as the core of a general coordination infrastructure, supporting both cooperative and non-cooperative interactions. Agents whose objectives are aligned, and where one agent can help another achieve greater reward by sacrificing some of its own reward, should choose a cooperative commitment to maximize their joint reward. We present a solution to the problem of how cooperative agents can efficiently find an (approximately) optimal commitment by querying about carefully selected commitment choices. We prove structural properties of the agents' values as functions of the parameters of the commitment specification, and develop a greedy method for composing a query with provable approximation bounds, which we show empirically can find nearly optimal commitments in a fraction of the time required by methods that lack these insights.
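
Greedy query composition can be sketched as follows, under assumptions that are ours rather than the paper's: a commitment is a single probability p, the responding agent has a private type drawn from a known prior, and a query is scored by its expected utility of selection (EUS), the expected joint value when the responder picks the queried commitment that is best for the joint reward given its type. With nonnegative values, an expected maximum of this form is monotone and submodular in the query set, which is why greedy construction carries the classic (1 - 1/e) guarantee; the abstract does not say which bound the paper proves. The value function, types, and names are hypothetical.

```python
def expected_utility_of_selection(query, types, joint_value):
    """Expected joint value when the responder, whose private type is drawn from
    `types` (a list of (probability, type) pairs), picks the best query option for its type."""
    return sum(prob * max(joint_value(c, t) for c in query) for prob, t in types)


def greedy_query(candidates, types, joint_value, size):
    """Grow the query one commitment at a time, each time adding the candidate that
    most increases the expected utility of selection."""
    query = []
    for _ in range(size):
        remaining = [c for c in candidates if c not in query]
        if not remaining:
            break
        best = max(
            remaining,
            key=lambda c: expected_utility_of_selection(query + [c], types, joint_value),
        )
        query.append(best)
    return query


# Hypothetical toy model: a commitment is the probability p with which the provider
# promises to bring about a condition; a type is (benefit to recipient, cost to provider).
def joint_value(p, t):
    benefit, cost = t
    return benefit * p - cost * p ** 2


candidates = [i / 10 for i in range(11)]
types = [(0.5, (1.0, 1.0)), (0.5, (1.0, 0.25))]
query = greedy_query(candidates, types, joint_value, size=2)
print(query)  # [0.8, 1.0]
print(expected_utility_of_selection(query, types, joint_value))  # about 0.455
print(expected_utility_of_selection([0.5, 1.0], types, joint_value))  # 0.5, the best pair
```

In this toy case greedy first picks the commitment that is best on average (p = 0.8) and ends at about 91% of the optimal two-element query, comfortably above the (1 - 1/e) floor.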


A Unified Model for the Two-stage Offline-then-Online Resource Allocation

arXiv.org Artificial Intelligence

With the popularity of the Internet, traditional offline resource allocation has evolved into a new form, called online resource allocation. It features the online arrivals of agents in the system and the real-time decision-making requirement upon the arrival of each online agent. Both offline and online resource allocation have wide applications in various real-world matching markets ranging from ridesharing to crowdsourcing. There are some emerging applications, such as rebalancing in bike sharing and trip-vehicle dispatching in ridesharing, which involve a two-stage resource allocation process.

Furthermore, upon the arrival of any online agent, we have to decide quickly and irrevocably which offline agent(s) to match it with. That is mainly due to the low "patience" of the online agents. These features (online arrivals and the real-time decision-making requirement) distinguish OMMs from traditional matching markets, where the information of all agents is fully disclosed in advance. OMMs have received significant interest in both the computer science and operations research communities. A large body of research has studied matching policy design for profit maximization in ridesharing [Ashlagi et al., 2019; Lowalekar et al., 2018; Bei and Zhang, 2018; Zhao et al., 2019; Dickerson et al., 2018a; Li et al., 2020].
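
The irrevocable, on-arrival decision described here can be illustrated with a greedy online matching baseline. This is a minimal sketch under assumptions of our own: agent names, edge weights, and unit capacities are hypothetical; only the online stage is modeled (an offline first stage such as bike rebalancing is omitted); and greedy is a generic baseline, not the unified model the paper proposes.

```python
def greedy_online_matching(capacity, arrivals):
    """Match online agents on arrival, irrevocably.

    capacity: dict mapping each offline agent to its remaining units.
    arrivals: iterable of (online_agent, {offline_agent: weight}) in arrival order.
    Each arriving agent is matched to the highest-weight compatible offline agent
    that still has capacity; an agent with no available option leaves unmatched.
    """
    remaining = dict(capacity)
    matches = []
    total = 0.0
    for online, edges in arrivals:
        options = [(w, off) for off, w in edges.items() if remaining.get(off, 0) > 0]
        if not options:
            continue  # low patience: the online agent does not wait for later capacity
        w, off = max(options)
        remaining[off] -= 1  # the decision is never revisited
        matches.append((online, off))
        total += w
    return matches, total


capacity = {"d1": 1, "d2": 1}
arrivals = [
    ("r1", {"d1": 3, "d2": 2}),
    ("r2", {"d1": 5}),
    ("r3", {"d2": 1}),
]
print(greedy_online_matching(capacity, arrivals))  # ([('r1', 'd1'), ('r3', 'd2')], 4.0)
```

With full advance information the best assignment is r1 to d2 and r2 to d1 (total 7), so the gap between 4 and 7 shows what irrevocable decisions without knowledge of future arrivals can cost.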


Learning Multi-Arm Manipulation Through Collaborative Teleoperation

arXiv.org Artificial Intelligence

Imitation Learning (IL) is a powerful paradigm to teach robots to perform manipulation tasks by allowing them to learn from human demonstrations collected via teleoperation, but it has mostly been limited to single-arm manipulation. However, many real-world tasks require multiple arms, such as lifting a heavy object or assembling a desk. Unfortunately, applying IL to multi-arm manipulation tasks has been challenging: asking a human to control more than one robotic arm can impose a significant cognitive burden and is often feasible for at most two robot arms. To address these challenges, we present Multi-Arm RoboTurk (MART), a multi-user data collection platform that allows multiple remote users to simultaneously teleoperate a set of robotic arms and collect demonstrations for multi-arm tasks. Using MART, we collected demonstrations for five novel two- and three-arm tasks from several geographically separated users. From our data we arrived at a critical insight: most multi-arm tasks do not require global coordination throughout their full duration, but only during specific moments. We show that learning from such data consequently presents challenges for centralized agents that directly attempt to model all robot actions simultaneously, and we perform a comprehensive study of different policy architectures with varying levels of centralization on our tasks. Finally, we propose and evaluate a base-residual policy framework that allows trained policies to better adapt to the mixed coordination setting common in multi-arm manipulation, and show that a centralized policy augmented with a decentralized residual model outperforms all other models on our set of benchmark tasks. Additional results and videos are available at https://roboturk.stanford.edu/multiarm.
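
At inference time, a centralized base policy augmented with decentralized residuals can be sketched as below. This is a minimal illustration under our own assumptions: the base proposes one action per arm from the global observation, and each arm's residual sees only its local observation plus its base action and returns an additive correction. The toy policies stand in for trained networks, and every name here is hypothetical; training is not shown.

```python
from typing import Callable, List, Sequence

Action = List[float]
BasePolicy = Callable[[Sequence[float]], List[Action]]
ResidualPolicy = Callable[[Sequence[float], Action], Action]


def base_residual_actions(
    global_obs: Sequence[float],
    local_obs: Sequence[Sequence[float]],
    base_policy: BasePolicy,
    residual_policies: Sequence[ResidualPolicy],
) -> List[Action]:
    """Centralized base proposes a joint action; each arm's residual adds a local correction."""
    joint = base_policy(global_obs)
    if len(joint) != len(residual_policies) or len(local_obs) != len(residual_policies):
        raise ValueError("need one base action, local observation, and residual per arm")
    actions = []
    for arm_obs, base_action, residual in zip(local_obs, joint, residual_policies):
        correction = residual(arm_obs, base_action)
        actions.append([b + c for b, c in zip(base_action, correction)])
    return actions


# Hypothetical stand-ins for trained networks.
def toy_base(global_obs: Sequence[float]) -> List[Action]:
    mean = sum(global_obs) / len(global_obs)  # a shared, coordinated target for both arms
    return [[mean, 0.0], [mean, 0.0]]


def make_toy_residual(gain: float) -> ResidualPolicy:
    def residual(arm_obs: Sequence[float], base_action: Action) -> Action:
        # Nudge the first action dimension toward this arm's own local observation.
        return [gain * (arm_obs[0] - base_action[0]), 0.0]
    return residual


actions = base_residual_actions(
    global_obs=[1.0, 3.0],
    local_obs=[[1.0], [3.0]],
    base_policy=toy_base,
    residual_policies=[make_toy_residual(0.5), make_toy_residual(0.5)],
)
print(actions)  # [[1.5, 0.0], [2.5, 0.0]]
```

The split mirrors the mixed-coordination insight: the base carries the moments that need global coordination, while each residual only has to handle arm-specific adjustments from local information.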


Human-in-the-Loop Imitation Learning using Remote Teleoperation

arXiv.org Artificial Intelligence

Imitation Learning is a promising paradigm for learning complex robot manipulation skills by reproducing behavior from human demonstrations. However, manipulation tasks often contain bottleneck regions that require a sequence of precise actions to make meaningful progress, such as a robot inserting a pod into a coffee machine to make coffee. Trained policies can fail in these regions because small deviations in actions can lead the policy into states not covered by the demonstrations. Intervention-based policy learning is an alternative that can address this issue: it allows human operators to monitor trained policies and take over control when the policies encounter failures. In this paper, we build a data collection system tailored to 6-DoF manipulation settings that enables remote human operators to monitor and intervene on trained policies. We develop a simple and effective algorithm that iteratively trains the policy on new data collected by the system and encourages the policy to learn how to traverse bottlenecks through the interventions. We demonstrate that agents trained on data collected by our intervention-based system and algorithm outperform agents trained on an equivalent number of samples collected by non-interventional demonstrators, and further show that our method outperforms multiple state-of-the-art baselines for learning from human interventions on a challenging robot threading task and a coffee making task. Additional results and videos are available at https://sites.google.com/stanford.edu/iwr.
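
One way to make intervention data "encourage the policy to learn how to traverse bottlenecks" is to reweight samples before each behavior-cloning round. The abstract does not specify the algorithm, so the scheme below is an assumption for illustration: robot-controlled steps just before a takeover are dropped, and the remaining human and robot samples are rescaled so each group carries half of the loss weight. The function name and the `drop_window` parameter are hypothetical.

```python
from typing import List


def intervention_weights(is_human: List[bool], drop_window: int = 2) -> List[float]:
    """Per-sample loss weights for one trajectory (hypothetical reweighting scheme).

    is_human[t] is True when the human operator was in control at step t.
    Robot-controlled samples in the `drop_window` steps before each takeover get weight 0;
    the remaining human and robot samples are rescaled so that each group carries half
    of the total weight (all of it if the other group is empty).
    """
    keep = [True] * len(is_human)
    for t in range(1, len(is_human)):
        if is_human[t] and not is_human[t - 1]:  # the human takes over at step t
            for j in range(max(0, t - drop_window), t):
                if not is_human[j]:
                    keep[j] = False  # steps that likely led the policy into trouble
    n_human = sum(1 for h, k in zip(is_human, keep) if k and h)
    n_robot = sum(1 for h, k in zip(is_human, keep) if k and not h)
    share = 0.5 if n_human and n_robot else 1.0
    weights = []
    for h, k in zip(is_human, keep):
        if not k:
            weights.append(0.0)
        else:
            weights.append(share / (n_human if h else n_robot))
    return weights


trajectory = [False] * 4 + [True] * 2 + [False] * 2  # robot, human takeover, robot again
print(intervention_weights(trajectory))  # [0.125, 0.125, 0.0, 0.0, 0.25, 0.25, 0.125, 0.125]
```

These weights would multiply a per-sample behavior-cloning loss. Because interventions are usually short, equalizing the two groups lets the few corrective samples at bottlenecks count as much as the much longer stretches of routine robot behavior.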


Infinite use of finite means: Zero-Shot Generalization using Compositional Emergent Protocols

arXiv.org Artificial Intelligence

Human language has been described as a system that makes use of finite means to express an unlimited array of thoughts. Of particular interest is compositionality, whereby the meaning of a complex, compound language expression can be deduced from the meanings of its constituent parts. If artificial agents can develop compositional communication protocols akin to human language, they can be made to generalize seamlessly to unseen combinations. The central question, however, is how to induce compositionality in emergent communication. Studies have recognized the role of curiosity in enabling linguistic development in children. It is this same intrinsic urge that drives us to master complex tasks with decreasing amounts of explicit reward. In this paper, we seek to use this intrinsic feedback to induce a systematic and unambiguous protolanguage in artificial agents. We show experimentally how these rewards can be leveraged to train agents to induce compositionality in the absence of any external feedback. Additionally, we introduce Comm-gSCAN, a platform for investigating grounded language acquisition in 2D-grid environments. Using it, we demonstrate how compositionality can enable agents not only to interact with unseen objects, but also to transfer skills from one task to another zero-shot (e.g., can an agent trained to pull and to push twice pull twice?).
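
Why compositionality yields zero-shot generalization can be shown with a toy contrast between two listeners. This is not the paper's method: the protocol below is hand-built (one symbol per attribute value) rather than emerging from curiosity-driven training, and all names are hypothetical. A listener that memorizes whole messages cannot decode the unseen combination "pull twice", while a listener that learns the meaning of each symbol position can.

```python
from typing import Dict, List, Optional, Tuple

Meaning = Tuple[str, str]  # (verb, adverb), e.g. ("pull", "twice")

# Hypothetical hand-built compositional protocol: one symbol per attribute value.
VERB_SYMBOL = {"push": "a", "pull": "b"}
ADVERB_SYMBOL = {"once": "x", "twice": "y"}


def speak(meaning: Meaning) -> str:
    verb, adverb = meaning
    return VERB_SYMBOL[verb] + ADVERB_SYMBOL[adverb]


def fit_holistic(train: List[Meaning]) -> Dict[str, Meaning]:
    """A listener that memorizes whole messages."""
    return {speak(m): m for m in train}


def fit_compositional(train: List[Meaning]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """A listener that learns what each symbol position means on its own."""
    first: Dict[str, str] = {}
    second: Dict[str, str] = {}
    for m in train:
        msg = speak(m)
        first[msg[0]], second[msg[1]] = m
    return first, second


def decode_compositional(model: Tuple[Dict[str, str], Dict[str, str]], msg: str) -> Optional[Meaning]:
    first, second = model
    if msg[0] in first and msg[1] in second:
        return first[msg[0]], second[msg[1]]
    return None


train = [("push", "once"), ("push", "twice"), ("pull", "once")]
unseen = speak(("pull", "twice"))  # the message "by" never occurs during training
print(fit_holistic(train).get(unseen))  # None
print(decode_compositional(fit_compositional(train), unseen))  # ('pull', 'twice')
```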


Comprehension and Knowledge

arXiv.org Artificial Intelligence

The ability of an agent to comprehend a sentence is tightly connected to the agent's prior experiences and background knowledge. The paper suggests interpreting comprehension as a modality and proposes a complete bimodal logical system that describes an interplay between comprehension and knowledge modalities.


Epistemic Logic of Know-Who

arXiv.org Artificial Intelligence

The paper suggests a definition of "know who" as a modality using Grove-Halpern semantics of names. It also introduces a logical system that describes the interplay between modalities "knows who", "knows", and "for all agents". The main technical result is a completeness theorem for the proposed system.


Flatland-RL : Multi-Agent Reinforcement Learning on Trains

arXiv.org Artificial Intelligence

Efficient automated scheduling of trains remains a major challenge for modern railway systems. The underlying vehicle rescheduling problem (VRSP) has been a major focus of Operations Research (OR) for decades. Traditional approaches use complex simulators to study VRSP, where experimenting with a broad range of novel ideas is time-consuming and carries a huge computational overhead. In this paper, we introduce a simplified two-dimensional grid environment called "Flatland" that allows for faster experimentation. Flatland not only reduces the complexity of the full physical simulation, but also provides an easy-to-use interface to test novel approaches for the VRSP, such as Reinforcement Learning (RL) and Imitation Learning (IL). To probe the potential of Machine Learning (ML) research on Flatland, we (1) ran a first series of RL and IL experiments and (2) designed and executed a public Benchmark at NeurIPS 2020 to engage a large community of researchers to work on this problem. On the one hand, our own experimental results demonstrate that ML has potential in solving the VRSP on Flatland. On the other hand, we identify key topics that need further research. Overall, the Flatland environment has proven to be a robust and valuable framework to investigate the VRSP for railway networks. Our experiments provide a good starting point for further research and for the participants of the NeurIPS 2020 Flatland Benchmark. Together, these efforts have the potential to substantially shape the mobility of the future.


Imitating Interactive Intelligence

arXiv.org Artificial Intelligence

A common vision from science fiction is that robots will one day inhabit our physical spaces, sense the world as we do, assist our physical labours, and communicate with us through natural language. Here we study how to design artificial agents that can interact naturally with humans using the simplification of a virtual environment. This setting nevertheless integrates a number of the central challenges of artificial intelligence (AI) research: complex visual perception and goal-directed physical control, grounded language comprehension and production, and multi-agent social interaction. To build agents that can robustly interact with humans, we would ideally train them while they interact with humans. However, this is presently impractical. Therefore, we approximate the role of the human with another learned agent, and use ideas from inverse reinforcement learning to reduce the disparities between human-human and agent-agent interactive behaviour. Rigorously evaluating our agents poses a great challenge, so we develop a variety of behavioural tests, including evaluation by humans who watch videos of agents or interact directly with them. These evaluations convincingly demonstrate that interactive training and auxiliary losses improve agent behaviour beyond what is achieved by supervised learning of actions alone. Further, we demonstrate that agent capabilities generalise beyond literal experiences in the dataset. Finally, we train evaluation models whose ratings of agents agree well with human judgement, thus permitting the evaluation of new agent models without additional effort. Taken together, our results in this virtual environment provide evidence that large-scale human behavioural imitation is a promising tool to create intelligent, interactive agents, and that the challenge of reliably evaluating such agents can be surmounted.