Campbell, Murray
Learning to Teach in Cooperative Multiagent Reinforcement Learning
Omidshafiei, Shayegan, Kim, Dong-Ki, Liu, Miao, Tesauro, Gerald, Riemer, Matthew, Amato, Christopher, Campbell, Murray, How, Jonathan P.
We present a framework and algorithm for peer-to-peer teaching in cooperative multiagent reinforcement learning. Our algorithm, Learning to Coordinate and Teach Reinforcement (LeCTR), trains advising policies by using students' learning progress as a teaching reward. Agents using LeCTR learn to assume the role of teacher or student at the appropriate moments, exchanging action advice to accelerate the overall learning process. Our algorithm supports teaching heterogeneous teammates, enables advising under communication constraints, and learns both what and when to advise. LeCTR outperforms prior teaching methods in both rate of learning and final performance on multiple benchmark domains. To our knowledge, this is the first approach for learning to teach in a multiagent setting.
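As a rough, hedged illustration of the teaching-reward idea (not the paper's algorithm), the sketch below has a teacher occasionally overriding a student Q-learner's action and being credited with the student's subsequent learning progress. The progress proxy, the probabilistic advising rule, and all class and function names are assumptions made for this example.

```python
import numpy as np

# Minimal sketch, not the LeCTR implementation: a fixed teacher Q-table
# sometimes overrides a student Q-learner's action, and the teacher is
# credited with the student's learning progress (a simple before/after
# difference in a greedy-value proxy). All names are illustrative.

class StudentQLearner:
    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.95):
        self.q = np.zeros((n_states, n_actions))
        self.alpha, self.gamma = alpha, gamma

    def act(self, state, epsilon=0.1):
        # Epsilon-greedy action selection over the student's own Q-values.
        if np.random.rand() < epsilon:
            return np.random.randint(self.q.shape[1])
        return int(np.argmax(self.q[state]))

    def update(self, s, a, r, s_next):
        # Standard tabular Q-learning update.
        target = r + self.gamma * self.q[s_next].max()
        self.q[s, a] += self.alpha * (target - self.q[s, a])


def learning_progress(student):
    # Crude proxy for learning progress: mean greedy value over all states.
    return float(student.q.max(axis=1).mean())


def advised_step(teacher_q, student, s, advise_prob=0.5):
    # Teacher advises its own greedy action with some probability; the
    # teaching reward is the change in the student's progress proxy.
    before = learning_progress(student)
    if np.random.rand() < advise_prob:
        a = int(np.argmax(teacher_q[s]))
    else:
        a = student.act(s)
    # An environment step would supply (r, s_next); dummy values shown here.
    r, s_next = 0.0, s
    student.update(s, a, r, s_next)
    teaching_reward = learning_progress(student) - before
    return a, teaching_reward
```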
Evidence Aggregation for Answer Re-Ranking in Open-Domain Question Answering
Wang, Shuohang, Yu, Mo, Jiang, Jing, Zhang, Wei, Guo, Xiaoxiao, Chang, Shiyu, Wang, Zhiguo, Klinger, Tim, Tesauro, Gerald, Campbell, Murray
A popular recent approach to answering open-domain questions is to first search for question-related passages and then apply reading comprehension models to extract answers. Existing methods usually extract answers from each passage independently, but some questions require combining evidence from multiple passages to answer correctly. In this paper, we propose two models that make use of multiple passages to generate their answers. Both take an answer re-ranking approach, reordering the answer candidates produced by an existing state-of-the-art QA model. We propose two re-ranking methods, strength-based re-ranking and coverage-based re-ranking, which use the aggregated evidence from different passages to better determine the answer. Our models achieve state-of-the-art results on three public open-domain QA datasets: Quasar-T, SearchQA, and the open-domain version of TriviaQA, with improvements of about 8 percentage points on the former two.
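As a minimal sketch of the strength-based idea under simple assumptions (each passage yields scored answer candidates from a base QA model, and candidates that recur across passages accumulate evidence), the following is illustrative rather than the paper's exact formulation:

```python
from collections import defaultdict

# Strength-based re-ranking sketch: sum the base model's scores for each
# distinct answer string across all retrieved passages, then pick the answer
# with the largest accumulated "strength". Normalization and tie-breaking
# are illustrative choices, not the paper's exact method.

def strength_rerank(per_passage_candidates):
    """per_passage_candidates: list (one entry per passage) of
    (answer_text, score) candidate lists."""
    strength = defaultdict(float)
    for candidates in per_passage_candidates:
        for answer, score in candidates:
            strength[answer.strip().lower()] += score
    return max(strength, key=strength.get)

# "berlin" is supported by two passages, so it overtakes the single
# higher-scoring "Munich" candidate.
candidates = [
    [("Berlin", 0.6), ("Munich", 0.9)],
    [("berlin", 0.7)],
]
print(strength_rerank(candidates))  # -> "berlin"
```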
UbuntuWorld 1.0 LTS - A Platform for Automated Problem Solving & Troubleshooting in the Ubuntu OS
Chakraborti, Tathagata, Talamadupula, Kartik, Fadnis, Kshitij P., Campbell, Murray, Kambhampati, Subbarao
In this paper, we present UbuntuWorld 1.0 LTS - a platform for developing automated technical support agents in the Ubuntu operating system. Specifically, we propose to use the Bash terminal as a simulator of the Ubuntu environment for a learning-based agent and demonstrate the usefulness of adopting reinforcement learning (RL) techniques for basic problem solving and troubleshooting in this environment. We provide a plug-and-play interface to the simulator as a Python package where different types of agents can be plugged in and evaluated, and provide pathways for integrating data from online support forums like AskUbuntu into an automated agent's learning process. Finally, we show that the use of this data significantly improves the agent's learning efficiency. We believe that this platform can be adopted as a real-world test bed for research on automated technical support.
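The abstract describes a plug-and-play Python interface to a Bash-based simulator. Below is a minimal sketch of what such an environment-style wrapper might look like; the class and method names are assumptions, not the package's actual API.

```python
import subprocess

# Illustrative sketch only: an environment-style wrapper around the Bash
# terminal. Class/method names, the reward definition (exit code 0 -> 1.0),
# and the single-step episodes are assumptions for this example.

class BashEnv:
    def reset(self):
        self.history = []
        return ""  # initial (empty) observation

    def step(self, command):
        # Run one shell command; its combined output is the observation.
        result = subprocess.run(
            command, shell=True, capture_output=True, text=True, timeout=30
        )
        observation = result.stdout + result.stderr
        reward = 1.0 if result.returncode == 0 else 0.0
        self.history.append((command, result.returncode))
        done = True  # one command per episode in this toy setup
        return observation, reward, done, {}

# Any agent exposing act(observation) -> shell command can be plugged in.
env = BashEnv()
obs = env.reset()
obs, reward, done, info = env.step("echo hello")
print(reward)  # 1.0 if the command succeeded
```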
I-athlon: Towards A Multidimensional Turing Test
Adams, Sam S. (IBM T. J. Watson Research Center) | Banavar, Guruduth (IBM T. J. Watson Research Center) | Campbell, Murray (IBM T. J. Watson Research Center)
While the Turing test is a well-known method for evaluating machine intelligence, it has a number of drawbacks that make it problematic as a rigorous and practical test for assessing progress in general-purpose AI. For example, the Turing test is deception-based, subjectively evaluated, and narrowly focused on language use. We suggest that a test would benefit from including the following requirements: focus on rational behavior, test several dimensions of intelligence, automate as much as possible, score as objectively as possible, and allow incremental progress to be measured. In this article we propose a methodology for designing a test that consists of a series of events, analogous to the Olympic Decathlon, which complies with these requirements. The approach, which we call the I-athlon, is intended to ultimately enable the community to evaluate progress towards machine intelligence in a practical and repeatable way.