University of Waterloo
Multi-Agent Advisor Q-Learning
Ganapathi Subramanian, Sriram (U Waterloo) | Taylor, Matthew E. (University of Alberta) | Larson, Kate (University of Waterloo) | Crowley, Mark (University of Waterloo)
In the last decade, there have been significant advances in multi-agent reinforcement learning (MARL), but there are still numerous challenges, such as high sample complexity and slow convergence to stable policies, that need to be overcome before widespread deployment is possible. However, many real-world environments already, in practice, deploy sub-optimal or heuristic approaches for generating policies. An interesting question that arises is how to best use such approaches as advisors to help improve reinforcement learning in multi-agent domains. In this paper, we provide a principled framework for incorporating action recommendations from online suboptimal advisors in multi-agent settings. We describe the problem of ADvising Multiple Intelligent Reinforcement Agents (ADMIRAL) in nonrestrictive general-sum stochastic game environments and present two novel Q-learning based algorithms: ADMIRAL - Decision Making (ADMIRAL-DM) and ADMIRAL - Advisor Evaluation (ADMIRAL-AE), which allow us to improve learning by appropriately incorporating advice from an advisor (ADMIRAL-DM), and evaluate the effectiveness of an advisor (ADMIRAL-AE). We analyze the algorithms theoretically and provide fixed point guarantees regarding their learning in general-sum stochastic games. Furthermore, extensive experiments illustrate that these algorithms can be used in a variety of environments, perform favourably compared with related baselines, scale to large state-action spaces, and are robust to poor advice from advisors.
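To make the idea of incorporating action recommendations concrete, the following is a minimal single-agent sketch of action selection that sometimes defers to an advisor. It is not the ADMIRAL-DM algorithm, whose details are not given in the abstract; the function name `select_action` and the parameters `follow_prob` and `explore_prob` are hypothetical names introduced only for illustration.

```python
import random


def select_action(q_values, advisor_action, follow_prob, explore_prob, rng):
    """Pick an action index given Q-values and an advisor's recommendation.

    Hypothetical scheme (an assumption, not taken from the paper):
      - with probability follow_prob, take the advisor's action;
      - with probability explore_prob, take a uniformly random action;
      - otherwise, take the greedy action under q_values.
    """
    u = rng.random()  # uniform draw in [0.0, 1.0)
    if u < follow_prob:
        return advisor_action
    if u < follow_prob + explore_prob:
        return rng.randrange(len(q_values))
    # Greedy choice: index of the largest Q-value.
    return max(range(len(q_values)), key=q_values.__getitem__)


rng = random.Random(0)
q = [0.1, 0.9, 0.3]
always_follow = select_action(q, advisor_action=2, follow_prob=1.0, explore_prob=0.0, rng=rng)
never_follow = select_action(q, advisor_action=2, follow_prob=0.0, explore_prob=0.0, rng=rng)
```

In a sketch like this, one would typically shrink `follow_prob` over the course of training so that a poor advisor has less influence as the learner's own estimates improve; that schedule is likewise an assumption here.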
Towards Provably Moral AI Agents in Bottom-Up Learning Frameworks
Shaw, Nolan P. (University of Waterloo) | Stöckel, Andreas (University of Waterloo) | Orr, Ryan W. (University of Waterloo) | Lidbetter, Thomas F. (University of Waterloo) | Cohen, Robin (University of Waterloo)
We examine moral decision making in autonomous systems as inspired by a central question posed by Rossi with respect to moral preferences: can AI systems based on statistical machine learning (which do not provide a natural way to explain or justify their decisions) be used for embedding morality into a machine in a way that allows us to prove that nothing morally wrong will happen? We argue for an evaluation which is held to the same standards as a human agent, removing the demand that ethical behavior is always achieved. We introduce four key meta-qualities desired for our moral standards, and then proceed to clarify how we can prove that an agent will correctly learn to perform moral actions given a set of samples within certain error bounds. Our group-dynamic approach enables us to demonstrate that the learned models converge to a common function to achieve stability. We further explain a valuable intrinsic consistency check made possible through the derivation of logical statements from the machine learning model. In all, this work proposes an approach for building ethical AI systems, coming from the perspective of artificial intelligence research, and sheds important light on understanding how much learning is required in order for an intelligent agent to behave morally with negligible error.
An Architecture for a Military AI System with Ethical Rules
Wang, Yetian (University of Waterloo) | Friyia, Daniel (University of Waterloo) | Liu, Kanzhe (University of Waterloo) | Cohen, Robin (University of Waterloo)
The current era of computer science has seen a significant increase in the application of machine learning (ML) and knowledge representation (KR). The current problem regarding ethics and AI lies in the weaknesses of ML and KR when each is used separately. ML will “learn” ethical behaviour as it is observed and may therefore disagree with human morals. On the other hand, KR is too rigid and can only process scenarios that have been predefined. This paper proposes a solution to the question posed by Rossi (2016): “How to combine bottom-up learning approaches with top-down rule-based approaches in defining ethical principles for AI systems?” The system focuses on potentially unethical behaviours caused by human nature, rather than on ethical dilemmas caused by technological insufficiency, in wartime scenarios. Our solution is an architecture that combines a classifier to identify targets in wartime scenarios and a rules-based system in the form of ontologies to guide an AI agent’s behaviour in the given circumstance.
Sample-Efficient Learning of Mixtures
Ashtiani, Hassan (University of Waterloo) | Ben-David, Shai (University of Waterloo) | Mehrabian, Abbas (Simons Institute for the Theory of Computing, University of California, Berkeley)
We consider PAC learning of probability distributions (a.k.a. density estimation), where we are given an i.i.d. sample generated from an unknown target distribution, and want to output a distribution that is close to the target in total variation distance. Let $F$ be an arbitrary class of probability distributions, and let $F^k$ denote the class of $k$-mixtures of elements of $F$. Assuming the existence of a method for learning $F$ with sample complexity $m(\varepsilon)$, we provide a method for learning $F^k$ with sample complexity $O(k \log k \cdot m(\varepsilon) / \varepsilon^2)$. Our mixture learning algorithm has the property that, if the $F$-learner is proper and agnostic, then the $F^k$-learner would be proper and agnostic as well. This general result enables us to improve the best known sample complexity upper bounds for a variety of important mixture classes. First, we show that the class of mixtures of $k$ axis-aligned Gaussians in $\mathbb{R}^d$ is PAC-learnable in the agnostic setting with $O(kd / \varepsilon^4)$ samples, which is tight in $k$ and $d$ up to logarithmic factors. Second, we show that the class of mixtures of $k$ Gaussians in $\mathbb{R}^d$ is PAC-learnable in the agnostic setting with sample complexity $\tilde{O}(kd^2 / \varepsilon^4)$, which improves the previous known bounds of $\tilde{O}(k^3 d^2 / \varepsilon^4)$ and $\tilde{O}(k^4 d^4 / \varepsilon^2)$ in its dependence on $k$ and $d$. Finally, we show that the class of mixtures of $k$ log-concave distributions over $\mathbb{R}^d$ is PAC-learnable using $\tilde{O}(k d^{(d+5)/2} \varepsilon^{-(d+9)/2})$ samples.
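As a worked illustration of how the general mixture bound yields the Gaussian results, the substitution below assumes the standard single-component sample complexities $m(\varepsilon) = O(d / \varepsilon^2)$ for one axis-aligned Gaussian and $m(\varepsilon) = O(d^2 / \varepsilon^2)$ for one general Gaussian in $\mathbb{R}^d$. These two rates are not stated in the abstract and are an assumption of this sketch.

\[
O\!\left(\frac{k \log k \cdot m(\varepsilon)}{\varepsilon^2}\right)
\;\xrightarrow{\; m(\varepsilon) = O(d/\varepsilon^2) \;}\;
O\!\left(\frac{k d \log k}{\varepsilon^4}\right)
= \tilde{O}\!\left(\frac{k d}{\varepsilon^4}\right),
\]
\[
O\!\left(\frac{k \log k \cdot m(\varepsilon)}{\varepsilon^2}\right)
\;\xrightarrow{\; m(\varepsilon) = O(d^2/\varepsilon^2) \;}\;
O\!\left(\frac{k d^2 \log k}{\varepsilon^4}\right)
= \tilde{O}\!\left(\frac{k d^2}{\varepsilon^4}\right).
\]

Under these assumptions, both rates agree with the stated bounds up to the logarithmic factor in $k$ absorbed by the $\tilde{O}$ notation.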
Generative Adversarial Networks and Probabilistic Graph Models for Hyperspectral Image Classification
Zhong, Zilong (University of Waterloo) | Li, Jonathan (University of Waterloo)
High spectral dimensionality and the shortage of annotations make hyperspectral image (HSI) classification a challenging problem. Recent studies suggest that convolutional neural networks can learn discriminative spatial features, which play a paramount role in HSI interpretation. However, most of these methods ignore the distinctive spectral-spatial characteristic of hyperspectral data. In addition, a large amount of unlabeled data remains an unexploited gold mine for efficient data use. Therefore, we proposed an integration of generative adversarial networks (GANs) and probabilistic graphical models for HSI classification. Specifically, we used a spectral-spatial generator and a discriminator to identify land cover categories of hyperspectral cubes. Moreover, to take advantage of a large amount of unlabeled data, we adopted a conditional random field to refine the preliminary classification results generated by GANs. Experimental results obtained using two commonly studied datasets demonstrate that the proposed framework achieved encouraging classification accuracy using a small amount of training data.
Towards Neural Speaker Modeling in Multi-Party Conversation: The Task, Dataset, and Models
Meng, Zhao (ETH Zurich) | Mou, Lili (University of Waterloo) | Jin, Zhi (Peking University)
In this paper, we address the problem of speaker classification in multi-party conversation, and collect massive data to facilitate research in this direction. We further investigate temporal-based and content-based models of speakers, and propose several hybrids of them. Experiments show that speaker classification is feasible, and that hybrid models outperform each single component.
Clustering - What Both Theoreticians and Practitioners Are Doing Wrong
Ben-David, Shai (University of Waterloo)
Unsupervised learning is widely recognized as one of the most important challenges facing machine learning nowadays. However, in spite of hundreds of papers on the topic being published every year, current theoretical understanding and practical implementations of such tasks, in particular of clustering, are very rudimentary. This note focuses on clustering. The first challenge I address is model selection: how should a user pick an appropriate clustering tool for a given clustering problem, and how should the parameters of such an algorithmic tool be tuned? In contrast with other common computational tasks, for clustering, different algorithms often yield drastically different outcomes. Therefore, the choice of a clustering algorithm may play a crucial role in the usefulness of an output clustering solution. However, currently there exists no methodical guidance for clustering tool selection for a given clustering task. I argue the severity of this problem and describe some recent proposals aiming to address this crucial lacuna.
RUBER: An Unsupervised Method for Automatic Evaluation of Open-Domain Dialog Systems
Tao, Chongyang (Peking University) | Mou, Lili (University of Waterloo) | Zhao, Dongyan (Peking University) | Yan, Rui (Peking University)
Open-domain human-computer conversation has been attracting increasing attention over the past few years. However, there does not exist a standard automatic evaluation metric for open-domain dialog systems; researchers usually resort to human annotation for model evaluation, which is time- and labor-intensive. In this paper, we propose RUBER, a Referenced metric and Unreferenced metric Blended Evaluation Routine, which evaluates a reply by taking into consideration both a groundtruth reply and a query (previous user-issued utterance). Our metric is learnable, but its training does not require labels of human satisfaction. Hence, RUBER is flexible and extensible to different datasets and languages. Experiments on both retrieval and generative dialog systems show that RUBER has a high correlation with human annotation, and that RUBER has fair transferability over different datasets.
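The blending step named in the acronym can be sketched as a function that combines a referenced score (reply compared with the ground-truth reply) and an unreferenced score (reply compared with the query). The abstract does not say how the two scores are combined, so the four strategies below, the function name `blend`, and the assumption that both scores lie in [0, 1] are illustrative choices only.

```python
import math


def blend(referenced, unreferenced, strategy="min"):
    """Combine two quality scores into one (hypothetical strategies)."""
    if strategy == "min":
        return min(referenced, unreferenced)
    if strategy == "max":
        return max(referenced, unreferenced)
    if strategy == "arithmetic":
        return (referenced + unreferenced) / 2.0
    if strategy == "geometric":
        # Assumes both scores are non-negative.
        return math.sqrt(referenced * unreferenced)
    raise ValueError(f"unknown strategy: {strategy!r}")


# A reply that matches the reference well but relates weakly to the query.
scores = {s: blend(0.8, 0.4, s) for s in ("min", "max", "arithmetic", "geometric")}
```

Taking the minimum is the most conservative choice in this sketch: a reply scores well only if both signals agree that it is good.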
A SAT+CAS Method for Enumerating Williamson Matrices of Even Order
Bright, Curtis (University of Waterloo) | Kotsireas, Ilias (Wilfrid Laurier University) | Ganesh, Vijay (University of Waterloo)
We present for the first time an exhaustive enumeration of Williamson matrices of even order n < 65. The search method relies on the novel SAT+CAS paradigm of coupling SAT solvers with computer algebra systems so as to take advantage of the advances made in both the field of satisfiability checking and the field of symbolic computation. Additionally, we use a programmatic SAT solver which allows conflict clauses to be learned programmatically, through a piece of code specifically tailored to the domain area. Prior to our work, Williamson matrices had only been enumerated for odd orders n < 60, so our work increases the bounds that Williamson matrices have been enumerated up to and provides the first enumeration of Williamson matrices of even order. Our results show that Williamson matrices of even order tend to be much more abundant than those of odd orders. In particular, Williamson matrices exist for every even order n < 65 but do not exist in orders 35, 47, 53, and 59.
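For readers unfamiliar with the objects being enumerated, the sketch below checks the standard definition, which the abstract does not restate: four symmetric circulant n x n matrices A, B, C, D with entries ±1 are Williamson matrices of order n when A² + B² + C² + D² = 4nI. The helper names are hypothetical, and this brute-force check is only an illustration, not the SAT+CAS search method of the paper.

```python
def circulant(first_row):
    """Build the circulant matrix whose row i is first_row shifted right by i."""
    n = len(first_row)
    return [[first_row[(j - i) % n] for j in range(n)] for i in range(n)]


def matmul(x, y):
    """Multiply two square integer matrices given as lists of lists."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def is_williamson(a, b, c, d):
    """Check the Williamson condition for four first rows of equal length."""
    rows = (a, b, c, d)
    n = len(a)
    if n == 0 or any(len(r) != n for r in rows):
        return False
    # Entries must be +1 or -1.
    if any(v not in (1, -1) for r in rows for v in r):
        return False
    # A circulant matrix is symmetric iff its first row satisfies r[k] == r[n-k].
    if any(r[k] != r[(n - k) % n] for r in rows for k in range(n)):
        return False
    total = [[0] * n for _ in range(n)]
    for r in rows:
        m = circulant(r)
        sq = matmul(m, m)
        for i in range(n):
            for j in range(n):
                total[i][j] += sq[i][j]
    # The sum of squares must equal 4n times the identity matrix.
    return all(total[i][j] == (4 * n if i == j else 0)
               for i in range(n) for j in range(n))


order_2 = is_williamson([1, 1], [1, 1], [1, -1], [1, -1])
order_3 = is_williamson([1, 1, 1], [1, -1, -1], [1, -1, -1], [1, -1, -1])
```

Exhaustively testing all first rows this way grows exponentially with n, which is one reason an enumeration up to order 64 calls for more specialized search techniques.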
Continuous and Parallel: Challenges for a Standard Model of the Mind
Stewart, Terrence C. (University of Waterloo) | Eliasmith, Chris (University of Waterloo)
We believe that a Standard Model of the Mind should take into account continuous state representations, continuous timing, continuous actions, continuous learning, and parallel control loops. For each of these, we describe initial models that we have made exploring these directions. While we have demonstrated that it is possible to construct high-level cognitive models with these features (which are uncommon in most cognitive modeling approaches), there are many theoretical challenges still to be faced to allow these features to interact in useful ways and to characterize what may be gained by including these features.