
Collaborating Authors

Edmonton


Gap-Increasing Policy Evaluation for Efficient and Noise-Tolerant Reinforcement Learning

arXiv.org Machine Learning

In real-world applications of reinforcement learning (RL), noise from the inherent stochasticity of environments is inevitable. However, current policy evaluation algorithms, which play a key role in many RL algorithms, are either prone to noise or inefficient. To solve this issue, we introduce a novel policy evaluation algorithm, which we call Gap-increasing RetrAce Policy Evaluation (GRAPE). It leverages two recent ideas: (1) gap-increasing value update operators in advantage learning for noise tolerance and (2) off-policy eligibility traces in the Retrace algorithm for efficient learning. We provide a detailed theoretical analysis of the new algorithm that shows its efficiency and noise tolerance, inherited from Retrace and advantage learning. Furthermore, our analysis shows that GRAPE's learning is significantly more efficient than that of a simple learning-rate-based approach while keeping the same level of noise tolerance. We applied GRAPE to control problems and obtained experimental results supporting our theoretical analysis.
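The abstract attributes GRAPE's efficiency to Retrace's off-policy eligibility traces. As a hedged illustration of that one ingredient, the sketch below computes the standard Retrace correction for every step of a single trajectory, using truncated traces c = λ·min(1, π(a|x)/μ(a|x)). It does not include the gap-increasing operator or the way GRAPE combines the two; the function and argument names are hypothetical.

```python
from typing import List


def retrace_corrections(rewards: List[float], q_taken: List[float],
                        v_next: List[float], ratios: List[float],
                        gamma: float, lam: float) -> List[float]:
    """Retrace correction Delta_s for each step s of one trajectory.

    rewards[t] : reward r_t
    q_taken[t] : current estimate Q(x_t, a_t)
    v_next[t]  : E_{b ~ pi} Q(x_{t+1}, b), or 0.0 if x_{t+1} is terminal
    ratios[t]  : pi(a_t | x_t) / mu(a_t | x_t) for the behaviour policy mu
                 (ratios[0] is never used: Delta_s only needs traces after s)
    The new target for Q(x_s, a_s) is q_taken[s] + Delta_s.
    """
    n = len(rewards)
    # One-step TD errors bootstrapped with the target policy's expected value.
    deltas = [rewards[t] + gamma * v_next[t] - q_taken[t] for t in range(n)]
    corrections = [0.0] * n
    running = 0.0
    # Backward recursion: Delta_s = delta_s + gamma * c_{s+1} * Delta_{s+1},
    # where the truncated trace c = lam * min(1, ratio) keeps the product bounded.
    for s in reversed(range(n)):
        if s + 1 < n:
            trace = lam * min(1.0, ratios[s + 1])
            running = deltas[s] + gamma * trace * running
        else:
            running = deltas[s]
        corrections[s] = running
    return corrections
```

With λ = 1 and on-policy data (all ratios 1) the correction becomes the full discounted sum of TD errors, while λ = 0 reduces it to a one-step update; truncating the ratio at 1 is what lets Retrace use off-policy data without the variance blow-up of plain importance sampling.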


Artificial Intelligence Game Talk, University of Alberta, Hex and Chess


U of Alberta created the first Computing Science department in Canada in 1964. It has a long tradition of research in AI (it is rated third in the world in machine learning). It has also led in the development of AI for strategy games, among them Checkers, Chess, Go and Poker, and the results can be commercialized in non-game applications as well. The evening's talks were by Jonathan Schaeffer (computer chess) and Ryan Hayward (the strategy game Hex).


Exponential-Binary State-Space Search

arXiv.org Artificial Intelligence

Iterative deepening search is used in applications where the best cost bound for state-space search is unknown. The iterative deepening process is used to avoid overshooting the appropriate cost bound and doing too much work as a result. However, iterative deepening search also does too much work if the cost bound grows too slowly. This paper proposes a new framework for iterative deepening search called exponential-binary state-space search. The approach interleaves exponential and binary searches to find the desired cost bound, reducing the worst-case overhead from polynomial to logarithmic. Exponential-binary search can be used with bounded depth-first search to improve the worst-case performance of IDA* and with breadth-first heuristic search to improve the worst-case performance of search with inconsistent heuristics.
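To make the interleaving concrete, here is a minimal sketch of exponential search followed by binary search over an integer cost bound. It assumes a hypothetical monotone predicate `succeeds(bound)` standing in for "a bounded search with this cost bound finds a solution", and it only counts probes; it does not model how the paper couples the procedure with bounded depth-first search or breadth-first heuristic search, or how it measures the work done per probe.

```python
from typing import Callable, Tuple


def exponential_binary_search(succeeds: Callable[[int], bool],
                              start: int = 1) -> Tuple[int, int]:
    """Return (smallest bound >= start for which succeeds() is True, probes used).

    Assumes succeeds is monotone: once True for a bound, True for all larger ones.
    """
    assert start >= 1, "doubling from 0 would never grow"
    probes = 0
    lo, hi = start - 1, start  # bounds below `start` are treated as failing
    # Exponential phase: double the bound until the bounded search succeeds.
    while True:
        probes += 1
        if succeeds(hi):
            break
        lo, hi = hi, hi * 2
    # Binary phase: succeeds(lo) is False, succeeds(hi) is True; close the gap.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        probes += 1
        if succeeds(mid):
            hi = mid
        else:
            lo = mid
    return hi, probes


if __name__ == "__main__":
    # Toy problem whose optimal solution cost is 37.
    print(exponential_binary_search(lambda bound: bound >= 37))
```

Both phases take a logarithmic number of probes in the final bound, whereas growing the bound by a fixed small increment, as naive iterative deepening can, may need a number of iterations that is linear in it.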


Ease-of-Teaching and Language Structure from Emergent Communication

arXiv.org Artificial Intelligence

Artificial agents have been shown to learn to communicate when needed to complete a cooperative task. Some level of language structure (e.g., compositionality) has been found in the learned communication protocols. This observed structure is often the result of specific environmental pressures during training. By periodically introducing new agents to replace old ones, both sequentially and within a population, we explore one such new pressure, ease of teaching, and show its impact on the structure of the resulting language.


Policy Based Inference in Trick-Taking Card Games

arXiv.org Artificial Intelligence

Trick-taking card games feature a large amount of private information that is slowly revealed through a long sequence of actions. This makes the number of histories exponentially large in the action sequence length and also creates extremely large information sets. As a result, these games become too large to solve. To deal with these issues, many algorithms employ inference, the estimation of the probability of states within an information set. In this paper, we demonstrate a Policy Based Inference (PI) algorithm that uses player modelling to infer the probability of being in a given state. We perform experiments in the German trick-taking card game Skat, in which we show that this method vastly improves inference compared to previous work, and that it increases the performance of the state-of-the-art Skat AI system Kermit when employed in its determinized search algorithm.
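One way to read "uses player modelling to infer the probability of being in a given state" is Bayesian reweighting: each candidate state in the information set is weighted by how likely a policy model makes the observed actions if that state were true. The sketch below assumes exactly that, with a hypothetical `policy_prob(state, history_prefix, player, action)` callback standing in for the learned player model; it is an illustration under that assumption, not the paper's implementation.

```python
from typing import Callable, Dict, Hashable, Sequence, Tuple

State = Hashable
Action = Hashable
Move = Tuple[int, Action]  # (player index, action taken)


def policy_based_weights(
    states: Sequence[State],
    history: Sequence[Move],
    policy_prob: Callable[[State, Sequence[Move], int, Action], float],
    prior: Callable[[State], float] = lambda s: 1.0,
) -> Dict[State, float]:
    """Normalized weights over the candidate states of an information set.

    weight(s) is proportional to prior(s) times the product, over the observed
    moves, of the modelled probability that the player chose that action in s.
    """
    raw = {}
    for s in states:
        w = prior(s)
        for i, (player, action) in enumerate(history):
            w *= policy_prob(s, history[:i], player, action)
        raw[s] = w
    total = sum(raw.values())
    if total == 0.0:
        # No candidate explains the history: fall back to a uniform belief.
        return {s: 1.0 / len(states) for s in states}
    return {s: w / total for s, w in raw.items()}


if __name__ == "__main__":
    # Toy model: the observed play is three times as likely in deal "B" as in "A".
    toy_policy = lambda s, h, p, a: {"A": 0.3, "B": 0.9}[s]
    print(policy_based_weights(["A", "B"], [(1, "ace")], toy_policy))
```

For long histories the product of many probabilities can underflow, so a practical version would accumulate log-probabilities instead; the normalized weights can then drive which worlds a determinized search samples.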


Learning Policies from Human Data for Skat

arXiv.org Artificial Intelligence

Decision-making in large imperfect information games is difficult. Thanks to recent success in Poker, Counterfactual Regret Minimization (CFR) methods have been at the forefront of research in these games. However, most of the success in large games comes with the use of a forward model and powerful state abstractions. In trick-taking card games like Bridge or Skat, large information sets and an inability to advance the simulation without fully determinizing the state make forward search problematic. Furthermore, state abstractions can be especially difficult to construct because the precise holdings of each player directly impact move values. In this paper we explore learning model-free policies for Skat from human game data using deep neural networks (DNN). We produce a new state-of-the-art system for bidding and game declaration by introducing methods to a) directly vary the aggressiveness of the bidder and b) declare games based on expected value while mitigating issues with rarely observed state-action pairs. Although cardplay policies learned through imitation are slightly weaker than the current best search-based method, they run orders of magnitude faster. We also explore how these policies could be learned directly from experience in a reinforcement learning setting and discuss the value of incorporating human data for this task.
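The abstract mentions declaring games by expected value while mitigating issues with rarely observed state-action pairs. A hedged sketch of one plausible way to do this, assumed here and not taken from the paper: only declarations that the imitation policy assigns at least a minimum probability are eligible, since values estimated for actions humans rarely chose are unreliable, and among those the highest estimated expected value wins. The function name, the `min_prob` threshold and the fallback rule are all hypothetical.

```python
from typing import Dict, Hashable


def declare_by_expected_value(policy_probs: Dict[Hashable, float],
                              expected_values: Dict[Hashable, float],
                              min_prob: float = 0.05) -> Hashable:
    """Pick a game declaration (hypothetical rule, see lead-in).

    policy_probs    : imitation-policy probability of each declaration
    expected_values : estimated expected score of each declaration
    Declarations below `min_prob` are treated as too rarely observed to trust.
    """
    candidates = [a for a, p in policy_probs.items()
                  if p >= min_prob and a in expected_values]
    if not candidates:
        # Nothing passes the filter: defer to the most human-like choice.
        return max(policy_probs, key=policy_probs.get)
    return max(candidates, key=lambda a: expected_values[a])


if __name__ == "__main__":
    # Illustrative numbers only: the high-value "grand" is almost never chosen.
    probs = {"grand": 0.01, "clubs": 0.60, "null": 0.39}
    values = {"grand": 120.0, "clubs": 30.0, "null": 35.0}
    print(declare_by_expected_value(probs, values))
```

Lowering `min_prob` trusts the value estimates more and the human data less; at 0 the rule degenerates to a pure expected-value maximizer.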


Multivariate Time Series Classification using Dilated Convolutional Neural Network

arXiv.org Machine Learning

Multivariate time series classification is a high-value and well-known problem in the machine learning community. Feature extraction is a main step in classification tasks. Traditional approaches employ handcrafted features for classification, while convolutional neural networks (CNN) are able to extract features automatically. In this paper, we use a dilated convolutional neural network for multivariate time series classification.

The general approach to time series classification is to split the time series into equal-size segments using a fixed-length sliding window and to extract handcrafted features from the segments for classification tasks. The features are usually statistical measurements or features extracted from another domain, such as the Fourier and wavelet domains (Jiang & Yin, 2015; Ravi et al., 2017; Lin et al., 2003). In multivariate time series classification, information is commonly extracted separately from each variate, and the features are concatenated for the classification task.
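A dilated convolution spaces its kernel taps `dilation` steps apart, so stacking layers with growing dilations widens the receptive field exponentially without adding parameters. The NumPy sketch below shows the operation on a multivariate series and a tiny classifier built from it; the layer sizes, dilation schedule (1, 2, 4), global average pooling and linear head are assumptions for illustration, not the paper's architecture.

```python
import numpy as np


def dilated_conv1d(x: np.ndarray, w: np.ndarray, dilation: int) -> np.ndarray:
    """'Valid' dilated 1-D convolution (cross-correlation, as in DL libraries).

    x: (in_channels, length), one row per variate
    w: (out_channels, in_channels, kernel_size)
    returns (out_channels, length - (kernel_size - 1) * dilation)
    """
    c_in, n = x.shape
    c_out, c_in_w, k = w.shape
    assert c_in == c_in_w, "kernel must cover every input variate"
    span = (k - 1) * dilation + 1  # time steps covered by one kernel application
    assert n >= span, "series shorter than the dilated kernel"
    out = np.empty((c_out, n - span + 1))
    for t in range(n - span + 1):
        taps = x[:, t:t + span:dilation]  # (c_in, k): steps t, t+d, ..., t+(k-1)d
        out[:, t] = np.tensordot(w, taps, axes=([1, 2], [0, 1]))
    return out


def receptive_field(kernel_size: int, dilations) -> int:
    """Number of input steps that one output of a dilated stack depends on."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def classify(x: np.ndarray, layers, w_out: np.ndarray) -> np.ndarray:
    """Hypothetical tiny classifier: dilated conv + ReLU layers, global average
    pooling over time, then a linear layer producing one logit per class."""
    h = x
    for w, d in layers:
        h = np.maximum(dilated_conv1d(h, w, d), 0.0)
    return w_out @ h.mean(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((3, 64))  # 3 variates, 64 time steps
    layers = [(rng.standard_normal((8, 3, 3)), 1),
              (rng.standard_normal((8, 8, 3)), 2),
              (rng.standard_normal((8, 8, 3)), 4)]
    logits = classify(x, layers, rng.standard_normal((5, 8)))  # 5 classes
    print(logits.shape, receptive_field(3, [1, 2, 4]))
```

Here every output channel mixes all variates from the first layer on, in contrast to extracting features from each variate separately and concatenating them. In PyTorch the same layer is available as `torch.nn.Conv1d` with its `dilation` argument.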


AAAI News

AI Magazine

Submissions for HCOMP-19 Are Due in June! The Seventh AAAI Conference on Human Computation and Crowdsourcing (HCOMP 2019) will be held October 28-30 at Skamania Lodge in Washington State, in the Columbia River Gorge, just 45 minutes from Portland, Oregon. This year is the 10-year anniversary of the very first HCOMP workshop in Paris, and to celebrate, there will be special events, talks, and panels throughout the conference. HCOMP is the premier venue for disseminating the latest research findings on crowdsourcing and human computation. While artificial intelligence (AI) and human-computer interaction (HCI) represent traditional mainstays of the conference, HCOMP believes strongly in inviting, fostering, and promoting broad, interdisciplinary research.


On the Functional Equivalence of TSK Fuzzy Systems to Neural Networks, Mixture of Experts, CART, and Stacking Ensemble Regression

arXiv.org Artificial Intelligence

Fuzzy systems have achieved great success in numerous applications. However, there are still many challenges in designing an optimal fuzzy system, e.g., how to efficiently train its parameters, how to improve its performance without adding too many parameters, how to balance the trade-off between cooperation and competition among the rules, how to overcome the curse of dimensionality, etc. The literature has shown that by making appropriate connections between fuzzy systems and other machine learning approaches, good practices from other domains may be used to improve fuzzy systems, and vice versa. This paper gives an overview of the functional equivalence between Takagi-Sugeno-Kang fuzzy systems and four classic machine learning approaches -- neural networks, mixture of experts, classification and regression trees, and stacking ensemble regression -- for regression problems. We also point out some promising new research directions, inspired by the functional equivalence, that could lead to solutions to the aforementioned problems. To our knowledge, this is so far the most comprehensive overview of the connections between fuzzy systems and other popular machine learning approaches, and we hope it will stimulate more hybridization between different machine learning algorithms.
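The mixture-of-experts equivalence can be checked numerically in the simplest setting. The sketch assumes a first-order TSK system with Gaussian membership functions, the product t-norm and weighted-average defuzzification. Under those assumptions the normalized rule firing strengths are exactly a softmax over log firing strengths, so the same parameters define a mixture of linear experts with a softmax gate. Other membership functions or t-norms, which the paper also discusses, are not covered here.

```python
import numpy as np


def tsk_predict(x, centers, sigmas, coef, bias) -> float:
    """First-order TSK fuzzy system.

    x: (d,); centers, sigmas, coef: (R, d) for R rules; bias: (R,)
    """
    # Firing strength of each rule: product of per-dimension Gaussian memberships.
    firing = np.exp(-((x - centers) ** 2) / (2 * sigmas ** 2)).prod(axis=1)
    consequents = coef @ x + bias  # each rule's linear (first-order) output
    return float(firing @ consequents / firing.sum())


def moe_predict(x, centers, sigmas, coef, bias) -> float:
    """The same model read as a mixture of experts: a softmax gate over the
    log firing strengths weights R linear experts."""
    logits = -(((x - centers) ** 2) / (2 * sigmas ** 2)).sum(axis=1)
    gate = np.exp(logits - logits.max())  # shift for numerical stability
    gate /= gate.sum()
    experts = coef @ x + bias
    return float(gate @ experts)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    R, d = 4, 3
    params = (rng.standard_normal((R, d)), rng.uniform(0.5, 2.0, (R, d)),
              rng.standard_normal((R, d)), rng.standard_normal(R))
    x = rng.standard_normal(d)
    print(tsk_predict(x, *params), moe_predict(x, *params))
```

The two functions agree up to floating-point error. The gate form is also the more robust one: far from every rule center the TSK firing strengths can all underflow to zero, while the shifted softmax stays well defined.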


Improving Search with Supervised Learning in Trick-Based Card Games

arXiv.org Artificial Intelligence

In trick-taking card games, a two-step process of state sampling and evaluation is widely used to approximate move values. While the evaluation component is vital, the accuracy of move value estimates is also fundamentally linked to how well the sampling distribution corresponds to the true distribution. Despite this, recent work in trick-taking card game AI has mainly focused on improving evaluation algorithms, with limited work on improving sampling. In this paper, we focus on the effect of sampling on the strength of a player and propose a novel method of sampling more realistic states given the move history. In particular, we use predictions about the locations of individual cards, made by a deep neural network trained on data from human gameplay, to sample likely worlds for evaluation. This technique, used in conjunction with Perfect Information Monte Carlo (PIMC) search, provides a substantial increase in cardplay strength in the popular trick-taking card game of Skat.
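The two-step process can be sketched as follows: draw worlds guided by per-card location probabilities, then let PIMC score each move by its average perfect-information value over those worlds. The sampling step below is a simple sequential heuristic assumed for illustration. It assigns cards one at a time in proportion to the predicted probabilities, restricted to players with room left, so it does not sample exactly from the network's joint distribution conditioned on hand sizes, and the paper's procedure may differ. `evaluate(world, move)` is a hypothetical stand-in for a perfect-information solver.

```python
import random
from typing import Callable, Dict, Hashable, List, Sequence

World = Dict[Hashable, int]  # unseen card -> index of the player holding it


def sample_world(card_probs: Dict[Hashable, Sequence[float]],
                 capacities: Sequence[int], rng: random.Random) -> World:
    """Assign every unseen card to a player, guided by predicted location
    probabilities and by how many cards each player still holds."""
    remaining = list(capacities)
    assert sum(remaining) == len(card_probs), "every unseen card needs a slot"
    world: World = {}
    for card, probs in card_probs.items():
        weights = [p if remaining[i] > 0 else 0.0 for i, p in enumerate(probs)]
        if sum(weights) == 0.0:
            # The predicted holders are already full: spread over players with room.
            weights = [1.0 if r > 0 else 0.0 for r in remaining]
        player = rng.choices(range(len(remaining)), weights=weights)[0]
        world[card] = player
        remaining[player] -= 1
    return world


def pimc_choose(moves: Sequence[Hashable], worlds: List[World],
                evaluate: Callable[[World, Hashable], float]) -> Hashable:
    """Perfect Information Monte Carlo: pick the move with the best average
    perfect-information value over the sampled worlds."""
    return max(moves, key=lambda m: sum(evaluate(w, m) for w in worlds) / len(worlds))


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical network output: probability that each card sits with player 0 or 1.
    probs = {"CJ": [0.8, 0.2], "SJ": [0.3, 0.7], "HA": [0.5, 0.5], "DT": [0.9, 0.1]}
    worlds = [sample_world(probs, [2, 2], rng) for _ in range(100)]
    print(sum(w["CJ"] == 0 for w in worlds), "of 100 worlds give CJ to player 0")
```

Because every move is evaluated on the same sampled worlds, the move ranking can only be as realistic as those worlds, which is why better sampling can improve play even with an unchanged evaluator.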