
Collaborating Authors

 Maastricht University


Adapting to Concept Drift in Credit Card Transaction Data Streams Using Contextual Bandits and Decision Trees

AAAI Conferences

Credit card transactions predicted to be fraudulent by automated detection systems are typically handed over to human experts for verification. To limit costs, it is standard practice to select only the most suspicious transactions for investigation. We claim that a trade-off between exploration and exploitation is imperative to enable adaptation to changes in behavior (concept drift). Exploration consists of the selection and investigation of transactions with the purpose of improving predictive models, and exploitation consists of investigating transactions detected to be suspicious. Modeling the detection of a fraudulent transaction as a reward, we use an incremental Regression Tree learner to create clusters of transactions with similar expected rewards. This enables the use of a Contextual Multi-Armed Bandit (CMAB) algorithm to provide the exploration/exploitation trade-off. We introduce a novel variant of a CMAB algorithm that makes use of the structure of this tree, and use Semi-Supervised Learning to grow the tree using unlabeled data. The approach is evaluated on a real dataset and data generated by a simulator that adds concept drift by adapting the behavior of fraudsters to avoid detection. It outperforms frequently used offline models in terms of cumulative rewards, in particular in the presence of concept drift.
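
As a rough illustration of the exploration/exploitation mechanism described above (and only that; the incremental tree growth and semi-supervised components are omitted), the sketch below treats each leaf of a given regression tree as a bandit arm and uses a UCB1 index to decide which flagged transaction to investigate. The `leaf_of` function, the feature it buckets on, and the synthetic reward are hypothetical stand-ins, not the paper's setup.

```python
import math
import random
from collections import defaultdict

class LeafUCB:
    """UCB1 bandit whose arms are the leaves of a given regression tree."""

    def __init__(self, c=1.0):
        self.c = c                         # exploration constant
        self.pulls = defaultdict(int)      # investigations per leaf
        self.rewards = defaultdict(float)  # detected frauds per leaf
        self.t = 0                         # total investigations so far

    def index(self, leaf):
        # Leaves that were never investigated get priority.
        if self.pulls[leaf] == 0:
            return float("inf")
        mean = self.rewards[leaf] / self.pulls[leaf]
        bonus = self.c * math.sqrt(math.log(self.t + 1) / self.pulls[leaf])
        return mean + bonus  # exploit high fraud rates, explore rarely seen leaves

    def select(self, transactions, leaf_of):
        # Investigate the transaction whose leaf currently has the highest index.
        return max(transactions, key=lambda tx: self.index(leaf_of(tx)))

    def update(self, leaf, was_fraud):
        self.t += 1
        self.pulls[leaf] += 1
        self.rewards[leaf] += float(was_fraud)


# Toy run: the "tree" here is just a bucketing of the amount (hypothetical).
leaf_of = lambda tx: int(tx["amount"] // 100)
bandit = LeafUCB(c=0.5)
for _ in range(1000):
    batch = [{"amount": random.uniform(0, 500)} for _ in range(20)]
    tx = bandit.select(batch, leaf_of)
    was_fraud = random.random() < 0.02 + 0.0005 * tx["amount"]  # synthetic labels
    bandit.update(leaf_of(tx), was_fraud)
print({leaf: bandit.pulls[leaf] for leaf in sorted(bandit.pulls)})
```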


An Automated Measure of MDP Similarity for Transfer in Reinforcement Learning

AAAI Conferences

Transfer learning can improve the reinforcement learning of a new task by allowing the agent to reuse knowledge acquired from other source tasks. Despite their success, transfer learning methods rely on having relevant source tasks; transfer from inappropriate tasks can inhibit performance on the new task. For fully autonomous transfer, it is critical to have a method for automatically choosing relevant source tasks, which requires a similarity measure between Markov Decision Processes (MDPs). This issue has received little attention, and is therefore still a largely open problem. This paper presents a data-driven automated similarity measure for MDPs. This novel measure is a significant step toward autonomous reinforcement learning transfer, allowing agents to (1) characterize when transfer will be useful and (2) automatically select tasks to use for transfer. The proposed measure is based on the reconstruction error of a restricted Boltzmann machine that attempts to model the behavioral dynamics of the two MDPs being compared. Empirical results illustrate that this measure is correlated with the performance of transfer and therefore can be used to identify similar source tasks for transfer learning.
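
To make the measure concrete, here is a minimal sketch under assumed simplifications: transitions (s, a, s') from each MDP are encoded as fixed-length binary vectors, a small binary RBM is trained with one step of contrastive divergence (CD-1) on samples from one MDP, and its reconstruction error on samples from the other MDP is read (negated) as a similarity score. The encoding, hyperparameters, and training loop are illustrative, not the paper's exact procedure.

```python
import numpy as np

class RBM:
    """Minimal binary RBM trained with CD-1."""

    def __init__(self, n_visible, n_hidden, lr=0.05, seed=0):
        rng = np.random.default_rng(seed)
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b = np.zeros(n_visible)   # visible bias
        self.c = np.zeros(n_hidden)    # hidden bias
        self.lr = lr
        self.rng = rng

    @staticmethod
    def _sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def _hidden_probs(self, v):
        return self._sigmoid(v @ self.W + self.c)

    def _visible_probs(self, h):
        return self._sigmoid(h @ self.W.T + self.b)

    def fit(self, data, epochs=20):
        for _ in range(epochs):
            for v0 in data:
                ph0 = self._hidden_probs(v0)
                h0 = (self.rng.random(ph0.shape) < ph0).astype(float)
                v1 = self._visible_probs(h0)          # one-step reconstruction
                ph1 = self._hidden_probs(v1)
                # CD-1 gradient step
                self.W += self.lr * (np.outer(v0, ph0) - np.outer(v1, ph1))
                self.b += self.lr * (v0 - v1)
                self.c += self.lr * (ph0 - ph1)

    def reconstruction_error(self, data):
        v1 = self._visible_probs(self._hidden_probs(data))
        return float(np.mean((data - v1) ** 2))


def mdp_similarity(transitions_a, transitions_b, n_hidden=16):
    """Lower reconstruction error of B's transitions under an RBM fit on A
    is read as higher similarity between the two MDPs (illustrative only)."""
    rbm = RBM(transitions_a.shape[1], n_hidden)
    rbm.fit(transitions_a)
    return -rbm.reconstruction_error(transitions_b)


# Toy usage with random binary "transition" encodings (purely synthetic).
rng = np.random.default_rng(1)
a = (rng.random((200, 12)) < 0.3).astype(float)
b = (rng.random((200, 12)) < 0.3).astype(float)
print(mdp_similarity(a, b))
```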


Search in Imperfect Information Games Using Online Monte Carlo Counterfactual Regret Minimization

AAAI Conferences

Online search in games has always been a core interest of artificial intelligence. Advances made in search for perfect information games (such as Chess, Checkers, Go, and Backgammon) have led to AI capable of defeating the world's top human experts. Search in imperfect information games (such as Poker, Bridge, and Skat) is significantly more challenging due to the complexities introduced by hidden information. In this paper, we present Online Outcome Sampling (OOS), the first imperfect information search algorithm that is guaranteed to converge to an equilibrium strategy in two-player zero-sum games. We show that OOS avoids common problems encountered by existing search algorithms and we experimentally evaluate its convergence rate and practical performance against benchmark strategies in Liar's Dice and a variant of Goofspiel. We show that, unlike with Information Set Monte Carlo Tree Search (ISMCTS), the exploitability of the strategies produced by OOS decreases as the amount of search time increases. In practice, OOS performs as well as ISMCTS in head-to-head play while producing strategies with lower exploitability given the same search time.
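
The building block underneath outcome-sampling MCCFR is regret matching at each information set. The sketch below shows that core update on a one-shot matrix game (matching pennies), leaving out the game-tree walk, outcome sampling, and importance weighting that OOS itself adds; all names and the self-play loop are illustrative.

```python
import numpy as np

class RegretMatcher:
    """Regret matching at a single information set (the update MCCFR iterates)."""

    def __init__(self, n_actions):
        self.cum_regret = np.zeros(n_actions)
        self.cum_strategy = np.zeros(n_actions)

    def strategy(self):
        # Current strategy: proportional to positive cumulative regret,
        # uniform when no action has positive regret.
        positive = np.maximum(self.cum_regret, 0.0)
        total = positive.sum()
        n = len(self.cum_regret)
        return positive / total if total > 0 else np.full(n, 1.0 / n)

    def update(self, action_values):
        # action_values[a]: value of playing action a at this information set.
        sigma = self.strategy()
        node_value = float(sigma @ action_values)
        self.cum_regret += action_values - node_value  # instantaneous regrets
        self.cum_strategy += sigma                     # accumulates the average strategy
        return node_value

    def average_strategy(self):
        # The *average* strategy is what converges to equilibrium, not strategy().
        total = self.cum_strategy.sum()
        n = len(self.cum_strategy)
        return self.cum_strategy / total if total > 0 else np.full(n, 1.0 / n)


# Self-play on matching pennies: both average strategies approach (0.5, 0.5).
payoff = np.array([[1.0, -1.0], [-1.0, 1.0]])   # row player wins on a match
p1, p2 = RegretMatcher(2), RegretMatcher(2)
for _ in range(10000):
    s1, s2 = p1.strategy(), p2.strategy()
    p1.update(payoff @ s2)          # row player's value for each of its actions
    p2.update(-(payoff.T @ s1))     # column player's values (zero-sum)
print(p1.average_strategy(), p2.average_strategy())
```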


Theory of Cooperation in Complex Social Networks

AAAI Conferences

This paper presents a theoretical as well as empirical study on the evolution of cooperation on complex social networks, following the continuous action iterated prisoner's dilemma (CAIPD) model. In particular, convergence to network-wide agreement is proven for both evolutionary networks with fixed interaction dynamics, as well as for coevolutionary networks where these dynamics change over time. Moreover, an extension to the CAIPD model is proposed that makes it possible to model influence on the evolution of cooperation in social networks. As such, this work contributes to a better understanding of behavioral change on social networks, and provides a first step towards their active control.


Conditional Restricted Boltzmann Machines for Negotiations in Highly Competitive and Complex Domains

AAAI Conferences

Learning in automated negotiations, while useful, is hard because of the indirect way the target function can be observed and the limited amount of experience available to learn from. This paper proposes two novel opponent modeling techniques based on deep learning methods. Moreover, to improve the learning efficacy of negotiating agents, the second approach is also capable of transferring knowledge efficiently between negotiation tasks. Transfer is conducted by automatically mapping the source knowledge to the target in a rich feature space. Experiments show that, using these techniques, the proposed strategies outperform existing state-of-the-art agents in highly competitive and complex negotiation domains. Furthermore, an empirical game-theoretic analysis reveals the robustness of the proposed strategies.


Monte Carlo *-Minimax Search

AAAI Conferences

This paper introduces Monte Carlo *-Minimax Search (MCMS), a Monte Carlo search algorithm for turn-based, stochastic, two-player, zero-sum games of perfect information. The algorithm is designed for the class of densely stochastic games; that is, games where one would rarely expect to sample the same successor state multiple times at any particular chance node. Our approach combines sparse sampling techniques from MDP planning with classic pruning techniques developed for adversarial expectimax planning. We compare and contrast our algorithm to the traditional *-Minimax approaches, as well as MCTS enhanced with Double Progressive Widening, on four games: Pig, EinStein Würfelt Nicht!, Can't Stop, and Ra. Our results show that MCMS can be competitive with enhanced MCTS variants in some domains, while consistently outperforming the equivalent classic approaches given the same amount of thinking time.
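
A stripped-down sketch of the sparse-sampling half of the idea: at each chance node a fixed number of successor states is sampled and averaged instead of enumerating every outcome, while decision nodes remain exact max/min; the *-Minimax pruning bounds are omitted. The toy game and the state interface below are invented for illustration and are not any of the four domains above.

```python
import random
from dataclasses import dataclass, replace

def sparse_expectimax(state, depth, num_samples=8):
    """Expectimax value estimate that samples chance nodes (sparse sampling)
    instead of enumerating all outcomes; *-Minimax pruning is omitted."""
    if depth == 0 or state.is_terminal():
        return state.value()                        # terminal or heuristic value

    if state.is_chance():
        # A dense game would weight every outcome by its probability; here we
        # Monte-Carlo average a fixed number of sampled outcomes.
        samples = [sparse_expectimax(state.sample_outcome(), depth - 1, num_samples)
                   for _ in range(num_samples)]
        return sum(samples) / num_samples

    values = [sparse_expectimax(state.apply(move), depth - 1, num_samples)
              for move in state.legal_moves()]
    return max(values) if state.to_move() == +1 else min(values)


@dataclass(frozen=True)
class ToyState:
    """Tiny stand-in game: a player adds -1 or +1 to a running total, then a
    fair coin adds -1 or +1; the max player wants the final total high."""
    total: int = 0
    plies_left: int = 4
    player: int = +1               # +1 = max to move, -1 = min to move
    chance_pending: bool = False

    def is_terminal(self):
        return self.plies_left == 0

    def value(self):
        return float(self.total)

    def is_chance(self):
        return self.chance_pending

    def sample_outcome(self):
        return replace(self, total=self.total + random.choice([-1, +1]),
                       chance_pending=False, plies_left=self.plies_left - 1,
                       player=-self.player)

    def legal_moves(self):
        return [-1, +1]

    def apply(self, move):
        return replace(self, total=self.total + move, chance_pending=True)

    def to_move(self):
        return self.player


print(sparse_expectimax(ToyState(), depth=8))  # sampled root value estimate
```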


Telepresence Robots as a Research Platform for AI

AAAI Conferences

Recently, various commercial telepresence robots have become available to the broader public. Here, we present the telepresence domain as a research platform for (re-)integrating AI. With MITRO, the Maastricht Intelligent Telepresence RObot, we built a low-cost working prototype of a robot system specifically designed for augmented and autonomous telepresence. Telepresence robots can be deployed in a wide range of application domains, and augmented presence with assisted control can greatly improve the experience for the user. The research domains that we are focusing on are human-robot interaction, navigation, and perception.


Multiagent Learning: Basics, Challenges, and Prospects

AI Magazine

Multiagent systems (MAS) are widely accepted as an important method for solving problems of a distributed nature. A key to the success of MAS is efficient and effective multiagent learning (MAL). The past twenty-five years have seen a great interest and tremendous progress in the field of MAL. This article introduces and overviews this field by presenting its fundamentals, sketching its historical development and describing some key algorithms for MAL. Moreover, the main challenges that the field is facing today are identified.


Research Summary

AAAI Conferences

Monte-Carlo Tree Search (MCTS) is an online planning algorithm that combines the ideas of best-first tree search and Monte-Carlo evaluation. Since MCTS is based on sampling, it does not require a transition function in explicit form, but only a generative model of the domain. Because it grows a highly selective search tree guided by its samples, it can handle huge search spaces with large branching factors. By using Monte-Carlo playouts, MCTS can take long-term rewards into account even with distant horizons. Combined with multi-armed bandit algorithms to trade off exploration and exploitation, MCTS has been shown to guarantee asymptotic convergence to the optimal policy, while providing approximations when stopped at any time. The relatively new MCTS approach has started a revolution in computer Go. Furthermore, it has achieved considerable success in domains as diverse as the games of Hex, Amazons, LOA, and Ms. Pacman, as well as in General Game Playing, planning, and optimization. Whereas the focus of previous MCTS research has been on practical applications, current research is beginning to address the problem of understanding the nature and underlying principles of MCTS. A careful understanding of MCTS will lead to more effective search algorithms. Hence, my two interrelated research questions are: how can we formulate models that increase our understanding of how MCTS works, and how can we use this understanding to create effective search algorithms? This research summary describes the first steps I undertook in these directions, as well as my plans for future work.
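
As a concrete anchor for the description above, here is a compact single-agent UCT sketch: UCB1 guides selection inside the tree, playouts are uniform random, and statistics are backed up along the visited path. The node/state interface and the toy chain domain are invented for illustration and are not any particular domain from the summary.

```python
import math
import random

class Node:
    """One node of the UCT search tree."""
    def __init__(self, state, parent=None, action=None):
        self.state = state
        self.parent = parent
        self.action = action                  # action that led here from the parent
        self.children = []
        self.untried = list(state.actions())  # actions not yet expanded
        self.visits = 0
        self.total_reward = 0.0

    def ucb1(self, c):
        # Mean playout reward plus an exploration bonus (UCB1).
        mean = self.total_reward / self.visits
        return mean + c * math.sqrt(math.log(self.parent.visits) / self.visits)


def uct_search(root_state, iterations=2000, c=1.4, horizon=50):
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while fully expanded, maximising UCB1.
        while not node.untried and node.children:
            node = max(node.children, key=lambda n: n.ucb1(c))
        # 2. Expansion: add one child for a not-yet-tried action.
        if node.untried:
            action = node.untried.pop()
            node.children.append(Node(node.state.step(action), node, action))
            node = node.children[-1]
        # 3. Simulation: uniform-random playout from the new node.
        state, steps = node.state, 0
        while not state.is_terminal() and steps < horizon:
            state = state.step(random.choice(state.actions()))
            steps += 1
        reward = state.reward()
        # 4. Backpropagation: update statistics along the path back to the root.
        while node is not None:
            node.visits += 1
            node.total_reward += reward
            node = node.parent
    return max(root.children, key=lambda n: n.visits).action  # most-visited action


class ChainState:
    """Toy generative model: walk from `pos` by +4 or +5; landing exactly on 10
    yields reward 1, overshooting yields 0 (for demonstration only)."""
    def __init__(self, pos=0):
        self.pos = pos
    def actions(self):
        return [] if self.is_terminal() else [4, 5]
    def step(self, action):
        return ChainState(self.pos + action)
    def is_terminal(self):
        return self.pos >= 10
    def reward(self):
        return 1.0 if self.pos == 10 else 0.0


print(uct_search(ChainState()))  # recommends 5: only 5 then 5 lands exactly on 10
```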