Goto

Collaborating Authors

 Chalmers University of Technology


Branch-and-Bound for the Precedence Constrained Generalized Traveling Salesman Problem

AAAI Conferences

The Precedence Constrained Generalized Traveling Salesman Problem (PCGTSP) combines the Generalized Traveling Salesman Problem (GTSP) and the Sequential Ordering Problem (SOP). We present a novel branching technique for the GTSP that enables the extension of a powerful pruning technique. This is combined with modifications of known bounding methods for related problems. The algorithm solves problem instances with 12-26 groups within a minute, and instances with around 50 groups that are denser in precedence constraints within 24 hours.
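To illustrate the general idea behind this kind of solver, the following is a minimal sketch of depth-first branch-and-bound for a sequencing problem with precedence constraints. It is an illustration of the generic technique only, not the paper's algorithm: the group structure of the GTSP, the novel branching rule, and the bounding methods are all omitted, and the cost matrix and precedence encoding are assumptions.

```python
import math

def branch_and_bound(dist, precedes, start=0):
    """Generic depth-first branch-and-bound for a tour with precedence
    constraints (illustrative sketch, not the paper's PCGTSP algorithm).
    dist[i][j] is the travel cost; precedes is a set of (a, b) pairs
    meaning node a must be visited before node b."""
    n = len(dist)
    best = [math.inf, None]  # best tour cost and tour found so far

    def feasible(nxt, visited):
        # nxt may be visited only if all its predecessors already were
        return all(a in visited for (a, b) in precedes if b == nxt)

    def recurse(path, cost, visited):
        if cost >= best[0]:  # prune: partial cost already exceeds the bound
            return
        if len(path) == n:
            total = cost + dist[path[-1]][start]  # close the tour
            if total < best[0]:
                best[0], best[1] = total, path[:]
            return
        for nxt in range(n):
            if nxt not in visited and feasible(nxt, visited):
                recurse(path + [nxt],
                        cost + dist[path[-1]][nxt],
                        visited | {nxt})

    recurse([start], 0.0, {start})
    return best[0], best[1]
```

Here the incumbent cost serves as the bound; the paper's contribution lies in much stronger pruning and bounding than this bare-bones prune.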



A New AI Evaluation Cosmos: Ready to Play the Game?

AI Magazine

We report on a series of new platforms and events dealing with AI evaluation that may change the way in which AI systems are compared and their progress is measured. The introduction of a more diverse and challenging set of tasks in these platforms can feed AI research in the years to come, shaping the notion of success and the directions of the field. However, the playground of tasks and challenges presented there may misdirect the field without some meaningful structure and systematic guidelines for its organization and use. Anticipating this issue, we also report on several initiatives and workshops that are putting the focus on analyzing the similarity and dependencies between tasks, their difficulty, what capabilities they really measure and – ultimately – on elaborating new concepts and tools that can arrange tasks and benchmarks into a meaningful taxonomy.


Achieving Privacy in the Adversarial Multi-Armed Bandit

AAAI Conferences

In this paper, we improve the previously best known regret bound for achieving ε-differential privacy in oblivious adversarial bandits from O(T^{2/3}/ε) to O(√(T ln T)/ε). This is achieved by combining a Laplace mechanism with EXP3. We show that although EXP3 is already differentially private, it leaks a linear amount of information in T. However, we can improve this privacy guarantee by relying on its intrinsic exponential mechanism for selecting actions. This allows us to reach O(√(ln T))-DP, with a regret of O(T^{2/3}) that holds against an adaptive adversary, an improvement over the previous best of O(T^{3/4}). This is done by using an algorithm that runs EXP3 in a mini-batch loop. Finally, we run experiments that clearly demonstrate the validity of our theoretical analysis.
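As a rough sketch of what "combining a Laplace mechanism with EXP3" can look like, the following perturbs each observed reward with Laplace(1/ε) noise before the EXP3 weight update. The noise placement, clipping, and the simulated environment are assumptions for illustration, not the paper's exact construction (which also involves mini-batching).

```python
import math
import random

def exp3_laplace(n_arms, T, gamma=0.1, eps=1.0, rng=random.Random(0)):
    """Sketch: EXP3 where each observed reward is perturbed with
    Laplace(1/eps) noise before the weight update (assumed construction,
    not the paper's exact mechanism)."""
    w = [1.0] * n_arms
    total_reward = 0.0
    for t in range(T):
        s = sum(w)
        probs = [(1 - gamma) * wi / s + gamma / n_arms for wi in w]
        # sample an arm from the mixed distribution
        u, acc, arm = rng.random(), 0.0, n_arms - 1
        for i, p in enumerate(probs):
            acc += p
            if u < acc:
                arm = i
                break
        reward = rng.random()  # stand-in for the environment's reward in [0,1]
        total_reward += reward
        # difference of two Exp(eps) variates is Laplace with scale 1/eps
        noisy = reward + rng.expovariate(eps) - rng.expovariate(eps)
        # importance-weighted estimate, clipped to keep the weights finite
        x_hat = max(min(noisy, 1.0), 0.0) / probs[arm]
        w[arm] *= math.exp(gamma * x_hat / n_arms)
    return total_reward
```

The clipping step keeps the estimate bounded but biases it slightly; an actual analysis has to account for both the noise and this bias.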


Thompson Sampling for Stochastic Bandits with Graph Feedback

AAAI Conferences

We present a simple set of algorithms based on Thompson Sampling for stochastic bandit problems with graph feedback. Thompson Sampling is generally applicable, without the need to construct complicated upper confidence bounds. As we show in this paper, it has excellent performance in problems with graph feedback, even when the graph structure itself is unknown and/or changing. We provide theoretical guarantees on the Bayesian regret of the algorithm, as well as extensive experimental results on real and simulated networks. More specifically, we tested our algorithms on power-law, planted-partition, and Erdős–Rényi graphs, as well as on graphs derived from Facebook and Flixster data, and show that they clearly outperform related methods that employ upper confidence bounds.
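The core mechanic can be sketched in a few lines: Beta-Bernoulli Thompson Sampling where playing an arm also reveals the rewards of its neighbours in the feedback graph. The Bernoulli reward model, known fixed graph, and uniform priors are simplifying assumptions; the paper also handles unknown and changing graphs.

```python
import random

def ts_graph_feedback(neighbors, true_means, T, rng=random.Random(0)):
    """Sketch: Beta-Bernoulli Thompson Sampling with graph feedback
    (assumed simplified setup: known fixed graph, Bernoulli rewards).
    neighbors[i] lists the arms whose rewards are revealed when i is played."""
    n = len(true_means)
    alpha, beta = [1] * n, [1] * n  # Beta(1, 1) priors on each arm's mean
    pulls = []
    for _ in range(T):
        # sample a mean from each posterior and play the argmax
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(n)]
        arm = max(range(n), key=lambda i: samples[i])
        pulls.append(arm)
        # observe the played arm and every neighbour in the feedback graph
        for j in {arm} | set(neighbors[arm]):
            r = 1 if rng.random() < true_means[j] else 0
            alpha[j] += r
            beta[j] += 1 - r
    return pulls
```

The only change from vanilla Thompson Sampling is the inner loop: side observations update the posteriors of neighbouring arms for free, which is what makes denser graphs easier.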


Algorithms for Differentially Private Multi-Armed Bandits

AAAI Conferences

We present differentially private algorithms for the stochastic Multi-Armed Bandit (MAB) problem. This problem arises in applications such as adaptive clinical trials, experiment design, and user-targeted advertising, where private information is connected to individual rewards. Our major contribution is to show that there exist (ε, δ)-differentially private variants of Upper Confidence Bound algorithms with optimal regret, O(ε^{-1} + log T). This is a significant improvement over previous results, which only achieve poly-log regret O(ε^{-2} log^3 T), and is made possible by our use of a novel interval-based mechanism. We also substantially improve the bounds of a previous family of algorithms that use a continual release mechanism. Experiments clearly validate our theoretical bounds.
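For intuition, here is a minimal sketch of one step of a privatized UCB index: Laplace noise is added to each arm's reward sum before computing the index. This naive per-step noising is only an illustration of private index computation; it is not the paper's interval-based mechanism, and all names and parameters are assumptions.

```python
import math
import random

def dp_ucb_step(counts, sums, t, eps, rng=random.Random(0)):
    """Sketch: a UCB index computed from Laplace-noised reward sums
    (illustrative only; not the paper's interval-based mechanism).
    counts[i] and sums[i] are arm i's pull count and total reward."""
    indices = []
    for n, s in zip(counts, sums):
        # difference of two Exp(eps) variates is Laplace with scale 1/eps
        noise = rng.expovariate(eps) - rng.expovariate(eps)
        mean = (s + noise) / n
        bonus = math.sqrt(2 * math.log(t) / n)  # standard UCB exploration term
        indices.append(mean + bonus)
    return max(range(len(counts)), key=lambda i: indices[i])
```

Noising every step like this wastes privacy budget over T rounds; mechanisms such as the interval-based one in the paper release statistics far less often, which is how they avoid the ε^{-2} poly-log overhead.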


The Reinforcement Learning Competition 2014

AI Magazine

Reinforcement learning is one of the most general problems in artificial intelligence. It has been used to model problems in automated experiment design, control, economics, game playing, scheduling and telecommunications. The aim of the reinforcement learning competition is to encourage the development of very general learning agents for arbitrary reinforcement learning problems and to provide a test-bed for the unbiased evaluation of algorithms.

