Goto

Collaborating Authors

 Search


Exponential Weights on the Hypercube in Polynomial Time

arXiv.org Machine Learning

We study a general online linear optimization problem(OLO). At each round, a subset of objects from a fixed universe of $n$ objects is chosen, and a linear cost associated with the chosen subset is incurred. We use \textit{regret} as a measure of performance of our algorithms. Regret is the difference between the total cost incurred over all iterations and the cost of the best fixed subset in hindsight. We consider \textit{Full Information}, \textit{Semi-Bandit} and \textit{Bandit} feedback for this problem. Using characteristic vectors of the subsets, this problem reduces to OLO on the $\{0,1\}^n$ hypercube. The Exp2 algorithm and its bandit variants are commonly used strategies for this problem. It was previously unknown if it is possible to run Exp2 on the hypercube in polynomial time. In this paper, we present a polynomial time algorithm called \textit{PolyExp} for OLO on the hypercube. We show that our algorithm is equivalent to both Exp2 on $\{0,1\}^n$ as well as Online Mirror Descent(OMD) with Entropic regularization on $[0,1]^n$ and Bernoulli Sampling. Under $L_\infty$ adversarial losses, in the Full Information case and Semi-Bandit case, analyzing Exp2 directly, gives an expected regret bound of $O(n^{3/2}\sqrt{T})$, whereas PolyExp yields a regret of $O(n\sqrt{T})$. In the Bandit case, analyzing Exp2 directly, gives an expected regret bound of $O(n^{2}\sqrt{T})$, whereas PolyExp yields a regret of $O(n^{3/2}\sqrt{T})$. This implies an improvement on Exp2's regret bound for these settings because of the equivalence. Moreover, PolyExp is minimax optimal in all the three settings as its regret bounds match the $L_\infty$ lowerbounds in \cite{audibert2011minimax}. Finally, we show how to use PolyExp on the $\{-1,+1\}^n$ hypercube, solving an open problem in \cite{bubeck2012towards}.


A survey on policy search algorithms for learning robot controllers in a handful of trials

arXiv.org Machine Learning

Most policy search algorithms require thousands of training episodes to find an effective policy, which is often infeasible with a physical robot. This survey article focuses on the extreme other end of the spectrum: how can a robot adapt with only a handful of trials (a dozen) and a few minutes? By analogy with the word "big-data", we refer to this challenge as "micro-data reinforcement learning". We show that a first strategy is to leverage prior knowledge on the policy structure (e.g., dynamic movement primitives), on the policy parameters (e.g., demonstrations), or on the dynamics (e.g., simulators). A second strategy is to create data-driven surrogate models of the expected reward (e.g., Bayesian optimization) or the dynamical model (e.g., model-based policy search), so that the policy optimizer queries the model instead of the real system. Overall, all successful micro-data algorithms combine these two strategies by varying the kind of model and prior knowledge. The current scientific challenges essentially revolve around scaling up to complex robots (e.g., humanoids), designing generic priors, and optimizing the computing time.


A Game-Based Approximate Verification of Deep Neural Networks with Provable Guarantees

arXiv.org Machine Learning

Despite the improved accuracy of deep neural networks, the discovery of adversarial examples has raised serious safety concerns. In this paper, we study two variants of pointwise robustness, the maximum safe radius problem, which for a given input sample computes the minimum distance to an adversarial example, and the feature robustness problem, which aims to quantify the robustness of individual features to adversarial perturbations. We demonstrate that, under the assumption of Lipschitz continuity, both problems can be approximated using finite optimisation by discretising the input space, and the approximation has provable guarantees, i.e., the error is bounded. We then show that the resulting optimisation problems can be reduced to the solution of two-player turn-based games, where the first player selects features and the second perturbs the image within the feature. While the second player aims to minimise the distance to an adversarial example, depending on the optimisation objective the first player can be cooperative or competitive. We employ an anytime approach to solve the games, in the sense of approximating the value of a game by monotonically improving its upper and lower bounds. The Monte Carlo tree search algorithm is applied to compute upper bounds for both games, and the Admissible A* and the Alpha-Beta Pruning algorithms are, respectively, used to compute lower bounds for the maximum safety radius and feature robustness games. When working on the upper bound of the maximum safe radius problem, our tool demonstrates competitive performance against existing adversarial example crafting algorithms. Furthermore, we show how our framework can be deployed to evaluate pointwise robustness of neural networks in safety-critical applications such as traffic sign recognition in self-driving cars.


Ranked Reward: Enabling Self-Play Reinforcement Learning for Combinatorial Optimization

arXiv.org Machine Learning

Adversarial self-play in two-player games has delivered impressive results when used with reinforcement learning algorithms that combine deep neural networks and tree search. Algorithms like AlphaZero and Expert Iteration learn tabula-rasa, producing highly informative training data on the fly. However, the self-play training strategy is not directly applicable to single-player games. Recently, several practically important combinatorial optimization problems, such as the traveling salesman problem and the bin packing problem, have been reformulated as reinforcement learning problems, increasing the importance of enabling the benefits of self-play beyond two-player games. We present the Ranked Reward (R2) algorithm which accomplishes this by ranking the rewards obtained by a single agent over multiple games to create a relative performance metric. Results from applying the R2 algorithm to instances of a two-dimensional bin packing problem show that it outperforms generic Monte Carlo tree search, heuristic algorithms and reinforcement learning algorithms not using ranked rewards.


Automated Machine Learning Hyperparameter Tuning in Python

#artificialintelligence

Tuning machine learning hyperparameters is a tedious yet crucial task, as the performance of an algorithm can be highly dependent on the choice of hyperparameters. Manual tuning takes time away from important steps of the machine learning pipeline like feature engineering and interpreting results. Grid and random search are hands-off, but require long run times because they waste time evaluating unpromising areas of the search space. Increasingly, hyperparameter tuning is done by automated methods that aim to find optimal hyperparameters in less time using an informed search with no manual effort necessary beyond the initial set-up. Bayesian optimization, a model-based method for finding the minimum of a function, has recently been applied to machine learning hyperparameter tuning, with results suggesting this approach can achieve better performance on the test set while requiring fewer iterations than random search. Moreover, there are now a number of Python libraries that make implementing Bayesian hyperparameter tuning simple for any machine learning model.


The 13th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment

AI Magazine

The 13th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE 2017) was held at the Snowbird Ski and Summer Resort in Little Cottonwod Canyon in the Wasatch Range of the Rock Mountains near Salt Lake County, Utah. Along with the main conference presentations, the meeting included two tutorials, three workshops, and invited keynotes. This report summarizes the main conference. It also includes contributions from the organizers of the three workshops.


An AI System Taught Itself How to Solve the Rubik's Cube in Just 44 Hours

#artificialintelligence

A self-taught artificial intelligence (AI) system called DeepCube has mastered solving the Rubik's Cube puzzle in just 44 hours without human intervention. The system's inventors have detailed their design in a paper titled'Solving the Rubik's Cube Without Human Knowledge'. "A generally intelligent agent must be able to teach itself how to solve problems in complex domains with minimal human supervision," write the paper's authors. "Indeed, if we're ever going to achieve a general, human-like machine intelligence, we'll have to develop systems that can learn and then apply those learnings to real-world applications." While many AI systems have been taught to play games, mastering the complexity of a Rubik's Cube posed a unique set of challenges.


Extending Classical Planning with State Constraints: Heuristics and Search for Optimal Planning

Journal of Artificial Intelligence Research

We present a principled way of extending a classical AI planning formalism with systems of state constraints, which relate - sometimes determine - the values of variables in each state traversed by the plan. This extension occupies an attractive middle ground between expressivity and complexity. It enables modelling a new range of problems, as well as formulating more efficient models of classical planning problems. An example of the former is planning-based control of networked physical systems - power networks, for example - in which a local, discrete control action can have global effects on continuous quantities, such as altering flows across the entire network. At the same time, our extension remains decidable as long as the satisfiability of sets of state constraints is decidable, including in the presence of numeric state variables, and we demonstrate that effective techniques for cost-optimal planning known in the classical setting - in particular, relaxation-based admissible heuristics - can be adapted to the extended formalism. In this paper, we apply our approach to constraints in the form of linear or non-linear equations over numeric state variables, but the approach is independent of the type of state constraints, as long as there exists a procedure that decides their consistency. The planner and the constraint solver interact through a well-defined, narrow interface, in which the solver requires no specialisation to the planning context.


AI in Game Playing: Sokoban Solver

arXiv.org Artificial Intelligence

Artificial Intelligence is becoming instrumental in a variety of applications. Games serve as a good breeding ground for trying and testing these algorithms in a sandbox with simpler constraints in comparison to real life. In this project, we aim to develop an AI agent that can solve the classical Japanese game of Sokoban using various algorithms and heuristics and compare their performances through standard metrics. In this progress report, we delve deeper into the ideas furnished in our proposal by (1) pointing out the game mechanics in a straightforward manner (2) describing the states and explaining how it is modeled in our algorithms (3) detailing our algorithms to calculate the best moves for the levels of the game (4) defining proper pruning techniques that are implemented to ameliorate the performance of the algorithm (5) providing results and comparing the developed algorithm using standard metrics of evaluation. We conclude with the next steps we plan to take up to finish this project.


Multi-atomic Annealing Heuristic for Static Dial-a-ride Problem

arXiv.org Artificial Intelligence

Dial-a-ride problem (DARP) deals with the transportation of users between pickup and drop-off locations associated with specified time windows. This paper proposes a novel algorithm called multi-atomic annealing (MATA) to solve static dial-a-ride problem. Two new local search operators (burn and reform), a new construction heuristic and two request sequencing mechanisms (Sorted List and Random List) are developed. Computational experiments conducted on various standard DARP test instances prove that MATA is an expeditious meta-heuristic in contrast to other existing methods. In all experiments, MATA demonstrates the capability to obtain high quality solutions, faster convergence, and quicker attainment of a first feasible solution. It is observed that MATA attains a first feasible solution 29.8 to 65.1% faster, and obtains a final solution that is 3.9 to 5.2% better, when compared to other algorithms within 60 sec.