Perception-in-the-Loop Adversarial Examples

arXiv.org Machine Learning

We present a scalable, black box, perception-in-the-loop technique to find adversarial examples for deep neural network classifiers. Black box means that our procedure only has input-output access to the classifier, and not to the internal structure, parameters, or intermediate confidence values. Perception-in-the-loop means that the notion of proximity between inputs can be directly queried from human participants rather than an arbitrarily chosen metric. Our technique is based on covariance matrix adaptation evolution strategy (CMA-ES), a black box optimization approach. CMA-ES explores the search space iteratively in a black box manner, by generating populations of candidates according to a distribution, choosing the best candidates according to a cost function, and updating the posterior distribution to favor the best candidates. We run CMA-ES using human participants to provide the fitness function, using the insight that the choice of best candidates in CMA-ES can be naturally modeled as a perception task: pick the top $k$ inputs perceptually closest to a fixed input. We empirically demonstrate that finding adversarial examples is feasible using small populations and few iterations. We compare the performance of CMA-ES on the MNIST benchmark with other black-box approaches using $L_p$ norms as a cost function, and show that it performs favorably both in terms of success in finding adversarial examples and in minimizing the distance between the original and the adversarial input. In experiments on the MNIST, CIFAR10, and GTSRB benchmarks, we demonstrate that CMA-ES can find perceptually similar adversarial inputs with a small number of iterations and small population sizes when using perception-in-the-loop. Finally, we show that networks trained specifically to be robust against $L_\infty$ norm can still be susceptible to perceptually similar adversarial examples.
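
The generate, select, update loop described in this abstract can be sketched as follows. This is a deliberately simplified evolution strategy (it adapts only a mean and a per-coordinate standard deviation, not the full covariance matrix of CMA-ES), and it is not the authors' implementation. The names `simple_es`, `top_k_by_l2`, and `is_adversarial` are hypothetical; `top_k_by_l2` is an L2-distance stand-in for the step where human participants would pick the top k candidates.

```python
import numpy as np

def top_k_by_l2(candidates, reference, k):
    """Stand-in for the perception task: indices of the k candidates
    closest to the reference input (here measured by L2 distance)."""
    dists = np.linalg.norm(candidates - reference, axis=1)
    return np.argsort(dists)[:k]

def simple_es(reference, is_adversarial, select_top_k,
              pop_size=20, k=5, iters=30, sigma=0.5, seed=0):
    """Simplified evolution strategy (assumption: diagonal Gaussian only)."""
    rng = np.random.default_rng(seed)
    mean = reference.astype(float).copy()
    std = np.full(reference.shape, sigma)
    best = None
    for _ in range(iters):
        # 1. Generate a population from the current distribution.
        pop = mean + std * rng.standard_normal((pop_size, reference.size))
        # 2. Keep only candidates the black-box classifier gets wrong.
        adv = np.array([x for x in pop if is_adversarial(x)])
        if len(adv) == 0:
            std = std * 1.5  # nothing found: widen the search
            continue
        # 3. Choose the best candidates (the step a human could perform).
        elite = adv[select_top_k(adv, reference, min(k, len(adv)))]
        if best is None or (np.linalg.norm(elite[0] - reference)
                            < np.linalg.norm(best - reference)):
            best = elite[0].copy()
        # 4. Update the distribution to favor the best candidates.
        mean = elite.mean(axis=0)
        std = elite.std(axis=0) + 1e-3
    return best

# Toy black-box "classifier": the label flips when the coordinates sum above 1.
reference = np.zeros(4)
best = simple_es(reference, lambda x: x.sum() > 1.0, top_k_by_l2)
```

Because selection is passed in as a function, the same loop would accept any ranking source, including answers collected from people, without changes to the search itself.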


Active Learning with Gaussian Processes for High Throughput Phenotyping

arXiv.org Machine Learning

A looming question that must be solved before robotic plant phenotyping capabilities can have significant impact on crop improvement programs is scalability. High Throughput Phenotyping (HTP) uses robotic technologies to analyze crops in order to determine species with favorable traits. However, current practices rely on exhaustive coverage and data collection from the entire crop field being monitored under the breeding experiment. This works well in relatively small agricultural fields but cannot be scaled to larger ones, thus limiting the progress of genetics research. In this work, we propose an active learning algorithm to enable an autonomous system to collect the most informative samples in order to accurately learn the distribution of phenotypes in the field with the help of a Gaussian Process model. We demonstrate the superior performance of our proposed algorithm compared to the current practices on sorghum phenotype data collection.
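
One common way to choose "informative" samples with a Gaussian Process is to measure next wherever the posterior variance is largest. The abstract does not say which criterion the authors use, so the sketch below is only an illustration of that general idea. The function names, the RBF kernel with unit prior variance, and the one-dimensional field positions are all assumptions.

```python
import numpy as np

def rbf(a, b, length=1.0):
    """RBF kernel matrix between 1-D position arrays a and b."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * d2 / length ** 2)

def posterior_variance(x_obs, x_cand, length=1.0, noise=1e-3):
    """GP posterior variance at candidate positions, given observed positions.
    Uses var = k(x, x) - k_*^T (K + noise * I)^{-1} k_* with k(x, x) = 1."""
    K = rbf(x_obs, x_obs, length) + noise * np.eye(len(x_obs))
    Ks = rbf(x_obs, x_cand, length)
    solved = np.linalg.solve(K, Ks)
    return 1.0 - np.sum(Ks * solved, axis=0)

def next_sample(x_obs, x_cand, **kwargs):
    """Pick the candidate position where the model is most uncertain."""
    return x_cand[np.argmax(posterior_variance(x_obs, x_cand, **kwargs))]

# Plots at positions 0 and 4 are already measured; choose the next plot.
x_obs = np.array([0.0, 4.0])
x_cand = np.array([1.0, 2.0, 3.5])
chosen = next_sample(x_obs, x_cand)
```

Note that the variance depends only on where measurements were taken, not on the measured values, so this criterion spreads samples across the field rather than exhaustively covering it.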


Practical Algorithms for Multi-Stage Voting Rules with Parallel Universes Tiebreaking

arXiv.org Artificial Intelligence

STV and ranked pairs (RP) are two well-studied voting rules for group decision-making. They proceed in multiple rounds, and are affected by how ties are broken in each round. However, the literature is surprisingly vague about how ties should be broken. We propose the first algorithms for computing the set of alternatives that are winners under some tiebreaking mechanism under STV and RP, which is also known as parallel-universes tiebreaking (PUT). Unfortunately, PUT-winners are NP-complete to compute under STV and RP, and standard search algorithms from AI do not apply. We propose multiple DFS-based algorithms along with pruning strategies, heuristics, sampling and machine learning to prioritize search direction to significantly improve the performance. We also propose novel ILP formulations for PUT-winners under STV and RP, respectively. Experiments on synthetic and real-world data show that our algorithms are overall faster than ILP.
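
To make the parallel-universes idea concrete, here is a minimal sketch for single-winner STV by successive elimination: whenever several alternatives tie for the lowest plurality score, the search branches on each possible elimination and collects every alternative that wins in some branch. It is a plain depth-first search with memoization on the set of remaining alternatives, not the authors' algorithms or their pruning strategies; the function name and profile format are assumptions.

```python
def put_stv_winners(profile):
    """Return all alternatives that win STV under some tiebreaking order.

    profile: list of rankings, each a tuple of all alternatives,
             most preferred first.
    """
    winners = set()
    visited = set()  # remaining-sets already explored (simple memoization)

    def dfs(remaining):
        if remaining in visited:
            return
        visited.add(remaining)
        if len(remaining) == 1:
            winners.update(remaining)
            return
        # Plurality score: how many voters rank each alternative first
        # among those still remaining.
        scores = {a: 0 for a in remaining}
        for ranking in profile:
            top = next(a for a in ranking if a in remaining)
            scores[top] += 1
        lowest = min(scores.values())
        # Branch on every alternative tied for elimination.
        for a in remaining:
            if scores[a] == lowest:
                dfs(remaining - {a})

    dfs(frozenset(profile[0]))
    return winners

# A three-way cycle: every alternative wins under some tiebreaking order.
cycle = [("a", "b", "c"), ("b", "c", "a"), ("c", "a", "b")]
all_winners = put_stv_winners(cycle)
```

In the worst case the number of reachable remaining-sets is exponential in the number of alternatives, which is consistent with the hardness result stated in the abstract.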


Distance-Guided GA-Based Approach to Distributed Data-Intensive Web Service Composition

arXiv.org Artificial Intelligence

Distributed computing, which uses Web services as fundamental elements, enables high-speed development of software applications through composing many interoperating, distributed, re-usable, and autonomous services. As a fundamental challenge for service developers, service composition must fulfil functional requirements and optimise Quality of Service (QoS) attributes simultaneously. On the other hand, huge amounts of data have been created by advances in technologies, which may be exchanged between services. Data-intensive Web services are of great interest to implement data-intensive processes. However, current approaches to Web service composition have omitted either the effect of data or the distribution of services. Evolutionary Computing (EC) techniques allow for the creation of compositions that meet all the above factors. In this paper, we will develop a Genetic Algorithm (GA)-based approach for solving the problem of distributed data-intensive Web service composition (DWSC). In particular, we will introduce two new heuristics, i.e. Longest Common Subsequence (LCS) distance of services, in designing crossover operators. Additionally, a new local search technique incorporating distance of services will be proposed.
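
For readers unfamiliar with the Longest Common Subsequence, the sketch below computes its length for two sequences with the standard dynamic program. The abstract does not define how the LCS is turned into a distance between services, so `lcs_distance` is a hypothetical normalisation added purely for illustration, as are the service names in the example.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def lcs_distance(a, b):
    """Hypothetical distance in [0, 1]: 0 for identical sequences,
    1 when no element is shared. Not taken from the paper."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return 1.0 - lcs_length(a, b) / longest

# Two compositions written as ordered lists of (made-up) service names.
comp_1 = ["s1", "s2", "s3"]
comp_2 = ["s1", "s3"]
d = lcs_distance(comp_1, comp_2)  # shares the subsequence s1, s3
```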


Ordinal Monte Carlo Tree Search

arXiv.org Artificial Intelligence

In many problem settings, most notably in game playing, an agent receives a possibly delayed reward for its actions. Often, those rewards are handcrafted and not naturally given. Even simple terminal-only rewards, like winning equals 1 and losing equals -1, cannot be seen as an unbiased statement, since these values are chosen arbitrarily, and the behavior of the learner may change with different encodings, such as setting the value of a loss to -0.5, which is often done in practice to encourage learning. It is hard to argue about good rewards and the performance of an agent often depends on the design of the reward signal. In particular, in domains where states by nature only have an ordinal ranking and where meaningful distance information between game state values is not available, a numerical reward signal is necessarily biased. In this paper, we take a look at Monte Carlo Tree Search (MCTS), a popular algorithm to solve MDPs, highlight a recurring problem concerning its use of rewards, and show that an ordinal treatment of the rewards overcomes this problem. Using the General Video Game Playing framework we show a dominance of our newly proposed ordinal MCTS algorithm over preference-based MCTS, vanilla MCTS and various other MCTS variants.


Robust and Adaptive Planning under Model Uncertainty

arXiv.org Artificial Intelligence

Planning under model uncertainty is a fundamental problem across many applications of decision making and learning. In this paper, we propose the Robust Adaptive Monte Carlo Planning (RAMCP) algorithm, which allows computation of risk-sensitive Bayes-adaptive policies that optimally trade off exploration, exploitation, and robustness. RAMCP formulates the risk-sensitive planning problem as a two-player zero-sum game, in which an adversary perturbs the agent's belief over the models. We introduce two versions of the RAMCP algorithm. The first, RAMCP-F, converges to an optimal risk-sensitive policy without having to rebuild the search tree as the underlying belief over models is perturbed. The second version, RAMCP-I, improves computational efficiency at the cost of losing theoretical guarantees, but is shown to yield empirical results comparable to RAMCP-F. RAMCP is demonstrated on an n-pull multi-armed bandit problem, as well as a patient treatment scenario.


Text line Segmentation in Compressed Representation of Handwritten Document using Tunneling Algorithm

arXiv.org Artificial Intelligence

In this research work, we perform text line segmentation directly in the compressed representation of an unconstrained handwritten document image. To this end, we make use of text line terminal points, which is the current state of the art. The terminal points spotted along both margins (left and right) of a document image for every text line are considered as source and target, respectively. The tunneling algorithm uses a single agent (or robot) to identify the coordinate positions in the compressed representation to perform text-line segmentation of the document. The agent starts at a source point and progressively tunnels a path routing in between two adjacent text lines and reaches the probable target. The agent's navigation path from source to the target bypassing obstacles, if any, results in segregating the two adjacent text lines. However, the target point is known only when the agent reaches the destination; this applies to all source points, and hence we can analyze the correspondence between source and target nodes. Artificial Intelligence in Expert systems, dynamic programming and greedy strategies are employed for every search space while tunneling. Exhaustive experimentation is carried out on various benchmark datasets including ICDAR13, and the performance is reported.


From exploration to control: learning object manipulation skills through novelty search and local adaptation

arXiv.org Artificial Intelligence

Programming a robot to deal with open-ended tasks remains a challenge, in particular if the robot has to manipulate objects. Launching, grasping, pushing or any other object interaction can be simulated but the corresponding models are not reversible and the robot behavior thus cannot be directly deduced. These behaviors are hard to learn without a demonstration as the search space is large and the reward sparse. We propose a method to autonomously generate a diverse repertoire of simple object interaction behaviors in simulation. Our goal is to bootstrap a robot learning and development process with limited information about what the robot has to achieve and how. This repertoire can be exploited to solve different tasks in reality thanks to a proposed adaptation method or could be used as a training set for data-hungry algorithms. The proposed approach relies on the definition of a goal space and generates a repertoire of trajectories to reach attainable goals, thus allowing the robot to control this goal space. The repertoire is built with an off-the-shelf simulation thanks to a quality diversity algorithm. The result is a set of solutions tested in simulation only. This may lead to two different problems: (1) as the repertoire is discrete and finite, it may not contain the trajectory to deal with a given situation or (2) some trajectories may lead to a behavior in reality that differs from simulation because of a reality gap. We propose an approach to deal with both issues by using a local linearization between the motion parameters and the observed effects. Furthermore, we present an approach to update the existing solution repertoire with the tests done on the real robot. The approach has been validated in two different experiments on the Baxter robot: a ball-launching task and a joystick manipulation task.


Learning Loop Invariants for Program Verification

Neural Information Processing Systems

A fundamental problem in program verification concerns inferring loop invariants. The problem is undecidable and even practical instances are challenging. Inspired by how human experts construct loop invariants, we propose a reasoning framework Code2Inv that constructs the solution by multi-step decision making and querying an external program graph memory block. By training with reinforcement learning, Code2Inv captures rich program features and avoids the need for ground truth solutions as supervision. Compared to previous learning tasks in domains with graph-structured data, it addresses unique challenges, such as a binary objective function and an extremely sparse reward that is given by an automated theorem prover only after the complete loop invariant is proposed. We evaluate Code2Inv on a suite of 133 benchmark problems and compare it to three state-of-the-art systems. It solves 106 problems compared to 73 by a stochastic search-based system, 77 by a heuristic search-based system, and 100 by a decision tree learning-based system. Moreover, the strategy learned can be generalized to new programs: compared to solving new instances from scratch, the pre-trained agent is more sample efficient in finding solutions.


Simple random search of static linear policies is competitive for reinforcement learning

Neural Information Processing Systems

Model-free reinforcement learning aims to offer off-the-shelf solutions for controlling dynamical systems without requiring models of the system dynamics. We introduce a model-free random search algorithm for training static, linear policies for continuous control problems. Common evaluation methodology shows that our method matches state-of-the-art sample efficiency on the benchmark MuJoCo locomotion tasks. Nonetheless, more rigorous evaluation reveals that the assessment of performance on these benchmarks is optimistic. We evaluate the performance of our method over hundreds of random seeds and many different hyperparameter configurations for each benchmark task. This extensive evaluation is possible because of the small computational footprint of our method. Our simulations highlight a high variability in performance in these benchmark tasks, indicating that commonly used estimations of sample efficiency do not adequately evaluate the performance of RL algorithms. Our results stress the need for new baselines, benchmarks and evaluation methodology for RL algorithms.
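
As an illustration of what random search over a static linear policy can look like, the sketch below perturbs a policy matrix in random directions, compares the reward at the two perturbed points, and steps along the reward-weighted directions. The abstract does not spell out the update rule, so this basic finite-difference form, the hyperparameter values, and the toy quadratic `reward_fn` (a stand-in for a simulator rollout return) are all assumptions.

```python
import numpy as np

def random_search(reward_fn, shape, n_dirs=8, step=0.5, noise=0.1,
                  iters=100, seed=0):
    """Basic random search over a linear policy matrix M (action = M @ state).

    reward_fn: maps a policy matrix to a scalar return; in practice this
               would run one or more rollouts in the environment.
    """
    rng = np.random.default_rng(seed)
    M = np.zeros(shape)
    for _ in range(iters):
        deltas = rng.standard_normal((n_dirs, *shape))
        update = np.zeros(shape)
        for d in deltas:
            # Evaluate the policy perturbed in both directions.
            r_plus = reward_fn(M + noise * d)
            r_minus = reward_fn(M - noise * d)
            # Directions that improved the reward get positive weight.
            update += (r_plus - r_minus) * d
        M += step / n_dirs * update
    return M

# Toy stand-in for an environment: reward peaks at a known target matrix.
target = np.array([[1.0, -2.0, 0.5],
                   [0.0, 3.0, -1.0]])
learned = random_search(lambda M: -np.sum((M - target) ** 2), target.shape)
```

Each iteration needs only `2 * n_dirs` reward evaluations and no gradients, which is the kind of small computational footprint that makes evaluation over many seeds and hyperparameter settings affordable.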