Goto

Collaborating Authors

 Reinforcement Learning


Reinforcement Learning for Relation Classification from Noisy Data

arXiv.org Machine Learning

Existing relation classification methods that rely on distant supervision assume that a bag of sentences mentioning an entity pair are all describing a relation for the entity pair. Such methods, performing classification at the bag level, cannot identify the mapping between a relation and a sentence, and largely suffers from the noisy labeling problem. In this paper, we propose a novel model for relation classification at the sentence level from noisy data. The model has two modules: an instance selector and a relation classifier. The instance selector chooses high-quality sentences with reinforcement learning and feeds the selected sentences into the relation classifier, and the relation classifier makes sentence level prediction and provides rewards to the instance selector. The two modules are trained jointly to optimize the instance selection and relation classification processes. Experiment results show that our model can deal with the noise of data effectively and obtains better performance for relation classification at the sentence level.


LIFT: Reinforcement Learning in Computer Systems by Learning From Demonstrations

arXiv.org Machine Learning

Reinforcement learning approaches have long appealed to the data management community due to their ability to learn to control dynamic behavior from raw system performance. Recent successes in combining deep neural networks with reinforcement learning have sparked significant new interest in this domain. However, practical solutions remain elusive due to large training data requirements, algorithmic instability, and lack of standard tools. In this work, we introduce LIFT, an end-to-end software stack for applying deep reinforcement learning to data management tasks. While prior work has frequently explored applications in simulations, LIFT centers on utilizing human expertise to learn from demonstrations, thus lowering online training times. We further introduce TensorForce, a TensorFlow library for applied deep reinforcement learning exposing a unified declarative interface to common RL algorithms, thus providing a backend to LIFT. We demonstrate the utility of LIFT in two case studies in database compound indexing and resource management in stream processing. Results show LIFT controllers initialized from demonstrations can outperform human baselines and heuristics across latency metrics and space usage by up to 70%.


Proximal Policy Optimization and its Dynamic Version for Sequence Generation

arXiv.org Machine Learning

In sequence generation task, many works use policy gradient for model optimization to tackle the intractable backpropagation issue when maximizing the non-differentiable evaluation metrics or fooling the discriminator in adversarial learning. In this paper, we replace policy gradient with proximal policy optimization (PPO), which is a proved more efficient reinforcement learning algorithm, and propose a dynamic approach for PPO (PPO-dynamic). We demonstrate the efficacy of PPO and PPO-dynamic on conditional sequence generation tasks including synthetic experiment and chit-chat chatbot. The results show that PPO and PPO-dynamic can beat policy gradient by stability and performance.


Data Poisoning Attacks in Contextual Bandits

arXiv.org Machine Learning

We study offline data poisoning attacks in contextual bandits, a class of reinforcement learning problems with important applications in online recommendation and adaptive medical treatment, among others. We provide a general attack framework based on convex optimization and show that by slightly manipulating rewards in the data, an attacker can force the bandit algorithm to pull a target arm for a target contextual vector. The target arm and target contextual vector are both chosen by the attacker. That is, the attacker can hijack the behavior of a contextual bandit. We also investigate the feasibility and the side effects of such attacks, and identify future directions for defense. Experiments on both synthetic and real-world data demonstrate the efficiency of the attack algorithm.


Artificial General Intelligence Is Here, and Impala Is Its Name - ExtremeTech

#artificialintelligence

However even these reinforcement learning algorithms couldn't transfer what they'd learned about one task to acquiring a new task. In order to realize this achievement, DeepMind supercharged a reinforcement learning algorithm called A3C. In so-called actor-critic reinforcement learning, of which A3C is one variety, acting and learning are decoupled so that one neural network, the critic, evaluates the other, the actor. Together, they drive the learning process. This was already the state of the art, but DeepMind added a new off-policy correction algorithm called V-trace to the mix, which made the learning more efficient, and crucially, better able to achieve positive transfer between tasks.


Why does AI stink at certain video games? Researchers made one play Ms. Pac-Man to find out

#artificialintelligence

STOCKHOLM--Artificial intelligence (AI) can kick butt in games such as Pong and Space Invaders, but it comes off like a common n00b when playing Ms. Pac-Man (pictured). Now, by making AI play six classic arcade games, researchers are closer to figuring out why thinking machines excel at some games and stink at others, they reported last month at the International Conference on Machine Learning here. The team developed a new system for visualizing how Atari-playing AIs operate. They chose Atari because the games are relatively simple and a frequent focus for researchers developing "reinforcement learning" algorithms, AIs that learn behaviors through trial and error. An AI "sees" the screen (as an input of ones and zeroes) and at first randomly responds with commands for "left," "right," "fire," and so on, slowly shaping its strategy as it receives points for certain actions.


Learning to Dialogue via Complex Hindsight Experience Replay

arXiv.org Artificial Intelligence

Reinforcement learning methods have been used for learning dialogue policies from the experience of conversations. However, learning an effective dialogue policy frequently requires prohibitively many conversations. This is partly because of the sparse rewards in dialogues, and the relatively small number of successful dialogues in early learning phase. Hindsight experience replay (HER) enables an agent to learn from failure, but the vanilla HER is inapplicable to dialogue domains due to dialogue goals being implicit (c.f., explicit goals in manipulation tasks). In this work, we develop two complex HER methods providing different trade-offs between complexity and performance. Experiments were conducted using a realistic user simulator. Results suggest that our HER methods perform better than standard and prioritized experience replay methods (as applied to deep Q-networks) in learning rate, and that our two complex HER methods can be combined to produce the best performance.


Improving Search through A3C Reinforcement Learning based Conversational Agent

arXiv.org Artificial Intelligence

We develop a reinforcement learning based search assistant which can assist users through a set of actions and sequence of interactions to enable them realize their intent. Our approach caters to subjective search where the user is seeking digital assets such as images which is fundamentally different from the tasks which have objective and limited search modalities. Labeled conversational data is generally not available in such search tasks and training the agent through human interactions can be time consuming. We propose a stochastic virtual user which impersonates a real user and can be used to sample user behavior efficiently to train the agent which accelerates the bootstrapping of the agent. We develop A3C algorithm based context preserving architecture which enables the agent to provide contextual assistance to the user. We compare the A3C agent with Q-learning and evaluate its performance on average rewards and state values it obtains with the virtual user in validation episodes. Our experiments show that the agent learns to achieve higher rewards and better states.


Reinforcement Learning for Autonomous Defence in Software-Defined Networking

arXiv.org Artificial Intelligence

Despite the successful application of machine learning (ML) in a wide range of domains, adaptability---the very property that makes machine learning desirable---can be exploited by adversaries to contaminate training and evade classification. In this paper, we investigate the feasibility of applying a specific class of machine learning algorithms, namely, reinforcement learning (RL) algorithms, for autonomous cyber defence in software-defined networking (SDN). In particular, we focus on how an RL agent reacts towards different forms of causative attacks that poison its training process, including indiscriminate and targeted, white-box and black-box attacks. In addition, we also study the impact of the attack timing, and explore potential countermeasures such as adversarial training.


Importance mixing: Improving sample reuse in evolutionary policy search methods

arXiv.org Machine Learning

Deep neuroevolution, that is evolutionary policy search methods based on deep neural networks, have recently emerged as a competitor to deep reinforcement learning algorithms due to their better parallelization capabilities. However, these methods still suffer from a far worse sample efficiency. In this paper we investigate whether a mechanism known as "importance mixing" can significantly improve their sample efficiency. We provide a didactic presentation of importance mixing and we explain how it can be extended to reuse more samples. Then, from an empirical comparison based on a simple benchmark, we show that, though it actually provides better sample efficiency, it is still far from the sample efficiency of deep reinforcement learning, though it is more stable.