AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

Reinforcement Learning for Relation Classification from Noisy Data

Feng, Jun, Huang, Minlie, Zhao, Li, Yang, Yang, Zhu, Xiaoyan

arXiv.org Machine Learning, Aug-24-2018

Existing relation classification methods that rely on distant supervision assume that a bag of sentences mentioning an entity pair all describe a relation for that entity pair. Such methods, performing classification at the bag level, cannot identify the mapping between a relation and a sentence, and largely suffer from the noisy labeling problem. In this paper, we propose a novel model for relation classification at the sentence level from noisy data. The model has two modules: an instance selector and a relation classifier. The instance selector chooses high-quality sentences with reinforcement learning and feeds the selected sentences into the relation classifier, and the relation classifier makes sentence-level predictions and provides rewards to the instance selector. The two modules are trained jointly to optimize the instance selection and relation classification processes. Experimental results show that our model can deal with the noise of data effectively and obtains better performance for relation classification at the sentence level.
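As a rough illustration of the selector/classifier loop described in this abstract, the sketch below treats the instance selector as a Bernoulli (keep or drop) policy over sentence feature vectors and updates it with a plain REINFORCE step. The logistic policy, the feature vectors, the function names, and the scalar `reward` argument are all assumptions made for illustration. In the paper's setup the reward would come from the relation classifier, and the paper's actual architecture is not reproduced here.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def select_bag(weights, bag, rng):
    """Sample a keep (1) or drop (0) action for every sentence in a bag.

    `bag` is a list of hypothetical feature vectors, one per sentence.
    Returns the sampled actions and the keep-probabilities used.
    """
    probs = [sigmoid(sum(w * x for w, x in zip(weights, feats))) for feats in bag]
    actions = [1 if rng.random() < p else 0 for p in probs]
    return actions, probs


def reinforce_update(weights, bag, actions, probs, reward, lr=0.1):
    """One REINFORCE step: w += lr * reward * grad log pi(action | sentence).

    For a Bernoulli policy with p = sigmoid(w . x), the gradient of the
    log-probability of the sampled action is (action - p) * x.
    `reward` is assumed to be supplied by a separate relation classifier.
    """
    new_weights = list(weights)
    for feats, a, p in zip(bag, actions, probs):
        for i, x in enumerate(feats):
            new_weights[i] += lr * reward * (a - p) * x
    return new_weights
```

With all-zero weights every sentence is kept with probability 0.5, and a positive reward moves the weights toward the features of the sentences that were kept.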

machine learning, natural language, reinforcement learning, (17 more...)

1808.08013

Country:

Europe > France (0.05)
Asia > Japan > Shikoku > Kagawa Prefecture > Takamatsu (0.04)
North America > United States > New York (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre:

Research Report > Promising Solution (0.34)
Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

LIFT: Reinforcement Learning in Computer Systems by Learning From Demonstrations

Schaarschmidt, Michael, Kuhnle, Alexander, Ellis, Ben, Fricke, Kai, Gessert, Felix, Yoneki, Eiko

arXiv.org Machine Learning, Aug-23-2018

Reinforcement learning approaches have long appealed to the data management community due to their ability to learn to control dynamic behavior from raw system performance. Recent successes in combining deep neural networks with reinforcement learning have sparked significant new interest in this domain. However, practical solutions remain elusive due to large training data requirements, algorithmic instability, and lack of standard tools. In this work, we introduce LIFT, an end-to-end software stack for applying deep reinforcement learning to data management tasks. While prior work has frequently explored applications in simulations, LIFT centers on utilizing human expertise to learn from demonstrations, thus lowering online training times. We further introduce TensorForce, a TensorFlow library for applied deep reinforcement learning exposing a unified declarative interface to common RL algorithms, thus providing a backend to LIFT. We demonstrate the utility of LIFT in two case studies in database compound indexing and resource management in stream processing. Results show LIFT controllers initialized from demonstrations can outperform human baselines and heuristics across latency metrics and space usage by up to 70%.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

1808.07903

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > District of Columbia > Washington (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report > New Finding (0.48)

Industry:

Education > Educational Setting > Online (0.69)
Leisure & Entertainment > Games (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Proximal Policy Optimization and its Dynamic Version for Sequence Generation

Tuan, Yi-Lin, Zhang, Jinzhi, Li, Yujia, Lee, Hung-yi

arXiv.org Machine Learning, Aug-23-2018

In sequence generation tasks, many works use policy gradient for model optimization to tackle the intractable backpropagation issue when maximizing non-differentiable evaluation metrics or fooling the discriminator in adversarial learning. In this paper, we replace policy gradient with proximal policy optimization (PPO), which has proved to be a more efficient reinforcement learning algorithm, and propose a dynamic approach for PPO (PPO-dynamic). We demonstrate the efficacy of PPO and PPO-dynamic on conditional sequence generation tasks, including a synthetic experiment and a chit-chat chatbot. The results show that PPO and PPO-dynamic outperform policy gradient in both stability and performance.
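For readers unfamiliar with PPO, the sketch below computes the standard clipped surrogate objective from the original PPO formulation, given per-step probability ratios (new policy over old policy) and advantage estimates. It is not the PPO-dynamic variant proposed in the paper, whose details are not given in this abstract. The function name and the default `eps=0.2` are illustrative choices.

```python
def ppo_clip_objective(ratios, advantages, eps=0.2):
    """Average clipped surrogate objective (to be maximized).

    For each step: min(r * A, clip(r, 1 - eps, 1 + eps) * A),
    where r is the probability ratio and A the advantage estimate.
    """
    total = 0.0
    for r, a in zip(ratios, advantages):
        clipped = max(1.0 - eps, min(r, 1.0 + eps))
        total += min(r * a, clipped * a)
    return total / len(ratios)
```

The clipping removes the incentive to push the ratio far above `1 + eps` when the advantage is positive, or far below `1 - eps` when it is negative, which is the usual explanation for PPO's more stable updates compared with an unclipped policy gradient step.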

arxiv preprint arxiv, machine learning, reinforcement learning, (12 more...)

1808.07982

Country:

Asia > Taiwan (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
Europe > Bulgaria (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Data Poisoning Attacks in Contextual Bandits

Ma, Yuzhe, Jun, Kwang-Sung, Li, Lihong, Zhu, Xiaojin

arXiv.org Machine Learning, Aug-23-2018

We study offline data poisoning attacks in contextual bandits, a class of reinforcement learning problems with important applications in online recommendation and adaptive medical treatment, among others. We provide a general attack framework based on convex optimization and show that by slightly manipulating rewards in the data, an attacker can force the bandit algorithm to pull a target arm for a target contextual vector. The target arm and target contextual vector are both chosen by the attacker. That is, the attacker can hijack the behavior of a contextual bandit. We also investigate the feasibility and the side effects of such attacks, and identify future directions for defense. Experiments on both synthetic and real-world data demonstrate the efficiency of the attack algorithm.
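To make the idea of reward poisoning concrete, here is a deliberately simplified toy, not the paper's attack framework. It assumes scalar contexts, a greedy learner that fits one ridge regression per arm, an attacker who edits only the target arm's logged rewards, and a positive target context `x_star`. Under those assumptions the smallest L2 change to the rewards has a closed form (a projection onto a half-space), so no convex solver is needed. All names and parameters are hypothetical.

```python
def ridge_theta(xs, rs, lam=1.0):
    """Ridge regression estimate for one arm with scalar contexts."""
    return sum(x * r for x, r in zip(xs, rs)) / (sum(x * x for x in xs) + lam)


def poison_rewards(xs, rs, theta_other, x_star, margin=0.01, lam=1.0):
    """Minimal-L2 edit of the target arm's rewards so that the arm wins.

    After the edit, theta_target * x_star >= theta_other * x_star + margin.
    Assumes x_star > 0 and at least one nonzero context in xs.
    """
    needed_theta = theta_other + margin / x_star
    sxx = sum(x * x for x in xs)
    # How much the inner product <xs, rs> must grow to reach needed_theta.
    gap = needed_theta * (sxx + lam) - sum(x * r for x, r in zip(xs, rs))
    if gap <= 0:
        return list(rs)  # the target arm already wins; no change needed
    # Spread the required increase along xs, the minimum-norm direction.
    return [r + gap * x / sxx for x, r in zip(xs, rs)]
```

In this toy the perturbation is proportional to the logged contexts, so rewards attached to larger contexts are changed more.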

data mining, machine learning, reinforcement learning, (17 more...)

1808.0576

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Washington > King County > Kirkland (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.82)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.48)

Artificial General Intelligence Is Here, and Impala Is Its Name - ExtremeTech

#artificialintelligence, Aug-22-2018, 00:01:57 GMT

However, even these reinforcement learning algorithms couldn't transfer what they'd learned about one task to acquiring a new task. In order to realize this achievement, DeepMind supercharged a reinforcement learning algorithm called A3C. In so-called actor-critic reinforcement learning, of which A3C is one variety, acting and learning are decoupled so that one neural network, the critic, evaluates the other, the actor. Together, they drive the learning process. This was already the state of the art, but DeepMind added a new off-policy correction algorithm called V-trace to the mix, which made the learning more efficient and, crucially, better able to achieve positive transfer between tasks.
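The actor/critic split described here can be sketched with a textbook one-step, tabular actor-critic update. This is not A3C or IMPALA (there are no neural networks, no parallel workers, and no V-trace correction); the table layout, function names, and step sizes are assumptions chosen to keep the example small.

```python
import math


def softmax(prefs):
    """Turn action preferences into probabilities (numerically stable)."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def actor_critic_step(prefs, values, s, a, r, s_next, done,
                      gamma=0.99, alpha_v=0.1, alpha_pi=0.1):
    """Update critic (values) and actor (prefs) in place from one transition.

    prefs[s] holds the actor's action preferences in state s;
    values[s] is the critic's estimate of the return from state s.
    """
    target = r if done else r + gamma * values[s_next]
    td_error = target - values[s]       # the critic's evaluation of the action
    values[s] += alpha_v * td_error     # critic update
    probs = softmax(prefs[s])
    for b in range(len(prefs[s])):
        grad = (1.0 if b == a else 0.0) - probs[b]   # grad of log softmax
        prefs[s][b] += alpha_pi * td_error * grad    # actor update
    return td_error
```

A positive TD error raises the preference for the action just taken, and a negative one lowers it, which is the sense in which the critic "evaluates" the actor.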

artificial general intelligence, machine learning, reinforcement learning, (5 more...)

AI-Alerts: 2018 > 2018-08 > AAAI AI-Alert for Aug 28, 2018 (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.62)

Why does AI stink at certain video games? Researchers made one play Ms. Pac-Man to find out

#artificialintelligence, Aug-20-2018, 08:46:14 GMT

STOCKHOLM--Artificial intelligence (AI) can kick butt in games such as Pong and Space Invaders, but it comes off like a common n00b when playing Ms. Pac-Man (pictured). Now, by making AI play six classic arcade games, researchers are closer to figuring out why thinking machines excel at some games and stink at others, they reported last month at the International Conference on Machine Learning here. The team developed a new system for visualizing how Atari-playing AIs operate. They chose Atari because the games are relatively simple and a frequent focus for researchers developing "reinforcement learning" algorithms, AIs that learn behaviors through trial and error. An AI "sees" the screen (as an input of ones and zeroes) and at first randomly responds with commands for "left," "right," "fire," and so on, slowly shaping its strategy as it receives points for certain actions.

artificial intelligence, machine learning, reinforcement learning, (6 more...)

Country: Europe > Sweden > Stockholm > Stockholm (0.26)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Games (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.58)

Learning to Dialogue via Complex Hindsight Experience Replay

Lu, Keting, Zhang, Shiqi, Chen, Xiaoping

arXiv.org Artificial Intelligence, Aug-20-2018

Reinforcement learning methods have been used for learning dialogue policies from the experience of conversations. However, learning an effective dialogue policy frequently requires prohibitively many conversations. This is partly because of the sparse rewards in dialogues and the relatively small number of successful dialogues in the early learning phase. Hindsight experience replay (HER) enables an agent to learn from failure, but vanilla HER is inapplicable to dialogue domains because dialogue goals are implicit (cf. explicit goals in manipulation tasks). In this work, we develop two complex HER methods providing different trade-offs between complexity and performance. Experiments were conducted using a realistic user simulator. Results suggest that our HER methods perform better than standard and prioritized experience replay methods (as applied to deep Q-networks) in learning rate, and that our two complex HER methods can be combined to produce the best performance.
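For context, vanilla HER with explicit goals can be sketched as a relabeling step: a failed episode is stored a second time as if the state it actually reached had been the goal. The sketch below shows only this vanilla idea with the "final state" relabeling strategy. It is not the paper's complex HER for dialogue, and the tuple layout, function names, and sparse reward are assumptions for illustration.

```python
def sparse_reward(state, goal):
    """Hypothetical sparse reward: 0 on reaching the goal, -1 otherwise."""
    return 0.0 if state == goal else -1.0


def her_relabel(episode, reward_fn=sparse_reward):
    """Relabel an episode with the goal it actually achieved.

    `episode` is a list of (state, action, next_state, goal) tuples.
    Returns (state, action, next_state, new_goal, new_reward) tuples in
    which new_goal is the final state reached in the episode.
    """
    achieved = episode[-1][2]
    return [
        (s, a, s_next, achieved, reward_fn(s_next, achieved))
        for (s, a, s_next, _goal) in episode
    ]
```

An episode that aimed for state 5 but ended in state 2 becomes a successful example of reaching state 2, so the replay buffer gains non-trivial reward signal even when the original goal was never reached.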

machine learning, natural language, reinforcement learning, (16 more...)

1808.06497

Country:

North America > United States > New York > Broome County > Binghamton (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Asia > China (0.04)

Genre: Research Report > New Finding (0.34)

Industry:

Leisure & Entertainment (0.68)
Media > Film (0.46)
Education > Educational Setting (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

Improving Search through A3C Reinforcement Learning based Conversational Agent

Aggarwal, Milan, Arora, Aarushi, Sodhani, Shagun, Krishnamurthy, Balaji

arXiv.org Artificial Intelligence, Aug-19-2018

We develop a reinforcement learning based search assistant that can assist users through a set of actions and a sequence of interactions to help them realize their intent. Our approach caters to subjective search, where the user is seeking digital assets such as images, which is fundamentally different from tasks that have objective and limited search modalities. Labeled conversational data is generally not available in such search tasks, and training the agent through human interactions can be time-consuming. We propose a stochastic virtual user that impersonates a real user and can be used to sample user behavior efficiently to train the agent, which accelerates the bootstrapping of the agent. We develop an A3C-based, context-preserving architecture that enables the agent to provide contextual assistance to the user. We compare the A3C agent with Q-learning and evaluate its performance on the average rewards and state values it obtains with the virtual user in validation episodes. Our experiments show that the agent learns to achieve higher rewards and better states.

agent, deep learning, neural network, (20 more...)

1709.05638

Country: Asia > India (0.28)

Genre: Research Report (0.40)

Industry:

Media (0.68)
Energy > Oil & Gas (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.52)

Reinforcement Learning for Autonomous Defence in Software-Defined Networking

Han, Yi, Rubinstein, Benjamin I. P., Abraham, Tamas, Alpcan, Tansu, De Vel, Olivier, Erfani, Sarah, Hubczenko, David, Leckie, Christopher, Montague, Paul

arXiv.org Artificial Intelligence, Aug-17-2018

Despite the successful application of machine learning (ML) in a wide range of domains, adaptability, the very property that makes machine learning desirable, can be exploited by adversaries to contaminate training and evade classification. In this paper, we investigate the feasibility of applying a specific class of machine learning algorithms, namely reinforcement learning (RL) algorithms, for autonomous cyber defence in software-defined networking (SDN). In particular, we focus on how an RL agent reacts towards different forms of causative attacks that poison its training process, including indiscriminate and targeted, white-box and black-box attacks. We also study the impact of the attack timing and explore potential countermeasures such as adversarial training.

machine learning, node, reinforcement learning, (17 more...)

1808.0577

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
(4 more...)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Importance mixing: Improving sample reuse in evolutionary policy search methods

Pourchot, Aloïs, Perrin, Nicolas, Sigaud, Olivier

arXiv.org Machine Learning, Aug-17-2018

Deep neuroevolution methods, that is, evolutionary policy search methods based on deep neural networks, have recently emerged as competitors to deep reinforcement learning algorithms due to their better parallelization capabilities. However, these methods still suffer from far worse sample efficiency. In this paper we investigate whether a mechanism known as "importance mixing" can significantly improve their sample efficiency. We provide a didactic presentation of importance mixing and explain how it can be extended to reuse more samples. Then, from an empirical comparison based on a simple benchmark, we show that, although importance mixing does provide better sample efficiency, it is still far from the sample efficiency of deep reinforcement learning, though it is more stable.
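As a hedged sketch of the basic mechanism, the code below follows the commonly described two-step acceptance rule for importance mixing: old samples are kept with probability min(1, (1 - alpha) * p_new / p_old), and the batch is then topped up with fresh samples from the new distribution, each accepted with probability max(alpha, 1 - p_old / p_new). It uses one-dimensional Gaussian search distributions for brevity and does not include the paper's extension for reusing more samples. Function names, the `alpha` default, and the tuple representation of a distribution are assumptions.

```python
import math
import random


def gauss_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def importance_mixing(old_samples, old, new, n, alpha=0.01, rng=None):
    """Build a batch of n samples for the `new` distribution, reusing old ones.

    `old` and `new` are (mean, std) pairs of 1-D Gaussians.
    Returns (batch, number_of_reused_samples).
    """
    if rng is None:
        rng = random.Random(0)
    (mu_o, sd_o), (mu_n, sd_n) = old, new
    batch = []
    # Step 1: rejection sampling over the previous generation's samples.
    for x in old_samples:
        if len(batch) >= n:
            break
        ratio = gauss_pdf(x, mu_n, sd_n) / gauss_pdf(x, mu_o, sd_o)
        if rng.random() < min(1.0, (1.0 - alpha) * ratio):
            batch.append(x)
    reused = len(batch)
    # Step 2: fill the remaining slots with fresh samples from `new`.
    while len(batch) < n:
        x = rng.gauss(mu_n, sd_n)
        ratio = gauss_pdf(x, mu_o, sd_o) / gauss_pdf(x, mu_n, sd_n)
        if rng.random() < max(alpha, 1.0 - ratio):
            batch.append(x)
    return batch, reused
```

Only the freshly drawn samples need new fitness evaluations, so the closer the new distribution stays to the old one, the more evaluations are saved. Note that `alpha` should be positive whenever fresh samples may be needed, otherwise the second step can reject indefinitely.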

evolutionary algorithm, machine learning, reinforcement learning, (13 more...)

1808.05832

Country:

Europe > United Kingdom > Scotland > City of Edinburgh > Edinburgh (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)
