AITopics

1807.03765

Country: North America > United States (0.46)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.49)

#artificialintelligenceJul-9-2018, 16:06:49 GMT

Learning to drive in a day

Read the full blog here: https://wayve.ai/blog/learning-to-dri...

artificial intelligence, machine learning, reinforcement learning, (1 more...)

Technology:

Information Technology > Communications > Social Media (0.76)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.33)

arXiv.org Machine LearningJul-9-2018

Temporal Difference Learning with Neural Networks - Study of the Leakage Propagation Problem

Penedones, Hugo, Vincent, Damien, Maennel, Hartmut, Gelly, Sylvain, Mann, Timothy, Barreto, Andre

Temporal-Difference learning (TD) [Sutton, 1988] with function approximation can converge to solutions that are worse than those obtained by Monte-Carlo regression, even in the simple case of on-policy evaluation. To increase our understanding of the problem, we investigate the issue of approximation errors in areas of sharp discontinuities of the value function being further propagated by bootstrap updates. We show empirical evidence of this leakage propagation, and show analytically that it must occur, in a simple Markov chain, when function approximation errors are present. For reversible policies, the result can be interpreted as the tension between two terms of the loss function that TD minimises, as recently described by [Ollivier, 2018]. We show that the upper bounds from [Tsitsiklis and Van Roy, 1997] hold, but they do not imply that leakage propagation occurs and under what conditions. Finally, we test whether the problem could be mitigated with a better state representation, and whether it can be learned in an unsupervised manner, without rewards or privileged information.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

1807.03064

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Europe > United Kingdom > England > Greater London > London (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)

arXiv.org Machine LearningJul-8-2018

Memory Augmented Policy Optimization for Program Synthesis with Generalization

Liang, Chen, Norouzi, Mohammad, Berant, Jonathan, Le, Quoc, Lao, Ni

This paper presents Memory Augmented Policy Optimization (MAPO): a novel policy optimization formulation that incorporates a memory buffer of promising trajectories to reduce the variance of policy gradient estimates for deterministic environments with discrete actions. The formulation expresses the expected return objective as a weighted sum of two terms: an expectation over a memory of trajectories with high rewards, and a separate expectation over the trajectories outside the memory. We propose 3 techniques to make an efficient training algorithm for MAPO: (1) distributed sampling from inside and outside memory with an actor-learner architecture; (2) a marginal likelihood constraint over the memory to accelerate training; (3) systematic exploration to discover high reward trajectories. MAPO improves the sample efficiency and robustness of policy gradient, especially on tasks with a sparse reward. We evaluate MAPO on weakly supervised program synthesis from natural language with an emphasis on generalization. On the WikiTableQuestions benchmark we improve the state-of-the-art by 2.5%, achieving an accuracy of 46.2%, and on the WikiSQL benchmark, MAPO achieves an accuracy of 74.9% with only weak supervision, outperforming several strong baselines with full supervision. Our code is open sourced at https://github.com/crazydonkey200/neural-symbolic-machines

logic & formal reasoning, machine learning, trajectory, (21 more...)

1807.02322

Country:

South America > Peru (0.04)
North America > United States > California > Los Angeles County > Los Angeles (0.04)
North America > Canada > Saskatchewan > Saskatoon (0.04)
(8 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.87)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Machine LearningJul-8-2018

Financial Trading as a Game: A Deep Reinforcement Learning Approach

Huang, Chien Yi

An automatic program that generates constant profit from the financial market is lucrative for every market practitioner. Recent advance in deep reinforcement learning provides a framework toward end-to-end training of such trading agent. In this paper, we propose an Markov Decision Process (MDP) model suitable for the financial trading task and solve it with the state-of-the-art deep recurrent Q-network (DRQN) algorithm. We propose several modifications to the existing learning algorithm to make it more suitable under the financial trading setting, namely 1. We employ a substantially small replay memory (only a few hundreds in size) compared to ones used in modern deep reinforcement learning algorithms (often millions in size.) 2. We develop an action augmentation technique to mitigate the need for random exploration by providing extra feedback signals for all actions to the agent. This enables us to use greedy policy over the course of learning and shows strong empirical performance compared to more commonly used epsilon-greedy exploration. However, this technique is specific to financial trading under a few market assumptions. 3. We sample a longer sequence for recurrent neural network training. A side product of this mechanism is that we can now train the agent for every T steps. This greatly reduces training time since the overall computation is down by a factor of T. We combine all of the above into a complete online learning algorithm and validate our approach on the spot foreign exchange market.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

1807.02787

Country: Asia > Taiwan (0.04)

Genre: Research Report (0.50)

Industry: Banking & Finance > Trading (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

arXiv.org Machine LearningJul-8-2018

Auto Deep Compression by Reinforcement Learning Based Actor-Critic Structure

Hakkak, Hamed

Model-based compression is an effective, facilitating, and expanded model of neural network models with limited computing and low power. However, conventional models of compression techniques utilize crafted features [2,3,12] and explore specialized areas for exploration and design of large spaces in terms of size, speed, and accuracy, which usually have returns Less and time is up. This paper will effectively analyze deep auto compression (ADC) and reinforcement learning strength in an effective sample and space design, and improve the compression quality of the model. The results of compression of the advanced model are obtained without any human effort and in a completely automated way. With a 4- fold reduction in FLOP, the accuracy of 2.8% is higher than the manual compression model for VGG-16 in ImageNet.

arxiv preprint arxiv, machine learning, reinforcement learning, (16 more...)

1807.02886

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

#artificialintelligenceJul-7-2018, 17:05:49 GMT

Using Artificial Intelligence for Crop Production - Intel AI

Recently, the "Deep Greens" team, comprised of Intel AI data scientists and horticultural experts from the Universidad Nacional Autónoma de México (UNAM), competed in, and won, a 24-hour hackathon for the chance to win one of 5 slots to grow cucumbers in an autonomous greenhouse later this year. The competition is to designed to test which team can grow the most cucumbers while reducing the number of total resources needed. By succeeding in the hackathon we now have the chance to test our deep reinforcement learning algorithms in a very novel environment. In this post, we will talk more about the overall challenge, our strategy for the hackathon, and the strategy going forward. The competition is sponsored by the Wageningen University and Research (WUR) of the Netherlands, and the company Tencent*.

cucumber, machine learning, reinforcement learning, (16 more...)

Country:

North America > Mexico (0.25)
Europe > Netherlands (0.25)
North America > United States (0.05)

Genre: Contests & Prizes (0.99)

Industry: Food & Agriculture > Agriculture (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.98)

#artificialintelligenceJul-7-2018, 15:20:56 GMT

A Deep Dive into Reinforcement Learning

Let's take a deep dive into reinforcement learning. In this article, we will tackle a concrete problem with modern libraries such as TensorFlow, TensorBoard, Keras, and OpenAI gym. You will see how to implement one of the fundamental algorithms called deep $Q$-learning to learn its inner workings. Regarding the hardware, the whole code will work on a typical PC and use all found CPU cores (this is handled out of the box by TensorFlow). The problem is called Mountain Car: A car is on a one-dimensional track, positioned between two mountains. The goal is to drive up the mountain on the right (reaching the flag). However, the car's engine is not strong enough to climb the mountain in a single pass. Therefore, the only way to succeed is to drive back and forth to build up momentum. This problem was chosen because it is simple enough to find a solution with reinforcement learning in minutes on a single CPU core. However, it is complex enough to be a good representative.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Genre: Instructional Material (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.36)

Xiong, Wenhan, Hoang, Thien, Wang, William Yang

DeepPath: A Reinforcement Learning Method for Knowledge Graph Reasoning

arXiv.org Artificial IntelligenceJul-7-2018

We study the problem of learning to reason in large scale knowledge graphs (KGs). More specifically, we describe a novel reinforcement learning framework for learning multi-hop relational paths: we use a policy-based agent with continuous states based on knowledge graph embeddings, which reasons in a KG vector space by sampling the most promising relation to extend its path. In contrast to prior work, our approach includes a reward function that takes the accuracy, diversity, and efficiency into consideration. Experimentally, we show that our proposed method outperforms a path-ranking based algorithm and knowledge graph embedding methods on Freebase and Never-Ending Language Learning datasets.

machine learning, reinforcement learning, relation, (19 more...)

arXiv.org Artificial Intelligence

1707.0669

Country: North America > United States > California > Santa Barbara County > Santa Barbara (0.14)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

#artificialintelligenceJul-6-2018, 01:27:13 GMT

Learning to run - an example of reinforcement learning deepsense.ai

Turns out a walk in the park is not so simple after all. In fact, it is a complex process done by controlling multiple muscles and coordinating who knows how many motions. If carbon-based lifeforms have been developing these aspects of walking for millions of years, can AI recreate it? Moving by controlling the muscles attached to bones, as humans do it, is way more complicated and harder to recreate than building a robot that can move with engines and hydraulic cylinders. Building a model that can run by controlling human muscles recreated in a simulated environment was the goal of a competition organized at the NIPS 2017 conference.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.56)