AITopics

#artificialintelligence, Aug-30-2018, 17:06:04 GMT

Weekly Machine Learning Opensource Roundup – Aug. 30, 2018

Dopamine: Dopamine is a research framework for fast prototyping of reinforcement learning algorithms.
Simple Baselines for Human Pose Estimation and Tracking: This project is the official implementation of the Microsoft ECCV 2018 paper "Simple Baselines for Human Pose Estimation and Tracking".

artificial intelligence, machine learning, reinforcement learning, (6 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.43)

Oh, Min-hwan, Iyengar, Garud

Directed Exploration in PAC Model-Free Reinforcement Learning

arXiv.org Machine Learning, Aug-30-2018

We study an exploration method for model-free RL that generalizes the counter-based exploration bonus methods and takes into account the long-term exploratory value of actions rather than a single-step look-ahead. We propose a model-free RL method that modifies Delayed Q-learning and utilizes the long-term exploration bonus with provable efficiency. We show that our proposed method finds a near-optimal policy in polynomial time (PAC-MDP), and also provide experimental evidence that our proposed algorithm is an efficient exploration method.
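As a rough illustration of the counter-based exploration bonus that the abstract says the method generalizes, the sketch below adds a bonus of beta / sqrt(n(s, a)) to the reward in a plain tabular Q-learning update. This is a generic single-step bonus, not the authors' Delayed Q-learning variant or their long-term bonus; the helper name `q_update_with_bonus` and the parameters `alpha`, `gamma` and `beta` are assumptions made for this example.

```python
import math
from collections import defaultdict


def q_update_with_bonus(q, counts, state, action, reward, next_state,
                        actions, alpha=0.1, gamma=0.95, beta=1.0):
    """One tabular Q-learning step with a count-based exploration bonus.

    Hypothetical helper for illustration only; not the paper's algorithm.
    Returns the bonus that was added to the reward.
    """
    counts[(state, action)] += 1
    # Rarely visited (state, action) pairs get a larger bonus,
    # which shrinks as the visit count grows.
    bonus = beta / math.sqrt(counts[(state, action)])
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + bonus + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return bonus


q = defaultdict(float)      # Q-values default to 0.0
counts = defaultdict(int)   # visit counts default to 0
first = q_update_with_bonus(q, counts, "s0", "left", 0.0, "s1", ["left", "right"])
second = q_update_with_bonus(q, counts, "s0", "left", 0.0, "s1", ["left", "right"])
```

The second visit to the same pair earns a smaller bonus than the first, which is what pushes a greedy agent toward less-visited actions.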

delayed q-learning, machine learning, reinforcement learning, (12 more...)

1808.10552

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > New Jersey > Middlesex County > New Brunswick (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Mehta, Ashish, Subramanian, Adithya, Subramanian, Anbumani

Learning End-to-end Autonomous Driving using Guided Auxiliary Supervision

Learning to drive faithfully in highly stochastic urban settings remains an open problem. To that end, we propose a Multi-task Learning from Demonstration (MT-LfD) framework which uses supervised auxiliary task prediction to guide the main task of predicting the driving commands. Our framework involves an end-to-end trainable network for imitating the expert demonstrator's driving commands. The network intermediately predicts visual affordances and action primitives through direct supervision, which provides the aforementioned auxiliary supervised guidance. We demonstrate that such joint learning and supervised guidance facilitate hierarchical task decomposition, helping the agent learn faster and achieve better driving performance, and increase the transparency of the otherwise black-box end-to-end network. We run our experiments to validate the MT-LfD framework in CARLA, an open-source urban driving simulator. We introduce multiple non-player agents in CARLA and induce temporal noise in them for realistic stochasticity.
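One common way to combine a main task with supervised auxiliary tasks is a weighted sum of per-task losses. The sketch below shows that idea for the three outputs named in the abstract (driving commands, visual affordances, action primitives). The loss choices (mean squared error and cross-entropy) and the weights `w_aff` and `w_prim` are assumptions for illustration; the abstract does not state how the authors combine their losses.

```python
import math


def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def cross_entropy(probs, label):
    """Negative log-probability assigned to the correct class index."""
    return -math.log(probs[label])


def multitask_loss(command_pred, command_true,
                   affordance_pred, affordance_true,
                   primitive_probs, primitive_label,
                   w_aff=0.5, w_prim=0.5):
    """Main driving-command loss plus weighted auxiliary losses.

    Hypothetical combination; the weights are illustrative assumptions.
    """
    main = mse(command_pred, command_true)
    aux_affordance = mse(affordance_pred, affordance_true)
    aux_primitive = cross_entropy(primitive_probs, primitive_label)
    return main + w_aff * aux_affordance + w_prim * aux_primitive


loss = multitask_loss(
    command_pred=[0.5, 0.0], command_true=[1.0, 0.0],
    affordance_pred=[1.0], affordance_true=[0.0],
    primitive_probs=[0.5, 0.5], primitive_label=0,
)
```

Because every term is differentiable in a real network, gradients from the auxiliary heads also shape the shared layers, which is the sense in which the auxiliary tasks "guide" the main one.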

artificial intelligence, machine learning, reinforcement learning, (13 more...)

1808.10393

Country: North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.67)

Industry:

Transportation > Ground > Road (0.84)
Information Technology > Robotics & Automation (0.52)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.66)

Lin, Xi Victoria, Socher, Richard, Xiong, Caiming

Multi-Hop Knowledge Graph Reasoning with Reward Shaping

Multi-hop reasoning is an effective approach for query answering (QA) over incomplete knowledge graphs (KGs). The problem can be formulated in a reinforcement learning (RL) setup, where a policy-based agent sequentially extends its inference path until it reaches a target. However, in an incomplete KG environment, the agent receives low-quality rewards corrupted by false negatives in the training data, which harms generalization at test time. Furthermore, since no golden action sequence is used for training, the agent can be misled by spurious search trajectories that incidentally lead to the correct answer. We propose two modeling advances to address both issues: (1) we reduce the impact of false negative supervision by adopting a pretrained one-hop embedding model to estimate the reward of unobserved facts; (2) we counter the sensitivity to spurious paths of on-policy RL by forcing the agent to explore a diverse set of paths using randomly generated edge masks. Our approach significantly improves over existing path-based KGQA models on several benchmark datasets and is comparable to or better than embedding-based models.
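The two ideas in the abstract can be sketched in a few lines. In this minimal example, `shaped_reward` returns full reward for an observed answer and otherwise falls back to a soft score, and `mask_edges` randomly hides outgoing edges so the agent is pushed onto different paths. The function names, the `score_fn` callable (a stand-in for the pretrained one-hop embedding model, assumed to return a value in [0, 1]) and the `keep_prob` parameter are all assumptions for illustration, not the authors' code.

```python
import random


def shaped_reward(reached_entity, observed_answers, score_fn):
    """Reward 1.0 for an observed answer, else a soft score.

    `score_fn` is a hypothetical stand-in for a pretrained one-hop
    embedding model that scores facts missing from the KG.
    """
    if reached_entity in observed_answers:
        return 1.0
    return score_fn(reached_entity)


def mask_edges(edges, keep_prob, rng):
    """Randomly hide outgoing edges to force more diverse exploration."""
    kept = [e for e in edges if rng.random() < keep_prob]
    # Always leave the agent at least one action to take.
    return kept or [rng.choice(edges)]


rng = random.Random(0)
edges = [("born_in", "paris"), ("lives_in", "lyon"), ("works_in", "nice")]
visible = mask_edges(edges, keep_prob=0.5, rng=rng)
reward = shaped_reward("lyon", {"paris"}, score_fn=lambda entity: 0.3)
```

With a hard 0/1 reward, reaching "lyon" would count as a plain failure even if the fact is true but missing from the graph; the soft score lets such false negatives still contribute some signal.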

machine learning, reinforcement learning, relation, (20 more...)

1808.10568

Country:

North America > United States > Hawaii (0.05)
North America > United States > Washington > King County > Seattle (0.04)
North America > United States > New York > Richmond County > New York City (0.04)
(14 more...)

Genre: Research Report (0.64)

Industry: Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.62)
(2 more...)

Application of Self-Play Reinforcement Learning to a Four-Player Game of Imperfect Information

Charlesworth, Henry

We introduce a new virtual environment for simulating a card game known as "Big 2". This is a four-player game of imperfect information with a relatively complicated action space (players may play 1-, 2-, 3-, 4- or 5-card combinations from an initial starting hand of 13 cards). As such it poses a challenge for many current reinforcement learning methods. We then use the recently proposed "Proximal Policy Optimization" algorithm to train a deep neural network to play the game, purely learning via self-play, and find that it is able to reach a level which outperforms amateur human players after only a relatively short amount of training time and without needing to search a tree of future game states.
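For readers unfamiliar with Proximal Policy Optimization, its core is a clipped surrogate objective that limits how far a single update can move the policy. The sketch below computes that objective from log-probabilities and advantage estimates in plain Python. It is a generic illustration of the standard clipped objective, not code from this paper; the function name and the default `clip_eps=0.2` are assumptions.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Average clipped surrogate objective over a batch of samples.

    new_logp / old_logp: log-probabilities of the taken actions under
    the current and the data-collecting policy, respectively.
    """
    total = 0.0
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nl - ol)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Taking the minimum gives a pessimistic bound on the improvement.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# The ratio of 2.0 is clipped to 1.2 when the advantage is positive.
objective = ppo_clip_objective([math.log(2.0)], [0.0], [1.0])
```

In training, this quantity is maximized by gradient ascent on the policy parameters; a card-game environment would typically also need illegal actions masked out before sampling, which is not shown here.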

artificial intelligence, machine learning, neural network, (15 more...)

1808.10442

Country: Asia > East Asia (0.04)

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Kitchen, Andy, Benedetti, Michela

ExpIt-OOS: Towards Learning from Planning in Imperfect Information Games

The current state of the art in playing many important perfect information games, including Chess and Go, combines planning and deep reinforcement learning with self-play. We extend this approach to imperfect information games and present ExIt-OOS, a novel approach to playing imperfect information games within the Expert Iteration framework and inspired by AlphaZero. We use Online Outcome Sampling, an online search algorithm for imperfect information games, in place of MCTS. While training online, our neural strategy is used to improve the accuracy of playouts in OOS, allowing a learning and planning feedback loop for imperfect information games.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

1808.10120

Country:

Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > Texas (0.04)

Genre: Research Report (0.70)

Industry:

Leisure & Entertainment > Games > Poker (0.46)
Leisure & Entertainment > Games > Chess (0.34)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

#artificialintelligence, Aug-29-2018, 02:49:00 GMT

Tutorial: Double Deep Q-Learning with Dueling Network Architecture

If you are as fascinated by Deep Q-Learning as I am but never had the time to understand or implement it, this is for you. In one Jupyter notebook I will 1) briefly explain how Reinforcement Learning differs from Supervised Learning, 2) discuss the theory behind Deep Q-Networks (DQN) by pointing you to the relevant explanations in the papers and telling you what they mean, and 3) show how to implement the components needed to make it work in Python and TensorFlow. In 2013 a London-based startup called DeepMind published a groundbreaking paper called Playing Atari with Deep Reinforcement Learning on arXiv. The authors presented a variant of Reinforcement Learning called Deep Q-Learning that successfully learns control policies for different Atari 2600 games, receiving only screen pixels as input and a reward when the game score changes. This is an astonishing result because previously "AIs" used to be limited to a single game, for instance chess, whereas in this case the types and contents of the games in the Arcade Learning Environment vary significantly and yet no adjustment of the architecture, learning algorithm or hyperparameters is needed. No wonder DeepMind was bought by Google for 500 million dollars. The company has since been one of the leading institutions advancing Deep Learning research, and a later article discussing DQN has been published in Nature.
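The two techniques named in the tutorial's title each reduce to a small formula. A dueling head combines a state value V(s) with per-action advantages A(s, a) as Q(s, a) = V(s) + A(s, a) - mean(A), and Double Q-learning picks the next action with the online network but evaluates it with the target network. The sketch below shows both on plain Python lists; it is a generic illustration rather than the notebook's TensorFlow code, and the function names and default `gamma` are assumptions.

```python
def dueling_q_values(value, advantages):
    """Combine V(s) and A(s, a) into Q(s, a), centering the advantages."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def double_dqn_target(reward, done, next_q_online, next_q_target, gamma=0.99):
    """Bootstrapped target for one transition using Double Q-learning."""
    if done:
        # No future return after a terminal state.
        return reward
    # Select the action with the online network...
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    # ...but evaluate it with the target network.
    return reward + gamma * next_q_target[best_action]


q_values = dueling_q_values(1.0, [1.0, 2.0, 3.0])
target = double_dqn_target(1.0, False, [0.1, 0.9], [5.0, 2.0], gamma=0.5)
```

In the usage lines, the online network prefers action 1, so the target uses the target network's value for action 1 (2.0) rather than its own maximum (5.0); decoupling selection from evaluation in this way is meant to reduce overestimated Q-values.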

artificial intelligence, machine learning, reinforcement learning, (13 more...)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Padmakumar, Aishwarya, Stone, Peter, Mooney, Raymond J.

Learning a Policy for Opportunistic Active Learning

arXiv.org Artificial Intelligence, Aug-29-2018

Active learning identifies data points to label that are expected to be the most useful in improving a supervised model. Opportunistic active learning incorporates active learning into interactive tasks that constrain possible queries during interactions. Prior work has shown that opportunistic active learning can be used to improve grounding of natural language descriptions in an interactive object retrieval task. In this work, we use reinforcement learning for such an object retrieval task, to learn a policy that effectively trades off task completion with model improvement that would benefit future tasks.

machine learning, predicate, reinforcement learning, (16 more...)

1808.10009

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Texas > Travis County > Austin (0.04)
South America > Argentina > Pampas > Buenos Aires F.D. > Buenos Aires (0.04)
(5 more...)

Genre: Research Report > New Finding (0.46)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Aissa, Wafa, Soulier, Laure, Denoyer, Ludovic

A Reinforcement Learning-driven Translation Model for Search-Oriented Conversational Systems

arXiv.org Machine Learning, Aug-29-2018

Search-oriented conversational systems rely on information needs expressed in natural language (NL). We focus here on the understanding of NL expressions for building keyword-based queries. We propose a reinforcement-learning-driven translation model framework able to 1) learn the translation from NL expressions to queries in a supervised way, and 2) overcome the lack of large-scale datasets by framing the translation model as a word selection approach and injecting relevance feedback into the learning process. Experiments are carried out on two TREC datasets and outline the effectiveness of our approach.
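Framing translation as word selection means the model outputs one keep-or-drop decision per word of the NL expression, and the kept words form the keyword query. Relevance feedback can then serve as a reward for the selection. The sketch below illustrates that framing only; the function names are hypothetical, and using precision over retrieved documents as the reward is an assumption, since the abstract does not say which relevance measure the authors use.

```python
def build_query(words, keep_flags):
    """Keep only the words whose selection flag is truthy."""
    return [w for w, keep in zip(words, keep_flags) if keep]


def precision_reward(retrieved_ids, relevant_ids):
    """Fraction of retrieved documents that are relevant.

    Hypothetical reward; the paper's actual relevance signal may differ.
    """
    if not retrieved_ids:
        return 0.0
    hits = sum(1 for doc_id in retrieved_ids if doc_id in relevant_ids)
    return hits / len(retrieved_ids)


words = ["please", "find", "papers", "about", "reinforcement", "learning"]
query = build_query(words, [0, 0, 1, 0, 1, 1])
reward = precision_reward(["d1", "d2", "d3", "d4"], {"d2", "d4", "d9"})
```

A policy-gradient learner would raise the probability of selections that earn high reward, which is how relevance judgments can substitute for a large set of paired NL expressions and queries.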

machine learning, natural language, reinforcement learning, (19 more...)

1809.01495

Country:

North America > Canada > Ontario > Toronto (0.05)
Europe > France > Île-de-France > Paris > Paris (0.05)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.85)