AITopics

1812.10587

Country: North America > United States > California > Los Angeles County > Los Angeles (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Lu, Chaochao, Schölkopf, Bernhard, Hernández-Lobato, José Miguel

Deconfounding Reinforcement Learning in Observational Settings

arXiv.org Machine Learning, Dec-26-2018

We propose a general formulation for addressing reinforcement learning (RL) problems in settings with observational data. That is, we consider the problem of learning good policies solely from historical data in which unobserved factors (confounders) affect both observed actions and rewards. Our formulation allows us to extend a representative RL algorithm, the Actor-Critic method, to its deconfounding variant, and the methodology for this extension can easily be applied to other RL algorithms. In addition, we develop a new benchmark for evaluating deconfounding RL algorithms by modifying the OpenAI Gym environments and the MNIST dataset. Using this benchmark, we demonstrate that the proposed algorithms are superior to traditional RL methods in confounded environments with observational data. To the best of our knowledge, this is the first time that confounders are taken into consideration for addressing full RL problems with observational data. Code is available at https://github.com/CausalRL/DRL.
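To make "confounders affect both observed actions and rewards" concrete, here is a toy example with exact probabilities rather than sampling. It is a hypothetical illustration, not the paper's method: a binary confounder U drives both the logging policy's action A and the reward R, so the naive comparison of logged rewards overstates the true effect of A. The backdoor adjustment recovers it, but only because U is observed in this toy. When U is not recorded, as in the setting the abstract describes, that adjustment cannot be computed directly, and this is the gap the paper targets.

```python
# Toy structural model (all numbers are hypothetical, for illustration only):
# U ~ Bernoulli(0.5) is a confounder; the logging policy picks A=1 with
# probability 0.9 when U=1 and 0.1 when U=0; reward R = 1*A + 2*U.
P_U = {0: 0.5, 1: 0.5}
P_A1_GIVEN_U = {0: 0.1, 1: 0.9}  # P(A=1 | U=u)
TRUE_EFFECT = 1.0


def reward(a: int, u: int) -> float:
    return TRUE_EFFECT * a + 2.0 * u


def joint(a: int, u: int) -> float:
    """P(A=a, U=u) under the logging (behaviour) policy."""
    p_a = P_A1_GIVEN_U[u] if a == 1 else 1.0 - P_A1_GIVEN_U[u]
    return P_U[u] * p_a


def naive_effect() -> float:
    """E[R | A=1] - E[R | A=0], i.e. what logged data gives if U is ignored."""
    def cond_mean(a: int) -> float:
        num = sum(joint(a, u) * reward(a, u) for u in P_U)
        den = sum(joint(a, u) for u in P_U)
        return num / den
    return cond_mean(1) - cond_mean(0)


def adjusted_effect() -> float:
    """Backdoor adjustment: sum_u P(u) * (E[R|A=1,u] - E[R|A=0,u]); requires U observed."""
    return sum(P_U[u] * (reward(1, u) - reward(0, u)) for u in P_U)


print(round(naive_effect(), 6))     # 2.6
print(round(adjusted_effect(), 6))  # 1.0
```

The naive estimate (2.6) mixes the effect of A with the effect of U, because high-reward states (U=1) are exactly where the logging policy tends to choose A=1.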

arxiv preprint arxiv, confounder, equation

1812.10576

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
North America > United States > New York (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

arXiv.org Machine Learning, Dec-26-2018

Optimizing Market Making using Multi-Agent Reinforcement Learning

Patel, Yagna

In this paper, reinforcement learning is applied to the problem of optimizing market making. A multi-agent reinforcement learning framework is used to optimally place limit orders that lead to successful trades. The framework consists of two agents. The macro-agent decides whether to buy, sell, or hold an asset. In this paper, the proposed framework is applied to and studied on the Bitcoin cryptocurrency market. The goal of this paper is to show that reinforcement learning is a viable strategy for complex problems (with complex environments) such as market making.

Algorithmic trading, and in particular high-frequency algorithmic trading (HFT), has gained immense popularity in the last decade. With advances in hardware and software, algorithmic trading has rapidly become the norm. Machine learning, increasingly popular, has slowly made its way into financial markets [1], where it is primarily used to predict price movements of assets. However, these classic machine learning techniques entail a number of challenges:

1) Prediction time: In machine learning, model complexity can affect prediction time. In the supervised learning setting, neural networks are often used to make predictions. Because of the computational cost of these models, decision time increases as model complexity increases [2]. In the HFT setting, by the time the model makes a prediction, it may already be too late to take the predicted action. The problem then becomes how to incorporate these added latency costs into the prediction.

The general rule of thumb in finance is that the historical performance of an asset does not predict its future performance, i.e., forecasting the market from historical performance alone is virtually impossible.
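The abstract does not say which RL algorithm the macro-agent uses, so the following is only a generic, hypothetical sketch: a tabular Q-learning update over the macro-agent's three decisions (buy, sell, hold). The discretised state `("up", 0)` and the reward value are made up for illustration; a real market-making agent would use a much richer order-book state and, very likely, a function approximator.

```python
from collections import defaultdict

ACTIONS = ("buy", "sell", "hold")  # macro-agent decision set named in the abstract


def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    td_target = reward + gamma * best_next
    Q[(state, action)] += alpha * (td_target - Q[(state, action)])
    return Q[(state, action)]


def greedy(Q, state):
    """Pick the action with the highest current Q-value in `state`."""
    return max(ACTIONS, key=lambda a: Q[(state, a)])


Q = defaultdict(float)  # unseen (state, action) pairs default to 0.0
# Hypothetical discretised state: (recent price trend, inventory position).
print(q_update(Q, ("up", 0), "buy", reward=1.0, next_state=("up", 1)))  # 0.1
print(greedy(Q, ("up", 0)))  # buy
```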

agent, limit order, order book

1812.10252

Genre: Research Report (0.64)

Industry: Banking & Finance > Trading (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

arXiv.org Machine Learning, Dec-22-2018

Mixed Membership Recurrent Neural Networks

Fazelnia, Ghazal, Ibrahim, Mark, Modarres, Ceena, Wu, Kevin, Paisley, John

Recurrent neural networks (RNNs) have become one of the standard models in sequential data analysis [Rumelhart et al., 1986, Elman, 1990]. At each time step of the RNN, an observation is modeled via a neural network using the observations and hidden states from previous time points. Models such as the RNN and the hidden Markov model often implicitly assume that observations in a sequence are separated by a fixed time interval. They also often do not account for group-level effects when multiple sequences are observed and each sequence belongs to one of multiple groups. For example, consider sequences of discrete counts from a set of groups, such as sequences of purchases (market baskets) for a set of customers, with one sequence per customer. A vanilla RNN implementation would model all of these sequences with a single shared set of parameters, which discards customer-level information, and would index observations by their position in the sequence, which discards the time interval between orders. However, this information is important: customer-specific effects can improve predictive performance for each customer, while an interval of one day versus one month between orders significantly affects which items are likely to be purchased next.
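A small sketch of the two points above, assuming plain Python lists instead of a tensor library: a vanilla (Elman) recurrence, and the inter-order intervals that positional indexing throws away. Appending the gap to the step's input is one hypothetical way to expose it; it is not the mixed membership model the paper proposes, and the weights and dates are made up.

```python
import math
from datetime import date


def elman_step(x, h_prev, W, U, b):
    """One vanilla (Elman) RNN step: h_t = tanh(W x_t + U h_{t-1} + b), using plain lists."""
    return [
        math.tanh(
            sum(w * xi for w, xi in zip(W[i], x))
            + sum(u * hj for u, hj in zip(U[i], h_prev))
            + b[i]
        )
        for i in range(len(b))
    ]


def inter_order_gaps(order_dates):
    """Days between consecutive orders; indexing orders as 1, 2, 3 discards this."""
    return [(d2 - d1).days for d1, d2 in zip(order_dates, order_dates[1:])]


# Hypothetical customer history: two orders one day apart, then a 30-day gap.
orders = [date(2018, 1, 1), date(2018, 1, 2), date(2018, 2, 1)]
gaps = inter_order_gaps(orders)
print(gaps)  # [1, 30]

# Hypothetical: feed the gap alongside a (made-up) count feature as the step input.
h = elman_step(x=[1.0, float(gaps[0])], h_prev=[0.0], W=[[0.5, 0.1]], U=[[0.0]], b=[0.0])
```

With shared `W`, `U`, `b` for every customer, nothing in this step distinguishes one customer's sequence from another's, which is the group-level limitation the paragraph describes.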

prediction, sequence, topic model

1812.09645

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.88)

Nikolov, Nikolay, Kirschner, Johannes, Berkenkamp, Felix, Krause, Andreas

Information-Directed Exploration for Deep Reinforcement Learning

arXiv.org Artificial Intelligence, Dec-18-2018

Efficient exploration remains a major challenge for reinforcement learning. One reason is that the variability of the returns often depends on the current state and action, and is therefore heteroscedastic. Classical exploration strategies such as upper confidence bound algorithms and Thompson sampling fail to appropriately account for heteroscedasticity, even in the bandit setting. Motivated by recent findings that address this issue in bandits, we propose to use Information-Directed Sampling (IDS) for exploration in reinforcement learning. As our main contribution, we build on recent advances in distributional reinforcement learning and propose a novel, tractable approximation of IDS for deep Q-learning. The resulting exploration strategy explicitly accounts for both parametric uncertainty and heteroscedastic observation noise. We evaluate our method on Atari games and demonstrate a significant improvement over alternative approaches.
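The core IDS decision rule picks the action that minimizes squared (estimated) regret divided by information gain. The sketch below is one possible instantiation for deep Q-learning and should be read as an assumption, not the paper's exact algorithm: `mu` and `sigma` stand for the mean and parametric (epistemic) spread of Q-value estimates, e.g. from a hypothetical ensemble, and `rho` for the return standard deviation, e.g. from a hypothetical distributional head. Regret is bounded with confidence intervals `mu ± lam*sigma`, and information gain is approximated by `log(1 + sigma^2 / rho^2)`.

```python
import math


def ids_action(mu, sigma, rho, lam=0.1, eps=1e-5):
    """Return the index of the action minimising regret^2 / information_gain.

    mu[a]    - mean Q-value estimate for action a
    sigma[a] - parametric (epistemic) standard deviation of that estimate
    rho[a]   - standard deviation of the return itself (heteroscedastic noise)
    """
    n = len(mu)
    upper = [mu[a] + lam * sigma[a] for a in range(n)]
    lower = [mu[a] - lam * sigma[a] for a in range(n)]
    best_upper = max(upper)
    ratios = []
    for a in range(n):
        regret = best_upper - lower[a]  # pessimistic estimate of how suboptimal a could be
        gain = math.log(1.0 + sigma[a] ** 2 / rho[a] ** 2) + eps  # more noise -> less informative
        ratios.append(regret ** 2 / gain)
    return min(range(n), key=ratios.__getitem__)


# Two actions with identical mean and epistemic uncertainty; action 0 is much noisier.
print(ids_action(mu=[1.0, 1.0], sigma=[0.5, 0.5], rho=[2.0, 0.5]))  # 1
```

In the usage example an upper-confidence-bound rule would see a tie (`mu + sigma` is equal), whereas IDS prefers action 1 because observing a low-noise return is more informative. This is the heteroscedasticity point the abstract makes.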

computer game, survey article, upstream oil & gas

1812.07544

Country:

Europe (0.14)
North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (0.64)

Industry:

Energy > Oil & Gas > Upstream (0.68)
Leisure & Entertainment > Games > Computer Games (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

arXiv.org Machine Learning, Dec-18-2018

Machine Learning for Molecular Dynamics on Long Timescales

Noé, Frank

Molecular Dynamics (MD) simulation is widely used to analyze the properties of molecules and materials. Most practical applications, such as comparison with experimental measurements, designing drug molecules, or optimizing materials, rely on statistical quantities, which may be prohibitively expensive to compute from direct long-time MD simulations. Classical Machine Learning (ML) techniques have already had a profound impact on the field, especially for learning low-dimensional models of the long-time dynamics and for devising more efficient sampling schemes for computing long-time statistics. Novel ML methods have the potential to revolutionize long-timescale MD and to obtain interpretable models. ML concepts such as statistical estimator theory, end-to-end learning, representation learning and active learning are highly interesting for the MD researcher and will help to develop new solutions to hard MD problems. With the aim of better connecting the MD and ML research areas and spawning new research on this interface, we define the learning problems in long-timescale MD, present successful approaches and outline some of the unsolved ML problems in this application field.
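Markov state models (MSMs) are one classical approach to "learning low-dimensional models of the long-time dynamics". The minimal sketch below assumes the MD trajectory has already been discretised into states (the clustering step is omitted). It counts transitions at a lag time, row-normalises them into a transition matrix, and, for the two-state case, converts the non-trivial eigenvalue into an implied timescale. The trajectory is a made-up example.

```python
import math


def transition_matrix(dtraj, n_states, lag=1):
    """Row-normalised transition counts at lag `lag` from a discrete state trajectory."""
    counts = [[0] * n_states for _ in range(n_states)]
    for i, j in zip(dtraj[:-lag], dtraj[lag:]):
        counts[i][j] += 1
    T = []
    for row in counts:
        total = sum(row)
        T.append([c / total for c in row] if total else [0.0] * n_states)
    return T


def implied_timescale_2state(T, lag=1):
    """Two-state chain: eigenvalues are 1 and 1 - T01 - T10; t = -lag / ln|lambda_2|.

    Undefined (math.log raises) if lambda_2 == 0.
    """
    lam2 = 1.0 - T[0][1] - T[1][0]
    return -lag / math.log(abs(lam2))


# Hypothetical discretised trajectory (state index per saved frame).
dtraj = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0]
T = transition_matrix(dtraj, n_states=2)
print(T)  # [[0.8, 0.2], [0.25, 0.75]]
print(implied_timescale_2state(T))
```

The implied timescale is measured in units of the saved-frame interval; in practice one checks that it stays roughly constant as `lag` grows before trusting the model.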

artificial intelligence, machine learning, representation

1812.07669

Country: North America > United States (0.14)

Genre: Research Report > New Finding (0.34)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.67)
Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

Kim, Yoon, Wiseman, Sam, Rush, Alexander M.

A Tutorial on Deep Latent Variable Models of Natural Language

arXiv.org Machine Learning, Dec-18-2018

There has been much recent, exciting work on combining the complementary strengths of latent variable models and deep learning. Latent variable modeling makes it easy to explicitly specify model constraints through conditional independence properties, while deep learning makes it possible to parameterize these conditional likelihoods with powerful function approximators. While these "deep latent variable" models provide a rich, flexible framework for modeling many real-world phenomena, difficulties exist: deep parameterizations of conditional likelihoods usually make posterior inference intractable, and latent variable objectives often complicate backpropagation by introducing points of non-differentiability. This tutorial explores these issues in depth through the lens of variational inference.
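The central object in variational inference is the evidence lower bound (ELBO). As a toy sketch, assuming the simplest possible model p(z) = N(0, 1), p(x | z) = N(z, 1) with a Gaussian q(z) = N(m, s^2) (chosen because its posterior and evidence are known in closed form, unlike the deep models the tutorial covers), the code computes the ELBO analytically and by Monte Carlo with the reparameterisation z = m + s * eps. The reparameterisation is what lets gradients flow through the sampling step.

```python
import math
import random

LOG_2PI = math.log(2.0 * math.pi)


def log_normal(x, mean, var):
    """log N(x; mean, var)."""
    return -0.5 * (LOG_2PI + math.log(var) + (x - mean) ** 2 / var)


def elbo_analytic(x, m, s):
    """Closed-form ELBO = E_q[log p(x|z)] + E_q[log p(z)] + H[q] for the toy model."""
    e_loglik = -0.5 * (LOG_2PI + (x - m) ** 2 + s ** 2)
    e_logprior = -0.5 * (LOG_2PI + m ** 2 + s ** 2)
    entropy = 0.5 * (LOG_2PI + 1.0 + 2.0 * math.log(s))
    return e_loglik + e_logprior + entropy


def elbo_reparam(x, m, s, n_samples=5000, rng=None):
    """Monte Carlo ELBO using z = m + s * eps with eps ~ N(0, 1)."""
    rng = rng if rng is not None else random.Random(0)
    total = 0.0
    for _ in range(n_samples):
        z = m + s * rng.gauss(0.0, 1.0)
        total += log_normal(x, z, 1.0) + log_normal(z, 0.0, 1.0) - log_normal(z, m, s ** 2)
    return total / n_samples


x = 1.0
log_evidence = log_normal(x, 0.0, 2.0)     # exact log p(x), since marginally x ~ N(0, 2)
m_star, s_star = x / 2.0, math.sqrt(0.5)   # exact posterior p(z | x) = N(x/2, 1/2)
print(abs(elbo_analytic(x, m_star, s_star) - log_evidence) < 1e-9)  # True: bound is tight
print(elbo_analytic(x, 0.0, 1.0) < log_evidence)                    # True: loose for q = prior
```

When q equals the exact posterior, every Monte Carlo term equals log p(x), so the bound is tight. For any other q the gap is the KL divergence from q to the posterior.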

artificial intelligence, machine learning, proceedings

1812.06834

Country: North America > United States (0.45)

Genre:

Overview (1.00)
Instructional Material > Course Syllabus & Notes (1.00)

Industry: Education (0.81)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

arXiv.org Artificial Intelligence, Dec-18-2018

Continual Match Based Training in Pommerman: Technical Report

Peng, Peng, Pang, Liang, Yuan, Yufeng, Gao, Chao

Continual learning is the ability of agents to improve their capacities continually across multiple tasks. While recent works in the continual learning literature have mostly focused on developing either particular loss functions or specialized neural network structures explaining episodic memory or neural plasticity, we study continual learning from the perspective of the training mechanism. Specifically, we propose a COntinual Match BAsed Training (COMBAT) framework for training a population of advantage-actor-critic (A2C) agents in Pommerman, a partially observable multi-agent environment with no communication. Following the COMBAT framework, we trained an agent, Navocado, that won first place among learning agents in the NeurIPS 2018 Pommerman Competition. Two critical features of our agent are worth mentioning. First, our agent did not learn from any demonstrations. Second, our agent is highly reproducible. In this technical report, we describe the design of the state space, action space, and reward, and, most importantly, the COMBAT framework for our Pommerman agent. We show in the experiments that Pommerman is a perfect environment for studying continual learning, and that the agent can improve its performance by continually learning new skills without forgetting old ones. Finally, the result in the Pommerman Competition verifies the robustness of our agent when competing with various opponents.
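The agents above are advantage-actor-critic (A2C) learners. As a generic sketch (not the COMBAT training loop, which the abstract does not detail), the snippet shows how A2C typically computes the advantage A_t = R_t - V(s_t) from an n-step rollout. R_t is the discounted return bootstrapped from the critic's value of the last state, and the advantage then scales the policy-gradient term -log pi(a_t | s_t) * A_t. Rewards, values, and gamma in the example are made up.

```python
def n_step_advantages(rewards, values, bootstrap_value, gamma=0.99, dones=None):
    """A2C-style advantages for one rollout.

    rewards[t]       - reward after action t
    values[t]        - critic estimate V(s_t)
    bootstrap_value  - V(s_T) for the state after the last step
    dones[t]         - True if the episode ended after step t (no bootstrapping past it)
    """
    if dones is None:
        dones = [False] * len(rewards)
    ret = bootstrap_value
    advantages = [0.0] * len(rewards)
    for t in reversed(range(len(rewards))):
        if dones[t]:
            ret = 0.0  # terminal: future return is zero
        ret = rewards[t] + gamma * ret  # R_t = r_t + gamma * R_{t+1}
        advantages[t] = ret - values[t]
    return advantages


# Hypothetical sparse-reward rollout: reward only on the final step.
print(n_step_advantages([0.0, 0.0, 1.0], [0.5, 0.5, 0.5], bootstrap_value=0.0, gamma=0.9))
```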

artificial intelligence, machine learning, reinforcement learning

1812.07297

Country: Asia > China (0.15)

Genre: Instructional Material > Course Syllabus & Notes (0.46)

Industry: Leisure & Entertainment > Games (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Artificial Intelligence, Dec-17-2018

Attention-based Recurrent Neural Network for Urban Vehicle Trajectory Prediction

Choi, Seongjin, Kim, Jiwon, Yeo, Hwasoo

As the number of positioning sensors and location-based devices increases, a huge amount of spatial and temporal information is collected and accumulated. These data are expressed as trajectory data by connecting the data points in chronological sequence, and they contain the movement information of moving objects. In this study, urban vehicle trajectory prediction is studied using trajectory data of vehicles in an urban traffic network. In previous work, a Recurrent Neural Network (RNN) model for urban vehicle trajectory prediction was proposed. To further improve that model, we propose an Attention-based Recurrent Neural Network (ARNN) model for urban vehicle trajectory prediction. In the proposed model, we use an attention mechanism to incorporate network traffic state data into urban vehicle trajectory prediction. The model is evaluated using Bluetooth data collected in Brisbane, Australia, which contains the movement information of private vehicles. Performance is measured with five metrics: BLEU-1, BLEU-2, BLEU-3, BLEU-4, and METEOR. The results show that the ARNN model performs better than the RNN model.
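The abstract does not specify which form of attention the ARNN uses, so the sketch below assumes standard scaled dot-product attention as a stand-in. The query plays the role of the RNN's hidden state, and the keys and values stand for traffic-state vectors of candidate road links; all vectors are hypothetical. The output is a context vector (a weighted mix of traffic states) that could be combined with the hidden state before predicting the next link.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_product_attention(query, keys, values):
    """Weight each value by softmax(query . key / sqrt(d)); return (context, weights)."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]
    return context, weights


# Hypothetical: hidden state aligned with link 0's key; values are e.g. link speeds.
context, weights = dot_product_attention(
    query=[1.0, 0.0],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[10.0], [20.0]],
)
print(weights, context)
```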

artificial intelligence, machine learning, sequence

1812.07151

Country: Oceania > Australia > Queensland > Brisbane (0.24)

Genre: Research Report > New Finding (0.76)

Industry: Transportation > Ground > Road (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Alevizos, Elias, Artikis, Alexander, Paliouras, Georgios

Wayeb: a Tool for Complex Event Forecasting

arXiv.org Artificial Intelligence, Dec-16-2018

A Complex Event Processing (CEP) system takes as input a stream of events, along with a set of patterns that define relations among the input events, and detects instances of pattern satisfaction, thus producing an output stream of complex events. Typically, an event has the structure of a tuple of values which might be numerical or categorical, with the event type and timestamp being the most common attributes. Since time is of critical importance for CEP, a temporal formalism is used to define the patterns to be detected. Such a pattern imposes temporal (and possibly atemporal) constraints on the input events, which, if satisfied, lead to the detection of a complex event. Efficient processing is of paramount importance since complex events must be detected under very strict latency requirements.
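To illustrate the input/output contract described above, here is a hand-written recognizer for a single temporal pattern: an event of type `first` followed by one of type `second` within `window` time units. It is a hypothetical sketch, not Wayeb's automaton-based engine or its API. It uses a simple "any match" policy in which every pending `first` pairs with every later `second` inside the window, and the event stream is made up.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class Event:
    event_type: str
    timestamp: float


def detect_sequence(
    stream: Iterable[Event], first: str, second: str, window: float
) -> Iterator[Tuple[Event, Event]]:
    """Yield a complex event (pair) for each `first` followed by `second` within `window`."""
    pending: List[Event] = []  # `first` events that could still complete a match
    for ev in stream:
        # Temporal constraint: drop partial matches that can no longer finish in time.
        pending = [p for p in pending if ev.timestamp - p.timestamp <= window]
        if ev.event_type == second:
            for p in pending:
                yield (p, ev)
        if ev.event_type == first:
            pending.append(ev)


stream = [Event("A", 0), Event("B", 2), Event("A", 5), Event("C", 6), Event("B", 12)]
matches = list(detect_sequence(stream, first="A", second="B", window=5))
print([(p.timestamp, q.timestamp) for p, q in matches])  # [(0, 2)]
```

The second `A` (at time 5) never completes because the next `B` arrives 7 time units later, outside the window. Pruning partial matches early like this is one reason latency can stay low on long streams.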

artificial intelligence, machine learning, predicate

doi: 10.29007/2s9t

1901.01826

Country: Europe > Greece (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.31)