AITopics

2004.07473

Country:

North America > United States > California > San Francisco County > San Francisco (0.06)
Europe > Germany > Berlin (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > Portugal (0.04)

Genre: Research Report (0.50)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Bellinger, Colin, Coles, Rory, Crowley, Mark, Tamblyn, Isaac

Reinforcement Learning in a Physics-Inspired Semi-Markov Environment

arXiv.org Artificial Intelligence Apr-15-2020

Reinforcement learning (RL) has been demonstrated to have great potential in many applications of scientific discovery and design. Recent work includes, for example, the design of new structures and compositions of molecules for therapeutic drugs. Much of the existing work related to the application of RL to scientific domains, however, assumes that the available state representation obeys the Markov property. For reasons associated with time, cost, sensor accuracy, and gaps in scientific knowledge, many scientific design and discovery problems do not satisfy the Markov property. Thus, something other than a Markov decision process (MDP) should be used to plan or find the optimal policy. In this paper, we present a physics-inspired semi-Markov RL environment, namely the phase change environment. In addition, we evaluate the performance of value-based RL algorithms for both MDPs and partially observable MDPs (POMDPs) on the proposed environment. Our results demonstrate that deep recurrent Q-networks (DRQNs) significantly outperform deep Q-networks (DQNs), and that DRQNs benefit from training with hindsight experience replay. Implications for the use of semi-Markovian RL and POMDPs for scientific laboratories are also discussed.

agent, change environment, phase change environment

2004.07333

Country:

North America > Canada > Ontario > National Capital Region > Ottawa (0.14)
North America > United States > District of Columbia > Washington (0.04)
North America > Canada > Ontario > Waterloo Region > Waterloo (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.76)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

arXiv.org Artificial Intelligence Apr-13-2020

K-spin Hamiltonian for quantum-resolvable Markov decision processes

Jones, Eric B., Graf, Peter, Kapit, Eliot, Jones, Wesley

The Markov decision process is the mathematical formalization underlying the modern field of reinforcement learning when transition and reward functions are unknown. We derive a pseudo-Boolean cost function that is equivalent to a K-spin Hamiltonian representation of the discrete, finite, discounted Markov decision process with infinite horizon. This K-spin Hamiltonian furnishes a starting point from which to solve for an optimal policy using heuristic quantum algorithms such as adiabatic quantum annealing and the quantum approximate optimization algorithm on near-term quantum hardware. In proving that the variational minimization of our Hamiltonian is equivalent to the Bellman optimality condition we establish an interesting analogy with classical field theory. Along with proof-of-concept calculations to corroborate our formulation by simulated and quantum annealing against classical Q-Learning, we analyze the scaling of physical resources required to solve our Hamiltonian on quantum hardware.

hamiltonian, k-spin hamiltonian, optimal policy

2004.0604

Country:

North America > United States > Colorado > Jefferson County > Golden (0.14)
North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
North America > United States > California > Los Angeles County > Santa Monica (0.04)

Genre: Research Report (0.40)

Industry: Energy (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Boukas, Ioannis, Ernst, Damien, Théate, Thibaut, Bolland, Adrien, Huynen, Alexandre, Buchwald, Martin, Wynants, Christelle, Cornélusse, Bertrand

A Deep Reinforcement Learning Framework for Continuous Intraday Market Bidding

arXiv.org Artificial Intelligence Apr-13-2020

The large-scale integration of variable energy resources is expected to shift a large part of the energy exchanges closer to real-time, where more accurate forecasts are available. In this context, the short-term electricity markets, and in particular the intraday market, are considered a suitable trading floor for these exchanges to occur. A key component of the successful integration of renewable energy sources is the use of energy storage. In this paper, we propose a novel modelling framework for the strategic participation of energy storage in the European continuous intraday market, where exchanges occur through a centralized order book. The goal of the storage device operator is the maximization of the profits received over the entire trading horizon, while taking into account the operational constraints of the unit. The sequential decision-making problem of trading in the intraday market is modelled as a Markov Decision Process. An asynchronous distributed version of the fitted Q iteration algorithm is chosen for solving this problem due to its sample efficiency. The large and variable number of existing orders in the order book motivates the use of high-level actions and an alternative state representation. Historical data are used to generate a large number of artificial trajectories in order to address exploration issues during the learning process. The resulting policy is back-tested and compared against a benchmark strategy that is the current industrial standard. Results indicate that the agent converges to a policy that achieves, on average, higher total revenues than the benchmark strategy.

agent, deep reinforcement learning framework, storage device

2004.0594

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)

Genre: Research Report (0.63)

Industry:

Energy > Power Industry (1.00)
Banking & Finance > Trading (1.00)
Energy > Renewable > Hydroelectric (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.87)

Chaplot, Devendra Singh, Gandhi, Dhiraj, Gupta, Saurabh, Gupta, Abhinav, Salakhutdinov, Ruslan

Learning to Explore using Active Neural SLAM

arXiv.org Artificial Intelligence Apr-10-2020

This work presents a modular and hierarchical approach to learn policies for exploring 3D environments, called 'Active Neural SLAM'. Our approach leverages the strengths of both classical and learning-based methods, by using analytical path planners with a learned SLAM module, and global and local policies. The use of learning provides flexibility with respect to input modalities (in the SLAM module), leverages structural regularities of the world (in global policies), and provides robustness to errors in state estimation (in local policies). Such use of learning within each module retains its benefits, while at the same time, hierarchical decomposition and modular training allow us to sidestep the high sample complexities associated with training end-to-end policies. Our experiments in visually and physically realistic simulated 3D environments demonstrate the effectiveness of our approach over past learning and geometry-based approaches. The proposed model can also be easily transferred to the PointGoal task and was the winning entry of the CVPR 2019 Habitat PointGoal Navigation Challenge.

baseline, neural slam module, prediction

2004.05155

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Japan > Honshū > Kansai > Hyogo Prefecture > Kobe (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Kamran, Danial, Lopez, Carlos Fernandez, Lauer, Martin, Stiller, Christoph

Risk-Aware High-level Decisions for Automated Driving at Occluded Intersections with Reinforcement Learning

arXiv.org Artificial Intelligence Apr-9-2020

Reinforcement learning is nowadays a popular framework for solving different decision-making problems in automated driving. However, some crucial challenges still need to be addressed in order to provide more reliable policies. In this paper, we propose a generic risk-aware DQN approach to learn high-level actions for driving through unsignalized occluded intersections. The proposed state representation provides lane-based information, which allows it to be used in multi-lane scenarios. Moreover, we propose a risk-based reward function which punishes risky situations instead of only collision failures. Such a reward approach helps to incorporate risk prediction into our deep Q network and to learn more reliable policies which are safer in challenging situations. The efficiency of the proposed approach is compared with a DQN trained with a conventional collision-based reward scheme and also with a rule-based intersection navigation policy. Evaluation results show that the proposed approach outperforms both of these methods. It provides safer actions than the collision-aware DQN approach and is less overcautious than the rule-based policy.

artificial intelligence, machine learning, reinforcement learning

2004.0445

Country: Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Transportation > Ground > Road (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

arXiv.org Machine Learning Apr-9-2020

Deep Reinforcement Learning (DRL): Another Perspective for Unsupervised Wireless Localization

Li, You, Hu, Xin, Zhuang, Yuan, Gao, Zhouzheng, Zhang, Peng, El-Sheimy, Naser

Location is key to spatializing internet-of-things (IoT) data. However, it is challenging to use low-cost IoT devices for robust unsupervised localization (i.e., localization without training data that have known location labels). Thus, this paper proposes a deep reinforcement learning (DRL) based unsupervised wireless-localization method. The main contributions are as follows. (1) This paper proposes an approach to model a continuous wireless-localization process as a Markov decision process (MDP) and process it within a DRL framework. (2) To alleviate the challenge of obtaining rewards when using unlabeled data (e.g., daily-life crowdsourced data), this paper presents a reward-setting mechanism, which extracts robust landmark data from unlabeled wireless received signal strengths (RSS). (3) To ease requirements for model re-training when using DRL for localization, this paper uses RSS measurements together with agent location to construct DRL inputs. The proposed method was tested using field-testing data from multiple Bluetooth 5 smart ear tags in a pasture. The experimental verification process also reflected the advantages and challenges of using DRL in wireless localization.

deep reinforcement learning, drl, localization

doi: 10.1109/JIOT.2019.2957778

2004.04618

Country:

North America > Canada > Alberta > Census Division No. 6 > Calgary Metropolitan Region > Calgary (0.14)
North America > United States > Texas (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Genre: Research Report (0.50)

Industry: Information Technology > Smart Houses & Appliances (0.36)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.89)

arXiv.org Machine Learning Apr-9-2020

Inference in the Stochastic Block Model with a Markovian assignment of the communities

Duchemin, Quentin

Large random graphs have been very popular in the last decade since they are powerful tools to model complex phenomena like interactions on social networks or the spread of a disease. In practical cases, detecting communities of well-connected nodes in a graph is a major issue, motivating the study of the Stochastic Block Model (SBM). In this model, each node belongs to a particular community and edges are sampled independently according to a probability depending on the communities of the nodes. Aiming at progressively bridging the gap between models and reality, time-evolving random graphs have been recently introduced. In [20], a Stochastic Block Temporal Model is considered where the temporal evolution is modeled through a discrete hidden Markov chain on the nodes' membership and where the connection probabilities also evolve through time.
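
The static SBM described in this abstract (each node has a community, and each edge is drawn independently with a probability set by the two endpoint communities) can be illustrated with a minimal sketch. This is not the paper's code: the function name `sample_sbm`, the example community labels, and the connection probabilities below are all hypothetical, and the Markovian evolution of community membership mentioned in the abstract is not modeled here.

```python
import random


def sample_sbm(communities, connect_prob, seed=None):
    """Sample one undirected graph from a static Stochastic Block Model.

    communities:  list where communities[i] is the community index of node i.
    connect_prob: symmetric matrix; connect_prob[a][b] is the probability of
                  an edge between a node in community a and one in community b.
    Returns the sampled edges as a set of pairs (i, j) with i < j.
    """
    rng = random.Random(seed)
    n = len(communities)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            # Each pair is drawn independently; the probability depends
            # only on the communities of the two endpoints.
            p = connect_prob[communities[i]][communities[j]]
            if rng.random() < p:
                edges.add((i, j))
    return edges


# Hypothetical example: two communities of three nodes each, with dense
# within-community and sparse between-community connections.
communities = [0, 0, 0, 1, 1, 1]
connect_prob = [[0.9, 0.1],
                [0.1, 0.9]]
edges = sample_sbm(communities, connect_prob, seed=0)
print(sorted(edges))
```

Community detection is the inverse problem: given only the sampled edges, recover the `communities` list.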

exp, graph, node

2004.04402

Country:

Europe > United Kingdom (0.04)
Europe > France > Île-de-France (0.04)
Asia > Afghanistan > Parwan Province > Charikar (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.88)

Ting, Chee-Ming, Samdin, S. Balqis, Tang, Meini, Ombao, Hernando

Detecting Dynamic Community Structure in Functional Brain Networks Across Individuals: A Multilayer Approach

arXiv.org Machine Learning Apr-9-2020

We present a unified statistical framework for characterizing community structure of brain functional networks that captures variation across individuals and evolution over time. Existing methods for community detection focus only on single-subject analysis of dynamic networks, while recent extensions to multiple-subject analysis are limited to static networks. To overcome these limitations, we propose a multi-subject, Markov-switching stochastic block model (MSS-SBM) to identify state-related changes in brain community organization over a group of individuals. We first formulate a multilayer extension of SBM to describe the time-dependent, multi-subject brain networks. We develop a novel procedure for fitting the multilayer SBM that builds on multislice modularity maximization, which can uncover a common community partition of all layers (subjects) simultaneously. By augmenting with a dynamic Markov switching process, our proposed method is able to capture a set of distinct, recurring temporal states with respect to inter-community interactions over subjects and the change points between them. Simulation shows accurate community recovery and tracking of dynamic community regimes over multilayer networks by the MSS-SBM. Application to task fMRI reveals meaningful non-assortative brain community motifs, e.g., core-periphery structure at the group level, that are associated with language comprehension and motor functions, suggesting their putative role in complex information integration. Our approach detected dynamic reconfiguration of modular connectivity elicited by varying task demands and identified unique profiles of intra- and inter-community connectivity across different task conditions. The proposed multilayer network representation provides a principled way of detecting synchronous, dynamic modularity in brain networks across subjects.

community structure, connectivity, orbsupmed, (16 more...)

2004.04362

Country:

North America > United States (0.14)
Asia > Middle East > Saudi Arabia (0.04)
Asia > Malaysia (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Health Care Technology (1.00)

Technology:

Information Technology > Communications > Networks (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.88)
Information Technology > Data Science > Data Mining (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Stratos, Karl, Wiseman, Sam

Learning Discrete Structured Representations by Adversarially Maximizing Mutual Information

arXiv.org Machine Learning Apr-8-2020

We propose learning discrete structured representations from unlabeled data by maximizing the mutual information between a structured latent variable and a target variable. Calculating mutual information is intractable in this setting. Our key technical contribution is an adversarial objective that can be used to tractably estimate mutual information assuming only the feasibility of cross entropy calculation. We develop a concrete realization of this general formulation with Markov distributions over binary encodings. We report critical and unexpected findings on practical aspects of the objective such as the choice of variational priors. We apply our model to document hashing and show that it outperforms current best baselines based on discrete and vector quantized variational autoencoders. It also yields highly compressed interpretable representations.

information, learning discrete structured representation, representation

2004.03991

Country:

South America (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)

Genre: Research Report (0.64)

Industry:

Leisure & Entertainment > Sports > Motorsports (1.00)
Law (0.93)
Government > Regional Government (0.68)
Media > Film (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)