AITopics

2009.00038

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(9 more...)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Giuseppi, Alessandro, Pietrabissa, Antonio

Chance-Constrained Control with Lexicographic Deep Reinforcement Learning

arXiv.org Artificial Intelligence, Oct-19-2020

This paper proposes a lexicographic Deep Reinforcement Learning (DeepRL)-based approach to chance-constrained Markov Decision Processes, in which the controller seeks to ensure that the probability of satisfying the constraint is above a given threshold. Standard DeepRL approaches require i) the constraints to be included as additional weighted terms in the cost function, in a multi-objective fashion, and ii) the tuning of the introduced weights during the training phase of the Deep Neural Network (DNN) according to the probability thresholds. The proposed approach instead requires separately training one constraint-free DNN and one DNN associated with each constraint and then, at each time step, selecting which DNN to use depending on the observed system state. The presented solution does not require any hyper-parameter tuning besides the standard DNN ones, even if the probability thresholds change. A lexicographic version of the well-known DeepRL algorithm DQN is also proposed and validated via simulations.
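The per-step selection idea can be illustrated with a minimal sketch. It assumes, hypothetically, that the constraint network yields a per-action estimate of the probability of satisfying the constraint and that the constraint-free network yields per-action expected costs. The function name `select_action` and the toy values are illustrative and are not taken from the paper, whose exact selection rule may differ.

```python
def select_action(cost_q, sat_prob, threshold):
    """Pick an action index lexicographically: constraint first, then cost.

    cost_q:   per-action expected cost (assumed output of the constraint-free net)
    sat_prob: per-action estimated probability of satisfying the constraint
              (assumed output of the constraint net)
    """
    # Actions whose estimated satisfaction probability meets the threshold.
    feasible = [a for a, p in enumerate(sat_prob) if p >= threshold]
    if feasible:
        # Constraint can be met: defer to the constraint-free estimates.
        return min(feasible, key=lambda a: cost_q[a])
    # No action meets the threshold: defer to the constraint estimates.
    return max(range(len(sat_prob)), key=lambda a: sat_prob[a])
```

In this sketch the threshold is only an argument of the selection step, so changing it does not touch either set of estimates.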

constraint, machine learning, reinforcement learning, (14 more...)

doi: 10.1109/LCSYS.2020.2979635

2010.09468

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Italy > Lazio > Rome (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.35)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

arXiv.org Artificial Intelligence, Oct-18-2020

Average-reward model-free reinforcement learning: a systematic review and literature mapping

Dewanto, Vektor, Dunn, George, Eshragh, Ali, Gallagher, Marcus, Roosta, Fred

Model-free reinforcement learning (RL) has been an active area of research and provides a fundamental framework for agent-based learning and decision-making in artificial intelligence. In this paper, we review a specific subset of this literature, namely work that uses optimization criteria based on average rewards in the infinite-horizon setting. Average-reward RL has the advantage of being the most selective criterion in recurrent (ergodic) Markov decision processes. Compared with the widely used discounted-reward criterion, it also requires no discount factor, which is a critical hyperparameter, and properly aligns the optimization and performance metrics. Motivated by the solo survey by Mahadevan (1996a), we provide an updated review of work in this area and extend it to cover policy-iteration and function approximation methods (in addition to the value-iteration and tabular counterparts). We also identify and discuss opportunities for future work.
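To make the contrast with discounting concrete, here is a minimal sketch on a two-state Markov chain under a fixed policy. The chain, the rewards, and the helper names are made up for illustration. The average reward (gain) is computed from the stationary distribution with no discount factor, while the discounted value needs a choice of gamma.

```python
def stationary(P, iters=1000):
    """Stationary distribution of an ergodic chain by power iteration."""
    n = len(P)
    mu = [1.0 / n] * n
    for _ in range(iters):
        mu = [sum(mu[i] * P[i][j] for i in range(n)) for j in range(n)]
    return mu

def gain(P, r):
    """Average reward per step: no discount factor involved."""
    return sum(m * x for m, x in zip(stationary(P), r))

def discounted_value(P, r, gamma, iters=20000):
    """Discounted value of each state by fixed-point iteration."""
    n = len(P)
    v = [0.0] * n
    for _ in range(iters):
        v = [r[i] + gamma * sum(P[i][j] * v[j] for j in range(n))
             for i in range(n)]
    return v

P = [[0.9, 0.1], [0.5, 0.5]]   # hypothetical transition matrix
r = [1.0, 0.0]                 # hypothetical per-state rewards
g = gain(P, r)                 # stationary distribution is (5/6, 1/6)
v = discounted_value(P, r, gamma=0.999)
```

For this chain, `(1 - gamma) * v[s]` comes close to `g` in both states as gamma approaches 1, which is the usual link between the two criteria.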

approximation, machine learning, reinforcement learning, (16 more...)

2010.0892

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Oceania > Australia > Queensland (0.04)
Asia > Middle East > Jordan (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry: Transportation (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.66)

Bresler, Guy, Buhai, Rares-Darius

Learning Restricted Boltzmann Machines with Sparse Latent Variables

arXiv.org Machine Learning, Oct-17-2020

Restricted Boltzmann Machines (RBMs) are a common family of undirected graphical models with latent variables. An RBM is described by a bipartite graph, with all observed variables in one layer and all latent variables in the other. We consider the task of learning an RBM given samples generated according to it. The best algorithms for this task currently have time complexity $\tilde{O}(n^2)$ for ferromagnetic RBMs (i.e., with attractive potentials) but $\tilde{O}(n^d)$ for general RBMs, where $n$ is the number of observed variables and $d$ is the maximum degree of a latent variable. Let the MRF neighborhood of an observed variable be its neighborhood in the Markov Random Field of the marginal distribution of the observed variables. In this paper, we give an algorithm for learning general RBMs with time complexity $\tilde{O}(n^{2^s+1})$, where $s$ is the maximum number of latent variables connected to the MRF neighborhood of an observed variable. This is an improvement when $s < \log_2 (d-1)$, which corresponds to RBMs with sparse latent variables. Furthermore, we give a version of this learning algorithm that recovers a model with small prediction error and whose sample complexity is independent of the minimum potential in the Markov Random Field of the observed variables. This is of interest because the sample complexity of current algorithms scales with the inverse of the minimum potential, which cannot be controlled in terms of natural properties of the RBM.
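As a worked check of the stated condition, and ignoring the logarithmic factors hidden by $\tilde{O}$, comparing the two exponents for $n > 1$ gives:

$$
n^{2^s+1} < n^{d} \iff 2^s + 1 < d \iff s < \log_2 (d-1).
$$

For example, with $d = 10$ the condition is $s < \log_2 9 \approx 3.17$: $s = 3$ gives exponent $2^3 + 1 = 9 < 10$, while $s = 4$ gives $2^4 + 1 = 17 > 10$.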

algorithm, artificial intelligence, machine learning, (15 more...)

2006.04166

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Colorado (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

arXiv.org Artificial Intelligence, Oct-17-2020

DeepAveragers: Offline Reinforcement Learning by Solving Derived Non-Parametric MDPs

Shrestha, Aayam, Lee, Stefan, Tadepalli, Prasad, Fern, Alan

We study an approach to offline reinforcement learning (RL) based on optimally solving finitely-represented MDPs derived from a static dataset of experience. This approach can be applied on top of any learned representation and has the potential to easily support multiple solution objectives as well as zero-shot adjustment to changing environments and goals. Our main contribution is to introduce the Deep Averagers with Costs MDP (DAC-MDP) and to investigate its solutions for offline RL. DAC-MDPs are a nonparametric model that can leverage deep representations and account for limited data by introducing costs for exploiting under-represented parts of the model. In theory, we show conditions that allow for lower-bounding the performance of DAC-MDP solutions. We also investigate the empirical behavior in a number of environments, including those with image-based observations. Overall, the experiments demonstrate that the framework can work in practice and scale to large complex offline RL problems. Research in automated planning and control has produced powerful algorithms to solve for optimal, or near-optimal, decisions given accurate environment models. Examples include the classic value- and policy-iteration algorithms for tabular representations or more sophisticated symbolic variants for graphical model representations (e.g. [...]). In concept, these planners address many of the traditional challenges in reinforcement learning (RL). They can perform "zero-shot transfer" to new goals and changes to the environment model, accurately account for sparse reward or low-probability events, and solve for different optimization objectives (e.g. [...]). Effectively leveraging these planners, however, requires an accurate model grounded in observations and expressed in the planner's representation. On the other hand, model-based reinforcement learning (MBRL) aims to learn grounded models to improve RL's data efficiency.
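The idea of deriving a finite MDP from a static dataset and penalizing poorly covered regions can be sketched as follows. This is a loose illustration under stated assumptions, not the paper's DAC-MDP definition: states are scalars, neighbours are found by absolute distance, the penalty is `cost` times that distance, and the name `solve_derived_mdp` and all parameters are hypothetical.

```python
def solve_derived_mdp(data, actions, k=1, cost=1.0, gamma=0.9, iters=200):
    """Value iteration on a finite MDP derived from a static dataset.

    data: list of (state, action, reward, next_state) with scalar states.
    Returns one value per dataset transition, for its next_state.
    """
    nodes = [t[3] for t in data]      # derived states: observed next states
    values = [0.0] * len(nodes)

    def neighbours(s, a):
        # k nearest dataset transitions that took action a, by state distance.
        cand = sorted((abs(s - t[0]), i) for i, t in enumerate(data) if t[1] == a)
        return cand[:k]

    for _ in range(iters):
        new_values = []
        for s in nodes:
            qs = []
            for a in actions:
                nb = neighbours(s, a)
                if nb:
                    # Average the neighbours' outcomes, charging cost * distance.
                    qs.append(sum(data[i][2] - cost * d + gamma * values[i]
                                  for d, i in nb) / len(nb))
            new_values.append(max(qs) if qs else 0.0)
        values = new_values
    return values

# Toy dataset: the state 2.0 is never visited as a starting state.
data = [(0.0, "R", 0.0, 1.0), (1.0, "R", 1.0, 2.0)]
penalised = solve_derived_mdp(data, ["R"], cost=1.0)
unpenalised = solve_derived_mdp(data, ["R"], cost=0.0)
```

In this toy case the unpenalized model treats state 2.0 as if it were state 1.0 and reports a large value for it, while the penalized model charges for that extrapolation.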

artificial intelligence, machine learning, reinforcement learning, (16 more...)

2010.08891

Country: North America > United States > Oregon > Benton County > Corvallis (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Marjani, Aymen Al, Proutiere, Alexandre

Adaptive Sampling for Best Policy Identification in Markov Decision Processes

arXiv.org Machine Learning, Oct-16-2020

We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model. The objective is to devise a learning algorithm returning the best policy as early as possible. We first derive a problem-specific lower bound on the sample complexity satisfied by any learning algorithm. This lower bound corresponds to an optimal sample allocation that solves a non-convex program, and hence is hard to exploit in the design of efficient algorithms. We then provide a simple and tight upper bound on the sample complexity lower bound, whose corresponding nearly-optimal sample allocation becomes explicit. The upper bound depends on specific functionals of the MDP such as the sub-optimality gaps and the variance of the next-state value function, and thus captures the hardness of the MDP. Finally, we devise KLB-TS (KL Ball Track-and-Stop), an algorithm tracking this nearly-optimal allocation, and provide asymptotic guarantees for its sample complexity (both almost surely and in expectation). The advantages of KLB-TS against state-of-the-art algorithms are discussed and illustrated numerically.
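The two functionals named in the abstract can be computed directly on a small known MDP. The sketch below is only an illustration of those quantities, not of KLB-TS or of the paper's bound: it takes the sub-optimality gap to be V*(s) - Q*(s, a) and the variance to be that of V* under the next-state distribution. The helper names and the toy MDP are assumptions.

```python
def optimal_q(P, R, gamma, iters=500):
    """Q* by value iteration. P[s][a] is a next-state distribution, R[s][a] a reward."""
    n_s, n_a = len(R), len(R[0])
    Q = [[0.0] * n_a for _ in range(n_s)]
    for _ in range(iters):
        V = [max(row) for row in Q]
        Q = [[R[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))
              for a in range(n_a)] for s in range(n_s)]
    return Q

def hardness_terms(P, R, gamma):
    """Return (gaps, variances), each indexed as [state][action]."""
    Q = optimal_q(P, R, gamma)
    V = [max(row) for row in Q]
    gaps = [[V[s] - q for q in Q[s]] for s in range(len(Q))]
    variances = []
    for s in range(len(Q)):
        row = []
        for a in range(len(Q[s])):
            mean = sum(p * v for p, v in zip(P[s][a], V))
            row.append(sum(p * (v - mean) ** 2 for p, v in zip(P[s][a], V)))
        variances.append(row)
    return gaps, variances

# Hypothetical 2-state, 2-action MDP.
P = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.5, 0.5]]]
R = [[1.0, 0.0], [0.0, 0.0]]
gaps, variances = hardness_terms(P, R, gamma=0.5)
```

Here action 1 in state 1 has a small gap (0.25) and nonzero next-state variance (0.25), so it is the pair that would intuitively need the most samples to rule out.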

artificial intelligence, machine learning, sample complexity, (16 more...)

2009.13405

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.60)

Zhou, Huozhi, Chen, Jinglin, Varshney, Lav R., Jagmohan, Ashish

Nonstationary Reinforcement Learning with Linear Function Approximation

arXiv.org Machine Learning, Oct-15-2020

We consider reinforcement learning (RL) in episodic Markov decision processes (MDPs) with linear function approximation under a drifting environment. Specifically, both the reward and state transition functions can evolve over time, as long as their respective total variations, quantified by suitable metrics, do not exceed certain variation budgets. We first develop the LSVI-UCB-Restart algorithm, an optimistic modification of least-squares value iteration combined with periodic restart, and establish its dynamic regret bound when variation budgets are known. We then propose a parameter-free algorithm, Ada-LSVI-UCB-Restart, that works without knowing the variation budgets, but with a slightly worse dynamic regret bound. We also derive the first minimax dynamic regret lower bound for nonstationary MDPs to show that our proposed algorithms are near-optimal. As a byproduct, we establish a minimax regret lower bound for linear MDPs, which was left unsolved by Jin et al. (2020). In addition, we provide numerical experiments to demonstrate the effectiveness of our proposed algorithms. As far as we know, this is the first dynamic regret analysis in nonstationary reinforcement learning with function approximation.
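The role of periodic restart under drift can be shown with a deliberately simplified sketch. It is not LSVI-UCB-Restart: it replaces the least-squares value estimate with a scalar running mean, and the name `restart_mean` and the `window` parameter are hypothetical. The point is only that discarding old data at fixed intervals bounds how stale the estimate can get.

```python
def restart_mean(stream, window):
    """Running mean that discards all past data every `window` samples."""
    estimates, total, count = [], 0.0, 0
    for t, x in enumerate(stream):
        if t % window == 0:
            total, count = 0.0, 0   # restart: forget possibly stale data
        total += x
        count += 1
        estimates.append(total / count)
    return estimates

# A reward that jumps from 0 to 1 halfway through.
stream = [0.0] * 50 + [1.0] * 50
with_restart = restart_mean(stream, window=25)     # final estimate 1.0
without_restart = restart_mean(stream, window=100) # final estimate 0.5
```

A shorter window tracks change faster but averages fewer samples, which is the trade-off a known variation budget would help tune.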

dynamic regret, machine learning, reinforcement learning, (17 more...)

2010.04244

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Illinois > Champaign County > Urbana (0.04)
North America > United States > New Jersey > Hudson County > Hoboken (0.04)
(2 more...)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.58)
Information Technology (0.46)
Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.81)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

arXiv.org Machine Learning, Oct-15-2020

QR-MIX: Distributional Value Function Factorisation for Cooperative Multi-Agent Reinforcement Learning

Hu, Jian, Harding, Seth Austin, Wu, Haibin, Hu, Siyue, Liao, Shih-wei

In Cooperative Multi-Agent Reinforcement Learning (MARL) and under the setting of Centralized Training with Decentralized Execution (CTDE), agents observe and interact with their environment locally and independently. With local observation and random sampling, the randomness in rewards and observations leads to randomness in long-term returns. Existing methods such as Value Decomposition Network (VDN) and QMIX estimate the value of long-term returns as a scalar that does not contain the information of randomness. Our proposed model QR-MIX introduces quantile regression, modeling joint state-action values as a distribution, combining QMIX with Implicit Quantile Network (IQN). However, the monotonicity in QMIX limits the expression of the joint state-action value distribution and may lead to incorrect estimation results in non-monotonic cases. Therefore, we propose a flexible loss function to approximate the monotonicity found in QMIX. Our model is not only more tolerant of the randomness of returns, but also more tolerant of the randomness of monotonic constraints. The experimental results demonstrate that QR-MIX outperforms the previous state-of-the-art method QMIX in the StarCraft Multi-Agent Challenge (SMAC) environment.
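Quantile regression rests on an asymmetric loss whose minimizer is a quantile rather than the mean. The sketch below uses the plain pinball loss as a stand-in. It is not the loss function proposed in the paper, and the helper names and sample values are illustrative assumptions.

```python
def pinball_loss(pred, target, tau):
    """Quantile (pinball) loss for a single prediction at level tau in (0, 1)."""
    u = target - pred
    # Under-estimates are weighted by tau, over-estimates by (1 - tau).
    return u * (tau - (1.0 if u < 0 else 0.0))

def fit_quantile(samples, tau):
    """Return the sample value minimising the mean pinball loss."""
    def mean_loss(q):
        return sum(pinball_loss(q, x, tau) for x in samples) / len(samples)
    return min(samples, key=mean_loss)

returns = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # hypothetical sampled returns
median = fit_quantile(returns, 0.5)     # 5
upper = fit_quantile(returns, 0.75)     # 7
```

Fitting several levels of tau at once gives a discrete picture of the return distribution instead of a single scalar.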

artificial intelligence, machine learning, reinforcement learning, (13 more...)

2009.04197

Country:

Europe > United Kingdom > England > Greater London > London (0.04)
Asia > Taiwan (0.04)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Artificial Intelligence, Oct-15-2020

Recurrent Distributed Reinforcement Learning for Partially Observable Robotic Assembly

Luo, Jieliang, Li, Hui

In this work we address partially observable reinforcement learning (RL) environments by adding recurrence. We focus on partially observable robotic assembly tasks in the continuous action domain, with force/torque sensing being the only observation. We have developed a new distributed RL agent, named Recurrent Distributed DDPG (RD2), which adds a recurrent neural network layer to Ape-X DDPG and makes two important improvements on prioritized experience replay to stabilize training. We demonstrate the effectiveness of RD2 on a variety of joint assembly tasks and a partially observable version of the pendulum task from OpenAI Gym. Our results show that RD2 achieves better performance than Ape-X DDPG and PPO with LSTM on partially observable tasks of varying complexity. We also show that the trained models adapt well to different initial states and different types of noise injected in the simulated environment. The video presenting our experiments is available at https://sites.google.com/view/rd2-rl

artificial intelligence, assembly task, machine learning, (18 more...)

2010.08052

Country: North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report > New Finding (0.54)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Diao, Tina, Singla, Samriddhi, Mukhopadhyay, Ayan, Eldawy, Ahmed, Shachter, Ross, Kochenderfer, Mykel

Uncertainty Aware Wildfire Management

arXiv.org Artificial Intelligence, Oct-15-2020

Recent wildfires in the United States have resulted in loss of life and billions of dollars, destroying countless structures and forests. Fighting wildfires is extremely complex. It is difficult to observe the true state of fires due to smoke and the risk associated with ground surveillance. There are limited resources to be deployed over a massive area and the spread of the fire is challenging to predict. This paper proposes a decision-theoretic approach to combat wildfires. We model the resource allocation problem as a partially-observable Markov decision process. We also present a data-driven model that lets us simulate how fires spread as a function of relevant covariates. A major problem in using data-driven models to combat wildfires is the lack of comprehensive data sources that relate fires with relevant covariates. We present an algorithmic approach based on large-scale raster and vector analysis that can be used to create such a dataset. Our dataset, with over 2 million data points, is the first open-source dataset that combines existing fire databases with covariates extracted from satellite imagery. Through experiments using real-world wildfire data, we demonstrate that our forecasting model can accurately model the spread of wildfires. Finally, we use simulations to demonstrate that our response strategy can significantly reduce response times compared to baseline methods.

artificial intelligence, machine learning, wildfire, (19 more...)

2010.07915

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.05)
North America > United States > California > Riverside County > Riverside (0.04)
North America > United States > Rocky Mountains (0.04)
(2 more...)

Genre: Research Report (0.64)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.93)