AITopics

2009.04203

Country:

Asia > China > Beijing > Beijing (0.05)
North America > United States (0.04)
Asia > Thailand (0.04)

Genre: Research Report (0.64)

Industry:

Government > Military (1.00)
Health & Medicine (0.94)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.75)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.69)

Schamberg, Gabe, Badgeley, Marcus, Brown, Emery N.

Controlling Level of Unconsciousness by Titrating Propofol with Deep Reinforcement Learning

arXiv.org Artificial IntelligenceSep-9-2020

Reinforcement Learning (RL) can be used to fit a mapping from patient state to a medication regimen. Prior studies have used deterministic and value-based tabular learning to learn a propofol dose from an observed anesthetic state. Deep RL replaces the table with a deep neural network and has been used to learn medication regimens from registry databases. Here we perform the first application of deep RL to closed-loop control of anesthetic dosing in a simulated environment. We use the cross-entropy method to train a deep neural network to map an observed anesthetic state to a probability of infusing a fixed propofol dosage. During testing, we implement a deterministic policy that transforms the probability of infusion to a continuous infusion rate. The model is trained and tested on simulated pharmacokinetic/pharmacodynamic models with randomized parameters to ensure robustness to patient variability. The deep RL agent significantly outperformed a proportional-integral-derivative controller (median absolute performance error 1.7% +/- 0.6 and 3.4% +/- 1.2). Modeling continuous input variables instead of a table affords more robust pattern recognition and utilizes our prior domain knowledge. Deep RL learned a smooth policy with a natural interpretation to data scientists and anesthesia care providers alike.

controller, machine learning, reinforcement learning, (18 more...)

2008.12333

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.46)

Industry:

Health & Medicine > Therapeutic Area (0.68)
Health & Medicine > Health Care Providers & Services (0.68)
Health & Medicine > Surgery (0.52)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

arXiv.org Machine LearningSep-9-2020

Phasic Policy Gradient

Cobbe, Karl, Hilton, Jacob, Klimov, Oleg, Schulman, John

We introduce Phasic Policy Gradient (PPG), a reinforcement learning framework which modifies traditional on-policy actor-critic methods by separating policy and value function training into distinct phases. In prior methods, one must choose between using a shared network or separate networks to represent the policy and value function. Using separate networks avoids interference between objectives, while using a shared network allows useful features to be shared. PPG is able to achieve the best of both worlds by splitting optimization into two phases, one that advances training and one that distills features. PPG also enables the value function to be more aggressively optimized with a higher level of sample reuse. Compared to PPO, we find that PPG significantly improves sample efficiency on the challenging Procgen Benchmark.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Machine Learning

2009.04416

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.52)

Alvarez, Victor M. Martinez, Roşca, Rareş, Fălcuţescu, Cristian G.

DyNODE: Neural Ordinary Differential Equations for Dynamics Modeling in Continuous Control

arXiv.org Machine LearningSep-9-2020

We present a novel approach (DyNODE) that captures the underlying dynamics of a system by incorporating control in a neural ordinary differential equation framework. We conduct a systematic evaluation and comparison of our method and standard neural network architectures for dynamics modeling. Our results indicate that a simple DyNODE architecture when combined with an actor-critic reinforcement learning (RL) algorithm that uses model predictions to improve the critic's target values, outperforms canonical neural networks, both in sample efficiency and predictive performance across a diverse range of continuous tasks that are frequently used to benchmark RL algorithms. This approach provides a new avenue for the development of models that are more suited to learn the evolution of dynamical systems, particularly useful in the context of model-based reinforcement learning. To assist related work, we have made code available at https://github.com/vmartinezalvarez/DyNODE .

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Machine Learning

2009.04278

Country:

Europe > Romania > Nord-Vest Development Region > Cluj County > Cluj-Napoca (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Swazinna, Phillip, Udluft, Steffen, Runkler, Thomas

Overcoming Model Bias for Robust Offline Deep Reinforcement Learning

arXiv.org Machine LearningSep-9-2020

State-of-the-art reinforcement learning algorithms mostly rely on being allowed to directly interact with their environment to collect millions of observations. This makes it hard to transfer their success to industrial control problems, where simulations are often very costly or do not exist, and exploring in the real environment can potentially lead to catastrophic events. Recently developed, model-free, offline algorithms, can learn from a single dataset by mitigating extrapolation error in value functions. However, the robustness of the training process is still comparatively low, a problem known from methods using value functions. To improve robustness and stability of the learning process, we use dynamics models to assess policy performance instead of value functions, resulting in MOOSE (MOdel-based Offline policy Search with Ensembles), an algorithm which ensures low model bias by keeping the policy within the support of the data. We compare MOOSE with state-of-the-art model-free, offline RL algorithms BEAR and BCQ on the Industrial Benchmark and Mujoco continuous control tasks in terms of robust performance, and find that MOOSE outperforms its model-free counterparts in almost all considered cases, often even by far.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Machine Learning

2008.05533

Country:

Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > Promising Solution (0.34)

Industry:

Energy (0.46)
Education (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

#artificialintelligenceSep-8-2020, 08:16:31 GMT

How Do Companies Use ML to Stay Ahead in the Competition

Reinforcement learning– Defined as the training of machine learning models to make a sequence of decisions, by trial and error.

artificial intelligence, company use ml, reinforcement learning, (2 more...)

#artificialintelligence

Industry: Media > News (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.53)

Furelos-Blanco, Daniel, Law, Mark, Jonsson, Anders, Broda, Krysia, Russo, Alessandra

Induction and Exploitation of Subgoal Automata for Reinforcement Learning

In this paper we present ISA, an approach for learning and exploiting subgoals in episodic reinforcement learning (RL) tasks. ISA interleaves reinforcement learning with the induction of a subgoal automaton, an automaton whose edges are labeled by the task's subgoals expressed as propositional logic formulas over a set of high-level events. A subgoal automaton also consists of two special states: a state indicating the successful completion of the task, and a state indicating that the task has finished without succeeding. A state-of-the-art inductive logic programming system is used to learn a subgoal automaton that covers the traces of high-level events observed by the RL agent. When the currently exploited automaton does not correctly recognize a trace, the automaton learner induces a new automaton that covers that trace. The interleaving process guarantees the induction of automata with the minimum number of states, and applies a symmetry breaking mechanism to shrink the search space whilst remaining complete. We evaluate ISA in several grid-world and continuous state space problems using different RL algorithms that leverage the automaton structures. We provide an in-depth empirical analysis of the automaton learning process performance in terms of the traces, the symmetric breaking and specific restrictions imposed on the final learnable automaton. For each class of RL problem, we show that the learned automata can be successfully exploited to learn policies that reach the goal, achieving an average reward comparable to the case where automata are not learned but handcrafted and given beforehand.

logic & formal reasoning, machine learning, reinforcement learning, (18 more...)

2009.03855

Country:

North America > United States (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Spain (0.04)

Genre:

Research Report (0.50)
Workflow (0.45)

Industry: Leisure & Entertainment (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Upadhyaya, Pratheek S., Shah, Vijay K., Reed, Jeffrey H.

Cross-layer Band Selection and Routing Design for Diverse Band-aware DSA Networks

As several new spectrum bands are opening up for shared use, a new paradigm of \textit{Diverse Band-aware Dynamic Spectrum Access} (d-DSA) has emerged. d-DSA equips a secondary device with software defined radios (SDRs) and utilize whitespaces (or idle channels) in \textit{multiple bands}, including but not limited to TV, LTE, Citizen Broadband Radio Service (CBRS), unlicensed ISM. In this paper, we propose a decentralized, online multi-agent reinforcement learning based cross-layer BAnd selection and Routing Design (BARD) for such d-DSA networks. BARD not only harnesses whitespaces in multiple spectrum bands, but also accounts for unique electro-magnetic characteristics of those bands to maximize the desired quality of service (QoS) requirements of heterogeneous message packets; while also ensuring no harmful interference to the primary users in the utilized band. Our extensive experiments demonstrate that BARD outperforms the baseline dDSAaR algorithm in terms of message delivery ratio, however, at a relatively higher network latency, for varying number of primary and secondary users. Furthermore, BARD greatly outperforms its single-band DSA variants in terms of both the metrics in all considered scenarios.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

2009.03821

Country: North America > United States > Virginia > Montgomery County > Blacksburg (0.04)

Genre: Research Report (0.40)

Industry:

Telecommunications (0.95)
Media > Radio (0.68)
Leisure & Entertainment (0.68)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

van der Himst, Otto, Lanillos, Pablo

Deep Active Inference for Partially Observable MDPs

Deep active inference has been proposed as a scalable approach to perception and action that deals with large policy and state spaces. However, current models are limited to fully observable domains. In this paper, we describe a deep active inference model that can learn successful policies directly from high-dimensional sensory inputs. The deep learning architecture optimizes a variant of the expected free energy and encodes the continuous state representation by means of a variational autoencoder. We show, in the OpenAI benchmark, that our approach has comparable or better performance than deep Q-learning, a state-of-the-art deep reinforcement learning algorithm.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

2009.03622

Country: Europe > Netherlands > Gelderland > Nijmegen (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.49)

Evolutionary Reinforcement Learning via Cooperative Coevolutionary Negatively Correlated Search

Zhang, Hu, Yang, Peng, Yu, Yanglong, Li, Mingjia, Tang, Ke

Evolutionary algorithms (EAs) have been successfully applied to optimize the policies for Reinforcement Learning (RL) tasks due to their exploration ability. The recently proposed Negatively Correlated Search (NCS) provides a distinct parallel exploration search behavior and is expected to facilitate RL more effectively. Considering that the commonly adopted neural policies usually involves millions of parameters to be optimized, the direct application of NCS to RL may face a great challenge of the large-scale search space. To address this issue, this paper presents an NCS-friendly Cooperative Coevolution (CC) framework to scale-up NCS while largely preserving its parallel exploration search behavior. The issue of traditional CC that can deteriorate NCS is also discussed. Empirical studies on 10 popular Atari games show that the proposed method can significantly outperform three state-of-the-art deep RL methods with 50% less computational time by effectively exploring a 1.7 million-dimensional search space.

doi: 10.1016/j.swevo.2021.100974

2009.03603

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Games > Computer Games (0.53)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.60)