AITopics

2006.12484

Country: North America > United States > New York (0.04)

Genre: Research Report > New Finding (0.66)

Industry: Health & Medicine (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Risk-Sensitive Reinforcement Learning: Near-Optimal Risk-Sample Tradeoff in Regret

Fei, Yingjie, Yang, Zhuoran, Chen, Yudong, Wang, Zhaoran, Xie, Qiaomin

We study risk-sensitive reinforcement learning in episodic Markov decision processes with unknown transition kernels, where the goal is to optimize the total reward under the risk measure of exponential utility. We propose two provably efficient model-free algorithms, Risk-Sensitive Value Iteration (RSVI) and Risk-Sensitive Q-learning (RSQ). These algorithms implement a form of risk-sensitive optimism in the face of uncertainty, which adapts to both risk-seeking and risk-averse modes of exploration. We prove that RSVI attains an $\tilde{O}\big(\lambda(|\beta| H^2) \cdot \sqrt{H^{3} S^{2}AT} \big)$ regret, while RSQ attains an $\tilde{O}\big(\lambda(|\beta| H^2) \cdot \sqrt{H^{4} SAT} \big)$ regret, where $\lambda(u) = (e^{3u}-1)/u$ for $u>0$. In the above, $\beta$ is the risk parameter of the exponential utility function, $S$ the number of states, $A$ the number of actions, $T$ the total number of timesteps, and $H$ the episode length. On the flip side, we establish a regret lower bound showing that the exponential dependence on $|\beta|$ and $H$ is unavoidable for any algorithm with an $\tilde{O}(\sqrt{T})$ regret (even when the risk objective is on the same scale as the original reward), thus certifying the near-optimality of the proposed algorithms. Our results demonstrate that incorporating risk awareness into reinforcement learning necessitates an exponential cost in $|\beta|$ and $H$, which quantifies the fundamental tradeoff between risk sensitivity (related to aleatoric uncertainty) and sample efficiency (related to epistemic uncertainty). To the best of our knowledge, this is the first regret analysis of risk-sensitive reinforcement learning with the exponential utility.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

2006.13827

Country:

Asia > Middle East > Jordan (0.04)
Europe > Netherlands > South Holland > Delft (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Ghosh, Soumya, Stephenson, William T., Nguyen, Tin D., Deshpande, Sameer K., Broderick, Tamara

Approximate Cross-Validation for Structured Models

Many modern data analyses benefit from explicitly modeling dependence structure in data - such as measurements across time or space, ordered words in a sentence, or genes in a genome. A gold standard evaluation technique is structured cross-validation (CV), which leaves out some data subset (such as data within a time interval or data in a geographic region) in each fold. But CV here can be prohibitively slow due to the need to rerun already-expensive learning algorithms many times. Previous work has shown approximate cross-validation (ACV) methods provide a fast and provably accurate alternative in the setting of empirical risk minimization. But this existing ACV work is restricted to simpler models by the assumptions that (i) data across CV folds are independent and (ii) an exact initial model fit is available. In structured data analyses, both these assumptions are often untrue. In the present work, we address (i) by extending ACV to CV schemes with dependence structure between the folds. To address (ii), we verify - both theoretically and empirically - that ACV quality deteriorates smoothly with noise in the initial fit. We demonstrate the accuracy and computational benefits of our proposed methods on a diverse set of real-world applications.

approximation, artificial intelligence, machine learning, (19 more...)

2006.12669

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > California > Los Angeles County > Los Angeles (0.04)
(4 more...)

Genre: Research Report (0.82)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
(3 more...)

Wang, Lingxiao, Yang, Zhuoran, Wang, Zhaoran

Provably Efficient Causal Reinforcement Learning with Confounded Observational Data

Empowered by expressive function approximators such as neural networks, deep reinforcement learning (DRL) achieves tremendous empirical successes. However, learning expressive function approximators requires collecting a large dataset (interventional data) by interacting with the environment. Such a lack of sample efficiency prohibits the application of DRL to critical scenarios, e.g., autonomous driving and personalized medicine, since trial and error in the online setting is often unsafe and even unethical. In this paper, we study how to incorporate the dataset (observational data) collected offline, which is often abundantly available in practice, to improve the sample efficiency in the online setting. To incorporate the possibly confounded observational data, we propose the deconfounded optimistic value iteration (DOVI) algorithm, which incorporates the confounded observational data in a provably efficient manner. More specifically, DOVI explicitly adjusts for the confounding bias in the observational data, where the confounders are partially observed or unobserved. In both cases, such adjustments allow us to construct the bonus based on a notion of information gain, which takes into account the amount of information acquired from the offline setting. In particular, we prove that the regret of DOVI is smaller than the optimal regret achievable in the pure online setting by a multiplicative factor, which decreases towards zero when the confounded observational data are more informative upon the adjustments. Our algorithm and analysis serve as a step towards causal reinforcement learning.

artificial intelligence, machine learning, reinforcement learning, (20 more...)

2006.12311

Country:

Asia > Middle East > Jordan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.40)

Industry: Health & Medicine (0.88)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Hälvä, Hermanni, Hyvärinen, Aapo

Hidden Markov Nonlinear ICA: Unsupervised Learning from Nonstationary Time Series

Recent advances in nonlinear Independent Component Analysis (ICA) provide a principled framework for unsupervised feature learning and disentanglement. The central idea in such works is that the latent components are assumed to be independent conditional on some observed auxiliary variables, such as the time-segment index. This requires manual segmentation of data into non-stationary segments which is computationally expensive, inaccurate and often impossible. These models are thus not fully unsupervised. We remedy these limitations by combining nonlinear ICA with a Hidden Markov Model, resulting in a model where a latent state acts in place of the observed segment-index. We prove identifiability of the proposed model for a general mixing nonlinearity, such as a neural network. We also show how maximum likelihood estimation of the model can be done using the expectation-maximization algorithm. Thus, we achieve a new nonlinear ICA framework which is unsupervised, more efficient, as well as able to model underlying temporal dynamics.

artificial intelligence, latent state, machine learning, (16 more...)

2006.12107

Country:

Asia > Japan > Honshū > Tōhoku > Iwate Prefecture > Morioka (0.05)
Europe > Finland > Uusimaa > Helsinki (0.04)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Cuervo, Santiago, Alzate, Marco

Emergent cooperation through mutual information maximization

arXiv.org Artificial IntelligenceJun-21-2020

With artificial intelligence systems becoming ubiquitous in our society, its designers will soon have to start to consider its social dimension, as many of these systems will have to interact among them to work efficiently. With this in mind, we propose a decentralized deep reinforcement learning algorithm for the design of cooperative multi-agent systems. The algorithm is based on the hypothesis that highly correlated actions are a feature of cooperative systems, and hence, we propose the insertion of an auxiliary objective of maximization of the mutual information between the actions of agents in the learning problem. Our system is applied to a social dilemma, a problem whose optimal solution requires that agents cooperate to maximize a macroscopic performance function despite the divergent individual objectives of each agent. By comparing the performance of the proposed system to a system without the auxiliary objective, we conclude that the maximization of mutual information among agents promotes the emergence of cooperation in social dilemmas.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

2006.11769

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Oceania > Australia (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
(4 more...)

Genre: Research Report (0.65)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Pacchiano, Aldo, Ball, Philip, Parker-Holder, Jack, Choromanski, Krzysztof, Roberts, Stephen

On Optimism in Model-Based Reinforcement Learning

arXiv.org Machine LearningJun-21-2020

The principle of optimism in the face of uncertainty is prevalent throughout sequential decision making problems such as multi-armed bandits and reinforcement learning (RL), often coming with strong theoretical guarantees. However, it remains a challenge to scale these approaches to the deep RL paradigm, which has achieved a great deal of attention in recent years. In this paper, we introduce a tractable approach to optimism via noise augmented Markov Decision Processes (MDPs), which we show can obtain a competitive regret bound: $\tilde{\mathcal{O}}( |\mathcal{S}|H\sqrt{|\mathcal{S}||\mathcal{A}| T } )$ when augmenting using Gaussian noise, where $T$ is the total number of environment steps. This tractability allows us to apply our approach to the deep RL setting, where we rigorously evaluate the key factors for success of optimistic model-based RL algorithms, bridging the gap between theory and practice.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

2006.11911

Country:

North America > United States > California > Los Angeles County > Long Beach (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)
(12 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Krishnamurthy, Vikram, Angley, Daniel, Evans, Robin, Moran, William

Identifying Cognitive Radars -- Inverse Reinforcement Learning using Revealed Preferences

arXiv.org Artificial IntelligenceJun-20-2020

We consider an inverse reinforcement learning problem involving us versus an enemy radar equipped with a Bayesian tracker. By observing the emissions of the enemy radar,how can we identify if the radar is cognitive (constrained utility maximizer)? Given the observed sequence of actions taken by the enemy's radar, we consider three problems: (i) Are the enemy radar's actions (waveform choice, beam scheduling) consistent with constrained utility maximization? If so how can we estimate the cognitive radar's utility function that is consistent with its actions. We formulate and solve the problem in terms of the spectra (eigenvalues) of the state and observation noise covariance matrices, and the algebraic Riccati equation. (ii) How to construct a statistical test for detecting a cognitive radar (constrained utility maximization) when we observe the radar's actions in noise or the radar observes our probe signal in noise? We propose a statistical detector with a tight Type-II error bound. (iii) How can we optimally probe (interrogate) the enemy's radar by choosing our state to minimize the Type-II error of detecting if the radar is deploying an economic rational strategy, subject to a constraint on the Type-I detection error? We present a stochastic optimization algorithm to optimize our probe signal. The main analysis framework used in this paper is that of revealed preferences from microeconomics.

machine learning, radar, reinforcement learning, (20 more...)

doi: 10.1109/TSP.2020.3013516

1912.00331

Country:

North America > United States > New York (0.04)
North America > United States > New Jersey (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.64)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Cognitive Architectures (0.85)
(3 more...)

Krishnamurthy, Vikram, Yin, George

Langevin Dynamics for Inverse Reinforcement Learning of Stochastic Gradient Algorithms

arXiv.org Machine LearningJun-20-2020

Inverse reinforcement learning (IRL) aims to estimate the reward function of optimizing agents by observing their response (estimates or actions). This paper considers IRL when noisy estimates of the gradient of a reward function generated by multiple stochastic gradient agents are observed. We present a generalized Langevin dynamics algorithm to estimate the reward function $R(\theta)$; specifically, the resulting Langevin algorithm asymptotically generates samples from the distribution proportional to $\exp(R(\theta))$. The proposed IRL algorithms use kernel-based passive learning schemes. We also construct multi-kernel passive Langevin algorithms for IRL which are suitable for high dimensional data. The performance of the proposed IRL algorithms are illustrated on examples in adaptive Bayesian learning, logistic regression (high dimensional problem) and constrained Markov decision processes. We prove weak convergence of the proposed IRL algorithms using martingale averaging methods. We also analyze the tracking performance of the IRL algorithms in non-stationary environments where the utility function $R(\theta)$ jump changes over time as a slow Markov chain.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

2006.11674

Country:

North America > United States > New York (0.04)
North America > United States > Michigan > Wayne County > Detroit (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(2 more...)

Genre: Research Report (0.90)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Xie, Annie, Harrison, James, Finn, Chelsea

Deep Reinforcement Learning amidst Lifelong Non-Stationarity

arXiv.org Artificial IntelligenceJun-18-2020

As humans, our goals and our environment are persistently changing throughout our lifetime based on our experiences, actions, and internal and external drives. In contrast, typical reinforcement learning problem set-ups consider decision processes that are stationary across episodes. Can we develop reinforcement learning algorithms that can cope with the persistent change in the former, more realistic problem settings? While on-policy algorithms such as policy gradients in principle can be extended to non-stationary settings, the same cannot be said for more efficient off-policy algorithms that replay past experiences when learning. In this work, we formalize this problem setting, and draw upon ideas from the online learning and probabilistic inference literature to derive an off-policy RL algorithm that can reason about and tackle such lifelong non-stationarity. Our method leverages latent variable models to learn a representation of the environment from current and past experiences, and performs off-policy RL with this representation. We further introduce several simulation environments that exhibit lifelong non-stationarity, and empirically find that our approach substantially outperforms approaches that do not reason about environment shift.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

2006.10701

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > Massachusetts (0.04)
North America > United States > California > Santa Clara County > Stanford (0.04)

Genre: Research Report (0.82)

Industry: Education > Educational Setting (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)