AITopics

2101.03735

Country:

Europe > Netherlands > North Brabant > Eindhoven (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
(2 more...)

Genre: Research Report (0.49)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.92)
Health & Medicine > Therapeutic Area > Immunology (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Zhang, Shangtong, Wan, Yi, Sutton, Richard S., Whiteson, Shimon

Average-Reward Off-Policy Policy Evaluation with Function Approximation

arXiv.org Artificial Intelligence Jan-7-2021

We consider off-policy policy evaluation with function approximation (FA) in average-reward MDPs, where the goal is to estimate both the reward rate and the differential value function. For this problem, bootstrapping is necessary and, along with off-policy learning and FA, results in the deadly triad (Sutton & Barto, 2018). To address the deadly triad, we propose two novel algorithms, reproducing the celebrated success of Gradient TD algorithms in the average-reward setting. In terms of estimating the differential value function, the algorithms are the first convergent off-policy linear function approximation algorithms. In terms of estimating the reward rate, the algorithms are the first convergent off-policy linear function approximation algorithms that do not require estimating the density ratio. We demonstrate empirically the advantage of the proposed algorithms, as well as their nonlinear variants, over a competitive density-ratio-based approach, in a simple domain as well as challenging robot simulation tasks.

algorithm, average-reward off-policy policy evaluation, function approximation, (10 more...)

2101.02808

Country:

North America > Canada > Alberta (0.14)
North America > United States > Massachusetts (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
(2 more...)

Mollenhauer, Mattes, Koltai, Péter

Nonparametric approximation of conditional expectation operators

arXiv.org Machine Learning Jan-7-2021

Given the joint distribution of two random variables $X,Y$ on some second countable locally compact Hausdorff space, we investigate the statistical approximation of the $L^2$-operator defined by $[Pf](x) := \mathbb{E}[ f(Y) \mid X = x ]$ under minimal assumptions. By modifying its domain, we prove that $P$ can be arbitrarily well approximated in operator norm by Hilbert--Schmidt operators acting on a reproducing kernel Hilbert space. This fact allows $P$ to be estimated uniformly by finite-rank operators over a dense subspace even when $P$ is not compact. In terms of modes of convergence, we thereby obtain the superiority of kernel-based techniques over classically used parametric projection approaches such as Galerkin methods. This also provides a novel perspective on which limiting object the nonparametric estimate of $P$ converges to. As an application, we show that these results are particularly important for a large family of spectral analysis techniques for Markov transition operators. Our investigation also gives a new asymptotic perspective on the so-called kernel conditional mean embedding, which is the theoretical foundation of a wide variety of techniques in kernel-based nonparametric inference.

approximation, assumption, operator, (15 more...)

2012.12917

Country:

North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Massachusetts (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)

arXiv.org Artificial Intelligence Jan-6-2021

Artificial Intelligence Methods in In-Cabin Use Cases: A Survey

Rong, Yao, Han, Chao, Hellert, Christian, Loyal, Antje, Kasneci, Enkelejda

As interest in autonomous driving increases, efforts are being made to meet requirements for the high-level automation of vehicles. In this context, the functionality inside the vehicle cabin plays a key role in ensuring a safe and pleasant journey for driver and passenger alike. At the same time, recent advances in the field of artificial intelligence (AI) have enabled a whole range of new applications and assistance systems to solve automated problems in the vehicle cabin. This paper presents a thorough survey of existing work that utilizes AI methods for use-cases inside the driving cabin, focusing, in particular, on application scenarios related to (1) driving safety and (2) driving comfort. Results from the surveyed works show that AI technology has a promising future in tackling in-cabin tasks in the context of autonomous driving.

detection, participant, vehicle, (15 more...)

2101.02082

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
North America > United States > Michigan (0.04)

Genre:

Overview (0.87)
Research Report > New Finding (0.46)

Industry:

Transportation > Ground > Road (1.00)
Information Technology > Robotics & Automation (1.00)
Health & Medicine (1.00)
(2 more...)

Technology:

Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
(5 more...)

Wang, Tianhao, Zhou, Dongruo, Gu, Quanquan

Provably Efficient Reinforcement Learning with Linear Function Approximation Under Adaptivity Constraints

arXiv.org Machine Learning Jan-6-2021

We study reinforcement learning (RL) with linear function approximation under the adaptivity constraint. We consider two popular limited adaptivity models, the batch learning model and the rare policy switch model, and propose two efficient online RL algorithms for linear Markov decision processes. Specifically, for the batch learning model, our proposed LSVI-UCB-Batch algorithm achieves an $\tilde O(\sqrt{d^3H^3T} + dHT/B)$ regret, where $d$ is the dimension of the feature mapping, $H$ is the episode length, $T$ is the number of interactions and $B$ is the number of batches. Our result suggests that it suffices to use only $\sqrt{T/dH}$ batches to obtain $\tilde O(\sqrt{d^3H^3T})$ regret. For the rare policy switch model, our proposed LSVI-UCB-RareSwitch algorithm enjoys an $\tilde O(\sqrt{d^3H^3T[1+T/(dH)]^{dH/B}})$ regret, which implies that $dH\log T$ policy switches suffice to obtain the $\tilde O(\sqrt{d^3H^3T})$ regret. Our algorithms achieve the same regret as the LSVI-UCB algorithm (Jin et al., 2019), yet with a substantially smaller amount of adaptivity.
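
The batch count quoted in this abstract can be checked with a short calculation. The following is only a sketch that ignores the constants and logarithmic factors hidden in $\tilde O$, and reads $\sqrt{T/dH}$ as $\sqrt{T/(dH)}$ (an assumption about the intended grouping): the second regret term $dHT/B$ is dominated by the first term exactly when

$$
\frac{dHT}{B} \le \sqrt{d^3H^3T}
\quad\Longleftrightarrow\quad
B \ge \frac{dHT}{d^{3/2}H^{3/2}T^{1/2}} = \sqrt{\frac{T}{dH}} .
$$

So once $B$ reaches $\sqrt{T/(dH)}$, the overall bound reduces to $\tilde O(\sqrt{d^3H^3T})$, which is the statement made above.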

algorithm, batch, rare policy switch model, (11 more...)

2101.02195

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
Asia > Middle East > Jordan (0.04)
North America > United States > Connecticut > New Haven County > New Haven (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.78)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

arXiv.org Artificial Intelligence Jan-5-2021

Meta Variationally Intrinsic Motivated Reinforcement Learning for Decentralized Traffic Signal Control

Zhu, Liwen, Peng, Peixi, Lu, Zongqing, Wang, Xiangqian, Tian, Yonghong

The goal of traffic signal control is to coordinate multiple traffic signals to improve the traffic efficiency of a district or a city. In this work, we propose a novel Meta Variationally Intrinsic Motivated (MetaVIM) RL method, and aim to learn the decentralized policies of each traffic signal conditioned only on its local observation. MetaVIM makes three novel contributions. Firstly, to make the model applicable to new unseen target scenarios, we formulate traffic signal control as a meta-learning problem over a set of related tasks. The training scenario is divided into multiple partially observable Markov decision process (POMDP) tasks, and each task corresponds to a traffic light. In each task, the neighbours are regarded as an unobserved part of the state. Secondly, we assume that the reward, transition and policy functions vary across tasks but share a common structure. A learned latent variable, conditioned on past trajectories, is proposed for each task to represent the task-specific information in these functions; it is then fed into the policy to automatically trade off exploration and exploitation and to induce the RL agent to choose reasonable actions. In addition, to make policy learning stable, four decoders are introduced to predict the received observations and rewards of the current agent with and without the neighbour agents' policies, and a novel intrinsic reward is designed to encourage the received observation and reward to be invariant to the neighbour agents. Empirically, extensive experiments conducted on CityFlow demonstrate that the proposed method substantially outperforms existing methods and shows superior generalizability.

agent, intersection, traffic signal control, (14 more...)

2101.00746

Country:

Asia > China > Guangdong Province > Shenzhen (0.05)
Asia > China > Zhejiang Province > Hangzhou (0.05)
North America > United States > New York (0.05)
(2 more...)

Genre: Research Report (0.82)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

arXiv.org Artificial Intelligence Jan-4-2021

Improving Training Result of Partially Observable Markov Decision Process by Filtering Beliefs

Hsu, Oscar LiJen

In this study I propose a belief-filtering method for improving the performance of Partially Observable Markov Decision Processes (POMDPs), a framework widely used in autonomous robots and many other domains concerning control policies. My method searches for and compares every pair of similar beliefs. Because a belief that is similar to another has an insignificant influence on the control policy, it is filtered out to reduce training time. The empirical results show that the proposed method outperforms point-based approximate POMDP methods in terms of both the quality of the training results and the efficiency of the method.
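
The filtering idea in this abstract can be illustrated with a small sketch. The abstract does not say how similarity between beliefs is measured or which belief of a similar pair is dropped, so everything below is an assumption made for illustration: beliefs are probability vectors over states, similarity is an L1 distance below a hypothetical `threshold`, and the first belief seen in a similar group is the one kept. The function names are also hypothetical.

```python
from typing import List, Sequence


def l1_distance(b1: Sequence[float], b2: Sequence[float]) -> float:
    """L1 distance between two belief vectors (assumed similarity measure)."""
    return sum(abs(p - q) for p, q in zip(b1, b2))


def filter_beliefs(beliefs: Sequence[Sequence[float]],
                   threshold: float = 0.05) -> List[List[float]]:
    """Keep a belief only if it is farther than `threshold` from every kept belief.

    This compares each candidate against all beliefs retained so far, so
    near-duplicates of an already kept belief are dropped.
    """
    kept: List[List[float]] = []
    for b in beliefs:
        if all(l1_distance(b, k) > threshold for k in kept):
            kept.append(list(b))
    return kept


# Five sampled beliefs over two states; two are near-duplicates of others.
beliefs = [[0.5, 0.5], [0.51, 0.49], [0.9, 0.1], [0.1, 0.9], [0.89, 0.11]]
filtered = filter_beliefs(beliefs, threshold=0.05)
print(filtered)
```

With these inputs, `[0.51, 0.49]` and `[0.89, 0.11]` lie within the threshold of an earlier belief and are dropped, leaving three beliefs for a point-based solver to work with.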

control policy, sample belief, vector, (15 more...)

2101.02178

Genre: Research Report > New Finding (0.57)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Thul, Lawrence, Powell, Warren

Stochastic Optimization for Vaccine and Testing Kit Allocation for the COVID-19 Pandemic

arXiv.org Artificial Intelligence Jan-4-2021

The pandemic caused by the SARS-CoV-2 virus has exposed many flaws in the decision-making strategies used to distribute resources to combat global health crises. In this paper, we leverage reinforcement learning and optimization to improve upon the allocation strategies for various resources. In particular, we consider a problem where a central controller must decide where to send testing kits to learn about the uncertain states of the world (active learning); then, use the new information to construct beliefs about the states and decide where to allocate resources. We propose a general model coupled with a tunable lookahead policy for making vaccine allocation decisions without perfect knowledge about the state of the world. The lookahead policy is compared to a population-based myopic policy which is more likely to be similar to the present strategies in practice. Each vaccine allocation policy works in conjunction with a testing kit allocation policy to perform active learning. Our simulation results demonstrate that an optimization-based lookahead decision making strategy will outperform the presented myopic policy.

belief state, equation, probability, (16 more...)

2101.01204

Country:

North America > United States > New Jersey > Mercer County > Princeton (0.04)
North America > United States > District of Columbia > Washington (0.04)
North America > United States > Wyoming (0.04)
(4 more...)

Genre: Research Report > New Finding (0.34)

Industry:

Health & Medicine > Therapeutic Area > Vaccines (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)

Tung, Tze-Yang, Pujol, Joan Roig, Kobus, Szymon, Gunduz, Deniz

A Joint Learning and Communication Framework for Multi-Agent Reinforcement Learning over Noisy Channels

arXiv.org Artificial Intelligence Jan-2-2021

We propose a novel formulation of the "effectiveness problem" in communications, put forth by Shannon and Weaver in their seminal work [2], by considering multiple agents communicating over a noisy channel in order to achieve better coordination and cooperation in a multi-agent reinforcement learning (MARL) framework. Specifically, we consider a multi-agent partially observable Markov decision process (MA-POMDP), in which the agents, in addition to interacting with the environment can also communicate with each other over a noisy communication channel. The noisy communication channel is considered explicitly as part of the dynamics of the environment and the message each agent sends is part of the action that the agent can take. As a result, the agents learn not only to collaborate with each other but also to communicate "effectively" over a noisy channel. This framework generalizes both the traditional communication problem, where the main goal is to convey a message reliably over a noisy channel, and the "learning to communicate" framework that has received recent attention in the MARL literature, where the underlying communication channels are assumed to be error-free. We show via examples that the joint policy learned using the proposed framework is superior to that where the communication is considered separately from the underlying MA-POMDP. This is a very powerful framework, which has many real world applications, from autonomous vehicle planning to drone swarm control, and opens up the rich toolbox of deep reinforcement learning for the design of multi-user communication systems. This work was supported in part by the European Research Council (ERC) Starting Grant BEACON (grant agreement no. An earlier version of this work was presented at the IEEE Global Communications Conference (GLOBECOM) in December 2020 [1]. Communication is essential for our society. 
Humans use language to communicate ideas, which has given rise to complex social structures, and scientists have observed either gestural or vocal communication in other animal groups, complexity of which increases with the complexity of the social structure of the group [3].
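
The modelling choice described in this abstract, where the channel is part of the environment dynamics and the message is part of each agent's action, can be sketched in a few lines. This is a toy illustration only: the binary symmetric channel, the two-agent setup on a line, and the names `binary_symmetric_channel` and `step` are assumptions made here, not details taken from the abstract.

```python
import random
from typing import List, Tuple

Action = Tuple[int, List[int]]  # (movement, message bits), both chosen by the agent


def binary_symmetric_channel(bits: List[int], flip_prob: float,
                             rng: random.Random) -> List[int]:
    """Flip each bit independently with probability `flip_prob`."""
    return [b ^ 1 if rng.random() < flip_prob else b for b in bits]


def step(positions: List[int], actions: List[Action], flip_prob: float,
         rng: random.Random) -> Tuple[List[int], List[List[int]]]:
    """One environment step for two agents.

    The movement part of each action updates the agent's position; the
    message part is passed through the noisy channel and delivered to the
    other agent as part of its next observation.
    """
    new_positions = [pos + move for pos, (move, _) in zip(positions, actions)]
    received = [
        binary_symmetric_channel(actions[1 - i][1], flip_prob, rng)
        for i in range(2)
    ]
    return new_positions, received


rng = random.Random(0)
actions = [(1, [1, 0, 1]), (-1, [0, 0, 1])]
positions, received = step([0, 5], actions, flip_prob=0.0, rng=rng)
print(positions, received)
```

Because the channel noise sits inside `step`, a learning agent only ever sees the corrupted messages, so any policy trained against this environment has to choose both movements and message bits with the noise in mind.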

agent, communication, communication channel, (15 more...)

2101.10369

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Illinois > Champaign County > Urbana (0.04)
(2 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

arXiv.org Machine Learning Jan-2-2021

A Provably Efficient Algorithm for Linear Markov Decision Process with Low Switching Cost

Gao, Minbo, Xie, Tianle, Du, Simon S., Yang, Lin F.

Many real-world applications, such as those in medical domains, recommendation systems, etc., can be formulated as large state space reinforcement learning problems with only a small budget for the number of policy changes, i.e., low switching cost. This paper focuses on the linear Markov Decision Process (MDP) recently studied in [Yang et al 2019, Jin et al 2020] where linear function approximation is used for generalization on the large state space. We present the first algorithm for linear MDP with a low switching cost. Our algorithm achieves an $\widetilde{O}\left(\sqrt{d^3H^4K}\right)$ regret bound with a near-optimal $O\left(d H\log K\right)$ global switching cost where $d$ is the feature dimension, $H$ is the planning horizon and $K$ is the number of episodes the agent plays. Our regret bound matches the best existing polynomial algorithm by [Jin et al 2020] and our switching cost is exponentially smaller than theirs. When specialized to tabular MDP, our switching cost bound improves those in [Bai et al 2019, Zhang et al 2020]. We complement our positive result with an $\Omega\left(dH/\log d\right)$ global switching cost lower bound for any no-regret algorithm.

algorithm, global switching cost, switching cost, (12 more...)

2101.00494

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Asia > Middle East > Jordan (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
(2 more...)

Genre:

Research Report (0.64)
Workflow (0.46)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.70)