AITopics

doi: 10.1613/jair.1.12105

AI Access Foundation

12105

Country:

North America > United States (0.28)
North America > Canada > Alberta (0.14)
North America > Canada > Ontario > Toronto (0.14)

Genre:

Workflow (0.46)
Overview (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)

Derman, Esther, Dalal, Gal, Mannor, Shie

Acting in Delayed Environments with Non-Stationary Markov Policies

arXiv.org Artificial Intelligence, Jan-28-2021

The standard Markov Decision Process (MDP) formulation hinges on the assumption that an action is executed immediately after it is chosen. However, this assumption is often unrealistic and can lead to catastrophic failures in applications such as robotic manipulation, cloud computing, and finance. We introduce a framework for learning and planning in MDPs where the decision-maker commits actions that are executed with a delay of $m$ steps. The brute-force state-augmentation baseline, in which the state is concatenated with the last $m$ committed actions, suffers from exponential complexity in $m$, as we show for policy iteration. We then prove that with execution delay, Markov policies in the original state-space are sufficient for attaining maximal reward, but need to be non-stationary. As for stationary Markov policies, we show they are sub-optimal in general. Consequently, we devise a non-stationary Q-learning style model-based algorithm that solves delayed execution tasks without resorting to state-augmentation. Experiments on tabular, physical, and Atari domains reveal that it converges quickly to high performance even for substantial delays, while standard approaches that either ignore the delay or rely on state-augmentation struggle or fail due to divergence. The code is available at https://github.com/galdl/rl_delay_basic.git.
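To make the delayed-execution setting concrete, here is a minimal Python sketch. It is not taken from the authors' repository: the class `DelayedExecution`, the `step_fn` signature, and the `default_action` used to pre-fill the queue are all assumptions made for illustration. It shows an action queue that executes each committed action $m$ steps later, and the size of the augmented state space that the brute-force baseline would have to cover.

```python
from collections import deque


class DelayedExecution:
    """Wrap a step function so each chosen action takes effect m steps later.

    step_fn is a hypothetical callable: (state, action) -> (next_state, reward).
    """

    def __init__(self, step_fn, initial_state, m, default_action):
        self.step_fn = step_fn
        self.state = initial_state
        # Committed-but-not-yet-executed actions, pre-filled with a default.
        self.pending = deque([default_action] * m)

    def step(self, action):
        self.pending.append(action)        # commit the new action
        executed = self.pending.popleft()  # execute the one committed m steps ago
        self.state, reward = self.step_fn(self.state, executed)
        return self.state, reward


def augmented_state_count(num_states, num_actions, m):
    """Number of (state, last m committed actions) tuples: |S| * |A|**m."""
    return num_states * num_actions ** m
```

With 10 states and 4 actions, a delay of 3 already gives `augmented_state_count(10, 4, 3) == 640` augmented states, which illustrates the exponential growth in $m$ mentioned in the abstract.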

conference paper, iclr 2021, ps 0

2101.11992

Country: North America > United States > Massachusetts > Middlesex County > Belmont (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)

Goh, Wei Zhong, Ursekar, Varun, Howard, Marc W.

Predicting the future with a scale-invariant temporal memory for the past

arXiv.org Artificial Intelligence, Jan-26-2021

In recent years it has become clear that the brain maintains a temporal memory of recent events stretching far into the past. This paper presents a neurally-inspired algorithm to use a scale-invariant temporal representation of the past to predict a scale-invariant future. The result is a scale-invariant estimate of future events as a function of the time at which they are expected to occur. The algorithm is time-local, with credit assigned to the present event by observing how it affects the prediction of the future. To illustrate the potential utility of this approach, we test the model on simultaneous renewal processes with different time scales. The algorithm scales well on these problems despite the fact that the number of states needed to describe them as a Markov process grows exponentially.

algorithm, event type, prediction

2101.10953

Genre: Research Report (0.50)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

arXiv.org Artificial Intelligence, Jan-23-2021

Indoor Group Activity Recognition using Multi-Layered HMMs

Elangovan, Vinayak

Discovery and recognition of Group Activities (GA) based on imagery data processing have significant applications in persistent surveillance systems, which play an important role in some Internet services. The process involves the analysis of sequential imagery data with spatiotemporal associations. Discretion of video imagery requires a proper inference system capable of discriminating and differentiating cohesive observations and interlinking them to known ontologies. We propose an ontology-based Group Activity Recognition (GAR) approach with an inference model that is capable of identifying and classifying a sequence of events in group activities. A multi-layered Hidden Markov Model (HMM) is proposed to recognize different levels of abstract GA. The multi-layered HMM consists of N layers of HMMs, where each layer comprises M HMMs running in parallel. The number of layers depends on the order of information to be extracted. At each layer, by matching and correlating attributes of detected group events, the model attempts to associate sensory observations with known ontology perceptions. This paper demonstrates and compares the performance of three different implementations of HMM, namely, concatenated N-HMM, cascaded C-HMM and hybrid H-HMM, for building an effective multi-layered HMM.

indoor group activity recognition, input sequence, sequence

2101.10857

Country:

North America > United States > Tennessee (0.04)
North America > United States > Maryland > Baltimore (0.04)
North America > United States > Florida > Orange County > Orlando (0.04)

Genre: Research Report (0.50)

Industry: Information Technology (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (0.80)

Giacobbe, Mirco, Hasanbeig, Mohammadhosein, Kroening, Daniel, Wijk, Hjalmar

Shielding Atari Games with Bounded Prescience

arXiv.org Artificial Intelligence, Jan-22-2021

Deep reinforcement learning (DRL) is applied in safety-critical domains such as robotics and autonomous driving. It achieves superhuman abilities in many tasks; however, whether DRL agents can be shown to act safely is an open problem. Atari games are a simple yet challenging exemplar for evaluating the safety of DRL agents and feature a diverse portfolio of game mechanics. The safety of neural agents has been studied before using methods that either require a model of the system dynamics or an abstraction; unfortunately, these are unsuitable for Atari games because their low-level dynamics are complex and hidden inside their emulator. We present the first exact method for analysing and ensuring the safety of DRL agents for Atari games. Our method only requires access to the emulator. First, we give a set of 43 properties that characterise "safe behaviour" for 30 games. Second, we develop a method for exploring all traces induced by an agent and a game and consider a variety of sources of game non-determinism. We observe that the best available DRL agents reliably satisfy only very few properties; several critical properties are violated by all agents. Finally, we propose a countermeasure that combines a bounded explicit-state exploration with shielding. We demonstrate that our method improves the safety of all agents over multiple properties.
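The combination of bounded exploration and shielding can be illustrated with a small Python sketch. This is not the authors' implementation: the functions `simulate` and `is_unsafe`, the deterministic dynamics, and the rule "allow an action if some continuation stays safe for the whole horizon" are all assumptions chosen for illustration. The shield walks through the agent's actions in preference order and returns the first one that passes the bounded look-ahead.

```python
def can_stay_safe(state, depth, actions, simulate, is_unsafe):
    """True if some action sequence of length `depth` from `state` avoids unsafe states."""
    if is_unsafe(state):
        return False
    if depth == 0:
        return True
    return any(
        can_stay_safe(simulate(state, a), depth - 1, actions, simulate, is_unsafe)
        for a in actions
    )


def shielded_action(state, ranked_actions, horizon, simulate, is_unsafe):
    """Pick the agent's most preferred action that passes the bounded look-ahead.

    ranked_actions: actions sorted by the agent's preference (best first).
    horizon: look-ahead depth in steps, assumed to be at least 1.
    """
    for action in ranked_actions:
        next_state = simulate(state, action)
        if can_stay_safe(next_state, horizon - 1, ranked_actions, simulate, is_unsafe):
            return action
    # No action is provably safe within the horizon: fall back to the agent's choice.
    return ranked_actions[0]
```

The exhaustive search costs up to `len(ranked_actions) ** horizon` simulator calls, which is why the look-ahead is bounded in this sketch.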

agent, algorithm, safety property

2101.08153

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Europe > Hungary (0.04)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.94)

arXiv.org Artificial Intelligence, Jan-21-2021

Robust Reinforcement Learning on State Observations with Learned Optimal Adversary

Zhang, Huan, Chen, Hongge, Boning, Duane, Hsieh, Cho-Jui

We study the robustness of reinforcement learning (RL) with adversarially perturbed state observations, which aligns with the setting of many adversarial attacks on deep reinforcement learning (DRL) and is also important for rolling out real-world RL agents under unpredictable sensing noise. With a fixed agent policy, we demonstrate that an optimal adversary to perturb state observations can be found, which is guaranteed to obtain the worst-case agent reward. For DRL settings, this leads to a novel empirical adversarial attack on RL agents via a learned adversary that is much stronger than previous ones. To enhance the robustness of an agent, we propose a framework of alternating training with learned adversaries (ATLA), which trains an adversary online together with the agent using policy gradient following the optimal adversarial attack framework. Additionally, inspired by the analysis of the state-adversarial Markov decision process (SA-MDP), we show that past states and actions (history) can be useful for learning a robust agent, and we empirically find that an LSTM-based policy can be more robust under adversaries. Empirical evaluations on a few continuous control environments show that ATLA achieves state-of-the-art performance under strong adversaries. Our code is available at https://github.com/huanzhang12/ATLA_robust_RL.

adversary, agent, atla-ppo

2101.08452

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.82)

Industry:

Information Technology > Security & Privacy (0.90)
Government > Military (0.90)
Leisure & Entertainment > Games (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Cao, Yongcan, Zhan, Huixin

Efficient Multi-objective Reinforcement Learning via Multiple-gradient Descent with Iteratively Discovered Weight-Vector Sets

Journal of Artificial Intelligence Research, Jan-20-2021

Solving multi-objective optimization problems is important in various applications where users are interested in obtaining optimal policies subject to multiple (yet often conflicting) objectives. A typical approach to obtaining the optimal policies is to first construct a loss function based on the scalarization of individual objectives and then derive optimal policies that minimize the scalarized loss function. Albeit simple and efficient, the typical approach provides no insight into, or mechanism for, the optimization of multiple objectives, because it cannot quantify the inter-objective relationship. To address the issue, we propose a new, efficient gradient-based multi-objective reinforcement learning approach that seeks to iteratively uncover the quantitative inter-objective relationship by finding a minimum-norm point in the convex hull of the set of multiple policy gradients when the impact of one objective on others is unknown a priori. In particular, we first propose a new PAOLS algorithm that integrates pruning and approximate optimistic linear support algorithm to efficiently discover the weight-vector sets of multiple gradients that quantify the inter-objective relationship. Then we construct an actor and a multi-objective critic that can co-learn the policy and the multi-objective vector value function. Finally, the weight discovery process and the policy and vector value function learning process can be iteratively executed to yield stable weight-vector sets and policies. To validate the effectiveness of the proposed approach, we present a quantitative evaluation of the approach based on three case studies.
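The minimum-norm point in the convex hull of gradients has a closed form when there are only two gradients. The Python sketch below covers that special case only; it is not the PAOLS algorithm from the paper, and the function name `min_norm_two` is an assumption made for illustration. It minimizes the squared norm of `alpha * g1 + (1 - alpha) * g2` over `alpha` in [0, 1], which gives `alpha = (g2 - g1) . g2 / ||g1 - g2||^2` clipped to the unit interval.

```python
def min_norm_two(g1, g2):
    """Minimum-norm point on the segment between two gradient vectors.

    Returns (alpha, point) with point = alpha * g1 + (1 - alpha) * g2.
    """
    diff = [a - b for a, b in zip(g1, g2)]
    denom = sum(d * d for d in diff)
    if denom == 0.0:
        alpha = 0.5  # identical gradients: every weight gives the same point
    else:
        # Unconstrained minimizer of ||g2 + alpha * (g1 - g2)||^2, then clip to [0, 1].
        alpha = sum((b - a) * b for a, b in zip(g1, g2)) / denom
        alpha = min(1.0, max(0.0, alpha))
    point = [alpha * a + (1.0 - alpha) * b for a, b in zip(g1, g2)]
    return alpha, point
```

For two orthogonal unit gradients the weight is 0.5, so both objectives contribute equally to the update direction; when one gradient is a scaled copy of the other, the weight clips to the shorter one.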

algorithm, objective, optimization

doi: 10.1613/jair.1.12270

AI Access Foundation

12270

Country:

North America > United States > Texas > Bexar County > San Antonio (0.14)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report (0.67)
Overview (0.46)

Industry:

Leisure & Entertainment > Games (0.67)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Žnidarič, Luka, Nusev, Gjorgji, Morel, Bertrand, Mougin, Julie, Juričić, Đani, Boškoski, Pavle

Evaluating uncertainties in electrochemical impedance spectra of solid oxide fuel cells

arXiv.org Machine Learning, Jan-20-2021

Electrochemical impedance spectroscopy is a widely used tool for the characterization of fuel cells and electrochemical conversion systems in general. When it is applied to on-line monitoring in in-field applications, disturbances, drifts and sensor noise may cause severe distortions in the evaluated spectra, especially in the low-frequency part. Failure to account for these random effects can lead to difficulties in interpreting the spectra and to misleading diagnostic reasoning. In the literature, this fact has been largely ignored. In this paper, we propose a computationally efficient approach to quantifying the spectral uncertainty by quantifying the uncertainty of the equivalent circuit model (ECM) parameters by means of the Variational Bayes (VB) approach. To assess the quality of the VB posterior estimates, we compare the results of the VB approach with those obtained with the Markov Chain Monte Carlo (MCMC) algorithm. The MCMC algorithm is expected to return accurate posterior distributions, while the VB approach provides approximate distributions. Using simulated and real data, we show that the VB approach generates approximations which, although slightly over-optimistic, are still close to the more realistic MCMC estimates. A great advantage of the VB method for online monitoring is its low computational load, which is several orders of magnitude lighter than that of MCMC. The performance of the VB algorithm is demonstrated on a case of ECM parameter estimation in a 6-cell solid-oxide fuel cell stack. The complete numerical implementation for recreating the results can be found at https://repo.ijs.si/lznidaric/variational-bayes-supplementary-material.

lognormal, posterior distribution, variational distribution

2101.08049

Country:

Europe > Slovenia > Central Slovenia > Municipality of Ljubljana > Ljubljana (0.04)
North America > United States > New York (0.04)
Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)

Genre: Research Report (1.00)

Industry:

Energy > Renewable > Hydrogen (0.82)
Energy > Energy Storage (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Wang, H. J. Austin, Narasimhan, Karthik

Grounding Language to Entities and Dynamics for Generalization in Reinforcement Learning

arXiv.org Artificial Intelligence, Jan-18-2021

In this paper, we consider the problem of leveraging textual descriptions to improve generalization of control policies to new scenarios. Unlike prior work in this space, we do not assume access to any form of prior knowledge connecting text and state observations, and learn both symbol grounding and control policy simultaneously. This is challenging due to a lack of concrete supervision, and incorrect groundings can result in worse performance than policies that do not use the text at all. We develop a new model, EMMA (Entity Mapper with Multi-modal Attention) which uses a multi-modal entity-conditioned attention module that allows for selective focus over relevant sentences in the manual for each entity in the environment. EMMA is end-to-end differentiable and can learn a latent grounding of entities and dynamics from text to observations using environment rewards as the only source of supervision. To empirically test our model, we design a new framework of 1320 games and collect text manuals with free-form natural language via crowd-sourcing. We demonstrate that EMMA achieves successful zero-shot generalization to unseen games with new dynamics, obtaining significantly higher rewards compared to multiple baselines. The grounding acquired by EMMA is also robust to noisy descriptions and linguistic variation.

agent, entity and dynamic, grounding language

2101.07393

Country: North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)

Genre: Research Report (0.40)

Industry:

Education (0.68)
Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)

arXiv.org Machine Learning, Jan-17-2021

Discrete Graph Structure Learning for Forecasting Multiple Time Series

Shang, Chao, Chen, Jie, Bi, Jinbo

Time series forecasting is an extensively studied subject in statistics, economics, and computer science. Exploration of the correlation and causation among the variables in a multivariate time series shows promise in enhancing the performance of a time series model. When using deep neural networks as forecasting models, we hypothesize that exploiting the pairwise information among multiple (multivariate) time series also improves their forecast. If an explicit graph structure is known, graph neural networks (GNNs) have been demonstrated as powerful tools to exploit the structure. In this work, we propose learning the structure simultaneously with the GNN if the graph is unknown. We cast the problem as learning a probabilistic graph model through optimizing the mean performance over the graph distribution. The distribution is parameterized by a neural network so that discrete graphs can be sampled differentiably through reparameterization. Empirical evaluations show that our method is simpler, more efficient, and better performing than a recently proposed bilevel learning approach for graph structure learning, as well as a broad array of forecasting models, either deep or non-deep learning based, and graph or non-graph based.
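Sampling a discrete graph through reparameterization can be sketched in Python as follows. The abstract does not name the relaxation used, so this sketch assumes the binary Concrete (Gumbel-softmax) relaxation, a common choice for this purpose; the function names and the `edge_logits` layout are hypothetical. Each edge probability is perturbed with logistic noise and passed through a tempered sigmoid, so the sample is a smooth function of the logits. The plain-Python version shows only the sampling step; computing gradients would require an autodiff library.

```python
import math
import random


def stable_sigmoid(x):
    """Logistic function that avoids overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sample_relaxed_adjacency(edge_logits, temperature=0.5, rng=random):
    """Draw a relaxed sample of a binary adjacency matrix.

    edge_logits: nested list of log-odds, one entry per candidate edge.
    Lower temperatures push the entries closer to exactly 0 or 1.
    """
    eps = 1e-12
    sample = []
    for row in edge_logits:
        out_row = []
        for logit in row:
            # Logistic noise is the difference of two Gumbel(0, 1) draws.
            u = min(max(rng.random(), eps), 1.0 - eps)
            noise = math.log(u) - math.log(1.0 - u)
            out_row.append(stable_sigmoid((logit + noise) / temperature))
        sample.append(out_row)
    return sample
```

Thresholding the relaxed entries at 0.5 yields a hard 0/1 adjacency matrix when a strictly discrete graph is needed downstream.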

graph, graph structure, time series

2101.06861

Country:

North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.05)
North America > United States > Connecticut (0.04)
North America > United States > California (0.04)

Genre: Research Report (0.40)

Industry:

Energy (0.68)
Government > Regional Government > North America Government > United States Government (0.47)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)