AITopics

2112.09477

Country:

North America > Canada > Ontario > Toronto (0.14)
South America > Chile (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Government (0.67)
Leisure & Entertainment > Games (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

Deglurkar, Sampada, Lim, Michael H., Tucker, Johnathan, Sunberg, Zachary N., Faust, Aleksandra, Tomlin, Claire J.

Visual Learning-based Planning for Continuous High-Dimensional POMDPs

arXiv.org Artificial IntelligenceDec-17-2021

The Partially Observable Markov Decision Process (POMDP) is a powerful framework for capturing decision-making problems that involve state and transition uncertainty. However, most current POMDP planners cannot effectively handle very high-dimensional observations they often encounter in the real world (e.g. image observations in robotic domains). In this work, we propose Visual Tree Search (VTS), a learning and planning procedure that combines generative models learned offline with online model-based POMDP planning. VTS bridges offline model training and online planning by utilizing a set of deep generative observation models to predict and evaluate the likelihood of image observations in a Monte Carlo tree search planner. We show that VTS is robust to different observation noises and, since it utilizes online, model-based planning, can adapt to different reward structures without the need to re-train. This new approach outperforms a baseline state-of-the-art on-policy planning algorithm while using significantly less offline training time.

dualsmc, particle, pomdp, (16 more...)

2112.09456

Country:

North America > United States > Washington > King County > Seattle (0.04)
North America > United States > Massachusetts (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Europe > Netherlands > South Holland > Delft (0.04)

Genre: Research Report (1.00)

Industry: Transportation (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Lejarza, Fernando, Calvert, Jacob, Attwood, Misty M, Evans, Daniel, Mao, Qingqing

Optimal discharge of patients from intensive care via a data-driven policy learning framework

arXiv.org Artificial IntelligenceDec-16-2021

Clinical decision support tools rooted in machine learning and optimization can provide significant value to healthcare providers, including through better management of intensive care units. In particular, it is important that the patient discharge task addresses the nuanced trade-off between decreasing a patient's length of stay (and associated hospitalization costs) and the risk of readmission or even death following the discharge decision. This work introduces an end-to-end general framework for capturing this trade-off to recommend optimal discharge timing decisions given a patient's electronic health records. A data-driven approach is used to derive a parsimonious, discrete state space representation that captures a patient's physiological condition. Based on this model and a given cost function, an infinite-horizon discounted Markov decision process is formulated and solved numerically to compute an optimal discharge policy, whose value is assessed using off-policy evaluation strategies. Extensive numerical experiments are performed to validate the proposed framework using real-life intensive care unit patient data.

discharge, health state, readmission, (15 more...)

2112.09315

Country:

North America > United States > Texas > Travis County > Austin (0.14)
North America > United States > Texas > Harris County > Houston (0.14)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
(4 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Health Care Technology > Medical Record (1.00)
Health & Medicine > Health Care Providers & Services (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)
(2 more...)

Peng, Xiangyu, Riedl, Mark O., Ammanabrolu, Prithviraj

Inherently Explainable Reinforcement Learning in Natural Language

arXiv.org Artificial IntelligenceDec-16-2021

We focus on the task of creating a reinforcement learning agent that is inherently explainable -- with the ability to produce immediate local explanations by thinking out loud while performing a task and analyzing entire trajectories post-hoc to produce causal explanations. This Hierarchically Explainable Reinforcement Learning agent (HEX-RL), operates in Interactive Fictions, text-based game environments in which an agent perceives and acts upon the world using textual natural language. These games are usually structured as puzzles or quests with long-term dependencies in which an agent must complete a sequence of actions to succeed -- providing ideal environments in which to test an agent's ability to explain its actions. Our agent is designed to treat explainability as a first-class citizen, using an extracted symbolic knowledge graph-based state representation coupled with a Hierarchical Graph Attention mechanism that points to the facts in the internal graph representation that most influenced the choice of actions. Experiments show that this agent provides significantly improved explanations over strong baselines, as rated by human participants generally unfamiliar with the environment, while also matching state-of-the-art task performance.

agent, causal explanation, explanation, (15 more...)

2112.08907

Country: Oceania > Australia > Victoria > Melbourne (0.04)

Genre: Research Report (0.85)

Industry: Leisure & Entertainment > Games > Computer Games (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

DeepScalper: A Risk-Aware Deep Reinforcement Learning Framework for Intraday Trading with Micro-level Market Embedding

Sun, Shuo, Wang, Rundong, He, Xu, Zhu, Junlei, Li, Jian, An, Bo

Reinforcement learning (RL) techniques have shown great success in quantitative investment tasks, such as portfolio management and algorithmic trading. Especially, intraday trading is one of the most profitable and risky tasks because of the intraday behaviors of the financial market that reflect billions of rapidly fluctuating values. However, it is hard to apply existing RL methods to intraday trading due to the following three limitations: 1) overlooking micro-level market information (e.g., limit order book); 2) only focusing on local price fluctuation and failing to capture the overall trend of the whole trading day; 3) neglecting the impact of market risk. To tackle these limitations, we propose DeepScalper, a deep reinforcement learning framework for intraday trading. Specifically, we adopt an encoder-decoder architecture to learn robust market embedding incorporating both macro-level and micro-level market information. Moreover, a novel hindsight reward function is designed to provide the agent a long-term horizon for capturing the overall price trend. In addition, we propose a risk-aware auxiliary task by predicting future volatility, which helps the agent take market risk into consideration while maximizing profit. Finally, extensive experiments on two stock index futures and four treasury bond futures demonstrate that DeepScalper achieves significant improvement against many state-of-the-art approaches.

deepscalper, hindsight reward function, intraday trading, (15 more...)

2201.09058

Country: North America > United States > New York (0.04)

Genre: Research Report (1.00)

Industry: Banking & Finance > Trading (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

SanMove: Next Location Recommendation via Self-Attention Network

Li, Huifeng, Wang, Bin, Zhu, Sulei, Xu, Yanyan

Currently, next location recommendation plays a vital role in location-based social network applications and services. Although many methods have been proposed to solve this problem, three important challenges have not been well addressed so far: (1) most existing methods are based on recurrent network, which is time-consuming to train long sequences due to not allowing for full parallelism; (2) personalized preferences generally are not considered reasonably; (3) existing methods rarely systematically studied how to efficiently utilize various auxiliary information (e.g., user ID and timestamp) in trajectory data and the spatio-temporal relations among non-consecutive locations. To address the above challenges, we propose a novel method named SanMove, a self-attention network based model, to predict the next location via capturing the long- and short-term mobility patterns of users. Specifically, SanMove introduces a long-term preference learning module, and it uses a self-attention module to capture the users long-term mobility pattern which can represent personalized location preferences of users. Meanwhile, SanMove uses a spatial-temporal guided non-invasive self-attention (STNOVA) to exploit auxiliary information to learn short-term preferences. We evaluate SanMove with two real-world datasets, and demonstrate SanMove is not only faster than the state-of-the-art RNN-based predict model but also outperforms the baselines for next location prediction.

dataset, information, trajectory, (17 more...)

2112.09076

Country:

Asia > China > Shanghai > Shanghai (0.05)
North America > United States > New York (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Genre: Research Report > Promising Solution (0.34)

Industry: Information Technology > Services (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.69)

Carvalho, Wilka, Lampinen, Andrew, Nikiforou, Kyriacos, Hill, Felix, Shanahan, Murray

Feature-Attending Recurrent Modules for Generalization in Reinforcement Learning

Deep reinforcement learning (Deep RL) has recently seen significant progress in developing algorithms for generalization. However, most algorithms target a single type of generalization setting. In this work, we study generalization across three disparate task structures: (a) tasks composed of spatial and temporal compositions of regularly occurring object motions; (b) tasks composed of active perception of and navigation towards regularly occurring 3D objects; and (c) tasks composed of remembering goal-information over sequences of regularly occurring object-configurations. These diverse task structures all share an underlying idea of compositionality: task completion always involves combining recurring segments of task-oriented perception and behavior. We hypothesize that an agent can generalize within a task structure if it can discover representations that capture these recurring task-segments. For our tasks, this corresponds to representations for recognizing individual object motions, for navigation towards 3D objects, and for navigating through object-configurations. Taking inspiration from cognitive science, we term representations for recurring segments of an agent's experience, "perceptual schemas". We propose Feature Attending Recurrent Modules (FARM), which learns a state representation where perceptual schemas are distributed across multiple, relatively small recurrent modules. We compare FARM to recurrent architectures that leverage spatial attention, which reduces observation features to a weighted average over spatial positions. Our experiments indicate that our feature-attention mechanism better enables FARM to generalize across the diverse object-centric domains we study.

agent, module, regularity, (16 more...)

2112.08369

Country:

North America > United States > Michigan (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

GM, Harshvardhan, Sahu, Aanchal, Gourisaria, Mahendra Kumar

GM Score: Incorporating inter-class and intra-class generator diversity, discriminability of disentangled representation, and sample fidelity for evaluating GANs

While generative adversarial networks (GAN) are popular for their higher sample quality as opposed to other generative models like the variational autoencoders (VAE) and Boltzmann machines, they suffer from the same difficulty of the evaluation of generated samples. Various aspects must be kept in mind, such as the quality of generated samples, the diversity of classes (within a class and among classes), the use of disentangled latent spaces, agreement of said evaluation metric with human perception, etc. In this paper, we propose a new score, namely, GM Score, which takes into various factors such as sample quality, disentangled representation, intra-class and inter-class diversity, and other metrics such as precision, recall, and F1 score are employed for discriminability of latent space of deep belief network (DBN) and restricted Boltzmann machine (RBM). The evaluation is done for different GANs (GAN, DCGAN, BiGAN, CGAN, CoupledGAN, LSGAN, SGAN, WGAN, and WGAN Improved) trained on the benchmark MNIST dataset.

diversity, gan, intra-class diversity, (16 more...)

2112.06431

Country:

North America > United States > California > Alameda County > Berkeley (0.04)
Asia > India (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.54)

arXiv.org Artificial IntelligenceDec-14-2021

Modeling Strong and Human-Like Gameplay with KL-Regularized Search

Jacob, Athul Paul, Wu, David J., Farina, Gabriele, Lerer, Adam, Bakhtin, Anton, Andreas, Jacob, Brown, Noam

We consider the task of building strong but human-like policies in multi-agent decision-making problems, given examples of human behavior. Imitation learning is effective at predicting human actions but may not match the strength of expert humans, while self-play learning and search techniques (e.g. AlphaZero) lead to strong performance but may produce policies that are difficult for humans to understand and coordinate with. We show in chess and Go that regularizing search policies based on the KL divergence from an imitation-learned policy by applying Monte Carlo tree search produces policies that have higher human prediction accuracy and are stronger than the imitation policy. We then introduce a novel regret minimization algorithm that is regularized based on the KL divergence from an imitation-learned policy, and show that applying this algorithm to no-press Diplomacy yields a policy that maintains the same human prediction accuracy as imitation learning while being substantially stronger.

accuracy, agent, modeling strong and human-like gameplay, (14 more...)

2112.07544

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Europe > United Kingdom > England (0.04)
(4 more...)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment > Games > Chess (0.68)
Leisure & Entertainment > Games > Go (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

Lindenberg, Björn, Nordqvist, Jonas, Lindahl, Karl-Olof

Conjugated Discrete Distributions for Distributional Reinforcement Learning

arXiv.org Artificial IntelligenceDec-14-2021

In this work we continue to build upon recent advances in reinforcement learning for finite Markov processes. A common approach among previous existing algorithms, both single-actor and distributed, is to either clip rewards or to apply a transformation method on Q-functions to handle a large variety of magnitudes in real discounted returns. We theoretically show that one of the most successful methods may not yield an optimal policy if we have a non-deterministic process. As a solution, we argue that distributional reinforcement learning lends itself to remedy this situation completely. By the introduction of a conjugated distributional operator we may handle a large class of transformations for real returns with guaranteed theoretical convergence. We propose an approximating single-actor algorithm based on this operator that trains agents directly on unaltered rewards using a proper distributional metric given by the Cram\'er distance. To evaluate its performance in a stochastic setting we train agents on a suite of 55 Atari 2600 games using sticky-actions and obtain state-of-the-art performance compared to other well-known algorithms in the Dopamine framework.

algorithm, bellemare, operator, (14 more...)

2112.07424

Country:

Europe > Sweden > Kronoberg County > Växjö (0.04)
Europe > France > Hauts-de-France > Nord > Lille (0.04)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)