AITopics

2211.07092

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)
Asia > Japan > Honshū > Tōhoku (0.04)

Genre:

Instructional Material > Course Syllabus & Notes (0.67)
Research Report > New Finding (0.48)

Industry: Health & Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Grand-Clément, Julien, Petrik, Marek

Reducing Blackwell and Average Optimality to Discounted MDPs via the Blackwell Discount Factor

arXiv.org Artificial IntelligenceJan-31-2023

We introduce the Blackwell discount factor for Markov Decision Processes (MDPs). Classical objectives for MDPs include discounted, average, and Blackwell optimality. Many existing approaches to computing average-optimal policies solve for discounted optimal policies with a discount factor close to $1$, but they only work under strong or hard-to-verify assumptions such as ergodicity or weakly communicating MDPs. In this paper, we show that when the discount factor is larger than the Blackwell discount factor $\gamma_{\mathrm{bw}}$, all discounted optimal policies become Blackwell- and average-optimal, and we derive a general upper bound on $\gamma_{\mathrm{bw}}$. The upper bound on $\gamma_{\mathrm{bw}}$ provides the first reduction from average and Blackwell optimality to discounted optimality, without any assumptions, and new polynomial-time algorithms for average- and Blackwell-optimal policies. Our work brings new ideas from the study of polynomials and algebraic numbers to the analysis of MDPs. Our results also apply to robust MDPs, enabling the first algorithms to compute robust Blackwell-optimal policies.

blackwell-optimal policy, machine learning, reinforcement learning, (17 more...)

2302.00036

Country:

North America > United States > New York > Suffolk County > Stony Brook (0.04)
North America > United States > New Hampshire (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)

arXiv.org Artificial IntelligenceJan-31-2023

An Efficient Solution to s-Rectangular Robust Markov Decision Processes

Kumar, Navdeep, Levy, Kfir, Wang, Kaixin, Mannor, Shie

In Markov Decision Processes (MDPs), an agent interacts with the environment and learns to optimally behave in it [28]. However, the MDP solution may be very sensitive to little changes in the model parameters [23]. Hence we should be cautious applying the solution of the MDP, when the model is changing or when there is uncertainty in the model parameters. Robust MDPs provide a way to address this issue, where an agent can learn to optimally behave even when the model parameters are uncertain [15, 29, 18]. Another motivation to study robust MDPs is that they can lead to better generalization [33, 34, 25] compared to non-robust solutions.

artificial intelligence, iteration, machine learning, (18 more...)

2301.13642

Country:

North America > United States > New York > New York County > New York City (0.04)
Asia > Singapore (0.04)
Asia > China (0.04)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.61)

Brahmachari, Shrigyan, Lumbreras, Josep, Tomamichel, Marco

Quantum contextual bandits and recommender systems for quantum data

arXiv.org Artificial IntelligenceJan-31-2023

Recommender systems are a class of online reinforcement learning algorithms that interact sequentially with an environment suggesting relevant items to a user. During the last decade, there has been an increasing interest in online recommendation techniques due to the importance of advertisement recommendation for e-commerce websites or the rise of movies and music streaming platforms [1, 2]. Among different settings for recommender systems, in this work, we focus on the contextual bandit framework applied to the recommendation of quantum data. The contextual bandit problem is a variant of the multi-armed bandit problem where a learner at each round receives a context and given a set of actions (also called actions) has to decide the best action using the context information. After selecting an action the learner will receive a reward and for the next rounds, they will use the previous information of contexts and rewards in order to make their future choices.

artificial intelligence, data mining, machine learning, (20 more...)

2301.13524

Country:

Asia > Singapore > Central Region > Singapore (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Industry: Information Technology > Services > e-Commerce Services (0.54)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Streaming Anomaly Detection

Bhatia, Siddharth

Anomaly detection is critical for finding suspicious behavior in innumerable systems. We need to detect anomalies in real-time, i.e. determine if an incoming entity is anomalous or not, as soon as we receive it, to minimize the effects of malicious activities and start recovery as soon as possible. Therefore, online algorithms that can detect anomalies in a streaming manner are essential. We first propose MIDAS which uses a count-min sketch to detect anomalous edges in dynamic graphs in an online manner, using constant time and memory. We then propose two variants, MIDAS-R which incorporates temporal and spatial relations, and MIDAS-F which aims to filter away anomalous edges to prevent them from negatively affecting the internal data structures. We then extend the count-min sketch to a Higher-Order sketch to capture complex relations in graph data, and to reduce detecting suspicious dense subgraph problem to finding a dense submatrix in constant time. Using this sketch, we propose four streaming methods to detect edge and subgraph anomalies. Next, we broaden the graph setting to multi-aspect data. We propose MStream which detects explainable anomalies in multi-aspect data streams. We further propose MStream-PCA, MStream-IB, and MStream-AE to incorporate correlation between features. Finally, we consider multi-dimensional data streams with concept drift and propose MemStream. MemStream leverages the power of a denoising autoencoder to learn representations and a memory module to learn the dynamically changing trend in data without the need for labels. We prove a theoretical bound on the size of memory for effective drift handling. In addition, we allow quick retraining when the arriving stream becomes sufficiently different from the training data. Furthermore, MemStream makes use of two architecture design choices to be robust to memory poisoning.

data mining, machine learning, weakly correlated semi-supervision, (19 more...)

2301.13199

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
South America > Brazil (0.04)
North America > United States > New York (0.04)
(12 more...)

Genre:

Research Report (1.00)
Overview (0.92)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area (1.00)
Government > Regional Government > North America Government > United States Government (0.68)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.67)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

Vijayan, Nithia, A, Prashanth L.

Policy Gradient Methods for Distortion Risk Measures

We propose policy gradient algorithms which learn risk-sensitive policies in a reinforcement learning (RL) framework. Our proposed algorithms maximize the distortion risk measure (DRM) of the cumulative reward in an episodic Markov decision process in on-policy as well as off-policy RL settings. We derive a variant of the policy gradient theorem that caters to the DRM objective, and use this theorem in conjunction with a likelihood ratio-based gradient estimation scheme. We derive non-asymptotic bounds that establish the convergence of our proposed algorithms to an approximate stationary point of the DRM objective.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

2107.04422

Country:

North America > Mexico (0.04)
Asia > India (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Altundas, Batuhan, Wang, Zheyuan, Bishop, Joshua, Gombolay, Matthew

Learning Coordination Policies over Heterogeneous Graphs for Human-Robot Teams via Recurrent Neural Schedule Propagation

As human-robot collaboration increases in the workforce, it becomes essential for human-robot teams to coordinate efficiently and intuitively. Traditional approaches for human-robot scheduling either utilize exact methods that are intractable for large-scale problems and struggle to account for stochastic, time varying human task performance, or application-specific heuristics that require expert domain knowledge to develop. We propose a deep learning-based framework, called HybridNet, combining a heterogeneous graph-based encoder with a recurrent schedule propagator for scheduling stochastic human-robot teams under upper- and lower-bound temporal constraints. The HybridNet's encoder leverages Heterogeneous Graph Attention Networks to model the initial environment and team dynamics while accounting for the constraints. By formulating task scheduling as a sequential decision-making process, the HybridNet's recurrent neural schedule propagator leverages Long Short-Term Memory (LSTM) models to propagate forward consequences of actions to carry out fast schedule generation, removing the need to interact with the environment between every task-agent pair selection. The resulting scheduling policy network provides a computationally lightweight yet highly expressive model that is end-to-end trainable via Reinforcement Learning algorithms. We develop a virtual task scheduling environment for mixed human-robot teams in a multi-round setting, capable of modeling the stochastic learning behaviors of human workers. Experimental results showed that HybridNet outperformed other human-robot scheduling solutions across problem sizes for both deterministic and stochastic human performance, with faster runtime compared to pure-GNN-based schedulers.

constraint, machine learning, reinforcement learning, (19 more...)

doi: 10.1109/IROS47612.2022.9981748

2301.13279

Country: North America > United States > Georgia > Fulton County > Atlanta (0.04)

Genre: Research Report > New Finding (0.54)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots > Humanoid Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(2 more...)

Vos, Daniël, Verwer, Sicco

Optimal Decision Tree Policies for Markov Decision Processes

Interpretability of reinforcement learning policies is essential for many real-world tasks but learning such interpretable policies is a hard problem. Particularly rule-based policies such as decision trees and rules lists are difficult to optimize due to their non-differentiability. While existing techniques can learn verifiable decision tree policies there is no guarantee that the learners generate a decision that performs optimally. In this work, we study the optimization of size-limited decision trees for Markov Decision Processes (MPDs) and propose OMDTs: Optimal MDP Decision Trees. Given a user-defined size limit and MDP formulation OMDT directly maximizes the expected discounted return for the decision tree using Mixed-Integer Linear Programming. By training optimal decision tree policies for different MDPs we empirically study the optimality gap for existing imitation learning techniques and find that they perform sub-optimally. We show that this is due to an inherent shortcoming of imitation learning, namely that complex policies cannot be represented using size-limited trees. In such cases, it is better to directly optimize the tree for expected return. While there is generally a trade-off between the performance and interpretability of machine learning models, we find that OMDTs limited to a depth of 3 often perform close to the optimal limit.

artificial intelligence, decision tree, machine learning, (16 more...)

2301.13185

Country:

North America > United States (0.14)
Europe > Netherlands > South Holland > Delft (0.04)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Games (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.84)

Emergence of Maps in the Memories of Blind Navigation Agents

Wijmans, Erik, Savva, Manolis, Essa, Irfan, Lee, Stefan, Morcos, Ari S., Batra, Dhruv

Animal navigation research posits that organisms build and maintain internal spatial representations, or maps, of their environment. We ask if machines -- specifically, artificial intelligence (AI) navigation agents -- also build implicit (or 'mental') maps. A positive answer to this question would (a) explain the surprising phenomenon in recent literature of ostensibly map-free neural-networks achieving strong performance, and (b) strengthen the evidence of mapping as a fundamental mechanism for navigation by intelligent embodied agents, whether they be biological or artificial. Unlike animal navigation, we can judiciously design the agent's perceptual system and control the learning paradigm to nullify alternative navigation mechanisms. Specifically, we train 'blind' agents -- with sensing limited to only egomotion and no other sensing of any kind -- to perform PointGoal navigation ('go to $\Delta$ x, $\Delta$ y') via reinforcement learning. Our agents are composed of navigation-agnostic components (fully-connected and recurrent neural networks), and our experimental setup provides no inductive bias towards mapping. Despite these harsh conditions, we find that blind agents are (1) surprisingly effective navigators in new environments (~95% success); (2) they utilize memory over long horizons (remembering ~1,000 steps of past experience in an episode); (3) this memory enables them to exhibit intelligent behavior (following walls, detecting collisions, taking shortcuts); (4) there is emergence of maps and collision detection neurons in the representations of the environment built by a blind agent as it navigates; and (5) the emergent maps are selective and task dependent (e.g. the agent 'forgets' exploratory detours). Overall, this paper presents no new techniques for the AI audience, but a surprising finding, an insight, and an explanation.

artificial intelligence, deep learning, machine learning, (20 more...)

2301.13261

Country: North America > United States > Oregon (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.67)

Industry:

Health & Medicine (0.46)
Media (0.34)
Leisure & Entertainment (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)

Zhou, Hanhan, Lan, Tian, Aggarwal, Vaneet

PAC: Assisted Value Factorisation with Counterfactual Predictions in Multi-Agent Reinforcement Learning

arXiv.org Artificial IntelligenceJan-29-2023

Multi-agent reinforcement learning (MARL) has witnessed significant progress with the development of value function factorization methods. It allows optimizing a joint action-value function through the maximization of factorized per-agent utilities due to monotonicity. In this paper, we show that in partially observable MARL problems, an agent's ordering over its own actions could impose concurrent constraints (across different states) on the representable function class, causing significant estimation error during training. We tackle this limitation and propose PAC, a new framework leveraging Assistive information generated from Counterfactual Predictions of optimal joint action selection, which enable explicit assistance to value function factorization through a novel counterfactual loss. A variational inference-based information encoding method is developed to collect and encode the counterfactual predictions from an estimated baseline. To enable decentralized execution, we also derive factorized per-agent policies inspired by a maximum-entropy MARL framework. We evaluate the proposed PAC on multi-agent predator-prey and a set of StarCraft II micromanagement tasks. Empirical results demonstrate improved results of PAC over state-of-the-art value-based and policy-based multi-agent reinforcement learning algorithms on all benchmarks.

information, machine learning, reinforcement learning, (14 more...)

2206.1142

Genre: Research Report > New Finding (0.34)

Industry: Leisure & Entertainment > Games > Computer Games (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)