AITopics

2210.12556

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Jung, Jeahan, Choi, Minseok

Bayesian deep learning framework for uncertainty quantification in high dimensions

We develop a novel deep learning method for uncertainty quantification in stochastic partial differential equations based on a Bayesian neural network (BNN) and Hamiltonian Monte Carlo (HMC). A BNN efficiently learns the posterior distribution of the parameters in deep neural networks by performing Bayesian inference on the network parameters. The posterior distribution is efficiently sampled using HMC to quantify uncertainties in the system. Several numerical examples are shown for both forward and inverse problems in high dimensions to demonstrate the effectiveness of the proposed method for uncertainty quantification. The results also suggest that the computational cost is almost independent of the dimension of the problem, demonstrating the potential of the method for tackling the so-called curse of dimensionality.

artificial intelligence, deep learning, machine learning

2210.11737

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Biologically Plausible Variational Policy Gradient with Spiking Recurrent Winner-Take-All Networks

Yang, Zhile, Guo, Shangqi, Fang, Ying, Liu, Jian K.

One stream of reinforcement learning research is exploring biologically plausible models and algorithms to simulate biological intelligence and fit neuromorphic hardware. Among them, reward-modulated spike-timing-dependent plasticity (R-STDP) is a recent branch with good potential in energy efficiency. However, current R-STDP methods rely on heuristic designs of local learning rules, thus requiring task-specific expert knowledge. In this paper, we consider a spiking recurrent winner-take-all network, and propose a new R-STDP method, spiking variational policy gradient (SVPG), whose local learning rules are derived from the global policy gradient and thus eliminate the need for heuristic designs. In experiments on MNIST classification and Gym InvertedPendulum, our SVPG achieves good training performance and also shows better robustness to various kinds of noise than conventional methods.

artificial intelligence, machine learning, reinforcement learning

2210.13225

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Muglich, Darius, de Witt, Christian Schroeder, van der Pol, Elise, Whiteson, Shimon, Foerster, Jakob

Equivariant Networks for Zero-Shot Coordination

Successful coordination in Dec-POMDPs requires agents to adopt robust strategies and interpretable styles of play for their partner. A common failure mode is symmetry breaking, when agents arbitrarily converge on one out of many equivalent but mutually incompatible policies. Commonly these examples include partial observability, e.g. waving your right hand vs. left hand to convey a covert message. In this paper, we present a novel equivariant network architecture for use in Dec-POMDPs that prevents the agent from learning policies which break symmetries, doing so more effectively than prior methods. Our method also acts as a "coordination-improvement operator" for generic, pre-trained policies, and thus may be applied at test-time in conjunction with any self-play algorithm. We provide theoretical guarantees for our work and test on the AI benchmark task of Hanabi, where we demonstrate that our method outperforms other symmetry-aware baselines in zero-shot coordination and improves the coordination ability of a variety of pre-trained policies. In particular, we show our method can be used to improve on the state of the art for zero-shot coordination on the Hanabi benchmark.

large language model, machine learning, reinforcement learning

2210.12124

Country: Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)

Explainability in autonomous pedagogically structured scenarios

Patil, Minal Suresh

We present the notion of explainability for decision-making processes in a pedagogically structured autonomous environment. Multi-agent systems that are structured pedagogically consist of pedagogical teachers and learners that operate in environments in which both are sometimes not fully aware of all the states in the environment and the beliefs of other agents, making it challenging for them to explain their decisions and actions to one another. This work emphasises the need for robust and iterative explanation-based communication between the pedagogical teacher and the learner. Explaining the rationale behind multi-agent decisions in an interactive, partially observable environment is necessary to build trustworthy and reliable communication between pedagogical teachers and learners. Ongoing research is primarily focused on explanations of the agents' behaviour towards humans, and there is a lack of research on inter-agent explainability.

agent, artificial intelligence, machine learning

2210.1214

Country:

Europe > Sweden > Västerbotten County > Umeå (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > San Mateo County > Menlo Park (0.04)

Genre: Research Report (0.40)

Industry: Education (0.86)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.74)

Donâncio, Henrique, Vercouter, Laurent, Roclawski, Harald

The Pump Scheduling Problem: A Real-World Scenario for Reinforcement Learning

Deep Reinforcement Learning (DRL) has achieved remarkable success in scenarios such as games and has emerged as a potential solution for control tasks. This is due to its ability to scale and handle complex dynamics. However, few works have targeted environments grounded in real-world settings. Indeed, real-world scenarios can be challenging, especially when faced with the high dimensionality of the state space and an unknown reward function. To facilitate research, we release a testbed consisting of an environment simulator and demonstrations of human operation concerning pump scheduling of a real-world water distribution facility. The pump scheduling problem can be viewed as a decision process to decide when to operate pumps to supply water while limiting electricity consumption and meeting system constraints. To provide a starting point, we release a well-documented codebase, present an overview of some challenges that can be addressed, and provide a baseline representation of the problem.

artificial intelligence, machine learning, reinforcement learning

2210.11111

Genre:

Overview (0.86)
Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Mode Reduction for Markov Jump Systems

Du, Zhe, Balzano, Laura, Ozay, Necmiye

Switched systems are capable of modeling processes with underlying dynamics that may change abruptly over time. To achieve accurate modeling in practice, one may need a large number of modes, but this may in turn increase the model complexity drastically. Existing work on reducing system complexity mainly considers state space reduction, whereas reducing the number of modes is less studied. In this work, we consider Markov jump linear systems (MJSs), a special class of switched systems where the active mode switches according to a Markov chain, and several issues associated with their mode complexity. Specifically, inspired by clustering techniques from unsupervised learning, we are able to construct a reduced MJS with fewer modes that approximates the original MJS well under various metrics. Furthermore, both theoretically and empirically, we show how one can use the reduced MJS to analyze stability and design controllers with significant reduction in computational cost while achieving guaranteed accuracy.
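
To make the definition in this abstract concrete, here is a minimal sketch of simulating a Markov jump linear system, in which the state follows x[t+1] = A[mode] x[t] and the active mode evolves as a Markov chain. It is illustrative only and is not the authors' code: the function name `simulate_mjs`, the two example mode matrices, and the transition matrix `P` are all assumptions chosen for the example, and the mode-reduction (clustering) step described in the abstract is not shown.

```python
import random


def simulate_mjs(A_modes, P, x0, mode0, steps, seed=0):
    """Simulate x[t+1] = A[mode[t]] x[t], where mode[t] follows a Markov chain.

    A_modes: list of square matrices (lists of rows), one per mode.
    P: row-stochastic matrix; P[i][j] = Pr(next mode = j | current mode = i).
    Returns the list of visited states and the list of visited modes.
    """
    rng = random.Random(seed)
    x, mode = list(x0), mode0
    states, modes = [x], [mode]
    for _ in range(steps):
        A = A_modes[mode]
        # Linear update under the currently active mode.
        x = [sum(a * xi for a, xi in zip(row, x)) for row in A]
        # Draw the next mode from row `mode` of the transition matrix.
        mode = rng.choices(range(len(P)), weights=P[mode])[0]
        states.append(x)
        modes.append(mode)
    return states, modes


# Hypothetical two-mode example (values chosen for illustration only).
A_modes = [
    [[0.9, 0.1], [0.0, 0.8]],  # dynamics in mode 0
    [[0.5, 0.0], [0.2, 0.7]],  # dynamics in mode 1
]
P = [[0.95, 0.05], [0.10, 0.90]]
states, modes = simulate_mjs(A_modes, P, x0=[1.0, 1.0], mode0=0, steps=20)
```

With many modes, `A_modes` and `P` grow accordingly, which is the mode complexity the abstract refers to.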

artificial intelligence, denote, machine learning

doi: 10.1109/OJCSYS.2022.3212613

2205.02697

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
South America > Brazil (0.04)

Genre: Research Report (0.81)

Industry:

Energy (0.45)
Banking & Finance (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.35)

Bharadwaj, Diddigi Raghu Ram, Kumar, Lakshya, Jawaid, Saif, Vempati, Sreekanth

Fine-Grained Session Recommendations in E-commerce using Deep Reinforcement Learning

Sustaining users' interest and keeping them engaged in the platform is very important for the success of an e-commerce business. A session encompasses different activities of a user between logging into the platform and logging out or making a purchase. User activities in a session can be classified into two groups: known intent and unknown intent. Known intent activity pertains to sessions where the intent of a user to browse/purchase a specific product can be easily captured. In unknown intent activity, by contrast, the intent of the user is not known. For example, consider the scenario where a user enters the session to casually browse the products on the platform, similar to the window shopping experience in the offline setting. While recommending similar products is essential in the former, accurately understanding the intent and recommending interesting products is essential in the latter setting in order to retain a user. In this work, we focus primarily on the unknown intent setting, where our objective is to recommend a sequence of products to a user in a session to sustain their interest, keep them engaged, and possibly drive them towards a purchase. We formulate this problem in the framework of the Markov Decision Process (MDP), a popular mathematical framework for sequential decision making, and solve it using Deep Reinforcement Learning (DRL) techniques. However, training the next product recommendation is difficult in the RL paradigm due to the large variance in the browse/purchase behavior of users. Therefore, we break the problem down into predicting various product attributes, where a pattern/trend can be identified and exploited to build accurate models. We show that the DRL agent provides better performance compared to a greedy strategy.

artificial intelligence, machine learning, reinforcement learning

2210.15451

Country:

North America > United States > District of Columbia > Washington (0.05)
Asia > India (0.05)
Asia > Myanmar > Tanintharyi Region > Dawei (0.05)

Genre: Research Report (0.40)

Industry: Information Technology > Services > e-Commerce Services (0.72)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Berducci, Luigi, Grosu, Radu

Safe Policy Improvement in Constrained Markov Decision Processes

The automatic synthesis of a policy through reinforcement learning (RL) from a given set of formal requirements depends on the construction of a reward signal and consists of the iterative application of many policy-improvement steps. The synthesis algorithm has to balance target, safety, and comfort requirements in a single objective and to guarantee that the policy improvement does not increase the number of safety-requirements violations, especially for safety-critical applications. In this work, we present a solution to the synthesis problem by solving its two main challenges: reward-shaping from a set of formal requirements and safe policy update. For the former, we propose an automatic reward-shaping procedure, defining a scalar reward signal compliant with the task specification. For the latter, we introduce an algorithm ensuring that the policy is improved in a safe fashion with high-confidence guarantees. We also discuss the adoption of a model-based RL algorithm to efficiently use the collected data and train a model-free agent on the predicted trajectories, where the safety violation does not have the same impact as in the real world. Finally, we demonstrate in standard control benchmarks that the resulting learning procedure is effective and robust even under heavy perturbations of the hyperparameters.

artificial intelligence, machine learning, reinforcement learning

doi: 10.1007/978-3-031-19849-6_21

2210.11259

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Genre:

Overview (0.67)
Research Report (0.51)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.51)

Jain, Arnav Kumar, Sujit, Shivakanth, Joshi, Shruti, Michalski, Vincent, Hafner, Danijar, Ebrahimi-Kahou, Samira

Learning Robust Dynamics through Variational Sparse Gating

Learning world models from their sensory inputs enables agents to plan for actions by imagining their future outcomes. World models have previously been shown to improve sample-efficiency in simulated environments with few objects, but have not yet been applied successfully to environments with many objects. In environments with many objects, often only a small number of them are moving or interacting at the same time. In this paper, we investigate integrating this inductive bias of sparse interactions into the latent dynamics of world models trained from pixels. First, we introduce Variational Sparse Gating (VSG), a latent dynamics model that updates its feature dimensions sparsely through stochastic binary gates. Moreover, we propose a simplified architecture, Simple Variational Sparse Gating (SVSG), that removes the deterministic pathway of previous models, resulting in a fully stochastic transition function that leverages the VSG mechanism. We evaluate the two model architectures in the BringBackShapes (BBS) environment, which features a large number of moving objects and partial observability, demonstrating clear improvements over prior models.

artificial intelligence, machine learning, reinforcement learning

2210.11698

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > Canada > Quebec > Montreal (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)