AITopics

2109.06915

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > New York > New York County > New York City (0.04)
(3 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

#artificialintelligenceOct-24-2022, 05:50:15 GMT

How Boltzmann Machines work part3(Artificial Intelligence)

Abstract: Graph clustering is the process of grouping vertices into densely connected sets called clusters. In doing so, we obtain a heuristic approximation to the intra-cluster density maximization problem. We use two variations of a Boltzmann machine heuristic to obtain numerical solutions. For benchmarking purposes, we compare solution quality and computational performances to those obtained using a commercial solver, Gurobi. We also compare clustering quality to the clusters obtained using the popular Louvain modularity maximization method.

boltzmann machine work part3, phase transition, quantum phase transition, (11 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.56)

Tirinzoni, Andrea, Al-Marjani, Aymen, Kaufmann, Emilie

Near Instance-Optimal PAC Reinforcement Learning for Deterministic MDPs

In probably approximately correct (PAC) reinforcement learning (RL), an agent is required to identify an $\epsilon$-optimal policy with probability $1-\delta$. While minimax optimal algorithms exist for this problem, its instance-dependent complexity remains elusive in episodic Markov decision processes (MDPs). In this paper, we propose the first nearly matching (up to a horizon squared factor and logarithmic terms) upper and lower bounds on the sample complexity of PAC RL in deterministic episodic MDPs with finite state and action spaces. In particular, our bounds feature a new notion of sub-optimality gap for state-action pairs that we call the deterministic return gap. While our instance-dependent lower bound is written as a linear program, our algorithms are very simple and do not require solving such an optimization problem during learning. Their design and analyses employ novel ideas, including graph-theoretical concepts (minimum flows) and a new maximum-coverage exploration strategy.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

2203.09251

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)

Corso, Anthony, Wang, Yizheng, Zechner, Markus, Caers, Jef, Kochenderfer, Mykel J.

A POMDP Model for Safe Geological Carbon Sequestration

Geological carbon capture and sequestration (CCS), where CO$_2$ is stored in subsurface formations, is a promising and scalable approach for reducing global emissions. However, if done incorrectly, it may lead to earthquakes and leakage of CO$_2$ back to the surface, harming both humans and the environment. These risks are exacerbated by the large amount of uncertainty in the structure of the storage formation. For these reasons, we propose that CCS operations be modeled as a partially observable Markov decision process (POMDP) and decisions be informed using automated planning algorithms. To this end, we develop a simplified model of CCS operations based on a 2D spillpoint analysis that retains many of the challenges and safety considerations of the real-world problem. We show how off-the-shelf POMDP solvers outperform expert baselines for safe CCS planning. This POMDP model can be used as a test bed to drive the development of novel decision-making algorithms for CCS operations.

artificial intelligence, co 2, machine learning, (16 more...)

2212.00669

Country:

Europe > Norway > Norwegian Sea (0.24)
North America > United States > California > Santa Clara County (0.16)

Genre: Research Report (0.64)

Industry: Energy > Oil & Gas > Upstream (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Opportunistic Episodic Reinforcement Learning

Wang, Xiaoxiao, Bouacida, Nader, Guo, Xueying, Liu, Xin

In this paper, we propose and study opportunistic reinforcement learning - a new variant of reinforcement learning problems where the regret of selecting a suboptimal action varies under an external environmental condition known as the variation factor. When the variation factor is low, so is the regret of selecting a suboptimal action and vice versa. Our intuition is to exploit more when the variation factor is high, and explore more when the variation factor is low. We demonstrate the benefit of this novel framework for finite-horizon episodic MDPs by designing and evaluating OppUCRL2 and OppPSRL algorithms. Our algorithms dynamically balance the exploration-exploitation trade-off for reinforcement learning by introducing variation factor-dependent optimism to guide exploration. We establish an $\tilde{O}(HS \sqrt{AT})$ regret bound for the OppUCRL2 algorithm and show through simulations that both OppUCRL2 and OppPSRL algorithm outperform their original corresponding algorithms.

artificial intelligence, machine learning, reinforcement learning, (11 more...)

2210.13504

Country:

North America > United States > California > Yolo County > Davis (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

Conserva, Michelangelo, Rauber, Paulo

Hardness in Markov Decision Processes: Theory and Practice

Meticulously analysing the empirical strengths and weaknesses of reinforcement learning methods in hard (challenging) environments is essential to inspire innovations and assess progress in the field. In tabular reinforcement learning, there is no well-established standard selection of environments to conduct such analysis, which is partially due to the lack of a widespread understanding of the rich theory of hardness of environments. The goal of this paper is to unlock the practical usefulness of this theory through four main contributions. First, we present a systematic survey of the theory of hardness, which also identifies promising research directions. Second, we introduce Colosseum, a pioneering package that enables empirical hardness analysis and implements a principled benchmark composed of environments that are diverse with respect to different measures of hardness. Third, we present an empirical analysis that provides new insights into computable measures. Finally, we benchmark five tabular agents in our newly proposed benchmark. While advancing the theoretical understanding of hardness in non-tabular reinforcement learning remains essential, our contributions in the tabular setting are intended as solid steps towards a principled non-tabular benchmark. Accordingly, we benchmark four agents in non-tabular versions of Colosseum environments, obtaining results that demonstrate the generality of tabular hardness measures.

machine learning, mg-empty, reinforcement learning, (16 more...)

2210.13075

Country:

Europe > United Kingdom > England > Greater London > London (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.81)

Industry: Leisure & Entertainment > Games > Computer Games (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.83)

Zhang, Shangtong, Tachet, Remi, Laroche, Romain

Global Optimality and Finite Sample Analysis of Softmax Off-Policy Actor Critic under State Distribution Mismatch

In this paper, we establish the global optimality and convergence rate of an off-policy actor critic algorithm in the tabular setting without using density ratio to correct the discrepancy between the state distribution of the behavior policy and that of the target policy. Our work goes beyond existing works on the optimality of policy gradient methods in that existing works use the exact policy gradient for updating the policy parameters while we use an approximate and stochastic update step. Our update step is not a gradient update because we do not use a density ratio to correct the state distribution, which aligns well with what practitioners do. Our update is approximate because we use a learned critic instead of the true value function. Our update is stochastic because at each step the update is done for only the current state action pair. Moreover, we remove several restrictive assumptions from existing works in our analysis. Central to our work is the finite sample analysis of a generic stochastic approximation algorithm with time-inhomogeneous update operators on time-inhomogeneous Markov chains, based on its uniform contraction properties.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

2111.02997

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > Canada > Quebec > Montreal (0.04)
North America > United States > Virginia > Albemarle County > Charlottesville (0.04)
(2 more...)

Genre:

Research Report (0.64)
Workflow (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

#artificialintelligenceOct-23-2022, 07:25:06 GMT

How Boltzmann Machines work part2(Artificial Intelligence)

Abstract: Unmanned aerial vehicle (UAV) is steadily growing as a promising technology for next-generation communication systems due to their appealing features such as wide coverage with high altitude, on-demand low-cost deployment, and fast responses. UAV communications are fundamentally different from the conventional terrestrial and satellite communications owing to the high mobility and the unique channel characteristics of air-ground links. However, obtaining effective channel state information (CSI) is challenging because of the dynamic propagation environment and variable transmission delay. In this paper, a deep learning (DL)-based CSI prediction framework is proposed to address channel aging problem by extracting the most discriminative features from the UAV wireless signals. Specifically, we develop a procedure of multiple Gaussian Bernoulli restricted Boltzmann machines (GBRBM) for dimension reduction and pre-training utilization incorporated with an autoencoder-based deep neural networks (DNNs).

artificial intelligence, boltzmann machine, boltzmann machine work part2, (8 more...)

#artificialintelligence

Industry: Telecommunications (0.52)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.80)

Rao, Rajesh P. N., Gklezakos, Dimitrios C., Sathish, Vishwas

Active Predictive Coding: A Unified Neural Framework for Learning Hierarchical World Models for Perception and Planning

arXiv.org Artificial IntelligenceOct-23-2022

Predictive coding has emerged as a prominent model of how the brain learns through predictions, anticipating the importance accorded to predictive learning in recent AI architectures such as transformers. Here we propose a new framework for predictive coding called active predictive coding which can learn hierarchical world models and solve two radically different open problems in AI: (1) how do we learn compositional representations, e.g., part-whole hierarchies, for equivariant vision? and (2) how do we solve large-scale planning problems, which are hard for traditional reinforcement learning, by composing complex action sequences from primitive policies? Our approach exploits hypernetworks, self-supervised learning and reinforcement learning to learn hierarchical world models that combine task-invariant state transition networks and task-dependent policy networks at multiple abstraction levels. We demonstrate the viability of our approach on a variety of vision datasets (MNIST, FashionMNIST, Omniglot) as well as on a scalable hierarchical planning problem. Our results represent, to our knowledge, the first demonstration of a unified solution to the part-whole learning problem posed by Hinton, the nested reference frames problem posed by Hawkins, and the integrated state-action hierarchy learning problem in reinforcement learning.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

2210.13461

Country:

North America > United States > Washington > King County > Seattle (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > France (0.04)

Genre: Research Report > New Finding (0.48)

Industry:

Law > Litigation (1.00)
Health & Medicine > Therapeutic Area > Neurology (0.68)
Education > Focused Education > Special Education (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.91)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.68)
(2 more...)

Prasad, Vignesh, Koert, Dorothea, Stock-Homburg, Ruth, Peters, Jan, Chalvatzaki, Georgia

MILD: Multimodal Interactive Latent Dynamics for Learning Human-Robot Interaction

arXiv.org Artificial IntelligenceOct-22-2022

Modeling interaction dynamics to generate robot trajectories that enable a robot to adapt and react to a human's actions and intentions is critical for efficient and effective collaborative Human-Robot Interactions (HRI). Learning from Demonstration (LfD) methods from Human-Human Interactions (HHI) have shown promising results, especially when coupled with representation learning techniques. However, such methods for learning HRI either do not scale well to high dimensional data or cannot accurately adapt to changing via-poses of the interacting partner. We propose Multimodal Interactive Latent Dynamics (MILD), a method that couples deep representation learning and probabilistic machine learning to address the problem of two-party physical HRIs. We learn the interaction dynamics from demonstrations, using Hidden Semi-Markov Models (HSMMs) to model the joint distribution of the interacting agents in the latent space of a Variational Autoencoder (VAE). Our experimental evaluations for learning HRI from HHI demonstrations show that MILD effectively captures the multimodality in the latent representations of HRI tasks, allowing us to decode the varying dynamics occurring in such tasks. Compared to related work, MILD generates more accurate trajectories for the controlled agent (robot) when conditioned on the observed agent's (human) trajectory. Notably, MILD can learn directly from camera-based pose estimations to generate trajectories, which we then map to a humanoid robot without the need for any additional training.

artificial intelligence, machine learning, trajectory, (16 more...)

doi: 10.1109/Humanoids53995.2022.10000239

2210.12418

Country:

Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.05)
Asia > Japan > Honshū > Chūbu > Nagano Prefecture > Nagano (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Robots > Humanoid Robots (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.87)