AITopics | Undirected Networks

Undirected Networks

RBM-Flow and D-Flow: Invertible Flows with Discrete Energy Base Spaces

arXiv.org Machine Learning, Dec-24-2020

Efficient sampling of complex data distributions can be achieved using trained invertible flows (IF), where the model distribution is generated by pushing a simple base distribution through multiple non-linear bijective transformations. However, the iterative nature of the transformations in IFs can limit the approximation to the target distribution. In this paper we seek to mitigate this by implementing RBM-Flow, an IF model whose base distribution is a Restricted Boltzmann Machine (RBM) with a continuous smoothing applied. We show that by using RBM-Flow we are able to improve the quality of samples generated, quantified by the Inception Score (IS) and Fréchet Inception Distance (FID), over baseline models with the same IF transformations, but with less expressive base distributions. Furthermore, we also obtain D-Flow, an IF model with uncorrelated discrete latent variables. We show that D-Flow achieves similar likelihoods and FID/IS scores to those of a typical IF with Gaussian base variables, but with the additional benefit that global features are meaningfully encoded as discrete labels in the latent space.
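As a reminder of what "pushing a base distribution through bijective transformations" means, the standard change-of-variables identity for a flow with a continuous base density can be written as follows. The symbols $f_k$, $z_k$ and $p_Z$ are generic notation chosen here for illustration, not taken from the paper.

```latex
x = f_K \circ \cdots \circ f_1 (z_0), \qquad z_0 \sim p_Z, \qquad z_k = f_k(z_{k-1}),
\\[4pt]
\log p_X(x) = \log p_Z(z_0) - \sum_{k=1}^{K} \log \left| \det \frac{\partial f_k(z_{k-1})}{\partial z_{k-1}} \right| .
```

Under this identity, changing the base density $p_Z$ (for example from a Gaussian to a smoothed RBM) alters only the first term, while the Jacobian terms of the transformations stay the same.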

arxiv, d-flow, rbm-flow, (14 more...)

arXiv.org Machine Learning

2012.13196

Country:

Europe > United Kingdom > England > Greater London > London (0.04)
North America > United States > Texas > Harris County > Houston (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (0.85)

Industry: Government > Regional Government > North America Government > United States Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.35)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Identification of Unexpected Decisions in Partially Observable Monte-Carlo Planning: a Rule-Based Approach

Mazzi, Giulio, Castellini, Alberto, Farinelli, Alessandro

arXiv.org Artificial Intelligence, Dec-23-2020

Partially Observable Monte-Carlo Planning (POMCP) is a powerful online algorithm able to generate approximate policies for large Partially Observable Markov Decision Processes. The online nature of this method supports scalability by avoiding complete policy representation. The lack of an explicit representation, however, hinders interpretability. In this work, we propose a methodology based on Satisfiability Modulo Theory (SMT) for analyzing POMCP policies by inspecting their traces, namely sequences of belief-action-observation triplets generated by the algorithm. The proposed method explores local properties of policy behavior to identify unexpected decisions. We propose an iterative process of trace analysis consisting of three main steps: (i) the definition of a question by means of a parametric logical formula describing (probabilistic) relationships between beliefs and actions, (ii) the generation of an answer by computing the parameters of the logical formula that maximize the number of satisfied clauses (solving a MAX-SMT problem), and (iii) the analysis of the generated logical formula and the related decision boundaries for identifying unexpected decisions made by POMCP with respect to the original question. We evaluate our approach on Tiger, a standard benchmark for POMDPs, and a real-world problem related to mobile robot navigation. Results show that the approach can exploit human knowledge on the domain, outperforming state-of-the-art anomaly detection methods in identifying unexpected decisions. An improvement in the Area Under Curve of up to 47% was achieved in our tests.
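The idea of checking belief-action pairs against a rule can be illustrated on Tiger with a minimal Python sketch. Everything here is an assumption for illustration: the listening accuracy of 0.85 is the value commonly used for Tiger, the trace format and the function names are hypothetical, and the threshold is hand-picked, whereas the paper computes rule parameters by solving a MAX-SMT problem.

```python
def update_belief_left(belief_left, heard_left, accuracy=0.85):
    """Bayes update of P(tiger behind left door) after one 'listen' action.

    accuracy is the assumed probability that the growl is heard on the
    correct side (0.85 is an assumption, not taken from the abstract).
    """
    like_left = accuracy if heard_left else 1.0 - accuracy
    like_right = 1.0 - accuracy if heard_left else accuracy
    numerator = like_left * belief_left
    return numerator / (numerator + like_right * (1.0 - belief_left))


def unexpected_steps(trace, threshold):
    """Return indices of trace steps that violate a simple threshold rule.

    trace is a list of (belief_left, action) pairs (hypothetical format).
    Rule: open a door only if the belief that the tiger is behind the
    OTHER door is at least `threshold`.
    """
    flagged = []
    for step, (belief_left, action) in enumerate(trace):
        if action == "open_right" and belief_left < threshold:
            flagged.append(step)
        elif action == "open_left" and (1.0 - belief_left) < threshold:
            flagged.append(step)
    return flagged
```

For example, starting from a uniform belief, one growl heard on the left moves the belief to 0.85, and a trace step that opens the right door at belief 0.6 would be flagged under a threshold of 0.9.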

formula, template, unexpected decision, (15 more...)

arXiv.org Artificial Intelligence

2012.12732

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > Canada > Quebec > Montreal (0.04)
Europe > Italy (0.04)
(5 more...)

Genre:

Research Report (0.84)
Workflow (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
(2 more...)

QVMix and QVMix-Max: Extending the Deep Quality-Value Family of Algorithms to Cooperative Multi-Agent Reinforcement Learning

Leroy, Pascal, Ernst, Damien, Geurts, Pierre, Louppe, Gilles, Pisane, Jonathan, Sabatelli, Matthia

arXiv.org Artificial Intelligence, Dec-22-2020

This paper introduces four new algorithms that can be used for tackling multi-agent reinforcement learning (MARL) problems occurring in cooperative settings. All algorithms are based on the Deep Quality-Value (DQV) family of algorithms, a set of techniques that have proven to be successful when dealing with single-agent reinforcement learning problems (SARL). The key idea of DQV algorithms is to jointly learn an approximation of the state-value function $V$, alongside an approximation of the state-action value function $Q$. We follow this principle and generalise these algorithms by introducing two fully decentralised MARL algorithms (IQV and IQV-Max) and two algorithms that are based on the centralised training with decentralised execution training paradigm (QVMix and QVMix-Max). We compare our algorithms with state-of-the-art MARL techniques on the popular StarCraft Multi-Agent Challenge (SMAC) environment. We show competitive results when QVMix and QVMix-Max are compared to well-known MARL techniques such as QMIX and MAVEN and show that QVMix can even outperform them on some of the tested environments, being the algorithm which performs best overall. We hypothesise that this is due to the fact that QVMix suffers less from the overestimation bias of the $Q$ function.
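The key idea of jointly learning $V$ and $Q$ can be sketched in tabular form. This is a single-agent illustration under our reading of DQV, in which both estimates are regressed toward the same target r + gamma * V(s'); the multi-agent variants in the paper use neural networks and, for QVMix, a mixing network, none of which is shown. The function name and the dictionary-based tables are hypothetical.

```python
from collections import defaultdict


def dqv_update(V, Q, s, a, r, s_next, done, alpha=0.1, gamma=0.99):
    """One tabular DQV-style update; returns the shared TD target.

    V maps state -> value, Q maps (state, action) -> value.
    Both tables move toward the same target, which bootstraps from V only.
    """
    target = r if done else r + gamma * V[s_next]
    V[s] += alpha * (target - V[s])
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return target


# Usage: tables default to zero for unseen entries.
V = defaultdict(float)
Q = defaultdict(float)
dqv_update(V, Q, s="s0", a="left", r=1.0, s_next="s1", done=False)
```

Because the target never takes a maximum over $Q$, this form is one plausible way to see why such methods might suffer less from overestimation, in line with the hypothesis stated in the abstract.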

agent, algorithm, qvmix-max, (12 more...)

arXiv.org Artificial Intelligence

2012.12062

Country:

North America > United States > California > Los Angeles County > Long Beach (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)
Europe > Belgium > Wallonia > Liège Province > Liège (0.04)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games > Computer Games (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Learning to Play Imperfect-Information Games by Imitating an Oracle Planner

Boney, Rinu, Ilin, Alexander, Kannala, Juho, Seppänen, Jarno

arXiv.org Artificial Intelligence, Dec-22-2020

We consider learning to play multiplayer imperfect-information games with simultaneous moves and large state-action spaces. Previous attempts to tackle such challenging games have largely focused on model-free learning methods, often requiring hundreds of years of experience to produce competitive agents. Our approach is based on model-based planning. We tackle the problem of partial observability by first building an (oracle) planner that has access to the full state of the environment and then distilling the knowledge of the oracle to a (follower) agent which is trained to play the imperfect-information game by imitating the oracle's choices. We experimentally show that planning with naive Monte Carlo tree search does not perform very well in large combinatorial action spaces. We therefore propose planning with a fixed-depth tree search and decoupled Thompson sampling for action selection. We show that the planner is able to discover efficient playing strategies in the games of Clash Royale and Pommerman and the follower policy successfully learns to implement them by training on a few hundred battles.
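Decoupled Thompson sampling can be sketched as each player keeping its own posterior per action and sampling independently of the others. The sketch below assumes binary (win/loss) rewards with Beta posteriors, which is our simplification; the abstract does not specify the reward model, and the class and function names are hypothetical.

```python
import random


class BetaArm:
    """Beta(1, 1) prior over the success probability of one action."""

    def __init__(self):
        self.wins = 1.0
        self.losses = 1.0

    def sample(self, rng):
        # Draw one plausible success probability from the posterior.
        return rng.betavariate(self.wins, self.losses)

    def update(self, reward):
        # reward is assumed to be 0 or 1.
        if reward:
            self.wins += 1.0
        else:
            self.losses += 1.0


def select_joint_action(arms_per_player, rng):
    """Each player independently picks the action with the highest sample."""
    joint = []
    for arms in arms_per_player:
        samples = [arm.sample(rng) for arm in arms]
        joint.append(samples.index(max(samples)))
    return joint
```

The decoupling keeps the number of posteriors linear in the number of players and actions, instead of one posterior per joint action, which is what makes the approach attractive for large combinatorial action spaces.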

agent, algorithm, clash royale, (14 more...)

arXiv.org Artificial Intelligence

2012.12186

Country: Europe > Finland > Uusimaa > Helsinki (0.04)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Neural Methods for Effective, Efficient, and Exposure-Aware Information Retrieval

Mitra, Bhaskar

arXiv.org Artificial Intelligence, Dec-21-2020

Neural networks with deep architectures have demonstrated significant performance improvements in computer vision, speech recognition, and natural language processing. The challenges in information retrieval (IR), however, are different from these other application areas. A common form of IR involves ranking of documents -- or short passages -- in response to keyword-based queries. Effective IR systems must deal with the query-document vocabulary mismatch problem by modeling relationships between different query and document terms and how they indicate relevance. Models should also consider lexical matches when the query contains rare terms -- such as a person's name or a product model number -- not seen during training, so as to avoid retrieving semantically related but irrelevant results. In many real-life IR tasks, the retrieval involves extremely large collections -- such as the document index of a commercial Web search engine -- containing billions of documents. Efficient IR methods should take advantage of specialized IR data structures, such as the inverted index, to efficiently retrieve from large collections. Given an information need, the IR system also mediates how much exposure an information artifact receives by deciding whether it should be displayed, and where it should be positioned, among other results. Exposure-aware IR systems may optimize for additional objectives, besides relevance, such as parity of exposure for retrieved items and content publishers. In this thesis, we present novel neural architectures and methods motivated by the specific needs and challenges of IR tasks.
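The inverted index mentioned above maps each term to the set of documents containing it, so that retrieval touches only the postings of the query terms rather than the whole collection. A minimal sketch, assuming whitespace tokenization and conjunctive (all-terms) matching, neither of which comes from the thesis:

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def retrieve(index, query):
    """Return ids of documents that contain every query term."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    if not postings:
        return set()
    return set.intersection(*postings)


docs = {
    1: "neural ranking models",
    2: "inverted index ranking",
    3: "neural index",
}
index = build_inverted_index(docs)
```

Exact lexical matching of this kind also covers the rare-term case described in the abstract: a name or model number never seen in training still matches its postings list.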

deep learning, exposure-aware information retrieval, us government, (24 more...)

arXiv.org Artificial Intelligence

2012.11685

Country:

Africa (0.67)
North America > United States > Colorado (0.14)
North America > United States > New Mexico > Bernalillo County (0.14)
(10 more...)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)
Research Report > Experimental Study (0.92)

Industry:

Media > Music (1.00)
Leisure & Entertainment > Sports > Football (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
(8 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)

Scalable Deep Reinforcement Learning for Routing and Spectrum Access in Physical Layer

Cui, Wei, Yu, Wei

arXiv.org Artificial Intelligence, Dec-21-2020

This paper proposes a novel and scalable reinforcement learning approach for simultaneous routing and spectrum access in wireless ad-hoc networks. In most previous works on reinforcement learning for network optimization, routing and spectrum access are tackled as separate tasks; further, the wireless links in the network are assumed to be fixed, and a different agent is trained for each transmission node -- this limits scalability and generalizability. In this paper, we account for the inherent signal-to-interference-plus-noise ratio (SINR) in the physical layer and propose a more scalable approach in which a single agent is associated with each flow. Specifically, a single agent makes all routing and spectrum access decisions as it moves along the frontier nodes of each flow. The agent is trained according to the physical layer characteristics of the environment using the future bottleneck SINR as a novel reward definition. This allows a highly effective routing strategy based on the geographic locations of the nodes in the wireless ad-hoc network. The proposed deep reinforcement learning strategy is capable of accounting for the mutual interference between the links. It learns to avoid interference by intelligently allocating spectrum slots and making routing decisions for the entire network in a scalable manner.
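The quantities behind the reward can be sketched as follows: the SINR of a link is its received signal power divided by noise plus interference, and the bottleneck of a route is taken here to be the smallest SINR among its links. Reading "bottleneck" as a minimum over links is our assumption, and the function names and the linear (non-dB) power units are illustrative.

```python
def sinr(signal_power, interference_powers, noise_power):
    """Signal-to-interference-plus-noise ratio in linear scale."""
    return signal_power / (noise_power + sum(interference_powers))


def bottleneck_sinr(link_sinrs):
    """Weakest link along a route, assumed to limit the end-to-end rate."""
    return min(link_sinrs)


# Usage: a three-hop route; the first hop suffers from two interferers.
route = [
    sinr(10.0, [1.0, 1.0], 0.5),
    sinr(5.0, [1.5], 0.5),
    sinr(7.0, [], 1.0),
]
reward = bottleneck_sinr(route)
```

A reward defined on the weakest link pushes the agent to avoid any single hop with heavy interference, rather than trading one bad hop against several good ones.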

agent, frequency band, node, (17 more...)

arXiv.org Artificial Intelligence

2012.11783

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Telecommunications (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Offline Reinforcement Learning from Images with Latent Space Models

Rafailov, Rafael, Yu, Tianhe, Rajeswaran, Aravind, Finn, Chelsea

arXiv.org Artificial Intelligence, Dec-21-2020

Offline reinforcement learning (RL) refers to the problem of learning policies from a static dataset of environment interactions. Offline RL enables extensive use and re-use of historical datasets, while also alleviating safety concerns associated with online exploration, thereby expanding the real-world applicability of RL. Most prior work in offline RL has focused on tasks with compact state representations. However, the ability to learn directly from rich observation spaces like images is critical for real-world applications such as robotics. In this work, we build on recent advances in model-based algorithms for offline RL, and extend them to high-dimensional visual observation spaces. Model-based offline RL algorithms have achieved state-of-the-art results in state-based tasks and have strong theoretical guarantees. However, they rely crucially on the ability to quantify uncertainty in the model predictions, which is particularly challenging with image observations. To overcome this challenge, we propose to learn a latent-state dynamics model, and represent the uncertainty in the latent space. Our approach is both tractable in practice and corresponds to maximizing a lower bound of the ELBO in the unknown POMDP. In experiments on a range of challenging image-based locomotion and manipulation tasks, we find that our algorithm significantly outperforms previous offline model-free RL methods as well as state-of-the-art online visual model-based RL methods. Moreover, we also find that our approach excels on an image-based drawer closing task on a real robot using a pre-existing dataset. All results including videos can be found online at https://sites.google.com/view/lompo/ .

algorithm, arxiv preprint arxiv, dataset, (11 more...)

arXiv.org Artificial Intelligence

2012.11547

Country: North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)

Spatial Monte Carlo Integration with Annealed Importance Sampling

Yasuda, Muneki, Sekimoto, Kaiji

arXiv.org Machine Learning, Dec-21-2020

Evaluating expectations on a pairwise Boltzmann machine (PBM) (or Ising model) is important for various applications, including statistical machine learning. However, in general the evaluation is computationally difficult because it involves intractable multiple summations or integrations; therefore, it requires an approximation. Monte Carlo integration (MCI) is a well-known approximation method; a more effective MCI-like approximation method was proposed recently, called spatial Monte Carlo integration (SMCI). However, the estimations obtained from SMCI (and MCI) tend to perform poorly in PBMs with low temperature owing to degradation of the sampling quality. Annealed importance sampling (AIS) is a type of importance sampling based on Markov chain Monte Carlo methods, and it can suppress performance degradation in low-temperature regions by means of importance weights. In this study, a new method is proposed to evaluate the expectations on PBMs combining AIS and SMCI. The proposed method performs efficiently in both high- and low-temperature regions, as demonstrated theoretically and numerically.
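To make the baseline concrete, the sketch below compares an exact expectation, obtained by brute-force enumeration on a tiny Ising model, with a plain Monte Carlo integration estimate from Gibbs sampling. It illustrates only ordinary MCI, not SMCI or AIS. The energy convention E(s) = -sum of J_ij s_i s_j with spins in {-1, +1}, the absence of bias terms and burn-in, and all function names are assumptions made here.

```python
import itertools
import math
import random


def energy(spins, couplings):
    """E(s) = -sum_{(i,j)} J_ij * s_i * s_j for spins in {-1, +1}."""
    return -sum(J * spins[i] * spins[j] for (i, j), J in couplings.items())


def exact_correlation(n, couplings, i, j, temperature):
    """Exact E[s_i s_j] by summing over all 2^n states (small n only)."""
    numerator = 0.0
    partition = 0.0
    for spins in itertools.product((-1, 1), repeat=n):
        weight = math.exp(-energy(spins, couplings) / temperature)
        partition += weight
        numerator += weight * spins[i] * spins[j]
    return numerator / partition


def gibbs_correlation(n, couplings, i, j, temperature, sweeps, rng):
    """Plain MCI estimate of E[s_i s_j] from one Gibbs chain (no burn-in)."""
    neighbours = {k: [] for k in range(n)}
    for (a, b), J in couplings.items():
        neighbours[a].append((b, J))
        neighbours[b].append((a, J))
    spins = [rng.choice((-1, 1)) for _ in range(n)]
    total = 0
    for _ in range(sweeps):
        for k in range(n):
            field = sum(J * spins[m] for m, J in neighbours[k])
            # P(s_k = +1 | rest) = 1 / (1 + exp(-2 * field / T))
            p_up = 1.0 / (1.0 + math.exp(-2.0 * field / temperature))
            spins[k] = 1 if rng.random() < p_up else -1
        total += spins[i] * spins[j]
    return total / sweeps
```

At low temperature the conditional probabilities saturate and a single chain tends to stay in one mode, which is the sampling degradation the abstract refers to; this sketch would also need a numerically safer sigmoid in that regime.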

approximation, equation, smci, (14 more...)

arXiv.org Machine Learning

2012.11198

Country:

Asia > Japan (0.04)
North America > United States > Indiana > Hamilton County > Fishers (0.04)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.69)

Voronoi Progressive Widening: Efficient Online Solvers for Continuous Space MDPs and POMDPs with Provably Optimal Components

Lim, Michael H., Tomlin, Claire J., Sunberg, Zachary N.

arXiv.org Artificial Intelligence, Dec-18-2020

Markov decision processes (MDPs) and partially observable MDPs (POMDPs) can effectively represent complex real-world decision and control problems. However, continuous space MDPs and POMDPs, i.e. those having continuous state, action and observation spaces, are extremely difficult to solve, and there are few online algorithms with convergence guarantees. This paper introduces Voronoi Progressive Widening (VPW), a general technique to modify tree search algorithms to effectively handle continuous or hybrid action spaces, and proposes and evaluates three continuous space solvers: VOSS, VOWSS, and VOMCPOW. VOSS and VOWSS are theoretical tools based on sparse sampling and Voronoi optimistic optimization designed to justify VPW-based online solvers. While previous algorithms have enjoyed convergence guarantees for problems with continuous state and observation spaces, VOWSS is the first with global convergence guarantees for problems that additionally have continuous action spaces. VOMCPOW is a versatile and efficient VPW-based algorithm that consistently outperforms POMCPOW and BOMCP in several simulation experiments.
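Progressive widening, on which VPW builds, limits how many actions a search node may hold as a function of its visit count. A minimal sketch of the commonly used criterion follows; the parameter names k and alpha and their default values are assumptions, and the Voronoi-based choice of where to sample the new action is not shown.

```python
def should_widen(num_children, num_visits, k=1.0, alpha=0.5):
    """True if the node may add a new action under progressive widening.

    Criterion assumed here: |children| <= k * visits ** alpha.
    """
    return num_children <= k * num_visits ** alpha


def children_after(num_visits, k=1.0, alpha=0.5):
    """Count the actions a node would hold after num_visits visits."""
    children = 0
    for visits in range(num_visits):
        if should_widen(children, visits, k, alpha):
            children += 1
    return children
```

With alpha below 1 the number of actions grows sublinearly in the visit count, so the search keeps refining existing actions while still admitting new ones occasionally.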

pomdp, voss, vowss, (17 more...)

arXiv.org Artificial Intelligence

2012.1014

Country:

North America > United States > California > Alameda County > Berkeley (0.14)
North America > United States > Colorado > Boulder County > Boulder (0.14)
North America > United States > Washington > King County > Seattle (0.04)
(5 more...)

Genre: Research Report (0.40)

Industry: Transportation (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.93)

Exact Reduction of Huge Action Spaces in General Reinforcement Learning

Majeed, Sultan Javed, Hutter, Marcus

arXiv.org Machine Learning, Dec-18-2020

The reinforcement learning (RL) framework formalizes the notion of learning with interactions. Many real-world problems have large state-spaces and/or action-spaces, such as Go, StarCraft, protein folding, and robotics, or are non-Markovian, which poses significant challenges for RL algorithms. In this work we address the large action-space problem by sequentializing actions, which can reduce the action-space size significantly, even down to two actions at the expense of an increased planning horizon. We provide explicit and exact constructions and equivalence proofs for all quantities of interest for arbitrary history-based processes. In the case of MDPs, this could help RL algorithms that bootstrap. In this work we show how action-binarization in the non-MDP case can significantly improve Extreme State Aggregation (ESA) bounds. ESA allows casting any (non-MDP, non-ergodic, history-based) RL problem into a fixed-sized non-Markovian state-space with the help of a surrogate Markovian process. On the upside, ESA enjoys optimality guarantees similar to those of Markovian models. But a downside is that the size of the aggregated state-space becomes exponential in the size of the action-space. In this work, we patch this issue by binarizing the action-space. We provide an upper bound on the number of states of this binarized ESA that is logarithmic in the original action-space size, a double-exponential improvement.
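The encoding side of action sequentialization can be sketched as writing each original action index as a fixed-length sequence of binary decisions. This shows only the encoding and its inverse; the paper's constructions for history-based processes and the associated equivalence proofs are not reproduced, and the function names are hypothetical.

```python
def num_binary_steps(num_actions):
    """Binary decisions needed to pick one of num_actions actions."""
    return max(1, (num_actions - 1).bit_length())


def binarize(action, num_actions):
    """Encode an action index as a most-significant-bit-first list of 0/1."""
    steps = num_binary_steps(num_actions)
    return [(action >> (steps - 1 - d)) & 1 for d in range(steps)]


def debinarize(decisions):
    """Decode a list of 0/1 decisions back into the action index."""
    action = 0
    for bit in decisions:
        action = (action << 1) | bit
    return action
```

An agent choosing among 1024 actions would then make 10 two-way choices per original step, which is the trade-off named in the abstract: two actions per decision at the cost of a planning horizon that is longer by a logarithmic factor.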

abstraction, agent, history, (16 more...)

arXiv.org Machine Learning

2012.102

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games > Computer Games (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)
