AITopics | Undirected Networks

Undirected Networks

Probabilistic Planning with Reduced Models

Journal of Artificial Intelligence Research, Jul-9-2019

Reduced models are simplified versions of a given domain, designed to accelerate the planning process. Interest in reduced models has grown since the surprising success of determinization in the first international probabilistic planning competition, leading to the development of several enhanced determinization techniques. To address the drawbacks of previous determinization methods, we introduce a family of reduced models in which probabilistic outcomes are classified as one of two types: primary and exceptional. In each model that belongs to this family of reductions, primary outcomes can occur an unbounded number of times per trajectory, while exceptions can occur at most a finite number of times, specified by a parameter. Distinct reduced models are characterized by two parameters: the maximum number of primary outcomes per action, and the maximum number of occurrences of exceptions per trajectory. This family of reductions generalizes the well-known most-likely-outcome determinization approach, which includes one primary outcome per action and zero exceptional outcomes per plan. We present a framework to determine the benefits of planning with reduced models, and develop a continual planning approach that handles situations where the number of exceptions exceeds the specified bound during plan execution. Using this framework, we compare the performance of various reduced models and consider the challenge of generating good ones automatically. We show that each one of the dimensions (allowing more than one primary outcome, or planning for some limited number of exceptions) could improve performance relative to standard determinization. The results place previous work on determinization in a broader context and lay the foundation for a systematic exploration of the space of model reductions.
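
As a rough illustration of the "maximum number of primary outcomes per action" parameter described in the abstract above, the Python sketch below keeps only the most likely outcomes of an action. Ranking outcomes by probability and renormalizing the kept ones are assumptions made here for illustration; the abstract prescribes this choice only for the most-likely-outcome case (one primary outcome). The bound on exceptions per trajectory is not modeled, and the function name `reduce_outcomes` and the example action are hypothetical.

```python
from typing import List, Tuple

Outcome = Tuple[str, float]  # (next_state, probability)


def reduce_outcomes(outcomes: List[Outcome], max_primary: int) -> List[Outcome]:
    """Keep the `max_primary` most likely outcomes and renormalize them.

    Assumption: primary outcomes are chosen by probability. Other
    selection rules are possible and are not covered by this sketch.
    """
    if max_primary < 1:
        raise ValueError("max_primary must be at least 1")
    ranked = sorted(outcomes, key=lambda o: o[1], reverse=True)
    primary = ranked[:max_primary]
    total = sum(p for _, p in primary)
    return [(state, p / total) for state, p in primary]


# Hypothetical action with three probabilistic outcomes.
move = [("goal", 0.7), ("slip", 0.2), ("crash", 0.1)]

# One primary outcome: most-likely-outcome determinization.
determinized = reduce_outcomes(move, max_primary=1)

# Two primary outcomes: a less aggressive reduction.
two_primary = reduce_outcomes(move, max_primary=2)
```

With `max_primary=1` the action becomes deterministic; larger values retain more of the original uncertainty at the cost of a larger planning problem.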

determinization, primary outcome, proceedings, (14 more...)

doi: 10.1613/jair.1.11569

AI Access Foundation

11569

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Rhode Island > Providence County > Providence (0.04)
(22 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
(3 more...)

Variance-Based Risk Estimations in Markov Processes via Transformation with State Lumping

Ma, Shuai, Yu, Jia Yuan

arXiv.org Artificial Intelligence, Jul-9-2019

Variance plays a crucial role in risk-sensitive reinforcement learning, and most risk measures can be analyzed via variance. In this paper, we consider two law-invariant risks as examples: mean-variance risk and exponential utility risk. With the aid of the state-augmentation transformation (SAT), we show that the two risks can be estimated in Markov decision processes (MDPs) with a stochastic transition-based reward and a randomized policy. To mitigate the enlarged state space, a novel definition of isotopic states is proposed for state lumping, considering the special structure of the transformed transition probability. In the numerical experiment, we illustrate state lumping in the SAT, errors from a naive reward simplification, and the validity of the SAT for the two risk estimations.
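
To make the two risk measures named above concrete, here is a minimal Python sketch that evaluates them on a small discrete return distribution. The definitions used (mean minus a weighted variance, and the certainty equivalent of an exponential utility with risk-aversion `beta`) are common textbook forms and are an assumption here; the paper may use different sign or scaling conventions. The function names and the example distribution are hypothetical, and the sketch does not implement the SAT or state lumping.

```python
import math
from typing import Sequence, Tuple


def mean_and_variance(returns: Sequence[float], probs: Sequence[float]) -> Tuple[float, float]:
    """Mean and variance of a discrete return distribution."""
    mean = sum(p * r for r, p in zip(returns, probs))
    var = sum(p * (r - mean) ** 2 for r, p in zip(returns, probs))
    return mean, var


def mean_variance_objective(returns, probs, risk_weight: float) -> float:
    """Assumed form: E[R] - risk_weight * Var[R]."""
    mean, var = mean_and_variance(returns, probs)
    return mean - risk_weight * var


def exponential_utility_ce(returns, probs, beta: float) -> float:
    """Assumed form: -(1/beta) * log E[exp(-beta * R)], with beta > 0."""
    expectation = sum(p * math.exp(-beta * r) for r, p in zip(returns, probs))
    return -math.log(expectation) / beta


# Hypothetical two-point return distribution.
returns = [0.0, 10.0]
probs = [0.5, 0.5]

mean, var = mean_and_variance(returns, probs)
mv = mean_variance_objective(returns, probs, risk_weight=0.1)
ce = exponential_utility_ce(returns, probs, beta=0.01)
```

For small `beta`, a second-order expansion gives a certainty equivalent of roughly `mean - (beta / 2) * var`, which is one way to see how an exponential utility risk can be related to variance.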

artificial intelligence, machine learning, markov process, (16 more...)

1907.05231

Country: North America (0.28)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Partially Observable Planning and Learning for Systems with Non-Uniform Dynamics

Collins, Nicholas, Kurniawati, Hanna

arXiv.org Artificial Intelligence, Jul-9-2019

We propose a neural network architecture, called TransNet, that combines planning and model learning for solving Partially Observable Markov Decision Processes (POMDPs) with non-uniform system dynamics. The past decade has seen substantial advances in solving POMDP problems. However, constructing a suitable POMDP model remains difficult. Recently, neural network architectures have been proposed to alleviate the difficulty in acquiring such models. Although the results are promising, existing architectures restrict the type of system dynamics that can be learned: the system dynamics must be the same in all parts of the state space. TransNet relaxes this restriction. Key to this relaxation is a novel neural network module that partitions the state space into classes and then learns the system dynamics of each class. TransNet uses this module together with the overall architecture of QMDP-Net [1] to solve POMDPs that have more expressive dynamics models while remaining data-efficient. Its evaluation on typical benchmarks in robot navigation with initially unknown system and environment models indicates that TransNet substantially outperforms the state-of-the-art method QMDP-Net in both the quality of the generated policies and learning efficiency.

artificial intelligence, machine learning, transnet, (18 more...)

1907.04457

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

A Scheme for Dynamic Risk-Sensitive Sequential Decision Making

Ma, Shuai, Yu, Jia Yuan, Satir, Ahmet

arXiv.org Artificial Intelligence, Jul-9-2019

We present a scheme for sequential decision making with a risk-sensitive objective and constraints in a dynamic environment. A neural network is trained as an approximator of the mapping from parameter space to the space of risk and policy with risk-sensitive constraints. For a given risk-sensitive problem, in which the objective and constraints are, or can be estimated by, functions of the mean and variance of return, we generate a synthetic dataset as training data. Parameters defining a targeted process might be dynamic, i.e., they might vary over time, so we sample them within specified intervals to deal with these dynamics. We show that (i) most risk measures can be estimated using return variance; (ii) by virtue of the state-augmentation transformation, practical problems modeled by Markov decision processes with stochastic rewards can be solved in a risk-sensitive scenario; and (iii) the proposed scheme is validated by a numerical experiment.

constraint, machine learning, reinforcement learning, (17 more...)

1907.04269

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.70)

Expressive power of tensor-network factorizations for probabilistic modeling, with applications from hidden Markov models to quantum machine learning

Glasser, Ivan, Sweke, Ryan, Pancotti, Nicola, Eisert, Jens, Cirac, J. Ignacio

arXiv.org Machine Learning, Jul-8-2019

Tensor-network techniques have enjoyed outstanding success in physics, and have recently attracted attention in machine learning, both as a tool for the formulation of new learning algorithms and for enhancing the mathematical understanding of existing methods. Inspired by these developments, and the natural correspondence between tensor networks and probabilistic graphical models, we provide a rigorous analysis of the expressive power of various tensor-network factorizations of discrete multivariate probability distributions. These factorizations include non-negative tensor-trains/MPS, which are in correspondence with hidden Markov models, and Born machines, which are naturally related to local quantum circuits. When used to model probability distributions, they exhibit tractable likelihoods and admit efficient learning algorithms. Interestingly, we prove that there exist probability distributions for which there are unbounded separations between the resource requirements of some of these tensor-network factorizations. Particularly surprising is the fact that using complex instead of real tensors can lead to an arbitrarily large reduction in the number of parameters of the network. Additionally, we introduce locally purified states (LPS), a new factorization inspired by techniques for the simulation of quantum systems, with provably better expressive power than all other representations considered. The ramifications of this result are explored through numerical experiments. Our findings imply that LPS should be considered over hidden Markov models, and furthermore provide guidelines for the design of local quantum circuits for probabilistic modeling.

artificial intelligence, machine learning, tensor, (18 more...)

1907.03741

Country:

North America > United States > New York > New York County > New York City (0.14)
North America > United States > New York > Richmond County > New York City (0.04)
North America > United States > New York > Queens County > New York City (0.04)
(13 more...)

Genre: Research Report > New Finding (0.66)

Industry: Health & Medicine (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Routine Modeling with Time Series Metric Learning

Compagnon, Paul, Lefebvre, Grégoire, Duffner, Stefan, Garcia, Christophe

arXiv.org Artificial Intelligence, Jul-8-2019

Traditionally, the automatic recognition of human activities is performed with supervised learning algorithms on limited sets of specific activities. This work proposes to recognize recurrent activity patterns, called routines, instead of precisely defined activities. The modeling of routines is defined as a metric learning problem, and an architecture, called SS2S, based on sequence-to-sequence models is proposed to learn a distance between time series. This approach relies only on inertial data and is thus non-intrusive and privacy-preserving. Experimental results show that a clustering algorithm provided with the learned distance is able to recover daily routines.

artificial intelligence, machine learning, sequence, (15 more...)

1907.04666

Country: Europe > France (0.28)

Genre: Research Report > New Finding (0.66)

Industry:

Health & Medicine (0.46)
Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Variational Inference MPC for Bayesian Model-based Reinforcement Learning

Okada, Masashi, Taniguchi, Tadahiro

arXiv.org Machine Learning, Jul-7-2019

In recent studies on model-based reinforcement learning (MBRL), incorporating uncertainty in forward dynamics is a state-of-the-art strategy to enhance learning performance, making MBRL competitive with cutting-edge model-free methods, especially in simulated robotics tasks. Probabilistic ensembles with trajectory sampling (PETS) is a leading type of MBRL, which applies Bayesian inference to dynamics modeling and uses model predictive control (MPC) with stochastic optimization via the cross entropy method (CEM). In this paper, we propose a novel extension to uncertainty-aware MBRL. Our main contributions are twofold. First, we introduce a variational inference MPC, which reformulates various stochastic methods, including CEM, in a Bayesian fashion. Second, we propose a novel instance of the framework, called probabilistic action ensembles with trajectory sampling (PaETS). As a result, our Bayesian MBRL can capture multimodal uncertainties both in dynamics and in optimal trajectories. In comparison to PETS, our method consistently improves asymptotic performance on several challenging locomotion tasks.
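
For readers unfamiliar with the cross entropy method (CEM) mentioned above, the following is a minimal, generic Python sketch of CEM maximizing a one-dimensional objective with a Gaussian sampling distribution. It is not PETS or PaETS: in an MPC setting the sampled quantity would be an action sequence scored by model rollouts, which is omitted here. The function name, the hyperparameter defaults, and the toy objective are all assumptions made for illustration.

```python
import random
import statistics
from typing import Callable


def cross_entropy_method(
    objective: Callable[[float], float],
    mean: float,
    std: float,
    iterations: int = 30,
    population: int = 100,
    elite_count: int = 10,
    seed: int = 0,
) -> float:
    """Maximize `objective` by repeatedly refitting a Gaussian to elite samples."""
    rng = random.Random(seed)
    for _ in range(iterations):
        # Sample candidates from the current Gaussian.
        samples = [rng.gauss(mean, std) for _ in range(population)]
        # Keep the highest-scoring candidates (the elites).
        elites = sorted(samples, key=objective, reverse=True)[:elite_count]
        # Refit the Gaussian to the elites; the small constant keeps std positive.
        mean = statistics.mean(elites)
        std = statistics.pstdev(elites) + 1e-6
    return mean


# Toy objective with its maximum at a = 2.0 (hypothetical example).
best = cross_entropy_method(lambda a: -(a - 2.0) ** 2, mean=0.0, std=5.0)
```

Because a single Gaussian is refit at every iteration, this basic form of CEM tends to collapse onto one mode of the objective.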

downstream oil & gas, neural network, trajectory, (19 more...)

1907.04202

Genre:

Research Report > New Finding (0.94)
Research Report > Promising Solution (0.66)

Industry: Energy > Oil & Gas > Downstream (0.85)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
(2 more...)

Learning Neural Sequence-to-Sequence Models from Weak Feedback with Bipolar Ramp Loss

Jehl, Laura, Lawrence, Carolin, Riezler, Stefan

arXiv.org Machine Learning, Jul-6-2019

In many machine learning scenarios, supervision by gold labels is not available and consequently neural models cannot be trained directly by maximum likelihood estimation (MLE). In a weak supervision scenario, metric-augmented objectives can be employed to assign feedback to model outputs, which can be used to extract a supervision signal for training. We present several objectives for two separate weakly supervised tasks, machine translation and semantic parsing. We show that objectives should actively discourage negative outputs in addition to promoting a surrogate gold structure. This notion of bipolarity is naturally present in ramp loss objectives, which we adapt to neural models. We show that bipolar ramp loss objectives outperform other non-bipolar ramp loss objectives and minimum risk training (MRT) on both weakly supervised tasks, as well as on a supervised machine translation task. Additionally, we introduce a novel token-level ramp loss objective, which is able to outperform even the best sequence-level ramp loss on both weakly supervised tasks.

artificial intelligence, machine learning, natural language, (15 more...)

1907.03748

Country:

North America > United States > Texas > Travis County > Austin (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.05)
Oceania > Australia > New South Wales > Sydney (0.04)
(20 more...)

Genre:

Research Report > New Finding (0.46)
Research Report > Experimental Study (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.89)
(3 more...)

Modern Deep Reinforcement Learning Algorithms

Ivanov, Sergey, D'yakonov, Alexander

arXiv.org Artificial Intelligence, Jul-6-2019

Recent advances in Reinforcement Learning, grounded in combining classical theoretical results with the Deep Learning paradigm, have led to breakthroughs in many artificial intelligence tasks and given birth to Deep Reinforcement Learning (DRL) as a field of research. In this work, the latest DRL algorithms are reviewed with a focus on their theoretical justification, practical limitations, and observed empirical properties.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

1906.10025

Genre: Research Report (0.81)

Industry: Leisure & Entertainment > Games > Computer Games (0.92)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)

A Communication-Efficient Multi-Agent Actor-Critic Algorithm for Distributed Reinforcement Learning

Lin, Yixuan, Zhang, Kaiqing, Yang, Zhuoran, Wang, Zhaoran, Başar, Tamer, Sandhu, Romeil, Liu, Ji

arXiv.org Machine Learning, Jul-5-2019

Recently, there has been increasing interest in developing distributed machine learning algorithms. Notable examples include distributed linear regression [1], multi-armed bandits [2], reinforcement learning (RL) [3], and deep learning [4]. Such algorithms have promising applications in large-scale networks, such as social platforms, online economic networks, cyber-physical systems, and the Internet of Things, primarily because in such complex networks it is impossible to collect all the information at a single point, and each component of the network may be unwilling to share its private information due to privacy concerns. Multi-agent reinforcement learning (MARL) problems have recently received increasing attention. In general, MARL problems are investigated in settings that are collaborative, competitive, or a mixture of the two. For collaborative MARL, the most rudimentary framework is the canonical multi-agent Markov decision process [5, 6], where the agents share a common reward function that is determined by the joint actions of all agents. Another notable framework for collaborative MARL is the team Markov game model, also with a shared reward function among agents [7, 8]. These two frameworks were then extended to the setting where agents are allowed to have heterogeneous reward functions [3, 9-12], collaborating with the goal of maximizing the long-term return corresponding to the team-averaged reward.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

1907.03053

Country:

North America > United States > New York > Suffolk County > Stony Brook (0.04)
North America > United States > Illinois (0.04)

Genre: Research Report (0.40)

Industry: Information Technology > Security & Privacy (0.54)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)
