AITopics | Markov Models

Collaborating Authors

Markov Models

News Overviews Instructional Materials AI-Alerts Classics

Safe Policy Improvement with Soft Baseline Bootstrapping

Nadjahi, Kimia, Laroche, Romain, Combes, Rémi Tachet des

arXiv.org Artificial IntelligenceJul-11-2019

Batch Reinforcement Learning (Batch RL) consists in training a policy using trajectories collected with another policy, called the behavioural policy. Safe policy improvement (SPI) provides guarantees with high probability that the trained policy performs better than the behavioural policy, also called baseline in this setting. Previous work shows that the SPI objective improves mean performance as compared to using the basic RL objective, which boils down to solving the MDP with maximum likelihood. Here, we build on that work and improve more precisely the SPI with Baseline Bootstrapping algorithm (SPIBB) by allowing the policy search over a wider set of policies. Instead of binarily classifying the state-action pairs into two sets (the \textit{uncertain} and the \textit{safe-to-train-on} ones), we adopt a softer strategy that controls the error in the value estimates by constraining the policy change according to the local model uncertainty. The method can take more risks on uncertain actions all the while remaining provably-safe, and is therefore less conservative than the state-of-the-art methods. We propose two algorithms (one optimal and one approximate) to solve this constrained optimization problem and empirically show a significant improvement over existing SPI algorithms both on finite MDPs and on infinite MDPs with a neural network function approximation.

algorithm, baseline, safe policy improvement, (10 more...)

arXiv.org Artificial Intelligence

1907.05079

Country:

North America > United States > New York (0.04)
North America > Canada > Quebec > Montreal (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
(2 more...)

Genre: Research Report > Promising Solution (0.34)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.62)

Add feedback

Adaptive Thompson Sampling Stacks for Memory Bounded Open-Loop Planning

Phan, Thomy, Gabor, Thomas, Müller, Robert, Roch, Christoph, Linnhoff-Popien, Claudia

arXiv.org Artificial IntelligenceJul-11-2019

We propose Stable Yet Memory Bounded Open-Loop (SYMBOL) planning, a general memory bounded approach to partially observable open-loop planning. SYMBOL maintains an adaptive stack of Thompson Sampling bandits, whose size is bounded by the planning horizon and can be automatically adapted according to the underlying domain without any prior domain knowledge beyond a generative model. We empirically test SYMBOL in four large POMDP benchmark problems to demonstrate its effectiveness and robustness w.r.t. the choice of hyperparameters and evaluate its adaptive memory consumption. We also compare its performance with other open-loop planning algorithms and POMCP.

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Artificial Intelligence

1907.05861

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Mining (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.70)

Add feedback

Markov Decision Process for MOOC users behavioral inference

Jarboui, Firas, Gruson-daniel, Célya, Durmus, Alain, Rocchisani, Vincent, Ebongue, Sophie-helene Goulet, Depoux, Anneliese, Kirschenmann, Wilfried, Perchet, Vianney

arXiv.org Machine LearningJul-10-2019

Studies on massive open online courses (MOOCs) users discuss the existence of typical profiles and their impact on the learning process of the students. However defining the typical behaviors as well as classifying the users accordingly is a difficult task. In this paper we suggest two methods to model MOOC users behaviour given their log data. We mold their behavior into a Markov Decision Process framework. We associate the user's intentions with the MDP reward and argue that this allows us to classify them.

artificial intelligence, machine learning, markov decision process, (11 more...)

arXiv.org Machine Learning

1907.04723

Country: Europe > France (0.05)

Genre: Instructional Material > Online (0.93)

Industry:

Education > Educational Technology > Educational Software > Computer Based Training (1.00)
Education > Educational Setting > Online (1.00)

Technology:

Information Technology > Enterprise Applications > Human Resources > Learning Management (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Improving the Performance of the LSTM and HMM Models via Hybridization

Liu, Larkin, Lin, Yu-Chung, Reid, Joshua

arXiv.org Machine LearningJul-9-2019

Language modelling has been an integral part of providing an understanding of the nature of language to capture its meaning. In order to improve the machine understanding of language using sequential models, we seek to explore two prominent areas of statistical language models, the Hidden Markov Model (HMM), and a Recurrent Neural Network (RNN) architecture, known commonly as Long Short-Term Memory (LSTM). Under a discrete stochastic modelling framework, HMM's were first introduced in Rabiner [1] to classify speech signals. First used to automate AT&T's voice activated call center, the revolutionary technology allowed computers to robustly characterise speech, and form a basic understanding of spoken words. HMM's have since become a definitive benchmark for the state-of-the-art for speech recognition, and text recognition. Around the same period, RNN's were introduced by Rumelhart et al. [2], however, the training complexity of the model was far too high and not commensurate with the hardware capabilities at the time. In the 21st century, With the introduction of more advanced hardware for deep learning model training, came a wave of applications for the RNN for both voice, text recognition, [3], [4], [5] and machine translation [6]. In parallel, an early form of neural language model was developed in Bengio et al. [7], displaying promising results in statistical language modelling. LSTM's were the first introduced in Hochreiter and Schmidhuber [8], specifically to combat the vanishing gradient problem, which will be further addressed in Section 1.2.

artificial intelligence, deep learning, machine learning, (19 more...)

arXiv.org Machine Learning

1907.0467

Country:

North America > Canada > Ontario > Toronto (0.29)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > Middle East > Qatar > Ad-Dawhah > Doha (0.04)

Genre:

Overview (0.67)
Research Report (0.65)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.94)

Add feedback

Probabilistic Planning with Reduced Models

Pineda, Luis, Zilberstein, Shlomo

Journal of Artificial Intelligence ResearchJul-9-2019

Reduced models are simplified versions of a given domain, designed to accelerate the planning process. Interest in reduced models has grown since the surprising success of determinization in the first international probabilistic planning competition, leading to the development of several enhanced determinization techniques. To address the drawbacks of previous determinization methods, we introduce a family of reduced models in which probabilistic outcomes are classified as one of two types: primary and exceptional. In each model that belongs to this family of reductions, primary outcomes can occur an unbounded number of times per trajectory, while exceptions can occur at most a finite number of times, specified by a parameter. Distinct reduced models are characterized by two parameters: the maximum number of primary outcomes per action, and the maximum number of occurrences of exceptions per trajectory. This family of reductions generalizes the well-known most-likely-outcome determinization approach, which includes one primary outcome per action and zero exceptional outcomes per plan. We present a framework to determine the benefits of planning with reduced models, and develop a continual planning approach that handles situations where the number of exceptions exceeds the specified bound during plan execution. Using this framework, we compare the performance of various reduced models and consider the challenge of generating good ones automatically. We show that each one of the dimensions---allowing more than one primary outcome or planning for some limited number of exceptions---could improve performance relative to standard determinization. The results place previous work on determinization in a broader context and lay the foundation for a systematic exploration of the space of model reductions.

determinization, primary outcome, proceedings, (14 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.1.11569

AI Access Foundation

11569

Journal of Artificial Intelligence Research

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Rhode Island > Providence County > Providence (0.04)
(22 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
(3 more...)

Add feedback

Variance-Based Risk Estimations in Markov Processes via Transformation with State Lumping

Ma, Shuai, Yu, Jia Yuan

arXiv.org Artificial IntelligenceJul-9-2019

Variance plays a crucial role in risk-sensitive reinforcement learning, and most risk measures can be analyzed via variance. In this paper, we consider two law-invariant risks as examples: mean-variance risk and exponential utility risk. With the aid of the state-augmentation transformation (SAT), we show that, the two risks can be estimated in Markov decision processes (MDPs) with a stochastic transition-based reward and a randomized policy. To relieve the enlarged state space, a novel definition of isotopic states is proposed for state lumping, considering the special structure of the transformed transition probability. In the numerical experiment, we illustrate state lumping in the SAT, errors from a naive reward simplification, and the validity of the SAT for the two risk estimations.

artificial intelligence, machine learning, markov process, (16 more...)

arXiv.org Artificial Intelligence

1907.05231

Country: North America (0.28)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Partially Observable Planning and Learning for Systems with Non-Uniform Dynamics

Collins, Nicholas, Kurniawati, Hanna

arXiv.org Artificial IntelligenceJul-9-2019

We propose a neural network architecture, called TransNet, that combines planning and model learning for solving Partially Observable Markov Decision Processes (POMDPs) with non-uniform system dynamics. The past decade has seen a substantial advancement in solving POMDP problems. However, constructing a suitable POMDP model remains difficult. Recently, neural network architectures have been proposed to alleviate the difficulty in acquiring such models. Although the results are promising, existing architectures restrict the type of system dynamics that can be learned --that is, system dynamics must be the same in all parts of the state space. TransNet relaxes such a restriction. Key to this relaxation is a novel neural network module that classifies the state space into classes and then learns the system dynamics of the different classes. TransNet uses this module together with the overall architecture of QMDP-Net[1] to allow solving POMDPs that have more expressive dynamic models, while maintaining efficient data requirement. Its evaluation on typical benchmarks in robot navigation with initially unknown system and environment models indicates that TransNet substantially out-performs the quality of the generated policies and learning efficiency of the state-of-the-art method QMDP-Net.

artificial intelligence, machine learning, transnet, (18 more...)

arXiv.org Artificial Intelligence

1907.04457

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

A Scheme for Dynamic Risk-Sensitive Sequential Decision Making

Ma, Shuai, Yu, Jia Yuan, Satir, Ahmet

arXiv.org Artificial IntelligenceJul-9-2019

We present a scheme for sequential decision making with a risk-sensitive objective and constraints in a dynamic environment. A neural network is trained as an approximator of the mapping from parameter space to space of risk and policy with risk-sensitive constraints. For a given risk-sensitive problem, in which the objective and constraints are, or can be estimated by, functions of the mean and variance of return, we generate a synthetic dataset as training data. Parameters defining a targeted process might be dynamic, i.e., they might vary over time, so we sample them within specified intervals to deal with these dynamics. We show that: i). Most risk measures can be estimated using return variance; ii). By virtue of the state-augmentation transformation, practical problems modeled by Markov decision processes with stochastic rewards can be solved in a risk-sensitive scenario; and iii). The proposed scheme is validated by a numerical experiment.

constraint, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

1907.04269

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.70)

Add feedback

Expressive power of tensor-network factorizations for probabilistic modeling, with applications from hidden Markov models to quantum machine learning

Glasser, Ivan, Sweke, Ryan, Pancotti, Nicola, Eisert, Jens, Cirac, J. Ignacio

arXiv.org Machine LearningJul-8-2019

Tensor-network techniques have enjoyed outstanding success in physics, and have recently attracted attention in machine learning, both as a tool for the formulation of new learning algorithms and for enhancing the mathematical understanding of existing methods. Inspired by these developments, and the natural correspondence between tensor networks and probabilistic graphical models, we provide a rigorous analysis of the expressive power of various tensor-network factorizations of discrete multivariate probability distributions. These factorizations include non-negative tensor-trains/MPS, which are in correspondence with hidden Markov models, and Born machines, which are naturally related to local quantum circuits. When used to model probability distributions, they exhibit tractable likelihoods and admit efficient learning algorithms. Interestingly, we prove that there exist probability distributions for which there are unbounded separations between the resource requirements of some of these tensor-network factorizations. Particularly surprising is the fact that using complex instead of real tensors can lead to an arbitrarily large reduction in the number of parameters of the network. Additionally, we introduce locally purified states (LPS), a new factorization inspired by techniques for the simulation of quantum systems, with provably better expressive power than all other representations considered. The ramifications of this result are explored through numerical experiments. Our findings imply that LPS should be considered over hidden Markov models, and furthermore provide guidelines for the design of local quantum circuits for probabilistic modeling.

artificial intelligence, machine learning, tensor, (18 more...)

arXiv.org Machine Learning

1907.03741

Country:

North America > United States > New York > New York County > New York City (0.14)
North America > United States > New York > Richmond County > New York City (0.04)
North America > United States > New York > Queens County > New York City (0.04)
(13 more...)

Genre: Research Report > New Finding (0.66)

Industry: Health & Medicine (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Routine Modeling with Time Series Metric Learning

Compagnon, Paul, Lefebvre, Grégoire, Duffner, Stefan, Garcia, Christophe

arXiv.org Artificial IntelligenceJul-8-2019

Traditionally, the automatic recognition of human activities is performed with supervised learning algorithms on limited sets of specific activities. This work proposes to recognize recurrent activity patterns, called routines, instead of precisely defined activities. The modeling of routines is defined as a metric learning problem, and an architecture, called SS2S, based on sequence-to-sequence models is proposed to learn a distance between time series. This approach only relies on inertial data and is thus non intrusive and preserves privacy. Experimental results show that a clustering algorithm provided with the learned distance is able to recover daily routines.

artificial intelligence, machine learning, sequence, (15 more...)

arXiv.org Artificial Intelligence

1907.04666

Country: Europe > France (0.28)

Genre: Research Report > New Finding (0.66)

Industry:

Health & Medicine (0.46)
Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback