AITopics | Markov Models

Stochastic Approximation with Markov Noise: Analysis and applications in reinforcement learning

arXiv.org Machine Learning | Apr-7-2020

We present for the first time an asymptotic convergence analysis of two time-scale stochastic approximation driven by "controlled" Markov noise. In particular, the faster and slower recursions have non-additive controlled Markov noise components in addition to martingale difference noise. We analyze the asymptotic behavior of our framework by relating it to limiting differential inclusions in both time scales that are defined in terms of the ergodic occupation measures associated with the controlled Markov processes. Using a special case of our results, we present a solution to the off-policy convergence problem for temporal-difference learning with linear function approximation. We compile several aspects of the dynamics of stochastic approximation algorithms with Markov iterate-dependent noise when the iterates are not known to be stable beforehand. We do so by extending the lock-in probability framework to such recursions, where the lock-in probability is the probability of convergence to a specific attractor of the limiting ODE, given that the iterates are in its domain of attraction after a sufficiently large number of iterations, say n_0. We use these results to prove almost sure convergence of the iterates to the specified attractor when the iterates satisfy an "asymptotic tightness" condition. This, in turn, is shown to be useful in analyzing the tracking ability of general "adaptive" algorithms. Finally, we obtain the first informative error bounds on function approximation for the policy evaluation algorithm proposed by Basu et al. when the aim is to find the risk-sensitive cost represented using exponential utility. We show that this happens due to the absence of a difference term in the earlier bound, a term that is always present in all our bounds when the state space is large.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Machine Learning

2012.00805

Country:

North America > Canada > Alberta (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > New York (0.04)
(8 more...)

Genre: Research Report > New Finding (0.48)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.88)
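
As a rough illustration of the two time-scale idea in the "Stochastic Approximation with Markov Noise" abstract above, the following toy sketch runs a fast and a slow recursion with different step sizes. It is not the paper's setting: the noise here is additive and uncontrolled (a plain two-state Markov chain), and the functions `markov_noise` and `two_timescale`, the target value, and the step-size exponents are all assumptions chosen for illustration.

```python
import random


def markov_noise(state, rng, switch_prob=0.3):
    """Advance a two-state Markov chain on {-1, +1}; its stationary mean is 0."""
    return -state if rng.random() < switch_prob else state


def two_timescale(target=2.0, steps=100_000, seed=0):
    """Toy two time-scale recursion (illustrative assumption, not the paper's algorithm).

    The fast iterate y tracks the slow iterate x, while x drifts toward
    the point where target - y = 0.
    """
    rng = random.Random(seed)
    x, y = 0.0, 0.0            # slow and fast iterates
    z_slow, z_fast = 1, -1     # two independent Markov noise chains
    for n in range(1, steps + 1):
        a_n = 1.0 / n          # slow step size
        b_n = 1.0 / n ** 0.6   # fast step size, so a_n / b_n -> 0
        z_slow = markov_noise(z_slow, rng)
        z_fast = markov_noise(z_fast, rng)
        # Both updates read the previous (x, y) pair.
        x_new = x + a_n * (target - y + z_slow)
        y_new = y + b_n * (x - y + z_fast)
        x, y = x_new, y_new
    return x, y
```

Because the noise chain has zero stationary mean, both iterates should settle near the assumed target; the separation of step sizes is what lets the fast iterate be treated as equilibrated when reasoning about the slow one.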

Guided Dialog Policy Learning without Adversarial Learning in the Loop

Li, Ziming, Lee, Sungjin, Peng, Baolin, Li, Jinchao, Shayandeh, Shahin, Gao, Jianfeng

arXiv.org Artificial Intelligence | Apr-7-2020

Reinforcement-based training methods have emerged as the most popular choice to train an efficient and effective dialog policy. However, these methods suffer from sparse and unstable reward signals, which are usually returned from the user simulator at the end of the dialog. Besides, the reward signal is manually designed by human experts, which requires domain knowledge. A number of adversarial learning methods have been proposed to learn the reward function together with the dialog policy. However, to alternately update the dialog policy and the reward model on the fly, the algorithms to update the dialog policy are limited to policy gradient-based algorithms, such as REINFORCE and PPO. Besides, the alternating training of the dialog agent and the reward model can easily get stuck in a local optimum or result in mode collapse. In this work, we propose to decompose the previous adversarial training into two different steps. We first train the discriminator with an auxiliary dialog generator and then incorporate this trained reward model into a common reinforcement learning method to train a high-quality dialog agent. This approach is applicable to both on-policy and off-policy reinforcement learning methods. By conducting several experiments, we show that the proposed methods can achieve remarkable task success and have the potential to transfer knowledge from existing domains to a new domain.

agent, discriminator, reward function, (16 more...)

arXiv.org Artificial Intelligence

2004.03267

Country: Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

GGA-MG: Generative Genetic Algorithm for Music Generation

Farzaneh, Majid, Toroghi, Rahil Mahdian

arXiv.org Artificial Intelligence | Apr-7-2020

Music Generation (MG) is an interesting research topic that links the art of music and Artificial Intelligence (AI). The goal is to train an artificial composer to generate infinite, fresh, and pleasurable musical pieces. Music has different parts such as melody, harmony, and rhythm. In this paper, we propose a Generative Genetic Algorithm (GGA) to produce a melody automatically. The main GGA uses a Long Short-Term Memory (LSTM) recurrent neural network as the objective function, which should be trained on a spectrum of bad-to-good melodies. These melodies have to be provided by another GGA with a different objective function. Good melodies have been provided by the CAMPINs collection. We have considered the rhythm in this work, too. The experimental results clearly show that the proposed GGA method is able to generate eligible melodies with natural transitions and without rhythm errors.

algorithm, lstm network, melody, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.13140/RG.2.2.16677.24805

2004.04687

Country:

Europe > Spain > Castile and León > Salamanca Province > Salamanca (0.04)
Asia > Middle East > Iran > Tehran Province > Tehran (0.04)

Genre: Research Report (0.50)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Disentangled sticky hierarchical Dirichlet process hidden Markov model

Zhou, Ding, Gao, Yuanjun, Paninski, Liam

arXiv.org Machine Learning | Apr-6-2020

The Hierarchical Dirichlet Process Hidden Markov Model (HDP-HMM) has been used widely as a natural Bayesian nonparametric extension of the classical Hidden Markov Model for learning from sequential and time-series data. A sticky extension of the HDP-HMM has been proposed to strengthen the self-persistence probability in the HDP-HMM. However, the sticky HDP-HMM entangles the strength of the self-persistence prior and transition prior together, limiting its expressiveness. Here, we propose a more general model: the disentangled sticky HDP-HMM (DS-HDP-HMM). We develop novel Gibbs sampling algorithms for efficient inference in this model. We show that the disentangled sticky HDP-HMM outperforms the sticky HDP-HMM and HDP-HMM on both synthetic and real data, and apply the new approach to analyze neural data and segment behavioral video data.

artificial intelligence, hdp-hmm, machine learning, (13 more...)

arXiv.org Machine Learning

2004.03019

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > San Mateo County > Menlo Park (0.04)

Genre: Research Report > Experimental Study (0.46)

Industry: Health & Medicine (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
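
To make the "self-persistence" idea in the HDP-HMM abstract above concrete, here is a minimal sketch of the commonly used finite (weak-limit) approximation of the original sticky HDP-HMM transition prior, in which row j of the transition matrix is drawn from a Dirichlet with parameters alpha * beta plus an extra mass kappa on the self-transition. This illustrates the earlier sticky model only, not the disentangled model proposed in the paper; the function names and the example values are assumptions.

```python
import random


def sticky_row_concentration(beta, j, alpha, kappa):
    """Dirichlet parameters for row j: alpha * beta, plus kappa on the self-transition."""
    return [alpha * b + (kappa if k == j else 0.0) for k, b in enumerate(beta)]


def sample_transition_row(beta, j, alpha, kappa, rng):
    """Draw one transition row from the Dirichlet via normalized gamma variates."""
    conc = sticky_row_concentration(beta, j, alpha, kappa)
    draws = [rng.gammavariate(c, 1.0) for c in conc]
    total = sum(draws)
    return [d / total for d in draws]


def expected_self_transition(beta, j, alpha, kappa):
    """Prior mean of the self-transition probability for state j."""
    conc = sticky_row_concentration(beta, j, alpha, kappa)
    return conc[j] / sum(conc)


# Hypothetical example: four states with uniform global weights.
beta = [0.25, 0.25, 0.25, 0.25]
no_stickiness = expected_self_transition(beta, 0, alpha=4.0, kappa=0.0)
with_stickiness = expected_self_transition(beta, 0, alpha=4.0, kappa=12.0)
row = sample_transition_row(beta, 0, alpha=4.0, kappa=12.0, rng=random.Random(0))
```

In this parameterization, raising kappa increases both the expected self-transition probability and the total concentration alpha + kappa of the row, which is one way to read the abstract's remark that the sticky model ties the self-persistence prior to the transition prior.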

On Quantum Computing and Artificial Intelligence

#artificialintelligence | Apr-4-2020, 07:26:34 GMT

Mixing quantum computing and Artificial Intelligence (AI) may sound like a new buzzword. However, since quantum computing advances are hinting at profound changes in the very notions of computation, it is natural to reexamine various branches of computer science in the light of these disruptions. As usual, before entering the quantum realm, it is important to get an overview of the classical world. Artificial Intelligence is difficult to define. Probably because intelligence, by itself, is difficult to define.

algorithm, neural network, neuron, (13 more...)

#artificialintelligence

Country: Asia > Middle East > Israel (0.04)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

Planning in Stochastic Environments with Goal Uncertainty

Saisubramanian, Sandhya, Wray, Kyle Hollins, Pineda, Luis, Zilberstein, Shlomo

arXiv.org Artificial Intelligence | Apr-3-2020

We present the Goal Uncertain Stochastic Shortest Path (GUSSP) problem -- a general framework to model path planning and decision making in stochastic environments with goal uncertainty. The framework extends the stochastic shortest path (SSP) model to dynamic environments in which it is impossible to determine the exact goal states ahead of plan execution. GUSSPs introduce flexibility in goal specification by allowing a belief over possible goal configurations. The unique observations at potential goals help the agent identify the true goal during plan execution. The partial observability is restricted to goals, facilitating the reduction to an SSP with a modified state space. We formally define a GUSSP and discuss its theoretical properties. We then propose an admissible heuristic that reduces the planning time using FLARES -- a state-of-the-art probabilistic planner. We also propose a determinization approach for solving this class of problems. Finally, we present empirical results on a search and rescue mobile robot and three other problem domains in simulation.

goal state, gussp, potential goal, (15 more...)

arXiv.org Artificial Intelligence

1810.08159

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
North America > United States > California > Santa Clara County > Santa Clara (0.04)

Genre: Research Report (0.64)

Industry:

Automobiles & Trucks (0.93)
Transportation > Ground > Road (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.51)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.49)
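
The GUSSP abstract above describes a belief over possible goals that is resolved by observations at potential goals. A minimal sketch of such a belief update follows, under assumptions that are not taken from the paper: exactly one candidate is the true goal, and visiting a candidate reveals without noise whether it is that goal. The function `update_goal_belief` and the goal labels are hypothetical.

```python
def update_goal_belief(belief, visited, is_goal):
    """Return the posterior belief after observing candidate `visited`.

    belief: dict mapping candidate goal -> prior probability.
    is_goal: noise-free observation of whether `visited` is the true goal (assumption).
    """
    if is_goal:
        # The true goal is identified; all mass collapses onto it.
        return {g: (1.0 if g == visited else 0.0) for g in belief}
    # Rule out the visited candidate and renormalize the rest.
    remaining = {g: (0.0 if g == visited else p) for g, p in belief.items()}
    total = sum(remaining.values())
    if total == 0.0:
        raise ValueError("observation is inconsistent with the belief")
    return {g: p / total for g, p in remaining.items()}


# Hypothetical example with three candidate goals.
prior = {"A": 0.5, "B": 0.25, "C": 0.25}
after_a = update_goal_belief(prior, "A", is_goal=False)
after_b = update_goal_belief(after_a, "B", is_goal=True)
```

Because uncertainty only concerns the goal, the belief can be carried as part of the state, which is in the spirit of the abstract's reduction to an SSP with a modified state space.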
(3 more...)

What Skills are AI Firms Expecting From its Employees? - SignitySolutions

#artificialintelligence | Apr-2-2020, 03:06:02 GMT

Artificial Intelligence (AI), also known as machine learning, has come a long way in recent years. Instead of being a subject of discussion, it has become a reality. There has been ready integration of AI across a large number of industries. This has given rise to several AI development companies across the world. These AI consulting firms offer services to their clients and help with the integration of AI in their operations.

ai development company, ai firm expecting, knowledge, (9 more...)

#artificialintelligence

Country: Asia > India (0.06)

Industry:

Professional Services (0.51)
Information Technology > Services (0.31)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.31)

Value Driven Representation for Human-in-the-Loop Reinforcement Learning

Keramati, Ramtin, Brunskill, Emma

arXiv.org Artificial Intelligence | Apr-2-2020

Interactive adaptive systems powered by Reinforcement Learning (RL) have many potential applications, such as intelligent tutoring systems. In such systems there is typically an external human system designer who is creating, monitoring and modifying the interactive adaptive system, trying to improve its performance on the target outcomes. In this paper we focus on the algorithmic foundation of how to help the system designer choose the set of sensors or features to define the observation space used by a reinforcement learning agent. We present an algorithm, value driven representation (VDR), that can iteratively and adaptively augment the observation space of a reinforcement learning agent so that it is sufficient to capture a (near) optimal policy. To do so we introduce a new method to optimistically estimate the value of a policy using offline simulated Monte Carlo rollouts. We evaluate the performance of our approach on standard RL benchmarks with simulated humans and demonstrate significant improvement over prior baselines.

algorithm, observation space, optimal policy, (13 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3320435.3320471

2004.01223

Country:

North America > United States > California > Santa Clara County > Stanford (0.14)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > Middle East > Cyprus > Larnaka > Larnaca (0.04)
(2 more...)

Genre: Research Report (0.64)

Industry: Education > Educational Technology > Educational Software > Computer Based Training (0.54)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

Information State Embedding in Partially Observable Cooperative Multi-Agent Reinforcement Learning

Mao, Weichao, Zhang, Kaiqing, Miehling, Erik, Başar, Tamer

arXiv.org Artificial Intelligence | Apr-2-2020

Multi-agent reinforcement learning (MARL) under partial observability has long been considered challenging, primarily due to the requirement for each agent to maintain a belief over all other agents' local histories -- a domain that generally grows exponentially over time. In this work, we investigate a partially observable MARL problem in which agents are cooperative. To enable the development of tractable algorithms, we introduce the concept of an information state embedding that serves to compress agents' histories. We quantify how the compression error influences the resulting value functions for decentralized control. Furthermore, we propose three natural embeddings, based on finite-memory truncation, principal component analysis, and recurrent neural networks. The outputs of these embeddings are then used as the information state, and can be fed into any MARL algorithm. The proposed embed-then-learn pipeline opens the black box of existing MARL algorithms, allowing us to establish some theoretical guarantees (error bounds of value functions) while still achieving competitive performance with many end-to-end approaches.

agent, information, information state, (15 more...)

arXiv.org Artificial Intelligence

2004.01098

Country:

North America > United States > Illinois > Champaign County > Urbana (0.14)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Industry:

Information Technology (0.46)
Transportation (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.71)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.64)
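
Of the three embeddings named in the information state embedding abstract above, finite-memory truncation is the simplest to picture: an agent keeps only its most recent action-observation pairs instead of its full local history. The sketch below is one possible reading of that idea, not the authors' implementation; the class name `FiniteMemoryEmbedding`, the padding convention, and the example labels are assumptions.

```python
from collections import deque


class FiniteMemoryEmbedding:
    """Compress a growing local history into its last `window` (action, observation) pairs."""

    def __init__(self, window, pad=None):
        self.window = window
        # Pre-fill with padding so the embedding has a fixed length from the first step.
        self.buffer = deque([pad] * window, maxlen=window)

    def update(self, action, observation):
        """Append the newest pair; the oldest entry is dropped automatically."""
        self.buffer.append((action, observation))
        return self.state()

    def state(self):
        """Return the current embedding as a hashable, fixed-length tuple."""
        return tuple(self.buffer)


# Hypothetical example: a window of two steps.
emb = FiniteMemoryEmbedding(window=2)
first = emb.update("a0", "o1")
emb.update("a1", "o2")
third = emb.update("a2", "o3")
```

The fixed-length tuple can then serve as the input to a downstream learner in place of the full history, at the cost of discarding anything older than the window.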

Kernel autocovariance operators of stationary processes: Estimation and convergence

Mollenhauer, Mattes, Klus, Stefan, Schütte, Christof, Koltai, Péter

arXiv.org Machine Learning | Apr-2-2020

We consider autocovariance operators of a stationary stochastic process on a Polish space that is embedded into a reproducing kernel Hilbert space. We investigate how empirical estimates of these operators converge along realizations of the process under various conditions. In particular, we examine ergodic and strongly mixing processes and prove several asymptotic results as well as finite sample error bounds with a detailed analysis for the Gaussian kernel. We provide applications of our theory in terms of consistency results for kernel PCA with dependent data and the conditional mean embedding of transition probabilities. Finally, we use our approach to examine the nonparametric estimation of Markov transition operators and highlight how our theory can give a consistency analysis for a large family of spectral analysis methods including kernel-based dynamic mode decomposition.

application, autocovariance operator, operator, (15 more...)

arXiv.org Machine Learning

2004.00891

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > Virginia > Arlington County > Arlington (0.04)
(5 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.93)
(2 more...)
