AITopics | Markov Models

Markov Models

Model-Based Episodic Memory Induces Dynamic Hybrid Controls

Le, Hung, George, Thommen Karimpanal, Abdolshah, Majid, Tran, Truyen, Venkatesh, Svetha

arXiv.org Artificial Intelligence, Nov-6-2021

Episodic control enables sample efficiency in reinforcement learning by recalling past experiences from an episodic memory. We propose a new model-based episodic memory of trajectories addressing current limitations of episodic control. Our memory estimates trajectory values, guiding the agent towards good policies. Built upon the memory, we construct a complementary learning model via a dynamic hybrid control unifying model-based, episodic and habitual learning into a single architecture. Experiments demonstrate that our model allows significantly faster and better learning than other strong reinforcement learning agents across a variety of environments including stochastic and non-Markovian settings.

episodic memory, mbec, trajectory, (16 more...)

arXiv:2111.02104

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Consumer Health (0.93)
Leisure & Entertainment > Games > Computer Games (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Learning to Cooperate with Unseen Agent via Meta-Reinforcement Learning

Charakorn, Rujikorn, Manoonpong, Poramate, Dilokthanakul, Nat

arXiv.org Artificial Intelligence, Nov-5-2021

The ad hoc teamwork problem describes situations where an agent has to cooperate with previously unseen agents to achieve a common goal. To succeed in these scenarios, an agent needs a suitable cooperative skill. One could implement cooperative skills in an agent by using domain knowledge to design the agent's behavior. However, in complex domains, domain knowledge might not be available. Therefore, it is worthwhile to explore how to learn cooperative skills directly from data. In this work, we apply a meta-reinforcement learning (meta-RL) formulation to the ad hoc teamwork problem. Our empirical results show that such a method can produce robust cooperative agents in two cooperative environments with different cooperative circumstances: social compliance and language interpretation. (This is the full-paper version of the extended abstract.)

agent, meta-rl agent, training partner, (13 more...)

arXiv:2111.03431

Country: Asia > Thailand > Rayong > Rayong (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Leisure & Entertainment > Games (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

Learning for Structured Prediction

#artificialintelligence, Nov-3-2021, 12:40:50 GMT

Structured prediction is an umbrella term for supervised machine learning techniques that predict structured objects rather than scalar discrete or real values. Structured prediction models are normally trained on observed data, in which the true value is used to adjust the model parameters, as in other commonly used supervised learning techniques. Both prediction with a trained model and the training itself are frequently computationally infeasible.

natural language processing, prediction, representation, (13 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.72)

Speech Recognition Transformation

#artificialintelligence, Nov-2-2021, 15:35:08 GMT

Voice technology has reached maturity. The quality of speech recognition surpassed 95 percent accuracy in 2020. That is the same quality as normal communication between human beings. And the influence is now being felt. The modern Microsoft Windows update vigorously pushes its voice feature -- a mechanism that allows the user to dictate messages at the speed of normal speech, which is four times faster than typing.

markov model, recognition, speech recognition, (13 more...)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.99)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)

Efficient Learning of the Parameters of Non-Linear Models using Differentiable Resampling in Particle Filters

Rosato, Conor, Horridge, Paul, Schön, Thomas B., Maskell, Simon

arXiv.org Machine Learning, Nov-2-2021

It has been widely documented that the sampling and resampling steps in particle filters cannot be differentiated. The reparameterisation trick was introduced to allow the sampling step to be reformulated into a differentiable function. We extend the reparameterisation trick to include the stochastic input to resampling, thereby limiting the discontinuities in the gradient calculation after this step. Knowing the gradients of the prior and likelihood allows us to run particle Markov Chain Monte Carlo (p-MCMC) and use the No-U-Turn Sampler (NUTS) as the proposal when estimating parameters. We compare the Metropolis-adjusted Langevin algorithm (MALA), Hamiltonian Monte Carlo with different numbers of steps, and NUTS. We consider two state-space models and show that NUTS improves the mixing of the Markov chain and can produce more accurate results in less computational time.
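As a rough illustration of the reparameterisation trick mentioned in this abstract, the sketch below writes a Gaussian draw as a deterministic function of its parameters and a fixed noise value, so that derivatives with respect to the parameters exist. It covers only the sampling step for a single Gaussian variable; the paper's extension to resampling is not reproduced here. The function names (`sample_reparam`, `finite_diff`) and the parameter values are hypothetical, chosen for this example only.

```python
import random


def sample_reparam(mu, sigma, eps):
    """Gaussian draw written as a deterministic function of (mu, sigma) for fixed noise eps."""
    return mu + sigma * eps


def finite_diff(f, x, h=1e-6):
    """Central finite-difference estimate of df/dx, used here only to check the derivatives."""
    return (f(x + h) - f(x - h)) / (2 * h)


rng = random.Random(0)
eps = rng.gauss(0.0, 1.0)  # the randomness is drawn once, independently of the parameters
mu, sigma = 1.5, 0.7       # illustrative values, not taken from the paper

# With eps held fixed, the sample is differentiable in mu and sigma:
# d(sample)/d(mu) = 1 and d(sample)/d(sigma) = eps.
d_mu = finite_diff(lambda m: sample_reparam(m, sigma, eps), mu)
d_sigma = finite_diff(lambda s: sample_reparam(mu, s, eps), sigma)
print(d_mu, d_sigma, eps)
```

Because the noise is separated from the parameters, a gradient-based sampler can propagate derivatives through the draw, which is the property the abstract relies on for the sampling step.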

derivative, particle filter, proposal, (13 more...)

arXiv:2111.01409

Country:

Europe > United Kingdom > England > Merseyside > Liverpool (0.04)
Europe > Sweden > Uppsala County > Uppsala (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.55)

A Review of Dialogue Systems: From Trained Monkeys to Stochastic Parrots

Patlan, Atharv Singh, Tripathi, Shiven, Korde, Shubham

arXiv.org Artificial Intelligence, Nov-2-2021

In spoken dialogue systems, we aim to deploy artificial intelligence to build automated dialogue agents that can converse with humans. Dialogue systems are increasingly being designed to move beyond just imitating conversation and to improve from such interactions over time. In this survey, we present a broad overview of methods developed to build dialogue systems over the years. Different use cases for dialogue systems, ranging from task-based systems to open-domain chatbots, motivate and necessitate specific systems. Starting from simple rule-based systems, research has progressed towards increasingly complex architectures trained on a massive corpus of datasets, like deep learning systems. Motivated by the intuition of resembling human dialogues, progress has been made towards incorporating emotions into the natural language generator, using reinforcement learning. While we see a trend of highly marginal improvement on some metrics, we find that limited justification exists for the metrics, and evaluation practices are not uniform. To conclude, we flag these concerns and highlight possible research directions.

computational linguistic, dialogue system, proceedings, (13 more...)

arXiv:2111.01414

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.14)
Europe > Portugal > Lisbon > Lisbon (0.04)
(9 more...)

Genre: Overview (1.00)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(4 more...)

Learning to Explore by Reinforcement over High-Level Options

Juncheng, Liu, Brendan, McCane, Steven, Mills

arXiv.org Artificial Intelligence, Nov-2-2021

Autonomous 3D environment exploration is a fundamental task for various applications such as navigation. The goal of exploration is to investigate a new environment and build its occupancy map efficiently. In this paper, we propose a new method which grants an agent two intertwined options of behaviors: "look-around" and "frontier navigation". This is implemented by an option-critic architecture and trained by reinforcement learning algorithms. In each timestep, an agent produces an option and a corresponding action according to the policy. We also take advantage of macro-actions by incorporating classic path-planning techniques to increase training efficiency. We demonstrate the effectiveness of the proposed method on two publicly available 3D environment datasets and the results show our method achieves higher coverage than competing techniques with better efficiency.

agent, exploration, trajectory, (14 more...)

arXiv:2111.01364

Country: Oceania > New Zealand > South Island > Otago > Dunedin (0.04)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.57)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Settling the Horizon-Dependence of Sample Complexity in Reinforcement Learning

Li, Yuanzhi, Wang, Ruosong, Yang, Lin F.

arXiv.org Artificial Intelligence, Oct-31-2021

Reinforcement learning (RL) is one of the most important paradigms in machine learning. What makes RL different from other paradigms is that it models the long-term effects in decision-making problems. For instance, in a finite-horizon Markov decision process (MDP), which is one of the most fundamental models for RL, an agent interacts with the environment for a total of H steps and receives a sequence of H random reward values, along with stochastic state transitions, as feedback. The goal of the agent is to find a policy to maximize the expected sum of these reward values instead of any single one of them. Since decisions made at early stages could significantly impact the future, the agent must take possible future transitions into consideration when choosing the policy. On the other hand, when H = 1, RL reduces to the contextual bandits problem, in which it suffices to act myopically to achieve optimality. Due to the important role of the horizon length in RL, Jiang and Agarwal [JA18] propose to study how the sample complexity of RL depends on the horizon length. More formally, let us consider the episodic RL setting, where the horizon length is H and the underlying MDP has unknown and time-invariant transition probabilities and rewards.
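The role of the horizon H described above can be made concrete with a small planning sketch. The code below runs backward induction on a known, hand-made two-state MDP; it is not the paper's learning algorithm (which deals with unknown transitions and sample complexity), and the toy MDP, the function name `backward_induction`, and the data layout are assumptions made for this illustration.

```python
def backward_induction(P, R, H):
    """Finite-horizon planning on a known tabular MDP.

    P[s][a] is a list of (next_state, probability) pairs and R[s][a] is the
    expected immediate reward. Returns V[h][s] (optimal expected return from
    step h onward) and pi[h][s] (a greedy action at step h).
    """
    n_states = len(P)
    V = [[0.0] * n_states for _ in range(H + 1)]  # V[H][s] = 0: nothing left to collect
    pi = [[0] * n_states for _ in range(H)]
    for h in range(H - 1, -1, -1):
        for s in range(n_states):
            best_a, best_q = 0, float("-inf")
            for a in range(len(P[s])):
                q = R[s][a] + sum(p * V[h + 1][s2] for s2, p in P[s][a])
                if q > best_q:
                    best_a, best_q = a, q
            V[h][s] = best_q
            pi[h][s] = best_a
    return V, pi


# Hypothetical toy MDP: in state 0, action 0 pays 1 and stays, action 1 pays 0
# and moves to state 1; in state 1, action 0 pays 3 and stays, action 1 pays 0
# and moves back to state 0.
P = [[[(0, 1.0)], [(1, 1.0)]], [[(1, 1.0)], [(0, 1.0)]]]
R = [[1.0, 0.0], [3.0, 0.0]]

V1, pi1 = backward_induction(P, R, H=1)
V3, pi3 = backward_induction(P, R, H=3)
print(pi1[0][0], pi3[0][0], V3[0][0])
```

With H = 1 the greedy action in state 0 is the myopic one (take the immediate reward of 1), while with H = 3 the first action gives up immediate reward to reach the better state, which is the distinction between the bandit-like and long-horizon regimes drawn in the abstract.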

algorithm, probability, sample complexity, (12 more...)

arXiv:2111.00633

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.49)

Fast Global Convergence of Policy Optimization for Constrained MDPs

Liu, Tao, Zhou, Ruida, Kalathil, Dileep, Kumar, P. R., Tian, Chao

arXiv.org Artificial Intelligence, Oct-31-2021

We address the issue of safety in reinforcement learning. We pose the problem in a discounted infinite-horizon constrained Markov decision process framework. Existing results have shown that gradient-based methods are able to achieve an $\mathcal{O}(1/\sqrt{T})$ global convergence rate both for the optimality gap and the constraint violation. We exhibit a natural policy gradient-based algorithm that has a faster convergence rate $\mathcal{O}(\log(T)/T)$ for both the optimality gap and the constraint violation. When Slater's condition is satisfied and known a priori, zero constraint violation can be further guaranteed for a sufficiently large $T$ while maintaining the same convergence rate.
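For readers unfamiliar with the setting, one common way to write a discounted constrained MDP and the two quantities whose rates are quoted above is sketched here. The notation ($V_r^{\pi}$, $V_g^{\pi}$, $b$, $\rho$, $\gamma$, $\pi_T$) is an assumption for illustration; the paper's exact definitions, for example whether the quantities are averaged over iterates, may differ.

$$
\max_{\pi} \; V_r^{\pi}(\rho) \quad \text{subject to} \quad V_g^{\pi}(\rho) \ge b,
\qquad
V_{\diamond}^{\pi}(\rho) = \mathbb{E}\left[ \sum_{t=0}^{\infty} \gamma^{t} \, \diamond(s_t, a_t) \;\middle|\; s_0 \sim \rho, \, \pi \right], \quad \diamond \in \{r, g\}.
$$

$$
\underbrace{V_r^{\pi^*}(\rho) - V_r^{\pi_T}(\rho)}_{\text{optimality gap}}
\quad \text{and} \quad
\underbrace{\left[ b - V_g^{\pi_T}(\rho) \right]_{+}}_{\text{constraint violation}}
$$

In this notation, the abstract states that both quantities were previously known to decay as $\mathcal{O}(1/\sqrt{T})$ and that the proposed algorithm improves this to $\mathcal{O}(\log(T)/T)$.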

algorithm, constraint violation, convergence rate, (12 more...)

arXiv:2111.00552

Country:

North America > United States > Texas > Brazos County > College Station (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.81)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Energy > Renewable (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.76)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Intrusion Prevention through Optimal Stopping

Hammar, Kim, Stadler, Rolf

arXiv.org Artificial Intelligence, Oct-30-2021

We study automated intrusion prevention using reinforcement learning. Following a novel approach, we formulate the problem of intrusion prevention as an (optimal) multiple stopping problem. This formulation gives us insight into the structure of optimal policies, which we show to have threshold properties. For most practical cases, it is not feasible to obtain an optimal defender policy using dynamic programming. We therefore develop a reinforcement learning approach to approximate an optimal policy. Our method for learning and validating policies includes two systems: a simulation system where defender policies are incrementally learned and an emulation system where statistics are produced that drive simulation runs and where learned policies are evaluated. We show that our approach can produce effective defender policies for a practical IT infrastructure of limited size. Inspection of the learned policies confirms that they exhibit threshold properties.

infrastructure, intrusion, reinforcement, (14 more...)

arXiv:2111.00289

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
(6 more...)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.96)