AITopics | Markov Models

Collaborating Authors

Markov Models

News Overviews Instructional Materials AI-Alerts Classics

Reinforcement Learning for Learning of Dynamical Systems in Uncertain Environment: a Tutorial

arXiv.org Artificial IntelligenceMay-19-2019

In this paper, a review of model-free reinforcement learning for learning of dynamical systems in uncertain environments has discussed. For this purpose, the Markov Decision Process (MDP) will be reviewed. Furthermore, some learning algorithms such as Temporal Difference (TD) learning, Q-Learning, and Approximate Q-learning as model-free algorithms which constitute the main part of this article have been investigated, and benefits and drawbacks of each algorithm will be discussed. The discussed concepts in each section are explaining with details and examples.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

1905.07727

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Games > Chess (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.49)

Add feedback

Online Convex Optimization in Adversarial Markov Decision Processes

Rosenberg, Aviv, Mansour, Yishay

arXiv.org Machine LearningMay-19-2019

We consider online learning in episodic loopfree We propose a novel algorithm for the adversarial MDP Markov decision processes (MDPs), where model where the transition function is unknown to the the loss function can change arbitrarily between learner and the losses change arbitrarily over time. Our episodes, and the transition function is not known algorithm, UC-O-REPS, uses two important ingredients, to the learner. We show Õ(L X A T) regret the first is Online Mirror Descent (OMD) (Shalev-Shwartz, bound, where T is the number of episodes, X 2012) and the second is UCRL-2 (Auer et al., 2008). A is the state space, A is the action space, and L major challenge in this work is to handle convex performance is the length of each episode. Our online algorithm criteria, which model different ways of aggregating is implemented using entropic regularization the losses of each episode. In order to handle convex performance methodology, which allows to extend the criteria, we use the methodology of OMD, which is original adversarial MDP model to handle convex widely used for online convex optimization, and we implement performance criteria (different ways to aggregate it in the adversarial MDP setting. In order to overcome the losses of a single episode), as well as the unknown dynamics (stochastic transition function) improve previous regret bounds.

artificial intelligence, machine learning, online convex optimization, (13 more...)

arXiv.org Machine Learning

1905.07773

Country:

North America > United States (0.46)
Asia > Middle East > Israel (0.14)

Genre: Research Report (0.40)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.61)

Add feedback

Evolving Rewards to Automate Reinforcement Learning

Faust, Aleksandra, Francis, Anthony, Mehta, Dar

arXiv.org Machine LearningMay-18-2019

Many continuous control tasks have easily formulated objectives, yet using them directly as a reward in reinforcement learning (RL) leads to suboptimal policies. Therefore, many classical control tasks guide RL training using complex rewards, which require tedious hand-tuning. We automate the reward search with AutoRL, an evolutionary layer over standard RL that treats reward tuning as hyperparameter optimization and trains a population of RL agents to find a reward that maximizes the task objective. AutoRL, evaluated on four Mujoco continuous control tasks over two RL algorithms, shows improvements over baselines, with the the biggest uplift for more complex tasks. The video can be found at: \url{https://youtu.be/svdaOFfQyC8}.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Machine Learning

1905.07628

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
(4 more...)

Genre: Research Report (0.65)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Add feedback

Optimizing Sequential Medical Treatments with Auto-Encoding Heuristic Search in POMDPs

Li, Luchen, Komorowski, Matthieu, Faisal, Aldo A.

arXiv.org Artificial IntelligenceMay-17-2019

Health-related data is noisy and stochastic in implying the true physiological states of patients, limiting information contained in single-moment observations for sequential clinical decision making. We model patient-clinician interactions as partially observable Markov decision processes (POMDPs) and optimize sequential treatment based on belief states inferred from history sequence. To facilitate inference, we build a variational generative model and boost state representation with a recurrent neural network (RNN), incorporating an auxiliary loss from sequence auto-encoding. Meanwhile, we optimize a continuous policy of drug levels with an actor-critic method where policy gradients are obtained from a stablized off-policy estimate of advantage function, with the value of belief state backed up by parallel best-first suffix trees. We exploit our methodology in optimizing dosages of vasopressor and intravenous fluid for sepsis patients using a retrospective intensive care dataset and evaluate the learned policy with off-policy policy evaluation (OPPE). The results demonstrate that modelling as POMDPs yields better performance than MDPs, and that incorporating heuristic search improves sample efficiency.

artificial intelligence, auto-encoding heuristic search, machine learning, (15 more...)

arXiv.org Artificial Intelligence

1905.07465

Country:

North America > United States (1.00)
Europe (0.67)

Genre: Research Report > New Finding (0.34)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.67)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Exploration-Exploitation Trade-off in Reinforcement Learning on Online Markov Decision Processes with Global Concave Rewards

Cheung, Wang Chi

arXiv.org Machine LearningMay-15-2019

We consider an agent who is involved in a Markov decision process and receives a vector of outcomes every round. Her objective is to maximize a global concave reward function on the average vectorial outcome. The problem models applications such as multi-objective optimization, maximum entropy exploration, and constrained optimization in Markovian environments. In our general setting where a stationary policy could have multiple recurrent classes, the agent faces a subtle yet consequential trade-off in alternating among different actions for balancing the vectorial outcomes. In particular, stationary policies are in general sub-optimal. We propose a no-regret algorithm based on online convex optimization (OCO) tools (Agrawal and Devanur 2014) and UCRL2 (Jaksch et al. 2010). Importantly, we introduce a novel gradient threshold procedure, which carefully controls the switches among actions to handle the subtle trade-off. By delaying the gradient updates, our procedure produces a non-stationary policy that diversifies the outcomes for optimizing the objective. The procedure is compatible with a variety of OCO tools.

artificial intelligence, machine learning, toc-ucrl2, (17 more...)

arXiv.org Machine Learning

1905.06466

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
(5 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Add feedback

Emergence in Multi-Agent Systems

Zhao, Yan (Dartmouth College) | Santos, Eugene (Dartmouth College)

AAAI ConferencesMay-15-2019

In a multiagent system or MAS, due to agent interactions, the agents as a group may make decisions that none of them would make alone; this phenomenon is called emergence. Emergence is characterized by an unanticipated system behavior caused by nonlinear interactions. This paper detects such emergence in a MAS by analyzing agent behaviors across two simple strategies. In the first strategy, agents make decisions based on the local information; in the second strategy, agents make decisions based on global information provided via communication. The proposed method identifies when and how nonlinear interactions cause behavior change, and quantitatively defines emergence based on the change in team performance. It then proves several theorems about emergence in a MAS. It also explores several emergence-related factors like the communication cost and the reward gap quantitatively. Experimental results on several benchmarks demonstrate the promising performance of the proposed framework in detecting emergence in a MAS.

artificial intelligence, emergence, machine learning, (15 more...)

AAAI Conferences

The Thirty-Second International Flairs Conference

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Maryland > Baltimore (0.04)
North America > Mexico (0.04)
Asia > India > Telangana > Hyderabad (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.50)

Add feedback

Autonomous Penetration Testing using Reinforcement Learning

Schwartz, Jonathon, Kurniawati, Hanna

arXiv.org Artificial IntelligenceMay-15-2019

Penetration testing (pentesting) involves performing a controlled attack on a computer system in order to assess it's security. Although an effective method for testing security, pentesting requires highly skilled practitioners and currently there is a growing shortage of skilled cyber security professionals. One avenue for alleviating this problem is automate the pentesting process using artificial intelligence techniques. Current approaches to automated pentesting have relied on model-based planning, however the cyber security landscape is rapidly changing making maintaining up-to-date models of exploits a challenge. This project investigated the application of model-free Reinforcement Learning (RL) to automated pentesting. Model-free RL has the key advantage over model-based planning of not requiring a model of the environment, instead learning the best policy through interaction with the environment. We first designed and built a fast, low compute simulator for training and testing autonomous pentesting agents. We did this by framing pentesting as a Markov Decision Process with the known configuration of the network as states, the available scans and exploits as actions, the reward determined by the value of machines on the network. We then used this simulator to investigate the application of model-free RL to pentesting. We tested the standard Q-learning algorithm using both tabular and neural network based implementations. We found that within the simulated environment both tabular and neural network implementations were able to find optimal attack paths for a range of different network topologies and sizes without having a model of action behaviour. However, the implemented algorithms were only practical for smaller networks and numbers of actions. Further work is needed in developing scalable RL algorithms and testing these algorithms in larger and higher fidelity environments.

artificial intelligence, machine learning, reinforcement learning, (20 more...)

arXiv.org Artificial Intelligence

1905.05965

Country:

North America > United States (0.14)
Oceania > Australia > Queensland (0.04)

Genre: Research Report > Promising Solution (0.67)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)

Add feedback

Synthesis of Provably Correct Autonomy Protocols for Shared Control

Cubuktepe, Murat, Jansen, Nils, Alsiekh, Mohammed, Topcu, Ufuk

arXiv.org Artificial IntelligenceMay-15-2019

We synthesize shared control protocols subject to probabilistic temporal logic specifications. More specifically, we develop a framework in which a human and an autonomy protocol can issue commands to carry out a certain task. We blend these commands into a joint input to a robot. We model the interaction between the human and the robot as a Markov decision process (MDP) that represents the shared control scenario. Using inverse reinforcement learning, we obtain an abstraction of the human's behavior and decisions. We use randomized strategies to account for randomness in human's decisions, caused by factors such as complexity of the task specifications or imperfect interfaces. We design the autonomy protocol to ensure that the resulting robot behavior satisfies given safety and performance specifications in probabilistic temporal logic. Additionally, the resulting strategies generate behavior as similar to the behavior induced by the human's commands as possible. We solve the underlying problem efficiently using quasiconvex programming. Case studies involving autonomous wheelchair navigation and unmanned aerial vehicle mission planning showcase the applicability of our approach.

machine learning, reinforcement learning, specification, (19 more...)

arXiv.org Artificial Intelligence

1905.06471

Country:

North America > United States > Texas > Travis County > Austin (0.14)
Europe > Netherlands > Gelderland > Nijmegen (0.04)
North America > United States > Pennsylvania (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry:

Aerospace & Defense (0.49)
Government > Military (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)

Add feedback

Integrating Typed Model Counting into First-Order Maximum Entropy Computations and the Connection to Markov Logic Networks

Wilhelm, Marco (TU Dortmund,) | Kern-Isberner, Gabriele (TU Dortmund,) | Finthammer, Marc (TU Dortmund,) | Beierle, Christoph (University of Hagen)

AAAI ConferencesMay-15-2019

The principle of maximum entropy (MaxEnt) provides a well-founded methodology for commonsense reasoning based on probabilistic conditional knowledge. We show how to calculate MaxEnt distributions in a first-order setting by using typed model counting and condensed iterative scaling. Further, we discuss the connection to Markov Logic Networks for drawing inferences.

const, formula, knowledge base, (14 more...)

AAAI Conferences

The Thirty-Second International Flairs Conference

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Germany > North Rhine-Westphalia > Arnsberg Region > Dortmund (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Modeling the Dynamics of User Preferences for Sequence-Aware Recommendation Using Hidden Markov Models

Eskandanian, Farzad (DePaul University) | Mobasher, Bamshad (DePaul University)

AAAI ConferencesMay-15-2019

In a variety of online settings involving interaction with end-users it is critical for the systems to adapt to changes in user preference. User preferences on items tend to change over time due to a variety of factors such as change in context, the task being performed, or other short-term or long-term external factors. Recommender systems, in particular need to be able to capture these dynamics in user preferences in order to remain tuned to the most current interests of users. In this work we present a recommendation framework which takes into account the dynamics of user preferences. We propose an approach based on Hidden Markov Models (HMM) to identify change-points in the sequence of user interactions which reflect significant changes in preference according to the sequential behavior of all the users in the data. The proposed framework leverages the identified change points to generate recommendations using a sequence-aware non-negative matrix factorization model. We empirically demonstrate the effectiveness of the HMM-based change detection method as compared to standard baseline methods. Additionally, we evaluate the performance of the proposed recommendation method and show that it compares favorably to state-of-the-art sequence-aware recommendation models.

change point, recommendation, sequence, (14 more...)

AAAI Conferences

The Thirty-Second International Flairs Conference

Country: North America > United States > Illinois > Cook County > Chicago (0.04)

Industry:

Media > Music (0.48)
Leisure & Entertainment (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback