Online Convex Optimization in Adversarial Markov Decision Processes

May-19-2019–arXiv.org Machine Learning

We consider online learning in episodic loopfree We propose a novel algorithm for the adversarial MDP Markov decision processes (MDPs), where model where the transition function is unknown to the the loss function can change arbitrarily between learner and the losses change arbitrarily over time. Our episodes, and the transition function is not known algorithm, UC-O-REPS, uses two important ingredients, to the learner. We show Õ(L X A T) regret the first is Online Mirror Descent (OMD) (Shalev-Shwartz, bound, where T is the number of episodes, X 2012) and the second is UCRL-2 (Auer et al., 2008). A is the state space, A is the action space, and L major challenge in this work is to handle convex performance is the length of each episode. Our online algorithm criteria, which model different ways of aggregating is implemented using entropic regularization the losses of each episode. In order to handle convex performance methodology, which allows to extend the criteria, we use the methodology of OMD, which is original adversarial MDP model to handle convex widely used for online convex optimization, and we implement performance criteria (different ways to aggregate it in the adversarial MDP setting. In order to overcome the losses of a single episode), as well as the unknown dynamics (stochastic transition function) improve previous regret bounds.

artificial intelligence, machine learning, online convex optimization, (13 more...)

arXiv.org Machine Learning

May-19-2019

arXiv.org PDF

Add feedback

Country:
- North America > United States (0.46)
- Asia > Middle East
  - Israel (0.14)

Genre:
- Research Report (0.40)

Industry:
- Education (0.34)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Machine Learning > Learning Graphical Models
    - Undirected Networks > Markov Models (0.61)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found