Follow-the-Perturbed-Leader for Adversarial Markov Decision Processes with Bandit Feedback

Feb-8-2026, 18:28:24 GMT – Neural Information Processing Systems

We consider regret minimization for Adversarial Markov Decision Processes (AMDPs), where the loss functions change over time and are chosen adversarially, and the learner observes the losses only for the visited state-action pairs (i.e., bandit feedback).
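
To make the feedback model concrete, here is a minimal Python sketch of a single episode in which the learner sees losses only on the state-action pairs it visits. It illustrates the setting only and is not the paper's algorithm. The episodic, step-indexed layout and all names (`run_episode`, `policy`, `transition`, `loss`, `horizon`) are assumptions introduced for this example.

```python
import random

def run_episode(policy, transition, loss, horizon, start_state, rng):
    """Roll out one episode and return only the bandit feedback.

    policy[h][s]        -> action taken at step h in state s
    transition[h][s][a] -> list of next-state probabilities
    loss[h][s][a]       -> adversarially chosen loss (hidden from the learner)
    All of these containers are hypothetical, chosen for illustration.
    """
    state = start_state
    feedback = []  # (step, state, action, observed loss) for visited pairs only
    for h in range(horizon):
        action = policy[h][state]
        # The learner observes the loss of the visited pair and nothing else.
        feedback.append((h, state, action, loss[h][state][action]))
        probs = transition[h][state][action]
        state = rng.choices(range(len(probs)), weights=probs)[0]
    return feedback

# Tiny example: 2 states, 2 actions, 2 steps; action a leads to state a.
horizon = 2
deterministic = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
transition = [deterministic, deterministic]
policy = [[1, 0], [1, 0]]  # policy[h][s]
loss = [
    [[0.2, 0.9], [0.5, 0.1]],  # step 0: loss[0][s][a]
    [[0.3, 0.4], [0.7, 0.6]],  # step 1: loss[1][s][a]
]
feedback = run_episode(policy, transition, loss, horizon, 0, random.Random(0))
total_loss = sum(entry[3] for entry in feedback)
```

Of the eight loss entries the adversary fixed for this episode, the learner sees exactly two, one per step. Under full-information feedback the whole `loss` table would be revealed after the episode instead.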



