Online learning with graph-structured feedback against adaptive adversaries

Apr-1-2018, arXiv.org Machine Learning

We derive upper and lower bounds on the policy regret of $T$-round online learning problems with graph-structured feedback, where the adversary is nonoblivious but assumed to have bounded memory. By analyzing a variant of the Exp3 algorithm, we obtain upper bounds of $\widetilde O(T^{2/3})$ and $\widetilde O(T^{3/4})$ for strongly observable and weakly observable graphs, respectively. When the adversary is allowed a bounded memory of size 1, we show that a matching lower bound of $\widetilde\Omega(T^{2/3})$ holds in the case of full-information feedback. We also study the particular loss structure of an oblivious adversary with switching costs, and show that in this setting a lower bound of $\widetilde\Omega(T^{2/3})$ also holds for non-revealing strongly observable feedback graphs.
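To make the feedback model concrete, here is a minimal Python sketch of an Exp3-style learner with graph-structured feedback. It is an illustration under stated assumptions, not the variant analyzed in the paper: the losses come from a fixed table (an oblivious adversary), exploration is uniform over all actions, and the names `exp3_graph`, `observes`, `eta`, and `gamma` are hypothetical. Playing action `i` reveals the loss of every action in `observes[i]`, and each revealed loss is divided by the probability that it would be observed.

```python
import math
import random


def exp3_graph(losses, observes, eta, gamma, seed=0):
    """Exp3-style learner with graph feedback (illustrative sketch only).

    losses[t][j] : loss of action j at round t, assumed to lie in [0, 1]
    observes[i]  : set of actions whose losses are revealed when i is played
    eta, gamma   : learning rate and uniform-exploration rate (assumed names)
    Every action must be observed by at least one action.
    """
    rng = random.Random(seed)
    k = len(observes)
    cum = [0.0] * k            # cumulative importance-weighted loss estimates
    probs = [1.0 / k] * k
    total_loss = 0.0

    for loss_t in losses:
        # Exponential weights, shifted by the minimum for numerical stability.
        base = min(cum)
        weights = [math.exp(-eta * (c - base)) for c in cum]
        norm = sum(weights)
        probs = [(1.0 - gamma) * w / norm + gamma / k for w in weights]

        action = rng.choices(range(k), weights=probs)[0]
        total_loss += loss_t[action]

        for j in observes[action]:
            # Probability that the loss of j is revealed in this round.
            q = sum(probs[i] for i in range(k) if j in observes[i])
            cum[j] += loss_t[j] / q

    return total_loss, probs


if __name__ == "__main__":
    T, k = 200, 3
    table = [[0.0, 1.0, 1.0] for _ in range(T)]      # action 0 is always best
    full_info = [set(range(k)) for _ in range(k)]     # every action reveals all
    bandit = [{i} for i in range(k)]                  # self-loops only
    print(exp3_graph(table, full_info, eta=0.5, gamma=0.1))
    print(exp3_graph(table, bandit, eta=0.1, gamma=0.1))
```

The two graphs in the usage example are the extreme cases of strongly observable feedback: full information, where every action reveals all losses, and bandit feedback, where an action reveals only its own loss. The sketch does not model a bounded-memory adversary or switching costs.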



