Decision Variance in Online Learning

Sattar Vakili, Alexis Boukouvalas, Qing Zhao

Mar-14-2019–arXiv.org Machine Learning

Online learning has traditionally focused on expected rewards. This paper studies a risk-averse online learning problem in which performance is measured by the mean-variance of the rewards. Both the bandit and full-information settings are considered. The performance of several existing policies is analyzed, and new fundamental limitations on risk-averse learning are established. In particular, it is shown that although a logarithmic distribution-dependent regret in time $T$ is achievable (similar to the risk-neutral problem), the worst-case (i.e., minimax) regret is lower bounded by $\Omega(T)$ (in contrast to the $\Omega(\sqrt{T})$ lower bound in the risk-neutral problem). This sharp difference from the risk-neutral counterpart is caused by the variance in the player's decisions, which, while absent in the regret under the expected reward criterion, contributes to excess mean-variance due to the non-linearity of this risk measure. The role of the decision variance in regret performance reflects a risk-averse player's desire for robust decisions and outcomes.
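
The effect of decision variance can be illustrated with a small numerical sketch. The abstract does not define the mean-variance measure, so the code below assumes one common convention from the risk-averse bandit literature: variance minus a risk-tolerance factor `rho` times the mean, with lower values being better. The two constant-reward arms, the value of `rho`, and the alternating policy are all hypothetical choices made for illustration; they are not taken from the paper.

```python
from statistics import fmean, pvariance


def mean_variance(rewards, rho):
    """Empirical mean-variance of a reward sequence (lower is better).

    Assumed convention: variance - rho * mean. The paper's exact
    definition may differ.
    """
    return pvariance(rewards) - rho * fmean(rewards)


RHO = 0.2  # hypothetical risk-tolerance factor
T = 100    # horizon

# Two hypothetical arms with deterministic rewards, so each arm on its
# own has zero reward variance.
arm_a = [0.0] * T
arm_b = [1.0] * T

# A player who alternates between the arms. All variance in this
# sequence comes from the player's decisions, not from reward noise.
mixed = [0.0, 1.0] * (T // 2)

mv_a = mean_variance(arm_a, RHO)
mv_b = mean_variance(arm_b, RHO)
mv_mixed = mean_variance(mixed, RHO)

print(f"arm A only:  {mv_a:.2f}")
print(f"arm B only:  {mv_b:.2f}")
print(f"alternating: {mv_mixed:.2f}")
```

Under these assumptions the alternating sequence has a worse mean-variance than either arm played alone. Its expected reward, by contrast, is simply the average of the two arms' rewards, which is why the same switching carries no extra penalty under the risk-neutral criterion.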

artificial intelligence, machine learning, variance

Country:
- North America > United States (0.28)

Genre:
- Research Report (0.82)

Industry:
- Education > Educational Setting > Online (0.82)

Technology:
- Information Technology
  - Enterprise Applications > Human Resources
    - Learning Management (0.82)
  - Artificial Intelligence
    - Machine Learning (0.88)
    - Representation & Reasoning (0.67)
