cesa-bianchi
Country:
- North America > United States > New York > New York County > New York City (0.05)
- North America > United States > Illinois > Cook County > Chicago (0.05)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.05)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Country:
- Europe > Italy > Lombardy > Milan (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > Canada > Ontario > National Capital Region > Ottawa (0.04)
- Europe > Italy > Liguria > Genoa (0.04)
Technology:
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Data Science > Data Mining > Big Data (0.46)
Country:
- Europe > Italy > Lombardy > Milan (0.05)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > Canada > Ontario > National Capital Region > Ottawa (0.04)
- Europe > Italy > Liguria > Genoa (0.04)
Technology:
- Information Technology > Artificial Intelligence > Machine Learning (0.97)
- Information Technology > Data Science > Data Mining > Big Data (0.47)
Country:
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Europe > Hungary > Győr-Moson-Sopron County > Győr (0.04)
Country:
- Asia > China > Guangdong Province > Shenzhen (0.05)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.88)
Online EXP3 Learning in Adversarial Bandits with Delayed Feedback
Ilai Bistritz, Zhengyuan Zhou, Xi Chen, Nicholas Bambos, Jose Blanchet
Consider a player that in each of T rounds chooses one of K arms. An adversary chooses the cost of each arm in a bounded interval, and a sequence of feedback delays {d_t} that are unknown to the player. After picking arm a_t at round t, the player receives the cost of playing this arm d_t rounds later. In cases where t + d_t > T, this feedback is simply missing.
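The delayed-feedback protocol described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' algorithm: the function name `delayed_exp3`, the fixed step size `eta`, the assumption that costs lie in [0, 1], and the choice to importance-weight each cost by the probability in force when the arm was played are all assumptions made for the example.

```python
import math
import random


def delayed_exp3(costs, delays, eta, seed=0):
    """EXP3-style play with delayed feedback (hypothetical sketch).

    costs[t][k] is the cost of arm k at round t, assumed to lie in [0, 1].
    delays[t] is the delay d_t >= 0 of the feedback for round t.
    Returns the list of arms played and the final cost estimates.
    """
    rng = random.Random(seed)
    T, K = len(costs), len(costs[0])
    cum_est = [0.0] * K   # cumulative importance-weighted cost estimates
    arrivals = {}         # round index -> feedback items arriving in that round
    played = []

    for t in range(T):
        # Exponential weights on estimated costs (shifted for numerical stability).
        base = min(cum_est)
        weights = [math.exp(-eta * (c - base)) for c in cum_est]
        total = sum(weights)
        probs = [w / total for w in weights]

        arm = rng.choices(range(K), weights=probs)[0]
        played.append(arm)

        # Rounds are 0-indexed here, so feedback due at index >= T is missing.
        due = t + delays[t]
        if due < T:
            arrivals.setdefault(due, []).append((arm, costs[t][arm], probs[arm]))

        # Apply whatever feedback arrives in this round.
        for a, cost, p in arrivals.pop(t, []):
            cum_est[a] += cost / p

    return played, cum_est


if __name__ == "__main__":
    T = 2000
    costs = [[1.0, 0.0] for _ in range(T)]   # arm 1 is always cheaper
    played, _ = delayed_exp3(costs, [5] * T, eta=0.05)
    print("fraction of rounds on arm 1:", played.count(1) / T)
```

Keeping pending feedback in a dictionary keyed by arrival round means the player never needs to know d_t in advance; it only reacts to what shows up.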
Country:
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Europe > France (0.04)
Country:
- North America > United States (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Europe > Ireland (0.04)
- Africa > South Sudan > Equatoria > Central Equatoria > Juba (0.04)
Country:
- Europe > Italy (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Switzerland > Zürich > Zürich (0.04)
Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
A Near-Optimal Best-of-Both-Worlds Algorithm for Online Learning with Feedback Graphs
We present a computationally efficient algorithm for learning in this framework that simultaneously achieves near-optimal regret bounds in both stochastic and adversarial environments. The bound against oblivious adversaries is Õ(√(αT)), where T is the time horizon and α is the independence number of the feedback graph.
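To make the quantity α concrete, here is a small brute-force sketch that computes the independence number of a tiny graph. It is for intuition only and is not part of the paper's algorithm: the function name `independence_number` is hypothetical, edges are treated as undirected, self-loops are ignored, and the exhaustive search is only practical for a handful of actions.

```python
from itertools import combinations


def independence_number(n, edges):
    """Size of the largest set of nodes with no edge between any two of them.

    n: number of nodes, labelled 0..n-1.
    edges: iterable of (u, v) pairs; direction and self-loops are ignored.
    """
    adjacent = {frozenset((u, v)) for u, v in edges if u != v}
    # Try the largest candidate sets first and stop at the first independent one.
    for size in range(n, 0, -1):
        for subset in combinations(range(n), size):
            if all(frozenset(pair) not in adjacent
                   for pair in combinations(subset, 2)):
                return size
    return 0


if __name__ == "__main__":
    five_cycle = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
    print("alpha of a 5-cycle:", independence_number(5, five_cycle))
```

A graph with no edges between distinct actions (bandit feedback) has α equal to the number of actions, while a complete graph (full information) has α = 1, so α interpolates between the two settings.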
Country:
- Europe > Italy (0.04)
- Europe > Denmark (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Country:
- Asia > Middle East > Jordan (0.04)
- North America > United States > Nevada (0.04)
- North America > Canada > Quebec > Montreal (0.04)
Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.34)