AITopics | full information case

Consider a player that in each of T rounds chooses one of K arms. An adversary chooses the cost of each arm in a bounded interval, and a sequence of feedback delays {d_t} that are unknown to the player. After picking arm a_t at round t, the player receives the cost of playing this arm d_t rounds later. In cases where t + d_t > T, this feedback is simply missing.
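The interaction described above can be sketched as a small simulation loop. This is a minimal illustration only, not the paper's algorithm: the function name `run_delayed_protocol`, the `choose_arm` callback, and the 0-indexed rounds are assumptions made for the example. With 0-indexed rounds, the condition t + d_t > T becomes `t + delays[t] >= T`.

```python
def run_delayed_protocol(costs, delays, choose_arm):
    """Simulate the delayed-feedback protocol (hypothetical helper).

    costs[t][k] is the adversary's cost for arm k at round t (0-indexed).
    delays[t] is the delay d_t >= 0, hidden from the player.
    choose_arm(t, received) returns the arm to play at round t, given
    the feedback received so far as (round_played, arm, cost) tuples.
    """
    T = len(costs)
    pending = {}   # arrival round -> feedback items scheduled for that round
    received = []  # feedback the player has actually seen
    missing = 0    # plays whose feedback would arrive after the last round

    for t in range(T):
        arm = choose_arm(t, received)
        arrival = t + delays[t]
        if arrival >= T:
            # Feedback would arrive after the game ends: it is missing.
            missing += 1
        else:
            pending.setdefault(arrival, []).append((t, arm, costs[t][arm]))
        # Deliver everything scheduled to arrive at the end of this round.
        received.extend(pending.pop(t, []))

    return received, missing


if __name__ == "__main__":
    costs = [[0.1, 0.9]] * 4
    delays = [0, 2, 5, 1]
    always_first = lambda t, received: 0
    print(run_delayed_protocol(costs, delays, always_first))
```

In this sketch a delay of zero means the cost is seen at the end of the same round; whether feedback arrives before or after the next choice is a modelling detail the abstract does not fix.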

algorithm, artificial intelligence, machine learning, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > France (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.94)


Online Learning with Gaussian Payoffs and Side Observations

Yifan Wu, András György, Csaba Szepesvári

Neural Information Processing Systems, Oct-2-2025, 10:47:43 GMT

We consider a sequential learning problem with Gaussian payoffs and side observations: after selecting an action i, the learner receives information about the payoff of every action j in the form of Gaussian observations whose mean is the same as the mean payoff, but the variance depends on the pair (i,j) (and may be infinite). The setup allows a more refined information transfer from one action to another than previous partial monitoring setups, including the recently introduced graph-structured feedback case. For the first time in the literature, we provide non-asymptotic problem-dependent lower bounds on the regret of any algorithm, which recover existing asymptotic problem-dependent lower bounds and finite-time minimax lower bounds available in the literature. We also provide algorithms that achieve the problem-dependent lower bound (up to some universal constant factor) or the minimax lower bounds (up to logarithmic factors).
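The feedback model in this abstract can be illustrated with a short sampling routine. This is a sketch under stated assumptions, not code from the paper: the name `observe`, the list-of-lists `variances` layout, and the use of `None` for an infinite-variance (uninformative) observation are all choices made for the example.

```python
import math
import random


def observe(i, means, variances, rng):
    """Return one round of side observations after selecting action i.

    means[j] is the mean payoff of action j.
    variances[i][j] is the variance of the observation of action j when
    action i is selected; math.inf means nothing is learned about j.
    """
    observations = []
    for j, mu in enumerate(means):
        var = variances[i][j]
        if math.isinf(var):
            # Infinite variance carries no information about action j.
            observations.append(None)
        else:
            observations.append(rng.gauss(mu, math.sqrt(var)))
    return observations


if __name__ == "__main__":
    rng = random.Random(0)
    means = [0.5, 1.0]
    variances = [[0.0, math.inf],
                 [1.0, 1.0]]
    print(observe(0, means, variances, rng))
    print(observe(1, means, variances, rng))
```

Under this representation, setting every off-diagonal variance to infinity would reduce the model to ordinary bandit feedback, while making all variances finite would give some information about every action in every round.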

algorithm, payoff, side observation, (14 more...)

Neural Information Processing Systems

Country:

North America > Canada > Alberta (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Industry: Education > Educational Setting > Online (0.41)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.57)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.41)


Online Learning with Gaussian Payoffs and Side Observations

Yifan Wu, András György

Neural Information Processing Systems, Mar-13-2024, 01:16:11 GMT

We consider a sequential learning problem with Gaussian payoffs and side observations: after selecting an action i, the learner receives information about the payoff of every action j in the form of Gaussian observations whose mean is the same as the mean payoff, but the variance depends on the pair (i, j) (and may be infinite). The setup allows a more refined information transfer from one action to another than previous partial monitoring setups, including the recently introduced graph-structured feedback case. For the first time in the literature, we provide non-asymptotic problem-dependent lower bounds on the regret of any algorithm, which recover existing asymptotic problem-dependent lower bounds and finite-time minimax lower bounds available in the literature. We also provide algorithms that achieve the problem-dependent lower bound (up to some universal constant factor) or the minimax lower bounds (up to logarithmic factors).

algorithm, payoff, side observation, (14 more...)

Neural Information Processing Systems

Country:

North America > Canada > Alberta (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Industry: Education > Educational Setting > Online (0.41)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.57)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.41)


Online Learning with Gaussian Payoffs and Side Observations

Wu, Yifan, György, András, Szepesvári, Csaba

Neural Information Processing Systems, Dec-31-2015

We consider a sequential learning problem with Gaussian payoffs and side information: after selecting an action $i$, the learner receives information about the payoff of every action $j$ in the form of Gaussian observations whose mean is the same as the mean payoff, but the variance depends on the pair $(i,j)$ (and may be infinite). The setup allows a more refined information transfer from one action to another than previous partial monitoring setups, including the recently introduced graph-structured feedback case. For the first time in the literature, we provide non-asymptotic problem-dependent lower bounds on the regret of any algorithm, which recover existing asymptotic problem-dependent lower bounds and finite-time minimax lower bounds available in the literature. We also provide algorithms that achieve the problem-dependent lower bound (up to some universal constant factor) or the minimax lower bounds (up to logarithmic factors).

algorithm, artificial intelligence, machine learning, (15 more...)

Neural Information Processing Systems

Country: North America > Canada > Alberta (0.14)

Industry: Education > Educational Setting > Online (0.41)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.57)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.41)


Online Learning with Gaussian Payoffs and Side Observations

Wu, Yifan, György, András, Szepesvári, Csaba

arXiv.org Machine Learning, Oct-27-2015

We consider a sequential learning problem with Gaussian payoffs and side information: after selecting an action $i$, the learner receives information about the payoff of every action $j$ in the form of Gaussian observations whose mean is the same as the mean payoff, but the variance depends on the pair $(i,j)$ (and may be infinite). The setup allows a more refined information transfer from one action to another than previous partial monitoring setups, including the recently introduced graph-structured feedback case. For the first time in the literature, we provide non-asymptotic problem-dependent lower bounds on the regret of any algorithm, which recover existing asymptotic problem-dependent lower bounds and finite-time minimax lower bounds available in the literature. We also provide algorithms that achieve the problem-dependent lower bound (up to some universal constant factor) or the minimax lower bounds (up to logarithmic factors).

algorithm, artificial intelligence, machine learning, (15 more...)

arXiv.org Machine Learning

1510.08108

Genre: Research Report (0.40)

Industry: Education > Educational Setting > Online (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.55)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.40)


The Price of Bandit Information for Online Optimization

Dani, Varsha, Kakade, Sham M., Hayes, Thomas P.

Neural Information Processing Systems, Dec-31-2008

We present sharp rates of convergence (with respect to additive regret) for both the full information setting (where the cost function is revealed at the end of each round) and the bandit setting (where only the scalar cost incurred is revealed). In particular, this paper is concerned with the price of bandit information, by which we mean the ratio of the best achievable regret in the bandit setting to that in the full-information setting.
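The two quantities named in this abstract can be written out as follows. The notation is an assumption introduced for illustration and is not taken from the paper: $c_t$ is the cost function at round $t$, $x_t$ is the learner's decision, $D$ is the decision set, and $R_T^{\mathrm{full}}$, $R_T^{\mathrm{bandit}}$ denote the best achievable regret in each setting.

```latex
% Additive regret after T rounds, relative to the best fixed decision:
R_T \;=\; \sum_{t=1}^{T} c_t(x_t) \;-\; \min_{x \in D} \sum_{t=1}^{T} c_t(x)

% Feedback received at the end of round t:
%   full information: the whole cost function c_t
%   bandit:           only the scalar c_t(x_t)

% Price of bandit information: ratio of the best achievable regrets
\text{price} \;=\; \frac{R_T^{\mathrm{bandit}}}{R_T^{\mathrm{full}}}
```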

algorithm, full information case, information case, (11 more...)

Neural Information Processing Systems

Country: North America > United States > Illinois > Cook County > Chicago (0.05)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.47)

