cost function
Regret Analysis for Continuous Dueling Bandit
The dueling bandit is a learning framework where the feedback information in the learning process is restricted to noisy comparison between a pair of actions. In this paper, we address a dueling bandit problem based on a cost function over a continuous space. We propose a stochastic mirror descent algorithm and show that the algorithm achieves an $O(\sqrt{T\log T})$-regret bound under strong convexity and smoothness assumptions for the cost function. Then, we clarify the equivalence between regret minimization in dueling bandit and convex optimization for the cost function. Moreover, considering a lower bound in convex optimization, it is turned out that our algorithm achieves the optimal convergence rate in convex optimization and the optimal regret in dueling bandit except for a logarithmic factor.
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.83)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.65)
- North America > United States > California > Santa Clara County > Mountain View (0.04)
- North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > Canada (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.50)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.40)
Safety through feedback in Constrained RL
This feedback can be system generated or elicited from a human observing the training process. Previous approaches have not been able to scale to complex environments and are constrained to receiving feedback at the state level which can be expensive to collect. To this end, we introduce an approach that scales to more complex domains and extends beyond state-level feedback, thus, reducing the burden on the evaluator.
- Asia > Singapore (0.04)
- Asia > Afghanistan > Parwan Province > Charikar (0.04)
- Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
- Asia > Afghanistan > Parwan Province > Charikar (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- (2 more...)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > France (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.67)
- Law (1.00)
- Information Technology > Security & Privacy (0.92)
- Health & Medicine > Diagnostic Medicine > Imaging (0.46)
- Government > Regional Government > North America Government > United States Government (0.46)
- Information Technology > Data Science (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- Asia > Japan > Kyūshū & Okinawa > Kyūshū > Fukuoka Prefecture > Fukuoka (0.04)
- Research Report > New Finding (0.46)
- Research Report > Experimental Study (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.67)
- Information Technology > Data Science (0.67)
- Europe > Kosovo > District of Gjilan > Kamenica (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- (2 more...)
- Law (1.00)
- Banking & Finance (1.00)
- Information Technology > Security & Privacy (0.92)
- (2 more...)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > China > Heilongjiang Province > Daqing (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.67)