Supplement to "Model Selection in Contextual Stochastic Bandit Problems"
In Section D we present the proofs for Section 5.1. In Section H we show the proofs of the lower bounds in Section 6. We briefly outline some other direct applications of our results. CORRAL will achieve regret $O(\sqrt{|L| d T})$. B.1 Original Corral: the original Corral algorithm [2] is reproduced below. We reproduce the EXP3.P algorithm (Figure 3.1 in [...]), whose expected replay regret satisfies [...]; therefore the total regret is bounded by $6\,U(T,\delta)\log(T)$. D.2 Applications of Proposition 5.1: we now show that several algorithms are $(U,\delta,T)$-bounded (Lemma D.2).
- North America > Canada > Alberta (0.14)
- North America > United States > Massachusetts (0.04)
- North America > United States > Florida > Broward County > Fort Lauderdale (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Data Science > Data Mining > Big Data (0.48)
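The supplement snippet above leans on the paper's notion of a $(U,\delta,T)$-bounded base algorithm. As a hedged reconstruction (our paraphrase from context, not a verbatim quote of the supplement), the condition reads roughly:

```latex
% Assumed form of the definition (paraphrase, not verbatim):
% an algorithm with regret R(t) is (U, \delta, T)-bounded if its regret
% stays under U(t, \delta) uniformly over rounds, with probability 1 - \delta.
\mathbb{P}\bigl( \forall\, t \le T : \; R(t) \le U(t, \delta) \bigr) \ge 1 - \delta
```

Under such a condition, a high-probability per-round bound is what the master needs to control the total regret term $6\,U(T,\delta)\log(T)$ quoted above.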
Multi-agent Reinforcement Learning Improvement in a Dynamic Environment Using Knowledge Transfer
Mahdavimoghaddam, Mahnoosh, Nikanjam, Amin, Abdoos, Monireh
Cooperative multi-agent systems are widely used in a variety of areas. Interaction between agents brings benefits, including reduced operating costs, high scalability, and easier parallel processing. These systems pave the way for handling large-scale, unknown, and dynamic environments. However, learning in such environments remains a prominent challenge across applications: the size of the search space inflates learning time, agents may cooperate poorly, and their decisions may lack proper coordination. Moreover, reinforcement learning algorithms may suffer from long convergence times in these problems. In this paper, a communication framework based on knowledge transfer is introduced to address these challenges in the herding problem with a large state space. To speed up convergence, knowledge transfer is utilized, which can significantly increase the efficiency of reinforcement learning algorithms. Coordination among the agents is carried out through a head agent in each group and an overall coordinator agent. The results demonstrate that this framework can indeed enhance the speed of learning and reduce convergence time.
- Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Asia > China (0.04)
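The knowledge-transfer step described in the abstract above can be sketched minimally. The blending rule and the `beta` parameter here are illustrative assumptions, not the paper's exact head-agent scheme:

```python
import numpy as np

def transfer_q(head_q: np.ndarray, member_qs: list, beta: float = 0.5) -> list:
    """Hypothetical sketch of Q-table knowledge transfer in a group:
    each member blends its own Q-table with the head agent's table,
    with beta controlling how much the member trusts the head."""
    return [(1.0 - beta) * q + beta * head_q for q in member_qs]
```

With `beta=1.0` members simply adopt the head agent's knowledge; with `beta=0.0` they keep their own, so the parameter trades off local experience against transferred knowledge.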
Efficient Algorithms for Global Inference in Internet Marketplaces
Ramanath, Rohan, Keerthi, Sathiya, Pan, Yao, Salomatin, Konstantin, Basu, Kinjal
Matching demand to supply in internet marketplaces (e-commerce, ride-sharing, food delivery, professional services, advertising) is a global inference problem that can be formulated as a Linear Program (LP) with (millions of) coupling constraints and (up to a billion) non-coupling polytope constraints. Until recently, solving such problems on web-scale data with an LP formulation was intractable. Recent work (Basu et al., 2020) developed a dual decomposition-based approach to solve such problems when the polytope constraints are simple. In this work, we motivate the need to go beyond these simple polytopes and show real-world internet marketplaces that require more complex structured polytope constraints. We expand on the recent literature with novel algorithms that are more broadly applicable to global inference problems. Using a theoretical insight into the nature of solutions on the polytopes, we derive an efficient incremental algorithm that projects onto any arbitrary polytope and delivers massive performance improvements. Better optimization routines, together with an adaptive algorithm that controls the smoothness of the objective, improve the speed of the solution even further. We showcase the efficacy of our approach via experimental results on web-scale marketplace data.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- Asia > India > Maharashtra > Mumbai (0.04)
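The abstract above is built around projecting onto polytope constraints. As a concrete illustration of the simplest case (not the paper's incremental algorithm for arbitrary polytopes), here is the classic O(n log n) Euclidean projection onto the probability simplex, one of the "simple polytopes" such dual-decomposition methods start from:

```python
import numpy as np

def project_to_simplex(v: np.ndarray) -> np.ndarray:
    """Euclidean projection of v onto the probability simplex
    {x : x >= 0, sum(x) = 1}, via the standard sort-and-threshold method."""
    u = np.sort(v)[::-1]                      # sort coordinates descending
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u + (1.0 - css) / k > 0)[0][-1]   # last valid index
    theta = (css[rho] - 1.0) / (rho + 1)      # shift so result sums to 1
    return np.maximum(v - theta, 0.0)
```

The projection clips and shifts the vector by a single threshold `theta`; this closed-form structure is what makes simple polytopes cheap inside a dual-decomposition loop, and the paper's contribution is extending projection efficiency beyond such cases.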
- Instructional Material > Course Syllabus & Notes (0.46)
- Research Report (0.40)
- Information Technology > Services (0.54)
- Transportation > Passenger (0.34)
Smooth Bandit Optimization: Generalization to H\"older Space
Liu, Yusha, Wang, Yining, Singh, Aarti
We consider bandit optimization of a smooth reward function, where the goal is cumulative regret minimization. This problem has been studied for $\alpha$-H\"older continuous (including Lipschitz) functions with $0<\alpha\leq 1$. Our main result generalizes the reward function to the H\"older space with exponent $\alpha>1$, bridging the gap between Lipschitz bandits and infinitely differentiable models such as linear bandits. For H\"older continuous functions, approaches based on random sampling in bins of a discretized domain suffice and are optimal. In contrast, we propose a class of two-layer algorithms that deploy misspecified linear/polynomial bandit algorithms in the bins. We demonstrate that the proposed algorithm can exploit higher-order smoothness of the function by deriving a regret upper bound of $\tilde{O}(T^\frac{d+\alpha}{d+2\alpha})$ for $\alpha>1$, which matches the existing lower bound. We also study adaptation to unknown function smoothness over a continuous scale of H\"older spaces indexed by $\alpha$, applying a bandit model selection approach with our proposed two-layer algorithms. We show that it achieves a regret rate matching the existing lower bound for adaptation within the $\alpha\leq 1$ subset.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
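The bin-discretization baseline mentioned in the abstract above can be illustrated with a minimal sketch: treat each bin of a uniform grid on [0, 1] as one arm and run UCB1 over bins, sampling a uniform point inside the chosen bin. This illustrates random sampling in bins, not the paper's two-layer algorithm; the noise level and bin count are illustrative assumptions:

```python
import numpy as np

def binned_ucb(f, T: int, n_bins: int, rng=None) -> float:
    """Sketch of discretization-based bandit optimization on [0, 1]:
    each bin is an arm, rewards are f(x) plus Gaussian noise, and UCB1
    picks the bin; returns the center of the empirically best bin."""
    rng = np.random.default_rng(rng)
    counts = np.zeros(n_bins)
    sums = np.zeros(n_bins)
    for t in range(T):
        if t < n_bins:                        # warm start: play each bin once
            arm = t
        else:
            ucb = sums / counts + np.sqrt(2.0 * np.log(t + 1) / counts)
            arm = int(np.argmax(ucb))
        x = (arm + rng.random()) / n_bins     # uniform sample inside the bin
        reward = f(x) + 0.1 * rng.standard_normal()
        counts[arm] += 1
        sums[arm] += reward
    best = int(np.argmax(sums / counts))
    return (best + 0.5) / n_bins
```

For $\alpha\leq 1$ the function varies little within a fine enough bin, so this flat per-bin model is already rate-optimal; the paper's point is that for $\alpha>1$ replacing the constant per-bin model with a misspecified linear/polynomial bandit exploits the extra smoothness.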
Model Selection in Contextual Stochastic Bandit Problems
Pacchiano, Aldo, Phan, My, Abbasi-Yadkori, Yasin, Rao, Anup, Zimmert, Julian, Lattimore, Tor, Szepesvari, Csaba
We study model selection in stochastic bandit problems. Our approach relies on a master algorithm that selects its actions among candidate base algorithms. While this problem has been studied for specific classes of stochastic base algorithms, our objective is to provide a method that works with more general classes of stochastic base algorithms. We propose a master algorithm inspired by CORRAL \cite{DBLP:conf/colt/AgarwalLNS17} and introduce a novel and generic smoothing transformation for stochastic bandit algorithms that permits us to obtain $O(\sqrt{T})$ regret guarantees for a wide class of base algorithms when run alongside our master. We exhibit a lower bound showing that even when one of the base algorithms has $O(\log T)$ regret, in general it is impossible to get better than $\Omega(\sqrt{T})$ regret in model selection, even asymptotically. We apply our algorithm to choose among different values of $\epsilon$ for the $\epsilon$-greedy algorithm, and to choose between the $k$-armed UCB and linear UCB algorithms. Our empirical studies further confirm the effectiveness of our model-selection method.
- North America > United States > Massachusetts (0.04)
- Europe > United Kingdom > Scotland > City of Edinburgh > Edinburgh (0.04)
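The master-over-base-algorithms setup in the abstract above can be sketched with a simplified EXP3-style master choosing between two $\epsilon$-greedy bases (the paper's own running example). This is a stand-in, not the authors' CORRAL master with log-barrier mirror descent and the smoothing transformation; the update rule and parameters here are illustrative:

```python
import numpy as np

class EpsGreedy:
    """Epsilon-greedy base algorithm for a k-armed bandit."""
    def __init__(self, k: int, eps: float, rng):
        self.eps, self.rng = eps, rng
        self.counts = np.zeros(k)
        self.sums = np.zeros(k)
    def act(self) -> int:
        if self.rng.random() < self.eps:
            return int(self.rng.integers(len(self.counts)))
        return int(np.argmax(self.sums / np.maximum(self.counts, 1)))
    def update(self, arm: int, reward: float):
        self.counts[arm] += 1
        self.sums[arm] += reward

def exp3_master(bases, pull, T: int, eta: float = 0.05, rng=None):
    """Simplified EXP3-style master (without full EXP3's exploration
    mixing): each round, sample a base, let it act, and feed its
    importance-weighted reward into the master's exponential weights.
    `pull(arm)` returns a reward in [0, 1]."""
    rng = np.random.default_rng(rng)
    w = np.zeros(len(bases))                  # log-weights over bases
    for _ in range(T):
        p = np.exp(w - w.max()); p /= p.sum()
        i = rng.choice(len(bases), p=p)
        arm = bases[i].act()
        reward = pull(arm)
        bases[i].update(arm, reward)          # only the chosen base learns
        w[i] += eta * reward / p[i]           # importance-weighted gain
    p = np.exp(w - w.max()); p /= p.sum()
    return p                                  # master's final distribution
```

Because only the sampled base observes feedback, a base starved by the master learns slowly, inflating its regret; this instability is exactly what the paper's smoothing transformation is designed to control.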
Reolink RLN8-410 8-Channel PoE NVR review: Corral up to 8 cameras into a single system
We recently reviewed Reolink's marvelous RLC-410 4MP PoE Security IP Camera. One of its most notable features is that it doesn't carry the added cost of a cloud subscription for storing video footage. Instead, you record directly to your mobile device, PC, or--if you want to use several cameras as part of a larger security system--to Reolink's own network video recorder (NVR). We tried out the NVR while we had the RLC-410 in hand and decided to review it separately to do justice to its breadth of features. Reolink offers two versions of its standalone NVR: the 8-channel RLN8-410 reviewed here and the 16-channel RLN16-410.
- Information Technology > Artificial Intelligence (0.52)
- Information Technology > Hardware (0.50)
- Information Technology > Communications (0.35)