


219ece62fae865562d4510ea501cf349-Supplemental.pdf

Neural Information Processing Systems

If there are multiple LVs, we select the LV with the maximum probability p(W). This heuristic, suggested by [13], improves empirical performance. The simulated robot pushing experiment is taken from [23]. The simulation returns the location of a pushed object given the robot's location and the pushing duration, i.e., x. The portfolio optimization problem is taken from [4].
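The selection heuristic above reduces to an argmax over candidate latent variables. A minimal sketch, in which `candidate_lvs` and `log_prob` are illustrative names (not from the paper's code):

```python
import numpy as np

def select_lv(candidate_lvs, log_prob):
    """Return the candidate latent variable W with maximum probability p(W).

    Working in log space avoids underflow; argmax is unchanged because
    log is monotone.
    """
    scores = np.array([log_prob(w) for w in candidate_lvs])
    return candidate_lvs[int(np.argmax(scores))]

# Toy usage: a log-density peaked near 0.6 selects the closest candidate.
lvs = [0.1, 0.5, 0.9]
best = select_lv(lvs, lambda w: -(w - 0.6) ** 2)
```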


A Instantaneous Regret Bound. Conditioned on the event that (8) in Lemma 1 holds (with probability 1 − δ), it follows that c

Neural Information Processing Systems

From (18), (19), and (20), we obtain (9), (10), and (11), respectively. We prove the following Lemma 4, which is then used to prove Lemma 5; Lemma 3 follows from Lemma 5. Lemma 4. Let W Thus, Lemma 5 implies Lemma 3. Let us consider V-TS, which selects a single query at each BO iteration (Algorithm 3). The simulation returns the location of a pushed object given the robot's location and the pushing duration, i.e., x. There are 30 initial observations, i.e., | D



Parallelizing Thompson Sampling

Neural Information Processing Systems

How can we make use of information parallelism in online decision making problems while efficiently balancing the exploration-exploitation trade-off? In this paper, we introduce a batch Thompson Sampling framework for two canonical online decision making problems, namely, stochastic multi-arm bandit and linear contextual bandit with finitely many arms. Over a time horizon T, our batch Thompson Sampling policy achieves the same (asymptotic) regret bound of a fully sequential one while carrying out only O (log T) batch queries. To achieve this exponential reduction, i.e., reducing the number of interactions from T to O (log T), our batch policy dynamically determines the duration of each batch in order to balance the exploration-exploitation trade-off. We also demonstrate experimentally that dynamic batch allocation dramatically outperforms natural baselines such as static batch allocations.
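The batched update scheme described in the abstract can be sketched for Bernoulli arms. In the sketch below, doubling batch lengths stand in for the paper's dynamic batch-allocation rule (a simplification, not the authors' algorithm), which still yields O(log T) posterior updates over a horizon T; all names are illustrative.

```python
import random

def batch_thompson_sampling(true_means, horizon, seed=0):
    """Sketch of batch Thompson Sampling on Bernoulli arms.

    Within a batch, arms are drawn by sampling from *fixed* Beta
    posteriors; observed rewards are buffered and the posterior is
    updated only at batch boundaries. Doubling batch lengths give
    O(log T) batches.
    """
    rng = random.Random(seed)
    k = len(true_means)
    alpha = [1] * k  # Beta(1, 1) priors
    beta = [1] * k
    t, batch_len, batches, rewards = 0, 1, 0, 0
    while t < horizon:
        pending = [(0, 0)] * k  # (successes, failures) buffered this batch
        for _ in range(min(batch_len, horizon - t)):
            samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
            arm = max(range(k), key=lambda i: samples[i])
            r = 1 if rng.random() < true_means[arm] else 0
            s, f = pending[arm]
            pending[arm] = (s + r, f + 1 - r)
            rewards += r
            t += 1
        for i, (s, f) in enumerate(pending):  # update only between batches
            alpha[i] += s
            beta[i] += f
        batches += 1
        batch_len *= 2
    return rewards, batches
```

With T = 1000 the loop runs batch lengths 1, 2, 4, ..., 512, i.e., 10 batches rather than 1000 interactions.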


BQSched: A Non-intrusive Scheduler for Batch Concurrent Queries via Reinforcement Learning

Xu, Chenhao, Chen, Chunyu, Peng, Jinglin, Wang, Jiannan, Gao, Jun

arXiv.org Artificial Intelligence

Most large enterprises build predefined data pipelines and execute them periodically to process operational data using SQL queries for various tasks. A key issue in minimizing the overall makespan of these pipelines is the efficient scheduling of concurrent queries within the pipelines. Existing tools mainly rely on simple heuristic rules due to the difficulty of expressing the complex features and mutual influences of queries. The latest reinforcement learning (RL) based methods have the potential to capture these patterns from feedback, but it is non-trivial to apply them directly due to the large scheduling space, high sampling cost, and poor sample utilization. Motivated by these challenges, we propose BQSched, a non-intrusive Scheduler for Batch concurrent Queries via reinforcement learning. Specifically, BQSched designs an attention-based state representation to capture the complex query patterns, and proposes IQ-PPO, an auxiliary task-enhanced proximal policy optimization (PPO) algorithm, to fully exploit the rich signals of Individual Query completion in logs. Based on the RL framework above, BQSched further introduces three optimization strategies, including adaptive masking to prune the action space, scheduling gain-based query clustering to deal with large query sets, and an incremental simulator to reduce sampling cost. To our knowledge, BQSched is the first non-intrusive batch query scheduler via RL. Extensive experiments show that BQSched can significantly improve the efficiency and stability of batch query scheduling, while also achieving remarkable scalability and adaptability in both data and queries. For example, across all DBMSs and scales tested, BQSched reduces the overall makespan of batch queries on TPC-DS benchmark by an average of 34% and 13%, compared with the commonly used heuristic strategy and the adapted RL-based scheduler, respectively.
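The adaptive-masking strategy mentioned above (pruning already-scheduled queries from the action space) is commonly implemented by driving the logits of invalid actions to negative infinity before the softmax, so the policy assigns them zero probability. A minimal sketch of that general technique, not BQSched's actual code:

```python
import numpy as np

def masked_policy(logits, mask):
    """Zero out invalid actions in a softmax policy.

    `mask` is True for actions that may still be scheduled. Setting the
    other logits to -inf makes their softmax probability exactly zero,
    so they can never be sampled.
    """
    masked = np.where(mask, logits, -np.inf)
    z = masked - masked.max()  # stabilize before exponentiation
    p = np.exp(z)
    return p / p.sum()

# Query 1 is already running, so it is pruned from the action space.
probs = masked_policy(np.array([1.0, 2.0, 0.5]),
                      np.array([True, False, True]))
```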


Parallelizing Thompson Sampling

Karbasi, Amin, Mirrokni, Vahab, Shadravan, Mohammad

arXiv.org Machine Learning

How can we make use of information parallelism in online decision making problems while efficiently balancing the exploration-exploitation trade-off? In this paper, we introduce a batch Thompson Sampling framework for two canonical online decision making problems, namely, stochastic multi-arm bandit and linear contextual bandit with finitely many arms. Over a time horizon $T$, our \textit{batch} Thompson Sampling policy achieves the same (asymptotic) regret bound of a fully sequential one while carrying out only $O(\log T)$ batch queries. To achieve this exponential reduction, i.e., reducing the number of interactions from $T$ to $O(\log T)$, our batch policy dynamically determines the duration of each batch in order to balance the exploration-exploitation trade-off. We also demonstrate experimentally that dynamic batch allocation dramatically outperforms natural baselines such as static batch allocations.


Adaptivity in Adaptive Submodularity

Esfandiari, Hossein, Karbasi, Amin, Mirrokni, Vahab

arXiv.org Machine Learning

Adaptive sequential decision making is one of the central challenges in machine learning and artificial intelligence. In such problems, the goal is to design an interactive policy that plans for an action to take, from a finite set of $n$ actions, given some partial observations. It has been shown that in many applications such as active learning, robotics, sequential experimental design, and active detection, the utility function satisfies adaptive submodularity, a notion that generalizes the notion of diminishing returns to policies. In this paper, we revisit the power of adaptivity in maximizing an adaptive monotone submodular function. We propose an efficient batch policy that with $O(\log n \times\log k)$ adaptive rounds of observations can achieve an almost tight $(1-1/e-\epsilon)$ approximation guarantee with respect to an optimal policy that carries out $k$ actions in a fully sequential setting. To complement our results, we also show that it is impossible to achieve a constant factor approximation with $o(\log n)$ adaptive rounds. We also extend our result to the case of adaptive stochastic minimum cost coverage where the goal is to reach a desired utility $Q$ with the cheapest policy. We first prove the conjecture by Golovin and Krause that the greedy policy achieves the asymptotically tight logarithmic approximation guarantee without resorting to stronger notions of adaptivity. We then propose a batch policy that provides the same guarantee in polylogarithmic adaptive rounds through a similar information-parallelism scheme. Our results shrink the adaptivity gap in adaptive submodular maximization by an exponential factor.
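The fully sequential greedy policy that the batch policy is compared against can be sketched as follows; `expected_gain` and `observe` are hypothetical callbacks standing in for the conditional expected marginal benefit and the outcome observation in the paper's model.

```python
def adaptive_greedy(actions, k, expected_gain, observe):
    """Sketch of the fully sequential adaptive greedy policy.

    At each of k rounds, pick the remaining action with the largest
    expected marginal gain given all observations so far, then observe
    its realized outcome before choosing the next action.
    """
    chosen, observations = [], {}
    remaining = set(actions)
    for _ in range(k):
        best = max(remaining, key=lambda a: expected_gain(a, observations))
        chosen.append(best)
        observations[best] = observe(best)
        remaining.remove(best)
    return chosen

# Toy usage: with gain equal to the action's index, greedy picks 5, 4, 3.
picks = adaptive_greedy([1, 2, 3, 4, 5], 3,
                        expected_gain=lambda a, obs: a,
                        observe=lambda a: a)
```

The k sequential rounds here are exactly the adaptivity the abstract's batch policy compresses into O(log n × log k) rounds.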