Bandit Task Assignment with Unknown Processing Time
This study considers a novel problem setting, referred to as bandit task assignment, that incorporates the processing time of each task in the bandit setting. In this problem setting, a player sequentially chooses a set of tasks to start so that the set of processing tasks satisfies a given combinatorial constraint. The reward and processing time of each task follow unknown distributions, and their realized values are revealed only after the task has been completed.
Partition Function
Algorithm 2 details the pseudocode for the partition function used in LaMCTS, which we use in LaP3 as well.

Algorithm 2 Partition Function
1: Input: Input space $\Omega$, samples $S_t$, node partition threshold $N_{\text{thres}}$, partitioning latent model $s(x)$
2: Set $V_0 = \{\Omega\}$
3: Set $V_{\text{queue}} = \{\Omega\}$
4: while $V_{\text{queue}} \neq \emptyset$ do
5:   $\Omega_p \leftarrow V_{\text{queue}}.\text{pop}(0)$

It is clear that $F_k(y)$ is a monotonically decreasing function with $F_k(0) = 1$ and $\lim_{y \to +\infty} F_k(y) = 0$. Here we assume it is strictly decreasing so that $F_k(y)$ has a well-defined inverse function $F_k^{-1}$. In the following, we will omit the subscript $k$ for brevity.

$$P[f(x_i) \le g - y \mid x_i \in \Omega_k] \tag{4}$$
$$\overset{\text{①}}{=} F_k^{n_t}(y) \tag{5}$$

Note that ① is due to the fact that all samples $x_1, \ldots, x_{n_t}$ are independently drawn within the region $\Omega_k$.
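The role of independence in this step can be checked numerically: for independently drawn samples, the probability that all $n_t$ of them fall at or below a threshold equals the single-sample probability raised to the power $n_t$. The following is a minimal sketch under illustrative assumptions that are not from the source: samples are uniform on $[0, 1]$, $f(x) = x$, and $g = 1$, so that $F(y) = 1 - y$ for $y \in [0, 1]$. The function names `empirical_all_below` and `closed_form` are hypothetical.

```python
import random


def empirical_all_below(n_samples, y, trials=20000, seed=0):
    """Monte Carlo estimate of P[f(x_i) <= g - y for all i].

    Assumes (for illustration only) f(x) = x with x ~ Uniform[0, 1],
    so g = 1 and the threshold is 1 - y.
    """
    rng = random.Random(seed)
    threshold = 1.0 - y
    hits = 0
    for _ in range(trials):
        # Draw n_samples independent points; count the trial only if
        # every one of them lies at or below the threshold.
        if all(rng.random() <= threshold for _ in range(n_samples)):
            hits += 1
    return hits / trials


def closed_form(n_samples, y):
    """F(y)^n with F(y) = 1 - y under the uniform assumption above."""
    return (1.0 - y) ** n_samples


if __name__ == "__main__":
    for n in (1, 5, 20):
        print(n, empirical_all_below(n, 0.1), closed_form(n, 0.1))
```

Under these assumptions the empirical frequency and the closed form should agree up to Monte Carlo error, and both shrink as the number of samples grows, which is the behavior the product form relies on.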
Two-Sided Bounds for Entropic Optimal Transport via a Rate-Distortion Integral
We show that the maximum expected inner product between a random vector and the standard normal vector, taken over all couplings subject to a mutual information constraint or regularization, is equivalent to a truncated integral involving the rate-distortion function, up to universal multiplicative constants. The proof is based on a lifting technique, which constructs a Gaussian process indexed by a random subset of the type class of the probability distribution involved in the information-theoretic inequality and then applies a form of the majorizing measure theorem.