Efficient Algorithms with Performance Guarantees for the Stochastic Multiple-Choice Knapsack Problem
Tran-Thanh, Long (University of Southampton) | Xia, Yingce (University of Science and Technology of China) | Qin, Tao (Microsoft Research) | Jennings, Nicholas R (University of Southampton)
We study the stochastic multiple-choice knapsack problem, in which a set of K items, whose values and weights are random variables, arrives at the system at each time step, and a decision maker has to choose at most one item to put into the knapsack without exceeding its capacity. The goal of the decision maker is to maximise the total expected value of the chosen items with respect to the knapsack capacity and a finite time horizon. We provide the first comprehensive theoretical analysis of the problem. In particular, we propose OPT-S-MCKP, the first algorithm that achieves optimality when the value-weight distributions are known. This algorithm also enjoys O(sqrt{T}) performance loss, where T is the finite time horizon, in the unknown value-weight distributions scenario. We further develop two novel approximation methods, FR-S-MCKP and G-S-MCKP, and we prove that FR-S-MCKP achieves O(sqrt{T}) performance loss in both the known and unknown value-weight distribution cases, while enjoying polynomial computational complexity per time step. On the other hand, G-S-MCKP does not have theoretical guarantees, but it still provides good performance in practice with linear running time.
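A minimal formalisation of the objective described in the abstract, using notation assumed here rather than taken from the paper (v_{t,k} and w_{t,k} for the random value and weight of item k at time step t, C for the knapsack capacity, and binary decision variables x_{t,k}):

\max_{x}\; \mathbb{E}\!\left[\sum_{t=1}^{T}\sum_{k=1}^{K} v_{t,k}\, x_{t,k}\right]
\quad \text{s.t.} \quad \sum_{t=1}^{T}\sum_{k=1}^{K} w_{t,k}\, x_{t,k} \le C,
\qquad \sum_{k=1}^{K} x_{t,k} \le 1 \;\;\forall t,
\qquad x_{t,k} \in \{0,1\},

where x_{t,k} = 1 means item k is placed into the knapsack at step t, and each decision must be made online with only the information revealed up to that step.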
Knapsack Based Optimal Policies for Budget–Limited Multi–Armed Bandits
Tran-Thanh, Long (University of Southampton) | Chapman, Archie (The University of Sydney) | Rogers, Alex (University of Southampton) | Jennings, Nicholas R (University of Southampton)
In budget–limited multi–armed bandit (MAB) problems, the learner's actions are costly and constrained by a fixed budget. Consequently, an optimal exploitation policy may not be to pull the optimal arm repeatedly, as is the case in other variants of MAB, but rather to pull the sequence of different arms that maximises the agent's total reward within the budget. This difference from existing MABs means that new approaches to maximising the total reward are required. Given this, we develop two pulling policies, namely: (i) KUBE; and (ii) fractional KUBE. Whereas the former provides better performance, up to 40% in our experimental settings, the latter is computationally less expensive. We also prove logarithmic upper bounds for the regret of both policies, and show that these bounds are asymptotically optimal (i.e. they only differ from the best possible regret by a constant factor).
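As a minimal formalisation of the setting described above (with symbols assumed here rather than taken from the abstract: r_{a_t} and c_{a_t} for the reward and cost of pulling arm a_t, and B for the fixed budget), the learner seeks a pulling policy that solves

\max_{a_1, a_2, \ldots}\; \mathbb{E}\!\left[\sum_{t} r_{a_t}\right]
\quad \text{s.t.} \quad \sum_{t} c_{a_t} \le B,

so that, unlike in standard MABs, the optimal policy may mix arms with different cost-reward trade-offs rather than repeatedly pull a single best arm.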