weitzman
Weitzman's Rule for Pandora's Box with Correlations
Pandora's Box is a central problem in decision making under uncertainty that can model various real-life scenarios. In this problem we are given n boxes, each with a fixed opening cost and an unknown value drawn from a known distribution, which is revealed only if we pay the opening cost. Our goal is to find a strategy for opening boxes that minimizes the sum of the value selected and the opening cost paid. In this work we revisit Pandora's Box when the value distributions are correlated, a setting first studied in [CGT+20]. We show that the optimal algorithm for the independent case, given by Weitzman's rule, directly works for the correlated case. In fact, our algorithm results in significantly improved approximation guarantees compared to the previous work, while also being substantially simpler. We also show how to implement the rule given only sample access to the correlated distribution of values. Specifically, we find that a number of samples that is polynomial in the number of boxes is sufficient for the algorithm to work.
- North America > United States > Colorado > Boulder County > Boulder (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
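As a rough illustration of the rule the abstract above refers to, the following minimal sketch implements Weitzman's rule for the cost-minimization version with independent, finite-support distributions. It does not attempt the correlated variants or the sample-based implementation studied in the paper. The function names (`reservation_value`, `weitzman_min`) and the `(value, probability)` representation are assumptions made for this example. Each box gets a reservation value sigma solving E[max(sigma - v, 0)] = cost; boxes are opened in increasing order of sigma, and the search stops once the smallest value seen is no larger than the next reservation value.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: a distribution is a list of (value, probability) pairs.
Dist = Sequence[Tuple[float, float]]


def reservation_value(dist: Dist, cost: float, tol: float = 1e-9) -> float:
    """Find sigma with E[max(sigma - v, 0)] = cost by bisection."""
    def expected_saving(sigma: float) -> float:
        return sum(p * max(sigma - v, 0.0) for v, p in dist)

    lo = min(v for v, _ in dist)          # expected saving is 0 here
    hi = max(v for v, _ in dist) + cost   # expected saving is at least cost here
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if expected_saving(mid) < cost:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def weitzman_min(dists: Sequence[Dist], costs: Sequence[float],
                 realized: Sequence[float]) -> Tuple[float, List[int]]:
    """Run the rule on one realization; return (total cost, opened box indices)."""
    sigmas = [reservation_value(d, c) for d, c in zip(dists, costs)]
    order = sorted(range(len(dists)), key=lambda i: sigmas[i])
    best = float("inf")   # smallest value seen so far
    paid = 0.0            # opening costs paid so far
    opened: List[int] = []
    for i in order:
        if best <= sigmas[i]:
            break         # no remaining box is worth its opening cost
        paid += costs[i]
        best = min(best, realized[i])
        opened.append(i)
    return best + paid, opened
```

For example, a box that is 0 or 10 with equal probability and cost 1 has reservation value 2, while a box that is always 3 with cost 1 has reservation value 4, so the first box is opened first and the second only if the first turns out to be 10.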
Optimal Stopping vs Best-of-$N$ for Inference Time Optimization
Kalayci, Yusuf, Raman, Vinod, Dughmi, Shaddin
Large language model (LLM) generation often requires balancing output quality against inference cost, especially when using multiple generations. We introduce a new framework for inference-time optimization based on the classical Pandora's Box problem. Viewing each generation as opening a costly "box" with random reward, we develop algorithms that decide when to stop generating without knowing the underlying reward distribution. Our first contribution is a UCB-style Pandora's Box algorithm, which achieves performance that is provably close to Weitzman's algorithm, the optimal strategy when the distribution is known. We further adapt this method to practical LLM settings by addressing reward scaling across prompts via a Bradley-Terry inspired transformation. This leads to an adaptive inference-time optimization method that normalizes rewards and learns stopping thresholds on the fly. Experiments on the AlpacaFarm and HH-RLHF datasets, using multiple LLM-reward model pairs, show that our adaptive strategy can obtain the same performance as non-adaptive Best-of-N sampling while requiring 15-35 percent fewer generations on average. Our results establish a principled bridge between optimal stopping theory and inference-time scaling, providing both theoretical performance bounds and practical efficiency gains for LLM deployment.
- North America > United States > California (0.14)
- North America > United States > Michigan (0.04)
- Asia > Middle East > Jordan (0.04)
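To make the "when to stop generating" idea above concrete, here is a minimal sketch of threshold-based stopping for identical boxes in the reward-maximization setting. It is not the paper's UCB-style algorithm and does not include the Bradley-Terry inspired normalization: it assumes the reward distribution is already known through a list of sample rewards, and it assumes a fixed positive per-generation cost. The names `stopping_threshold` and `generate_until_threshold` are hypothetical. The threshold tau solves mean(max(r - tau, 0)) = cost, and generation stops as soon as the best reward so far reaches tau or a generation budget is exhausted.

```python
from typing import Callable, Sequence, Tuple


def stopping_threshold(rewards: Sequence[float], cost: float,
                       tol: float = 1e-9) -> float:
    """Find tau with mean(max(r - tau, 0)) = cost over the sample rewards.

    Assumes cost > 0 and a non-empty list of rewards.
    """
    def expected_gain(tau: float) -> float:
        return sum(max(r - tau, 0.0) for r in rewards) / len(rewards)

    lo = min(rewards) - cost   # expected gain is at least cost here
    hi = max(rewards)          # expected gain is 0 here
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if expected_gain(mid) > cost:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def generate_until_threshold(sample_reward: Callable[[], float], tau: float,
                             max_generations: int) -> Tuple[float, int]:
    """Draw rewards until the best one reaches tau or the budget runs out."""
    best = float("-inf")
    used = 0
    while used < max_generations and best < tau:
        best = max(best, sample_reward())
        used += 1
    return best, used
```

With `max_generations` set to N, this never uses more generations than Best-of-N and stops earlier whenever a sufficiently good reward appears; how much is saved in practice depends on the reward distribution and the cost.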
Contextual Pandora's Box
Atsidakou, Alexia, Caramanis, Constantine, Gergatsouli, Evangelia, Papadigenopoulos, Orestis, Tzamos, Christos
Pandora's Box is a fundamental stochastic optimization problem, where the decision-maker must find a good alternative while minimizing the search cost of exploring the value of each alternative. In the original formulation, it is assumed that accurate distributions are given for the values of all the alternatives, while recent work studies the online variant of Pandora's Box where the distributions are originally unknown. In this work, we study Pandora's Box in the online setting, while incorporating context. At every round, we are presented with a number of alternatives each having a context, an exploration cost and an unknown value drawn from an unknown distribution that may change at every round. Our main result is a no-regret algorithm that performs comparably well to the optimal algorithm which knows all prior distributions exactly. Our algorithm works even in the bandit setting where the algorithm never learns the values of the alternatives that were not explored. The key technique that enables our result is a novel modification of the realizability condition in contextual bandits that connects a context to a sufficient statistic of each alternative's distribution (its "reservation value") rather than its mean.
- North America > United States > California > Los Angeles County > Long Beach (0.14)
- Europe > France > Île-de-France > Paris > Paris (0.04)
- North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
A Chess Novice Challenged Magnus Carlsen. He Had One Month to Train.
Max was not very good at chess himself. He's a 24-year-old entrepreneur who lives in San Francisco and plays the sport occasionally to amuse himself. He was a prototypical amateur. Now he was preparing himself for a match against chess royalty. And he believed he could win. The unlikely series of events that brought him to this stage began last year, when Max challenged himself to a series of monthly tasks that were ambitious bordering on absurd. He memorized the order of a shuffled deck of cards. He solved a Rubik's Cube in 17 seconds. He developed perfect musical pitch and landed a standing back-flip. He studied enough Hebrew to discuss the future of technology for a half-hour. Max, a self-diagnosed obsessive learner, wanted his goals to be so lofty that he would fail to reach some. He knew from the beginning of his peculiar year that the hardest challenge would come in October: defeating Magnus Carlsen in a game of chess.
- North America > United States > California > San Francisco County > San Francisco (0.24)
- North America > United States > New York > Westchester County (0.04)
- North America > United States > Nevada > Clark County > Las Vegas (0.04)
- Europe > Norway (0.04)