inventory constraint
A Minimax-MDP Framework with Future-imposed Conditions for Learning-augmented Problems
Chen, Xin, Chen, Yuze, Zhou, Yuan
We study a class of sequential decision-making problems with augmented predictions, potentially provided by a machine learning algorithm. In this setting, the decision-maker receives prediction intervals for unknown parameters that become progressively refined over time, and seeks decisions that are competitive with the hindsight optimal under all possible realizations of both parameters and predictions. We propose a minimax Markov Decision Process (minimax-MDP) framework, where the system state consists of an adversarially evolving environment state and an internal state controlled by the decision-maker. We introduce a set of future-imposed conditions that characterize the feasibility of minimax-MDPs and enable the design of efficient, often closed-form, robustly competitive policies. We illustrate the framework through three applications: multi-period inventory ordering with refining demand predictions, resource allocation with uncertain utility functions, and a multi-phase extension of the minimax-MDP applied to the inventory problem with time-varying ordering costs. Our results provide a tractable and versatile approach to robust online decision-making under predictive uncertainty.
Learning Collusion in Episodic, Inventory-Constrained Markets
Friedrich, Paul, Pรกsztor, Barna, Ramponi, Giorgia
Pricing algorithms have demonstrated the capability to learn tacit collusion that is largely unaddressed by current regulations. Their increasing use in markets, including oligopolistic industries with a history of collusion, calls for closer examination by competition authorities. In this paper, we extend the study of tacit collusion in learning algorithms from basic pricing games to more complex markets characterized by perishable goods with fixed supply and sell-by dates, such as airline tickets, perishables, and hotel rooms. We formalize collusion within this framework and introduce a metric based on price levels under both the competitive (Nash) equilibrium and collusive (monopolistic) optimum. Since no analytical expressions for these price levels exist, we propose an efficient computational approach to derive them. Through experiments, we demonstrate that deep reinforcement learning agents can learn to collude in this more complex domain. Additionally, we analyze the underlying mechanisms and structures of the collusive strategies these agents adopt.
A Re-solving Heuristic for Dynamic Assortment Optimization with Knapsack Constraints
Chen, Xi, Liu, Mo, Wang, Yining, Zhou, Yuan
In this paper, we consider a multi-stage dynamic assortment optimization problem with multi-nomial choice modeling (MNL) under resource knapsack constraints. Given the current resource inventory levels, the retailer makes an assortment decision at each period, and the goal of the retailer is to maximize the total profit from purchases. With the exact optimal dynamic assortment solution being computationally intractable, a practical strategy is to adopt the re-solving technique that periodically re-optimizes deterministic linear programs (LP) arising from fluid approximation. However, the fractional structure of MNL makes the fluid approximation in assortment optimization highly non-linear, which brings new technical challenges. To address this challenge, we propose a new epoch-based re-solving algorithm that effectively transforms the denominator of the objective into the constraint. Theoretically, we prove that the regret (i.e., the gap between the resolving policy and the optimal objective of the fluid approximation) scales logarithmically with the length of time horizon and resource capacities.
Learning with Posterior Sampling for Revenue Management under Time-varying Demand
Shimizu, Kazuma, Honda, Junya, Ito, Shinji, Nakadai, Shinji
This paper discusses the revenue management (RM) problem to maximize revenue by pricing items or services. One challenge in this problem is that the demand distribution is unknown and varies over time in real applications such as airline and retail industries. In particular, the time-varying demand has not been well studied under scenarios of unknown demand due to the difficulty of jointly managing the remaining inventory and estimating the demand. To tackle this challenge, we first introduce an episodic generalization of the RM problem motivated by typical application scenarios. We then propose a computationally efficient algorithm based on posterior sampling, which effectively optimizes prices by solving linear programming. We derive a Bayesian regret upper bound of this algorithm for general models where demand parameters can be correlated between time periods, while also deriving a regret lower bound for generic algorithms. Our empirical study shows that the proposed algorithm performs better than other benchmark algorithms and comparably to the optimal policy in hindsight. We also propose a heuristic modification of the proposed algorithm, which further efficiently learns the pricing policy in the experiments.
Demand Balancing in Primal-Dual Optimization for Blind Network Revenue Management
This paper proposes a practically efficient algorithm with optimal theoretical regret which solves the classical network revenue management (NRM) problem with unknown, nonparametric demand. Over a time horizon of length $T$, in each time period the retailer needs to decide prices of $N$ types of products which are produced based on $M$ types of resources with unreplenishable initial inventory. When demand is nonparametric with some mild assumptions, Miao and Wang (2021) is the first paper which proposes an algorithm with $O(\text{poly}(N,M,\ln(T))\sqrt{T})$ type of regret (in particular, $\tilde O(N^{3.5}\sqrt{T})$ plus additional high-order terms that are $o(\sqrt{T})$ with sufficiently large $T\gg N$). In this paper, we improve the previous result by proposing a primal-dual optimization algorithm which is not only more practical, but also with an improved regret of $\tilde O(N^{3.25}\sqrt{T})$ free from additional high-order terms. A key technical contribution of the proposed algorithm is the so-called demand balancing, which pairs the primal solution (i.e., the price) in each time period with another price to offset the violation of complementary slackness on resource inventory constraints. Numerical experiments compared with several benchmark algorithms further illustrate the effectiveness of our algorithm.
MNL-Bandit with Knapsacks
Aznag, Abdellah, Goyal, Vineet, Perivier, Noemie
We consider a dynamic assortment selection problem where a seller has a fixed inventory of $N$ substitutable products and faces an unknown demand that arrives sequentially over $T$ periods. In each period, the seller needs to decide on the assortment of products (of cardinality at most $K$) to offer to the customers. The customer's response follows an unknown multinomial logit model (MNL) with parameters $v$. The goal of the seller is to maximize the total expected revenue given the fixed initial inventory of $N$ products. We give a policy that achieves a regret of $\tilde O\Big(K \sqrt{KN T}\Big(\sqrt{v_{\text{max}}} + \frac{1}{q_{\text{min}}}\text{OPT}\Big)\Big)$, where $v_{\text{max}}\leq 1$ is the maximum utility for any product and $q_{\text{min}}$ the minimum inventory level, under a mild assumption on the model parameters. In particular, our policy achieves a near-optimal $\tilde O(\sqrt{T})$ regret in a large-inventory setting. Our policy builds upon the UCB-based approach for MNL-bandit without inventory constraints in [1] and addresses the inventory constraints through an exponentially sized LP for which we present a tractable approximation while keeping the $\tilde O(\sqrt{T})$ regret bound.
Network Revenue Management with Limited Switches: Known and Unknown Demand Distributions
Simchi-Levi, David, Xu, Yunzong, Zhao, Jinglong
This work is motivated by a practical concern from our retail partner. While they respect the advantages of dynamic pricing, they must limit the number of price changes to be within some constant. We study the classical price-based network revenue management problem, where a retailer has finite initial inventory of multiple resources to sell over a finite time horizon. We consider both known and unknown distribution settings, and derive policies that have the best-possible asymptotic performance in both settings. Our results suggest an intrinsic difference between the expected revenue associated with how many switches are allowed, which further depends on the number of resources. Our results are also the first to show a separation between the regret bounds associated with different number of resources.
A Primal-dual Learning Algorithm for Personalized Dynamic Pricing with an Inventory Constraint
Chen, Ningyuan, Gallego, Guillermo
A firm is selling a product to different types (based on the features such as education backgrounds, ages, etc.) of customers over a finite season with non-replenishable initial inventory. The type label of an arriving customer can be observed but the demand function associated with each type is initially unknown. The firm sets personalized prices dynamically for each type and attempts to maximize the revenue over the season. We provide a learning algorithm that is near-optimal when the demand and capacity scale in proportion. The algorithm utilizes the primal-dual formulation of the problem and learns the dual optimal solution explicitly. It allows the algorithm to overcome the curse of dimensionality (the rate of regret is independent of the number of types) and sheds light on novel algorithmic designs for learning problems with resource constraints.