Talluri, Kalyan
Understanding the Training and Generalization of Pretrained Transformer for Sequential Decision Making
Wang, Hanzhao, Pan, Yu, Sun, Fupeng, Liu, Shang, Talluri, Kalyan, Chen, Guanting, Li, Xiaocheng
In this paper, we consider the supervised pretrained transformer for a class of sequential decision-making problems. The class of considered problems is a subset of the general formulation of reinforcement learning in that there is no transition probability matrix; the class of problems covers bandits, dynamic pricing, and newsvendor problems as special cases. Such a structure enables the use of optimal actions/decisions in the pretraining phase, and this usage also provides new insights for the training and generalization of the pretrained transformer. We first note that the training of the transformer model can be viewed as a performative prediction problem, and that the existing methods and theories largely ignore or cannot resolve the resulting out-of-distribution issue. We propose a natural solution that includes the transformer-generated action sequences in the training procedure, and it enjoys better properties both numerically and theoretically. The availability of the optimal actions in the considered tasks also allows us to analyze the properties of the pretrained transformer as an algorithm, and to explain why it may lack exploration and how this can be automatically resolved. Numerically, we categorize the advantages of the pretrained transformer over structured algorithms such as UCB and Thompson sampling into three cases: (i) it better utilizes the prior knowledge in the pretraining data; (ii) it can elegantly handle the misspecification issue suffered by the structured algorithms; (iii) for short time horizons such as $T\le 50$, it behaves more greedily and enjoys much better regret than the structured algorithms, which are designed for asymptotic optimality.
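The structured baselines the abstract compares against (UCB, Thompson sampling) are standard bandit algorithms. As a point of reference, a minimal UCB1 implementation for a Bernoulli bandit — an illustrative sketch only, not the paper's pretrained transformer — looks like:

```python
import numpy as np

def ucb_bandit(means, T, seed=0):
    """Run UCB1 on a Bernoulli bandit and return cumulative regret.

    means: true arm means (unknown to the algorithm); T: horizon.
    Illustrative baseline sketch, not the paper's method.
    """
    rng = np.random.default_rng(seed)
    K = len(means)
    counts = np.zeros(K)
    sums = np.zeros(K)
    regret = 0.0
    for t in range(T):
        if t < K:                      # initialization: play each arm once
            a = t
        else:                          # optimism: empirical mean + confidence bonus
            ucb = sums / counts + np.sqrt(2 * np.log(t + 1) / counts)
            a = int(np.argmax(ucb))
        r = float(rng.random() < means[a])
        counts[a] += 1
        sums[a] += r
        regret += max(means) - means[a]
    return regret

print(ucb_bandit([0.3, 0.5], T=50))
```

On such a short horizon ($T = 50$), the confidence bonus forces exploration of the inferior arm, which is exactly the regime where the abstract reports that the more greedy pretrained transformer can do better.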
Transformer Choice Net: A Transformer Neural Network for Choice Prediction
Wang, Hanzhao, Li, Xiaocheng, Talluri, Kalyan
Firms are interested in understanding the choice behavior of their customers as well as forecasting the sales of their items. When customers choose at most one item per shopping instance, discrete-choice models estimate the probability of the choice, either at the segment level or the individual customer level, based on a latent utility function of the features of the item, the customer, and the provided assortment. However, there are many situations where customers choose multiple items in a single shopping instance, either from the same category or across categories. The firm may observe only the final choices made by the customer (as in physical retail) or also the precise sequence of those choices (as in an e-commerce setting). Multi-choice models are used for the former case, to estimate the probability of choosing a subset of items, amongst all possible subsets of the given assortment, considering potential interactions amongst the items and their features. Sequential choice models consider the sequence of choices, taking into account not only the item and customer features but also what the customer has chosen up to that point to predict the subsequent choice(s). Modeling and predicting the choice probabilities for these situations is challenging: the complexity of the sequential and multi-choice models is considerably greater than in the single-choice case because of the combinatorial explosion in the number of possible customer journeys and final choices, and consequently models for multiple choices are less widely adopted in practice. In this paper, we introduce the Transformer Choice Net, a neural network using the Transformer architecture (Vaswani et al., 2017), as a data-driven solution that works under any of the three models: single, sequential, and multiple.
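The key property a Transformer brings to choice prediction is that each item's score can depend on the whole offered assortment through attention, rather than on its own features alone as in a plain multinomial-logit utility. A minimal, hypothetical sketch of this idea (a single self-attention layer; not the paper's actual architecture) in NumPy:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_choice_probs(X, Wq, Wk, Wv, w_out):
    """Single-choice probabilities for an assortment via one self-attention layer.

    X: (n_items, d) item feature matrix. The attention step mixes information
    across the assortment, so each item's score is assortment-dependent.
    Hypothetical minimal sketch with random weights, not a trained model.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[1]), axis=-1)  # (n, n) attention weights
    H = A @ V                                            # context-aware item embeddings
    scores = H @ w_out                                   # scalar utility per item
    return softmax(scores)                               # choice distribution over assortment

rng = np.random.default_rng(0)
d, h, n = 4, 8, 5
X = rng.normal(size=(n, d))
p = attention_choice_probs(X, rng.normal(size=(d, h)), rng.normal(size=(d, h)),
                           rng.normal(size=(d, h)), rng.normal(size=h))
print(p)  # probabilities over the assortment, summing to 1
```

A sequential variant would rerun the scoring after each choice with the chosen items' embeddings added to the context; a multi-choice variant would score subsets rather than single items.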
A Neural Network Based Choice Model for Assortment Optimization
Wang, Hanzhao, Cai, Zhongze, Li, Xiaocheng, Talluri, Kalyan
Discrete-choice models are used in economics, marketing and revenue management to predict customer purchase probabilities, say as a function of prices and other features of the offered assortment. While these models have been shown to be expressive, capturing customer heterogeneity and behaviour, they are hard to estimate, often relying on many unobservables such as latent utilities; moreover, they still fail to capture many salient features of customer behaviour. A natural question then, given their success in other contexts, is whether neural networks can eliminate the necessity of carefully building a context-dependent customer behaviour model and hand-coding and tuning the estimation. It is unclear, however, how one would incorporate assortment effects into such a neural network, and how one would optimize the assortment with such a black-box generative model of choice probabilities. In this paper we first investigate whether a single neural network architecture can predict purchase probabilities for datasets from various contexts and generated under various models and assumptions. Next, we develop an assortment optimization formulation that is solvable by off-the-shelf integer programming solvers. We compare against a variety of benchmark discrete-choice models on simulated as well as real-world datasets, developing training tricks along the way to make the neural network prediction and subsequent optimization robust and comparable in performance to the alternatives.
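The assortment optimization problem underlying the abstract is: given a choice model, pick the subset of items maximizing expected revenue. The paper formulates this as an integer program over a neural choice model; as a self-contained illustration, the sketch below solves the same problem by brute-force enumeration under a classical MNL model (a stand-in for the network's predicted probabilities):

```python
from itertools import combinations
import math

def mnl_choice_probs(assortment, utilities):
    """MNL purchase probabilities, with a no-purchase option of utility 0."""
    expu = {i: math.exp(utilities[i]) for i in assortment}
    denom = 1.0 + sum(expu.values())
    return {i: expu[i] / denom for i in assortment}

def best_assortment(utilities, prices):
    """Brute-force expected-revenue maximization over all nonempty assortments.

    Illustrative stand-in for the paper's integer-programming formulation;
    with a neural choice model, the probabilities would come from the network,
    which is why a black-box model makes this step hard.
    """
    items = list(utilities)
    best, best_rev = (), 0.0
    for k in range(1, len(items) + 1):
        for S in combinations(items, k):
            probs = mnl_choice_probs(S, utilities)
            rev = sum(prices[i] * probs[i] for i in S)
            if rev > best_rev:
                best, best_rev = S, rev
    return best, best_rev

u = {"a": 1.0, "b": 0.5, "c": -0.5}   # hypothetical item utilities
p = {"a": 4.0, "b": 6.0, "c": 10.0}   # hypothetical prices
print(best_assortment(u, p))
```

Enumeration is exponential in the number of items, which is precisely why a tractable integer-programming reformulation matters for realistic assortment sizes.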
On Dynamic Pricing with Covariates
Wang, Hanzhao, Talluri, Kalyan, Li, Xiaocheng
We consider the dynamic pricing problem with covariates under a generalized linear demand model: a seller can dynamically adjust the price of a product over a horizon of $T$ time periods, and at each time period $t$, the demand for the product is jointly determined by the price and an observable covariate vector $x_t\in\mathbb{R}^d$ through an unknown generalized linear model. Most of the existing literature assumes the covariate vectors $x_t$ are independently and identically distributed (i.i.d.); the few papers that relax this assumption either sacrifice model generality or yield sub-optimal regret bounds. In this paper we show that a simple pricing algorithm has an $O(d\sqrt{T}\log T)$ regret upper bound without assuming any statistical structure on the covariates $x_t$ (which may even be chosen adversarially). The upper bound on the regret matches the lower bound (even under the i.i.d. assumption) up to logarithmic factors. Our paper thus shows that (i) the i.i.d. assumption is not necessary for obtaining low regret, and (ii) the regret bound can be independent of the (inverse) minimum eigenvalue of the covariance matrix of the $x_t$'s, a quantity present in previous bounds. Furthermore, we discuss a condition under which a better regret bound is achievable, and show how a Thompson sampling algorithm can be applied to compute the prices efficiently.
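To make the problem setup concrete, the sketch below simulates covariate-based dynamic pricing under a linear (rather than general GLM) demand specification: the seller refits the demand parameters by least squares each period and charges the myopically optimal price. This is an illustrative toy, with hypothetical forced-exploration and clipping choices, not the paper's algorithm or its regret guarantee:

```python
import numpy as np

def greedy_pricing(theta_star, T, d, seed=0):
    """Greedy least-squares pricing under linear demand with covariates.

    True demand at time t: D_t = x_t @ alpha - beta * p_t + noise, with
    theta_star = (alpha, beta) unknown to the seller. Returns total revenue.
    Illustrative sketch only (linear demand, ad hoc exploration/clipping).
    """
    rng = np.random.default_rng(seed)
    alpha, beta = theta_star
    X_hist, p_hist, D_hist = [], [], []
    revenue = 0.0
    for t in range(T):
        x = rng.normal(size=d)                 # observed covariate for period t
        if t < 2 * d:                          # short forced-exploration phase
            p = rng.uniform(0.5, 2.0)
        else:                                  # refit (alpha, beta), price myopically
            Z = np.column_stack([np.array(X_hist), -np.array(p_hist)])
            theta_hat, *_ = np.linalg.lstsq(Z, np.array(D_hist), rcond=None)
            a_hat, b_hat = theta_hat[:d], max(theta_hat[d], 1e-3)
            p = float(np.clip(x @ a_hat / (2 * b_hat), 0.5, 2.0))
        D = x @ alpha - beta * p + 0.1 * rng.normal()
        X_hist.append(x); p_hist.append(p); D_hist.append(D)
        revenue += p * max(D, 0.0)
    return revenue

print(greedy_pricing((np.ones(3), 0.5), T=200, d=3))
```

Because the covariates here are drawn i.i.d., this toy sidesteps the paper's main point: its bound holds with no statistical assumptions on the $x_t$ sequence at all.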