to

### On Dynamic Pricing with Covariates

We consider the dynamic pricing problem with covariates under a generalized linear demand model: a seller can dynamically adjust the price of a product over a horizon of $T$ time periods, and at each time period $t$, the demand of the product is jointly determined by the price and an observable covariate vector $x_t\in\mathbb{R}^d$ through an unknown generalized linear model. Most of the existing literature assumes the covariate vectors $x_t$'s are independently and identically distributed (i.i.d.); the few papers that relax this assumption either sacrifice model generality or yield sub-optimal regret bounds. In this paper we show that a simple pricing algorithm has an $O(d\sqrt{T}\log T)$ regret upper bound without assuming any statistical structure on the covariates $x_t$ (which can even be arbitrarily chosen). The upper bound on the regret matches the lower bound (even under the i.i.d. assumption) up to logarithmic factors. Our paper thus shows that (i) the i.i.d. assumption is not necessary for obtaining low regret, and (ii) the regret bound can be independent of the (inverse) minimum eigenvalue of the covariance matrix of the $x_t$'s, a quantity present in previous bounds. Furthermore, we discuss a condition under which a better regret is achievable and how a Thompson sampling algorithm can be applied to give an efficient computation of the prices.

### Dynamic Pricing with Demand Covariates

We consider a firm that sells products over $T$ periods without knowing the demand function. The firm sequentially sets prices to earn revenue and to learn the underlying demand function simultaneously. A natural heuristic for this problem, commonly used in practice, is greedy iterative least squares (GILS). At each time period, GILS estimates the demand as a linear function of the price by applying least squares to the set of prior prices and realized demands. Then a price that maximizes the revenue, given the estimated demand function, is used for the next time period. The performance is measured by the regret, which is the expected revenue loss from the optimal (oracle) pricing policy when the demand function is known. Recently, den Boer and Zwart (2014) and Keskin and Zeevi (2014) demonstrated that GILS is sub-optimal. They introduced algorithms which integrate forced price dispersion with GILS and achieve asymptotically optimal performance. In this paper, we consider this dynamic pricing problem in a data-rich environment. In particular, we assume that the firm knows the expected demand under a particular price from historical data, and in each period, before setting the price, the firm has access to extra information (demand covariates) which may be predictive of the demand. We prove that in this setting GILS achieves asymptotically optimal regret of order $\log(T)$. We also show the following surprising result: in the original dynamic pricing problem of den Boer and Zwart (2014) and Keskin and Zeevi (2014), inclusion of any set of covariates in GILS as potential demand covariates (even though they could carry no information) would make GILS asymptotically optimal. We validate our results via extensive numerical simulations on synthetic and real data sets.

### Distribution-free Contextual Dynamic Pricing

Contextual dynamic pricing aims to set personalized prices based on sequential interactions with customers. At each time period, a customer who is interested in purchasing a product comes to the platform. The customer's valuation for the product is a linear function of contexts, including product and customer features, plus some random market noise. The seller does not observe the customer's true valuation, but instead needs to learn the valuation by leveraging contextual information and historical binary purchase feedbacks. Existing models typically assume full or partial knowledge of the random noise distribution. In this paper, we consider contextual dynamic pricing with unknown random noise in the valuation model. Our distribution-free pricing policy learns both the contextual function and the market noise simultaneously. A key ingredient of our method is a novel perturbed linear bandit framework, where a modified linear upper confidence bound algorithm is proposed to balance the exploration of market noise and the exploitation of the current knowledge for better pricing. We establish the regret upper bound and a matching lower bound of our policy in the perturbed linear bandit framework and prove a sub-linear regret bound in the considered pricing problem. Finally, we demonstrate the superior performance of our policy on simulations and a real-life auto-loan dataset.

### Online Pricing with Offline Data: Phase Transition and Inverse Square Law

This paper investigates the impact of pre-existing offline data on online learning, in the context of dynamic pricing. We study a single-product dynamic pricing problem over a selling horizon of $T$ periods. The demand in each period is determined by the price of the product according to a linear demand model with unknown parameters. We assume that an incumbent price has been tested for $n$ periods in the offline stage before the start of the selling horizon, and the seller has collected $n$ demand observations under the incumbent price from the market. The seller wants to utilize both the pre-existing offline data and the sequential online data to minimize the regret of the online learning process. In the well-separated case where the absolute difference between the incumbent price and the optimal price $\delta$ is lower bounded by a known constant, we prove that the best achievable regret is $\tilde{\Theta}\left(\sqrt{T}\wedge (\frac{T}{n}\vee \log T)\right)$, and show that certain variants of the greedy policy achieve this bound. In the general case where $\delta$ is not necessarily lower bounded by a known constant, we prove that the best achievable regret is $\tilde{\Theta}\left(\sqrt{T}\wedge (\frac{T}{n\delta^2} \vee \frac{\log T}{\delta^2})\right)$, and construct a learning algorithm based on the "optimism in the face of uncertainty" principle, whose regret is optimal up to a logarithm factor. In both cases, our results reveal surprising transformations of the optimal regret rate with respect to the size of offline data, which we refer to as phase transitions. In addition, our result demonstrates that the shape of offline data, measured by $\delta$, also has an intrinsic effect on the optimal regret, and we quantify this effect via the inverse-square law.

### Privacy-Preserving Dynamic Personalized Pricing with Demand Learning

The prevalence of e-commerce has made detailed customers' personal information readily accessible to retailers, and this information has been widely used in pricing decisions. When involving personalized information, how to protect the privacy of such information becomes a critical issue in practice. In this paper, we consider a dynamic pricing problem over $T$ time periods with an \emph{unknown} demand function of posted price and personalized information. At each time $t$, the retailer observes an arriving customer's personal information and offers a price. The customer then makes the purchase decision, which will be utilized by the retailer to learn the underlying demand function. There is potentially a serious privacy concern during this process: a third party agent might infer the personalized information and purchase decisions from price changes from the pricing system. Using the fundamental framework of differential privacy from computer science, we develop a privacy-preserving dynamic pricing policy, which tries to maximize the retailer revenue while avoiding information leakage of individual customer's information and purchasing decisions. To this end, we first introduce a notion of \emph{anticipating} $(\varepsilon, \delta)$-differential privacy that is tailored to dynamic pricing problem. Our policy achieves both the privacy guarantee and the performance guarantee in terms of regret. Roughly speaking, for $d$-dimensional personalized information, our algorithm achieves the expected regret at the order of $\tilde{O}(\varepsilon^{-1} \sqrt{d^3 T})$, when the customers' information is adversarially chosen. For stochastic personalized information, the regret bound can be further improved to $\tilde{O}(\sqrt{d^2T} + \varepsilon^{-2} d^2)$