Tewari, Ambuj
Offline Policy Evaluation and Optimization under Confounding
Kausik, Chinmaya, Lu, Yangyi, Tan, Kevin, Makar, Maggie, Wang, Yixin, Tewari, Ambuj
Evaluating and optimizing policies in the presence of unobserved confounders is a problem of growing interest in offline reinforcement learning. Using conventional methods for offline RL in the presence of confounding can not only lead to poor decisions and poor policies, but also have disastrous effects in critical applications such as healthcare and education. We map out the landscape of offline policy evaluation for confounded MDPs, distinguishing assumptions on confounding based on whether they are memoryless and on their effect on the data-collection policies. We characterize settings where consistent value estimates are provably not achievable, and provide algorithms with guarantees to instead estimate lower bounds on the value. When consistent estimates are achievable, we provide algorithms for value estimation with sample complexity guarantees. We also present new algorithms for offline policy improvement and prove local convergence guarantees. Finally, we experimentally evaluate our algorithms on both a gridworld environment and a simulated healthcare setting of managing sepsis patients. In gridworld, our model-based method provides tighter lower bounds than existing methods, while in the sepsis simulator, our methods significantly outperform confounder-oblivious benchmarks.
Revisiting the Learnability of Apple Tasting
Raman, Vinod, Subedi, Unique, Raman, Ananth, Tewari, Ambuj
In online binary classification under \textit{apple tasting} feedback, the learner only observes the true label if it predicts "1". We revisit this classical partial-feedback setting, first studied by \cite{helmbold2000apple}, and study online learnability from a combinatorial perspective. We show that the Littlestone dimension continues to provide a tight quantitative characterization of apple tasting in the agnostic setting, closing an open question posed by \cite{helmbold2000apple}. In addition, we give a new combinatorial parameter, called the Effective width, that tightly quantifies the minimax expected number of mistakes in the realizable setting. As a corollary, we use the Effective width to establish a \textit{trichotomy} of the minimax expected number of mistakes in the realizable setting. In particular, we show that in the realizable setting, the expected number of mistakes of any learner under apple tasting feedback can only be $\Theta(1)$, $\Theta(\sqrt{T})$, or $\Theta(T)$.
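The apple tasting feedback protocol is simple to state in code. A minimal sketch of one round (all names are illustrative, not from the paper): the learner sees the true label only when it predicts 1, so a learner that always predicts 0 receives no feedback at all.

```python
import random

def apple_tasting_round(learner_pred, true_label):
    """One round of apple tasting feedback: the learner observes the
    true label only when it predicts 1 ("tastes the apple")."""
    observed = true_label if learner_pred == 1 else None
    mistake = int(learner_pred != true_label)
    return observed, mistake

# Toy run: always predicting 1 reveals every label (at the cost of
# every bad apple), while always predicting 0 yields no feedback.
random.seed(0)
labels = [random.randint(0, 1) for _ in range(10)]
feedback_always_1 = [apple_tasting_round(1, y)[0] for y in labels]
feedback_always_0 = [apple_tasting_round(0, y)[0] for y in labels]
```

This asymmetry between exploring (predicting 1) and staying blind (predicting 0) is what drives the $\Theta(\sqrt{T})$ regime in the trichotomy.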
A Characterization of Multioutput Learnability
Raman, Vinod, Subedi, Unique, Tewari, Ambuj
We consider the problem of learning multioutput function classes in batch and online settings. In both settings, we show that a multioutput function class is learnable if and only if each single-output restriction of the function class is learnable. This provides a complete characterization of the learnability of multilabel classification and multioutput regression in both batch and online settings. As an extension, we also consider multilabel learnability in the bandit feedback setting and show a similar characterization as in the full-feedback setting.
A Primal-Dual-Critic Algorithm for Offline Constrained Reinforcement Learning
Hong, Kihyuk, Li, Yuhang, Tewari, Ambuj
Offline constrained reinforcement learning (RL) aims to learn a decision making policy that performs well while satisfying safety constraints given a dataset of trajectories collected from historical experiments. It enjoys the benefits of offline RL [22]: not requiring interaction with the environment enables real-world applications where collecting interaction data is expensive (e.g., robotics [18, 23]) or dangerous (e.g., healthcare [30]). It also enjoys the benefits of constrained RL [1]: being able to specify constraints to the behavior of the agent enables real-world applications with safety concerns (e.g., smart grid [31], robotics [14]). Offline constrained RL with function approximation (e.g., neural networks) is of particular interest because function approximation can encode inductive biases to allow sample-efficient learning in large state spaces. As is the case for offline unconstrained RL [29, 32], offline constrained RL with function approximation requires two classes of assumptions for sample-efficient learning.
Sequence Length Independent Norm-Based Generalization Bounds for Transformers
Trauger, Jacob, Tewari, Ambuj
Since Vaswani et al. (2017) debuted the Transformer, it has become one of the preeminent architectures of its time. It has achieved state-of-the-art prediction capabilities in various fields (Dosovitskiy et al., 2020; Wu et al., 2022; Vaswani et al., 2017; Pettersson and Falkman, 2023) and an implementation of it has even passed the bar exam (Katz et al., 2023). With such widespread use, the theoretical underpinnings of this architecture are of great interest. Specifically, this paper is concerned with bounding the generalization error of the Transformer in supervised learning. Upper bounds of this kind help explain how sample size needs to scale with different architecture parameters, and they are a very common theoretical tool for understanding machine learning algorithms (Kakade et al., 2008; Garg et al., 2020; Truong, 2022; Lin and Zhang, 2019).
Amortized Variational Inference with Coverage Guarantees
Patel, Yash, McNamara, Declan, Loper, Jackson, Regier, Jeffrey, Tewari, Ambuj
Amortized variational inference produces a posterior approximation that can be rapidly computed given any new observation. Unfortunately, there are few guarantees about the quality of these approximate posteriors. We propose Conformalized Amortized Neural Variational Inference (CANVI), a procedure that is scalable, easily implemented, and provides guaranteed marginal coverage. Given a collection of candidate amortized posterior approximators, CANVI constructs conformalized predictors based on each candidate, compares the predictors using a metric known as predictive efficiency, and returns the most efficient predictor. CANVI ensures that the resulting predictor constructs regions that contain the truth with a user-specified level of probability. CANVI is agnostic to design decisions in formulating the candidate approximators and only requires access to samples from the forward model, permitting its use in likelihood-free settings. We prove lower bounds on the predictive efficiency of the regions produced by CANVI and explore how the quality of a posterior approximation relates to the predictive efficiency of prediction regions based on that approximation. Finally, we demonstrate the accurate calibration and high predictive efficiency of CANVI on a suite of simulation-based inference benchmark tasks and an important scientific task: analyzing galaxy emission spectra.
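CANVI's coverage guarantee rests on standard split-conformal calibration. A minimal sketch of that calibration step (a generic illustration with assumed names, not the CANVI procedure itself): nonconformity scores on a held-out calibration set determine a threshold, and regions of the form $\{y : \mathrm{score}(x, y) \le q\}$ then contain the truth with the user-specified marginal probability.

```python
import math

def conformal_quantile(scores, alpha):
    """Split-conformal threshold: the ceil((n+1)(1-alpha))-th smallest
    calibration nonconformity score.  Regions {y : score(x, y) <= q}
    then have marginal coverage at least 1 - alpha."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the threshold score
    if k > n:
        return float("inf")  # too few calibration points: region is everything
    return sorted(scores)[k - 1]

# Toy calibration set: absolute residuals as nonconformity scores.
cal_scores = [0.1, 0.4, 0.2, 0.9, 0.3, 0.5, 0.7, 0.6, 0.8]
q = conformal_quantile(cal_scores, alpha=0.2)
```

Predictive efficiency then compares candidate approximators by how small the resulting regions are, while the coverage guarantee holds for each of them.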
Conformal Contextual Robust Optimization
Patel, Yash, Rayan, Sahana, Tewari, Ambuj
Predict-then-optimize or contextual robust optimization problems are of long-standing interest in safety-critical settings where decision-making happens under uncertainty (Sun, Liu, and Li, 2023; Elmachtoub and Grigas, 2022; Elmachtoub, Liang, and McNellis, 2020; Peršak and Anjos, 2023). In traditional robust optimization, decisions are made to be robust to the distributions anticipated upon deployment (Ben-Tal, El Ghaoui, and Nemirovski, 2009; Beyer and Sendhoff, 2007). Since such decisions are sensitive to proper model specification, recent efforts have sought to supplant this with data-driven uncertainty regions (Cheramin et al., 2021; Bertsimas, Gupta, and Kallus, 2018; Shang and You, 2019; Johnstone and Cox, 2021). Model misspecification is ever more present in contextual robust optimization, spurring efforts to define similar data-driven uncertainty regions (Ohmori, 2021; Chenreddy, Bandi, and Delage, 2022; Sun, Liu, and Li, 2023). Such methods, however, focus on box- and ellipsoid-based uncertainty regions, both of which are necessarily convex and often overly conservative, resulting in suboptimal decision-making. Conformal prediction provides a principled framework for producing distribution-free prediction regions with marginal frequentist coverage guarantees (Angelopoulos and Bates, 2021; Shafer and Vovk, 2008).
On the Computational Complexity of Private High-dimensional Model Selection via the Exponential Mechanism
Roy, Saptarshi, Tewari, Ambuj
We consider the problem of model selection in a high-dimensional sparse linear regression model under the differential privacy framework. In particular, we consider the problem of differentially private best subset selection and study its utility guarantee. We adopt the well-known exponential mechanism for selecting the best model, and under a certain margin condition, we establish its strong model recovery property. However, the exponential search space of the exponential mechanism poses a serious computational bottleneck. To overcome this challenge, we propose a Metropolis-Hastings algorithm for the sampling step and show that it mixes to its stationary distribution in time polynomial in the problem parameters $n$, $p$, and $s$. Furthermore, we establish approximate differential privacy for the final estimates of the Metropolis-Hastings random walk using its mixing property. Finally, we perform illustrative simulations that echo the theoretical findings of our main results.
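The sampling step can be illustrated with a toy Metropolis-Hastings walk over size-$s$ subsets of $\{1, \dots, p\}$ targeting an exponential-mechanism distribution. The utility below (overlap with a hypothetical true support) is a stand-in for the paper's model-fit score, and all names are illustrative:

```python
import math
import random

def mh_subset_sampler(p, s, utility, epsilon, n_steps, rng):
    """Metropolis-Hastings walk over size-s subsets of {0, ..., p-1},
    targeting Pr[S] proportional to exp(epsilon * utility(S) / 2)
    (a sensitivity-1 utility).  Proposal: swap one selected index
    for one unselected index, which keeps |S| = s."""
    state = set(rng.sample(range(p), s))
    for _ in range(n_steps):
        out = rng.choice(sorted(state))
        into = rng.choice(sorted(set(range(p)) - state))
        proposal = (state - {out}) | {into}
        log_accept = epsilon * (utility(proposal) - utility(state)) / 2
        if rng.random() < math.exp(min(0.0, log_accept)):
            state = proposal
    return state

# Toy utility: overlap with a hypothetical true support {0, 1, 2}.
rng = random.Random(0)
true_support = {0, 1, 2}
u = lambda S: len(S & true_support)
sampled = mh_subset_sampler(p=10, s=3, utility=u, epsilon=20.0,
                            n_steps=500, rng=rng)
```

With a large privacy parameter the walk concentrates on high-utility subsets; the paper's contribution is showing that, under its conditions, polynomially many such steps suffice.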
An Asymptotically Optimal Algorithm for the Convex Hull Membership Problem
Qiao, Gang, Tewari, Ambuj
The multi-armed bandit (MAB) problem has been increasingly recognized as a fundamental and powerful framework in decision theory, in which an agent makes a series of decisions, pulling the arms of a $K$-armed slot machine in order to maximize the total reward. Each arm is associated with a fixed but unknown probability distribution [5, 31]. An enormous literature has accumulated on the MAB problem over the past decades, covering applications such as clinical trials and drug testing [6, 19], recommendation systems and online advertising [7, 9, 42, 51, 56], information retrieval [8, 37], and finance [24, 39, 40, 48]. From a theoretical perspective, the MAB problem was first studied in the seminal work [44], followed by a vast line of work on regret minimization [2, 4, 5, 10, 14, 18, 32, 34, 49, 53] and pure exploration [11, 22, 36, 46]. In this paper, we investigate the convex hull membership (CHM) problem in a pure exploration setting, where a learner sequentially performs experiments in a stochastic multi-armed bandit environment to identify, as efficiently and accurately as possible, whether a given point lies in the convex hull of the means of the $K$ arms. In particular, we are interested in the minimum expected number of samples required to identify convex hull membership with high probability (at least $1 - \delta$), without considering a reward/cost structure. The non-stochastic version of the convex hull membership problem is well studied in [20] and has attracted significant attention across scientific areas, with important applications in image processing [25, 55], robot motion planning [33, 50], and pattern recognition [27, 45].
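To make the membership question concrete, here is a deterministic check for the simplest planar case of $K = 3$ known means (an illustration of the non-stochastic question, not the paper's sequential algorithm): a point lies in the convex hull of three points exactly when its barycentric coordinates are all nonnegative.

```python
def in_triangle(p, a, b, c, tol=1e-12):
    """Non-stochastic membership check for K = 3 arms in the plane:
    p lies in conv{a, b, c} iff its barycentric coordinates with
    respect to the triangle (a, b, c) are all nonnegative.  The
    stochastic CHM problem replaces the known means a, b, c with
    arms whose means must be estimated from samples."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    l1 = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / det
    l2 = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / det
    l3 = 1.0 - l1 - l2
    return min(l1, l2, l3) >= -tol
```

The stochastic version is harder precisely because the arm means are unknown: the learner must decide how many samples of each arm suffice before this geometric test can be trusted.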
Online Infinite-Dimensional Regression: Learning Linear Operators
Raman, Vinod, Subedi, Unique, Tewari, Ambuj
We consider the problem of learning linear operators under squared loss between two infinite-dimensional Hilbert spaces in the online setting. We show that the class of linear operators with uniformly bounded $p$-Schatten norm is online learnable for any $p \in [1, \infty)$. On the other hand, we prove an impossibility result by showing that the class of uniformly bounded linear operators with respect to the operator norm is \textit{not} online learnable. Moreover, we show a separation between online uniform convergence and online learnability by identifying a class of bounded linear operators that is online learnable but for which online uniform convergence fails. Finally, we prove that the impossibility result and the separation between uniform convergence and learnability also hold in the agnostic PAC setting.
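For context, the $p$-Schatten norm is standard (this definition is not specific to the paper): for a compact operator $T$ between Hilbert spaces with singular values $(s_i(T))_{i \ge 1}$,
$$\|T\|_{S_p} = \Big( \sum_{i \ge 1} s_i(T)^p \Big)^{1/p}, \qquad p \in [1, \infty),$$
so $p = 1$ gives the trace norm, $p = 2$ the Hilbert-Schmidt norm, and the operator norm arises as the $p \to \infty$ limit, which is exactly the regime covered by the impossibility result.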