omega
Fair Secretaries with Unfair Predictions
Algorithms with predictions is a recent framework for decision-making under uncertainty that leverages the power of machine-learned predictions without making any assumption about their quality. The goal in this framework is for algorithms to achieve an improved performance when the predictions are accurate while maintaining acceptable guarantees when the predictions are erroneous. A serious concern with algorithms that use predictions is that these predictions can be biased and, as a result, cause the algorithm to make decisions that are deemed unfair. We show that this concern manifests itself in the classical secretary problem in the learning-augmented setting---the state-of-the-art algorithm can have zero probability of accepting the best candidate, which we deem unfair, despite promising to accept a candidate whose expected value is at least $\max\{\Omega (1), 1 - O(\varepsilon)\}$ times the optimal value, where $\varepsilon$ is the prediction error.We show how to preserve this promise while also guaranteeing to accept the best candidate with probability $\Omega(1)$. Our algorithm and analysis are based on a new ``pegging'' idea that diverges from existing works and simplifies/unifies some of their results. Finally, we extend to the $k$-secretary problem and complement our theoretical analysis with experiments.
Reinforcement Learning with LTL and \omega -Regular Objectives via Optimality-Preserving Translation to Average Rewards
Linear temporal logic (LTL) and, more generally, $\omega$-regular objectives are alternatives to the traditional discount sum and average reward objectives in reinforcement learning (RL), offering the advantage of greater comprehensibility and hence explainability. In this work, we study the relationship between these objectives. Our main result is that each RL problem for $\omega$-regular objectives can be reduced to a limit-average reward problem in an optimality-preserving fashion, via (finite-memory) reward machines. Furthermore, we demonstrate the efficacy of this approach by showing that optimal policies for limit-average problems can be found asymptotically by solving a sequence of discount-sum problems approximately. Consequently, we resolve an open problem: optimal policies for LTL and $\omega$-regular objectives can be learned asymptotically.
Approximating the Top Eigenvector in Random Order Streams
When rows of an $n \times d$ matrix $A$ are given in a stream, we study algorithms for approximating the top eigenvector of $A^T A$ (equivalently, the top right singular vector of $A$). We consider worst case inputs $A$ but assume that the rows are presented to the streaming algorithm in a uniformly random order. We show that when the gap parameter $R = \sigma_1(A)^2/\sigma_2(A)^2 = \Omega(1)$, then there is a randomized algorithm that uses $O(h \cdot d \cdot \text{polylog}(d))$ bits of space and outputs a unit vector $v$ that has a correlation $1 - O(1/\sqrt{R})$ with the top eigenvector $v_1$. Here $h$ denotes the number of ``heavy rows'' in the matrix, defined as the rows with Euclidean norm at least $\|{A}\|_F/\sqrt{d \cdot \text{polylog}(d)}$. We also provide a lower bound showing that any algorithm using $O(hd/R)$ bits of space can obtain at most $1 - \Omega(1/R^2)$ correlation with the top eigenvector. Thus, parameterizing the space complexity in terms of the number of heavy rows is necessary for high accuracy solutions.Our results improve upon the $R = \Omega(\log n \cdot \log d)$ requirement in a recent work of Price. We note that Price's algorithm works for arbitrary order streams whereas our algorithm requires a stronger assumption that the rows are presented in a uniformly random order. We additionally show that the gap requirements in Price's analysis can be brought down to $R = \Omega(\log^2 d)$ for arbitrary order streams and $R = \Omega(\log d)$ for random order streams. The requirement of $R = \Omega(\log d)$ for random order streams is nearly tight for Price's analysis as we obtain a simple instance with $R = \Omega(\log d/\log\log d)$ for which Price's algorithm, with any fixed learning rate, cannot output a vector approximating the top eigenvector $v_1$.
Nearly Minimax Optimal Regret for Multinomial Logistic Bandit
In this paper, we study the contextual multinomial logit (MNL) bandit problem in which a learning agent sequentially selects an assortment based on contextual information, and user feedback follows an MNL choice model.There has been a significant discrepancy between lower and upper regret bounds, particularly regarding the maximum assortment size $K$. Additionally, the variation in reward structures between these bounds complicates the quest for optimality. Under uniform rewards, where all items have the same expected reward, we establish a regret lower bound of $\Omega(d\sqrt{\smash[b]{T/K}})$ and propose a constant-time algorithm, OFU-MNL+, that achieves a matching upper bound of $\tilde{\mathcal{O}}(d\sqrt{\smash[b]{T/K}})$. We also provide instance-dependent minimax regret bounds under uniform rewards.Under non-uniform rewards, we prove a lower bound of $\Omega(d\sqrt{T})$ and an upper bound of $\tilde{\mathcal{O}}(d\sqrt{T})$, also achievable by OFU-MNL+. Our empirical studies support these theoretical findings. To the best of our knowledge, this is the first work in the contextual MNL bandit literature to prove minimax optimality --- for either uniform or non-uniform reward setting --- and to propose a computationally efficient algorithm that achieves this optimality up to logarithmic factors.
The Space Complexity of Approximating Logistic Loss
We provide space complexity lower bounds for data structures that approximate logistic loss up to $\epsilon$-relative error on a logistic regression problem with data $\mathbf{X} \in \mathbb{R}^{n \times d}$ and labels $\mathbf{y} \in \\{-1,1\\}^d$. The space complexity of existing coreset constructions depend on a natural complexity measure $\mu_\mathbf{y}(\mathbf{X})$. We give an $\tilde{\Omega}(\frac{d}{\epsilon^2})$ space complexity lower bound in the regime $\mu_\mathbf{y}(\mathbf{X}) = \mathcal{O}(1)$ that shows existing coresets are optimal in this regime up to lower order factors. We also prove a general $\tilde{\Omega}(d\cdot \mu_\mathbf{y}(\mathbf{X}))$ space lower bound when $\epsilon$ is constant, showing that the dependency on $\mu_\mathbf{y}(\mathbf{X})$ is not an artifact of mergeable coresets. Finally, we refute a prior conjecture that $\mu_\mathbf{y}(\mathbf{X})$ is hard to compute by providing an efficient linear programming formulation, and we empirically compare our algorithm to prior approximate methods.
How Does Variance Shape the Regret in Contextual Bandits?
We consider realizable contextual bandits with general function approximation, investigating how small reward variance can lead to better-than-minimax regret bounds. Unlike in minimax regret bounds, we show that the eluder dimension $d_{\text{elu}}$$-$a measure of the complexity of the function class$-$plays a crucial role in variance-dependent bounds. We consider two types of adversary: (1) Weak adversary: The adversary sets the reward variance before observing the learner's action. In this setting, we prove that a regret of $\Omega( \sqrt{ \min (A, d_{\text{elu}}) \Lambda } + d_{\text{elu}})$ is unavoidable when $d_{\text{elu}} \leq \sqrt{A T}$, where $A$ is the number of actions, $T$ is the total number of rounds, and $\Lambda$ is the total variance over $T$ rounds. For the $A\leq d_{\text{elu}}$ regime, we derive a nearly matching upper bound $\tilde{O}( \sqrt{ A\Lambda } + d_{\text{elu} })$ for the special case where the variance is revealed at the beginning of each round.
On the Sparsity of the Strong Lottery Ticket Hypothesis
Considerable research efforts have recently been made to show that a random neural network $N$ contains subnetworks capable of accurately approximating any given neural network that is sufficiently smaller than $N$, without any training. This line of research, known as the Strong Lottery Ticket Hypothesis (SLTH), was originally motivated by the weaker Lottery Ticket Hypothesis, which states that a sufficiently large random neural network $N$ contains sparse subnetworks that can be trained efficiently to achieve performance comparable to that of training the entire network $N$.Despite its original motivation, results on the SLTH have so far not provided any guarantee on the size of subnetworks.Such limitation is due to the nature of the main technical tool leveraged by these results, the Random Subset Sum (RSS) Problem.Informally, the RSS Problem asks how large a random i.i.d.
Online Learning with Sublinear Best-Action Queries
In online learning, a decision maker repeatedly selects one of a set of actions, with the goal of minimizing the overall loss incurred. Following the recent line of research on algorithms endowed with additional predictive features, we revisit this problem by allowing the decision maker to acquire additional information on the actions to be selected. In particular, we study the power of \emph{best-action queries}, which reveal beforehand the identity of the best action at a given time step. In practice, predictive features may be expensive, so we allow the decision maker to issue at most $k$ such queries.We establish tight bounds on the performance any algorithm can achieve when given access to $k$ best-action queries for different types of feedback models. In particular, we prove that in the full feedback model, $k$ queries are enough to achieve an optimal regret of $\Theta(\min\{\sqrt T, \frac{T}{k}\})$. This finding highlights the significant multiplicative advantage in the regret rate achievable with even a modest (sublinear) number $k \in \Omega(\sqrt{T})$ of queries. Additionally, we study the challenging setting in which the only available feedback is obtained during the time steps corresponding to the $k$ best-action queries.
Benign overfitting in leaky ReLU networks with moderate input dimension
The problem of benign overfitting asks whether it is possible for a model to perfectly fit noisy training data and still generalize well. We study benign overfitting in two-layer leaky ReLU networks trained with the hinge loss on a binary classification task. We consider input data which can be decomposed into the sum of a common signal and a random noise component, which lie on subspaces orthogonal to one another. We characterize conditions on the signal to noise ratio (SNR) of the model parameters giving rise to benign versus non-benign, or harmful, overfitting: in particular, if the SNR is high then benign overfitting occurs, conversely if the SNR is low then harmful overfitting occurs. We attribute both benign and non-benign overfitting to an approximate margin maximization property and show that leaky ReLU networks trained on hinge loss with gradient descent (GD) satisfy this property. In contrast to prior work we do not require the training data to be nearly orthogonal. Notably, for input dimension $d$ and training sample size $n$, while results in prior work require $d = \Omega(n^2 \log n)$, here we require only $d = \Omega(n)$.