Nearly Minimax Optimal Regret for Multinomial Logistic Bandit
In this paper, we study the contextual multinomial logit (MNL) bandit problem in which a learning agent sequentially selects an assortment based on contextual information, and user feedback follows an MNL choice model. There has been a significant discrepancy between lower and upper regret bounds, particularly regarding the maximum assortment size K. Additionally, the variation in reward structures between these bounds complicates the quest for optimality. Under uniform rewards, where all items have the same expected reward, we establish a regret lower bound of \Omega(d\sqrt{\smash[b]{T/K}}) and propose a constant-time algorithm, OFU-MNL, that achieves a matching upper bound of \tilde{\mathcal{O}}(d\sqrt{\smash[b]{T/K}}). We also provide instance-dependent minimax regret bounds under uniform rewards. Under non-uniform rewards, we prove a lower bound of \Omega(d\sqrt{T}) and an upper bound of \tilde{\mathcal{O}}(d\sqrt{T}), also achievable by OFU-MNL. Our empirical studies support these theoretical findings.
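To make the MNL choice model concrete, the following is a minimal sketch of how choice probabilities and expected reward could be computed for one assortment. It assumes the common formulation with an outside ("no purchase") option of utility 0; the function names and the example numbers are illustrative and are not taken from the paper.

```python
import math


def mnl_choice_probabilities(utilities):
    """Return (p_outside, [p_item, ...]) under an MNL model.

    Assumption: the outside option has utility 0, so its weight is exp(0) = 1.
    """
    # Subtract the largest exponent for numerical stability.
    m = max(0.0, max(utilities, default=0.0))
    weights = [math.exp(u - m) for u in utilities]
    outside = math.exp(-m)
    total = outside + sum(weights)
    return outside / total, [w / total for w in weights]


def expected_reward(utilities, rewards):
    """Expected reward of an assortment: sum of p_item * reward_item."""
    _, probs = mnl_choice_probabilities(utilities)
    return sum(p * r for p, r in zip(probs, rewards))


if __name__ == "__main__":
    # Hypothetical utilities (e.g. x_i^T w) for a three-item assortment.
    print(mnl_choice_probabilities([0.5, -0.2, 1.0]))
    # Uniform rewards: every item pays 1.
    print(expected_reward([0.5, -0.2, 1.0], [1.0, 1.0, 1.0]))
```

Under uniform rewards, the expected reward in this sketch equals the probability that any item is chosen, so enlarging the assortment can only increase it.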
Discretely beyond 1/e: Guided Combinatorial Algorithms for Submodular Maximization
For constrained, not necessarily monotone submodular maximization, all known approximation algorithms with ratio greater than 1/e require continuous ideas, such as queries to the multilinear extension of a submodular function and its gradient, which are typically expensive to simulate with the original set function. For combinatorial algorithms, the best known approximation ratios for both size and matroid constraint are obtained by a simple randomized greedy algorithm of Buchbinder et al. [9]: 1/e \approx 0.367 for size constraint and 0.281 for the matroid constraint in \mathcal{O}(kn) queries, where k is the rank of the matroid. In this work, we develop the first combinatorial algorithms to break the 1/e barrier: we obtain an approximation ratio of 0.385 in \mathcal{O}(kn) queries to the submodular set function for size constraint, and 0.305 for a general matroid constraint. These are achieved by guiding the randomized greedy algorithm with a fast local search algorithm. Further, we develop deterministic versions of these algorithms, maintaining the same ratio and asymptotic time complexity.
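For orientation, here is a minimal sketch of the baseline randomized greedy idea for a size constraint: in each of k rounds, pick uniformly at random among the k elements with the largest marginal gains, padding with "dummy" choices that add nothing. This is only the unguided baseline, not the guided algorithm of the abstract; the cut function used as the example objective and all names are illustrative assumptions.

```python
import random


def random_greedy(f, ground_set, k, rng=None):
    """Baseline randomized greedy for max f(S) subject to |S| <= k.

    `f` maps a set to a number and is assumed to be submodular.
    Uses O(k * n) evaluations of f.
    """
    rng = rng or random.Random()
    selected = set()
    for _ in range(k):
        base = f(selected)
        gains = sorted(
            ((f(selected | {e}) - base, e) for e in ground_set if e not in selected),
            key=lambda pair: pair[0],
            reverse=True,
        )
        # Keep the top-k elements with positive gain; pad with dummies (None).
        candidates = [e for gain, e in gains[:k] if gain > 0]
        candidates += [None] * (k - len(candidates))
        choice = rng.choice(candidates)
        if choice is not None:
            selected.add(choice)
    return selected


# Example objective (an assumption): the cut function of a 4-cycle,
# which is submodular but not monotone.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 0)]


def cut_value(subset):
    return sum(1 for u, v in EDGES if (u in subset) != (v in subset))


if __name__ == "__main__":
    solution = random_greedy(cut_value, [0, 1, 2, 3], k=2, rng=random.Random(0))
    print(solution, cut_value(solution))
```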
The Minimax Rate of HSIC Estimation for Translation-Invariant Kernels
Kernel techniques are among the most influential approaches in data science and statistics. Under mild conditions, the reproducing kernel Hilbert space associated to a kernel is capable of encoding the independence of M \ge 2 random variables. Probably the most widespread independence measure relying on kernels is the so-called Hilbert-Schmidt independence criterion (HSIC; also referred to as distance covariance in the statistics literature). Despite various existing HSIC estimators designed since its introduction close to two decades ago, the fundamental question of the rate at which HSIC can be estimated is still open. In this work, we prove that the minimax optimal rate of HSIC estimation on \mathbb{R}^d for Borel measures containing the Gaussians with continuous bounded translation-invariant characteristic kernels is \mathcal{O}\left(n^{-1/2}\right).
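As an illustration of what an HSIC estimator looks like, the sketch below computes the classical biased (V-statistic) estimate trace(KHLH)/n^2 for two scalar samples with a Gaussian kernel, which is one example of a translation-invariant kernel. It covers only the M = 2 case; the bandwidth, function names, and data are assumptions chosen for illustration.

```python
import math


def gaussian_gram(xs, bandwidth=1.0):
    """Gram matrix of the Gaussian kernel on scalar inputs."""
    return [
        [math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2)) for b in xs]
        for a in xs
    ]


def centered(gram):
    """Return H G H for a symmetric Gram matrix G, with H = I - (1/n) 11^T."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    total_mean = sum(row_means) / n
    return [
        [gram[i][j] - row_means[i] - row_means[j] + total_mean for j in range(n)]
        for i in range(n)
    ]


def hsic_v_statistic(xs, ys, bandwidth=1.0):
    """Biased HSIC estimate: trace(K H L H) / n^2."""
    n = len(xs)
    k_c = centered(gaussian_gram(xs, bandwidth))
    l_c = centered(gaussian_gram(ys, bandwidth))
    return sum(k_c[i][j] * l_c[i][j] for i in range(n) for j in range(n)) / n ** 2


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0]
    print(hsic_v_statistic(xs, xs))                    # dependent samples
    print(hsic_v_statistic(xs, [5.0, 5.0, 5.0, 5.0]))  # constant second sample
```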
Transformers are Minimax Optimal Nonparametric In-Context Learners
In-context learning (ICL) of large language models has proven to be a surprisingly effective method of learning a new task from only a few demonstrative examples. In this paper, we shed light on the efficacy of ICL from the viewpoint of statistical learning theory. We develop approximation and generalization error analyses for a transformer model composed of a deep neural network and one linear attention layer, pretrained on nonparametric regression tasks sampled from general function spaces including the Besov space and piecewise \gamma-smooth class. In particular, we show that sufficiently trained transformers can achieve, and even improve upon, the minimax optimal estimation risk in context by encoding the most relevant basis representations during pretraining. Our analysis extends to high-dimensional or sequential data and distinguishes the pretraining and in-context generalization gaps, establishing upper and lower bounds w.r.t.
SeeA*: Efficient Exploration-Enhanced A* Search by Selective Sampling
Monte-Carlo tree search (MCTS) and reinforcement learning contributed crucially to the success of AlphaGo and AlphaZero, and A* is among the best-known tree search algorithms in the classical AI literature. MCTS and A* both perform heuristic search and are mutually beneficial. Efforts have been made to revive A* from three possible aspects, two of which have been confirmed by studies in recent years, while the third concerns the OPEN list, which consists of the open nodes of A* search, and still lacks deep investigation. This paper addresses the third aspect by developing the Sampling-exploration enhanced A* (SeeA*) search, which constructs a dynamic subset of OPEN through a selective sampling process and expands the node with the best heuristic value in this subset rather than in the whole of OPEN. Nodes with the best heuristic values in OPEN are most likely to be picked into this subset, but sometimes may not be included, which enables SeeA* to explore other promising branches.
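The core idea, expanding the best node of a sampled subset of OPEN rather than the best node of OPEN itself, can be sketched as follows. Uniform sampling is used here purely as an assumption to keep the example short; the abstract does not specify the sampling scheme, and the function and parameter names are hypothetical.

```python
import random


def seea_star_sketch(start, goal, neighbors, heuristic, sample_size, rng=None):
    """Best-first search that expands the best node of a random subset of OPEN.

    `neighbors(node)` yields (next_node, edge_cost) pairs.
    Returns a path from start to goal (not necessarily optimal), or None.
    """
    rng = rng or random.Random()
    # Each OPEN entry is (f, g, node, path) with f = g + heuristic(node).
    open_list = [(heuristic(start), 0, start, [start])]
    closed = set()
    while open_list:
        # Sample a candidate subset of OPEN (uniformly, as an assumption).
        size = min(sample_size, len(open_list))
        candidate_indices = rng.sample(range(len(open_list)), size)
        # Expand the node with the lowest f-value inside the subset only.
        best_index = min(candidate_indices, key=lambda i: open_list[i][0])
        _, g, node, path = open_list.pop(best_index)
        if node == goal:
            return path
        if node in closed:
            continue
        closed.add(node)
        for nxt, cost in neighbors(node):
            if nxt not in closed:
                new_g = g + cost
                open_list.append((new_g + heuristic(nxt), new_g, nxt, path + [nxt]))
    return None


if __name__ == "__main__":
    graph = {0: [(1, 1), (2, 4)], 1: [(3, 5)], 2: [(3, 1)], 3: []}
    print(seea_star_sketch(0, 3, lambda n: graph[n], lambda n: 0, sample_size=1,
                           rng=random.Random(0)))
```

When `sample_size` is at least the size of OPEN, the subset is all of OPEN and the sketch behaves like ordinary best-first expansion on f-values; smaller values trade some greediness for exploration.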
Distributed Low-rank Matrix Factorization With Exact Consensus
Low-rank matrix factorization is a problem of broad importance, owing to the ubiquity of low-rank models in machine learning contexts. In spite of its non-convexity, this problem has a well-behaved geometric landscape, permitting local search algorithms such as gradient descent to converge to global minimizers. In this paper, we study low-rank matrix factorization in the distributed setting, where local variables at each node encode parts of the overall matrix factors, and consensus is encouraged among certain such variables. We identify conditions under which this new problem also has a well-behaved geometric landscape, and we propose an extension of distributed gradient descent (DGD) to solve this problem. The favorable landscape allows us to prove convergence to global optimality with exact consensus, a stronger result than what is provided by off-the-shelf DGD theory.
Learning Compositional Neural Programs with Recursive Tree Search and Planning
We propose a novel reinforcement learning algorithm, AlphaNPI, that incorporates the strengths of Neural Programmer-Interpreters (NPI) and AlphaZero. NPI contributes structural biases in the form of modularity, hierarchy and recursion, which are helpful to reduce sample complexity, improve generalization and increase interpretability. AlphaZero contributes powerful neural network guided search algorithms, which we augment with recursion. AlphaNPI only assumes a hierarchical program specification with sparse rewards: 1 when the program execution satisfies the specification, and 0 otherwise. This specification enables us to overcome the need for strong supervision in the form of execution traces and consequently train NPI models effectively with reinforcement learning.
Stochastic Zeroth-Order Optimization under Strongly Convexity and Lipschitz Hessian: Minimax Sample Complexity
Optimization of convex functions under stochastic zeroth-order feedback has been a major and challenging question in online learning. In this work, we consider the problem of optimizing second-order smooth and strongly convex functions where the algorithm has access only to noisy evaluations of the objective function at the points it queries. We provide the first tight characterization for the rate of the minimax simple regret by developing matching upper and lower bounds. We propose an algorithm that features a combination of a bootstrapping stage and a mirror-descent stage. Our main technical innovation consists of a sharp characterization for the spherical-sampling gradient estimator under higher-order smoothness conditions, which allows the algorithm to optimally balance the bias-variance tradeoff, and a new iterative method for the bootstrapping stage, which maintains the performance for unbounded Hessian.
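A spherical-sampling gradient estimator builds a gradient estimate from function values alone. The sketch below shows one common two-point form, g = (d / (2\delta)) (f(x + \delta u) - f(x - \delta u)) u with u drawn uniformly from the unit sphere. Whether the paper uses exactly this variant is an assumption, and the function names and the test objective are illustrative.

```python
import math
import random


def sample_unit_sphere(dim, rng):
    """Draw a point uniformly from the unit sphere by normalizing a Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def spherical_gradient_estimate(f, x, delta, rng):
    """Two-point zeroth-order gradient estimate at x with smoothing radius delta."""
    d = len(x)
    u = sample_unit_sphere(d, rng)
    f_plus = f([xi + delta * ui for xi, ui in zip(x, u)])
    f_minus = f([xi - delta * ui for xi, ui in zip(x, u)])
    scale = d * (f_plus - f_minus) / (2.0 * delta)
    return [scale * ui for ui in u]


def averaged_estimate(f, x, delta, num_samples, rng):
    """Average several independent estimates to reduce variance."""
    total = [0.0] * len(x)
    for _ in range(num_samples):
        g = spherical_gradient_estimate(f, x, delta, rng)
        total = [t + gi for t, gi in zip(total, g)]
    return [t / num_samples for t in total]


if __name__ == "__main__":
    # Hypothetical objective: f(x) = ||x||^2, whose gradient at x is 2x.
    f = lambda x: sum(c * c for c in x)
    print(averaged_estimate(f, [1.0, -2.0, 0.5], 0.1, 20000, random.Random(0)))
```

The radius \delta controls the tradeoff mentioned in the abstract: a larger \delta increases the bias from smoothing, while a smaller \delta amplifies the effect of evaluation noise.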
RandAugment: Practical Automated Data Augmentation with a Reduced Search Space
Recent work on automated data augmentation strategies has led to state-of-the-art results in image classification and object detection. An obstacle to a large-scale adoption of these methods is that they require a separate and expensive search phase. A common way to overcome the expense of the search phase was to use a smaller proxy task. However, it was not clear if the optimized hyperparameters found on the proxy task are also optimal for the actual task. In this work, we rethink the process of designing automated data augmentation strategies.
Nearly Minimax Optimal Submodular Maximization with Bandit Feedback
We consider maximizing an unknown monotonic, submodular set function f: 2^{[n]} \rightarrow [0,1] with cardinality constraint under stochastic bandit feedback. At each time t = 1,\dots,T the learner chooses a set S_t \subset [n] with |S_t| \leq k and receives reward f(S_t) + \eta_t where \eta_t is mean-zero sub-Gaussian noise. The objective is to minimize the learner's regret with respect to an approximation of the maximum f(S_*) with |S_*| = k, obtained through robust greedy maximization of f. To date, the best regret bound in the literature scales as k n^{1/3} T^{2/3}. And by trivially treating every set as a unique arm one deduces that \sqrt{ {n \choose k} T } is also achievable using standard multi-armed bandit algorithms.
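The greedy benchmark that the regret is measured against can be illustrated in the noise-free case: repeatedly add the element with the largest marginal gain until k elements are chosen. The sketch below is that plain greedy procedure, not the bandit algorithm of the abstract; the coverage function and all names are assumptions made for illustration.

```python
def greedy_maximize(f, n, k):
    """Noise-free greedy for a monotone submodular f over subsets of {0, ..., n-1}."""
    selected = []
    for _ in range(k):
        current = frozenset(selected)
        base = f(current)
        remaining = [e for e in range(n) if e not in current]
        if not remaining:
            break
        # Pick the element with the largest marginal gain f(S + e) - f(S).
        best = max(remaining, key=lambda e: f(current | {e}) - base)
        selected.append(best)
    return selected


# Example objective (an assumption): normalized set coverage with values in [0, 1].
COVER_SETS = [{1, 2, 3}, {3, 4}, {4, 5, 6, 7}, {1}]
UNIVERSE_SIZE = 7


def coverage(subset):
    covered = set()
    for index in subset:
        covered |= COVER_SETS[index]
    return len(covered) / UNIVERSE_SIZE


if __name__ == "__main__":
    chosen = greedy_maximize(coverage, n=len(COVER_SETS), k=2)
    print(chosen, coverage(chosen))
```

In the bandit setting each call to f returns only a noisy value, so the marginal gains in this loop would have to be estimated from repeated samples.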