Goto

Collaborating Authors

 ound




Data-dependent Bounds with $T$-Optimal Best-of-Both-Worlds Guarantees in Multi-Armed Bandits using Stability-Penalty Matching

Nguyen, Quan, Ito, Shinji, Komiyama, Junpei, Mehta, Nishant A.

arXiv.org Artificial Intelligence

Existing data-dependent and best-of-both-worlds regret bounds for multi-armed bandits problems have limited adaptivity as they are either data-dependent but not best-of-both-worlds (BOBW), BOBW but not data-dependent or have sub-optimal $O(\sqrt{T\ln{T}})$ worst-case guarantee in the adversarial regime. To overcome these limitations, we propose real-time stability-penalty matching (SPM), a new method for obtaining regret bounds that are simultaneously data-dependent, best-of-both-worlds and $T$-optimal for multi-armed bandits problems. In particular, we show that real-time SPM obtains bounds with worst-case guarantees of order $O(\sqrt{T})$ in the adversarial regime and $O(\ln{T})$ in the stochastic regime while simultaneously being adaptive to data-dependent quantities such as sparsity, variations, and small losses. Our results are obtained by extending the SPM technique for tuning the learning rates in the follow-the-regularized-leader (FTRL) framework, which further indicates that the combination of SPM and FTRL is a promising approach for proving new adaptive bounds in online learning problems.


Improving Quantum Machine Learning via Heat-Bath Algorithmic Cooling

Rodríguez-Briones, Nayeli A., Park, Daniel K.

arXiv.org Artificial Intelligence

Building on this concept, we develop a quantum refrigerator protocol that enhances sample efficiency in both the training and prediction phases of QML, Quantum machine learning (QML) stands at the intersection without relying on complex operations like Grover iterations of quantum information processing (QIP) and data science, exploring and quantum phase estimation. Our technique draws inspiration the fundamental limits of physical systems' ability to from algorithmic cooling protocols [5-14], where entropy learn from data and generalize. By operating on radically different is reduced through alternating steps of entropy compression principles, QML holds the potential of surpassing its and thermalization. A pivotal element of our approach is the classical counterparts in analyzing complex data distributions introduction of bidirectional cooling, characterized by distinct and identifying intricate patterns. However, quantum mechanics fixed points determined by the initial polarization's sign. This introduces unique challenges that are absent in classical bidirectional mechanism dynamically reduces entropy through ML approaches. A critical issue arises from the probabilistic repeated rounds of compression and thermalization, ultimately nature of quantum measurements. In QML, both training and yielding a state with enhanced polarization in the direction of inference rely on information extracted from probability distributions the initial bias. Remarkably, the protocol operates without associated with the measurements used in the protocol, requiring prior knowledge of the initial bias direction (i.e. the such as the expectation value of an observable.


Order-Optimal Instance-Dependent Bounds for Offline Reinforcement Learning with Preference Feedback

Chen, Zhirui, Tan, Vincent Y. F.

arXiv.org Machine Learning

We consider offline reinforcement learning (RL) with preference feedback in which the implicit reward is a linear function of an unknown parameter. Given an offline dataset, our objective consists in ascertaining the optimal action for each state, with the ultimate goal of minimizing the {\em simple regret}. We propose an algorithm, \underline{RL} with \underline{L}ocally \underline{O}ptimal \underline{W}eights or {\sc RL-LOW}, which yields a simple regret of $\exp ( - \Omega(n/H) )$ where $n$ is the number of data samples and $H$ denotes an instance-dependent hardness quantity that depends explicitly on the suboptimality gap of each action. Furthermore, we derive a first-of-its-kind instance-dependent lower bound in offline RL with preference feedback. Interestingly, we observe that the lower and upper bounds on the simple regret match order-wise in the exponent, demonstrating order-wise optimality of {\sc RL-LOW}. In view of privacy considerations in practical applications, we also extend {\sc RL-LOW} to the setting of $(\varepsilon,\delta)$-differential privacy and show, somewhat surprisingly, that the hardness parameter $H$ is unchanged in the asymptotic regime as $n$ tends to infinity; this underscores the inherent efficiency of {\sc RL-LOW} in terms of preserving the privacy of the observed rewards. Given our focus on establishing instance-dependent bounds, our work stands in stark contrast to previous works that focus on establishing worst-case regrets for offline RL with preference feedback.


Mixed Robust/Average Submodular Partitioning: Fast Algorithms, Guarantees, and Applications Shengjie Wang

Neural Information Processing Systems

We investigate two novel mixed robust/average-case submodular data partitioning problems that we collectively call Submodular Partitioning. These problems generalize purely robust instances of the problem, namely max-min submodular fair allocation (SFA) [12] and min-max submodular load balancing (SLB) [25], and also average-case instances, that is the submodular welfare problem (SWP) [26] and submodular multiway partition (SMP) [5]. While the robust versions have been studied in the theory community [11, 12, 16, 25, 26], existing work has focused on tight approximation guarantees, and the resultant algorithms are not generally scalable to large real-world applications. This is in contrast to the average case, where most of the algorithms are scalable. In the present paper, we bridge this gap, by proposing several new algorithms (including greedy, majorization-minimization, minorization-maximization, and relaxation algorithms) that not only scale to large datasets but that also achieve theoretical approximation guarantees comparable to the state-of-the-art. We moreover provide new scalable algorithms that apply to additive combinations of the robust and average-case objectives. We show that these problems have many applications in machine learning (ML), including data partitioning and load balancing for distributed ML, data clustering, and image segmentation. We empirically demonstrate the efficacy of our algorithms on real-world problems involving data partitioning for distributed optimization (of convex and deep neural network objectives), and also purely unsupervised image segmentation.


On Optimal Strategies for Wordle and General Guessing Games

Cunanan, Michael, Thielscher, Michael

arXiv.org Artificial Intelligence

The recent popularity of Wordle has revived interest in guessing games. We develop a general method for finding optimal strategies for guessing games while avoiding an exhaustive search. Our main contributions are several theorems that build towards a general theory to prove the optimality of a strategy for a guessing game. This work is developed to apply to any guessing game, but we use Wordle as an example to present concrete results.


Generalized Identifiability Bounds for Mixture Models with Grouped Samples

Vandermeulen, Robert A., Saitenmacher, René

arXiv.org Artificial Intelligence

Recent work has shown that finite mixture models with $m$ components are identifiable, while making no assumptions on the mixture components, so long as one has access to groups of samples of size $2m-1$ which are known to come from the same mixture component. In this work we generalize that result and show that, if every subset of $k$ mixture components of a mixture model are linearly independent, then that mixture model is identifiable with only $(2m-1)/(k-1)$ samples per group. We further show that this value cannot be improved. We prove an analogous result for a stronger form of identifiability known as "determinedness" along with a corresponding lower bound. This independence assumption almost surely holds if mixture components are chosen randomly from a $k$-dimensional space. We describe some implications of our results for multinomial mixture models and topic modeling.


Information-Theoretic Bayes Risk Lower Bounds for Realizable Models

Nokleby, Matthew, Beirami, Ahmad

arXiv.org Machine Learning

We derive information-theoretic lower bounds on the Bayes risk and generalization error of realizable machine learning models. In particular, we employ an analysis in which the rate-distortion function of the model parameters bounds the required mutual information between the training samples and the model parameters in order to learn a model up to a Bayes risk constraint. For realizable models, we show that both the rate distortion function and mutual information admit expressions that are convenient for analysis. For models that are (roughly) lower Lipschitz in their parameters, we bound the rate distortion function from below, whereas for VC classes, the mutual information is bounded above by $d_\mathrm{vc}\log(n)$. When these conditions match, the Bayes risk with respect to the zero-one loss scales no faster than $\Omega(d_\mathrm{vc}/n)$, which matches known outer bounds and minimax lower bounds up to logarithmic factors. We also consider the impact of label noise, providing lower bounds when training and/or test samples are corrupted.


TensorPlan and the Few Actions Lower Bound for Planning in MDPs under Linear Realizability of Optimal Value Functions

Weisz, Gellért, Szepesvári, Csaba, György, András

arXiv.org Machine Learning

We consider the minimax query complexity of online planning with a generative model in fixed-horizon Markov decision processes (MDPs) with linear function approximation. Following recent works, we consider broad classes of problems where either (i) the optimal value function $v^\star$ or (ii) the optimal action-value function $q^\star$ lie in the linear span of some features; or (iii) both $v^\star$ and $q^\star$ lie in the linear span when restricted to the states reachable from the starting state. Recently, Weisz et al. (2021b) showed that under (ii) the minimax query complexity of any planning algorithm is at least exponential in the horizon $H$ or in the feature dimension $d$ when the size $A$ of the action set can be chosen to be exponential in $\min(d,H)$. On the other hand, for the setting (i), Weisz et al. (2021a) introduced TensorPlan, a planner whose query cost is polynomial in all relevant quantities when the number of actions is fixed. Among other things, these two works left open the question whether polynomial query complexity is possible when $A$ is subexponential in $min(d,H)$. In this paper we answer this question in the negative: we show that an exponentially large lower bound holds when $A=\Omega(\min(d^{1/4},H^{1/2}))$, under either (i), (ii) or (iii). In particular, this implies a perhaps surprising exponential separation of query complexity compared to the work of Du et al. (2021) who prove a polynomial upper bound when (iii) holds for all states. Furthermore, we show that the upper bound of TensorPlan can be extended to hold under (iii) and, for MDPs with deterministic transitions and stochastic rewards, also under (ii).