

A Direct Approach for Handling Contextual Bandits with Latent State Dynamics

Li, Zhen, Stoltz, Gilles

arXiv.org Machine Learning

We revisit the finite-armed linear bandit model of Nelson et al. (2022), where contexts and rewards are governed by a finite hidden Markov chain. Nelson et al. (2022) approach this model by a reduction to linear contextual bandits; to do so, however, they actually introduce a simplification in which rewards are linear functions of the posterior probabilities over the hidden states given the observed contexts, rather than functions of the hidden states themselves. Their analysis (but not their algorithm) also does not account for the estimation of the HMM parameters, and only provides bounds in expectation rather than with high probability; these bounds moreover suffer from unnecessarily complex dependencies on the model (such as reward gaps). We instead study the more natural model incorporating direct dependencies on the hidden states (on top of dependencies on the observed contexts, as is natural for contextual bandits) and also obtain stronger, high-probability regret bounds for a fully adaptive strategy that estimates the HMM parameters online. These bounds do not depend on the reward functions and depend on the model only through the estimation of the HMM parameters.
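The reduction described above hinges on filtering: computing the posterior over the hidden state given the observed contexts. A minimal forward-filtering sketch for a toy HMM (the transition matrix `T`, emission table `E`, and context alphabet are illustrative assumptions, not the paper's setup):

```python
import numpy as np

# Hypothetical 2-state HMM: transition matrix and per-state
# emission probabilities over a finite context alphabet {0, 1, 2}.
T = np.array([[0.9, 0.1],
              [0.2, 0.8]])          # T[i, j] = P(s_{t+1}=j | s_t=i)
E = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.3, 0.6]])     # E[i, c] = P(context c | state i)

def filter_posterior(contexts, prior=np.array([0.5, 0.5])):
    """Forward filter: posterior over the hidden state after each context."""
    b = prior.copy()
    out = []
    for c in contexts:
        b = b @ T                   # predict: propagate through the chain
        b = b * E[:, c]             # update: weight by emission likelihood
        b /= b.sum()                # normalize to a posterior
        out.append(b.copy())
    return np.array(out)

posts = filter_posterior([0, 0, 2, 2])
print(posts[-1])  # posterior shifts toward state 1 after observing 2, 2
```

In the simplified model of Nelson et al. (2022), rewards are linear in these posterior vectors, which is what makes the contextual-bandit reduction possible.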


Learning in Prophet Inequalities with Noisy Observations

Kim, Jung-hun, Perchet, Vianney

arXiv.org Machine Learning

We study the prophet inequality, a fundamental problem in online decision-making and optimal stopping, in a practical setting where rewards are observed only through noisy realizations and reward distributions are unknown. At each stage, the decision-maker receives a noisy reward whose true value follows a linear model with an unknown latent parameter, and observes a feature vector drawn from a distribution. To address this challenge, we propose algorithms that integrate learning and decision-making via lower-confidence-bound (LCB) thresholding. In the i.i.d.\ setting, we establish that both an Explore-then-Decide strategy and an $\varepsilon$-Greedy variant achieve the sharp competitive ratio of $1 - 1/e$, under a mild condition on the optimal value. For non-identical distributions, we show that a competitive ratio of $1/2$ can be guaranteed against a relaxed benchmark. Moreover, with limited window access to past rewards, the tight ratio of $1/2$ against the optimal benchmark is achieved.
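The LCB-thresholding idea can be sketched as a generic explore-then-decide loop: estimate the latent parameter from noisy linear observations, then stop at the first stage whose lower confidence bound clears a threshold. Everything below (the linear model, noise scale, confidence width `beta`, and threshold value) is an illustrative assumption, not the authors' algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_explore, n_decide = 3, 200, 50
theta = np.array([1.0, -0.5, 0.8])       # unknown latent parameter (toy)

# Explore: observe noisy rewards and fit theta by least squares.
X = rng.normal(size=(n_explore, d))
y = X @ theta + rng.normal(scale=0.1, size=n_explore)
theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
G = X.T @ X                              # Gram matrix for confidence widths

def lcb(x, beta=1.0):
    """Lower confidence bound on the mean reward of feature vector x."""
    width = beta * np.sqrt(x @ np.linalg.solve(G, x))
    return x @ theta_hat - width

# Decide: stop at the first stage whose LCB clears a threshold.
threshold = 0.8
chosen = None
for t in range(n_decide):
    x_t = rng.normal(size=d)
    if lcb(x_t) >= threshold:
        chosen = x_t
        break

if chosen is not None:
    print("stopped with estimated mean reward", chosen @ theta_hat)
```

Stopping on the LCB rather than the point estimate is conservative: the decision-maker only commits when the reward is high even under estimation uncertainty.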


Safe Distributionally Robust Feature Selection under Covariate Shift

Hanada, Hiroyuki, Akahane, Satoshi, Hashimoto, Noriaki, Takeno, Shion, Takeuchi, Ichiro

arXiv.org Machine Learning

In practical machine learning, the environments encountered during the model development and deployment phases often differ, especially when a model is used by many users in diverse settings. Learning models that maintain reliable performance across plausible deployment environments is known as distributionally robust (DR) learning. In this work, we study the problem of distributionally robust feature selection (DRFS), with a particular focus on sparse sensing applications motivated by industrial needs. In practical multi-sensor systems, a shared subset of sensors is typically selected prior to deployment based on performance evaluations using many available sensors. At deployment, individual users may further adapt or fine-tune models to their specific environments. When deployment environments differ from those anticipated during development, this strategy can result in systems lacking sensors required for optimal performance. To address this issue, we propose safe-DRFS, a novel approach that extends safe screening from conventional sparse modeling settings to a DR setting under covariate shift. Our method identifies a feature subset that encompasses all subsets that may become optimal across a specified range of input distribution shifts, with finite-sample theoretical guarantees of no false feature elimination.
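As a point of comparison, safe screening in the conventional (non-robust) sparse-modeling setting can be illustrated with a gap-safe sphere test for a plain Lasso; the rule below is the standard Lasso one and only a toy stand-in for the paper's DR extension:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 20
X = rng.normal(size=(n, d))
w_true = np.zeros(d)
w_true[:3] = [2.0, -1.5, 1.0]            # only 3 features are relevant
y = X @ w_true + 0.01 * rng.normal(size=n)
lam = 5.0

def soft(a, t):
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)

def cd_lasso(w, n_passes):
    """Coordinate descent for 0.5*||y - Xw||^2 + lam*||w||_1."""
    for _ in range(n_passes):
        for j in range(d):
            r = y - X @ w + X[:, j] * w[j]
            w[j] = soft(X[:, j] @ r, lam) / (X[:, j] @ X[:, j])
    return w

# Rough primal point and a dual-feasible point built from its residual.
w0 = cd_lasso(np.zeros(d), 10)
res = y - X @ w0
theta = res * min(1.0, lam / np.max(np.abs(X.T @ res)))
primal = 0.5 * res @ res + lam * np.abs(w0).sum()
dual = 0.5 * (y @ y) - 0.5 * np.sum((y - theta) ** 2)
radius = np.sqrt(2.0 * (primal - dual))  # gap-safe sphere radius

# Screen feature j if its correlation stays below lam over the sphere.
screened = [j for j in range(d)
            if abs(X[:, j] @ theta) + radius * np.linalg.norm(X[:, j]) < lam]

# Safety check: screened features are zero at the (near-)optimum.
w_star = cd_lasso(np.zeros(d), 500)
assert all(abs(w_star[j]) < 1e-8 for j in screened)
```

The proposed safe-DRFS replaces this single-distribution test with one that holds uniformly over a specified range of covariate shifts, which is what yields the no-false-elimination guarantee.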


Learning Social Welfare Functions

Neural Information Processing Systems

Consider a standard decision-making setting that includes a set of possible actions (decisions or policies) and a set of individuals who assign utilities to the actions.


e45caa3d5273d105b8d045e748636957-Supplemental-Conference.pdf

Neural Information Processing Systems

In Figure 7 of this Appendix, we show that indeed this is due to a decrease in the robustness slope. Across three different datasets (MNIST, CIFAR10, NewsGroup20), we see that increasing the number of tasks leads to a decrease in the robustness slope. Experiments on other languages. For our experiments on multilingual generative models, we decided to use Greek and English because we were looking for a linguistic pair with different morphology, syntax and phonology. This ensures that any benefits in terms of robustness are not coming from exposure to more data. As shown in Figure 8, even though the two models are starting from roughly the same perplexity, the bilingual model exhibits higher structural robustness in the presence of weight deletions.


d9d347f57ae11f34235b4555710547d8-Supplemental.pdf

Neural Information Processing Systems

Let $X, Y, Z$ be random variables. Let $g: \mathcal{X} \to \mathbb{R}$ be a measurable function with $\mathbb{E}_{x \sim Q}[\exp g(x)] < \infty$. Then $D_{\mathrm{KL}}(P \| Q) = \sup_g \left\{ \mathbb{E}_{x \sim P}[g(x)] - \log \mathbb{E}_{x \sim Q}[\exp g(x)] \right\}$. Their work has built a connection between PAC-Bayes meta-learning and Hierarchical Variational Bayes. In Appendix A.3 of [1], they give the generative graphical model for meta-learning where $U \to W \to S$ (their notation uses $\psi$ instead of $U$). The proof technique is analogous to Theorem 5.1. Let $\Phi = (U, W_{1:n})$ be a collection of random variables such that $\Phi$ and $S_{1:n}$ follow the joint distribution $P_{\Phi, S_{1:n}}$. Based on Theorem 5.2, for the Meta-SGLD that satisfies Assumption 1, if we set ... In fact, the algorithm has a nested-loop structure; we just list the above simple sub-structures for the first step of the proof.
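The KL variational formula stated at the start of this snippet (the Donsker-Varadhan representation) can be sanity-checked numerically on discrete distributions; the witness $g = \log(p/q)$ attains the supremum (a toy check, independent of the proof):

```python
import numpy as np

p = np.array([0.6, 0.3, 0.1])
q = np.array([0.3, 0.3, 0.4])

kl = np.sum(p * np.log(p / q))          # D_KL(P || Q)

# Donsker-Varadhan objective: E_P[g] - log E_Q[exp g].
def dv(g):
    return np.sum(p * g) - np.log(np.sum(q * np.exp(g)))

g_star = np.log(p / q)                  # optimal witness function
assert np.isclose(dv(g_star), kl)       # supremum is attained at g*

# Any other g gives a smaller value, i.e. a valid lower bound on KL.
rng = np.random.default_rng(1)
for _ in range(100):
    assert dv(rng.normal(size=3)) <= kl + 1e-12
```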


Appendix Outline

Neural Information Processing Systems

Hence, we rely on subgradients defined in Equation 7. Since many subgradient directions exist for the margin points, for consistency we stick with $\partial \ell_\gamma(w; (x, y)) = \{0\}$ when $y \langle w, x \rangle = \gamma$. Note that the set of points in $\mathcal{X}$ satisfying this equality is a zero-measure set. For simplicity we shall treat the projection operation as just renormalizing $w^{(t+1)}$ to have unit norm, i.e., $\|w^{(t+1)}\|_2 = 1$ for all $t \geq 0$. This is not necessarily restrictive.

A.1 Technical Lemmas

In this section we shall state some technical lemmas without proof, with references to works that contain the full proofs. We shall use these in the following sections when proving our lemmas in Section 5.
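The zero-subgradient convention at the margin and the projection-as-renormalization step can be illustrated with a toy normalized subgradient loop (the data, step size, and margin value below are illustrative assumptions):

```python
import numpy as np

def subgrad(w, x, y, gamma=1.0):
    """Subgradient of the margin loss max(0, gamma - y*<w, x>);
    at the margin (y*<w, x> == gamma) we pick 0, as in the text."""
    if y * (w @ x) < gamma:
        return -y * x
    return np.zeros_like(x)            # covers both > gamma and == gamma

rng = np.random.default_rng(0)
w = np.array([1.0, 0.0])
X = rng.normal(size=(100, 2)) + np.array([2.0, 2.0])
Y = np.ones(100)                       # toy linearly separable data

eta = 0.1
for x, y in zip(X, Y):
    w = w - eta * subgrad(w, x, y)
    w = w / np.linalg.norm(w)          # projection = renormalize to unit norm
print(w, np.linalg.norm(w))            # every iterate stays on the sphere
```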


Supplementary Material for: Adversarial Regression with Doubly Non-negative Weighting Matrices

Neural Information Processing Systems

A.1 Proofs of Section 3

In the following, the symbol $\langle \cdot, \cdot \rangle$ will be used to represent both the Frobenius inner product of matrices and the standard Euclidean inner product of vectors. For the second part, let $v$ be an eigenvector of $A$ corresponding to the eigenvalue $\lambda_{\max}(A)$. In case the maximum eigenvalue of $T$ is nonpositive, then from Lemma A.1 we see that the objective value of problem (A.2) evaluated ... For a $p \times p$ real matrix $A$, its spectral radius $R(A)$ is defined as the largest absolute value of its eigenvalues. Then the matrix $I - A$ is invertible and all entries of $(I - A)^{-1}$ are nonnegative. Also, the spectral radius of $(\gamma^\star)^{-1} \widehat{\Omega}^{1/2} V(\beta) \widehat{\Omega}^{1/2}$ is smaller than $1$ by the feasibility of $\gamma^\star$ in problem (A.5c).
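The invertibility claim is the standard Neumann-series fact: if $A$ is entrywise nonnegative with spectral radius $R(A) < 1$, then $I - A$ is invertible and $(I - A)^{-1} = \sum_{k \ge 0} A^k$ is entrywise nonnegative. A quick numerical check on a toy matrix (unrelated to the paper's $\widehat{\Omega}$):

```python
import numpy as np

A = np.array([[0.2, 0.3],
              [0.1, 0.4]])                      # nonnegative, R(A) < 1
assert max(abs(np.linalg.eigvals(A))) < 1       # spectral radius check

inv = np.linalg.inv(np.eye(2) - A)              # (I - A)^{-1}
assert np.all(inv >= 0)                         # entrywise nonnegative

# The Neumann series sum_{k>=0} A^k converges to the same inverse.
S, P = np.eye(2), np.eye(2)
for _ in range(200):
    P = P @ A
    S = S + P
print(np.allclose(S, inv))
```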


Escaping Saddle-Point Faster under Interpolation-like Conditions

Neural Information Processing Systems

One of the fundamental aspects of over-parametrized models is that they are capable of interpolating the training data. We show that, under interpolation-like assumptions satisfied by the stochastic gradients in an over-parametrization setting, the first-order oracle complexity of the Perturbed Stochastic Gradient Descent (PSGD) algorithm to reach an $\epsilon$-local-minimizer matches the corresponding deterministic rate of $\tilde{O}(1/\epsilon^2)$.
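The PSGD mechanism, a gradient step plus occasional isotropic noise injected when the gradient is small, can be sketched on a toy strict saddle (the function, step size, and perturbation radius are illustrative choices, not the paper's setting):

```python
import numpy as np

# f(x) = x0^2 - x1^2 has a strict saddle at the origin; the x1 axis
# is the escape direction (the function is unbounded below along it).
def grad(x):
    return np.array([2.0 * x[0], -2.0 * x[1]])

rng = np.random.default_rng(0)
x = np.zeros(2)                      # start exactly at the saddle
eta, noise_scale = 0.05, 1e-3

for _ in range(150):
    g = grad(x)
    if np.linalg.norm(g) < 1e-4:     # small gradient: perturb to escape
        g = g + rng.normal(scale=noise_scale, size=2)
    x = x - eta * g

# The iterate leaves the saddle along x1 while x0 contracts toward 0.
print(x)
```

Plain gradient descent started exactly at the origin would stay there forever; the perturbation is what breaks the symmetry.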