Appendix

Neural Information Processing Systems

This is only for the ease of visualization for the linear MDP. In the generative model setting, Agarwal et al. [2020] show that the model-based approach is still minimax optimal, with sample complexity $\tilde{O}\left((1-\gamma)^{-3}SA/\epsilon^2\right)$, by using an $s$-absorbing MDP construction; this model-based technique was later reused for other, more general settings. It requires a high-probability guarantee for learning the optimal policy for any reward function, which is strictly stronger than the standard learning task, where one only needs to learn the optimal policy for a fixed reward.

B.2 General absorbing MDP

The general absorbing MDP is defined as follows: for a fixed state $s$ and a sequence $\{u_t\}_{t=1}^H$, the MDP $M_{s,\{u_t\}_{t=1}^H}$ is identical to $M$ for all states except $s$, and state $s$ is absorbing in the sense that $P_{M_{s,\{u_t\}_{t=1}^H}}(s \mid s,a) = 1$ for all $a$, while the instantaneous reward at time $t$ is $r_t(s,a) = u_t$ for all $a \in A$. Also, we use the shorthand notation $V^{\pi}_{\{s,u_t\}}$ for $V^{\pi}_{s,M_{s,\{u_t\}}}$. We focus on the first claim. Later we shall remove the conditioning on $N$ (see Section B.7). We use the singleton-absorbing MDP $M_{s,\{u^{\star}_t\}_{t=1}^H}$ to handle the case (recall $u^{\star}_t$
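A minimal sketch of the absorbing-MDP construction described in this excerpt: we copy a tabular MDP and make one state self-looping with a prescribed reward sequence. Array shapes and function names here are my own illustrative assumptions, not the paper's notation.

```python
import numpy as np

def make_absorbing(P, R, s, u):
    """Build the absorbing MDP: identical to (P, R) except that state s
    self-loops under every action and pays reward u[t] at step t.

    P: transitions, shape (S, A, S); R: rewards, shape (H, S, A);
    u: length-H sequence of absorbing rewards.
    """
    H = R.shape[0]
    P2, R2 = P.copy(), R.copy()
    P2[s, :, :] = 0.0
    P2[s, :, s] = 1.0          # P(s | s, a) = 1 for all actions a
    for t in range(H):
        R2[t, s, :] = u[t]     # r_t(s, a) = u_t for all actions a
    return P2, R2

# Tiny example: 3 states, 2 actions, horizon 2.
rng = np.random.default_rng(0)
P = rng.random((3, 2, 3))
P /= P.sum(axis=2, keepdims=True)
R = rng.random((2, 3, 2))
P2, R2 = make_absorbing(P, R, s=1, u=[0.5, 0.25])
```

Every other state keeps its original dynamics, so the two MDPs differ only at the chosen state, which is exactly what the coupling argument in such proofs exploits.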


First order expansion of convex regularized estimators

Neural Information Processing Systems

Such first order expansion implies that the risk of $\hat\beta$ is asymptotically the same as the risk of $\eta$, which leads to a precise characterization of the MSE of $\hat\beta$; this characterization takes a particularly simple form for isotropic design. Such first order expansion also leads to inference results based on $\hat\beta$. We provide sufficient conditions for the existence of such first order expansion for three regularizers: the Lasso in its constrained form, the Lasso in its penalized form, and the Group-Lasso. The results apply to general loss functions under some conditions, and those conditions are satisfied for the squared loss in linear regression and for the logistic loss in the logistic model.
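Schematically, and in notation of my own (the abstract does not state the exact remainder rates), a first order expansion of this kind says that the estimator equals a simpler, explicit vector $\eta$ up to a negligible remainder:

```latex
\hat\beta \;=\; \eta \;+\; \mathrm{rem},
\qquad
\|\mathrm{rem}\|_2 \;=\; o_P\!\bigl(\|\eta - \beta^*\|_2\bigr),
```

so that $\|\hat\beta - \beta^*\|_2 = (1 + o_P(1))\,\|\eta - \beta^*\|_2$: the risk of $\hat\beta$ is asymptotically that of $\eta$, as claimed in the abstract.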


min

Neural Information Processing Systems

Recall that $x = \arg\min_{a \in A} x^{\top}\theta$, so $x$ can be viewed as a deterministic function of $\theta$. Since $R_{\max}$ is the upper bound on the maximum expected reward, the second term can be bounded by $2R_{\max}\gamma$. We let $\Phi \in \mathbb{R}^{|A| \times d}$ be the feature matrix in which each row of $\Phi$ represents an action in $A$. We summarize the procedure of estimating $t, I_t$ in Algorithm 3. Let $X$ denote the feasible set.


Near-Optimal and Tractable Estimation under Shift-Invariance

Ostrovskii, Dmitrii M.

arXiv.org Machine Learning

How hard is it to estimate a discrete-time signal $(x_{1}, ..., x_{n}) \in \mathbb{C}^n$ satisfying an unknown linear recurrence relation of order $s$ and observed in i.i.d. complex Gaussian noise? The class of all such signals is parametric but extremely rich: it contains all exponential polynomials over $\mathbb{C}$ with total degree $s$, including harmonic oscillations with $s$ arbitrary frequencies. Geometrically, this class corresponds to the projection onto $\mathbb{C}^{n}$ of the union of all shift-invariant subspaces of $\mathbb{C}^\mathbb{Z}$ of dimension $s$. We show that the statistical complexity of this class, as measured by the squared minimax radius of the $(1-\delta)$-confidence $\ell_2$-ball, is nearly the same as for the class of $s$-sparse signals, namely $O\left(s\log(en) + \log(\delta^{-1})\right) \cdot \log^2(es) \cdot \log(en/s).$ Moreover, the corresponding near-minimax estimator is tractable, and it can be used to build a test statistic with a near-minimax detection threshold in the associated detection problem. These statistical results rest upon an approximation-theoretic one: we show that finite-dimensional shift-invariant subspaces admit compactly supported reproducing kernels whose Fourier spectra have nearly the smallest possible $\ell_p$-norms, for all $p \in [1,+\infty]$ at once.
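A toy instance of the signal class in this abstract, under my own parameter choices: a real cosine with frequency $\omega$ satisfies an order-2 linear recurrence, and we observe it in i.i.d. complex Gaussian noise.

```python
import numpy as np

# A harmonic oscillation x_t = cos(omega * t) satisfies the order-2
# shift-invariant recurrence x_t = 2*cos(omega)*x_{t-1} - x_{t-2}
# (a special case of the class described in the abstract, with s = 2).
n, omega = 64, 0.7
t = np.arange(n)
x = np.cos(omega * t)

# Verify the recurrence relation numerically.
resid = x[2:] - (2 * np.cos(omega) * x[1:-1] - x[:-2])
assert np.allclose(resid, 0.0)

# Observations: signal plus i.i.d. standard complex Gaussian noise,
# scaled to unit variance per sample, as in the observation model.
rng = np.random.default_rng(0)
noise = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
y = x + 0.1 * noise
```

With $s$ distinct frequencies the same pattern yields an order-$2s$ real recurrence, which is why harmonic oscillations with arbitrary frequencies sit inside this class.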


Concentration of a sparse Bayesian model with Horseshoe prior in estimating high-dimensional precision matrix

Mai, The Tien

arXiv.org Machine Learning

Precision matrices are crucial in many fields such as social networks, neuroscience, and economics, representing the edge structure of Gaussian graphical models (GGMs), where a zero in an off-diagonal position of the precision matrix indicates conditional independence between nodes. In high-dimensional settings where the dimension of the precision matrix $p$ exceeds the sample size $n$ and the matrix is sparse, methods like graphical Lasso, graphical SCAD, and CLIME are popular for estimating GGMs. While frequentist methods are well-studied, Bayesian approaches for (unstructured) sparse precision matrices are less explored. The graphical horseshoe estimate by \citet{li2019graphical}, applying the global-local horseshoe prior, shows superior empirical performance, but theoretical work on sparse precision matrix estimation using shrinkage priors is limited. This paper addresses these gaps by providing concentration results for the tempered posterior with the fully specified horseshoe prior in high-dimensional settings. Moreover, we also provide novel theoretical results for model misspecification, offering a general oracle inequality for the posterior.
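The link between precision-matrix zeros and conditional independence mentioned above can be checked on a toy 3-node example (the matrix below is my own illustration, not from the paper): a zero in $\Omega_{0,2}$ means nodes 0 and 2 are conditionally independent given node 1, even though they are marginally correlated.

```python
import numpy as np

# Precision matrix of a 3-node GGM with Omega[0, 2] = 0:
# nodes 0 and 2 are conditionally independent given node 1.
Omega = np.array([[2.0, 0.6, 0.0],
                  [0.6, 2.0, 0.6],
                  [0.0, 0.6, 2.0]])
Sigma = np.linalg.inv(Omega)   # covariance of the Gaussian model

# Marginally, nodes 0 and 2 ARE correlated (Sigma[0, 2] != 0) ...
marginal_cov_02 = Sigma[0, 2]

# ... but their partial correlation given node 1,
# -Omega[0, 2] / sqrt(Omega[0, 0] * Omega[2, 2]), vanishes.
partial_corr_02 = -Omega[0, 2] / np.sqrt(Omega[0, 0] * Omega[2, 2])
```

This is exactly the structure that sparse estimators such as the graphical Lasso or the graphical horseshoe try to recover from data: the support of the off-diagonal entries of $\Omega$.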


Differentially Private Optimization with Sparse Gradients

Ghazi, Badih, Guzmán, Cristóbal, Kamath, Pritish, Kumar, Ravi, Manurangsi, Pasin

arXiv.org Machine Learning

Motivated by applications of large embedding models, we study differentially private (DP) optimization problems under sparsity of individual gradients. We start with new near-optimal bounds for the classic mean estimation problem but with sparse data, improving upon existing algorithms particularly for the high-dimensional regime. Building on this, we obtain pure- and approximate-DP algorithms with almost optimal rates for stochastic convex optimization with sparse gradients; the former represents the first nearly dimension-independent rates for this problem. Finally, we study the approximation of stationary points for the empirical loss in approximate-DP optimization and obtain rates that depend on sparsity instead of dimension, modulo polylogarithmic factors.
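For context, here is the classic Gaussian-mechanism baseline for approximate-DP mean estimation that the abstract says it improves upon in the sparse regime. This is NOT the paper's sparsity-aware algorithm, just a sketch of the problem setup; names and constants follow the standard textbook mechanism.

```python
import numpy as np

def dp_mean_gaussian(X, clip, eps, delta, rng):
    """Approximate-(eps, delta)-DP mean estimate via the Gaussian mechanism.
    X: (n, d) data matrix; each row is clipped to L2 norm <= clip, so the
    L2 sensitivity of the mean is clip / n."""
    n, d = X.shape
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xc = X * np.minimum(1.0, clip / np.maximum(norms, 1e-12))
    sigma = clip * np.sqrt(2 * np.log(1.25 / delta)) / (n * eps)
    return Xc.mean(axis=0) + rng.normal(0.0, sigma, size=d)

rng = np.random.default_rng(0)
# Sparse data: every row has a single nonzero coordinate, as in the
# sparse-gradient regime motivating the paper.
X = np.zeros((1000, 50))
X[:, 3] = 1.0
est = dp_mean_gaussian(X, clip=1.0, eps=1.0, delta=1e-5, rng=rng)
```

The noise scale here grows with the ambient dimension's norm bound rather than the sparsity level, which is the gap the sparsity-aware bounds in the paper are designed to close.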


Learning To Guide Human Decision Makers With Vision-Language Models

Banerjee, Debodeep, Teso, Stefano, Sayin, Burcu, Passerini, Andrea

arXiv.org Artificial Intelligence

There is increasing interest in developing AIs for assisting human decision-making in high-stakes tasks, such as medical diagnosis, for the purpose of improving decision quality and reducing cognitive strain. Mainstream approaches team up an expert with a machine learning model to which safer decisions are offloaded, thus letting the former focus on cases that demand their attention. This separation-of-responsibilities setup, however, is inadequate for high-stakes scenarios. On the one hand, the expert may end up over-relying on the machine's decisions due to anchoring bias, thus losing the human oversight that is increasingly being required by regulatory agencies to ensure trustworthy AI. On the other hand, the expert is left entirely unassisted on the (typically hardest) decisions on which the model abstained. As a remedy, we introduce learning to guide (LTG), an alternative framework in which - rather than taking control from the human expert - the machine provides guidance useful for decision making, and the human is entirely responsible for coming up with a decision. In order to ensure guidance is interpretable and task-specific, we develop SLOG, an approach for turning any vision-language model into a capable generator of textual guidance by leveraging a modicum of human feedback. Our empirical evaluation highlights the promise of SLOG on a challenging, real-world medical diagnosis task.


SLOG: A Structural Generalization Benchmark for Semantic Parsing

Li, Bingzhi, Donatelli, Lucia, Koller, Alexander, Linzen, Tal, Yao, Yuekun, Kim, Najoung

arXiv.org Artificial Intelligence

The goal of compositional generalization benchmarks is to evaluate how well models generalize to new complex linguistic expressions. Existing benchmarks often focus on lexical generalization, the interpretation of novel lexical items in syntactic structures familiar from training; structural generalization tasks, where a model needs to interpret syntactic structures that are themselves unfamiliar from training, are often underrepresented, resulting in overly optimistic perceptions of how well models can generalize. We introduce SLOG, a semantic parsing dataset that extends COGS (Kim and Linzen, 2020) with 17 structural generalization cases. In our experiments, the generalization accuracy of Transformer models, including pretrained ones, only reaches 40.6%, while a structure-aware parser only achieves 70.8%. These results are far from the near-perfect accuracy existing models achieve on COGS, demonstrating the role of SLOG in foregrounding the large discrepancy between models' lexical and structural generalization capacities.


Learning to Guide Human Experts via Personalized Large Language Models

Banerjee, Debodeep, Teso, Stefano, Passerini, Andrea

arXiv.org Artificial Intelligence

Consider the problem of diagnosing lung pathologies based on the outcome of an X-ray scan. This task cannot be fully automated, for safety reasons, necessitating human supervision at some step of the process. At the same time, it is difficult for human experts to tackle it alone due to how sensitive the decision is, especially under time pressure. High-stakes tasks like this are natural candidates for hybrid decision making (HDM) approaches that support human decision makers by leveraging AI technology for the purpose of improving decision quality and lowering cognitive effort, without compromising control. Most current approaches to HDM rely on a learning to defer (LTD) setup, in which a machine learning model first assesses whether a decision can be taken in autonomy - i.e., it is either safe or can be answered with confidence - and defers it to a human partner whenever this is not the case [Madras et al., 2018, Mozannar and Sontag, 2020, Keswani et al., 2022, Verma and Nalisnick, 2022, Liu et al., 2022]. Other forms of HDM, like learning to complement [Wilder et al., 2021], prediction under human assistance [De et al., 2020], and algorithmic triage [Raghu et al., 2019, Okati et al., 2021] follow a similar pattern.
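The learning-to-defer pattern described above reduces, in its simplest form, to a confidence-thresholded routing rule. The sketch below is a generic illustration of that pattern, not the method of any of the cited papers; the threshold and probabilities are hypothetical.

```python
import numpy as np

def defer_policy(probs, threshold=0.8):
    """Minimal LTD-style rule: the machine decides autonomously when its
    top-class confidence clears the threshold, otherwise the case is
    deferred to the human expert.

    probs: (n, k) array of predicted class probabilities.
    Returns (predictions, defer_mask); defer_mask[i] is True when case i
    should be routed to the human."""
    conf = probs.max(axis=1)
    preds = probs.argmax(axis=1)
    defer = conf < threshold
    return preds, defer

probs = np.array([[0.95, 0.05],    # confident -> machine decides
                  [0.55, 0.45]])   # uncertain -> defer to expert
preds, defer = defer_policy(probs)
```

The abstract's critique applies directly to this rule: the human receives exactly the cases the model found hardest, with no assistance attached to them.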


Provably Efficient Offline Multi-agent Reinforcement Learning via Strategy-wise Bonus

Cui, Qiwen, Du, Simon S.

arXiv.org Artificial Intelligence

This paper considers offline multi-agent reinforcement learning. We propose the strategy-wise concentration principle, which directly builds a confidence interval for the joint strategy, in contrast to the point-wise concentration principle that builds a confidence interval for each point in the joint action space. For two-player zero-sum Markov games, by exploiting the convexity of the strategy-wise bonus, we propose a computationally efficient algorithm whose sample complexity enjoys a better dependency on the number of actions than prior methods based on the point-wise bonus. Furthermore, for offline multi-agent general-sum Markov games, based on the strategy-wise bonus and a novel surrogate function, we give the first algorithm whose sample complexity only scales with $\sum_{i=1}^m A_i$, where $A_i$ is the action size of the $i$-th player and $m$ is the number of players. In sharp contrast, the sample complexity of methods based on the point-wise bonus would scale with the size of the joint action space $\prod_{i=1}^m A_i$ due to the curse of multiagents. Lastly, all of our algorithms can naturally take a pre-specified strategy class $\Pi$ as input and output a strategy that is close to the best strategy in $\Pi$. In this setting, the sample complexity only scales with $\log |\Pi|$ instead of $\sum_{i=1}^m A_i$.
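The "curse of multiagents" contrast in the abstract is purely combinatorial, and a two-line computation makes it concrete (the action counts below are hypothetical):

```python
from math import prod

# Hypothetical action counts for m = 3 players.
A = [10, 10, 10]

joint = prod(A)    # point-wise bonus covers the joint action space: 10^3
strategy = sum(A)  # strategy-wise bonus scales with the sum of action sizes
```

With ten players of ten actions each, the gap becomes $10^{10}$ versus $100$, which is why replacing the point-wise bonus with a strategy-wise one changes the sample complexity so dramatically.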