Goto

Collaborating Authors

 existence


Learning Gradient Boosted Decision Trees with Algorithmic Recourse

Neural Information Processing Systems

This paper proposes a new algorithm for learning gradient boosted decision trees while ensuring the existence of recourse actions. Algorithmic recourse aims to provide a recourse action for altering the undesired prediction result given by a model. While existing studies often focus on extracting valid and executable actions from a given learned model, such reasonable actions do not always exist for models optimized solely for predictive accuracy. To address this issue, recent studies proposed a framework for learning a model while guaranteeing the existence of reasonable actions with high probability. However, these methods can not be applied to gradient boosted decision trees, which are renowned as one of the most popular models for tabular datasets. We propose an efficient gradient boosting algorithm that takes recourse guarantee into account, while maintaining the same time complexity as the standard ones. We also propose a post-processing method for refining a learned model under the constraint of a recourse guarantee and provide a PAC-style analysis of the refined model. Experimental results demonstrated that our method successfully provided reasonable actions to more instances than the baselines without significantly degrading accuracy and computational efficiency.


Which Algorithms Have Tight Generalization Bounds?

Neural Information Processing Systems

We study which machine learning algorithms have tight generalization bounds with respect to a given collection of population distributions. Our results build on and extend the recent work of Gastpar et al. (2023). First, we present conditions that preclude the existence of tight generalization bounds. Specifically, we show that algorithms that have certain inductive biases that cause them to be unstable do not admit tight generalization bounds. Next, we show that algorithms that are sufficiently loss-stable do have tight generalization bounds. We conclude with a simple characterization that relates the existence of tight generalization bounds to the conditional variance of the algorithm's loss.


Learning Gradient Boosted Decision Trees with Algorithmic Recourse

Neural Information Processing Systems

This paper proposes a new algorithm for learning gradient boosted decision trees while ensuring the existence of recourse actions. Algorithmic recourse aims to provide a recourse action for altering the undesired prediction result given by a model. While existing studies often focus on extracting valid and executable actions from a given learned model, such reasonable actions do not always exist for models optimized solely for predictive accuracy. To address this issue, recent studies proposed a framework for learning a model while guaranteeing the existence of reasonable actions with high probability. However, these methods can not be applied to gradient boosted decision trees, which are renowned as one of the most popular models for tabular datasets. We propose an efficient gradient boosting algorithm that takes recourse guarantee into account, while maintaining the same time complexity as the standard ones. We also propose a post-processing method for refining a learned model under the constraint of a recourse guarantee and provide a PAC-style analysis of the refined model. Experimental results demonstrated that our method successfully provided reasonable actions to more instances than the baselines without significantly degrading accuracy and computational efficiency.


A Minimalist Example of Edge-of-Stability and Progressive Sharpening

Neural Information Processing Systems

Recent advances in deep learning optimization have unveiled two intriguing phenomena under large learning rates: Edge of Stability (EoS) and Progressive Sharpening (PS), challenging classical Gradient Descent (GD) analyses.


Nonparametric Instrumental Variable Analysis Without Structural Equations: Debiased Inference on Functionals of Inverse Problems with No Solutions

arXiv.org Machine Learning

Instrumental variable (IV) analyses generally start by posing a structural equation: Y = hstructural(X)+ฯต, (1) where hstructural represents the causal effect of X on Y, and X and ฯต may be endogenous (E[ฯต | X] = 0). Then given an exogenous instrument Z satisfying the exclusion restriction, the common statistical solution given joint observations of W = (X,Y,Z) P is to conduct inference on some continuous linear functional h 7 EP[m(W;h)] of a solution h H to the linear equation implied by exclusion: TPh = rP, (2) where TP: H G maps h 7 argming GEP(h(X) g(Z))2, rP = argminr GEP(Y r(Z))2, and H, G are closed linear subspaces of square-integrable functions of X and of Z, respectively. For example, if these are all square-integrable functions, then (TPh)(Z) = EP[h(X) | Z] is the conditional expectation.


A Note on Non-Negative $L_1$-Approximating Polynomials

arXiv.org Machine Learning

$L_1$-Approximating polynomials, i.e., polynomials that approximate indicator functions in $L_1$-norm under certain distributions, are widely used in computational learning theory. We study the existence of \textit{non-negative} $L_1$-approximating polynomials with respect to Gaussian distributions. This is a stronger requirement than $L_1$-approximation but weaker than sandwiching polynomials (which themselves have many applications). These non-negative approximating polynomials have recently found uses in smoothed learning from positive-only examples. In this short note, we prove that every class of sets with Gaussian surface area (GSA) at most $ฮ“$ under the standard Gaussian admits degree-$k$ non-negative polynomials that $\eps$-approximate its indicator functions in $L_1$-norm, for $k=\tilde{O}(ฮ“^2/\varepsilon^2)$. Equivalently, finite GSA implies $L_1$-approximation with the stronger pointwise guarantee that the approximating polynomial has range contained in $[0,\infty)$. Up to a constant-factor, this matches the degree of the best currently known Gaussian $L_1$-approximation degree bound without the non-negativity constraint.



The optimal betting wealth growth rate

arXiv.org Machine Learning

This paper characterizes the best possible rate of growth of wealth in a Kelly betting game when repeatedly betting against a general i.i.d. null hypothesis $\mathscr{P}$, but the data are drawn i.i.d from an arbitrary alternative $Q$. We prove that it equals $\lim_{n \to \infty}n^{-1}\inf_{P \in (\mathscr P)^n)^{\circ\circ}} \mathrm{KL}(Q^n,P)$, where ${\mathscr P}^n = \{P^n: P \in \mathscr{P}\}$ and $(\mathscr {P}^n)^{\circ\circ}$ is its bipolar, i.e., this rate is achievable and one cannot do better. This quantity is in general smaller than a more popular quantity in the literature, $\mathrm{KL}_{\inf}(Q,\mathscr{P}) := \inf_{P \in \mathscr P}\mathrm{KL}(Q,P)$. If $\mathrm{KL}_{\mathrm{inf}}(\cdot,\mathscr P)$ is weakly lowersemicontinuous (w.l.s.c.) at $Q$, we show that the two quantities are equal; in particular, this happens when $\mathscr P$ is weakly compact. For simple alternatives, we provide the first matching necessary and sufficient condition for when power-one sequential tests exist (without assumptions on $\mathscr P, Q$). We also derive the optimal worst-case growth rate against composite $\mathscr Q$. We emphasize that test supermartingales on reduced filtrations suffice for all i.i.d. testing problems, and more general e-processes are not required. We thus completely generalize the recent results of Larsson et al.~\cite{larsson2025numeraire} to the sequential setting.


The Many Faces of Adversarial Risk

Neural Information Processing Systems

Adversarial risk quantifies the performance of classifiers on adversarially perturbed data. Numerous definitions of adversarial risk--not all mathematically rigorous and differing subtly in the details--have appeared in the literature. In this paper, we revisit these definitions, make them rigorous, and critically examine their similarities and differences. Our technical tools derive from optimal transport, robust statistics, functional analysis, and game theory. Our contributions include the following: generalizing Strassen's theorem to the unbalanced optimal transport setting with applications to adversarial classification with unequal priors; showing an equivalence between adversarial robustness and robust hypothesis testing with -Wasserstein uncertainty sets; proving the existence of a pure Nash equilibrium in the two-player game between the adversary and the algorithm; and characterizing adversarial risk by the minimum Bayes error between a pair of distributions belonging to the -Wasserstein uncertainty sets. Our results generalize and deepen recently discovered connections between optimal transport and adversarial robustness and reveal new connections to Choquet capacities and game theory.


The Many Faces of Adversarial Risk

Neural Information Processing Systems

Adversarial risk quantifies the performance of classifiers on adversarially perturbed data. Numerous definitions of adversarial risk--not all mathematically rigorous and differing subtly in the details--have appeared in the literature. In this paper, we revisit these definitions, make them rigorous, and critically examine their similarities and differences. Our technical tools derive from optimal transport, robust statistics, functional analysis, and game theory. Our contributions include the following: generalizing Strassen's theorem to the unbalanced optimal transport setting with applications to adversarial classification with unequal priors; showing an equivalence between adversarial robustness and robust hypothesis testing with -Wasserstein uncertainty sets; proving the existence of a pure Nash equilibrium in the two-player game between the adversary and the algorithm; and characterizing adversarial risk by the minimum Bayes error between a pair of distributions belonging to the -Wasserstein uncertainty sets. Our results generalize and deepen recently discovered connections between optimal transport and adversarial robustness and reveal new connections to Choquet capacities and game theory.