Online Minimax Multiobjective Optimization: Multicalibeating and Other Applications
Daniel Lee

Neural Information Processing Systems

We introduce a simple but general online learning framework in which a learner plays against an adversary in a vector-valued game that changes every round. Even though the learner's objective is not convex-concave (and so the minimax theorem does not apply), we give a simple algorithm that can compete with the setting in which the adversary must announce their action first, with optimally diminishing regret.




e1c13a13fc6b87616b787b986f98a111-Supplemental.pdf

Neural Information Processing Systems

This section gives the worst-case time analysis for Algorithm 1. This gives the bound shown in Eq. 3.

B.1 Loss function space L

Recall that the loss function search space is defined as:

(Loss Function Search Space)
L ::= targeted Loss, n with Z | untargeted Loss with Z | targeted Loss, n - untargeted Loss with Z
Z ::= logits | probs

To refer to different settings, we use the following notation: U for the untargeted loss, T for the targeted loss, D for the targeted - untargeted loss, L for using logits, and P for using probs. Effectively, the search space includes all possible combinations, except that the cross-entropy loss supports only probabilities.

B.2 Attack Algorithm & Parameters Space S

Recall the attack space defined as:

S ::= S; S | randomize S | EOT S, n | repeat S, n | try S for n | Attack with params with loss L

randomize

The type of every parameter is either integer or float. Generic parameters and the supported loss for each attack algorithm are defined in Table 4.

B.3 Search space conditioned on network property

Following Stutz et al. (2020), we use the robust test error (Rerr) metric. We define robust accuracy as 1 - Rerr. Note, however, that Rerr as defined in Eq. 5 has an intractable maximization problem in the denominator. Note that we use a zero-knowledge detector model, so none of the attacks in the search space are aware of the detector.
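The U/T/D and L/P notation for the loss settings can be enumerated mechanically. The following is a minimal Python sketch, not the authors' code: the two-letter codes such as `UL`, and the names `LOSS_MODES`, `LOSS_INPUTS`, `enumerate_settings`, and `describe`, are hypothetical. It does not model the cross-entropy restriction, because the concrete loss names are not listed in this excerpt.

```python
from itertools import product

# Notation from the text: U/T/D select the loss mode, L/P select the input Z.
LOSS_MODES = {"U": "untargeted", "T": "targeted", "D": "targeted - untargeted"}
LOSS_INPUTS = {"L": "logits", "P": "probs"}


def enumerate_settings():
    """Return every mode/input combination as a two-letter code (hypothetical format)."""
    return [mode + z for mode, z in product(LOSS_MODES, LOSS_INPUTS)]


def describe(code):
    """Expand a two-letter code such as 'DL' into a readable description."""
    mode, z = code[0], code[1]
    return f"{LOSS_MODES[mode]} loss with {LOSS_INPUTS[z]}"


settings = enumerate_settings()
descriptions = [describe(code) for code in settings]
```

A real search space would additionally filter out combinations a given loss does not support, such as cross-entropy with logits.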






SAND: One-Shot Feature Selection with Additive Noise Distortion

arXiv.org Artificial Intelligence

Feature selection is a critical step in data-driven applications, reducing input dimensionality to enhance learning accuracy, computational efficiency, and interpretability. Existing state-of-the-art methods often require post-selection retraining and extensive hyperparameter tuning, complicating their adoption. We introduce a novel, non-intrusive feature selection layer that, given a target feature count $k$, automatically identifies and selects the $k$ most informative features during neural network training. Our method is uniquely simple, requiring no alterations to the loss function, network architecture, or post-selection retraining. The layer is mathematically elegant and can be fully described by: \begin{align} \nonumber \tilde{x}_i = a_i x_i + (1-a_i)z_i \end{align} where $x_i$ is the input feature, $\tilde{x}_i$ is the output, $z_i$ is Gaussian noise, and $a_i$ is a trainable gain such that $\sum_i{a_i^2}=k$. This formulation induces an automatic clustering effect, driving $k$ of the $a_i$ gains to $1$ (selecting informative features) and the rest to $0$ (discarding redundant ones) via weighted noise distortion and gain normalization. Despite its extreme simplicity, our method delivers state-of-the-art performance on standard benchmark datasets and a novel real-world dataset, outperforming or matching existing approaches without requiring hyperparameter search for $k$ or retraining. Theoretical analysis in the context of linear regression further validates its efficacy. Our work demonstrates that simplicity and performance are not mutually exclusive, offering a powerful yet straightforward tool for feature selection in machine learning.
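The forward pass of the layer described above can be sketched in NumPy as follows. This is an illustrative sketch, not the authors' implementation: the function names `normalize_gains` and `sand_forward` are hypothetical, the noise is assumed to be standard normal, and rescaling the raw gains by their Euclidean norm is just one assumed way of enforcing $\sum_i a_i^2 = k$. Training of the gains is not shown.

```python
import numpy as np


def normalize_gains(raw_gains, k):
    """Rescale gains so that sum(a_i^2) == k (assumed normalization scheme)."""
    raw_gains = np.asarray(raw_gains, dtype=float)
    norm = np.linalg.norm(raw_gains)
    if norm == 0.0:
        raise ValueError("raw_gains must not be all zeros")
    return raw_gains * np.sqrt(k) / norm


def sand_forward(x, gains, rng):
    """Compute x_tilde = a * x + (1 - a) * z, with z drawn from N(0, 1) per entry."""
    x = np.asarray(x, dtype=float)
    z = rng.standard_normal(x.shape)  # assumption: unit-variance Gaussian noise
    return gains * x + (1.0 - gains) * z


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 5))           # batch of 4 samples, 5 features

a = normalize_gains(np.ones(5), k=2)      # uniform gains before any training
x_tilde = sand_forward(x, a, rng)         # every feature is partly noise

mask = np.array([1.0, 0.0, 1.0, 0.0, 0.0])  # converged case: k = 2 gains at 1
x_selected = sand_forward(x, mask, rng)     # selected features pass unchanged
```

With a 0/1 gain vector, the selected features pass through untouched and the discarded ones are replaced entirely by noise, which is the clustering outcome the abstract describes.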