AITopics | Optimization

The Double-Edged Sword of Implicit Bias: Generalization vs. Robustness in ReLU Networks

Neural Information Processing SystemsApr-25-2026, 13:48:16 GMT

In this work, we study the implications of the implicit bias of gradient flow on generalization and adversarial robustness in ReLU networks. We focus on a setting where the data consists of clusters and the correlations between cluster means are small, and show that in two-layer ReLU networks gradient flow is biased towards solutions that generalize well, but are vulnerable to adversarial examples. Our results hold even in cases where the network is highly overparameterized. Despite the potential for harmful overfitting in such settings, we prove that the implicit bias of gradient flow prevents it. However, the implicit bias also leads to non-robust solutions (susceptible to small adversarial ℓ2-perturbations), even though robust networks that fit the data exist.

artificial intelligence, machine learning, optimization problem, (18 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.70)

Add feedback

AGradient Method for Multilevel Optimization Ryo Sato The University of Tokyo Mirai Tanaka The Institute of Statistical Mathematics RIKEN Akiko Takeda The University of Tokyo RIKEN

Neural Information Processing SystemsApr-25-2026, 13:46:13 GMT

Although application examples of multilevel optimization have already been discussed since the 1990s, the development of solution methods was almost limited to bilevel cases due to the difficulty of the problem. In recent years, in machine learning, Franceschi et al. have proposed a method for solving bilevel optimization problems by replacing their lower-level problems with the T steepest descent update equations with some prechosen iteration number T. In this paper, we have developed a gradient-based algorithm for multilevel optimization with n levels based on their idea and proved that our reformulation asymptotically converges to the original multilevel problem. As far as we know, this is one of the first algorithms with some theoretical guarantee for multilevel optimization. Numerical experiments show that a trilevel hyperparameter learning model considering data poisoning produces more stable prediction results than an existing bilevel hyperparameter learning model in noisy data settings.

artificial intelligence, machine learning, optimization problem, (13 more...)

Neural Information Processing Systems

Country: Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.76)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.92)

Add feedback

3de568f8597b94bda53149c7d7f5958c-Paper.pdf

Neural Information Processing SystemsApr-25-2026, 13:46:09 GMT

artificial intelligence, machine learning, optimization problem, (13 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.75)

Add feedback

1bd8cfc0e4c53869b7f1d0ed4b1e78e1-Paper-Conference.pdf

Neural Information Processing SystemsApr-25-2026, 13:32:40 GMT

artificial intelligence, machine learning, optimization problem, (16 more...)

Neural Information Processing Systems

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Add feedback

Calibrated Data-Dependent Constraints with Exact Satisfaction Guarantees

Neural Information Processing SystemsApr-25-2026, 12:41:07 GMT

We consider the task of training machine learning models with data-dependent constraints. Such constraints often arise as empirical versions of expected value constraints that enforce fairness or stability goals. We reformulate data-dependent constraints so that they are calibrated: enforcing the reformulated constraints guarantees that their expected value counterparts are satisfied with a user-prescribed probability. The resulting optimization problem is amendable to standard stochastic optimization algorithms, and we demonstrate the efficacy of our method on a fairness-sensitive classification task where we wish to guarantee the classifier's fairness (at test time).

artificial intelligence, constraint, machine learning, (15 more...)

Neural Information Processing Systems

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)

Add feedback

1b0da24d136f46bfaee78e8da907127e-Supplemental-Conference.pdf

Neural Information Processing SystemsApr-25-2026, 12:24:43 GMT

artificial intelligence, machine learning, optimization problem, (17 more...)

Neural Information Processing Systems

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)

Add feedback

398475c83b47075e8897a083e97eb9f0-Supplemental.pdf

Neural Information Processing SystemsApr-25-2026, 12:22:51 GMT

We revisit first-order optimization under local information constraints such as local privacy, gradient quantization, and computational constraints limiting access to a few coordinates of the gradient. In this setting, the optimization algorithm is not allowed to directly access the complete output of the gradient oracle, but only gets limited information about it subject to the local information constraints. We study the role of adaptivity in processing the gradient output to obtain this limited information from it. We consider optimization for both convex and strongly convex functions and obtain tight or nearly tight lower bounds for the convergence rate, when adaptive gradient processing is allowed. Prior work was restricted to convex functions and allowed only nonadaptive processing of gradients. For both of these function classes and for the three information constraints mentioned above, our lower bound implies that adaptive processing of gradients cannot outperform nonadaptive processing in most regimes of interest. We complement these results by exhibiting a natural optimization problem under information constraints for which adaptive processing of gradient strictly outperforms nonadaptive processing.

artificial intelligence, constraint, machine learning, (19 more...)

Neural Information Processing Systems

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)

Add feedback

398475c83b47075e8897a083e97eb9f0-Paper.pdf

Neural Information Processing SystemsApr-25-2026, 12:22:48 GMT

artificial intelligence, constraint, machine learning, (16 more...)

Neural Information Processing Systems

Industry: Information Technology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.72)

Add feedback

BCDNets: Scalable Variational Approaches for Bayesian Causal Discovery

Neural Information Processing SystemsApr-25-2026, 12:06:33 GMT

A structural equation model (SEM) is an effective framework to reason over causal relationships represented via a directed acyclic graph (DAG). Recent advances have enabled effective maximum-likelihood point estimation of DAGs from observational data. However, a point estimate may not accurately capture the uncertainty in inferring the underlying graph in practical scenarios, wherein the true DAG is non-identifiable and/or the observed dataset is limited. We propose Bayesian Causal Discovery Nets (BCD Nets), a variational inference framework for estimating a distribution over DAGs characterizing a linear-Gaussian SEM. Developing a full Bayesian posterior over DAGs is challenging due to the the discrete and combinatorial nature of graphs. We analyse key design choices for scalable VI over DAGs, such as 1) the parametrization of DAGs via an expressive variational family, 2) a continuous relaxation that enables low-variance stochastic optimization, and 3) suitable priors over the latent variables. We provide a series of experiments on real and synthetic data showing that BCDNets outperform maximum-likelihood methods on standard causal discovery metrics such as structural Hamming distance in low data regimes.

artificial intelligence, machine learning, posterior, (15 more...)

Neural Information Processing Systems

Country: North America > United States > California (0.46)

Genre: Research Report (0.93)

Industry: Health & Medicine (1.00)

Technology: