AITopics | Optimization

Searching for Optimal Per-Coordinate Step-sizes with Multidimensional Backtracking

Neural Information Processing Systems | Apr-24-2026, 11:50:20 GMT

The backtracking line-search is an effective technique to automatically tune the step-size in smooth optimization. It guarantees similar performance to using the theoretically optimal step-size. Many approaches have been developed to instead tune per-coordinate step-sizes, also known as diagonal preconditioners, but none of the existing methods are provably competitive with the optimal per-coordinate step-sizes. We propose multidimensional backtracking, an extension of the backtracking line-search to find good diagonal preconditioners for smooth convex problems. Our key insight is that the gradients with respect to the step-sizes, also known as hypergradients, yield separating hyperplanes that let us search for good preconditioners using cutting-plane methods. As black-box cutting-plane approaches like the ellipsoid method are computationally prohibitive, we develop an efficient algorithm tailored to our setting. Multidimensional backtracking is provably competitive with the best diagonal preconditioner and requires no manual tuning.
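
For orientation, the classical scalar backtracking line-search that this abstract builds on can be sketched as follows. This is a minimal illustration of the one-dimensional baseline only, not the paper's multidimensional algorithm; the function name `backtracking_gd`, the halving factor, and the iteration count are assumptions made for the example.

```python
import numpy as np

def backtracking_gd(f, grad, x0, step=1.0, shrink=0.5, iters=100):
    """Gradient descent with a single scalar step-size tuned by backtracking.

    Hypothetical helper for illustration: the step-size starts large and is
    only ever shrunk, whenever the sufficient-decrease (Armijo) test fails.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        g = grad(x)
        g2 = float(g @ g)
        if g2 == 0.0:
            break  # already at a stationary point
        # Shrink until f(x - step*g) <= f(x) - (step/2) * ||g||^2 holds.
        while f(x - step * g) > f(x) - 0.5 * step * g2:
            step *= shrink
        x = x - step * g
    return x, step
```

For an L-smooth objective the test passes whenever the step-size is at most 1/L, so with a large enough initial value the step-size never drops below `shrink / L`. A per-coordinate variant would replace the scalar `step` with a vector, which is the search problem the abstract addresses.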

artificial intelligence, machine learning, optimization problem, (18 more...)

Country: North America > United States (0.28)

Industry: Transportation > Air (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Mathematics of Computing (0.92)

0904c7edde20d7134a77fc7f9cd86ea2-Paper-Conference.pdf

Neural Information Processing Systems | Apr-24-2026, 11:07:31 GMT

artificial intelligence, inductive learning, machine learning, (14 more...)

Genre: Research Report (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.68)

03a3655fff3e9bdea48de9f49e938e32-Paper.pdf

Neural Information Processing Systems | Apr-24-2026, 10:52:34 GMT

artificial intelligence, machine learning, planning & scheduling, (17 more...)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Immunology (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Controlled Sparsity via Constrained Optimization or: How I Learned to Stop Tuning Penalties and Love Constraints

Neural Information Processing Systems | Apr-24-2026, 10:51:30 GMT

The performance of trained neural networks is robust to harsh levels of pruning. Coupled with the ever-growing size of deep learning models, this observation has motivated extensive research on learning sparse models. In this work, we focus on the task of controlling the level of sparsity when performing sparse learning. Existing methods based on sparsity-inducing penalties involve expensive trial-and-error tuning of the penalty factor, thus lacking direct control of the resulting model sparsity. In response, we adopt a constrained formulation: using the gate mechanism proposed by Louizos et al. [31], we formulate a constrained optimization problem where sparsification is guided by the training objective and the desired sparsity target in an end-to-end fashion. Experiments on CIFAR-{10, 100}, TinyImageNet, and ImageNet using WideResNet and ResNet{18, 50} models validate the effectiveness of our proposal and demonstrate that we can reliably achieve pre-determined sparsity targets without compromising on predictive performance.
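
To illustrate the difference between tuning a penalty factor and stating a sparsity target directly, here is a toy sketch of a constrained formulation solved by gradient descent-ascent on the Lagrangian. The abstract does not say which solver the authors use, so the solver, the quadratic per-gate cost, the deterministic gates in [0, 1] (standing in for the stochastic gates of Louizos et al.), and all names and step-sizes below are assumptions made for the example.

```python
import numpy as np

def constrained_gates(costs, target, steps=5000, lr_z=0.05, lr_lam=1.0):
    """Toy problem: minimize sum_i costs[i] * (1 - z_i)^2
    subject to mean(z) <= target, with gates z_i in [0, 1].

    Hypothetical sketch: descent on the gates, ascent on the multiplier.
    The step-sizes are chosen for costs of order 1 to 10.
    """
    costs = np.asarray(costs, dtype=float)
    z = np.ones_like(costs)  # start with every gate fully open
    lam = 0.0                # Lagrange multiplier for the density constraint
    for _ in range(steps):
        # Gradient of the Lagrangian with respect to the gates.
        grad_z = -2.0 * costs * (1.0 - z) + lam / z.size
        # Constraint violation: positive while the model is too dense.
        violation = z.mean() - target
        z = np.clip(z - lr_z * grad_z, 0.0, 1.0)   # projected descent step
        lam = max(0.0, lam + lr_lam * violation)   # projected ascent step
    return z, lam
```

The multiplier `lam` plays the role of the penalty factor, but it is adjusted automatically until the density target is met instead of being set by trial and error.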

artificial intelligence, constraint, machine learning, (18 more...)

Country: North America > Canada (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

089b592cccfafdca8e0178e85b609f19-Paper-Conference.pdf

Neural Information Processing Systems | Apr-24-2026, 10:51:27 GMT

artificial intelligence, constraint, machine learning, (16 more...)

Country: North America > Canada (0.28)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

0730b81dbc16cce7e85b519cb7fe5a8d-Paper-Conference.pdf

Neural Information Processing Systems | Apr-24-2026, 10:50:49 GMT

artificial intelligence, data mining, machine learning, (17 more...)

Country: North America > United States > New York (0.28)

Genre: Workflow (0.46)

Industry: Education > Educational Setting > Online (0.67)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Oracle Complexity in Nonsmooth Nonconvex Optimization

Neural Information Processing Systems | Apr-24-2026, 10:32:59 GMT

It is well-known that given a smooth, bounded-from-below, and possibly nonconvex function, standard gradient-based methods can find $\epsilon$-stationary points (with gradient norm less than $\epsilon$) in $O(1/\epsilon^2)$ iterations. However, many important nonconvex optimization problems, such as those associated with training modern neural networks, are inherently not smooth, making these results inapplicable. In this paper, we study nonsmooth nonconvex optimization from an oracle complexity viewpoint, where the algorithm is assumed to be given access only to local information about the function at various points. We provide two main results (under mild assumptions): First, we consider the problem of getting near $\epsilon$-stationary points. This is perhaps the most natural relaxation of finding $\epsilon$-stationary points, which is impossible in the nonsmooth nonconvex case. We prove that this relaxed goal cannot be achieved efficiently, for any distance and $\epsilon$ smaller than some constants. Our second result deals with the possibility of tackling nonsmooth nonconvex optimization by reduction to smooth optimization: namely, applying smooth optimization methods on a smooth approximation of the objective function. For this approach, we prove an inherent trade-off between oracle complexity and smoothness: On the one hand, smoothing a nonsmooth nonconvex function can be done very efficiently (e.g., by randomized smoothing), but with dimension-dependent factors in the smoothness parameter, which can strongly affect iteration complexity when plugging into standard smooth optimization methods. On the other hand, these dimension factors can be eliminated with suitable smoothing methods, but only by making the oracle complexity of the smoothing process exponentially large.
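
As a rough illustration of the randomized smoothing mentioned in this abstract, the sketch below estimates a smoothed surrogate f_delta(x) = E[f(x + delta * u)] by Monte Carlo. Drawing u uniformly from the unit ball is one common choice and is an assumption here, as are the function name `smoothed_value`, the sample count, and the fixed seed.

```python
import numpy as np

def smoothed_value(f, x, delta, n_samples=10_000, seed=0):
    """Monte Carlo estimate of E[f(x + delta * u)], u uniform in the unit ball.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    x = np.atleast_1d(np.asarray(x, dtype=float))
    d = x.size
    # Uniform direction on the sphere: normalize a Gaussian vector.
    directions = rng.standard_normal((n_samples, d))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    # Radius with density proportional to r^(d-1) gives a uniform ball sample.
    radii = rng.random((n_samples, 1)) ** (1.0 / d)
    points = x + delta * radii * directions
    return float(np.mean([f(p) for p in points]))
```

Each sample costs one function evaluation, which is why this kind of smoothing is cheap in oracle calls; the price, as the abstract notes, is a smoothness parameter that depends on the dimension.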

algorithm, artificial intelligence, machine learning, (15 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

2022DOPE

Archana Bura

Neural Information Processing Systems | Apr-24-2026, 09:51:03 GMT

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Country: North America > Canada (0.28)

Industry: Energy (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.31)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.30)

Over the Returned Counterfactuals

Neural Information Processing Systems | Apr-24-2026, 09:34:32 GMT

In this appendix, we discuss a technique to optimize over the counterfactuals found by counterfactual explanation methods, such as [6]. We restate Lemma 3.1 and provide a proof. Lemma 3.1: Assuming the counterfactual algorithm $\mathcal{A}_\theta(x)$ follows the form of the objective in Equation 1, $\frac{\partial}{\partial x_{cf}} G(x, \mathcal{A}_\theta(x)) = 0$, and $m$ is the number of parameters in the model, we can write the derivative of the counterfactual algorithm $\mathcal{A}_\theta$ with respect to the model parameters $\theta$ as the Jacobian $\frac{\partial}{\partial \theta} \mathcal{A}_\theta(x) = -\left[\frac{\partial^2 G(x, \mathcal{A}_\theta(x))}{\partial x_{cf}^2}\right]^{-1} \frac{\partial}{\partial \theta} \frac{\partial}{\partial x_{cf}} G(x, x_{cf})$ (7). This problem is identical to a well-studied class of bi-level optimization problems in deep learning. In these problems, we must compute the derivative of a function with respect to some parameter (here $\theta$) that includes an inner argmin, which itself depends on the parameter. We follow [44] to complete the proof.
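
The step behind this kind of result is the implicit function theorem applied to the stationarity condition of the inner problem. The following is a sketch under stated assumptions, not the authors' proof: $\theta$ denotes the model parameters, the dependence of $G$ on $\theta$ is written explicitly, and the Hessian of $G$ in $x_{cf}$ is assumed invertible.

```latex
% Stationarity of the inner problem at x_cf = A_theta(x), for every theta:
\frac{\partial G}{\partial x_{cf}}\bigl(x, \mathcal{A}_\theta(x); \theta\bigr) = 0

% Total derivative of that identity with respect to theta (chain rule):
\frac{\partial^2 G}{\partial x_{cf}^2}\,
\frac{\partial \mathcal{A}_\theta(x)}{\partial \theta}
  + \frac{\partial}{\partial \theta}\frac{\partial G}{\partial x_{cf}} = 0

% Solve for the Jacobian, assuming the Hessian in x_cf is invertible:
\frac{\partial \mathcal{A}_\theta(x)}{\partial \theta}
  = -\left[\frac{\partial^2 G}{\partial x_{cf}^2}\right]^{-1}
    \frac{\partial}{\partial \theta}\frac{\partial G}{\partial x_{cf}}
```

All derivatives of $G$ are evaluated at $x_{cf} = \mathcal{A}_\theta(x)$. With $m$ model parameters, the mixed derivative has one column per parameter.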

artificial intelligence, counterfactual, machine learning, (17 more...)

Technology: