AITopics | gradient-based optimization

Collaborating Authors

gradient-based optimization

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

A Composite Activation Function for Learning Stable Binary Representations

Park, Seokhun, Kim, Choeun, Lee, Kwanho, Park, Sehyun, Kong, Insung, Kim, Yongdai

arXiv.org Machine LearningMay-13-2026

Activation functions play a central role in neural networks by shaping internal representations. Recently, learning binary activation representations has attracted significant attention due to their advantages in computational and memory efficiency, as well as interpretability. However, training neural networks with Heaviside activations remains challenging, as their non-differentiability obstructs standard gradient-based optimization. In this paper, we propose Heavy Tailed Activation Function (HTAF), a smooth approximation to the Heaviside function that enables stable training with gradient-based optimization. We construct HTAF as a sigmoid hyperbolic tangent composite function and theoretically show that it maintains a large gradient mass around zero inputs while exhibiting slower gradient decay in the tail regions. We show that Spiking Neural Networks, Binary Neural Networks and Deep Heaviside neural Networks can be trained stably using HTAF with gradient-based optimization. Finally, we introduce Implicit Concept Bottleneck Models (ICBMs), an interpretable image model that leverages HTAF to induce discrete feature representations. Extensive experiments across various architectures and image datasets demonstrate that ICBM enables stable discretization while achieving prediction performance comparable to or better than standard models.

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Machine Learning

2605.11558

Genre: Research Report > New Finding (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

How to Boost Any Loss Function

Neural Information Processing SystemsMar-18-2026, 18:07:01 GMT

Boosting is a highly successful ML-born optimization setting in which one is required to computationally efficiently learn arbitrarily good models based on the access to a weak learner oracle, providing classifiers performing at least slightly differently from random guessing. A key difference with gradient-based optimization is that boosting's original model does not requires access to first order information about a loss, yet the decades long history of boosting has quickly evolved it into a first order optimization setting -- sometimes even wrongfully *defining* it as such. Owing to recent progress extending gradient-based optimization to use only a loss' zeroth ($0^{th}$) order information to learn, this begs the question: what loss functions be efficiently optimized with boosting and what is the information really needed for boosting to meet the *original* boosting blueprint's requirements?We provide a constructive formal answer essentially showing that *any* loss function can be optimized with boosting and thus boosting can achieve a feat not yet known to be possible in the classical $0^{th}$ order setting, since loss functions are not required to be be convex, nor differentiable or Lipschitz -- and in fact not required to be continuous either. Some tools we use are rooted in quantum calculus, the mathematical field -- not to be confounded with quantum computation -- that studies calculus without passing to the limit, and thus without using first order information.

artificial intelligence, machine learning, neural information processing system 37, (10 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.43)

Add feedback

21fe5b8ba755eeaece7a450849876228-AuthorFeedback.pdf

Neural Information Processing SystemsFeb-11-2026, 16:43:16 GMT

experiment, optimization, regularization, (15 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.32)

Add feedback

a878dbebc902328b41dbf02aa87abb58-Paper.pdf

Neural Information Processing SystemsFeb-9-2026, 17:35:11 GMT

gradient, optimization, simulator, (17 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
Asia > Russia (0.04)
(6 more...)

Genre: Research Report > Promising Solution (0.34)

Industry:

Energy (0.68)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Trivializations for Gradient-Based Optimization on Manifolds

Neural Information Processing SystemsDec-25-2025, 02:10:38 GMT

We introduce a framework to study the transformation of problems with manifold constraints into unconstrained problems through parametrizations in terms of a Euclidean space.

gradient-based optimization, name change, trivialization, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.38)

Add feedback

Differentiable Spline Approximations

Neural Information Processing SystemsDec-24-2025, 16:50:34 GMT

The paradigm of differentiable programming has significantly enhanced the scope of machine learning via the judicious use of gradient-based optimization. However, standard differentiable programming methods (such as autodiff) typically require that the machine learning models be differentiable, limiting their applicability. Our goal in this paper is to use a new, principled approach to extend gradient-based optimization to functions well modeled by splines, which encompass a large family of piecewise polynomial models. We derive the form of the (weak) Jacobian of such functions and show that it exhibits a block-sparse structure that can be computed implicitly and efficiently. Overall, we show that leveraging this redesigned Jacobian in the form of a differentiable layer'' in predictive models leads to improved performance in diverse applications such as image segmentation, 3D point cloud reconstruction, and finite element analysis.

differentiable spline approximation, gradient-based optimization, name change, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Black-Box Optimization with Local Generative Surrogates

Neural Information Processing SystemsDec-24-2025, 10:13:04 GMT

We propose a novel method for gradient-based optimization of black-box simulators using differentiable local surrogate models. In fields such as physics and engineering, many processes are modeled with non-differentiable simulators with intractable likelihoods. Optimization of these forward models is particularly challenging, especially when the simulator is stochastic. To address such cases, we introduce the use of deep generative models to iteratively approximate the simulator in local neighborhoods of the parameter space. We demonstrate that these local surrogates can be used to approximate the gradient of the simulator, and thus enable gradient-based optimization of simulator parameters. In cases where the dependence of the simulator on the parameter space is constrained to a low dimensional submanifold, we observe that our method attains minima faster than baseline methods, including Bayesian optimization, numerical optimization and approaches using score function gradient estimators.

black-box optimization, optimization, simulator, (6 more...)

Neural Information Processing Systems

Industry: Transportation > Air (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.41)

Add feedback

Universal Adversarial Suffixes Using Calibrated Gumbel-Softmax Relaxation

Soor, Sampriti, Ghosh, Suklav, Sur, Arijit

arXiv.org Artificial IntelligenceDec-10-2025

Language models (LMs) are often used as zero-shot or few-shot classifiers by scoring label words, but they remain fragile to adversarial prompts. Prior work typically optimizes task- or model-specific triggers, making results difficult to compare and limiting transferability. We study universal adversarial suffixes: short token sequences (4-10 tokens) that, when appended to any input, broadly reduce accuracy across tasks and models. Our approach learns the suffix in a differentiable "soft" form using Gumbel-Softmax relaxation and then discretizes it for inference. Training maximizes calibrated cross-entropy on the label region while masking gold tokens to prevent trivial leakage, with entropy regularization to avoid collapse. A single suffix trained on one model transfers effectively to others, consistently lowering both accuracy and calibrated confidence. Experiments on sentiment analysis, natural language inference, paraphrase detection, commonsense QA, and physical reasoning with Qwen2-1.5B, Phi-1.5, and TinyLlama-1.1B demonstrate consistent attack effectiveness and transfer across tasks and model families.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2512.08123

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.50)

Add feedback

JaxWildfire: A GPU-Accelerated Wildfire Simulator for Reinforcement Learning

Çakır, Ufuk, Darvariu, Victor-Alexandru, Lacerda, Bruno, Hawes, Nick

arXiv.org Artificial IntelligenceDec-9-2025

Artificial intelligence methods are increasingly being explored for managing wildfires and other natural hazards. In particular, reinforcement learning (RL) is a promising path towards improving outcomes in such uncertain decision-making scenarios and moving beyond reactive strategies. However, training RL agents requires many environment interactions, and the speed of existing wildfire simulators is a severely limiting factor. We introduce $\texttt{JaxWildfire}$, a simulator underpinned by a principled probabilistic fire spread model based on cellular automata. It is implemented in JAX and enables vectorized simulations using $\texttt{vmap}$, allowing high throughput of simulations on GPUs. We demonstrate that $\texttt{JaxWildfire}$ achieves 6-35x speedup over existing software and enables gradient-based optimization of simulator parameters. Furthermore, we show that $\texttt{JaxWildfire}$ can be used to train RL agents to learn wildfire suppression policies. Our work is an important step towards enabling the advancement of RL techniques for managing natural hazards.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

2512.06102

Country:

North America > United States (1.00)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)

Genre: Research Report (0.40)

Industry: Government > Regional Government > North America Government > United States Government (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

Pathlet Variational Auto-Encoder for Robust Trajectory Generation

Tang, Yuanbo, Tang, Yan, Zhang, Zixuan, Zhao, Zihui, Li, Yang

arXiv.org Artificial IntelligenceNov-21-2025

Trajectory generation has recently drawn growing interest in privacy-preserving urban mobility studies and location-based service applications. Although many studies have used deep learning or generative AI methods to model trajectories and have achieved promising results, the robustness and interpretability of such models are largely unexplored. This limits the application of trajectory generation algorithms on noisy real-world data and their trustworthiness in downstream tasks. To address this issue, we exploit the regular structure in urban trajectories and propose a deep generative model based on the pathlet representation, which encode trajectories with binary vectors associated with a learned dictionary of trajectory segments. Specifically, we introduce a probabilistic graphical model to describe the trajectory generation process, which includes a Variational Autoencoder (VAE) component and a linear decoder component. During training, the model can simultaneously learn the latent embedding of pathlet representations and the pathlet dictionary that captures mobility patterns in the trajectory dataset. The conditional version of our model can also be used to generate customized trajectories based on temporal and spatial constraints. Our model can effectively learn data distribution even using noisy data, achieving relative improvements of $35.4\%$ and $26.3\%$ over strong baselines on two real-world trajectory datasets. Moreover, the generated trajectories can be conveniently utilized for multiple downstream tasks, including trajectory prediction and data denoising. Lastly, the framework design offers a significant efficiency advantage, saving $64.8\%$ of the time and $56.5\%$ of GPU memory compared to previous approaches.

artificial intelligence, machine learning, trajectory, (19 more...)

arXiv.org Artificial Intelligence

2511.16105

Country:

Asia > China (0.30)
North America > United States (0.29)

Genre: Research Report (1.00)

Industry: Transportation (0.97)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.54)

Add feedback