AITopics | parameter

Collaborating Authors

parameter

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Penalising the biases in norm regularisation enforces sparsity

Neural Information Processing SystemsDec-26-2025, 14:27:31 GMT

Controlling the parameters' norm often yields good generalisation when training neural networks. Beyond simple intuitions, the relation between regularising parameters' norm and obtained estimators remains theoretically misunderstood. For one hidden ReLU layer networks with unidimensional data, this work shows the parameters' norm required to represent a function is given by the total variation of its second derivative, weighted by a $\sqrt{1+x^2}$ factor. Notably, this weighting factor disappears when the norm of bias terms is not regularised. The presence of this additional weighting factor is of utmost significance as it is shown to enforce the uniqueness and sparsity (in the number of kinks) of the minimal norm interpolator. Conversely, omitting the bias' norm allows for non-sparse solutions.Penalising the bias terms in the regularisation, either explicitly or implicitly, thus leads to sparse estimators.

name change, norm regularisation enforce sparsity, penalising, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.41)

Add feedback

Parameter tuning and model selection in Optimal Transport with semi-dual Brenier formulation

Neural Information Processing SystemsDec-24-2025, 19:37:00 GMT

Over the past few years, numerous computational models have been developed to solve Optimal Transport (OT) in a stochastic setting, where distributions are represented by samples and where the goal is to find the closest map to the ground truth OT map, unknown in practical settings. So far, no quantitative criterion has yet been put forward to tune the parameter of these models and select maps that best approximate the ground truth. To perform this task, we propose to leverage the Brenier formulation of OT. Theoretically, we show that this formulation guarantees that, up to sharp a distortion parameter depending on the smoothness/strong convexity and a statistical deviation term, the selected map achieves the lowest quadratic error to the ground truth. This criterion, estimated via convex optimization, enables parameter tuning and model selection among entropic regularization of OT, input convex neural networks and smooth and strongly convex nearest-Brenier (SSNB) models.We also use this criterion to question the use of OT in Domain-Adaptation (DA). In a standard DA experiment, it enables us to identify the potential that is closest to the true OT map between the source and the target. Yet, we observe that this selected potential is far from being the one that performs best for the downstream transfer classification task.

brenier formulation, model selection, optimal transport, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Make Pre-trained Model Reversible: From Parameter to Memory Efficient Fine-Tuning

Neural Information Processing SystemsDec-24-2025, 11:17:44 GMT

Parameter-efficient fine-tuning (PEFT) of pre-trained language models (PLMs) has emerged as a highly successful approach, with training only a small number of parameters without sacrificing performance and becoming the de-facto learning paradigm with the increasing size of PLMs. However, existing PEFT methods are not memory-efficient, because they still require caching most of the intermediate activations for the gradient calculation, akin to fine-tuning. One effective way to reduce the activation memory is to apply a reversible model, so the intermediate activations are not necessary to be cached and can be recomputed. Nevertheless, modifying a PLM to its reversible variant is not straightforward, since the reversible model has a distinct architecture from the currently released PLMs. In this paper, we first investigate what is a key factor for the success of existing PEFT methods, and realize that it's essential to preserve the PLM's starting point when initializing a PEFT method.

fine-tuning, make pre-trained model reversible, memory efficient fine-tuning, (10 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

TinyTL: Reduce Memory, Not Parameters for Efficient On-Device Learning

Neural Information Processing SystemsDec-24-2025, 05:53:04 GMT

Efficient on-device learning requires a small memory footprint at training time to fit the tight memory constraint. Existing work solves this problem by reducing the number of trainable parameters. However, this doesn't directly translate to memory saving since the major bottleneck is the activations, not parameters.

efficient on-device learning, name change, reduce memory, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.57)

Add feedback

Reviews: A Unified Approach for Learning the Parameters of Sum-Product Networks

Neural Information Processing SystemsJan-20-2025, 13:04:07 GMT

The single contribution of the paper which is relevant in practice is an alternative derivation of an existing method (Expectation Maximization for learning SPN weights). While this is an interesting result, I think that it does not grant alone a publication in NIPS since it's hard to imagine how this can contribute to better theoretical understanding or practical applications of SPNs. The interpretation of SPNs as mixtures of tree structured SPNs, which is reported as a novelty by the authors, was actually first derived in [Dennis and Vantura, Greedy Structure Search for Sum-Product Networks, 2015]. The paper is overall well written, clearly structured and the derivation of the results is really interesting. My main concern, as detailed above, is that in my opinion the potential impact of this paper is low, and the novelty is also somewhat limited due to the fact that the interpretation of SPN as mixture of trees was already given in [Dennis and Vantura, Greedy Structure Search for Sum-Product Networks, 2015] and that this is basically just an alternative derivation of EM.

derivation, sum-product network, unified approach, (10 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.99)

Add feedback

Penalising the biases in norm regularisation enforces sparsity

Neural Information Processing SystemsJan-19-2025, 19:58:23 GMT

Controlling the parameters' norm often yields good generalisation when training neural networks. Beyond simple intuitions, the relation between regularising parameters' norm and obtained estimators remains theoretically misunderstood. For one hidden ReLU layer networks with unidimensional data, this work shows the parameters' norm required to represent a function is given by the total variation of its second derivative, weighted by a \sqrt{1 x 2} factor. Notably, this weighting factor disappears when the norm of bias terms is not regularised. The presence of this additional weighting factor is of utmost significance as it is shown to enforce the uniqueness and sparsity (in the number of kinks) of the minimal norm interpolator.

norm regularisation enforce sparsity, parameter, penalising, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.30)

Add feedback

Reviews: Parameters as interacting particles: long time convergence and asymptotic error scaling of neural networks

Neural Information Processing SystemsOct-7-2024, 05:57:27 GMT

If so, I am confused why this is highlighted as a virtue of adding noise, since the purely deterministic dynamics of GD also evince this behavior. Numerical experiments: These are slightly hard to interpret. First, which plots show SGD dynamics, and which are for GD? Second, I'm puzzled by how to interpret the dotted lines in each plot. In the case of RBF, how are we to make sense of the empirical n {-2} decay? Is this somehow predicted in the analysis of the GD, or is it an empirical phenomenon which is not theoretically addressed in this work.

convergence and asymptotic error scaling, long time convergence, neural network, (9 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.40)

Add feedback

Tuning Random Forest model Machine Learning Predictive modeling

@machinelearnbotDec-25-2017, 16:10:17 GMT

A month back, I participated in a Kaggle competition called TFI. I started with my first submission at 50th percentile. Having worked relentlessly on feature engineering for more than 2 weeks, I managed to reach 20th percentile. To my surprise, right after tuning the parameters of the machine learning algorithm I was using, I was able to breach top 10th percentile. This is how important tuning these machine learning algorithms are.

artificial intelligence, machine learning, model, (17 more...)

@machinelearnbot

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

#32afdf375aa2

@machinelearnbotDec-12-2017, 22:59:56 GMT

Machine learning has been successfully applied to demand planning, but leading suppliers of supply chain planning are beginning to work on using machine learning to improve production planning. But architecturally and culturally, this is a much tougher problem than machine learning applied to demand planning. In the $2 billion-plus supply chain planning market, ARC Advisory Group's latest market study shows production planning as being a critical application SCP solution representing over 25 percent of the total market. Production planning applications are used for both planning daily production at a factory to creating weekly or monthly plans to divvy up the production tasks that need to be accomplished across multiple factories. Machine learning is a form of continuous improvement.

artificial intelligence, machine learning, planning, (17 more...)

@machinelearnbot

Industry: Information Technology (0.32)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning (0.63)

Add feedback

An overview of gradient descent optimization algorithms

@machinelearnbotNov-23-2017, 22:30:19 GMT

Note: If you are looking for a review paper, this blog post is also available as an article on arXiv. Added derivations of AdaMax and Nadam. Gradient descent is one of the most popular algorithms to perform optimization and by far the most common way to optimize neural networks. At the same time, every state-of-the-art Deep Learning library contains implementations of various algorithms to optimize gradient descent (e.g. These algorithms, however, are often used as black-box optimizers, as practical explanations of their strengths and weaknesses are hard to come by.

gradient, neural network, survey article, (17 more...)

@machinelearnbot

Genre: Overview (0.84)

Industry: Education (0.95)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback