AITopics | parameterisation


parameterisation

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Interpretable Machine Learning for Spatial Science: A Lie-Algebraic Kernel for Rotationally Anisotropic Gaussian Processes

Warrior, Kane, Chakrabarty, Dalia

arXiv.org Machine Learning, May-13-2026

Many three-dimensional spatial fields are anisotropic, with directions of rapid and slow variation that need not align with the coordinate axes. Standard Gaussian process kernels with Automatic Relevance Determination (ARD) capture only axis-aligned anisotropy, while generic full symmetric positive definite (SPD) metrics can represent rotated anisotropy but do not parameterise principal length-scales and directions directly. We introduce an interpretable rotationally anisotropic GP kernel that parameterises a three-dimensional SPD covariance metric using three principal length-scales and an explicit SO(3) rotation. The rotation is represented by an axis-angle vector and mapped to SO(3) via the Lie-algebra exponential map, giving unconstrained Euclidean coordinates for inference while always inducing a valid SPD metric. The construction spans the same family of three-dimensional SPD covariance metrics as a generic full-SPD parameterisation, but exposes the geometry differently: length-scales and orientation are explicit, interpretable, and directly available for prior specification and posterior summaries. We perform Bayesian inference on these quantities using Markov Chain Monte Carlo (MCMC), and characterise the resulting symmetries and weakly identified regimes. On synthetic data with rotated anisotropy, the posterior recovers the generating metric and improves prediction relative to an axis-aligned ARD baseline, while matching the predictive performance of a generic full SPD baseline. When the ground truth is axis-aligned, posterior mass concentrates near the identity rotation and predictive performance matches ARD. On a material-density dataset from a laboratory-fabricated nano-brick, the inferred metric reveals rotated anisotropy that is not captured by axis-aligned kernels.
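The construction described in the abstract above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the squared-exponential base kernel, the convention M = R diag(1/ℓ²) Rᵀ, the log-length-scale inputs, and all function names are hypothetical choices made here. The axis-angle vector is mapped to SO(3) with Rodrigues' formula, which is the closed form of the Lie-algebra exponential map in three dimensions.

```python
import numpy as np

def rotation_from_axis_angle(omega):
    """Map an axis-angle vector in R^3 to a rotation in SO(3) via Rodrigues' formula."""
    omega = np.asarray(omega, dtype=float)
    theta = np.linalg.norm(omega)
    # Skew-symmetric matrix of omega (an element of the Lie algebra so(3)).
    K = np.array([[0.0, -omega[2], omega[1]],
                  [omega[2], 0.0, -omega[0]],
                  [-omega[1], omega[0], 0.0]])
    if theta < 1e-12:
        # First-order expansion near the identity avoids dividing by zero.
        return np.eye(3) + K
    return (np.eye(3)
            + (np.sin(theta) / theta) * K
            + ((1.0 - np.cos(theta)) / theta**2) * (K @ K))

def spd_metric(log_lengthscales, omega):
    """Assumed convention: M = R diag(1 / l^2) R^T, which is SPD for any real inputs."""
    R = rotation_from_axis_angle(omega)
    inv_sq = np.exp(-2.0 * np.asarray(log_lengthscales, dtype=float))
    return R @ np.diag(inv_sq) @ R.T

def rotated_se_kernel(X1, X2, log_lengthscales, omega, variance=1.0):
    """Squared-exponential kernel (an assumption) with a rotated anisotropic metric."""
    M = spd_metric(log_lengthscales, omega)
    diff = X1[:, None, :] - X2[None, :, :]
    sq_dist = np.einsum('ijk,kl,ijl->ij', diff, M, diff)
    return variance * np.exp(-0.5 * sq_dist)

# Example: length-scales (0.5, 1, 2) with principal axes rotated about the z-axis.
X = np.random.default_rng(0).standard_normal((4, 3))
K_rot = rotated_se_kernel(X, X, np.log([0.5, 1.0, 2.0]), [0.0, 0.0, np.pi / 4])
```

Because the six inputs (three log length-scales and three axis-angle components) are unconstrained reals, a sampler can work in plain Euclidean coordinates. Setting the axis-angle vector to zero reduces this sketch to an axis-aligned ARD kernel.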

artificial intelligence, bayesian inference, machine learning, (19 more...)

arXiv.org Machine Learning

2605.11179

Country: North America > United States (0.46)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.87)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.66)


Inverse-Free Sparse Variational Gaussian Processes

Cortinovis, Stefano, Aitchison, Laurence, Eleftheriadis, Stefanos, van der Wilk, Mark

arXiv.org Machine Learning, Apr-2-2026

Gaussian processes (GPs) offer appealing properties but are costly to train at scale. Sparse variational GP (SVGP) approximations reduce cost yet still rely on Cholesky decompositions of kernel matrices, ill-suited to low-precision, massively parallel hardware. While one can construct valid variational bounds that rely only on matrix multiplications (matmuls) via an auxiliary matrix parameter, optimising them with off-the-shelf first-order methods is challenging. We make the inverse-free approach practical by proposing a better-conditioned bound and deriving a matmul-only natural-gradient update for the auxiliary parameter, markedly improving stability and convergence. We further provide simple heuristics, such as step-size schedules and stopping criteria, that make the overall optimisation routine fit seamlessly into existing workflows. Across regression and classification benchmarks, we demonstrate that our method 1) serves as a drop-in replacement in SVGP-based models (e.g., deep GPs), 2) recovers similar performance to traditional methods, and 3) can be faster than baselines when well tuned.
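To make the idea of a matmul-only bound with an auxiliary matrix concrete, here is a small NumPy sketch. It is not the bound or the natural-gradient update proposed in the paper; it uses a standard linear-algebra fact instead: for an SPD matrix K and any symmetric T, (T − K⁻¹) K (T − K⁻¹) is positive semidefinite, so yᵀ(2T − TKT)y ≤ yᵀK⁻¹y, with equality at T = K⁻¹. The refinement step is the classical Newton-Schulz iteration, swapped in here purely for illustration. All function names are hypothetical.

```python
import numpy as np

def inverse_quadratic_lower_bound(K, T, y):
    """Lower bound on y^T K^{-1} y for SPD K and symmetric T, using only matmuls."""
    Ty = T @ y
    return 2.0 * (y @ Ty) - Ty @ (K @ Ty)

def newton_schulz_refine(K, num_steps=30):
    """Approximate K^{-1} without solves or Cholesky factors (Newton-Schulz iteration).

    Starting from T_0 = I / trace(K) keeps the eigenvalues of I - K T_0 in [0, 1)
    for SPD K, so the residual is squared at every step and the iteration converges.
    """
    T = np.eye(K.shape[0]) / np.trace(K)
    for _ in range(num_steps):
        T = 2.0 * T - T @ K @ T
    return T

# Example on a small, well-conditioned SPD matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
K = A @ A.T / 5.0 + np.eye(5)
y = rng.standard_normal(5)

loose = inverse_quadratic_lower_bound(K, np.eye(5) / np.trace(K), y)
tight = inverse_quadratic_lower_bound(K, newton_schulz_refine(K), y)
```

Any symmetric T gives a valid bound, which is why such a matrix can be treated as a free parameter and optimised; the quality of the bound depends entirely on how close T is to K⁻¹.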

artificial intelligence, machine learning, r-svgp, (17 more...)

arXiv.org Machine Learning

2604.00697

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Israel (0.04)
(2 more...)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)


fadec8f2e65f181d777507d1df69b92f-Paper-Conference.pdf

Neural Information Processing Systems, Feb-18-2026, 19:08:42 GMT

machine learning, manifold, natural language, (19 more...)

Neural Information Processing Systems

Country:

South America > Chile (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > Netherlands (0.04)
Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Vision (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)


c4de8ced6214345614d33fb0b16a8acd-Supplemental.pdf

Neural Information Processing Systems, Feb-11-2026, 02:12:49 GMT

bbpn, qr algorithm, section 4, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.49)


Supplementary Material

Neural Information Processing Systems, Feb-8-2026, 21:43:47 GMT

We provide additional results for EGTA applied to networked MARL system control for CPR management. Restraint percentages under different regeneration rates: The heatmaps in Figure 7 (A-C) highlight the differences in restraint percentage for different values of α as the regeneration rate is changed from high (0.1) to ... In the case where agents are completely self-interested (α = 0), shown in (A), the majority of algorithms without communication display very low levels of restraint for all rates of regeneration. The orange ovals in these diagrams indicate which system configurations correspond to the highest expected payoff for all agents. Schelling diagrams using a different parameterisation: An alternative parameterisation for a Schelling diagram is to plot payoffs for a particular agent (cooperating or defecting) with respect to the number of other cooperators on the x-axis, instead of the total number of cooperators.

artificial intelligence, schelling diagram, supplementarymaterial, (14 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.36)


1343e23bc2d34c054040e73ad86582cf-Paper-Conference.pdf

Neural Information Processing Systems, Feb-8-2026, 04:56:06 GMT

causal effect, dataset, frugal flow, (16 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
North America > United States > Colorado > Denver County > Denver (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)


Supplementary Material: Appendix Bayesian Deep Ensembles via the Neural Tangent Kernel A Recap of standard and NTK parameterisations

Neural Information Processing Systems, Feb-7-2026, 10:35:48 GMT

We see that the different parameterisations yield the same distribution for the functional output $f(\cdot, \theta)$ at initialisation, but give different scalings to the parameter gradients in the backward pass. ... GP(0, Θ L) and is independent of f0(·) in the infinite width limit. Let X0 be an arbitrary test set. In fact, even with a heteroscedastic prior $\theta \sim \mathcal{N}(0, \Lambda)$ with a diagonal matrix $\Lambda \in \mathbb{R}_+^{p \times p}$ and diagonal entries $\{\lambda_j\}_{j=1}^{p}$, it is straightforward to show that the correct setting of regularisation is $\|\theta\|_\Lambda^2 = \theta^\top \Lambda^{-1} \theta$ in order to obtain a posterior sample of $\theta$. For an NN in the linearised regime [23], this is related to the fact that the NTK and standard parameterisations initialise parameters differently, yet yield the same functional distribution for a randomly initialised NN.
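The claim about the two parameterisations can be checked on the simplest possible case. The sketch below is an assumption-laden toy, not the appendix's derivation: it uses a single linear unit f(x) = w·x with no bias or nonlinearity, and the names `standard_param`, `ntk_param`, and `sigma` are hypothetical. In the standard parameterisation the scale σ/√n lives in the initial weights; in the NTK parameterisation the weights are standard normal and the scale multiplies the output.

```python
import numpy as np

def standard_param(z, x, sigma):
    """Standard parameterisation: w ~ N(0, sigma^2 / n), f = w . x.

    Returns the output and its gradient with respect to the trainable weights w.
    """
    n = x.shape[0]
    w = sigma / np.sqrt(n) * z   # the scale is folded into the weights
    return w @ x, x.copy()       # df/dw = x

def ntk_param(z, x, sigma):
    """NTK parameterisation: w ~ N(0, 1), f = (sigma / sqrt(n)) * w . x.

    Returns the output and its gradient with respect to the trainable weights w.
    """
    n = x.shape[0]
    scale = sigma / np.sqrt(n)
    w = z                        # the weights stay at unit scale
    return scale * (w @ x), scale * x   # df/dw = (sigma / sqrt(n)) * x

# Same standard-normal draw z, same input x, two parameterisations.
rng = np.random.default_rng(0)
n, sigma = 512, 1.5
z = rng.standard_normal(n)
x = rng.standard_normal(n)

f_std, grad_std = standard_param(z, x, sigma)
f_ntk, grad_ntk = ntk_param(z, x, sigma)
```

For the same draw `z` the two outputs coincide, so the functional distribution at initialisation is the same, while the parameter gradients differ by the factor σ/√n.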

artificial intelligence, machine learning, parameterisation, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.50)


Completed Hyperparameter Transfer across Modules, Width, Depth, Batch and Duration

Mlodozeniec, Bruno, Ablin, Pierre, Béthune, Louis, Busbridge, Dan, Klein, Michal, Ramapuram, Jason, Cuturi, Marco

arXiv.org Machine Learning, Dec-30-2025

Hyperparameter tuning can dramatically impact training stability and final performance of large-scale models. Recent works on neural network parameterisations, such as $μ$P, have enabled transfer of optimal global hyperparameters across model sizes. These works propose the empirical practice of searching for optimal global base hyperparameters at a small model size and transferring them to a large one. We extend these works in two key ways. First, to handle scaling along the most important scaling axes, we propose the Complete$^{(d)}$ Parameterisation that unifies scaling in width and depth, using an adaptation of CompleteP, as well as in batch size and training duration. Secondly, with our parameterisation, we investigate per-module hyperparameter optimisation and transfer. We characterise the empirical challenges of navigating the high-dimensional hyperparameter landscape, and propose practical guidelines for tackling this optimisation problem. We demonstrate that, with the right parameterisation, hyperparameter transfer holds even in the per-module hyperparameter regime. Our study covers an extensive range of optimisation hyperparameters of modern models: learning rates, AdamW parameters, weight decay, initialisation scales, and residual block multipliers. Our experiments demonstrate significant training speed improvements in Large Language Models with the transferred per-module hyperparameters.
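The tune-small, transfer-large workflow with per-module hyperparameters might look roughly like the sketch below. It is a deliberately simplified illustration and not the Complete$^{(d)}$ Parameterisation, which also covers depth, batch size, and training duration. The scaling rule is an assumption: the sketch shrinks the learning rate of width-scaled weight matrices in proportion to base_width / target_width, in the spirit of $μ$P-style rules for Adam-like optimisers, and leaves other modules unchanged. The module names, the `ModuleHP` class, and the numbers are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModuleHP:
    """Per-module learning rate tuned at the small (base) model size."""
    base_lr: float
    scales_with_width: bool  # True for matrices whose fan-in grows with width

def transfer_learning_rates(modules, base_width, target_width):
    """Rescale per-module learning rates from base_width to target_width.

    Assumed rule (illustrative only): width-scaled matrices get
    lr = base_lr * base_width / target_width; everything else keeps base_lr.
    """
    if base_width <= 0 or target_width <= 0:
        raise ValueError("widths must be positive")
    ratio = base_width / target_width
    return {
        name: hp.base_lr * ratio if hp.scales_with_width else hp.base_lr
        for name, hp in modules.items()
    }

# Hypothetical per-module optima found by searching at width 256.
tuned_at_small_width = {
    "embedding": ModuleHP(base_lr=1e-2, scales_with_width=False),
    "attention": ModuleHP(base_lr=3e-3, scales_with_width=True),
    "mlp": ModuleHP(base_lr=2e-3, scales_with_width=True),
}

large_model_lrs = transfer_learning_rates(
    tuned_at_small_width, base_width=256, target_width=1024
)
```

The point of the design is that the expensive search happens once at the small width, and the parameterisation supplies the rule that maps each module's tuned value to the larger model.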

large language model, machine learning, natural language, (20 more...)

arXiv.org Machine Learning

2512.22382

Country: Europe (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)


Learning Layer-wise Equivariances Automatically using Gradients

Neural Information Processing Systems, Dec-25-2025, 11:02:40 GMT

However, symmetries provide fixed hard constraints on the functions a network can represent, need to be specified in advance, and cannot be adapted. Our goal is to allow flexible symmetry constraints that can automatically be learned from data using gradients. Learning symmetry and associated weight connectivity structures from scratch is difficult for two reasons. First, it requires efficient and flexible parameterisations of layer-wise equivariances. Secondly, symmetries act as constraints and are therefore not encouraged by training losses measuring data fit. To overcome these challenges, we improve parameterisations of soft equivariance and learn the amount of equivariance in layers by optimising the marginal likelihood, estimated using differentiable Laplace approximations. The objective balances data fit and model complexity, enabling layer-wise symmetry discovery in deep networks. We demonstrate the ability to automatically learn layer-wise equivariances on image classification tasks, achieving equivalent or improved performance over baselines with hard-coded symmetry.

artificial intelligence, machine learning, proceedings, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.58)
