AITopics

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > Netherlands > North Holland > Amsterdam (0.04)
North America > United States > Arizona > Maricopa County > Phoenix (0.04)

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing Systems, Feb-13-2026, 22:21:23 GMT

581e1a06fa20f2c079dc5fb2db236335-Paper-Conference.pdf

large language model, machine learning, natural language, (19 more...)

Country:

Asia > China > Beijing > Beijing (0.04)
Europe > Denmark (0.04)
Asia > China > Shandong Province (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry: Education (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)
Information Technology > Artificial Intelligence > Vision (0.67)
(2 more...)

Neural Information Processing Systems, Feb-13-2026, 01:37:11 GMT

Algorithmic Linearly Constrained Gaussian Processes

Markus Lange-Hegermann

Neural Information Processing Systems http://nips.cc/

differential equation, equation, gaussian process, (16 more...)

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
South America > Brazil (0.04)
North America > Canada > Quebec > Montreal (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Modeling & Simulation (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Billera, Lukas, Nordlinder, Hedwig Nora, Murrell, Ben

Time dependent loss reweighting for flow matching and diffusion models is theoretically justified

arXiv.org Machine Learning, Nov-21-2025

This brief note clarifies that, in Generator Matching (which subsumes a large family of flow matching and diffusion models over continuous, manifold, and discrete spaces), both the Bregman divergence loss and the linear parameterization of the generator can depend on both the current state $X_t$ and the time $t$, and we show that the expectation over time in the loss can be taken with respect to a broad class of time distributions. We also show this for Edit Flows, which falls outside of Generator Matching. That the loss can depend on $t$ clarifies that time-dependent loss weighting schemes, often used in practice to stabilize training, are theoretically justified when the specific flow or diffusion scheme is a special case of Generator Matching (or Edit Flows). It also often simplifies the construction of $X_1$-predictor schemes, which are sometimes preferred for model-related reasons. We show examples that rely upon the dependence of linear parameterizations, and of the Bregman divergence loss, on $t$ and $X_t$.
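As a minimal sketch of how a time-dependent weight interacts with an $X_1$-predictor, assume (our choice, not the note's) flow matching on the linear path $x_t = (1-t)x_0 + t x_1$ with conditional target velocity $u = x_1 - x_0$, and turn a prediction $\hat x_1$ into a velocity through the linear parameterization $v = (\hat x_1 - x_t)/(1-t)$, whose coefficients depend on $t$ and $x_t$. Since $x_1 - x_t = (1-t)(x_1 - x_0)$, we get $v - u = (\hat x_1 - x_1)/(1-t)$, so the weight $w(t) = (1-t)^2$ turns the velocity loss into a plain $X_1$-prediction loss. The Beta(2, 2) time distribution, the toy data, and the helper names are hypothetical illustrations.

```python
import numpy as np

def velocity_from_x1_pred(x1_hat, x_t, t):
    """Linear parameterization of the velocity in terms of an X1-prediction.

    The coefficients 1/(1 - t) and -x_t/(1 - t) depend on both t and x_t.
    """
    return (x1_hat - x_t) / (1.0 - t)

def weighted_fm_loss(x1_hat, x0, x1, t, weight):
    """Monte Carlo estimate of E_t[w(t) * ||v - u||^2] on the path x_t = (1 - t) x0 + t x1."""
    x_t = (1.0 - t) * x0 + t * x1
    u = x1 - x0  # conditional target velocity of the linear path
    v = velocity_from_x1_pred(x1_hat, x_t, t)
    sq_err = np.sum((v - u) ** 2, axis=1, keepdims=True)
    return float(np.mean(weight(t) * sq_err))

rng = np.random.default_rng(0)
n, d = 1000, 2
x0 = rng.standard_normal((n, d))                  # source samples
x1 = rng.standard_normal((n, d)) + 3.0            # target samples (toy data)
t = rng.beta(2.0, 2.0, size=(n, 1))               # assumed non-uniform time distribution
x1_hat = x1 + 0.1 * rng.standard_normal((n, d))   # stand-in for a network's X1 prediction

# With w(t) = (1 - t)^2 the weighted velocity loss equals the X1-prediction MSE.
loss_velocity = weighted_fm_loss(x1_hat, x0, x1, t, weight=lambda s: (1.0 - s) ** 2)
loss_x1 = float(np.mean(np.sum((x1_hat - x1) ** 2, axis=1)))
```

Any other positive weight gives a different, but by the note's argument still valid, objective; the $(1-t)^2$ choice is singled out here only because it makes the equivalence exact.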

artificial intelligence, bregman divergence, machine learning, (16 more...)

2511.16599

Genre: Research Report (0.41)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing Systems, Nov-20-2025, 17:07:16 GMT

Algorithmic Linearly Constrained Gaussian Processes

Markus Lange-Hegermann

The resulting mean function is used for regression.
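The snippet does not say how the constrained prior is built. Assuming the construction pushes a scalar GP $g$ forward through a parametrization $f = Bg$ of the solutions of a linear constraint, the toy sketch below uses a constant matrix $B$ in place of the operators the paper computes and shows that the GP posterior mean used for regression then satisfies the constraint $f_1 + f_2 = 0$ exactly, even when the noisy training data do not. The RBF kernel, noise level, data, and function names are hypothetical.

```python
import numpy as np

def rbf(x, y, ell=1.0):
    """Squared-exponential covariance k_g of the latent scalar GP g."""
    diff = x[:, None] - y[None, :]
    return np.exp(-0.5 * (diff / ell) ** 2)

# Toy parametrization f = B g (hypothetical): every f = (f1, f2) of this form satisfies
# the linear constraint A f = f1 + f2 = 0 with A = [1, 1], because A @ B = 0.
B = np.array([[1.0], [-1.0]])

def kernel_f(x, y):
    """Pushforward covariance k_f = B k_g B^T, rows ordered (x0: f1, f2), (x1: f1, f2), ..."""
    return np.kron(rbf(x, y), B @ B.T)

def posterior_mean(x_train, f_train, x_test, noise=1e-2):
    """GP posterior mean k_f(x*, X) (K + noise * I)^{-1} y; f_train has shape (n, 2)."""
    K = kernel_f(x_train, x_train) + noise * np.eye(2 * len(x_train))
    alpha = np.linalg.solve(K, f_train.reshape(-1))
    return (kernel_f(x_test, x_train) @ alpha).reshape(-1, 2)

rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 3.0, 8)
# Noisy observations that do NOT satisfy the constraint exactly.
f_train = np.stack([np.sin(x_train), -np.sin(x_train)], axis=1) + 0.05 * rng.standard_normal((8, 2))
mean = posterior_mean(x_train, f_train, np.linspace(0.0, 3.0, 5))
```

Every column of $BB^\top$ is a multiple of $B$, so each predicted pair $(f_1, f_2)$ lies in the image of $B$ and therefore in the solution set of the constraint.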

artificial intelligence, machine learning, modeling & simulation, (19 more...)

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
South America > Brazil (0.04)
North America > Canada > Quebec > Montreal (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Modeling & Simulation (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

de Farias, Matheus Vinícius Barreto, de Castro, Mario

Effects of label noise on the classification of outlier observations

arXiv.org Machine Learning, Nov-13-2025

The following study presents results from experiments in which, before training a classification model, we added noise to the labels of the training set, so that the information it contains is not entirely correct. In fact, most datasets encountered in practice contain some degree of noise, which makes this type of study important for new techniques before they are deployed in real-world applications. Here, we are interested in measuring the impact of label noise on BCOPS (Guan & Tibshirani, 2022), an algorithm based on conformal prediction (Vovk et al., 2005) which, when combined with other machine learning methods, allows the construction of prediction sets for test-set observations in classification tasks. Prediction sets contain the possible values (for regression tasks) or possible classes (for classification tasks) of new observations, and they are constructed so that the probability that the true value or class lies within them meets a coverage guarantee. In their work, Guan & Tibshirani (2022) emphasize the possibility of using these prediction sets to detect outlier observations, that is, observations whose true class was not present during training. We therefore measure both the classification coverage and the abstention rate on outlier observations of BCOPS under label noise, using some of the datasets and machine learning algorithms considered by Guan & Tibshirani (2022).
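The experimental loop described above can be sketched in simplified form: flip a fraction of training labels, build label-conditional split conformal prediction sets, then measure coverage on known classes and the abstention (empty-set) rate on observations from a class never seen in training. This is not BCOPS: BCOPS builds its scores from other machine learning methods, whereas here a distance-to-centroid score is assumed for brevity, and the data, noise model, and helper names are hypothetical.

```python
import numpy as np

def add_label_noise(y, rate, n_classes, rng):
    """Flip a fraction `rate` of labels to a different class chosen uniformly at random."""
    y = y.copy()
    flip = rng.random(len(y)) < rate
    shift = rng.integers(1, n_classes, size=int(flip.sum()))
    y[flip] = (y[flip] + shift) % n_classes
    return y

def class_conditional_sets(X_fit, y_fit, X_cal, y_cal, X_test, alpha, n_classes):
    """Label-conditional split conformal prediction sets.

    Nonconformity score (an assumption): Euclidean distance to the class centroid on X_fit.
    Class k enters the set of a test point when its conformal p-value exceeds alpha.
    """
    centroids = np.stack([X_fit[y_fit == k].mean(axis=0) for k in range(n_classes)])
    sets = [[] for _ in range(len(X_test))]
    for k in range(n_classes):
        cal = np.linalg.norm(X_cal[y_cal == k] - centroids[k], axis=1)
        test = np.linalg.norm(X_test - centroids[k], axis=1)
        # p-value: share of class-k calibration scores at least as large as the test score
        p = (1 + (cal[None, :] >= test[:, None]).sum(axis=1)) / (len(cal) + 1)
        for i in np.flatnonzero(p > alpha):
            sets[i].append(k)
    return sets

def coverage_and_abstention(sets, y_test, known):
    """Coverage on test points from known classes; empty-set (abstention) rate on outliers."""
    inlier = np.isin(y_test, known)
    covered = [y in s for s, y, ok in zip(sets, y_test, inlier) if ok]
    abstained = [len(s) == 0 for s, ok in zip(sets, inlier) if not ok]
    return float(np.mean(covered)), float(np.mean(abstained))

rng = np.random.default_rng(0)
means = np.array([[0, 0], [4, 0], [0, 4], [8, 8]], dtype=float)  # class 3 never appears in training

def sample(classes, n):
    y = rng.choice(classes, size=n)
    return means[y] + rng.standard_normal((n, 2)), y

X_tr, y_tr = sample([0, 1, 2], 600)
y_noisy = add_label_noise(y_tr, rate=0.2, n_classes=3, rng=rng)
X_te, y_te = sample([0, 1, 2, 3], 400)
sets = class_conditional_sets(X_tr[:300], y_noisy[:300], X_tr[300:], y_noisy[300:],
                              X_te, alpha=0.05, n_classes=3)
coverage, abstention = coverage_and_abstention(sets, y_te, known=[0, 1, 2])
```

Rerunning the last four lines with different `rate` values gives a small-scale analogue of the coverage and abstention curves the study measures.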

artificial intelligence, machine learning, outlier observation, (18 more...)

2511.08808

Country:

Europe > Austria > Vienna (0.14)
South America > Brazil > São Paulo (0.05)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing Systems, Oct-10-2025, 10:49:34 GMT

Practical Shuffle Coding

It is a variant of shuffle coding that is many orders of magnitude faster than the original and enables 'one-shot' compression of single

autoregressive shuffle, graph, shuffle, (15 more...)

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > Netherlands > North Holland > Amsterdam (0.04)
North America > United States > Arizona > Maricopa County > Phoenix (0.04)

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing Systems, Oct-10-2025, 03:16:14 GMT

On Mesa-Optimization in Autoregressively Trained Transformers: Emergence and Capability

When do mesa-optimization algorithms emerge in autoregressively trained transformers?

assumption 4, gradient descent, transformer, (13 more...)

Country:

Asia > China > Beijing > Beijing (0.04)
Europe > Denmark (0.04)
Asia > China > Shandong Province (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry: Education (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.87)
Information Technology > Artificial Intelligence > Vision (0.67)
(2 more...)

Chang, Tien-En, Chen, Argon

Variable Selection Using Relative Importance Rankings

arXiv.org Machine Learning, Sep-16-2025

Although conceptually related, variable selection and relative importance (RI) analysis have been treated quite differently in the literature. While RI is typically used for post-hoc model explanation, this paper explores its potential for variable ranking and filter-based selection before model creation. Specifically, we anticipate strong performance from the RI measures because they incorporate both the direct and the combined effects of predictors, addressing a key limitation of marginal correlation, which ignores dependencies among predictors. We implement and evaluate RI-based variable selection methods using general dominance (GD), comprehensive relative importance (CRI), and a newly proposed, computationally efficient variant termed CRI.Z. We first demonstrate that the RI measures rank variables more accurately than marginal correlation, especially when suppressed or weak predictors are present. We then show that predictive models built on these rankings are highly competitive, often outperforming state-of-the-art methods such as the lasso and relaxed lasso. The proposed RI-based methods are particularly effective in challenging cases involving clusters of highly correlated predictors, a setting known to cause failures in many benchmark methods. Although lasso methods have dominated the recent literature on variable selection, our study shows that RI-based methods are a powerful and competitive alternative. We believe these underutilized tools deserve greater attention in the statistics and machine learning communities. The code is available at: https://github.com/tien-endotchang/RI-variable-selection.
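General dominance, one of the RI measures named above, can be sketched by brute force: for each predictor, average the gain in OLS $R^2$ from adding it to every subset of the other predictors, first within each subset size and then across sizes, and rank predictors by these values for filter-based selection. This exhaustive version is only practical for a handful of predictors; CRI and CRI.Z are not reproduced, and the toy data with a suppressor variable and the helper names are hypothetical. The GD values add up to the full-model $R^2$, which serves as a sanity check.

```python
import itertools
import numpy as np

def r_squared(X, y, cols):
    """R^2 of an OLS fit of y on the given columns plus an intercept (0 for the empty model)."""
    if not cols:
        return 0.0
    A = np.column_stack([np.ones(len(y)), X[:, list(cols)]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(1.0 - resid @ resid / np.sum((y - y.mean()) ** 2))

def general_dominance(X, y):
    """GD_j: average over subset sizes k of the mean R^2 gain from adding j to size-k subsets."""
    p = X.shape[1]
    r2 = {S: r_squared(X, y, S) for k in range(p + 1) for S in itertools.combinations(range(p), k)}
    gd = np.zeros(p)
    for j in range(p):
        others = [i for i in range(p) if i != j]
        per_size = [
            np.mean([r2[tuple(sorted(S + (j,)))] - r2[S] for S in itertools.combinations(others, k)])
            for k in range(p)
        ]
        gd[j] = np.mean(per_size)
    return gd

def select_top(X, y, m):
    """Filter-style selection: indices of the m predictors with the largest GD values."""
    return np.argsort(-general_dominance(X, y))[:m]

rng = np.random.default_rng(0)
n = 500
z, e, noise = rng.standard_normal((3, n))
X = np.column_stack([z + e, e, noise])  # column 1 (= e) is a suppressor; column 2 is pure noise
y = z + 0.1 * rng.standard_normal(n)
gd = general_dominance(X, y)
marginal = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(3)])
```

In this toy setup the suppressor is almost uncorrelated with `y` on its own, so marginal correlation cannot separate it from the noise column, yet its GD value is clearly positive because it removes `e` from the first predictor.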

correlation 0, predictor, snr 0, (13 more...)

2509.10853

Country: Asia > Taiwan > Taiwan Province > Taipei (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.68)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

arXiv.org Machine Learning, Sep-10-2025

Toric geometry of ReLU neural networks

Fu, Yaoying

Given a continuous finitely piecewise linear function $f:\mathbb{R}^{n_0} \to \mathbb{R}$ and a fixed architecture $(n_0,\ldots,n_k;1)$ of feedforward ReLU neural networks, the exact function realization problem is to determine when some network with the given architecture realizes $f$. To develop a systematic way to answer this question, we establish a connection between toric geometry and ReLU neural networks. This approach enables us to use numerous structures and tools from algebraic geometry to study ReLU neural networks. Starting with an unbiased ReLU neural network with rational weights, we define the ReLU fan, the ReLU toric variety, and the ReLU Cartier divisor associated with the network. This work also reveals the connection between the tropical geometry and the toric geometry of ReLU neural networks. As an application of the toric geometry framework, we prove a necessary and sufficient criterion for a function to be realizable by an unbiased shallow ReLU neural network, by computing intersection numbers of the ReLU Cartier divisor and torus-invariant curves.
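A small numerical illustration for $n_0 = 2$, under our own simplifying assumption that, for an unbiased shallow network $f(x) = \sum_i a_i \max(0, w_i \cdot x)$, the relevant fan is the one whose maximal cones are the regions where the activation pattern is constant: its rays come from the lines $w_i \cdot x = 0$, and on each cone $f$ is linear with gradient $\sum_{i\ \text{active}} a_i w_i$. The sketch does not construct the toric variety or the Cartier divisor and does not implement the paper's realizability criterion; the weights and helper names are hypothetical.

```python
import numpy as np

def shallow_relu(W, a, x):
    """Unbiased shallow ReLU network f(x) = sum_i a_i * max(0, w_i . x); W has shape (m, 2)."""
    return float(np.maximum(W @ x, 0.0) @ a)

def fan_rays(W):
    """Sorted angles in [0, 2*pi) of the rays where the lines w_i . x = 0 meet the unit circle."""
    normals = np.arctan2(W[:, 1], W[:, 0])
    angles = np.concatenate([normals + np.pi / 2, normals - np.pi / 2]) % (2 * np.pi)
    return np.unique(np.round(angles, 12))

def cone_gradients(W, a):
    """Gradient of f on each maximal 2D cone between consecutive rays."""
    rays = fan_rays(W)
    upper = np.append(rays[1:], rays[0] + 2 * np.pi)
    grads = []
    for lo, hi in zip(rays, upper):
        mid = 0.5 * (lo + hi)
        u = np.array([np.cos(mid), np.sin(mid)])  # a direction strictly inside the cone
        active = W @ u > 0                        # activation pattern, constant on the cone
        grads.append(a[active] @ W[active])
    return rays, np.array(grads)

# Rational (here integer) weights, matching the setting described above.
W = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
a = np.array([1.0, -2.0, 1.0])
rays, grads = cone_gradients(W, a)
```

Because the network has no biases, every cone has its apex at the origin and $f(cx) = c f(x)$ for $c > 0$, which is why a fan (rather than a general polyhedral complex) describes its linear regions.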

definition 4, neural network, relu neural network, (13 more...)

2509.05894

Country: Asia > Middle East > Israel (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)