AITopics

2605.15454

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.69)

Patanè, Giulia, Menafoglio, Alessandra, Krauth, Alexander, Fechner, Peter, Dede', Luca, Colosimo, Bianca Maria, Nicolussi, Federica

K-Models: a Flexible and Interpretable Method for Ordinal Clustering with Application to Antigen-Antibody Interaction Profiles

arXiv.org Machine Learning, May-15-2026

Existing clustering methods for functional data often prioritize partitioning accuracy over interpretability, making it challenging to extract meaningful insights when the data-generating process follows a specific underlying structure and an ordinal relationship among clusters is suspected. This work introduces K-Models, a novel framework that integrates ordinal constraints and estimates key underlying elements of the random process generating the observed functional profiles, improving both interpretability and structure identification. The proposed method is evaluated through simulations and real-world applications. In particular, it is tested on Region of Interest (ROI) curves, which represent reaction profiles from a reflectometric sensor monitoring biomolecular interactions, such as antigen-antibody binding. These curves represent changes in reflected light intensity over time at multiple measurement spots with immobilized antigens during analyte exposure, capturing the binding dynamics of the system. The goal is to identify intrinsic signal patterns solely from the observed dynamics, making this dataset an ideal benchmark for assessing the added interpretability of the proposed approach. By incorporating structural assumptions into the clustering process, K-Models enhances interpretability while maintaining performance comparable to state-of-the-art techniques, providing a valuable tool for analyzing functional data with an underlying ordinal structure.

artificial intelligence, functional data, machine learning, (17 more...)

2605.14828

Country: Europe (0.14)

Genre: Research Report > Promising Solution (0.34)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.61)
Health & Medicine > Therapeutic Area > Immunology (0.61)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.90)

arXiv.org Machine Learning, May-14-2026

Delightful Exploration

Osband, Ian

Most exploration algorithms search broadly until uncertainty is resolved. When the action space is too large to resolve within budget, practitioners default to $\varepsilon$-greedy, which bounds disruption but spends its override blindly. We introduce \textit{Delight-gated exploration} (DE), a host--override rule that spends exploratory actions only when their prospective delight (expected improvement times surprisal) exceeds a gate price. This practical heuristic recovers a classical result: Pandora's reservation-value rule for costly search, with surprisal setting the effective inspection cost. Resolved arms exit the gate, fresh arms shut off above a prior-determined threshold, and selected linear-bandit overrides consume finite information budget. Across Bernoulli bandits, linear bandits, and tabular MDPs, the same hyperparameters transfer without retuning, and DE shows much weaker regret growth than Thompson Sampling and $\varepsilon$-greedy in the tested unresolved regimes. Delight improves acting for the same reason it improves learning: it prices scarce resources by the product of upside and surprisal.

artificial intelligence, data mining, machine learning, (20 more...)

2605.13287

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Data Science > Data Mining > Big Data (0.46)

Yavari, Amirhossein, Esfahlani, Farnaz Zamani

Beyond Activation Alignment: The Geometry of Neural Sensitivity

arXiv.org Machine Learning, May-8-2026

Activation-alignment measures such as Representational Similarity Analysis (RSA), Canonical Correlation Analysis (CCA), and Centered Kernel Alignment (CKA) are widely used to compare biological and artificial neural representations. Recent theoretical work interprets many of these methods as assessing agreement between optimal linear readouts over broad families of global tasks. However, agreement at the level of global readouts does not determine how a system uses local stimulus evidence. Specifically, representations may align in activation space yet differ in their sensitivity to small perturbations. To address this challenge, we introduce a complementary framework based on local decodable information, which focuses on a representation's ability, under noise, to discriminate small perturbations within a specified stimulus-coordinate subspace. Building on Fisher information and local representation geometry, we summarize each representation using the expected projected pullback/Fisher metric over that subspace. This formulation induces a second-moment family of local discrimination tasks, for which the resulting operator provides a minimal, complete dataset-level summary of expected discriminability. We compare these regularized signatures using a log-spectral distance on the manifold of symmetric positive definite (SPD) matrices, yielding the Spectral Riemannian Alignment Score (S-RAS) and a uniform multiplicative certificate over the corresponding family of lifted task values. Empirically, this framework enables the recovery of corresponding layers across independently trained artificial neural networks, supports transferable class-conditional probes, reveals controlled dissociations between standard and robust training, and uncovers stimulus-coordinate family effects across mouse visual cortex using the Allen Brain Observatory static gratings dataset.
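The log-spectral comparison of SPD matrices described above can be illustrated with the standard affine-invariant Riemannian distance, d(A, B) = sqrt(sum_i log^2 λ_i), where λ_i are the eigenvalues of A^{-1/2} B A^{-1/2}. This is a minimal sketch assuming that distance is a reasonable stand-in; the abstract does not give the exact definition of S-RAS, and the function name `log_spectral_distance` and the `ridge` regularizer are hypothetical choices, not the authors' implementation.

```python
import numpy as np


def log_spectral_distance(a, b, ridge=1e-6):
    """Affine-invariant distance between two symmetric positive (semi)definite matrices.

    `ridge` adds a small multiple of the identity to both inputs so that
    rank-deficient matrices still have a well-defined logarithm (an assumption
    standing in for the "regularized signatures" mentioned in the abstract).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape or a.shape[0] != a.shape[1]:
        raise ValueError("inputs must be square matrices of the same shape")
    eye = np.eye(a.shape[0])
    a_reg = a + ridge * eye
    b_reg = b + ridge * eye

    # Whiten b_reg by a_reg using the Cholesky factor: a_reg = chol @ chol.T
    chol = np.linalg.cholesky(a_reg)
    half = np.linalg.solve(chol, b_reg)        # chol^{-1} b_reg
    whitened = np.linalg.solve(chol, half.T)   # chol^{-1} b_reg chol^{-T}
    whitened = 0.5 * (whitened + whitened.T)   # remove round-off asymmetry

    # The eigenvalues of the whitened matrix are the generalized eigenvalues of (b, a).
    eigenvalues = np.linalg.eigvalsh(whitened)
    return float(np.sqrt(np.sum(np.log(eigenvalues) ** 2)))
```

The distance is zero when the two matrices coincide and is unchanged when both are transformed by the same invertible linear map, which is why it suits comparisons of metrics expressed in different coordinates.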

artificial intelligence, machine learning, representation, (18 more...)

2605.03222

Country: North America > United States > Oklahoma (0.28)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Pascal Mettes, Elise van der Pol, Cees Snoek

Hyperspherical Prototype Networks

Neural Information Processing Systems, Apr-30-2026, 20:24:36 GMT

This paper introduces hyperspherical prototype networks, which unify classification and regression with prototypes on hyperspherical output spaces. For classification, a common approach is to define prototypes as the mean output vector over training examples per class. Here, we propose to use hyperspheres as output spaces, with class prototypes defined a priori with large margin separation. We position prototypes through data-independent optimization, with an extension to incorporate priors from class semantics. By doing so, we do not require any prototype updating, we can handle any training size, and the output dimensionality is no longer constrained to the number of classes. Furthermore, we generalize to regression, by optimizing outputs as an interpolation between two prototypes on the hypersphere. Since both tasks are now defined by the same loss function, they can be jointly trained for multi-task problems. Experimentally, we show the benefit of hyperspherical prototype networks for classification, regression, and their combination over other prototype methods, softmax cross-entropy, and mean squared error approaches.
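The idea of fixed class prototypes on a hypersphere can be sketched in a few lines. The example below is illustrative only: it places three prototypes by hand at equal angles on the unit circle rather than running the data-independent optimization the abstract refers to, and the squared cosine loss `(1 - cos)^2` and all function names are assumptions, since the abstract does not state the loss.

```python
import math


def normalize(v):
    """Project a vector onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in v]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    u, v = normalize(u), normalize(v)
    return sum(a * b for a, b in zip(u, v))


def prototype_loss(output, prototype):
    """Assumed loss: squared cosine distance between an output and its class prototype."""
    return (1.0 - cosine(output, prototype)) ** 2


def predict(output, prototypes):
    """Assign the class whose prototype has the highest cosine similarity."""
    return max(range(len(prototypes)), key=lambda k: cosine(output, prototypes[k]))


# Three hand-placed prototypes, 120 degrees apart on the unit circle.
# The output space is 2-D even though there are 3 classes.
PROTOTYPES = [
    [math.cos(2.0 * math.pi * k / 3.0), math.sin(2.0 * math.pi * k / 3.0)]
    for k in range(3)
]
```

Because the prototypes are fixed before training, nothing about them has to be updated as data arrives; only the network output is pulled toward the prototype of its class.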

artificial intelligence, machine learning, prototype, (15 more...)

Country: Europe > Netherlands > North Holland > Amsterdam (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Neural Information Processing Systems, Apr-30-2026, 20:24:22 GMT

02a32ad2669e6fe298e607fe7cc0e1a0-AuthorFeedback.pdf

We thank all the reviewers (R1, R2, R3) for their feedback and suggestions. Table A: Multi-task comparison across task weights. We have performed loss balancing with five different weights $t$ in the multi-task loss $L_m = t L_c + (1 - t) L_r$ for the classification and regression losses. The results on OmniArt are reported in Table A. Our proposal is robust to the weight value; tuning the task weight is not vital. We obtain a moderate gain for both classification and regression with a weight of $t = 0.25$. For the multi-task baseline, emphasizing regression reduces the regression error, as the gradient magnitude of the regression loss is much lower than the one for the classification loss.
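The task-weighted loss in this feedback, read here as a convex combination of the classification loss and the regression loss with weight t, can be written as a one-line helper. This is a minimal sketch; the function name `multitask_loss` and the example loss values are hypothetical and are not taken from Table A.

```python
def multitask_loss(loss_cls, loss_reg, t):
    """Weighted multi-task loss: t * classification loss + (1 - t) * regression loss."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("task weight t must lie in [0, 1]")
    return t * loss_cls + (1.0 - t) * loss_reg


# Illustrative values only: with t = 0.25 the regression term carries
# three times the weight of the classification term.
example = multitask_loss(loss_cls=2.0, loss_reg=4.0, t=0.25)
```

Setting t = 1 recovers pure classification training and t = 0 pure regression training, so sweeping t between the two traces out the trade-off discussed in the feedback.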

artificial intelligence, dimension, machine learning, (19 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.59)

Neural Information Processing Systems, Apr-25-2026, 21:59:37 GMT

Provable Guarantees for Nonlinear Feature Learning in Three-Layer Neural Networks

One of the central questions in the theory of deep learning is to understand how neural networks learn hierarchical features. The ability of deep networks to extract salient features is crucial to both their outstanding generalization ability and the modern deep learning paradigm of pretraining and finetuning. However, this feature learning process remains poorly understood from a theoretical perspective, with existing analyses largely restricted to two-layer networks. In this work we show that three-layer neural networks have provably richer feature learning capabilities than two-layer networks. We analyze the features learned by a three-layer network trained with layer-wise gradient descent, and present a general purpose theorem which upper bounds the sample complexity and width needed to achieve low test error when the target has specific hierarchical structure. We instantiate our framework in specific statistical learning settings - single-index models and functions of quadratic features - and show that in the latter setting three-layer networks obtain a sample complexity improvement over all existing guarantees for two-layer networks. Crucially, this sample complexity improvement relies on the ability of three-layer networks to efficiently learn nonlinear features. We then establish a concrete optimization-based depth separation by constructing a function which is efficiently learnable via gradient descent on a three-layer network, yet cannot be learned efficiently by a two-layer network. Our work makes progress towards understanding the provable benefit of three-layer neural networks over two-layer networks in the feature learning regime.

artificial intelligence, machine learning, xtax, (16 more...)

Country: North America > United States > Minnesota (0.27)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing Systems, Apr-25-2026, 21:40:24 GMT

51200d29d1fc15f5a71c1dab4bb54f7c-Paper.pdf

artificial intelligence, machine learning, natural language, (18 more...)

Country: North America > Canada > Ontario (0.28)

Industry:

Media (0.46)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Neural Information Processing Systems, Apr-25-2026, 09:39:53 GMT

327af0f71f7acdfd882774225f04775f-Supplemental.pdf

We will now derive continuous dynamics (2) in the main paper. Let $1_m = 1$ if class 1 is selected at iteration $m$ and $1_m = 0$ otherwise. Likewise, we can obtain the dynamics of $X^2_j$. We will next prove the separation theorem in binary classification, Theorem 2.1. Given the feature vectors $X^1_i(t)$, $X^2_j(t)$ for $i, j \in [n]$, as $t \to \infty$ and large $n$, 1. if $\alpha > \beta$, they are asymptotically separable with probability tending to one, 2. if $\alpha \le \beta$, they are asymptotically separable with probability tending to zero. This also aligns with our intuition that the intra-class effect should be stronger than its inter-class counterpart. On the other hand, when $\alpha > \beta$, ignoring a null set we may assume $c_1 > c_2$ without loss of generality.

artificial intelligence, l-model, machine learning, (19 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)