AITopics | Supervised Learning

Collaborating Authors

Supervised Learning

Supervised learning is the machine learning task of learning a function that maps an input to an output based on example input-output pairs. (Wikipedia)

News Overviews Instructional Materials AI-Alerts Classics

Polysemy of Synthetic Neurons Towards a New Type of Explanatory Categorical Vector Spaces

Pichat, Michael, Pogrund, William, Pichat, Paloma, Poumay, Judicael, Gasparian, Armanouche, Demarchi, Samuel, Corbet, Martin, Georgeon, Alois, Veillet-Guillem, Michael

arXiv.org Artificial IntelligenceMay-14-2025

The polysemantic nature of synthetic neurons in artificial intelligence language models is currently understood as the result of a necessary superposition of distributed features within the latent space. We propose an alternative approach, geometrically defining a neuron in layer n as a categorical vector space with a non-orthogonal basis, composed of categorical sub-dimensions extracted from preceding neurons in layer n-1. This categorical vector space is structured by the activation space of each neuron and enables, via an intra-neuronal attention process, the identification and utilization of a critical categorical zone for the efficiency of the language model - more homogeneous and located at the intersection of these different categorical sub-dimensions.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2505.07831

Country:

Europe (0.92)
North America > United States (0.67)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(5 more...)

Add feedback

Structured Prediction with Abstention via the Lovász Hinge

Finocchiaro, Jessie, Frongillo, Rafael, Nueve, Enrique

arXiv.org Artificial IntelligenceMay-13-2025

The Lovász hinge is a convex loss function proposed for binary structured classification, in which k related binary predictions jointly evaluated by a submodular function. Despite its prevalence in image segmentation and related tasks, the consistency of the Lovász hinge has remained open. We show that the Lovász hinge is inconsistent with its desired target unless the set function used for evaluation is modular. Leveraging the embedding framework of Finocchiaro et al. (2024), we find the target loss for which the Lovász hinge is consistent. This target, which we call the structured abstain problem, is a variant of selective classification for structured prediction that allows one to abstain on any subset of the k binary predictions. We derive a family of link functions, each of which is simultaneously consistent for all polymatroids, a subset of submodular set functions. We then give sufficient conditions on the polymatroid for the structured abstain problem to be tightly embedded by the Lovász hinge, meaning no target prediction is redundant. We experimentally demonstrate the potential of the structured abstain problem for interpretability in structured classification tasks. Finally, for the multiclass setting, we show that one can combine the binary encoding construction of Ramaswamy et al. (2018) with our link construction to achieve an efficient consistent surrogate for a natural multiclass generalization of the structured abstain problem.

artificial intelligence, machine learning, prediction, (17 more...)

arXiv.org Artificial Intelligence

2505.06446

Country: North America > United States (0.28)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.63)

Add feedback

Structural Alignment in Link Prediction

Sardina, Jeffrey Seathrún

arXiv.org Artificial IntelligenceMay-9-2025

While Knowledge Graphs (KGs) have become increasingly popular across various scientific disciplines for their ability to model and interlink huge quantities of data, essentially all real-world KGs are known to be incomplete. As such, with the growth of KG use has been a concurrent development of machine learning tools designed to predict missing information in KGs, which is referred to as the Link Prediction Task. The majority of state-of-the-art link predictors to date have followed an embedding-based paradigm. In this paradigm, it is assumed that the information content of a KG is best represented by the (individual) vector representations of its nodes and edges, and that therefore node and edge embeddings are particularly well-suited to performing link prediction. This thesis proposes an alternative perspective on the field's approach to link prediction and KG data modelling. Specifically, this work re-analyses KGs and state-of-the-art link predictors from a graph-structure-first perspective that models the information content of a KG in terms of whole triples, rather than individual nodes and edges. Following a literature review and two core sets of experiments, this thesis concludes that a structure-first perspective on KGs and link prediction is both viable and useful for understanding KG learning and for enabling cross-KG transfer learning for the link prediction task. This observation is used to create and propose the Structural Alignment Hypothesis, which postulates that link prediction can be understood and modelled as a structural task. All code and data used for this thesis are open-sourced. This thesis was written bilingually, with the main document in English and an informal extended summary in Irish. An Irish-language translation dictionary of machine learning terms (the Foclóir Tráchtais) created for this work is open-sourced as well.

data mining, kg structure and kgem hyperparameter, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2505.04939

Country: Europe > Ireland (0.27)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(4 more...)

Add feedback

From Two Sample Testing to Singular Gaussian Discrimination

Santoro, Leonardo V., Waghmare, Kartik G., Panaretos, Victor M.

arXiv.org Machine LearningMay-8-2025

We establish that testing for the equality of two probability measures on a general separable and compact metric space is equivalent to testing for the singularity between two corresponding Gaussian measures on a suitable Reproducing Kernel Hilbert Space. The corresponding Gaussians are defined via the notion of kernel mean and covariance embedding of a probability measure. Discerning two singular Gaussians is fundamentally simpler from an information-theoretic perspective than non-parametric two-sample testing, particularly in high-dimensional settings. Our proof leverages the Feldman-Hajek criterion for singularity/equivalence of Gaussians on Hilbert spaces, and shows that discrepancies between distributions are heavily magnified through their corresponding Gaussian embeddings: at a population level, distinct probability measures lead to essentially separated Gaussian embeddings. This appears to be a new instance of the blessing of dimensionality that can be harnessed for the design of efficient inference tools in great generality.

artificial intelligence, machine learning, operator, (18 more...)

arXiv.org Machine Learning

2505.04613

Country:

North America > United States > New York (0.04)
Europe > Switzerland > Vaud > Lausanne (0.04)
Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
(2 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.34)

Add feedback

Self-Balancing, Memory Efficient, Dynamic Metric Space Data Maintenance, for Rapid Multi-Kernel Estimation

Ellendula, Aditya S, Bajaj, Chandrajit

arXiv.org Artificial IntelligenceApr-28-2025

We present a dynamic self-balancing octree data structure that enables efficient neighborhood maintenance in evolving metric spaces, a key challenge in modern machine learning systems. Many learning and generative models operate as dynamical systems whose representations evolve during training, requiring fast, adaptive spatial organization. Our two-parameter octree supports logarithmic-time updates and queries, eliminating the need for costly full rebuilds as data distributions shift. We demonstrate its effectiveness in four areas: (1) accelerating Stein variational gradient descent by supporting more particles with lower overhead; (2) enabling real-time, incremental KNN classification with logarithmic complexity; (3) facilitating efficient, dynamic indexing and retrieval for retrieval-augmented generation; and (4) improving sample efficiency by jointly optimizing input and latent spaces. Across all applications, our approach yields exponential speedups while preserving accuracy, particularly in high-dimensional spaces where maintaining adaptive spatial structure is critical.

application, artificial intelligence, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2504.18003

Country:

North America > United States > Texas (0.28)
North America > Canada (0.28)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

Robust Estimation in metric spaces: Achieving Exponential Concentration with a Fr\'echet Median

Kim, Jakwang, Park, Jiyoung, Bhattacharya, Anirban

arXiv.org Machine LearningApr-18-2025

There is growing interest in developing statistical estimators that achieve exponential concentration around a population target even when the data distribution has heavier than exponential tails. More recent activity has focused on extending such ideas beyond Euclidean spaces to Hilbert spaces and Riemannian manifolds. In this work, we show that such exponential concentration in presence of heavy tails can be achieved over a broader class of parameter spaces called CAT($\kappa$) spaces, a very general metric space equipped with the minimal essential geometric structure for our purpose, while being sufficiently broad to encompass most typical examples encountered in statistics and machine learning. The key technique is to develop and exploit a general concentration bound for the Fr\'echet median in CAT($\kappa$) spaces. We illustrate our theory through a number of examples, and provide empirical support through simulation studies.

artificial intelligence, estimator, machine learning, (14 more...)

arXiv.org Machine Learning

2504.14161

Country:

North America > United States > Texas (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Rhode Island > Providence County > Providence (0.04)
(7 more...)

Genre: Research Report > New Finding (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.62)

Add feedback

Watch: New speed climbing record set in the Swiss Alps

BBC NewsApr-16-2025, 01:45:20 GMT

A Swiss and Austrian climbing pair have shattered the speed record for completing the daunting north faces of a famed trio of Swiss mountains - the Eiger, Mönch and Jungfrau. Switzerland's Nicolas Hojac and Austria's Philipp Brugger shaved nearly ten hours off the previous record set more than two decades ago.

new speed, swiss alp

BBC News

Country:

Europe > Switzerland (0.43)
Europe > Austria (0.43)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.79)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.79)

Add feedback

Bregman-Hausdorff divergence: strengthening the connections between computational geometry and machine learning

Pham, Tuyen, Kouřimská, Hana Dal Poz, Wagner, Hubert

arXiv.org Artificial IntelligenceApr-11-2025

The purpose of this paper is twofold. On a technical side, we propose an extension of the Hausdorff distance from metric spaces to spaces equipped with asymmetric distance measures. Specifically, we focus on the family of Bregman divergences, which includes the popular Kullback--Leibler divergence (also known as relative entropy). As a proof of concept, we use the resulting Bregman--Hausdorff divergence to compare two collections of probabilistic predictions produced by different machine learning models trained using the relative entropy loss. The algorithms we propose are surprisingly efficient even for large inputs with hundreds of dimensions. In addition to the introduction of this technical concept, we provide a survey. It outlines the basics of Bregman geometry, as well as computational geometry algorithms. We focus on algorithms that are compatible with this geometry and are relevant for machine learning.

artificial intelligence, machine learning, spatial reasoning, (19 more...)

arXiv.org Artificial Intelligence

2504.07322

Country:

North America > United States (0.67)
Europe (0.67)

Genre:

Overview (0.48)
Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.34)

Add feedback

Random Normed k-Means: A Paradigm-Shift in Clustering within Probabilistic Metric Spaces

Hemdanou, Abderrafik Laakel, Achtoun, Youssef, Sefian, Mohammed Lamarti, Tahiri, Ismail, Afia, Abdellatif El

arXiv.org Machine LearningApr-4-2025

Existing approaches remain largely constrained by traditional distance metrics, limiting their effectiveness in handling random data. In this work, we introduce the first k-means variant in the literature that operates within a probabilistic metric space, replacing conventional distance measures with a well-defined distance distribution function. This pioneering approach enables more flexible and robust clustering in both deterministic and random datasets, establishing a new foundation for clustering in stochastic environments. By adopting a probabilistic perspective, our method not only introduces a fresh paradigm but also establishes a rigorous theoretical framework that is expected to serve as a key reference for future clustering research involving random data. Extensive experiments on diverse real and synthetic datasets assess our model's effectiveness using widely recognized evaluation metrics, including Silhouette, Davies-Bouldin, Calinski Harabasz, the adjusted Rand index, and distortion. Comparative analyses against established methods such as k-means++, fuzzy c-means, and kernel probabilistic k-means demonstrate the superior performance of our proposed random normed k-means (RNKM) algorithm. Notably, RNKM exhibits a remarkable ability to identify nonlinearly separable structures, making it highly effective in complex clustering scenarios. These findings position RNKM as a groundbreaking advancement in clustering research, offering a powerful alternative to traditional techniques while addressing a long-standing gap in the literature. By bridging probabilistic metrics with clustering, this study provides a foundational reference for future developments and opens new avenues for advanced data analysis in dynamic, data-driven applications.

algorithm, artificial intelligence, machine learning, (16 more...)

arXiv.org Machine Learning

2504.03928

Country:

Asia > Middle East > UAE (0.04)
Oceania > Australia > Australian Capital Territory > Canberra (0.04)
North America > United States > Wisconsin (0.04)
(4 more...)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.66)
Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area (0.68)
Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.85)

Add feedback

New universal operator approximation theorem for encoder-decoder architectures (Preprint)

Gödeke, Janek, Fernsel, Pascal

arXiv.org Artificial IntelligenceMar-31-2025

Motivated by the rapidly growing field of mathematics for operator approximation with neural networks, we present a novel universal operator approximation theorem for a broad class of encoder-decoder architectures. In this study, we focus on approximating continuous operators in $\mathcal{C}(\mathcal{X}, \mathcal{Y})$, where $\mathcal{X}$ and $\mathcal{Y}$ are infinite-dimensional normed or metric spaces, and we consider uniform convergence on compact subsets of $\mathcal{X}$. Unlike standard results in the operator learning literature, we investigate the case where the approximating operator sequence can be chosen independently of the compact sets. Taking a topological perspective, we analyze different types of operator approximation and show that compact-set-independent approximation is a strictly stronger property in most relevant operator learning frameworks. To establish our results, we introduce a new approximation property tailored to encoder-decoder architectures, which enables us to prove a universal operator approximation theorem ensuring uniform convergence on every compact subset. This result unifies and extends existing universal operator approximation theorems for various encoder-decoder architectures, including classical DeepONets, BasisONets, special cases of MIONets, architectures based on frames and other related approaches.

artificial intelligence, machine learning, sequence, (17 more...)

arXiv.org Artificial Intelligence

2503.24092

Country:

Europe > Germany > Bremen > Bremen (0.14)
North America > United States (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(5 more...)

Genre: Research Report > New Finding (0.54)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.36)

Add feedback