Representation Of Examples
Linear Regression in p-adic metric spaces
Baker, Gregory D., McCallum, Scott, Pattinson, Dirk
Many real-world machine learning problems involve inherently hierarchical data, yet traditional approaches rely on Euclidean metrics that fail to capture the discrete, branching nature of hierarchical relationships. We present a theoretical foundation for machine learning in p-adic metric spaces, which naturally respect hierarchical structure. Our main result proves that an n-dimensional plane minimizing the p-adic sum of distances to points in a dataset must pass through at least n + 1 of those points -- a striking contrast to Euclidean regression that highlights how p-adic metrics better align with the discrete nature of hierarchical data. As a corollary, a polynomial of degree n constructed to minimize the p-adic sum of residuals will pass through at least n + 1 points. As a further corollary, a polynomial of degree n approximating a higher degree polynomial at a finite number of points will yield a difference polynomial that has distinct rational roots. We demonstrate the practical significance of this result through two applications in natural language processing: analyzing hierarchical taxonomies and modeling grammatical morphology. These results suggest that p-adic metrics may be fundamental to properly handling hierarchical data structures in machine learning. In hierarchical data, interpolation between points often makes less sense than selecting actual observed points as representatives.
- Oceania > Australia (0.04)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- Europe > Germany > Saxony > Leipzig (0.04)
- Asia > Taiwan > Taiwan Province > Taipei (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.62)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.53)
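The p-adic machinery behind the abstract above is concrete enough to sketch. The snippet below, an illustrative toy rather than the authors' code, computes the p-adic valuation and absolute value of a rational and evaluates the p-adic residual sum of a candidate regression line; the dataset and helper names (`vp`, `abs_p`, `padic_loss`) are assumptions for the example.

```python
from fractions import Fraction

def vp(x, p):
    """p-adic valuation v_p(x) of a nonzero rational x."""
    x = Fraction(x)
    v, num, den = 0, x.numerator, x.denominator
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return v

def abs_p(x, p):
    """p-adic absolute value |x|_p = p ** (-v_p(x)), with |0|_p = 0."""
    x = Fraction(x)
    return Fraction(0) if x == 0 else Fraction(p) ** (-vp(x, p))

def padic_loss(a, b, data, p):
    """p-adic sum of residuals of the line y = a + b*x over the data."""
    return sum(abs_p(Fraction(y) - (a + b * Fraction(x)), p) for x, y in data)

# Toy data: the line y = 1 + x interpolates 2 = n + 1 of the points,
# as the theorem says a p-adic minimizer must.
data = [(0, 1), (1, 2), (2, 5)]
loss = padic_loss(Fraction(1), Fraction(1), data, 2)   # residuals 0, 0, 2
```

Note how the p-adic absolute value rewards exact interpolation: a residual of 0 contributes 0, while a small rational residual can still have a large p-adic norm.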
End-to-End Deep Learning for Predicting Metric Space-Valued Outputs
Zhou, Yidong, Iao, Su I, Müller, Hans-Georg
Many modern applications involve predicting structured, non-Euclidean outputs such as probability distributions, networks, and symmetric positive-definite matrices. These outputs are naturally modeled as elements of general metric spaces, where classical regression techniques that rely on vector space structure no longer apply. We introduce E2M (End-to-End Metric regression), a deep learning framework for predicting metric space-valued outputs. E2M performs prediction via a weighted Fréchet mean over training outputs, where the weights are learned by a neural network conditioned on the input. This construction provides a principled mechanism for geometry-aware prediction that avoids surrogate embeddings and restrictive parametric assumptions, while fully preserving the intrinsic geometry of the output space. We establish theoretical guarantees, including a universal approximation theorem that characterizes the expressive capacity of the model and a convergence analysis of the entropy-regularized training objective. Through extensive simulations involving probability distributions, networks, and symmetric positive-definite matrices, we show that E2M consistently achieves state-of-the-art performance, with its advantages becoming more pronounced at larger sample sizes. Applications to human mortality distributions and New York City taxi networks further demonstrate the flexibility and practical utility of the framework.
- North America > United States > New York (0.25)
- North America > United States > Rhode Island > Providence County > Providence (0.04)
- North America > United States > California > Yolo County > Davis (0.04)
- (2 more...)
- Banking & Finance > Economy (0.67)
- Transportation > Passenger (0.66)
- Transportation > Ground > Road (0.48)
- Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.89)
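The core prediction step described above, a weighted Fréchet mean over training outputs, reduces to a small search when sketched for the simplest case. The snippet below assumes scalar outputs with the Euclidean metric and a fixed candidate grid; in the actual framework the weights come from a neural network and the output space is a general metric space.

```python
import numpy as np

def weighted_frechet_mean(candidates, train_outputs, weights, dist):
    """Return the candidate minimizing sum_i w_i * dist(c, y_i)^2."""
    costs = [sum(w * dist(c, y) ** 2 for w, y in zip(weights, train_outputs))
             for c in candidates]
    return candidates[int(np.argmin(costs))]

# With the Euclidean metric on scalars, the weighted Frechet mean is just
# the weighted average, which the grid search recovers: 0.2*1 + 0.5*2 + 0.3*4 = 2.4.
train_outputs = [1.0, 2.0, 4.0]
weights = [0.2, 0.5, 0.3]            # in E2M these are produced by the network
grid = np.linspace(0.0, 5.0, 501)
pred = weighted_frechet_mean(grid, train_outputs, weights, lambda a, b: abs(a - b))
```

The same `weighted_frechet_mean` signature works unchanged for any metric: only `dist` and the candidate set need to change.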
Metrics for Parametric Families of Networks
Gómez, Mario, Ma, Guanqun, Needham, Tom, Wang, Bei
We introduce a general framework for analyzing data modeled as parameterized families of networks. Building on a Gromov-Wasserstein variant of optimal transport, we define a family of parameterized Gromov-Wasserstein distances for comparing such parametric data, including time-varying metric spaces induced by collective motion, temporally evolving weighted social networks, and random graph models. We establish foundational properties of these distances, showing that they subsume several existing metrics in the literature, and derive theoretical approximation guarantees. In particular, we develop computationally tractable lower bounds and relate them to graph statistics commonly used in random graph theory. Furthermore, we prove that our distances can be consistently approximated in random graph and random metric space settings via empirical estimates from generative models. Finally, we demonstrate the practical utility of our framework through a series of numerical experiments.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Ohio (0.04)
- North America > United States > Michigan (0.04)
- (2 more...)
- Health & Medicine (0.46)
- Information Technology (0.34)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.55)
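One classical family of tractable lower bounds for Gromov-Wasserstein-type distances compares the distributions of pairwise distances of the two spaces. The sketch below is an illustrative proxy in that spirit, not the paper's exact bounds: it measures the 1-D Wasserstein gap between the two distance distributions via quantiles.

```python
import numpy as np

def distance_distribution_gap(D1, D2, n_quantiles=200):
    """1-D Wasserstein gap between the empirical distributions of pairwise
    distances of two finite metric spaces (given as square distance matrices)."""
    D1, D2 = np.asarray(D1, float), np.asarray(D2, float)
    qs = np.linspace(0.0, 1.0, n_quantiles)
    d1 = D1[np.triu_indices_from(D1, k=1)]   # upper-triangular distances only
    d2 = D2[np.triu_indices_from(D2, k=1)]
    return float(np.mean(np.abs(np.quantile(d1, qs) - np.quantile(d2, qs))))

D = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 1.5],
              [2.0, 1.5, 0.0]])
```

Such statistics are cheap to compute because they ignore the correspondence between points, which is exactly what makes them bounds rather than the full distance.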
Hyperbolic Coarse-to-Fine Few-Shot Class-Incremental Learning
In the field of machine learning, hyperbolic space demonstrates superior representation capabilities for hierarchical data compared to conventional Euclidean space. This work focuses on the Coarse-To-Fine Few-Shot Class-Incremental Learning (C2FSCIL) task. Our study follows the Knowe approach, which contrastively learns coarse class labels and subsequently normalizes and freezes the classifier weights of learned fine classes in the embedding space. To better interpret the "coarse-to-fine" paradigm, we propose embedding the feature extractor into hyperbolic space. Specifically, we employ the Poincaré ball model of hyperbolic space, enabling the feature extractor to transform input images into feature vectors within the Poincaré ball instead of Euclidean space. We further introduce hyperbolic contrastive loss and hyperbolic fully-connected layers to facilitate model optimization and classification in hyperbolic space. Additionally, to enhance performance under few-shot conditions, we implement maximum entropy distribution in hyperbolic space to estimate the probability distribution of fine-class feature vectors. This allows generation of augmented features from the distribution to mitigate overfitting during training with limited samples. Experiments on C2FSCIL benchmarks show that our method effectively improves both coarse and fine class accuracies.
- Asia > Japan > Honshū > Chūbu > Nagano Prefecture > Nagano (0.04)
- Europe > Switzerland (0.04)
- Asia > China (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.56)
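The Poincaré-ball geometry the method relies on is easy to state concretely. The sketch below uses the standard closed-form geodesic distance (not the paper's implementation) and shows how distances blow up near the boundary, which is what gives hyperbolic space its capacity for hierarchies.

```python
import numpy as np

def poincare_distance(x, y, eps=1e-12):
    """Geodesic distance between points x, y inside the unit Poincare ball:
    arccosh(1 + 2 |x - y|^2 / ((1 - |x|^2) (1 - |y|^2)))."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    num = 2.0 * np.sum((x - y) ** 2)
    den = (1.0 - np.sum(x ** 2)) * (1.0 - np.sum(y ** 2))
    return float(np.arccosh(1.0 + num / max(den, eps)))

origin = [0.0, 0.0]
near_boundary = [0.9, 0.0]
d_hyp = poincare_distance(origin, near_boundary)   # ~2.94 vs 0.9 Euclidean
```

A known identity checks the formula: the distance from the origin to a point at Euclidean radius r is 2 artanh(r), so fine-grained classes can be pushed toward the boundary where there is exponentially more room.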
Maximum diversity, weighting and invariants of time series
Magnitude, obtained as a special case of the Euler characteristic of an enriched category, represents a sense of the size of metric spaces and is related to classical notions such as cardinality, dimension, and volume. While previous studies have explained the meaning of magnitude from various perspectives, continuity also offers a valuable view of magnitude. Building on established results about the continuity of magnitude and maximum diversity, this article focuses on the continuity of weighting, a distribution whose total is the magnitude, and on its variation corresponding to maximum diversity. Meanwhile, recent studies have also illuminated the connection between magnitude and data analysis by applying magnitude theory to point clouds representing the data or the set of model parameters. This article also provides an application to time series analysis by introducing a new kind of invariant of periodic time series, where the invariance follows directly from the continuity results. As a use case, a simple machine learning experiment is conducted on real-world data, in which the suggested invariants improved performance.
- Information Technology > Data Science (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.38)
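For a finite metric space, the weighting discussed above has a simple concrete form: solve Z w = 1 where Z_ij = exp(-t d(x_i, x_j)), and the magnitude at scale t is the sum of the weights. A minimal sketch (function names are illustrative):

```python
import numpy as np

def magnitude_weighting(D, t=1.0):
    """Weighting w of a finite metric space at scale t: solve Z w = 1,
    where Z_ij = exp(-t * d(x_i, x_j)). The magnitude is sum(w)."""
    Z = np.exp(-t * np.asarray(D, float))
    w = np.linalg.solve(Z, np.ones(len(Z)))
    return w, float(w.sum())

# Two points at distance 1: the magnitude is 2 / (1 + e^{-1}) = 1 + tanh(1/2),
# interpolating between "one point" (t -> 0) and "two points" (t -> infinity).
w, mag = magnitude_weighting([[0.0, 1.0], [1.0, 0.0]])
```

Sweeping `t` traces out the magnitude function, whose continuity properties are the subject of the article.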
Towards understanding Accelerated Stein Variational Gradient Flow -- Analysis of Generalized Bilinear Kernels for Gaussian target distributions
Stein variational gradient descent (SVGD) is a kernel-based, non-parametric particle method for sampling from a target distribution, as arises in Bayesian inference and other machine learning tasks. Unlike other particle methods, SVGD does not require estimating the score, i.e., the gradient of the log-density. However, in practice, SVGD can be slow compared to score-estimation-based sampling algorithms. To design a fast and efficient high-dimensional sampling algorithm that retains the advantages of SVGD, we introduce accelerated SVGD (ASVGD), based on an accelerated gradient flow in a metric space of probability densities following Nesterov's method. We then derive a momentum-based discrete-time sampling algorithm that evolves a set of particles deterministically. To stabilize the particles' position updates, we also include a Wasserstein metric regularization. This paper extends the conference version [SL2025]. For the bilinear kernel and Gaussian target distributions, we study the kernel and damping parameters that yield an optimal convergence rate for the proposed dynamics. This is achieved by analyzing the linearized accelerated gradient flows at the equilibrium. Interestingly, the optimal parameter is a constant that does not depend on the covariance of the target distribution. For generalized kernel functions, such as the Gaussian kernel, numerical examples with varied target distributions demonstrate the effectiveness of ASVGD compared to SVGD and other popular sampling methods. Furthermore, we show that in the setting of Bayesian neural networks, ASVGD significantly outperforms SVGD in terms of log-likelihood and total iteration time.
- North America > United States > South Carolina > Richland County > Columbia (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Jordan (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
- Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.34)
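For context, plain SVGD (the baseline the paper accelerates) is compact enough to sketch. This is the standard update with an RBF kernel on a 1-D Gaussian target; the ASVGD momentum, Wasserstein regularization, and bilinear-kernel analysis are not reproduced here, and the bandwidth, step size, and names are illustrative choices.

```python
import numpy as np

def svgd_step(X, grad_logp, h=0.5):
    """Standard SVGD direction: phi(x_i) = (1/n) sum_j [ k(x_j, x_i) grad_logp(x_j)
    + grad_{x_j} k(x_j, x_i) ], with RBF kernel k(a, b) = exp(-|a - b|^2 / (2h))."""
    diff = X[:, None, :] - X[None, :, :]                 # diff[i, j] = x_i - x_j
    K = np.exp(-np.sum(diff ** 2, axis=-1) / (2 * h))
    attract = K @ grad_logp(X)                           # kernel-smoothed scores
    repulse = np.sum(K[..., None] * diff, axis=1) / h    # keeps particles apart
    return (attract + repulse) / len(X)

# Sample from a standard 1-D Gaussian: grad log p(x) = -x.
rng = np.random.default_rng(0)
X = rng.normal(5.0, 0.5, size=(50, 1))                   # start far from the target
for _ in range(500):
    X = X + 0.1 * svgd_step(X, lambda Z: -Z)
```

Because the particle update is deterministic given the initialization, score estimation is never needed, matching the property the abstract highlights.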
Preserving Vector Space Properties in Dimensionality Reduction: A Relationship Preserving Loss Framework
Weinwurm, Eddi, Kovalenko, Alexander
Dimensionality reduction can distort vector space properties such as orthogonality and linear independence, which are critical for tasks including cross-modal retrieval, clustering, and classification. We propose a Relationship Preserving Loss (RPL), a loss function that preserves these properties by minimizing discrepancies between relationship matrices (e.g., Gram or cosine) of high-dimensional data and their low-dimensional embeddings. RPL trains neural networks for non-linear projections and is supported by error bounds derived from matrix perturbation theory. Initial experiments suggest that RPL reduces embedding dimensions while largely retaining performance on downstream tasks, likely due to its preservation of key vector space properties. While we describe here the use of RPL in dimensionality reduction, this loss can also be applied more broadly, for example to cross-domain alignment and transfer learning, knowledge distillation, fairness and invariance, dehubbing, graph and manifold learning, and federated learning, where distributed embeddings must remain geometrically consistent.
- North America > United States > New York (0.04)
- North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
- Europe > Czechia > Prague (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.83)
- Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.82)
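The proposed loss is straightforward to state. A minimal numpy sketch follows; the paper trains a neural projection against such a loss, while here only the loss itself is shown, using cosine-similarity relationship matrices as one of the choices the abstract mentions.

```python
import numpy as np

def relationship_preserving_loss(X_high, X_low):
    """Mean squared discrepancy between the cosine-similarity (relationship)
    matrices of the original data and the low-dimensional embeddings."""
    def cos_matrix(A):
        A = np.asarray(A, float)
        A = A / np.linalg.norm(A, axis=1, keepdims=True)
        return A @ A.T
    R_high, R_low = cos_matrix(X_high), cos_matrix(X_low)
    return float(np.mean((R_high - R_low) ** 2))

# A rotation preserves all angles, so the loss is zero despite new coordinates.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))
theta = 0.7
Q = np.eye(4)
Q[:2, :2] = [[np.cos(theta), -np.sin(theta)],
             [np.sin(theta),  np.cos(theta)]]
```

Swapping the cosine matrix for a Gram matrix would additionally penalize changes in vector norms, which is the other relationship matrix the abstract names.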
Transparent Semantic Spaces: A Categorical Approach to Explainable Word Embeddings
Fabregat-Hernández, Ares, Palanca, Javier, Botti, Vicent
The paper introduces a novel framework based on category theory to enhance the explainability of artificial intelligence systems, particularly focusing on word embeddings. Furthermore, the paper defines the categories of configurations Conf and word embeddings Emb, accompanied by the concept of divergence as a decoration on Emb. It establishes a mathematically precise method for comparing word embeddings, demonstrating the equivalence between the GloVe and Word2Vec algorithms and the metric MDS algorithm, transitioning from neural network algorithms (black box) to a transparent framework. Finally, the paper presents a mathematical approach to computing biases before embedding and offers insights on mitigating biases at the semantic space level, advancing the field of explainable artificial intelligence.
Introduction. Word embeddings have emerged as a cornerstone in natural language processing (NLP) and machine learning (ML) applications, revolutionizing the representation of textual data (see [IUS23]). At the heart of word embeddings lies the idea of capturing semantic relationships between words in a continuous vector space, enabling machines to understand and process human language more effectively (see [HAMJ16, LG14]). By mapping words to high-dimensional vectors, word embeddings encode semantic similarities and syntactic structures, thereby facilitating a wide array of downstream tasks such as sentiment analysis, named entity recognition, machine translation, and document classification. In addition to enhancing model performance and accuracy, word embeddings offer several practical advantages in ML applications. They provide a compact and dense representation of textual data, enabling efficient storage, retrieval, and computation. Moreover, word embeddings capture contextual nuances and semantic meanings that traditional bag-of-words or one-hot encoding schemes fail to capture, leading to more nuanced and context-aware language understanding.
As such, word embeddings serve as foundational building blocks for a broad spectrum of ML tasks, empowering researchers and practitioners to unlock new capabilities in language understanding and processing. In recent years, word embeddings have become indispensable tools for natural language processing tasks, offering compact representations of textual data that capture semantic relationships between words. Biases embedded in the training data can be perpetuated in word embeddings, leading to unfair associations and stereotypes. Addressing these challenges requires interdisciplinary efforts from researchers in machine learning, natural language processing, and ethics. Strategies such as debiasing techniques, dimensionality reduction methods, and transparency-enhancing approaches are being actively explored to mitigate these challenges and improve the reliability and fairness of word embeddings in practical applications.
- Europe > Spain > Valencian Community > Valencia Province > Valencia (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.35)
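The bias computations mentioned above can be illustrated, though only schematically, with a standard embedding-space diagnostic: estimate a bias direction from definitional word pairs and measure the cosine projection of other words onto it. The toy vectors and helper names below are assumptions for the example, not the paper's categorical construction.

```python
import numpy as np

def bias_direction(pairs, emb):
    """Unit vector averaging the difference vector of each definitional pair."""
    v = np.mean([np.asarray(emb[a], float) - np.asarray(emb[b], float)
                 for a, b in pairs], axis=0)
    return v / np.linalg.norm(v)

def bias_score(word, direction, emb):
    """Cosine of the word vector with the bias direction."""
    v = np.asarray(emb[word], float)
    return float(v @ direction / np.linalg.norm(v))

# Hand-made 2-D embeddings, purely for illustration.
emb = {"he": [1.0, 0.0], "she": [-1.0, 0.0], "engineer": [0.8, 0.6]}
d = bias_direction([("he", "she")], emb)
score = bias_score("engineer", d, emb)   # 0.8: leans toward the 'he' pole
```

Mitigation at the semantic-space level typically amounts to projecting such components out of the vectors for words that should be neutral.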
Metric spaces of walks and Lipschitz duality on graphs
Arnau, R., Cortés, A. González, Pérez, E. A. Sánchez, Sanjuan, S.
We propose an improvement to the exploration strategy used in reinforcement learning algorithms that incrementally construct walks within graph-based environments. Traditionally, these algorithms alternate between exploitation (choosing the next node to maximize an estimated reward) and exploration (randomly selecting a new node). The novelty lies in replacing random exploration with a proximity-guided strategy using a function P. Instead of sampling uniformly, the agent compares potential path extensions to a reference set of high-reward walks, prioritizing those that are most similar in structure. This approach introduces a more informed, data-driven method for exploration, focusing on areas of the graph that resemble previously successful trajectories. The suggested procedure involves integrating the proximity function P as a mechanism to guide exploration on the space of walks. While the use of P that we have explained has focused on classification and metric analysis, its geometric interpretation and ability to quantify similarity between walks suggest a broader applicability, particularly in settings where the reward landscape is sparse or the graph structure is too large for exhaustive exploration.
- Europe > Spain (0.04)
- Asia > Middle East > Israel (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.55)
- Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.51)
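A minimal sketch of proximity-guided exploration, with Jaccard similarity of edge sets standing in for the proximity function P (the paper's actual metric on walks, and all names here, are assumptions for the example):

```python
def edge_set(walk):
    """Undirected edges traversed by a walk given as a node sequence."""
    return {tuple(sorted(e)) for e in zip(walk, walk[1:])}

def proximity(walk, reference_walks):
    """Best Jaccard similarity between the walk and any high-reward reference."""
    es = edge_set(walk)
    scores = [len(es & edge_set(r)) / len(es | edge_set(r))
              for r in reference_walks if es | edge_set(r)]
    return max(scores, default=0.0)

def explore_step(walk, neighbors, reference_walks):
    """Instead of a uniform random move, extend toward the neighbor whose
    extension most resembles previously successful walks."""
    return max(neighbors[walk[-1]],
               key=lambda v: proximity(walk + [v], reference_walks))

neighbors = {1: [0, 2, 4]}
ref = [[0, 1, 2, 3]]
nxt = explore_step([0, 1], neighbors, ref)   # picks 2, matching the reference
```

In a full agent this greedy choice would replace only the exploration branch; exploitation of the estimated reward proceeds as usual.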
Elliptical Perturbations for Differential Privacy
We study elliptical distributions in locally convex vector spaces, and determine conditions under which they can or cannot be used to satisfy differential privacy (DP). A requisite condition for a sanitized statistical summary to satisfy DP is that the corresponding privacy mechanism must induce equivalent probability measures for all possible input databases. We show that elliptical distributions with the same dispersion operator, C, are equivalent if the difference of their means lies in the Cameron-Martin space of C. In the case of releasing finite-dimensional summaries using elliptical perturbations, we show that the privacy parameter ε can be computed in terms of a one-dimensional maximization problem. We apply this result to consider multivariate Laplace, t, Gaussian, and K-norm noise. Surprisingly, we show that the multivariate Laplace noise does not achieve ε-DP in any dimension greater than one. Finally, we show that when the dimension of the space is infinite, no elliptical distribution can be used to give ε-DP; only (ε, δ)-DP is possible.
- North America > United States > Pennsylvania (0.04)
- North America > United States > District of Columbia > Washington (0.04)
- North America > Canada (0.04)
- (3 more...)
- Research Report > New Finding (0.93)
- Research Report > Experimental Study (0.68)
- Information Technology > Data Science (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.35)
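For contrast with the negative results above, the classical one-dimensional case where ε-DP is achievable is a one-liner: add Laplace noise with scale sensitivity/ε. A sketch of the standard scalar mechanism, not the paper's elliptical construction:

```python
import numpy as np

def laplace_mechanism(value, sensitivity, epsilon, rng):
    """Release value + Laplace(0, sensitivity / epsilon) noise; for a query
    with the given L1-sensitivity this satisfies epsilon-DP in one dimension."""
    return value + rng.laplace(0.0, sensitivity / epsilon)

rng = np.random.default_rng(0)
# Release a count of 10 with sensitivity 1 at epsilon = 0.5 (scale 2),
# repeated many times only to make the noise scale visible empirically.
released = [laplace_mechanism(10.0, 1.0, 0.5, rng) for _ in range(20000)]
```

The noise standard deviation is sqrt(2) * sensitivity / ε, which is the calibration the empirical check below confirms.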