Representation Of Examples
Metrics for Parametric Families of Networks
Gómez, Mario, Ma, Guanqun, Needham, Tom, Wang, Bei
We introduce a general framework for analyzing data modeled as parameterized families of networks. Building on a Gromov-Wasserstein variant of optimal transport, we define a family of parameterized Gromov-Wasserstein distances for comparing such parametric data, including time-varying metric spaces induced by collective motion, temporally evolving weighted social networks, and random graph models. We establish foundational properties of these distances, showing that they subsume several existing metrics in the literature, and derive theoretical approximation guarantees. In particular, we develop computationally tractable lower bounds and relate them to graph statistics commonly used in random graph theory. Furthermore, we prove that our distances can be consistently approximated in random graph and random metric space settings via empirical estimates from generative models. Finally, we demonstrate the practical utility of our framework through a series of numerical experiments.
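As a rough illustration of the building block underlying this framework, the sketch below computes the ordinary Gromov-Wasserstein discrepancy between two snapshots of a point cloud using the POT library (an assumed dependency); the parameterized distances of the paper aggregate such comparisons over a parameter space and are not reproduced here.

```python
# Minimal sketch (not the paper's method): comparing two snapshots of a
# time-varying metric space with the standard Gromov-Wasserstein distance,
# using the POT library (https://pythonot.github.io/).
import numpy as np
import ot  # pip install pot
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)

# Two point clouds standing in for two snapshots of collective motion.
X_t0 = rng.normal(size=(30, 2))
X_t1 = rng.normal(size=(30, 2)) + 1.0

# Intra-space distance matrices and uniform node weights.
C0 = cdist(X_t0, X_t0)
C1 = cdist(X_t1, X_t1)
p = np.full(len(X_t0), 1.0 / len(X_t0))
q = np.full(len(X_t1), 1.0 / len(X_t1))

# Squared-loss Gromov-Wasserstein discrepancy between the two snapshots.
gw_cost = ot.gromov.gromov_wasserstein2(C0, C1, p, q, loss_fun="square_loss")
print(f"GW discrepancy between snapshots: {gw_cost:.4f}")
```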
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Ohio (0.04)
- North America > United States > Michigan (0.04)
- (2 more...)
- Health & Medicine (0.46)
- Information Technology (0.34)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.55)
Hyperbolic Coarse-to-Fine Few-Shot Class-Incremental Learning
In the field of machine learning, hyperbolic space demonstrates superior representation capabilities for hierarchical data compared to conventional Euclidean space. This work focuses on the Coarse-To-Fine Few-Shot Class-Incremental Learning (C2FSCIL) task. Our study follows the Knowe approach, which contrastively learns coarse class labels and subsequently normalizes and freezes the classifier weights of learned fine classes in the embedding space. To better interpret the "coarse-to-fine" paradigm, we propose embedding the feature extractor into hyperbolic space. Specifically, we employ the Poincaré ball model of hyperbolic space, enabling the feature extractor to transform input images into feature vectors within the Poincaré ball instead of Euclidean space. We further introduce hyperbolic contrastive loss and hyperbolic fully-connected layers to facilitate model optimization and classification in hyperbolic space. Additionally, to enhance performance under few-shot conditions, we implement maximum entropy distribution in hyperbolic space to estimate the probability distribution of fine-class feature vectors. This allows generation of augmented features from the distribution to mitigate overfitting during training with limited samples. Experiments on C2FSCIL benchmarks show that our method effectively improves both coarse and fine class accuracies.
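The two basic hyperbolic operations this relies on, mapping Euclidean features into the Poincaré ball and measuring distances there, can be sketched as follows; this is an illustrative implementation with curvature fixed to 1, not the authors' code.

```python
# Hedged sketch: projecting Euclidean feature vectors into the Poincaré ball
# (exponential map at the origin, curvature c = 1) and computing hyperbolic
# distances, the basic operations behind a hyperbolic feature extractor.
import torch

def expmap0(v: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """Map tangent vectors at the origin into the unit Poincaré ball."""
    norm = v.norm(dim=-1, keepdim=True).clamp_min(eps)
    return torch.tanh(norm) * v / norm

def poincare_distance(x: torch.Tensor, y: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """Closed-form hyperbolic distance in the Poincaré ball (c = 1)."""
    sq = (x - y).pow(2).sum(dim=-1)
    denom = (1 - x.pow(2).sum(dim=-1)) * (1 - y.pow(2).sum(dim=-1))
    arg = 1 + 2 * sq / denom.clamp_min(eps)
    return torch.acosh(arg.clamp_min(1 + eps))

feats = torch.randn(4, 16)   # backbone outputs (Euclidean)
ball = expmap0(feats)        # points inside the Poincaré ball
print(poincare_distance(ball[0], ball[1]))
```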
- Asia > Japan > Honshū > Chūbu > Nagano Prefecture > Nagano (0.04)
- Europe > Switzerland (0.04)
- Asia > China (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.56)
Maximum diversity, weighting and invariants of time series
Magnitude, obtained as a special case of the Euler characteristic of an enriched category, captures a notion of the size of a metric space and is related to classical notions such as cardinality, dimension, and volume. While previous studies have explained the meaning of magnitude from various perspectives, continuity also gives a valuable view of magnitude. Based on established results about the continuity of magnitude and maximum diversity, this article focuses on the continuity of the weighting, a distribution whose total mass is the magnitude, and of its variant corresponding to maximum diversity. Meanwhile, recent studies have also illuminated the connection between magnitude and data analysis by applying magnitude theory to point clouds representing the data or the set of model parameters. This article also provides an application to time series analysis by introducing a new kind of invariant of periodic time series, where the invariance follows directly from the continuity results. As a use case, a simple machine learning experiment is conducted with real-world data, in which the suggested invariants improve performance.
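For a finite metric space, the weighting and magnitude discussed here admit a short numerical sketch: with similarity matrix Z_ij = exp(-t d(x_i, x_j)), a weighting w solves Zw = 1 and the magnitude is the sum of its entries. The scale parameter t and the random point cloud below are illustrative assumptions.

```python
# Hedged sketch: the weighting and magnitude of a finite metric space.
import numpy as np
from scipy.spatial.distance import cdist

def weighting_and_magnitude(points: np.ndarray, t: float = 1.0):
    """Return the weighting and magnitude of the space scaled by t."""
    Z = np.exp(-t * cdist(points, points))   # similarity matrix Z_ij = exp(-t d_ij)
    w = np.linalg.solve(Z, np.ones(len(points)))
    return w, w.sum()

pts = np.random.default_rng(0).normal(size=(50, 3))
w, mag = weighting_and_magnitude(pts, t=1.0)
print(f"magnitude at scale t=1: {mag:.3f}")
```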
- Information Technology > Data Science (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.38)
Towards understanding Accelerated Stein Variational Gradient Flow -- Analysis of Generalized Bilinear Kernels for Gaussian target distributions
Stein variational gradient descent (SVGD) is a kernel-based, non-parametric particle method for sampling from a target distribution, as in Bayesian inference and other machine learning tasks. Unlike other particle methods, SVGD does not require estimating the score, i.e., the gradient of the log-density. In practice, however, SVGD can be slow compared to score-estimation-based sampling algorithms. To design a fast and efficient high-dimensional sampling algorithm that retains the advantages of SVGD, we introduce accelerated SVGD (ASVGD), based on an accelerated gradient flow in a metric space of probability densities following Nesterov's method. We then derive a momentum-based discrete-time sampling algorithm that evolves a set of particles deterministically. To stabilize the particles' position updates, we also include a Wasserstein metric regularization. This paper extends the conference version \cite{SL2025}. For the bilinear kernel and Gaussian target distributions, we study the kernel and damping parameters that yield the optimal convergence rate of the proposed dynamics. This is achieved by analyzing the linearized accelerated gradient flows at the equilibrium. Interestingly, the optimal parameter is a constant that does not depend on the covariance of the target distribution. For generalized kernel functions, such as the Gaussian kernel, numerical examples with varied target distributions demonstrate the effectiveness of ASVGD compared to SVGD and other popular sampling methods. Furthermore, we show that in the setting of Bayesian neural networks, ASVGD significantly outperforms SVGD in terms of log-likelihood and total iteration time.
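For orientation, the sketch below implements plain SVGD with an RBF kernel on a Gaussian target; the accelerated variant adds Nesterov-style momentum and Wasserstein regularization on top of this kind of update and is not reproduced here.

```python
# Hedged sketch of vanilla SVGD (not ASVGD) for a Gaussian target.
import numpy as np

def svgd_step(X, score, h=1.0, step=0.1):
    """One SVGD update: X is (n, d), score(X) = grad log p evaluated row-wise."""
    n = len(X)
    diff = X[:, None, :] - X[None, :, :]                        # diff[i, j] = x_i - x_j
    K = np.exp(-(diff ** 2).sum(-1) / h)                        # RBF kernel matrix
    grad_term = (2.0 / h) * (K[..., None] * diff).sum(axis=1)   # repulsive term
    phi = (K @ score(X) + grad_term) / n
    return X + step * phi

# Gaussian target N(mu, I): score(x) = -(x - mu)
mu = np.array([2.0, -1.0])
score = lambda X: -(X - mu)

X = np.random.default_rng(0).normal(size=(100, 2))
for _ in range(500):
    X = svgd_step(X, score, h=1.0, step=0.05)
print("particle mean:", X.mean(axis=0))  # moves toward mu
```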
- North America > United States > South Carolina > Richland County > Columbia (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Jordan (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
- Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.34)
Preserving Vector Space Properties in Dimensionality Reduction: A Relationship Preserving Loss Framework
Weinwurm, Eddi, Kovalenko, Alexander
Dimensionality reduction can distort vector space properties such as orthogonality and linear independence, which are critical for tasks including cross-modal retrieval, clustering, and classification. We propose a Relationship Preserving Loss (RPL), a loss function that preserves these properties by minimizing discrepancies between relationship matrices (e.g., Gram or cosine) of high-dimensional data and their low-dimensional embeddings. RPL trains neural networks for non-linear projections and is supported by error bounds derived from matrix perturbation theory. Initial experiments suggest that RPL reduces embedding dimensions while largely retaining performance on downstream tasks, likely due to its preservation of key vector space properties. While we describe here the use of RPL in dimensionality reduction, this loss can also be applied more broadly, for example to cross-domain alignment and transfer learning, knowledge distillation, fairness and invariance, dehubbing, graph and manifold learning, and federated learning, where distributed embeddings must remain geometrically consistent.
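A minimal sketch of a loss in this spirit, assuming cosine-similarity relationship matrices and a small MLP projector (both illustrative choices, not necessarily those of the paper):

```python
# Hedged sketch: penalize the discrepancy between the cosine-similarity matrix
# of high-dimensional vectors and that of their low-dimensional projections.
import torch
import torch.nn as nn
import torch.nn.functional as F

def relationship_preserving_loss(x_high: torch.Tensor, x_low: torch.Tensor) -> torch.Tensor:
    """Mean squared difference between batch cosine-similarity matrices."""
    sim_high = F.normalize(x_high, dim=-1) @ F.normalize(x_high, dim=-1).T
    sim_low = F.normalize(x_low, dim=-1) @ F.normalize(x_low, dim=-1).T
    return (sim_high - sim_low).pow(2).mean()

projector = nn.Sequential(nn.Linear(768, 256), nn.GELU(), nn.Linear(256, 64))
x = torch.randn(32, 768)    # a batch of high-dimensional embeddings
loss = relationship_preserving_loss(x, projector(x))
loss.backward()
```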
- North America > United States > New York (0.04)
- North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
- Europe > Czechia > Prague (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.83)
- Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.82)
Transparent Semantic Spaces: A Categorical Approach to Explainable Word Embeddings
Fabregat-Hernández, Ares, Palanca, Javier, Botti, Vicent
The paper introduces a novel framework based on category theory to enhance the explainability of artificial intelligence systems, particularly focusing on word embeddings. Furthermore, the paper defines the categories of configurations Conf and word embeddings Emb, accompanied by the concept of divergence as a decoration on Emb. It establishes a mathematically precise method for comparing word embeddings, demonstrating the equivalence between the GloVe and Word2Vec algorithms and the metric MDS algorithm, transitioning from neural network algorithms (black box) to a transparent framework. Finally, the paper presents a mathematical approach to computing biases before embedding and offers insights on mitigating biases at the semantic space level, advancing the field of explainable artificial intelligence. Word embeddings have emerged as a cornerstone in natural language processing (NLP) and machine learning (ML) applications, revolutionizing the representation of textual data (see [IUS23]). At the heart of word embeddings lies the idea of capturing semantic relationships between words in a continuous vector space, enabling machines to understand and process human language more effectively (see [HAMJ16, LG14]). By mapping words to high-dimensional vectors, word embeddings encode semantic similarities and syntactic structures, thereby facilitating a wide array of downstream tasks such as sentiment analysis, named entity recognition, machine translation, and document classification. In addition to enhancing model performance and accuracy, word embeddings offer several practical advantages in ML applications. They provide a compact and dense representation of textual data, enabling efficient storage, retrieval, and computation. Moreover, word embeddings capture contextual nuances and semantic meanings that traditional bag-of-words or one-hot encoding schemes fail to capture, leading to more nuanced and context-aware language understanding. As such, word embeddings serve as foundational building blocks for a broad spectrum of ML tasks, empowering researchers and practitioners to unlock new capabilities in language understanding and processing. In recent years, word embeddings have become indispensable tools for natural language processing tasks, offering compact representations of textual data that capture semantic relationships between words. Biases embedded in the training data can be perpetuated in word embeddings, leading to unfair associations and stereotypes. Addressing these challenges requires interdisciplinary efforts from researchers in machine learning, natural language processing, and ethics. Strategies such as debiasing techniques, dimensionality reduction methods, and transparency-enhancing approaches are being actively explored to mitigate these challenges and improve the reliability and fairness of word embeddings in practical applications.
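As a toy illustration of the metric MDS step mentioned above, the sketch below embeds four words from a made-up dissimilarity matrix using classical MDS; the distances are invented for the example and carry no linguistic meaning.

```python
# Hedged sketch: classical (metric) MDS from a toy word-dissimilarity matrix.
import numpy as np

def classical_mds(D: np.ndarray, dim: int = 2) -> np.ndarray:
    """Recover coordinates from a dissimilarity matrix via double centering."""
    n = len(D)
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J                     # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:dim]            # largest eigenvalues first
    return vecs[:, order] * np.sqrt(np.clip(vals[order], 0, None))

words = ["king", "queen", "man", "woman"]
D = np.array([[0.0, 0.3, 0.6, 0.8],
              [0.3, 0.0, 0.8, 0.6],
              [0.6, 0.8, 0.0, 0.3],
              [0.8, 0.6, 0.3, 0.0]])
coords = classical_mds(D, dim=2)
print(dict(zip(words, coords.round(3).tolist())))
```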
- Europe > Spain > Valencian Community > Valencia Province > Valencia (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.35)
Metric spaces of walks and Lipschitz duality on graphs
Arnau, R., Cortés, A. González, Pérez, E. A. Sánchez, Sanjuan, S.
We propose an improvement to the exploration strategy used in reinforcement learning algorithms that incrementally construct walks within graph-based environments. Traditionally, these algorithms alternate between exploitation (choosing the next node to maximize an estimated reward) and exploration (randomly selecting a new node). The novelty lies in replacing random exploration with a proximity-guided strategy using a proximity function P. Instead of sampling uniformly, the agent compares potential path extensions to a reference set of high-reward walks, prioritizing those that are most similar in structure. This approach introduces a more informed, data-driven method for exploration, focusing on areas of the graph that resemble previously successful trajectories. The suggested procedure integrates P as a mechanism to guide exploration on the space of walks. While our use of P has focused on classification and metric analysis, its geometric interpretation and ability to quantify similarity between walks suggest a broader applicability, particularly in settings where the reward landscape is sparse or the graph structure is too large for exhaustive exploration.
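A minimal sketch of such proximity-guided exploration, with a shared-edge overlap standing in for the paper's proximity function (an illustrative assumption, not the authors' definition):

```python
# Hedged sketch: choose the next node by similarity to high-reward reference walks.
import random

def walk_similarity(walk_a, walk_b):
    """Fraction of edges of walk_a that also appear in walk_b."""
    edges_a = set(zip(walk_a, walk_a[1:]))
    edges_b = set(zip(walk_b, walk_b[1:]))
    return len(edges_a & edges_b) / max(len(edges_a), 1)

def explore_step(graph, walk, reference_walks):
    """Pick the neighbor whose extended walk is closest to the references."""
    candidates = graph[walk[-1]]
    if not reference_walks:
        return random.choice(candidates)
    def score(node):
        extended = walk + [node]
        return max(walk_similarity(extended, ref) for ref in reference_walks)
    return max(candidates, key=score)

graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}   # toy adjacency list
refs = [[0, 1, 3]]                                      # one high-reward walk
print(explore_step(graph, [0], refs))                   # prefers node 1
```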
- Europe > Spain (0.04)
- Asia > Middle East > Israel (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.55)
- Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.51)
Elliptical Perturbations for Differential Privacy
We study elliptical distributions in locally convex vector spaces, and determine conditions when they can or cannot be used to satisfy differential privacy (DP). A requisite condition for a sanitized statistical summary to satisfy DP is that the corresponding privacy mechanism must induce equivalent probability measures for all possible input databases. We show that elliptical distributions with the same dispersion operator, C, are equivalent if the difference of their means lies in the Cameron-Martin space of C. In the case of releasing finite-dimensional summaries using elliptical perturbations, we show that the privacy parameter ε can be computed in terms of a one-dimensional maximization problem. We apply this result to consider multivariate Laplace, t, Gaussian, and K-norm noise. Surprisingly, we show that the multivariate Laplace noise does not achieve ε-DP in any dimension greater than one. Finally, we show that when the dimension of the space is infinite, no elliptical distribution can be used to give ε-DP; only (ε, δ)-DP is possible.
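For contrast with the elliptical setting studied here, the sketch below releases a finite-dimensional summary with the classical Gaussian mechanism, whose noise scale follows the standard (ε, δ)-DP calibration; this is background illustration, not the paper's analysis.

```python
# Hedged sketch: Gaussian mechanism with the classical calibration
# sigma >= sqrt(2 ln(1.25/delta)) * L2-sensitivity / epsilon.
import numpy as np

def gaussian_mechanism(summary: np.ndarray, l2_sensitivity: float,
                       epsilon: float, delta: float,
                       rng=np.random.default_rng()) -> np.ndarray:
    sigma = np.sqrt(2 * np.log(1.25 / delta)) * l2_sensitivity / epsilon
    return summary + rng.normal(scale=sigma, size=summary.shape)

# Mean of n records, each clipped to L2 norm at most 1: sensitivity 2/n.
data = np.random.default_rng(0).uniform(-1, 1, size=(1000, 3))
data /= np.maximum(1.0, np.linalg.norm(data, axis=1, keepdims=True))
release = gaussian_mechanism(data.mean(axis=0), l2_sensitivity=2 / len(data),
                             epsilon=1.0, delta=1e-5)
print(release)
```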
- North America > United States > Pennsylvania (0.04)
- North America > United States > District of Columbia > Washington (0.04)
- North America > Canada (0.04)
- (3 more...)
- Research Report > New Finding (0.93)
- Research Report > Experimental Study (0.68)
- Information Technology > Data Science (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.35)
- Asia > Singapore (0.04)
- North America > Canada (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.51)
- Information Technology > Data Science > Data Mining > Big Data (0.48)
Deep Language Geometry: Constructing a Metric Space from LLM Weights
Shamrai, Maksym, Hamolia, Vladyslav
We introduce a novel framework that utilizes the internal weight activations of modern Large Language Models (LLMs) to construct a metric space of languages. Unlike traditional approaches based on hand-crafted linguistic features, our method automatically derives high-dimensional vector representations by computing weight importance scores via an adapted pruning algorithm. Our approach captures intrinsic language characteristics that reflect linguistic phenomena. We validate our approach across diverse datasets and multilingual LLMs, covering 106 languages. The results align well with established linguistic families while also revealing unexpected inter-language connections that may indicate historical contact or language evolution. The source code, computed language latent vectors, and visualization tool are made publicly available at https://github.com/mshamrai/deep-language-geometry.
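The downstream step, turning per-language vectors into a metric space and a tree that can be compared with linguistic families, can be sketched as follows, with random stand-ins for the pruning-derived importance scores.

```python
# Hedged sketch: from per-language importance vectors to a distance matrix and
# a dendrogram. The vectors here are random placeholders, not LLM-derived.
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import pdist

languages = ["en", "de", "uk", "pl", "ja"]
rng = np.random.default_rng(0)
vectors = rng.normal(size=(len(languages), 512))   # stand-in importance scores

dists = pdist(vectors, metric="cosine")            # pairwise language distances
tree = linkage(dists, method="average")            # hierarchical clustering
print(dendrogram(tree, labels=languages, no_plot=True)["ivl"])
```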
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.73)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)