Transparent Semantic Spaces: A Categorical Approach to Explainable Word Embeddings
Ares Fabregat-Hernández, Javier Palanca, Vicent Botti
The paper introduces a novel framework based on category theory to enhance the explainability of artificial intelligence systems, with a particular focus on word embeddings. It defines the categories of configurations Conf and of word embeddings Emb, together with the concept of divergence as a decoration on Emb. This yields a mathematically precise method for comparing word embeddings, demonstrating the equivalence between the GloVe and Word2Vec algorithms and the metric MDS algorithm and thereby moving from black-box neural network algorithms to a transparent framework. Finally, the paper presents a mathematical approach to computing biases before embedding and offers insights on mitigating biases at the level of the semantic space, advancing the field of explainable artificial intelligence.

Introduction

Word embeddings have emerged as a cornerstone of natural language processing (NLP) and machine learning (ML) applications, revolutionizing the representation of textual data (see [IUS23]). At the heart of word embeddings lies the idea of capturing semantic relationships between words in a continuous vector space, enabling machines to understand and process human language more effectively (see [HAMJ16, LG14]). By mapping words to high-dimensional vectors, word embeddings encode semantic similarities and syntactic structures, thereby facilitating a wide array of downstream tasks such as sentiment analysis, named entity recognition, machine translation, and document classification.

In addition to enhancing model performance and accuracy, word embeddings offer several practical advantages in ML applications. They provide a compact, dense representation of textual data, enabling efficient storage, retrieval, and computation. Moreover, they capture contextual nuances and semantic meanings that traditional bag-of-words or one-hot encoding schemes miss, leading to more nuanced and context-aware language understanding. As such, word embeddings serve as foundational building blocks for a broad spectrum of ML tasks, empowering researchers and practitioners to unlock new capabilities in language understanding and processing.

Yet as word embeddings have become indispensable tools for NLP, their shortcomings have become equally apparent: biases embedded in the training data can be perpetuated in the embeddings themselves, leading to unfair associations and stereotypes. Addressing these challenges requires interdisciplinary efforts from researchers in machine learning, natural language processing, and ethics. Strategies such as debiasing techniques, dimensionality reduction methods, and transparency-enhancing approaches are being actively explored to mitigate these problems and to improve the reliability and fairness of word embeddings in practical applications.
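The flavor of the GloVe/Word2Vec-to-metric-MDS correspondence can be illustrated concretely. The following is a minimal sketch, not the paper's actual construction: the toy corpus, the symmetric co-occurrence window, the PPMI weighting (closely related to the shifted PMI matrix that skip-gram with negative sampling implicitly factorizes, per [LG14]), and the cosine dissimilarity fed to scikit-learn's metric MDS are all illustrative assumptions. The point is that every step is an explicit matrix computation rather than a trained neural network.

```python
"""Toy sketch of a transparent embedding pipeline in the spirit of the
GloVe/Word2Vec <-> metric MDS correspondence.  Corpus, window size,
PPMI weighting, and cosine dissimilarity are illustrative assumptions,
not the paper's construction."""
import numpy as np
from sklearn.manifold import MDS
from sklearn.metrics.pairwise import cosine_distances

corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
    "a cat and a dog played".split(),
]
window = 2  # symmetric co-occurrence window (assumed hyperparameter)

vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

# 1. Count symmetric co-occurrences within the window.
C = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if j != i:
                C[idx[w], idx[sent[j]]] += 1

# 2. Positive PMI: an explicit corpus-level statistic, related to what
#    SGNS/GloVe implicitly factorize [LG14].
total = C.sum()
row = C.sum(axis=1, keepdims=True)
with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log((C * total) / (row @ row.T))
ppmi = np.where(np.isfinite(pmi) & (pmi > 0), pmi, 0.0)

# 3. Pairwise dissimilarities between PPMI rows, then metric MDS:
#    the embedding is the solution of an explicit optimization problem.
D = cosine_distances(ppmi)
D = (D + D.T) / 2  # enforce exact symmetry for the MDS solver
emb = MDS(n_components=2, dissimilarity="precomputed",
          random_state=0).fit_transform(D)

for w in ("cat", "dog", "mat", "rug"):
    print(w, np.round(emb[idx[w]], 3))
```

Because the dissimilarity matrix is available before any fitting, properties of the resulting semantic space can be inspected and audited at this stage, which is what makes the pipeline transparent rather than black-box.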
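In the same spirit, the idea of computing bias before embedding can be sketched directly on the co-occurrence statistics. The score below, a difference of mean PPMI associations between two attribute word sets (reusing the ppmi matrix and idx mapping from the previous sketch), is an illustrative stand-in and not the paper's actual bias functional; the target and attribute word lists in the usage comment are hypothetical.

```python
"""Toy sketch: measuring association bias on corpus statistics, before
any embedding is trained.  The scoring rule and the word lists are
illustrative assumptions, not the paper's bias functional."""
import numpy as np

def ppmi_bias(ppmi, idx, targets, attrs_a, attrs_b):
    """For each target word, mean PPMI association with attribute set A
    minus mean PPMI association with set B; positive values lean to A."""
    scores = {}
    for t in targets:
        assoc_a = np.mean([ppmi[idx[t], idx[w]] for w in attrs_a])
        assoc_b = np.mean([ppmi[idx[t], idx[w]] for w in attrs_b])
        scores[t] = assoc_a - assoc_b
    return scores

# Hypothetical usage on a real corpus's PPMI matrix (the word lists
# below are assumptions and must exist in the vocabulary):
# ppmi_bias(ppmi, idx,
#           targets=["nurse", "engineer"],
#           attrs_a=["she", "her"], attrs_b=["he", "him"])
```

Detecting skewed associations at this stage suggests where mitigation can happen at the level of the semantic space itself, e.g. by adjusting the dissimilarity matrix before the embedding is computed, rather than by post-hoc debiasing of trained vectors.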
arXiv.org Artificial Intelligence
Aug-29-2025