Unraveling the Localized Latents: Learning Stratified Manifold Structures in LLM Embedding Space with Sparse Mixture-of-Experts
–arXiv.org Artificial Intelligence
Real-world data often exhibit complex local structures that are difficult to unravel for single-model approaches that assume a smooth global manifold in the embedding space. In this work, we conjecture that in the latent space of large language models (LLMs), the embeddings lie on local manifolds whose dimensions vary with the perplexity and domain of the input data. Each such structure is commonly referred to as a stratified manifold, and together they form a structured space known as a stratified space. To test this structural claim, we propose an analysis framework based on a Mixture-of-Experts (MoE) model in which each expert is a simple dictionary learning algorithm operating at a different sparsity level. Using an attention-based soft-gating network, we verify that our model learns specialized sub-manifolds for an ensemble of input data sources, reflecting the semantic stratification of the LLM embedding space. We further analyze the intrinsic dimensions of these stratified sub-manifolds and report extensive statistics on expert assignments, gating entropy, and inter-expert distances. Our experimental results support the claim of a stratified manifold structure in the LLM embedding space and yield interpretable clusters that align with the intrinsic semantic variations of the input data.
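The framework described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the dictionaries are random rather than learned, the sparse coding step is a simple top-k thresholding of correlations, and the names `top_k_code`, `moe_forward`, `gating_entropy`, and `expert_keys` are hypothetical. It only shows how a soft gate can mix experts of different sparsity levels and how a per-input gating entropy can be computed.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def top_k_code(x, dictionary, k):
    """Assumed sparse coder: keep the k largest-magnitude correlations per row."""
    coeffs = x @ dictionary.T                           # (n, n_atoms)
    drop = np.argsort(np.abs(coeffs), axis=1)[:, :-k]   # all but the top k
    np.put_along_axis(coeffs, drop, 0.0, axis=1)
    return coeffs

def moe_forward(x, dictionaries, sparsity_levels, expert_keys):
    """Soft-gated mixture of sparse dictionary experts (illustrative only)."""
    # Attention-style gate: scaled dot product of inputs with one key per expert.
    logits = x @ expert_keys.T / np.sqrt(x.shape[1])
    gates = softmax(logits, axis=1)                     # (n, n_experts)
    recon = np.zeros_like(x)
    for j, (D, k) in enumerate(zip(dictionaries, sparsity_levels)):
        codes = top_k_code(x, D, k)
        recon += gates[:, [j]] * (codes @ D)            # gate-weighted reconstruction
    return recon, gates

def gating_entropy(gates, eps=1e-12):
    """Shannon entropy of each row of gate weights (low = confident assignment)."""
    return -(gates * np.log(gates + eps)).sum(axis=1)

# Toy data standing in for LLM embeddings; all sizes are arbitrary assumptions.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))
dictionaries = [rng.normal(size=(32, 16)) for _ in range(3)]
sparsity_levels = [2, 4, 8]                             # one sparsity level per expert
expert_keys = rng.normal(size=(3, 16))

recon, gates = moe_forward(x, dictionaries, sparsity_levels, expert_keys)
entropy = gating_entropy(gates)
hard_assignment = gates.argmax(axis=1)                  # most responsible expert per input
```

In this sketch, `hard_assignment` and `entropy` correspond loosely to the expert-assignment and gating-entropy statistics mentioned above; a real analysis would train the dictionaries and the gate on actual embeddings.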
Feb-19-2025
- Country:
- Asia > Middle East
- Jordan (0.04)
- Europe
- Slovakia > Bratislava
- Bratislava (0.04)
- United Kingdom > England
- Cambridgeshire > Cambridge (0.04)
- North America > United States
- California > Alameda County
- Berkeley (0.04)
- New Jersey > Middlesex County
- Piscataway (0.14)
- Oregon > Multnomah County
- Portland (0.04)
- Washington > King County
- Seattle (0.04)
- Genre:
- Research Report > New Finding (0.34)
- Technology: