Exploring Large Action Sets with Hyperspherical Embeddings using von Mises-Fisher Sampling

Bendada, Walid; Salha-Galvan, Guillaume; Hennequin, Romain; Bontempelli, Théo; Bouabça, Thomas; Cazenave, Tristan

Jul-2-2025–arXiv.org Artificial Intelligence

This paper introduces von Mises-Fisher exploration (vMF-exp), a scalable method for exploring large action sets in reinforcement learning problems where hyperspherical embedding vectors represent these actions. vMF-exp first samples a state embedding representation from a von Mises-Fisher distribution, then explores this representation's nearest neighbors, a procedure that scales to virtually unlimited numbers of candidate actions. We show that, under theoretical assumptions, vMF-exp asymptotically maintains the same probability of exploring each action as Boltzmann Exploration (B-exp), a popular alternative that nonetheless scales poorly because it requires computing softmax values for every action. Consequently, vMF-exp serves as a scalable alternative to B-exp for exploring large action sets with hyperspherical embeddings. Experiments on simulated data and real-world public data, together with the successful large-scale deployment of vMF-exp on the recommender system of a global music streaming service, empirically validate the key properties of the proposed method.
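The two exploration schemes contrasted in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`sample_vmf`, `vmf_exp`, `b_exp`), the concentration parameter `kappa`, the use of Wood's rejection sampler for the von Mises-Fisher draw, and the brute-force nearest-neighbor scan are all assumptions made for illustration. At scale, the scan in `vmf_exp` would presumably be replaced by an approximate nearest-neighbor index, whereas `b_exp` has to score every action by construction.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sample_vmf(mu, kappa, rng):
    """Draw one unit vector from vMF(mu, kappa); mu must be a unit vector, len(mu) >= 2."""
    d = len(mu)
    # Step 1 (Wood's rejection sampler): draw w, the cosine between the sample and mu.
    b = (-2.0 * kappa + math.sqrt(4.0 * kappa**2 + (d - 1) ** 2)) / (d - 1)
    x0 = (1.0 - b) / (1.0 + b)
    c = kappa * x0 + (d - 1) * math.log(1.0 - x0**2)
    while True:
        z = rng.betavariate((d - 1) / 2.0, (d - 1) / 2.0)
        w = (1.0 - (1.0 + b) * z) / (1.0 - (1.0 - b) * z)
        u = 1.0 - rng.random()  # lies in (0, 1], so log(u) is defined
        if kappa * w + (d - 1) * math.log(1.0 - x0 * w) - c >= math.log(u):
            break
    # Step 2: uniform direction in the subspace orthogonal to the last axis e_d.
    g = [rng.gauss(0.0, 1.0) for _ in range(d - 1)]
    g_norm = math.sqrt(sum(v * v for v in g))
    s = math.sqrt(max(0.0, 1.0 - w * w))
    x = [s * v / g_norm for v in g] + [w]  # sample centered on e_d
    # Step 3: Householder reflection that maps e_d onto mu.
    h = [-m for m in mu]
    h[-1] += 1.0
    h_norm = math.sqrt(sum(v * v for v in h))
    if h_norm < 1e-12:  # mu already equals e_d
        return x
    h = [v / h_norm for v in h]
    proj = dot(h, x)
    return [xi - 2.0 * proj * hi for xi, hi in zip(x, h)]


def vmf_exp(state, actions, kappa, rng):
    """Sample around the state embedding, then return the index of its nearest action."""
    v_tilde = sample_vmf(state, kappa, rng)
    # For unit vectors, the largest inner product is the nearest neighbor.
    return max(range(len(actions)), key=lambda i: dot(v_tilde, actions[i]))


def b_exp(state, actions, kappa, rng):
    """Boltzmann exploration: softmax over all actions, then one categorical draw."""
    logits = [kappa * dot(state, a) for a in actions]
    top = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(v - top) for v in logits]
    return rng.choices(range(len(actions)), weights=weights, k=1)[0]
```

In this sketch, a larger `kappa` concentrates both schemes on the actions closest to the state embedding, while a smaller `kappa` spreads exploration more widely. Both functions return an index into `actions`, so they can be swapped for one another when comparing how often each action is explored.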

hyperspherical embedding, machine learning, reinforcement learning

Country:
- North America (0.45)
- Asia (0.28)

Genre:
- Research Report
  - New Finding (1.00)
  - Experimental Study (0.92)

Industry:
- Media > Music (1.00)
- Leisure & Entertainment (1.00)

Technology:
- Information Technology
  - Data Science > Data Mining (1.00)
  - Artificial Intelligence
    - Natural Language (1.00)
    - Machine Learning > Reinforcement Learning (1.00)
    - Representation & Reasoning > Personal Assistant Systems (0.88)
