Fast Slate Policy Optimization: Going Beyond Plackett-Luce

Sakhi, Otmane, Rohde, David, Chopin, Nicolas

Dec-29-2023 – arXiv.org Machine Learning

An increasingly important building block of large-scale machine learning systems is returning slates: ordered lists of items given a query. Applications of this technology include search, information retrieval, and recommender systems. When the action space is large, decision systems are restricted to a particular structure so that online queries can be completed quickly. This paper addresses the optimization of these large-scale decision systems given an arbitrary reward function. We cast this learning problem in a policy optimization framework and propose a new class of policies, born from a novel relaxation of decision functions. The result is a simple yet efficient learning algorithm that scales to massive action spaces. We compare our method to the commonly adopted Plackett-Luce policy class and demonstrate the effectiveness of our approach on problems with action space sizes on the order of millions.
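For readers unfamiliar with the Plackett-Luce policy class used as the baseline, the following is a minimal sketch of how such a policy samples an ordered slate and scores it. It is not the method proposed in the paper. The function names `sample_slate` and `slate_log_prob`, and the use of a plain list of per-item scores, are assumptions made for illustration. Sampling relies on the standard Gumbel top-k trick, and the log-probability follows the usual sequential definition, where each position is a softmax over the items not yet placed.

```python
import math
import random


def sample_slate(scores, k, rng):
    """Sample an ordered slate of k distinct items from a Plackett-Luce model.

    Gumbel top-k trick: add independent Gumbel(0, 1) noise to every score
    and keep the k largest perturbed scores, in decreasing order.
    """
    keys = []
    for item, score in enumerate(scores):
        # If E ~ Exp(1), then -log(E) ~ Gumbel(0, 1). Guard against E == 0.
        e = max(rng.expovariate(1.0), 1e-300)
        keys.append((score - math.log(e), item))
    keys.sort(reverse=True)
    return [item for _, item in keys[:k]]


def slate_log_prob(scores, slate):
    """Log-probability of an ordered slate under a Plackett-Luce model.

    Each position contributes a log-softmax over the remaining items.
    """
    remaining = set(range(len(scores)))
    total = 0.0
    for item in slate:
        # Numerically stable log-sum-exp over the items still available.
        m = max(scores[j] for j in remaining)
        log_z = m + math.log(sum(math.exp(scores[j] - m) for j in remaining))
        total += scores[item] - log_z
        remaining.remove(item)
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    scores = [1.0, 2.0, 0.5, -1.0]  # hypothetical per-item scores for one query
    slate = sample_slate(scores, k=2, rng=rng)
    print(slate, slate_log_prob(scores, slate))
```

In this sketch the log-probability costs one normalisation over the remaining items per slate position, which gives a rough sense of why cost becomes a concern as the number of items grows.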

information retrieval, machine learning, natural language

Country:
- North America
  - United States
    - Virginia > Arlington County
      - Arlington (0.04)
    - New York > New York County
      - New York City (0.05)
    - California
      - San Francisco County > San Francisco (0.14)
      - Santa Clara County > Stanford (0.04)
  - Canada > British Columbia
    - Metro Vancouver Regional District > Vancouver (0.04)
- Europe
  - Italy > Tuscany
    - Florence (0.04)
  - France > Hauts-de-France
    - Nord > Lille (0.04)
- Asia > Middle East
  - Republic of Türkiye > Batman Province > Batman (0.04)

Genre:
- Research Report > New Finding (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Optimization (0.88)
  - Natural Language > Information Retrieval (0.66)
  - Machine Learning
    - Neural Networks (0.93)
    - Statistical Learning (0.68)