ROIDICE: Offline Return on Investment Maximization for Efficient Decision Making
Woosung Kim, Hayeong Lee, Jongmin Lee, Byung-Jun Lee
Neural Information Processing Systems
In this paper, we propose a novel policy optimization framework that maximizes the Return on Investment (ROI) of a policy using a fixed dataset within a Markov Decision Process (MDP) equipped with a cost function. ROI, defined as the ratio between the return and the accumulated cost of a policy, serves as a measure of the policy's efficiency. Despite the importance of maximizing ROI in various applications, it remains a challenging problem because it is a ratio of two long-term quantities: return and accumulated cost. To address this, we formulate the ROI-maximizing reinforcement learning problem as linear fractional programming. We then incorporate the stationary DIstribution Correction Estimation (DICE) framework to develop a practical offline ROI maximization algorithm. Our proposed algorithm, ROIDICE, yields an efficient policy that offers a superior trade-off between return and accumulated cost compared to policies trained under existing frameworks.
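As a minimal sketch of the formulation described above (the notation here is assumed for illustration, not quoted from the paper), the ROI of a policy \pi can be written as a ratio of expectations over its stationary state-action distribution d^{\pi}, with reward r and cost c, which is what makes the objective amenable to linear fractional programming:

\max_{\pi}\;\mathrm{ROI}(\pi) \;=\; \frac{\mathbb{E}_{(s,a)\sim d^{\pi}}\!\left[r(s,a)\right]}{\mathbb{E}_{(s,a)\sim d^{\pi}}\!\left[c(s,a)\right]}

Optimizing this ratio over the stationary distribution d^{\pi} rather than directly over \pi is the point at which a DICE-style stationary distribution correction, estimated from the fixed offline dataset, can be applied.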