Short-Long Policy Evaluation with Novel Actions

Nam, Hyunji Alex, Chandak, Yash, Brunskill, Emma

Jul-9-2024–arXiv.org Artificial Intelligence

From incorporating LLMs in education, to identifying new drugs and improving ways to charge batteries, innovators constantly try new strategies in search of better long-term outcomes for students, patients and consumers. One major bottleneck in this innovation cycle is the amount of time it takes to observe the downstream effects of a decision policy that incorporates new interventions. The key question is whether we can quickly evaluate long-term outcomes of a new decision policy without making long-term observations. Organizations often have access to prior data about past decision policies and their outcomes, evaluated over the full horizon of interest. Motivated by this, we introduce a new setting for short-long policy evaluation for sequential decision making tasks. Our proposed methods significantly outperform prior results on simulators of HIV treatment, kidney dialysis and battery charging. We also demonstrate that our methods can be useful for applications in AI safety by quickly identifying when a new decision policy is likely to have substantially lower performance than past policies.

battery, decision policy, evaluation, (16 more...)

arXiv.org Artificial Intelligence

Jul-9-2024

arXiv.org PDF

Add feedback

Country:
- Europe > France (0.04)
- North America > United States
  - New York > New York County
    - New York City (0.04)
  - California > Santa Clara County
    - Palo Alto (0.04)
- Asia > Middle East
  - Jordan (0.04)

Genre:
- Research Report > Experimental Study (0.46)

Industry:
- Health & Medicine > Therapeutic Area
  - Infections and Infectious Diseases (1.00)
  - Immunology > HIV (0.98)

Technology:
- Information Technology
  - Data Science > Data Mining (1.00)
  - Artificial Intelligence
    - Natural Language (0.87)
    - Machine Learning
      - Statistical Learning (1.00)
      - Reinforcement Learning (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found