Appendix
Neural Information Processing Systems
In this section, we provide additional discussion of applying decision-focused learning to MDP problems. Specifically, the smooth-policy assumption is similar in spirit to soft Q-learning [12] and soft actor-critic [13], proposed by Haarnoja et al. The randomly initialized neural network uses ReLU layers as the nonlinearity, followed by a linear layer at the end.

Training parameters. Across all three examples, we consider the discounted setting with discount factor γ = 0.95. To relax the optimal policy given by the RL solver, we relax the Bellman equation used to run value iteration by replacing all the argmax and max operators with a softmax of temperature 0.1, i.e., we use SOFTMAX(0.1 · Q-values) in place of all the argmax operators over Q-values.
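The softmax relaxation of value iteration described above can be sketched as follows. This is a minimal illustrative implementation, not the authors' code: the function names, the tabular MDP representation (transition tensor `P` and reward matrix `R`), and the choice to realize the relaxed max as the expectation of Q under the softmax policy are all assumptions; the softmax policy follows the text's SOFTMAX(0.1 · Q-values) form.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def soft_value_iteration(P, R, gamma=0.95, beta=0.1, iters=500):
    """Value iteration with argmax/max relaxed via softmax.

    P: (S, A, S) transition probabilities; R: (S, A) rewards.
    beta mirrors the text's SOFTMAX(0.1 * Q-values) relaxation.
    Returns the Q-values (S, A) and the smooth policy (S, A).
    (Illustrative sketch; variable names are assumptions.)
    """
    S, A = R.shape
    Q = np.zeros((S, A))
    for _ in range(iters):
        pi = softmax(beta * Q)        # relaxed argmax over actions
        V = (pi * Q).sum(axis=1)      # relaxed max: expectation of Q under pi
        Q = R + gamma * (P @ V)       # Bellman backup
    return Q, softmax(beta * Q)
```

Because the softmax policy is differentiable in the Q-values, the resulting (smooth) optimal policy can be differentiated through for decision-focused learning, which is the purpose of the relaxation.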