Prompt Optimization as a State-Space Search Problem
Language models are extremely susceptible to performance collapse from even small changes to input prompt strings. Libraries such as DSPy (from Stanford NLP) avoid this problem through demonstration-based prompt optimisation. Inspired by this, I propose an alternative approach that treats prompt optimisation as a classical state-space search problem. I model the prompt space as a graph where nodes represent prompt states and edges correspond to deliberate transformations such as shortening, adding examples, or re-ordering content. Using beam search and random walk algorithms, I systematically explore this space, evaluating candidates on development sets and pruning unpromising branches. Across five NLP tasks (sentiment classification, question answering, summarisation, reasoning, and natural language inference), I find that even shallow search configurations (beam width = 2, depth = 2) improve upon seed prompts on development sets. For instance, beam search improves development accuracy from 0.40 to 0.80 on reasoning tasks, though test-set improvements are more modest (0.20 to 0.50), indicating overfitting to the development heuristic. Analysis of successful optimisation paths reveals that transformations that make prompts more concise appear most frequently, while verbosity operators are never selected. My results validate prompt optimisation as a search problem and suggest that, with greater computational resources and improved evaluation metrics, deeper exploration could yield more robust prompts that generalise beyond development sets. Code and implementation are available at https://github.com/MaanasTaneja/PromptOptimiser.
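The search procedure the abstract describes can be sketched as follows. This is a minimal illustration, not the released implementation: the operator names (`shorten`, `add_example`, `reorder`) and the scoring interface are assumptions standing in for the paper's transformation edges and development-set evaluator.

```python
import random

# Illustrative transformation operators (edges in the prompt-state graph).
# These are hypothetical stand-ins, not the operators from the repository.
def shorten(p):
    words = p.split()
    return " ".join(words[: max(1, len(words) // 2)])

def add_example(p):
    return p + " Example: input -> expected output."

def reorder(p):
    sentences = [s for s in p.split(". ") if s]
    random.shuffle(sentences)
    return ". ".join(sentences)

OPERATORS = [shorten, add_example, reorder]

def beam_search(seed_prompt, score, beam_width=2, depth=2):
    """Expand each prompt in the beam with every operator, then keep the
    best `beam_width` candidates by `score` (e.g. dev-set accuracy)."""
    beam = [seed_prompt]
    for _ in range(depth):
        candidates = [op(p) for p in beam for op in OPERATORS]
        candidates.extend(beam)  # a parent state may survive unchanged
        beam = sorted(candidates, key=score, reverse=True)[:beam_width]
    return max(beam, key=score)
```

With a toy scorer that prefers shorter prompts, `beam_search("a b c d e f g h", score=lambda p: -len(p))` repeatedly applies `shorten`, mirroring the finding that conciseness operators dominate successful paths.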
Online Optimization for Offline Safe Reinforcement Learning
Chemingui, Yassine, Deshwal, Aryan, Fern, Alan, Nguyen-Tang, Thanh, Doppa, Janardhan Rao
We study the problem of Offline Safe Reinforcement Learning (OSRL), where the goal is to learn a reward-maximizing policy from fixed data under a cumulative cost constraint. We propose a novel OSRL approach that frames the problem as a minimax objective and solves it by combining offline RL with online optimization algorithms. We prove the approximate optimality of this approach when integrated with an approximate offline RL oracle and no-regret online optimization. We also present a practical approximation that can be combined with any offline RL algorithm, eliminating the need for offline policy evaluation. Empirical results on the DSRL benchmark demonstrate that our method reliably enforces safety constraints under stringent cost budgets, while achieving high rewards. The code is available at https://github.com/yassineCh/O3SRL.
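The minimax idea in the abstract can be illustrated with a toy Lagrangian primal-dual loop: an "oracle" selects the policy maximising cost-penalised reward, while a dual variable adapts online to enforce the budget. Everything below (the two candidate policies, their numbers, and the update rules) is an assumption for illustration, not the authors' algorithm.

```python
# Candidate policies: name -> (reward, expected cumulative cost).
# Hypothetical values chosen so that the high-reward policy violates the budget.
POLICIES = {"risky": (10.0, 5.0), "safe": (6.0, 1.0)}
BUDGET = 2.0

def lagrangian_oracle(lambda_):
    """Stand-in for an approximate offline RL oracle: maximise the
    Lagrangian-modified objective reward - lambda * cost."""
    return max(POLICIES, key=lambda k: POLICIES[k][0] - lambda_ * POLICIES[k][1])

def primal_dual(steps=200, lr=0.1):
    """Alternate the primal oracle with no-regret-style dual ascent
    on the constraint violation (cost - BUDGET)."""
    lambda_ = 0.0
    picks = []
    for _ in range(steps):
        pi = lagrangian_oracle(lambda_)
        _, cost = POLICIES[pi]
        lambda_ = max(0.0, lambda_ + lr * (cost - BUDGET))  # projected ascent
        picks.append(pi)
    return picks
```

Running `primal_dual()` drives the average cost of the selected policies toward the budget: the dual variable rises while the risky policy overspends, tilting the oracle toward the safe policy, which is the qualitative behaviour the minimax formulation is designed to produce.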
A Proof of the strong duality (4)

In this section, we explain why the equalities (4) hold when the problem (r, c, B
The first and third equalities are straightforward. We restate a result extracted from the monograph by Luenberger [1969]; it relies on the dual functional φ, whose expression we recall below.

Theorem 2 (stated as Theorem 1 in Section 8.6, page 224 of Luenberger, 1969). " is required to apply the theorem.