Adaptivity Under Realizability Constraints: Comparing In-Context and Agentic Learning
Kratsios, Anastasis | Neuman, A. Martina | Petersen, Philipp
We compare in-context learning with fixed queries and agentic learning with adaptive queries for uniform approximation of task families. We consider two settings: an unrestricted regime, where querying and approximation are arbitrary functions, and a realizable regime, where we require these operations to be implemented by ReLU neural networks. In both settings, adaptivity never hinders approximation performance. However, whether adaptivity yields a strict advantage can change when one passes from the unrestricted regime to the realizable regime. We identify four distinct approximation scenarios, each witnessed by an explicit task family: (a) no advantage of adaptivity; (b) an advantage in the unrestricted regime that persists under ReLU realizability; (c) an advantage that arises only under realizability; and (d) an advantage that disappears under realizability. This demonstrates that representational constraints interact profoundly with the effect of adaptivity.
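A toy illustration of why adaptive queries can beat fixed ones (this is not one of the paper's task families, which are not reproduced here): take threshold tasks f_t(x) = 1[x >= t] on [0, 1], where a query at x returns f_t(x). With m query points fixed in advance, t can only be localized to an interval of width 1/(m+1), whereas m adaptive (bisection) queries localize it to width 2^-m. The sketch lives entirely in the unrestricted regime and says nothing about ReLU realizability; the helper names oracle, fixed_queries, and adaptive_queries are hypothetical.

```python
from typing import List, Tuple


def oracle(t: float, x: float) -> int:
    """Answer to a query at x for the hypothetical threshold task f_t(x) = 1[x >= t]."""
    return 1 if x >= t else 0


def fixed_queries(t: float, m: int) -> Tuple[float, float]:
    """Non-adaptive (in-context style): all m query points are chosen up front.

    Returns an interval (lo, hi] that contains t, assuming 0 < t <= 1.
    """
    xs: List[float] = [(i + 1) / (m + 1) for i in range(m)]  # uniform grid
    answers = [oracle(t, x) for x in xs]  # every query issued before any answer is used
    lo, hi = 0.0, 1.0
    for x, a in zip(xs, answers):
        if a == 1:
            hi = x  # first grid point at or above t
            break
        lo = x  # latest grid point strictly below t
    return lo, hi


def adaptive_queries(t: float, m: int) -> Tuple[float, float]:
    """Adaptive (agentic style): each query point depends on earlier answers.

    Bisection keeps the invariant lo < t <= hi, assuming 0 < t <= 1.
    """
    lo, hi = 0.0, 1.0
    for _ in range(m):
        mid = (lo + hi) / 2
        if oracle(t, mid) == 1:
            hi = mid
        else:
            lo = mid
    return lo, hi
```

For example, with t = 0.3 and m = 10, fixed_queries returns an interval of width about 1/11, while adaptive_queries returns one of width exactly 2^-10.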
Theory-Inspired Path-Regularized Differential Network Architecture Search (Supplementary File)
Next, we report the average gate activation probability in the normal and reduction cells in Figure 1 (b). At the beginning of the search, we initialize the activation probability of each gate to one. As in DARTS, we alternately update the network parameter $W$ and the architecture parameter $\beta$ via gradient descent, as detailed in Algorithm 1. When computing the gradient $\nabla_{\beta} F_{\mathcal{B}_{\text{train}}}(W, \beta)$, we ignore the second-order (Hessian) term to accelerate the computation, as in first-order DARTS. For brevity, we usually drop the superscript $(k)$ and the subscript $i$, and use $X^{(l)}$ to denote the output in the $l$-th layer of any sample $X_i$ ($i = 1, \dots, n$) at any iteration.
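The alternating first-order update can be sketched as follows, under loudly stated assumptions: scalar $W$ and $\beta$, a hypothetical toy objective F(W, β) = (W − β)² + (β − 1)² standing in for the real supernet loss, and plain gradient descent with fixed step sizes. The only point illustrated is the first-order structure: the β-gradient is evaluated at the current W and is never differentiated through the preceding W-update, so no Hessian term appears. The function and parameter names are hypothetical, and this is not the supplement's Algorithm 1.

```python
from typing import Callable, Tuple

# A gradient oracle takes (W, beta) and returns a partial derivative.
Grad = Callable[[float, float], float]


def alternating_first_order_search(
    W: float,
    beta: float,
    grad_W: Grad,
    grad_beta: Grad,
    lr_W: float = 0.1,
    lr_beta: float = 0.1,
    steps: int = 500,
) -> Tuple[float, float]:
    """Alternate gradient steps on W and beta (hypothetical toy version)."""
    for _ in range(steps):
        # 1) Network-parameter step with the architecture held fixed.
        W = W - lr_W * grad_W(W, beta)
        # 2) Architecture step at the current W. First-order: W is treated as a
        #    constant here, so the second-order (Hessian) term is dropped.
        beta = beta - lr_beta * grad_beta(W, beta)
    return W, beta


# Hypothetical toy objective F(W, beta) = (W - beta)**2 + (beta - 1)**2.
def toy_grad_W(W: float, beta: float) -> float:
    return 2.0 * (W - beta)


def toy_grad_beta(W: float, beta: float) -> float:
    return -2.0 * (W - beta) + 2.0 * (beta - 1.0)


W_star, beta_star = alternating_first_order_search(0.0, 0.0, toy_grad_W, toy_grad_beta)
```

On this toy objective the iterates converge to the minimizer W = β = 1.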
Optimizing Expectation with Guarantees in POMDPs
Chatterjee, Krishnendu (The Institute of Science and Technology Austria) | Novotný, Petr (The Institute of Science and Technology Austria) | Pérez, Guillermo A. (Université Libre de Bruxelles) | Raskin, Jean-François (Université Libre de Bruxelles) | Žikelić, Đorđe (University of Cambridge)
A standard objective in partially observable Markov decision processes (POMDPs) is to find a policy that maximizes the expected discounted-sum payoff. However, such policies may still permit unlikely but highly undesirable outcomes, which is especially problematic in safety-critical applications. Recently, there has been a surge of interest in POMDPs where the goal is to maximize the probability that the payoff is at least a given threshold, but these approaches do not consider any optimization beyond satisfying this threshold constraint. In this work we go beyond both the “expectation” and “threshold” approaches and consider a “guaranteed payoff optimization (GPO)” problem for POMDPs, where we are given a threshold t and the objective is to find a policy σ such that a) each possible outcome of σ yields a discounted-sum payoff of at least t, and b) the expected discounted-sum payoff of σ is optimal (or near-optimal) among all policies satisfying a). We present a practical approach to tackle the GPO problem and evaluate it on standard POMDP benchmarks.
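The two-part GPO objective can be illustrated on a toy in which each candidate policy is already summarized by an explicit finite distribution over reward sequences. This is only a definition-level sketch: it enumerates a given set of policies and models no beliefs, observations, or the paper's actual algorithm. The policy names, rewards, discount factor, and the helper gpo_select are all hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# One possible outcome of a policy: (probability, reward sequence).
Outcome = Tuple[float, Sequence[float]]


def discounted_sum(rewards: Sequence[float], gamma: float) -> float:
    return sum(r * gamma ** k for k, r in enumerate(rewards))


def gpo_select(policies: Dict[str, List[Outcome]], t: float, gamma: float) -> Optional[str]:
    """Return the policy maximizing expected payoff among those guaranteeing >= t."""
    best_name: Optional[str] = None
    best_value = float("-inf")
    for name, outcomes in policies.items():
        payoffs = [(p, discounted_sum(rs, gamma)) for p, rs in outcomes if p > 0]
        if not payoffs:
            continue
        # a) Guarantee: every possible outcome must reach the threshold t.
        if min(v for _, v in payoffs) < t:
            continue
        # b) Optimality: among guaranteed policies, maximize the expectation.
        expected = sum(p * v for p, v in payoffs)
        if expected > best_value:
            best_name, best_value = name, expected
    return best_name


# Hypothetical policies; payoffs below use gamma = 0.5.
POLICIES: Dict[str, List[Outcome]] = {
    "risky": [(0.9, [10.0, 10.0]), (0.1, [-10.0, 0.0])],  # payoffs 15, -10
    "safe": [(1.0, [2.0, 2.0])],                           # payoff 3
    "balanced": [(0.5, [4.0, 4.0]), (0.5, [2.0, 0.0])],   # payoffs 6, 2
}
```

With gamma = 0.5, pure expectation maximization (t = -inf) picks "risky"; t = 0 excludes it and picks "balanced" (expectation 4); t = 2.5 leaves only "safe"; and t = 5 leaves no feasible policy, so gpo_select returns None.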