Distributional Successor Features Enable Zero-Shot Policy Optimization
Intelligent agents must be generalists, capable of quickly adapting to various tasks. In reinforcement learning (RL), model-based RL learns a dynamics model of the world, in principle enabling transfer to arbitrary reward functions through planning. However, autoregressive model rollouts suffer from compounding error, making model-based RL ineffective for long-horizon problems. Successor features offer an alternative by modeling a policy's long-term state occupancy, reducing policy evaluation under new rewards to linear regression. Yet, policy optimization with successor features can be challenging. This work proposes a novel class of models, i.e., Distributional Successor Features for Zero-Shot Policy Optimization (DiSPOs), that learn a distribution of successor features of a stationary dataset's behavior policy, along with a policy that acts to realize different successor features within the dataset. By directly modeling long-term outcomes in the dataset, DiSPOs avoid compounding error while enabling a simple scheme for zero-shot policy optimization across reward functions. We present a practical instantiation of DiSPOs using diffusion models and show their efficacy as a new class of transferable models, both theoretically and empirically across various simulated robotics problems.
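The abstract's claim that successor features reduce policy evaluation under a new reward to linear regression can be sketched concretely. The snippet below is a minimal illustration, not the paper's implementation: it assumes a reward that is linear in state features phi, recovers the reward weights by least squares, and then scores cached successor features psi against the new reward; the feature dimension, sample counts, and variable names are all invented for the example.

```python
import numpy as np

# Illustrative sketch: if r(s) = phi(s) @ w, then successor features
# psi satisfy Q(s, a) = psi(s, a) @ w, so evaluating a new reward
# reduces to one linear regression for w.
rng = np.random.default_rng(0)
d = 4                                 # assumed feature dimension
phi = rng.normal(size=(100, d))       # state features of 100 sampled states
w_true = np.array([1.0, -0.5, 0.3, 0.0])
rewards = phi @ w_true                # rewards observed for the new task

# Step 1: recover the reward weights by least squares.
w_hat, *_ = np.linalg.lstsq(phi, rewards, rcond=None)

# Step 2: score candidate long-term outcomes (successor features)
# under the new reward and pick the best one to realize.
psi = rng.normal(size=(5, d))         # 5 candidate outcomes (assumed)
values = psi @ w_hat                  # estimated returns per outcome
best = int(np.argmax(values))
```

Because the regression is noiseless and overdetermined here, `w_hat` matches `w_true` exactly; in practice the weights are fit from sampled transitions.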
Recent research has developed several Monte Carlo methods for estimating the normalization constant (partition function) based on the idea of annealing. This means sampling successively from a path of distributions that interpolate between a tractable "proposal" distribution and the unnormalized "target" distribution. Prominent estimators in this family include annealed importance sampling and annealed noise-contrastive estimation (NCE). Such methods hinge on a number of design choices: which estimator to use, which path of distributions to use and whether to use a path at all; so far, there is no definitive theory on which choices are efficient. Here, we evaluate each design choice by the asymptotic estimation error it produces.
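The annealing idea described above can be made concrete with a toy example. The sketch below is a minimal annealed importance sampling estimator, assumed for illustration only (the 1-D target, the geometric path, the step sizes, and the number of temperatures are all arbitrary choices, not ones from the paper): it estimates the normalizer of an unnormalized Gaussian by interpolating from a standard-normal proposal.

```python
import numpy as np

# Toy AIS sketch: estimate Z for the unnormalized target
# f(x) = exp(-x^2 / (2 sigma^2)), whose true Z is sigma * sqrt(2 pi).
rng = np.random.default_rng(1)
sigma = 2.0

def log_f(x):                          # unnormalized target density
    return -x**2 / (2 * sigma**2)

def log_p0(x):                         # normalized proposal: N(0, 1)
    return -x**2 / 2 - 0.5 * np.log(2 * np.pi)

betas = np.linspace(0.0, 1.0, 50)      # geometric path f_b = p0^(1-b) f^b
n = 5000
x = rng.normal(size=n)                 # exact samples from the proposal
log_w = np.zeros(n)
for b_prev, b in zip(betas[:-1], betas[1:]):
    # incremental importance weight between adjacent path distributions
    log_w += (b - b_prev) * (log_f(x) - log_p0(x))
    # one Metropolis step leaving the current intermediate f_b invariant
    prop = x + rng.normal(scale=0.5, size=n)
    log_ratio = (b * (log_f(prop) - log_f(x))
                 + (1 - b) * (log_p0(prop) - log_p0(x)))
    accept = np.log(rng.uniform(size=n)) < log_ratio
    x = np.where(accept, prop, x)

Z_hat = np.exp(log_w).mean()           # unbiased estimate of Z (since Z0 = 1)
```

With this seed the estimate lands near the true value `sigma * sqrt(2 * pi) ≈ 5.01`; the design choices the abstract mentions (which path, how many steps, which estimator) all change the variance of `Z_hat`.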
Large Language Models as Urban Residents: An LLM Agent Framework for Personal Mobility Generation
Jiawei Wang, Chuang Yang, Zengqing Wu
This paper introduces a novel approach using Large Language Models (LLMs) integrated into an agent framework for flexible and effective personal mobility generation. LLMs overcome the limitations of previous models by effectively processing semantic data and offering versatility in modeling various tasks.
Learning Optimal Tax Design in Nonatomic Congestion Games
Maryam Fazel
Paul G. Allen School of Computer Science, Department of Electrical Engineering
In multiplayer games, self-interested behavior among the players can harm the social welfare. Tax mechanisms are a common method to alleviate this issue and induce socially optimal behavior. In this work, we take the initial step of learning the optimal tax that can maximize social welfare with limited feedback in congestion games. We propose a new type of feedback named equilibrium feedback, where the tax designer can only observe the Nash equilibrium after deploying a tax plan. Existing algorithms are not applicable due to the exponentially large tax function space, nonexistence of the gradient, and nonconvexity of the objective. To tackle these challenges, we design a computationally efficient algorithm that leverages several novel components: (1) a piece-wise linear tax to approximate the optimal tax; (2) extra linear terms to guarantee a strongly convex potential function; (3) an efficient subroutine to find the exploratory tax that can provide critical information about the game.
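The abstract's premise, that self-interested routing harms welfare and a tax can restore the optimum, is captured by the classic Pigou network. The sketch below is that textbook example, assumed here purely as illustration (it is not the paper's algorithm or its piecewise-linear tax construction): two parallel routes, one with latency equal to its load and one with constant latency 1.

```python
import numpy as np

# Pigou example: fraction x takes route A (latency x); the rest take
# route B (latency 1).  Total social cost:
def social_cost(x):
    return x * x + (1 - x) * 1.0

# Untaxed Nash equilibrium: route A is never worse than B, so all
# traffic piles onto A (x = 1).
nash_cost = social_cost(1.0)

# A marginal-cost tax tau(x) = x shifts the equilibrium to where the
# perceived costs equalize: x + tau(x) = 1, i.e. x = 0.5.
taxed_cost = social_cost(0.5)

# The social optimum minimizes social_cost directly.
xs = np.linspace(0.0, 1.0, 1001)
opt_cost = social_cost(xs).min()
```

Here the untaxed equilibrium has cost 1.0 while the taxed equilibrium attains the optimal cost 0.75, which is exactly the welfare gap a tax designer tries to close; the paper studies learning such a tax when only the resulting equilibrium is observable.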
Appendix A Dataset Details
We evaluate TPSR and several baseline methods on the following four standard benchmark datasets: Feynman, Black-box, and Strogatz from SRBench [42], and In-domain Synthetic Data generated based on [18]. More details on each of these datasets are given below. The regression input points (x, y) from these equations are provided in the Penn Machine Learning Benchmark (PMLB) [42, 43] and have been studied in SRBench [42] for the symbolic regression task. The input dimension is limited to d ≤ 10 and the true underlying function of the points is known. We split the dataset into B bags of 200 input points (when N is larger than 200), since the transformer SR model is pretrained on N ≤ 200 input points as per [18]. The input points for these problems are included in PMLB [43] and have been examined in SRBench [42] for symbolic regression. The input dimension for these problems is restricted to d = 2 and the true underlying functions are provided. The aim of the SR study on these black-box datasets is to find an interpretable model expression that fits the data effectively.
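The bagging step described above is a simple chunking operation. The sketch below is an assumed illustration of it (the helper name and the toy data are invented, not from the paper's code): a dataset with N > 200 points is split into consecutive bags of at most 200, matching the pretraining context size.

```python
import numpy as np

# Hypothetical sketch: split N regression points into bags of at most
# 200, since the pretrained transformer SR model expects <= 200 points.
def split_into_bags(X, y, bag_size=200):
    starts = range(0, len(X), bag_size)
    return [(X[i:i + bag_size], y[i:i + bag_size]) for i in starts]

X = np.random.rand(450, 3)   # toy dataset: N = 450 points, d = 3
y = np.random.rand(450)
bags = split_into_bags(X, y)
# 450 points -> bags of sizes 200, 200, and 50
```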