Provably Efficient Reinforcement Learning with Linear Function Approximation under Adaptivity Constraints
We study reinforcement learning (RL) with linear function approximation under adaptivity constraints. We consider two popular limited-adaptivity models, the batch learning model and the rare policy switch model, and propose two efficient online RL algorithms for episodic linear Markov decision processes, where the transition probabilities and the reward function can be represented as linear functions of a known feature mapping.
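For concreteness, the linear MDP assumption can be stated as follows (notation ours; h indexes the step within an episode, \phi is the known feature map, and \mu_h, \theta_h are unknown):

\mathbb{P}_h(s' \mid s, a) = \langle \phi(s, a), \mu_h(s') \rangle, \qquad r_h(s, a) = \langle \phi(s, a), \theta_h \rangle, \qquad \phi : \mathcal{S} \times \mathcal{A} \to \mathbb{R}^d.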
A Safely Imitating a Neural Policy
In this section, we describe our projection algorithm for piecewise linear policies in more detail. Since linear policies are differentiable, we adopt a projected gradient descent approach. Our safe imitation algorithm is described in Algorithm 3 (safely imitating a network using a given starting point and partition). In each iteration, we first compute a safe region S in the parameter space of g. This is done by starting with a region bigger than the gradient step size and then iteratively searching for unsafe controllers and trimming the region to remove them. This trimming process continues until S can be verified using abstract interpretation [8].
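As a rough sketch of this trim-and-verify loop (a minimal illustration only; find_unsafe and verify_safe are hypothetical stand-ins for the counterexample search and the abstract-interpretation check, not the paper's actual interface):

import numpy as np

def safe_gradient_step(theta, grad, step_size, find_unsafe, verify_safe):
    """One projected gradient step confined to a verified-safe box region."""
    # Start with a box around theta that is larger than the gradient step.
    radius = 2.0 * step_size * np.linalg.norm(grad)
    while True:
        region = (theta - radius, theta + radius)   # axis-aligned box in parameter space
        counterexample = find_unsafe(region)        # search for an unsafe controller
        if counterexample is None and verify_safe(region):
            break                                   # region certified safe
        radius *= 0.5                               # trim the region and retry
    # Gradient step, then project back into the certified box.
    theta_new = theta - step_size * grad
    return np.clip(theta_new, theta - radius, theta + radius)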
Author Feedback
First, we would like to thank the reviewers for their helpful feedback. In many scenarios, it is difficult or impossible to generate a worst-case environment model. The constraints need only describe the MDP, as opposed to the entirety of the agent's dynamics, so writing such worst-case constraints by hand is often feasible. How efficient is our algorithm?
A Appendix
A.1 Illustration of group actions
This section is intended to provide a visual, more intuitive understanding of the different group actions on the tensors of our network. We begin with a visualization of the group action on the input space. We exemplify it on the sequence GGACT, whose reverse complement is AGTCC. The representation with arbitrary P can mix an arbitrary number of channels under the group action. Cohen et al. [11, Theorem 3.3] give a general result about linear equivariant mappings.
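To make the input-space action concrete, here is a minimal sketch (the channel order A, C, G, T is our assumption): under that ordering, complementation (A<->T, C<->G) flips the channel axis and reversal flips the sequence axis, so the reverse-complement action flips both axes of the one-hot tensor.

import numpy as np

ALPHABET = "ACGT"

def one_hot(seq):
    x = np.zeros((len(seq), 4))
    for i, base in enumerate(seq):
        x[i, ALPHABET.index(base)] = 1.0
    return x

def reverse_complement_action(x):
    # With channels ordered A, C, G, T, complementation (A<->T, C<->G)
    # reverses the channel axis; sequence reversal reverses the other.
    return x[::-1, ::-1]

def decode(x):
    return "".join(ALPHABET[i] for i in x.argmax(axis=1))

assert decode(reverse_complement_action(one_hot("GGACT"))) == "AGTCC"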
Stylus: Automatic Adapter Selection for Diffusion Models
Michael Luo, Justin Wong, Brandon Trabucco, Yanping Huang
Beyond scaling base models with more data or parameters, fine-tuned adapters provide an alternative way to generate high-fidelity, custom images at reduced cost. As such, adapters have been widely adopted by open-source communities, accumulating a database of over 100K adapters, most of which are highly customized and carry insufficient descriptions. To generate high-quality images, this paper explores the problem of matching a prompt to a set of relevant adapters, building on recent work that highlights the performance gains of composing adapters. We introduce Stylus, which efficiently selects and automatically composes task-specific adapters based on a prompt's keywords. Stylus follows a three-stage approach: it first summarizes adapters with improved descriptions and embeddings, then retrieves relevant adapters, and finally assembles adapters by checking how well they fit the prompt's keywords. To evaluate Stylus, we developed StylusDocs, a curated dataset featuring 75K adapters with pre-computed adapter embeddings. In our evaluation on popular Stable Diffusion checkpoints, Stylus achieves greater CLIP/FID Pareto efficiency and is twice as preferred over the base model by both human and multimodal-model evaluators.
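The retrieve-and-compose stages can be sketched as follows (illustrative only; the embedding model, cosine scoring, and function names are our assumptions, not Stylus's implementation, and stage one, re-describing and embedding adapters, is assumed to have happened offline to build adapter_db):

import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def select_adapters(prompt_keywords, adapter_db, embed, top_k=3):
    """adapter_db: dicts with 'name' and precomputed 'embedding' fields."""
    selected = []
    for keyword in prompt_keywords:          # compose per prompt keyword
        q = embed(keyword)
        ranked = sorted(adapter_db,          # retrieve by embedding similarity
                        key=lambda a: cosine(q, a["embedding"]),
                        reverse=True)
        selected.extend(ranked[:top_k])
    seen, composed = set(), []               # deduplicate, preserving order
    for a in selected:
        if a["name"] not in seen:
            seen.add(a["name"])
            composed.append(a)
    return composed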
Communication Complexity in Federated Min-Max Learning
Federated min-max learning has received increasing attention in recent years thanks to its wide range of applications in various learning paradigms. As with conventional federated learning for empirical risk minimization problems, communication complexity emerges as one of the most critical concerns affecting the future prospects of federated min-max learning. To lower the communication complexity of federated min-max learning, a natural approach is to utilize infrequent communication (through multiple local updates), as in conventional federated learning. However, due to the more complicated inner-outer problem structure of federated min-max learning, theoretical understanding of the communication complexity of federated min-max learning with infrequent communication remains very limited in the literature. This is particularly true for settings with non-i.i.d. data.
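The infrequent-communication idea can be sketched as follows (a minimal illustration of local stochastic gradient descent-ascent with periodic averaging; the function names and the plain-averaging rule are our assumptions, not a specific algorithm from this paper):

import numpy as np

def local_sgda(x, y, grad_x, grad_y, lr, K):
    # K local descent steps on x (min player) and ascent steps on y (max player).
    for _ in range(K):
        gx, gy = grad_x(x, y), grad_y(x, y)
        x, y = x - lr * gx, y + lr * gy
    return x, y

def federated_round(x, y, clients, lr=0.01, K=10):
    # Each client updates locally; one communication round averages the results.
    results = [local_sgda(x.copy(), y.copy(), c["grad_x"], c["grad_y"], lr, K)
               for c in clients]
    x_new = np.mean([r[0] for r in results], axis=0)
    y_new = np.mean([r[1] for r in results], axis=0)
    return x_new, y_new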
Controllable and Compositional Generation with Latent-Space Energy-Based Models
Controllable generation is one of the key requirements for the successful adoption of deep generative models in real-world applications, but it remains a great challenge. In particular, the compositional ability to generate novel concept combinations is out of reach for most current models. In this work, we use energy-based models (EBMs) to handle compositional generation over a set of attributes. To make them scalable to high-resolution image generation, we introduce an EBM in the latent space of a pre-trained generative model such as StyleGAN. We propose a novel EBM formulation representing the joint distribution of data and attributes, and we show how sampling from it can be formulated as solving an ordinary differential equation (ODE).
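Schematically, compositionality comes from the fact that per-attribute energies add in the latent space of the pre-trained generator G (notation ours; y_1, ..., y_n are the target attributes and E_i the per-attribute energy functions):

p(z \mid y_1, \dots, y_n) \;\propto\; \exp\!\Big(-\sum_{i=1}^{n} E_i(z, y_i)\Big), \qquad x = G(z),

so a latent z sampled from the composed distribution (e.g., by integrating the associated ODE) is decoded into an image by G.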
Optimal Adaptive Electrode Selection to Maximize Simultaneously Recorded Neuron Yield
Neural-Matrix-style, high-density electrode arrays for brain-machine interfaces (BMIs) and neuroscientific research require the use of multiplexing: each recording channel can be routed to one of several electrode sites on the array. This capability allows the user to flexibly distribute recording channels to the locations where the most desirable neural signals can be resolved. For example, in the Neuropixels probe, 960 electrodes can be addressed by 384 recording channels. However, no adaptive methods currently exist that use recorded neural data to optimize or customize the electrode selections for each recording context. Here, we present an algorithm called classification-based selection (CBS) that optimizes the joint electrode selections for all recording channels so as to maximize the isolation quality of detected neurons.
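As a rough sketch of the selection problem (illustrative only: a greedy stand-in that scores each electrode independently, whereas CBS optimizes the selections jointly across channels; banks and isolation_score are our assumed interface):

def greedy_selection(banks, isolation_score):
    """banks: one list of routable electrode indices per recording channel.
    isolation_score: electrode index -> estimated isolation quality."""
    selection, used = [], set()
    for bank in banks:
        candidates = [e for e in bank if e not in used]
        if not candidates:          # bank exhausted (possible if banks overlap)
            continue
        best = max(candidates, key=isolation_score)
        selection.append(best)
        used.add(best)
    return selection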