UniFL: Improve Latent Diffusion Model via Unified Feedback Learning

Neural Information Processing Systems

Latent diffusion models (LDM) have revolutionized text-to-image generation, leading to the proliferation of various advanced models and diverse downstream applications. However, despite these significant advancements, current diffusion models still suffer from several limitations, including inferior visual quality, inadequate aesthetic appeal, and inefficient inference, without a comprehensive solution in sight. To address these challenges, we present UniFL, a unified framework that leverages feedback learning to enhance diffusion models comprehensively. UniFL stands out as a universal, effective, and generalizable solution applicable to various diffusion models, such as SD1.5 and SDXL. Notably, UniFL consists of three key components: perceptual feedback learning, which enhances visual quality; decoupled feedback learning, which improves aesthetic appeal; and adversarial feedback learning, which accelerates inference.


A Notation and basic definitions

Neural Information Processing Systems

H is a separable Hilbert space and X is a Polish space (we will explicitly require compactness in some theorems). We also assume it to be uniformly bounded, i.e., sup … B.1 Proof of Proposition 1. In this section, let us extend the definition in Eq. (4) to any operator A ∈ S(H), without the implied positivity restriction (in Eq. (4), we ask that A ⪰ 0): ∀A ∈ S(H), ∀x ∈ X, f_A(x) = … To prove linearity, let A, B ∈ S(H) and α, β ∈ ℝ. Since S(H) is a vector space, αA + βB ∈ S(H). For a positive operator A, ⟨h, Ah⟩ ≥ 0 for every h ∈ H; in particular, for any x ∈ X, the previous inequality applied to h = φ(x) yields f_A(x) ≥ 0. B.2 Proof of Proposition 2. Recall the definition of f… We have the lemma: Lemma 1 (Linearity of evaluations). B.3.1 Compact operators and spectral functions. In this section, we briefly introduce compact self-adjoint operators and their spectral theory.
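The linearity and positivity claims sketched above can be written out as a short worked step. A minimal sketch, assuming the common choice f_A(x) = ⟨φ(x), Aφ(x)⟩_H for a feature map φ : X → H; the excerpt does not reproduce Eq. (4), so this exact form is an assumption:

\[
f_{\alpha A + \beta B}(x)
  = \langle \phi(x), (\alpha A + \beta B)\,\phi(x) \rangle_{\mathcal H}
  = \alpha \langle \phi(x), A\,\phi(x) \rangle_{\mathcal H}
    + \beta \langle \phi(x), B\,\phi(x) \rangle_{\mathcal H}
  = \alpha f_A(x) + \beta f_B(x),
\]
\[
A \succeq 0 \;\Longrightarrow\; f_A(x) = \langle \phi(x), A\,\phi(x) \rangle_{\mathcal H} \ge 0
  \quad \text{for all } x \in X.
\]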


Knowledge Composition using Task Vectors with Learned Anisotropic Scaling

Neural Information Processing Systems

Pre-trained models produce strong generic representations that can be adapted via fine-tuning on specialised datasets. The learned weight difference relative to the pre-trained model, known as a task vector, characterises the direction and stride of fine-tuning that enables the model to capture these specialised representations. The significance of task vectors is such that simple arithmetic operations on them can be used to combine diverse representations from different domains. This paper builds on these properties of task vectors and aims to answer (1) whether components of task vectors, particularly parameter blocks, exhibit similar characteristics, and (2) how such blocks can be used to enhance knowledge composition and transfer. To this end, we introduce aTLAS, an algorithm that linearly combines parameter blocks with different learned coefficients, resulting in anisotropic scaling at the task vector level.
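To make the composition concrete, here is a minimal sketch of block-wise (anisotropic) task-vector composition, assuming PyTorch state dicts; the function name, the choice of one block per parameter tensor, and the coefficient container are illustrative assumptions, not the paper's implementation:

import torch

def compose_with_task_vectors(pretrained, finetuned_list, coefficients):
    # pretrained, finetuned_list[t]: state dicts with identical keys (each key = one parameter block).
    # coefficients[(t, name)]: learned scalar for task t and block `name`,
    # typically optimized on a downstream objective.
    composed = {}
    for name, theta_0 in pretrained.items():
        delta = torch.zeros_like(theta_0)
        for t, finetuned in enumerate(finetuned_list):
            task_vector_block = finetuned[name] - theta_0   # task vector = finetuned - pretrained
            delta = delta + coefficients[(t, name)] * task_vector_block
        composed[name] = theta_0 + delta   # scaling differs per block, hence anisotropic
    return composed

Because each block receives its own coefficient, the scaling is anisotropic at the task-vector level, in contrast to task arithmetic that applies a single scalar to the whole task vector.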


A Appendix A.1 Additional Method Justification The key idea of Q

Neural Information Processing Systems

Since our objective in SLRL is to finish the task as soon as possible, and we may not be given expert demonstrations as prior data, we want to match the state-action pairs to those that lead to task completion. This problem has been studied in stochastic optimal control, particularly in REPS [Peters et al., 2010]. A.2 Implementation Details and Hyperparameters. In our experiments, we use soft actor-critic [Haarnoja et al., 2018] as our base RL algorithm. We use default hyperparameter values: a learning rate of 3e-4 for all networks (optimized using Adam), a batch size of 256 sampled from the entire replay buffer (both prior and online data), and a discount factor of 0.99. The policy and critic networks are MLPs with 2 fully-connected hidden layers of size 256.
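For reference, a minimal sketch of the reported configuration (2 hidden layers of 256 units for policy and critic, Adam with learning rate 3e-4, batch size 256, discount 0.99); the observation/action dimensions and activation choice are illustrative assumptions, not the authors' code:

import torch
import torch.nn as nn

def mlp(in_dim, out_dim, hidden=256):
    # Two fully-connected hidden layers of size 256, matching the reported setup.
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )

obs_dim, act_dim = 17, 6                  # placeholder dimensions; task-dependent
policy = mlp(obs_dim, 2 * act_dim)        # outputs mean and log-std of a Gaussian policy
critic = mlp(obs_dim + act_dim, 1)        # Q(s, a)

policy_opt = torch.optim.Adam(policy.parameters(), lr=3e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=3e-4)
batch_size, gamma = 256, 0.99             # batch drawn from the combined prior + online replay buffer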


You Only Live Once: Single-Life Reinforcement Learning. Annie S. Chen, Chelsea Finn

Neural Information Processing Systems

Reinforcement learning algorithms are typically designed to learn a performant policy that can repeatedly and autonomously complete a task, usually starting from scratch. However, in many real-world situations, the goal might not be to learn a policy that can do the task repeatedly, but simply to perform a new task successfully once in a single trial. For example, imagine a disaster relief robot tasked with retrieving an item from a fallen building, where it cannot get direct supervision from humans. It must retrieve this object within one test-time trial, and must do so while tackling unknown obstacles, though it may leverage knowledge it has of the building before the disaster. We formalize this problem setting, which we call single-life reinforcement learning (SLRL), where an agent must complete a task within a single episode without interventions, utilizing its prior experience while contending with some form of novelty. SLRL provides a natural setting to study the challenge of autonomously adapting to unfamiliar situations, and we find that algorithms designed for standard episodic reinforcement learning often struggle to recover from out-of-distribution states in this setting.


A Proof for Equation (7) in Section 3.2

Neural Information Processing Systems

In Section 3.2, we propose a shifting operation (see the corresponding equation there). Below, we summarize the shifting operation and prove its efficacy in Proposition A.1. As presented in Section 3.2, for an f-divergence, the shifting operation is defined in terms of the convex conjugate f* of its generator function f. The environments we used for our experiments are from the OpenAI Gym [10], including CartPole [8] from the classic RL literature and five complex tasks simulated with MuJoCo [32]: HalfCheetah, Hopper, Reacher, Walker, and Humanoid, with task screenshots and version numbers shown in Figure 1. Note that behavior cloning (BC) employs the same structure to train a policy network with supervised learning. The reward signal networks used in GAIL, BC+GAIL, AIRL, RKL-VIM, and f-GAIL are all composed of three hidden layers of 100 units each, with the first two layers activated by tanh and the final activation layers listed in Tab. 3. For the ablation study in Sec. 4.3, we changed the number of linear layers to 1, 2, 4, and 7 (with 100 nodes per layer) and the number of nodes per layer to 25, 50, 100, and 200 (with 4 layers).
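As an illustration of the reward-signal network described above (three hidden layers of 100 units, tanh after the first two), here is a minimal PyTorch sketch; the final activation below is a placeholder, since the actual choice depends on the divergence and is listed in Tab. 3:

import torch.nn as nn

def reward_signal_net(in_dim, final_activation=None):
    # The final activation depends on the (learned) f-divergence, see Tab. 3;
    # Sigmoid is used here only as an illustrative placeholder.
    final_activation = final_activation if final_activation is not None else nn.Sigmoid()
    return nn.Sequential(
        nn.Linear(in_dim, 100), nn.Tanh(),   # hidden layer 1
        nn.Linear(100, 100), nn.Tanh(),      # hidden layer 2
        nn.Linear(100, 100),                 # hidden layer 3 (activation not specified in this excerpt)
        nn.Linear(100, 1),
        final_activation,
    )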


f-GAIL: Learning f-Divergence for Generative Adversarial Imitation Learning

Neural Information Processing Systems

Imitation learning (IL) aims to learn a policy from expert demonstrations that minimizes the discrepancy between the learner and expert behaviors. Various imitation learning algorithms have been proposed with different pre-determined divergences to quantify the discrepancy. This naturally gives rise to the following question: given a set of expert demonstrations, which divergence can recover the expert policy more accurately and with higher data efficiency? In this work, we propose f-GAIL, a new generative adversarial imitation learning (GAIL) model that automatically learns a discrepancy measure from the f-divergence family as well as a policy capable of producing expert-like behaviors. Compared with IL baselines using various predefined divergence measures, f-GAIL learns better policies with higher data efficiency in six physics-based control tasks.
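The mechanism behind learning a divergence from the f-divergence family rests on the standard variational lower bound on f-divergences, sketched here in generic notation (the occupancy measures ρ_E, ρ_π and the test-function class T are our notation, not taken verbatim from the paper):

\[
D_f(\rho_E \,\|\, \rho_\pi) \;\ge\; \sup_{T \in \mathcal T}\;
  \mathbb{E}_{(s,a) \sim \rho_E}\!\left[ T(s,a) \right]
  - \mathbb{E}_{(s,a) \sim \rho_\pi}\!\left[ f^{*}\!\left( T(s,a) \right) \right],
\]

where f^* is the convex conjugate of the generator f and equality holds when the test-function class is sufficiently rich. In GAIL-style training the policy minimizes this bound while a discriminator maximizes over T; f-GAIL additionally searches over the generator f rather than fixing it in advance.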


aims to match the state-action distributions between the learner and the expert

Neural Information Processing Systems

We thank the reviewers for their comments. Please find our responses below, with reference indices consistent with the paper. Q3-5: What is the meaning of the learned divergence? We agree that BC minimizes the policy KL divergence, as noted in Sec. 4 (line 200). This is consistent with the literature, e.g., Sec. 2 in [Yu et al., arXiv:1909.09314].


Muscles in Time (MinT): Supplemental Material (Datasets and Benchmarks Track)

Neural Information Processing Systems

Currently the dataset can be downloaded via the provided link (2.2 GB, compressed tar file). The Muscles in Time dataset will be published under a CC BY-NC 4.0 license, and our data generation pipeline is licensed under Apache License Version 2.0. Data structure: the structure of the provided MinT data is intentionally kept simple. The first and last 0.14 seconds of each sequence are cut off, since the muscle activation … A short example of the musint package usage is displayed in Listing 2; the musint package can be installed via pip install musint. In Figure 9 we provide additional information on the analyzed data provided with Muscles in Time. Total Capture makes up a small part of the dataset, with exceptionally long sequences, while the … dataset provides the largest contribution, with 3.2 h of analyzed recordings.
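As a generic illustration of the 0.14-second trimming mentioned above (a NumPy sketch under an assumed sampling rate and array layout, not the musint package API):

import numpy as np

def trim_edges(activations: np.ndarray, sampling_rate_hz: float, trim_seconds: float = 0.14) -> np.ndarray:
    # Drop the first and last `trim_seconds` of a (time, muscles) activation array.
    n_trim = int(round(trim_seconds * sampling_rate_hz))
    return activations[n_trim:activations.shape[0] - n_trim]

# Hypothetical example: 10 s of activations at 50 Hz with 8 muscle channels.
signal = np.random.rand(500, 8)
trimmed = trim_edges(signal, sampling_rate_hz=50.0)   # removes 7 samples from each end
print(trimmed.shape)                                  # (486, 8)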