Plotting

is a generally well written paper, exploring a "potentially impressive" (R2), "useful" (R1, R3) and "novel" (R1, R2)

Neural Information Processing Systems

We thank the reviewers for their consideration of our paper and for their feedback. There appears to be one major concern and four minor questions/suggestions, which we kindly address below. The main concern of R2 seems to be that the paper relies on "disentanglement scores, which are computed based on The key motivation (but also assumption) of these works is that current notions of disentanglement (MIG, DCI, etc.) Until now there has been little empirical evidence verifying this. We present a heuristic to select fair representations. As described in Section 4.2, as a by-product of our investigation, R1-R2: Motivation for adjusted metrics in Section 4.2. We compute the adjusted metrics to answer the question "Given two representations with the same downstream R1: Chain of arguments in "How do we identify fair models?"


Structured Unrestricted-Rank Matrices for Parameter Efficient Fine-tuning

Neural Information Processing Systems

Recent efforts to scale Transformer models have been successful across a wide range of tasks [77]. However, fine-tuning these models for downstream tasks can be expensive, as it requires updating a large number of parameters in the Transformer model. Parameter-efficient fine-tuning (PEFT) approaches have emerged as a viable alternative that allows us to fine-tune models by updating only a small number of parameters. In this work, we propose a general framework for parameter-efficient fine-tuning using structured unrestricted-rank matrices (SURMs), which can serve as a drop-in replacement for popular approaches such as Adapters and LoRA. Unlike methods such as LoRA, SURMs provide more flexibility in finding the right balance between compactness and expressiveness. This is achieved by using low displacement rank matrices (LDRMs), which have not been used in this context before. SURMs remain competitive with baselines, often providing significant quality improvements while using a smaller parameter budget. SURMs achieve 5-7% accuracy gains on various image classification tasks while replacing low-rank matrices in LoRA. They also result in up to a 12x reduction in the number of parameters in adapters (with virtually no loss in quality) on the GLUE benchmark.
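The contrast between a low-rank update and a structured, unrestricted-rank update can be illustrated with a small sketch. It assumes a circulant matrix as the structured update, since circulant matrices are a standard example of a low displacement rank matrix; whether this matches the structured classes used in the paper is an assumption, and the names `circulant`, `circulant_matvec`, `n` and `r` are hypothetical.

```python
import numpy as np

def circulant(c):
    """Build the n x n circulant matrix whose first column is c."""
    n = len(c)
    # Entry (i, j) is c[(i - j) mod n].
    idx = (np.arange(n)[:, None] - np.arange(n)[None, :]) % n
    return c[idx]

def circulant_matvec(c, x):
    """Multiply the circulant matrix of c by x using the FFT (circular convolution)."""
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

n, r = 8, 2  # hypothetical layer width and low-rank budget
rng = np.random.default_rng(0)

# Structured update: n trainable parameters, rank not restricted by construction.
c = rng.standard_normal(n)
C = circulant(c)
params_circ = c.size

# LoRA-style update: 2 * n * r trainable parameters, rank at most r.
A = rng.standard_normal((n, r))
B = rng.standard_normal((r, n))
low_rank = A @ B
params_lr = A.size + B.size

rank_circ = np.linalg.matrix_rank(C)
rank_lr = np.linalg.matrix_rank(low_rank)

# The FFT route agrees with the dense product.
x = rng.standard_normal(n)
dense_out = C @ x
fft_out = circulant_matvec(c, x)
```

In this toy setting the circulant update uses fewer parameters than the rank-2 factorization while its rank is not capped at `r`, which is the kind of compactness versus expressiveness trade-off the abstract describes.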


An Accelerated Algorithm for Stochastic Bilevel Optimization under Unbounded Smoothness

Neural Information Processing Systems

This paper investigates a class of stochastic bilevel optimization problems where the upper-level function is nonconvex with potentially unbounded smoothness and the lower-level problem is strongly convex. These problems have significant applications in sequential data learning, such as text classification using recurrent neural networks. The unbounded smoothness is characterized by the smoothness constant of the upper-level function scaling linearly with the gradient norm, so that it has no uniform upper bound.
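As a rough guide to the setting, the problem class and the smoothness condition can be written as follows. This is a sketch under assumptions: the symbols $f$, $g$, $\Phi$, $L_0$, $L_1$ are not taken from the abstract, and the second line is the commonly used $(L_0, L_1)$-smoothness condition for a twice-differentiable function $F$; the paper's exact condition on the upper-level function may differ.

```latex
% Bilevel problem: nonconvex upper level f, strongly convex lower level g (in y)
\min_{x} \; \Phi(x) := f\bigl(x, y^{*}(x)\bigr),
\qquad y^{*}(x) = \arg\min_{y} \; g(x, y)

% Relaxed smoothness: the local smoothness constant grows linearly with the gradient norm
\|\nabla^{2} F(u)\| \;\le\; L_{0} + L_{1}\,\|\nabla F(u)\|
```

Setting $L_1 = 0$ recovers the usual bounded-smoothness assumption with constant $L_0$.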



1b33d16fc562464579b7199ca3114982-AuthorFeedback.pdf

Neural Information Processing Systems

We would like to thank all the reviewers for their effort and their thoughtful comments. Being formal, it should be "the gradient associated to the pullback of f along exp". We will change it to "on which standard convergence results still apply". Thm 4.3 We will change "is equivalent" to The same can be said about higher order methods. We chose not to mention them in the main paper for simplicity.


STEER: Simple Temporal Regularization For Neural ODEs

Arnab Ghosh, Harkirat Singh Behl, Emilien Dupont; University of Oxford

Neural Information Processing Systems

Training Neural Ordinary Differential Equations (ODEs) is often computationally expensive. Indeed, computing the forward pass of such models involves solving an ODE, which can become arbitrarily complex during training. Recent works have shown that regularizing the dynamics of the ODE can partially alleviate this. In this paper, we propose a new regularization technique: randomly sampling the end time of the ODE during training. The proposed regularization is simple to implement, has negligible overhead and is effective across a wide variety of tasks. Further, the technique is orthogonal to several other methods proposed to regularize the dynamics of ODEs and as such can be used in conjunction with them. We show through experiments on normalizing flows, time series models and image recognition that the proposed regularization can significantly decrease training time and even improve performance over baseline models.
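The idea of sampling the end time during training can be sketched in a few lines. The abstract only states that the end time is sampled randomly; the uniform distribution, the half-width `b`, the fixed-step Euler solver and all function names below are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def sample_end_time(t1, b, rng):
    """Draw a random end time uniformly from [t1 - b, t1 + b) (assumed distribution)."""
    return rng.uniform(t1 - b, t1 + b)

def euler_solve(f, z0, t0, t1, n_steps=100):
    """Integrate dz/dt = f(z, t) from t0 to t1 with fixed-step Euler."""
    z = np.asarray(z0, dtype=float)
    h = (t1 - t0) / n_steps
    t = t0
    for _ in range(n_steps):
        z = z + h * f(z, t)
        t += h
    return z

def forward(f, z0, t0=0.0, t1=1.0, b=0.5, training=True, rng=None):
    """Forward pass: random end time while training, the nominal end time otherwise."""
    if training:
        if rng is None:
            rng = np.random.default_rng()
        t_end = sample_end_time(t1, b, rng)
    else:
        t_end = t1
    return euler_solve(f, z0, t0, t_end)

# Toy dynamics dz/dt = -z, standing in for a learned network.
decay = lambda z, t: -z
rng = np.random.default_rng(0)
sampled_times = [sample_end_time(1.0, 0.5, rng) for _ in range(1000)]
train_out = forward(decay, 1.0, training=True, rng=rng)
eval_out = forward(decay, 1.0, training=False)
```

Keeping `b` smaller than `t1 - t0` ensures the sampled end time stays after the start time; at evaluation the nominal end time is used, so the only change is in the training-time forward pass.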


Successor Uncertainties: Exploration and Uncertainty in Temporal Difference Learning

Neural Information Processing Systems

Posterior sampling for reinforcement learning (PSRL) is an effective method for balancing exploration and exploitation in reinforcement learning. Randomised value functions (RVF) can be viewed as a promising approach to scaling PSRL. However, we show that most contemporary algorithms combining RVF with neural network function approximation do not possess the properties which make PSRL effective, and provably fail in sparse reward problems. Moreover, we find that propagation of uncertainty, a property of PSRL previously thought important for exploration, does not preclude this failure. We use these insights to design Successor Uncertainties (SU), a cheap and easy-to-implement RVF algorithm that retains key properties of PSRL. SU is highly effective on hard tabular exploration benchmarks. Furthermore, on the Atari 2600 domain, it surpasses human performance on 38 of 49 games tested (achieving a median human normalised score of 2.09), and outperforms its closest RVF competitor, Bootstrapped DQN, on 36 of those.


Two-way Deconfounder for Off-policy Evaluation in Causal Reinforcement Learning

Neural Information Processing Systems

Inspired by the two-way fixed effects regression model widely used in the panel data literature, we propose a two-way unmeasured confounding assumption to model the system dynamics in causal reinforcement learning and develop a two-way deconfounder algorithm that devises a neural tensor network to simultaneously learn both the unmeasured confounders and the system dynamics, based on which a model-based estimator can be constructed for consistent policy value estimation. We illustrate the effectiveness of the proposed estimator through theoretical results and numerical experiments.
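For context, the two-way fixed effects regression model from the panel data literature is commonly written as below, with a unit-specific effect and a time-specific effect entering additively. The notation is an assumption, and the form of the paper's two-way unmeasured confounding assumption is not given in this abstract.

```latex
% Two-way fixed effects regression for unit i at time t
Y_{it} = \alpha_{i} + \lambda_{t} + X_{it}^{\top}\beta + \varepsilon_{it}
% \alpha_i: unit-specific effect, \lambda_t: time-specific effect,
% X_{it}: observed covariates, \varepsilon_{it}: noise term
```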


Motion Forecasting in Continuous Driving

Nan Song, Li Zhang

Neural Information Processing Systems

Motion forecasting for agents in autonomous driving is highly challenging due to the numerous possibilities for each agent's next action and their complex interactions in space and time. In real applications, motion forecasting takes place repeatedly and continuously as the self-driving car moves. However, existing forecasting methods typically process each driving scene within a certain range independently, ignoring the situational and contextual relationships between successive driving scenes. This significantly simplifies the forecasting task, making the solutions suboptimal and inefficient to use in practice. To address this fundamental limitation, we propose a novel motion forecasting framework for continuous driving, named RealMotion. It comprises two integral streams, both at the scene level: (1) The scene context stream progressively accumulates historical scene information until the present moment, capturing temporal interactive relationships among scene elements.