Credit Assignment Through Broadcasting a Global Error Vector

Neural Information Processing Systems

Backpropagation (BP) uses detailed, unit-specific feedback to train deep neural networks (DNNs) with remarkable success. Biological neural circuits appear to perform credit assignment yet cannot implement BP, which implies the existence of other powerful learning algorithms. Here, we explore the extent to which a globally broadcast learning signal, coupled with local weight updates, enables the training of DNNs. We present both a learning rule, called global error-vector broadcasting (GEVB), and a class of DNNs, called vectorized nonnegative networks (VNNs), in which this learning rule operates. VNNs have vector-valued units and nonnegative weights past the first layer. The GEVB learning rule generalizes three-factor Hebbian learning, updating each weight by an amount proportional to the inner product of the presynaptic activation and a globally broadcast error vector whenever the postsynaptic unit is active. We prove that these weight updates are matched in sign to the gradient, enabling accurate credit assignment. Moreover, at initialization, these updates are exactly proportional to the gradient in the limit of infinite network width. GEVB matches the performance of BP in VNNs and in some cases outperforms direct feedback alignment (DFA) applied in conventional networks.
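As a concrete reading of the update rule, here is a minimal NumPy sketch of the three-factor GEVB update as the abstract describes it; the function name, array shapes, and sign convention are our assumptions, not the authors' code.

```python
import numpy as np

def gevb_update(pre_acts, post_active, error_vec, lr=1e-2):
    """One GEVB-style update for a single layer of a vectorized network.

    pre_acts:    (n_pre, d) vector-valued presynaptic activations
    post_active: (n_post,) boolean gate: which postsynaptic units are active
    error_vec:   (d,) globally broadcast error vector
    Returns an (n_post, n_pre) array of weight updates.
    """
    # Local factor: inner product of each presynaptic vector with the
    # global error vector (identical for every postsynaptic unit).
    local_term = pre_acts @ error_vec          # shape (n_pre,)
    # Three-factor rule: update only where the postsynaptic unit is active.
    # The overall sign depends on how the error vector is defined.
    return lr * np.outer(post_active.astype(float), local_term)
```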


SymILO: A Symmetry-Aware Learning Framework for Integer Linear Optimization

Neural Information Processing Systems

Integer linear programs (ILPs) are commonly employed to model diverse practical problems such as scheduling and planning. Recently, machine learning techniques have been utilized to solve ILPs. A straightforward idea is to train a model via supervised learning, with an ILP as the input and an optimal solution as the label. An ILP is symmetric if its variables can be permuted without changing the problem structure, resulting in numerous equivalent optimal solutions. Randomly selecting one such optimal solution as the label can introduce variability in the training data, which may hinder the model from learning stable patterns. In this work, we incorporate the intrinsic symmetry of ILPs and propose a novel training framework called SymILO. Specifically, we modify the learning task by introducing a solution permutation, alongside the neural network weights, as learnable parameters, and we design an alternating algorithm to jointly optimize the resulting loss. We conduct extensive experiments on ILPs involving different symmetries, and the computational results demonstrate that our symmetry-aware approach significantly outperforms three existing methods, achieving 50.3%, 66.5%, and 45.4% average improvements, respectively.
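One way to picture the alternating scheme: with the network fixed, pick the symmetry-equivalent relabeling of the optimal solution closest to the current prediction, then take a supervised step against that relabeled target. The sketch below assumes the simplest symmetry group (all variables interchangeable); SymILO itself restricts the search to the ILP's actual symmetry group.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def best_permutation(label, pred):
    """Permute `label` to be closest (in squared error) to `pred`,
    assuming the full symmetric group acts on the variables."""
    cost = (label[:, None] - pred[None, :]) ** 2   # cost[i, j] = (x_i - p_j)^2
    rows, cols = linear_sum_assignment(cost)       # optimal matching
    permuted = np.empty_like(label)
    permuted[cols] = label[rows]
    return permuted

# One alternating round (pseudocode for the training step):
#   pred  = model(ilp_features)
#   label = best_permutation(label, pred)          # update the permutation
#   loss  = ((pred - label) ** 2).mean()           # then update the weights
```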


Transcoders Find Interpretable LLM Feature Circuits

Neural Information Processing Systems

A key goal in mechanistic interpretability is circuit analysis: finding sparse subgraphs of models corresponding to specific behaviors or capabilities. However, MLP sublayers make fine-grained circuit analysis on transformer-based language models difficult. In particular, interpretable features, such as those found by sparse autoencoders (SAEs), are typically linear combinations of a very large number of neurons, each with its own nonlinearity to account for. Circuit analysis in this setting thus either yields intractably large circuits or fails to disentangle local and global behavior. To address this, we explore transcoders, which seek to faithfully approximate a densely activating MLP layer with a wider, sparsely activating MLP layer. We introduce a novel method for using transcoders to perform weights-based circuit analysis through MLP sublayers.
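The following PyTorch sketch shows the general shape of a transcoder as described here: a wider, sparsely activating MLP trained to imitate the original MLP sublayer's input-output map. The class name and the L1 sparsity penalty are our assumptions about one plausible training setup.

```python
import torch
import torch.nn as nn

class Transcoder(nn.Module):
    """A wide, sparsely activating stand-in for a dense MLP sublayer,
    trained to map the MLP's input to the MLP's output."""
    def __init__(self, d_model, d_features):
        super().__init__()
        self.enc = nn.Linear(d_model, d_features)   # d_features >> d_model
        self.dec = nn.Linear(d_features, d_model)

    def forward(self, x):
        f = torch.relu(self.enc(x))                 # sparse feature activations
        return self.dec(f), f

# Training objective (sketch): match the original MLP's output while
# keeping feature activations sparse.
#   out_hat, f = transcoder(mlp_input)
#   loss = (out_hat - mlp_output).pow(2).mean() + l1_coef * f.abs().mean()
```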


Understanding the Detrimental Class-level Effects of Data Augmentation: Supplementary Material

Neural Information Processing Systems

Following [1], we train ResNet-50 models for 88 epochs with SGD with momentum 0.9, using batch [...]. We use PyTorch [45] with automatic mixed precision training via torch.amp. We focus on analyzing the model's behavior on the classes that were negatively [...]. Raghunathan et al. [47] showed that standard error in linear regression could increase when [...], while Miller et al. [41] showed [...]. In Table 1 we show the classes most negatively affected in accuracy by strong data augmentation (column "Affected class k") and the confusions the model starts making more frequently with stronger augmentation ("Confused class l"). In particular, we study the union of the 50 classes most affected in original accuracy and the 50 classes most affected in ReaL accuracy (see Section D) that do not belong to the animal subtree of the WordNet tree. To quantitatively estimate the confusion type for each pair of classes, we measure the intrinsic distribution overlap of the classes and their semantic similarity. In this work we use the ReaL labels released by Beyer et al. [4] to account for label noise in evaluation [...]. However, one could merge the two multi-label annotation sets from Beyer et al. [4] and [...]: "Siberian huskies are also Eskimo dogs", "a coffee mug is also [...]", "a cassette player is also a tape player" [64].
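A table like Table 1 can be reproduced from two confusion matrices, one per augmentation strength; the helper below is a sketch of that computation under the assumption that rows are normalized to P(predicted l | true class k).

```python
import numpy as np

def rising_confusions(conf_weak, conf_strong, top=10):
    """Class pairs (k, l) whose confusion rate grows the most when moving
    from weak to strong data augmentation."""
    delta = conf_strong.astype(float) - conf_weak.astype(float)
    np.fill_diagonal(delta, -np.inf)        # ignore correct predictions
    flat = np.argsort(delta, axis=None)[::-1][:top]
    return [np.unravel_index(i, delta.shape) for i in flat]
```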


Understanding the Detrimental Class-level Effects of Data Augmentation
Mark Ibrahim, Randall Balestriero, Diane Bouchacourt

Neural Information Processing Systems

Data augmentation (DA) encodes invariance and provides implicit regularization critical to a model's performance in image classification tasks. However, while DA improves average accuracy, recent studies have shown that its impact can be highly class-dependent: achieving optimal average accuracy comes at the cost of significantly hurting individual class accuracy by as much as 20% on ImageNet. There has been little progress in resolving class-level accuracy drops due to a limited understanding of these effects. In this work, we present a framework for understanding how DA interacts with class-level learning dynamics. Using higher-quality multi-label annotations on ImageNet, we systematically categorize the affected classes and find that the majority are inherently ambiguous, co-occur, or involve fine-grained distinctions, while DA controls the model's bias towards one of the closely related classes. While many of the previously reported performance drops are explained by multi-label annotations, our analysis of class confusions reveals other sources of accuracy degradation. We show that simple class-conditional augmentation strategies informed by our framework improve performance on the negatively affected classes.
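As one simple instantiation of the class-conditional strategy mentioned in the last sentence (the paper's exact policy may differ), augmentation strength can be routed per class using the list of negatively affected classes identified by such an analysis:

```python
def class_conditional_augment(image, label, affected_classes,
                              strong_aug, weak_aug):
    """Apply weaker augmentation to classes that strong DA is known to hurt.

    affected_classes: set of class ids flagged by the confusion analysis.
    strong_aug, weak_aug: callables implementing the two augmentation policies.
    """
    return weak_aug(image) if label in affected_classes else strong_aug(image)
```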


Differentiable Quality Diversity

Neural Information Processing Systems

Quality diversity (QD) is a growing branch of stochastic optimization research that studies the problem of generating an archive of solutions that maximize a given objective function but are also diverse with respect to a set of specified measure functions. However, even when these functions are differentiable, QD algorithms treat them as "black boxes", ignoring gradient information. We present the differentiable quality diversity (DQD) problem, a special case of QD where both the objective and measure functions are first-order differentiable. We then present MAP-Elites via a Gradient Arborescence (MEGA), a DQD algorithm that leverages gradient information to efficiently explore the joint range of the objective and measure functions. Results in two QD benchmark domains and in searching the latent space of a StyleGAN show that MEGA significantly outperforms state-of-the-art QD algorithms, highlighting DQD's promise for efficient quality diversity optimization when gradient information is available. Source code is available at https://github.com/icaros-usc/dqd.
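The core of a gradient-arborescence step is branching the search point along random combinations of the objective and measure gradients, then offering every branch to the archive. The sketch below loosely follows the OMG-MEGA variant; archive_insert is a stand-in for archive evaluation and insertion.

```python
import numpy as np

def mega_step(theta, grad_f, grad_ms, archive_insert, sigma=0.1, batch=16):
    """One branching step: perturb theta along random coefficients of the
    objective gradient (grad_f) and measure gradients (grad_ms)."""
    grads = np.vstack([grad_f] + list(grad_ms))   # (1 + n_measures, n_params)
    for _ in range(batch):
        c = sigma * np.random.randn(len(grads))
        c[0] = abs(c[0])                          # never step against the objective
        candidate = theta + c @ grads
        archive_insert(candidate)                 # evaluate; keep if it improves its cell
```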


SPECTRL

Neural Information Processing Systems

[...] for all j ∈ {0, ..., k}, we have s [...]. From Lemma A.2, we get that ζ [...]. From Lemma A.2 we conclude that ζ [...]. These definitions are a standard extension of Boolean logic to real values. In particular, they preserve (1), i.e., b |= s if and only if [...]. This stronger property follows from a straightforward induction on φ.
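The "standard extension of Boolean logic to real values" referred to above is typically a min/max quantitative semantics. The following is our reconstruction of the usual definitions, with q denoting the quantitative valuation; the paper's exact definitions may differ in details.

```latex
\begin{align*}
  q(b_1 \wedge b_2, s) &= \min\{\, q(b_1, s),\; q(b_2, s) \,\} \\
  q(b_1 \vee b_2, s)   &= \max\{\, q(b_1, s),\; q(b_2, s) \,\} \\
  q(\neg b, s)         &= -\, q(b, s) \\
  s \models b \;&\iff\; q(b, s) > 0
\end{align*}
```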


Compositional Reinforcement Learning from Logical Specifications
Kishor Jothimurugan, Suguman Bansal, Osbert Bastani (University of Pennsylvania)

Neural Information Processing Systems

We study the problem of learning control policies for complex tasks given by logical specifications. Recent approaches automatically generate a reward function from a given specification and use a suitable reinforcement learning algorithm to learn a policy that maximizes the expected reward. These approaches, however, scale poorly to complex tasks that require high-level planning.
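To make "generate a reward function from a specification" concrete, here is a toy shaped reward for a reach-while-avoiding specification; it illustrates the flavor of automatically derived rewards, not any particular method's construction.

```python
def reward_from_spec(state, goal_dist, obstacle_dist):
    """Toy reward for 'reach the goal while avoiding obstacles'.

    goal_dist, obstacle_dist: callables returning signed distances
    (<= 0 means the goal is reached / an obstacle is hit).
    """
    if obstacle_dist(state) <= 0.0:      # specification violated
        return -1.0
    if goal_dist(state) <= 0.0:          # specification satisfied
        return 1.0
    return -0.01 * goal_dist(state)      # dense shaping toward the goal
```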



Structured Matrix Basis for Multivariate Time Series Forecasting with Interpretable Dynamics
Xiaodan Chen

Neural Information Processing Systems

Multivariate time series forecasting is of central importance in modern intelligent decision systems. The dynamics of multivariate time series are jointly characterized by temporal dependencies and spatial correlations, so it is equally important to build forecasting models from both perspectives. Real-world multivariate time series often exhibit spatial correlations that are structured and evolve dynamically. To capture such dynamic spatial structures, existing forecasting approaches often rely on a two-stage learning process (learning dynamic series representations and then generating spatial structures), which is sensitive to small time-window inputs and has high variance. To address this, we propose a novel forecasting model built on a structured matrix basis. At its core is a dynamic spatial structure generation function whose output space is well constrained, so the generated structures have lower variance; at the same time, the function is more expressive and offers interpretable dynamics. This is achieved via a novel structured parameterization and by imposing structure regularization on the matrix basis. The resulting forecasting model achieves up to 8.5% improvement over existing methods on six benchmark datasets, while also enabling us to gain insight into the dynamics of the underlying systems.
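The general idea of a structured matrix basis can be sketched as follows: the dynamic spatial structure is a convex combination of a few learnable basis matrices, with data-dependent coefficients, so the output space is constrained by construction. This is our illustration of the idea, not the paper's exact parameterization or regularizer.

```python
import torch
import torch.nn as nn

class MatrixBasisGraph(nn.Module):
    """Dynamic spatial structure A_t = sum_k c_k(x_t) * B_k with softmax
    coefficients, so every generated structure lies in the convex hull
    of the learned basis."""
    def __init__(self, n_nodes, n_basis, window):
        super().__init__()
        self.basis = nn.Parameter(0.01 * torch.randn(n_basis, n_nodes, n_nodes))
        self.coef_net = nn.Linear(n_nodes * window, n_basis)

    def forward(self, x):                    # x: (batch, window, n_nodes)
        c = torch.softmax(self.coef_net(x.flatten(1)), dim=-1)   # (batch, n_basis)
        return torch.einsum('bk,kij->bij', c, self.basis)        # (batch, n, n)
```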