auxiliary function
3b91129cf07287aac3de7b8adba2196f-Supplemental-Conference.pdf
As we choose $c_2 = 2$ and $c_3 = 0$, all the conditions in Theorem 2.2 are satisfied. Therefore, the exponential stability of the zero solution is assured. Applying Itô's formula to $\log x$ yields: $\log x - \log x(0) =$
Convergence Time for ES. From the conditions in Theorem 2.2 and the equivalent conditions (10), (11), the hyperparameter $b$ for ES can be substantially regarded as
A.3.5 More Discussion of AS/ES. Understanding AS Loss. We utilize the formula $\|x\|^2 \left(2\langle x, F(x)\rangle + \|G(x)\|_F^2\right) - (2-\alpha)\,\|x^\top G(x)\|^2$ $q(x)$ in (3) to construct the AS loss. Here we explain this term in more detail. In particular, we sample 5,000 points whose first 20 coordinates are drawn from $\mathcal{U}([0,5]^{20})$ and whose remaining 19 coordinates $(\theta_1, \dots, \theta_{19})$ are drawn from $\mathcal{U}([-5,5]^{19})$, and obtain the dynamics information from system (18) with (19) and (20).
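To make the AS-loss term concrete, here is a minimal Python sketch. It assumes the term reads ||x||^2 (2<x, F(x)> + ||G(x)||_F^2) - (2 - alpha) ||x^T G(x)||^2 and that the loss penalizes positive values with a hinge averaged over sampled points. Both the hinge and the toy system dx = a x dt + sigma x dB are illustrative assumptions, not the excerpt's system (18), and the role of q(x) is not modeled.

```python
import random

def as_term(x, F, G, alpha):
    """||x||^2 (2<x,F(x)> + ||G(x)||_F^2) - (2 - alpha) ||x^T G(x)||^2."""
    fx = F(x)                      # drift: list of length d
    gx = G(x)                      # diffusion: d x m nested list
    norm2 = sum(xi * xi for xi in x)
    inner = sum(xi * fi for xi, fi in zip(x, fx))
    frob2 = sum(g * g for row in gx for g in row)
    m = len(gx[0])
    xtg = [sum(x[i] * gx[i][j] for i in range(len(x))) for j in range(m)]
    xtg2 = sum(v * v for v in xtg)
    return norm2 * (2 * inner + frob2) - (2 - alpha) * xtg2

def as_loss(points, F, G, alpha):
    """Assumed hinge form: mean of max(0, term) over the sampled points."""
    return sum(max(0.0, as_term(x, F, G, alpha)) for x in points) / len(points)

# Hypothetical toy system (not from the source): F(x) = a x, G(x) = sigma x.
def F(x):
    return [1.0 * xi for xi in x]

def G_strong(x):                   # sigma = 3: noise strong enough to stabilize
    return [[3.0 * xi] for xi in x]

def G_weak(x):                     # sigma = 0.5: noise too weak
    return [[0.5 * xi] for xi in x]

random.seed(0)
points = [[random.uniform(-5, 5) for _ in range(3)] for _ in range(100)]
loss_strong = as_loss(points, F, G_strong, alpha=0.5)
loss_weak = as_loss(points, F, G_weak, alpha=0.5)
```

For this toy system the term reduces to ||x||^4 (2a - (1 - alpha) sigma^2), so the loss vanishes only when the noise intensity is large enough relative to the drift.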
Reverse engineering recurrent neural networks with Jacobian switching linear dynamical systems
Recurrent neural networks (RNNs) are powerful models for processing time-series data, but it remains challenging to understand how they function. Improving this understanding is of substantial interest to both the machine learning and neuroscience communities. The framework of reverse engineering a trained RNN by linearizing around its fixed points has provided insight, but the approach has significant challenges. These include difficulty choosing which fixed point to expand around when studying RNN dynamics and error accumulation when reconstructing the nonlinear dynamics with the linearized dynamics. We present a new model that overcomes these limitations by co-training an RNN with a novel switching linear dynamical system (SLDS) formulation. A first-order Taylor series expansion of the co-trained RNN and an auxiliary function trained to pick out the RNN's fixed points govern the SLDS dynamics. The results are a trained SLDS variant that closely approximates the RNN, an auxiliary function that can produce a fixed point for each point in state-space, and a trained nonlinear RNN whose dynamics have been regularized such that its first-order terms perform the computation, if possible. This model removes the post-training fixed point optimization and allows us to unambiguously study the learned dynamics of the SLDS at any point in state-space.
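The first-order expansion described in this abstract can be sketched in a few lines of Python. This is a toy under stated assumptions: the RNN is a hand-picked, input-free 2-unit tanh network, and `expansion_point` is a hypothetical stand-in for the learned auxiliary function (it simply iterates the RNN until it settles, which works here only because the toy map is a contraction).

```python
import math

# Hypothetical 2-unit RNN weights (illustrative, not from the paper).
W = [[0.5, -0.3], [0.2, 0.4]]
b = [0.1, -0.1]

def rnn_step(h):
    """Nonlinear update F(h) = tanh(W h + b)."""
    return [math.tanh(sum(W[i][j] * h[j] for j in range(2)) + b[i])
            for i in range(2)]

def jacobian(h):
    """dF/dh at h, i.e. diag(1 - tanh^2(W h + b)) W."""
    f = rnn_step(h)
    return [[(1 - f[i] ** 2) * W[i][j] for j in range(2)] for i in range(2)]

def expansion_point(h, iters=50):
    """Stand-in for the auxiliary function: approximate a fixed point near h."""
    e = list(h)
    for _ in range(iters):
        e = rnn_step(e)
    return e

def jslds_step(h):
    """First-order Taylor step around e: F(e) + J(e) (h - e)."""
    e = expansion_point(h)
    fe = rnn_step(e)
    J = jacobian(e)
    return [fe[i] + sum(J[i][j] * (h[j] - e[j]) for j in range(2))
            for i in range(2)]

h = [0.2, -0.1]
e = expansion_point(h)
linear_next = jslds_step(h)
nonlinear_next = rnn_step(h)
```

Because the expansion point depends on the current state, the linear system switches as the state moves, and near the fixed point the linearized step tracks the nonlinear one closely.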
LLM Collaboration With Multi-Agent Reinforcement Learning
Liu, Shuo, Chen, Tianle, Liang, Zeyu, Lyu, Xueguang, Amato, Christopher
A large amount of work has been done in Multi-Agent Systems (MAS) for modeling and solving problems with multiple interacting agents. However, most LLMs are pretrained independently and not specifically optimized for coordination. For example, existing LLM fine-tuning frameworks rely on individual rewards, which require complex reward designs for each agent to encourage collaboration. To address this challenge, we model LLM collaboration as a cooperative Multi-Agent Reinforcement Learning (MARL) problem. We develop a multi-agent, multi-turn algorithm, Multi-Agent Group Relative Policy Optimization (MAGRPO), to solve it, building on current RL approaches for LLMs as well as MARL techniques. Our experiments on LLM writing and coding collaboration demonstrate that fine-tuning multiple LLMs with MAGRPO enables agents to generate high-quality responses efficiently through effective cooperation. Our approach opens the door to using MARL methods for LLM collaboration and highlights the associated challenges.
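As a rough illustration of the group-relative idea in a cooperative setting, the sketch below computes a GRPO-style advantage from one shared reward per sampled joint response and hands the same advantage to every agent. The standard-deviation normalization, the agent names, and the reward values are assumptions for illustration; the abstract does not specify MAGRPO's exact estimator.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Center each reward on the group mean and scale by the group std."""
    mu = mean(rewards)
    sd = pstdev(rewards)
    return [(r - mu) / (sd + eps) for r in rewards]

# Hypothetical group: 4 sampled joint responses from 2 cooperating agents,
# each scored with a single shared (team) reward.
joint_rewards = [0.2, 0.8, 0.5, 0.5]
shared_adv = group_relative_advantages(joint_rewards)

# In a cooperative formulation every agent is credited with the same signal.
per_agent_adv = {agent: shared_adv for agent in ("agent_0", "agent_1")}
```

Using a shared reward avoids designing a separate reward per agent, which is the difficulty the abstract attributes to individual-reward fine-tuning.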
Why and How Auxiliary Tasks Improve JEPA Representations
Yu, Jiacan, Chen, Siyi, Liu, Mingrui, Horiuchi, Nono, Braverman, Vladimir, Xu, Zicheng, Haramati, Dan, Balestriero, Randall
Joint-Embedding Predictive Architecture (JEPA) is increasingly used for visual representation learning and as a component in model-based RL, but its behavior remains poorly understood. We provide a theoretical characterization of a simple, practical JEPA variant that has an auxiliary regression head trained jointly with latent dynamics. We prove a No Unhealthy Representation Collapse theorem: in deterministic MDPs, if training drives both the latent-transition consistency loss and the auxiliary regression loss to zero, then any pair of non-equivalent observations, i.e., those that do not have the same transition dynamics or auxiliary value, must map to distinct latent representations. Thus, the auxiliary task anchors which distinctions the representation must preserve. Controlled ablations in a counting environment corroborate the theory and show that training the JEPA model jointly with the auxiliary head generates a richer representation than training them separately. Our work indicates a path to improve JEPA encoders: training them with an auxiliary function that, together with the transition dynamics, encodes the right equivalence relations.
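The role of the auxiliary head can be illustrated with a toy check in Python. The sketch assumes a hypothetical 4-state counting environment with scalar latents and hand-picked (not trained) encoders: a collapsed encoder drives the latent-transition consistency loss to zero yet cannot fit the auxiliary target, while an injective encoder drives both losses to zero.

```python
STATES = [0, 1, 2, 3]

def step(s):
    """Deterministic counting dynamics, capped at 3 (assumed toy environment)."""
    return min(s + 1, 3)

def aux_value(s):
    """Auxiliary regression target: the count itself."""
    return float(s)

def transition_loss(enc, dyn):
    """Latent-transition consistency: dyn(enc(s)) should equal enc(step(s))."""
    return sum((dyn(enc(s)) - enc(step(s))) ** 2 for s in STATES)

def aux_loss(enc, head):
    """Auxiliary regression loss: head(enc(s)) should equal aux_value(s)."""
    return sum((head(enc(s)) - aux_value(s)) ** 2 for s in STATES)

# Collapsed encoder: every observation maps to the same latent.
def collapsed_enc(s):
    return 0.0

def identity_dyn(z):
    return z

def best_constant_head(z):
    return 1.5                     # mean of the targets: best any constant can do

collapsed_trans = transition_loss(collapsed_enc, identity_dyn)
collapsed_aux = aux_loss(collapsed_enc, best_constant_head)

# Injective encoder: distinct latents for non-equivalent observations.
def injective_enc(s):
    return float(s)

def counting_dyn(z):
    return min(z + 1.0, 3.0)

def identity_head(z):
    return z

injective_trans = transition_loss(injective_enc, counting_dyn)
injective_aux = aux_loss(injective_enc, identity_head)
```

The transition loss alone cannot tell the two encoders apart, which is why the auxiliary loss is what forces the distinctions to be preserved in this example.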