bidirectional
GM Wants Your Electric Car to Power Your House--and Your Neighborhood
The automaker today is turning on vehicle-to-grid charging for its GM Energy customers. Will people actually use it? Some 250,000 electric vehicles manufactured by General Motors are driving around the US today--right now!--with an oft-secret capability: Their big, powerful batteries can charge other things. Potentially appliances, homes, and now, thanks to a software update pushed by the automaker this week, an electrical grid . Twelve of GM's EVs have this "bidirectional charging" capability, way more than US competitors' battery-electrics.
Bounds all around: training energy-based models with bidirectional bounds
Energy-based models (EBMs) provide an elegant framework for density estimation, but they are notoriously difficult to train. Recent work has established links to generative adversarial networks, where the EBM is trained through a minimax game with a variational value function. We propose a bidirectional bound on the EBM log-likelihood, such that we maximize a lower bound and minimize an upper bound when solving the minimax game. We link one bound to a gradient penalty that stabilizes training, thereby provide grounding for best engineering practice. To evaluate the bounds we develop a new and efficient estimator of the Jacobi-determinant of the EBM generator. We demonstrate that these developments stabilize training and yield high-quality density estimation and sample generation.
Simple Context Compression: Mean-Pooling and Multi-Ratio Training
A common strategy to reduce the computational costs of using long contexts in retrieval-augmented generation (RAG) with large language models (LLMs) is soft context compression, where the input sequence is transformed into a shorter continuous representation. We develop a lightweight and simple mean-pooling approach that consistently outperforms the widely used compression-tokens architecture, and study training the same compressor to output multiple compression ratios. We conduct extensive experiments across in-domain and out-of-domain QA datasets, as well as across model families, scales, and compression ratios. Overall, our simple mean-pooling approach achieves the strongest performance, with a relatively small drop when training for multiple compression ratios. More broadly though, across architectures and training regimes the trade-offs are more nuanced, illustrating the complex landscape of compression methods.
Bounds all around training energy based models with bidirectional bounds Supplementary Material
A.1 Proof of Theorem 1 Proof log null E The first inequality is derived by Holder's inequality, so Existence is ensured as long as the chosen activation functions have at least one derivative almost everywhere. Smooth activations naturally satisfy this assumption, but it is worth noting that e.g. the ReLU activation We cannot guarantee that the Jacobian has full rank through clever choices of neural architectures. This is a natural requirement for the generator anyway. In our model, we aim to maximize the entropy of the generator, which encourages the generator to create as diverse samples as possible. In practice this ensures that the Jacobian has full rank as a degenerate Jacobian implies a reduction of entropy.
Fractal Language Modelling by Universal Sequence Maps (USM)
Almeida, Jonas S, Russ, Daniel E, Vinga, Susana, Duarte, Ines, Mason, Lee, Bhawsar, Praphulla, Ge, Aaron, Oliveira, Arlindo, Balasubramanian, Jeya Balaji
Motivation: With the advent of Language Models using Transformers, popularized by ChatGPT, there is a renewed interest in exploring encoding procedures that numerically represent symbolic sequences at multiple scales and embedding dimensions. The challenge that encoding addresses is the need for mechanisms that uniquely retain contextual information about the succession of individual symbols, which can then be modeled by nonlinear formulations such as neural networks. Context: Universal Sequence Maps(USM) are iterated functions that bijectively encode symbolic sequences onto embedded numerical spaces. USM is composed of two Chaos Game Representations (CGR), iterated forwardly and backwardly, that can be projected into the frequency domain (FCGR). The corresponding USM coordinates can be used to compute a Chebyshev distance metric as well as k-mer frequencies, without having to recompute the embedded numeric coordinates, and, paradoxically, allowing for non-integers values of k. Results: This report advances the bijective fractal encoding by Universal Sequence Maps (USM) by resolving seeding biases affecting the iterated process. The resolution had two results, the first expected, the second an intriguing outcome: 1) full reconciliation of numeric positioning with sequence identity; and 2) uncovering the nature of USM as an efficient numeric process converging towards a steady state sequence embedding solution. We illustrate these results for genomic sequences because of the convenience of a planar representation defined by an alphabet with only 4 tokens (the 4 nucleotides). Nevertheless, the application to alphabet of arbitrary cardinality was found to be straightforward.
Accurate, fast, cheap: Choose three. Replacing Multi-Head-Attention with Bidirectional Recurrent Attention for Long-Form ASR
Ratajczak, Martin, Robichaud, Jean-Philippe, Fox, Jennifer Drexler
Long-form speech recognition is an application area of increasing research focus. ASR models based on multi-head attention (MHA) are ill-suited to long-form ASR because of their quadratic complexity in sequence length. We build on recent work that has investigated linear complexity recurrent attention (RA) layers for ASR. We find that bidirectional RA layers can match the accuracy of MHA for both short- and long-form applications. We present a strong limited-context attention (LCA) baseline, and show that RA layers are just as accurate while being more efficient. We develop a long-form training paradigm which further improves RA performance, leading to better accuracy than LCA with 44% higher throughput. We also present Direction Dropout, a novel regularization method that improves accuracy, provides fine-grained control of the accuracy/throughput trade-off of bidirectional RA, and enables a new alternating directions decoding mode with even higher throughput.
An Exploration of Mamba for Speech Self-Supervised Models
Lin, Tzu-Quan, Kuo, Heng-Cheng, Wei, Tzu-Chieh, Cheng, Hsi-Chun, Chen, Chun-Wei, Hsiao, Hsien-Fu, Tsao, Yu, Lee, Hung-yi
--While Mamba has demonstrated strong performance in language modeling, its potential as a speech self-supervised (SSL) model remains underexplored, with prior studies limited to isolated tasks. T o address this, we explore Mamba-based HuBERT models as alternatives to Transformer-based SSL architectures. Leveraging the linear-time Selective State Space, these models enable fine-tuning on long-context ASR with significantly lower compute. Moreover, they show superior performance when fine-tuned for streaming ASR. Beyond fine-tuning, these models show competitive performance on SUPERB probing benchmarks, particularly in causal settings. Our analysis shows that they yield higher-quality quantized representations and capture speaker-related features more distinctly than Transformer-based models. In recent years, Transformer-based models and their multi-head self-attention mechanisms have achieved remarkable success across various domains [1]-[3].
Exploring bidirectional bounds for minimax-training of Energy-based models
Geng, Cong, Wang, Jia, Chen, Li, Gao, Zhiyong, Frellsen, Jes, Hauberg, Sรธren
Energy-based models (EBMs) estimate unnormalized densities in an elegant framework, but they are generally difficult to train. Recent work has linked EBMs to generative adversarial networks, by noting that they can be trained through a minimax game using a variational lower bound. To avoid the instabilities caused by minimizing a lower bound, we propose to instead work with bidirectional bounds, meaning that we maximize a lower bound and minimize an upper bound when training the EBM. We investigate four different bounds on the log-likelihood derived from different perspectives. We derive lower bounds based on the singular values of the generator Jacobian and on mutual information. To upper bound the negative log-likelihood, we consider a gradient penalty-like bound, as well as one based on diffusion processes. In all cases, we provide algorithms for evaluating the bounds. We compare the different bounds to investigate, the pros and cons of the different approaches. Finally, we demonstrate that the use of bidirectional bounds stabilizes EBM training and yields high-quality density estimation and sample generation.
Enhanced SIRRT*: A Structure-Aware RRT* for 2D Path Planning with Hybrid Smoothing and Bidirectional Rewiring
-- Sampling-based motion planners such as Rapidly-exploring Random Tree* (RRT*) and its informed variant IRRT* are widely used for optimal path planning in complex environments. However, these methods often suffer from slow convergence and high variance due to their reliance on random sampling, particularly when initial solution discovery is delayed. This paper presents Enhanced SIRRT* (E-SIRRT*), a structure-aware planner that improves upon the original SIRRT* framework by introducing two key enhancements: hybrid path smoothing and bidirectional rewiring. Hybrid path smoothing refines the initial path through spline fitting and collision-aware correction, while bidirectional rewiring locally optimizes tree connectivity around the smoothed path to improve cost propagation. Experimental results demonstrate that E-SIRRT* consistently outperforms IRRT* and SIRRT* in terms of initial path quality, convergence rate, and robustness across 100 trials. Unlike IRRT*, which exhibits high variability due to stochastic initialization, E-SIRRT* achieves repeatable and efficient performance through deterministic skeleton-based initialization and structural refinement.