Routing Mamba: Scaling State Space Models with Mixture-of-Experts Projection
Recent advances, such as Mamba, further enhance SSMs with input-dependent gating and hardware-aware implementations, positioning them as strong alternatives to Transformers for long sequence modeling. However, efficiently scaling the expressive power of SSMs, particularly with Mixture of Experts (MoE), remains challenging, as naive integration attempts often falter or degrade performance. In this work, we introduce Routing Mamba (RoM), a novel approach that scales SSM parameters using sparse mixtures of linear projection experts.
Compositional Neural Network Verification via Assume-Guarantee Reasoning
Verifying the behavior of neural networks is necessary if developers are to confidently deploy them as parts of mission-critical systems. Toward this end, researchers have been actively developing a range of increasingly sophisticated and scalable neural network verifiers. However, scaling verification to large networks is challenging, at least in part due to the significant memory requirements of verification algorithms. In this paper, we propose an assume-guarantee compositional framework, CoVeNN, that is parameterized by an underlying verifier to generate a sequence of verification sub-problems to address this challenge. We present an iterative refinement-based strategy for computing assumptions that allow sub-problems to retain sufficient accuracy. An evaluation using 7 neural networks and a total of 140 property specifications demonstrates that CoVeNN can verify nearly 7 times more problems than state-of-the-art verifiers.
Fast and Fluent Diffusion Language Models via Convolutional Decoding and Rejective Fine-tuning
Autoregressive (AR) language models generate text one token at a time, which limits their inference speed. Diffusion-based language models offer a promising alternative, as they can decode multiple tokens in parallel. However, we identify a key bottleneck in current diffusion LMs: the \textbf{long decoding-window problem}, where tokens generated far from the input context often become irrelevant or repetitive. Previous solutions such as semi-autoregressive (semi-AR) decoding address this issue by splitting windows into blocks (sacrificing bidirectionality), but we find that this also leads to the \textbf{time-interval expansion problem}, which sacrifices speed. Semi-AR therefore eliminates the main advantages of diffusion models. To overcome this, we propose Convolutional decoding (\textit{Conv}), a normalization-based method that narrows the decoding window without hard segmentation, leading to better fluency and flexibility. Additionally, we introduce Rejecting Rule-based Fine-Tuning (R2FT), a post-hoc training scheme that better aligns tokens at positions far from the context. Our methods achieve state-of-the-art results on open-ended generation benchmarks (e.g., AlpacaEval) among diffusion LM baselines, with a significantly lower step size than previous work, demonstrating improvements in both speed and quality. The code is available online (\url{https://github.com/ybseo-ac/Conv}).
Supplementary information for: Natural image synthesis for the retina with variational information bottleneck representation
To obtain a bound on the Information Bottleneck Gaussian Process (IB-GP) objective, we use the Markov chain constraint $Y \leftrightarrow X \leftrightarrow Z$ and the factorized joint distribution [2]:
$$p(X,Y,Z) = p(Y \mid X,Z)\,p(Z \mid X)\,p(X) = p(Y \mid X)\,p(Z \mid X)\,p(X) \tag{1}$$
to expand the mutual information terms in $L_{IB} = \max I(Z,Y) - \beta I(Z,X)$. Henceforth, we use the stochastic encoder $p_\phi(Z \mid X)$, parameterized by $\phi$, as an approximation for $p(Z \mid X)$. In practice, computation of $H(Z)$ might be intractable (even though $p(Z)$ is well defined). Therefore, a variational approximation $\rho(Z)$ is used in place of $p(Z)$ such that $\mathrm{KL}(p(Z), \rho(Z))$ is minimal. In practice, computation of $p(Y,Z)$ and $p(Y \mid Z)$ might be intractable (even though they are well defined).
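The role of the variational marginal ρ(Z) can be checked numerically on a small discrete example. The sketch below is illustrative only and is not the IB-GP implementation: the toy distributions `p_x`, `p_z_given_x`, and `rho_z` are hypothetical values chosen for the demonstration. It relies on the standard identity that replacing p(Z) with ρ(Z) turns I(Z,X) into an upper bound whose slack is exactly KL(p(Z), ρ(Z)), which is why that divergence should be kept small.

```python
import math


def kl(p, q):
    """KL(p || q) in nats for two discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical toy setup: X takes 2 values, Z takes 3 values.
p_x = [0.4, 0.6]
p_z_given_x = [
    [0.7, 0.2, 0.1],  # p(z | x = 0)
    [0.1, 0.3, 0.6],  # p(z | x = 1)
]

# True marginal p(z) = sum_x p(z | x) p(x).
p_z = [
    sum(p_x[i] * p_z_given_x[i][k] for i in range(len(p_x)))
    for k in range(3)
]

# Variational marginal rho(z); a uniform distribution is an arbitrary choice.
rho_z = [1 / 3, 1 / 3, 1 / 3]

# Mutual information I(Z,X) = E_x[ KL(p(z|x) || p(z)) ].
mi = sum(p_x[i] * kl(p_z_given_x[i], p_z) for i in range(len(p_x)))

# Variational upper bound E_x[ KL(p(z|x) || rho(z)) ].
upper = sum(p_x[i] * kl(p_z_given_x[i], rho_z) for i in range(len(p_x)))

# The slack of the bound equals KL(p(z) || rho(z)).
gap = kl(p_z, rho_z)

print(f"I(Z,X) = {mi:.6f}, bound = {upper:.6f}, gap = {gap:.6f}")
```

Choosing `rho_z` equal to `p_z` drives the gap to zero, which recovers the exact mutual information.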