
we address some of the questions raised by the reviewers as much as time and space allows

Neural Information Processing Systems

First, we thank all the reviewers for their invaluable assessment of our paper in this challenging time. To provide more reliable evidence that AdvFlow's distributional properties fool the detectors, and for the sake of completeness, we also add LID [31]; the results are given in Table 1. This indicates that the attacker's distributional properties are fooling the detectors. As seen, we get results similar to Table 2 of the paper, outperforming SimBA against defended baselines. Note that some of the current SOTA results in black-box adversarial attacks come from the attacker's knowledge about the target's training procedure. However, once the target changes its training procedure (e.g., from vanilla training), this advantage is lost; see the official repository of SimBA, where this is clearly indicated. The results of Tables 1 and 2 (as well as SVHN) will be added to the camera-ready version.



Supplementary Materials of Random Noise Defense against Query-Based Black-Box Attacks

Neural Information Processing Systems

In Section A, we discuss the societal impacts of our work. In Section B, we provide detailed experimental settings as well as further evaluation results on CIFAR-10 and ImageNet. For real-world applications, the DNN model as well as the training dataset are often hidden from users. Extensive experiments verify our theoretical analysis and show the effectiveness of our defense methods against several state-of-the-art query-based attacks. On ImageNet, [23] released the ResNet-50 model fine-tuned with Gaussian noise sampled from N(0, 0.5I), and we directly adopt it. The experimental results on ImageNet are shown in Figure 3 (a-d).
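The defense this entry describes boils down to randomizing every query before it reaches the model, so a query-based attacker's finite-difference estimates become noisy. A minimal sketch in pure Python; the interface, the noise level, and the toy model are illustrative assumptions, not the paper's code:

```python
import random

def rnd_defend(model, x, sigma=0.05, rng=None):
    # Random-noise defense (illustrative sketch): add zero-mean Gaussian
    # noise to each incoming query before the model sees it. Repeated
    # queries at the same point then return different outputs, degrading
    # gradient estimation in query-based black-box attacks.
    rng = rng or random.Random()
    return model([xi + rng.gauss(0.0, sigma) for xi in x])

def toy_model(x):
    # Stand-in for a classifier's scalar logit.
    return sum(x)
```

With `sigma=0` the defense is a no-op; with `sigma>0` two identical queries generally yield different outputs, which is the property the attacker's estimator relies on not holding.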


Stable-by-Design Neural Network-Based LPV State-Space Models for System Identification

Sertbaş, Ahmet Eren, Kumbasar, Tufan

arXiv.org Artificial Intelligence

Accurate modeling of nonlinear systems is essential for reliable control, yet conventional identification methods often struggle to capture latent dynamics while maintaining stability. We propose a stable-by-design LPV neural network-based state-space (NN-SS) model that simultaneously learns latent states and internal scheduling variables directly from data. The state-transition matrix, generated by a neural network using the learned scheduling variables, is guaranteed to be stable through a Schur-based parameterization. The architecture combines an encoder for initial state estimation with a state-space representer network that constructs the full set of scheduling-dependent system matrices. For training the NN-SS, we develop a framework that integrates multi-step prediction losses with a state-consistency regularization term, ensuring robustness against drift and improving long-horizon prediction accuracy. The proposed NN-SS is evaluated on benchmark nonlinear systems, and the results demonstrate that the model consistently matches or surpasses classical subspace identification methods and recent gradient-based approaches. These findings highlight the potential of stability-constrained neural LPV identification as a scalable and reliable framework for modeling complex nonlinear systems.
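The key idea above is a parameterization under which any network output yields a Schur-stable transition matrix. One simple sufficient construction is to rescale the raw matrix so an induced norm stays below one, which bounds the spectral radius below one. A hedged sketch; the paper's actual Schur-based parameterization may differ:

```python
def stable_A(W):
    # Map an unconstrained matrix W (e.g. a neural net's raw output) to a
    # Schur-stable transition matrix A. Rescaling so the infinity norm
    # ||A||_inf < 1 bounds the spectral radius below 1, so the recursion
    # x_{k+1} = A x_k cannot diverge, whatever W the network emits.
    inf_norm = max(sum(abs(a) for a in row) for row in W)
    scale = 1.0 + inf_norm
    return [[a / scale for a in row] for row in W]
```

Because stability holds for every possible W, the training loop needs no projection step or stability penalty: gradient descent can move W freely.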


SimBA: Simplifying Benchmark Analysis Using Performance Matrices Alone

Subramani, Nishant, Gomez, Alfredo, Diab, Mona

arXiv.org Artificial Intelligence

Modern language models are evaluated on large benchmarks, which are difficult to make sense of, especially for model selection. Looking at the raw evaluation numbers themselves through a model-centric lens, we propose SimBA, a three-phase framework to Simplify Benchmark Analysis. The three phases of SimBA are: stalk, where we conduct dataset & model comparisons, prowl, where we discover a representative subset, and pounce, where we use the representative subset to predict performance on a held-out set of models. Applying SimBA to three popular LM benchmarks: HELM, MMLU, and BigBenchLite reveals that across all three benchmarks, datasets and models relate strongly to one another (stalk). We develop a representative set discovery algorithm which covers a benchmark using raw evaluation scores alone. Using our algorithm, we find that with 6.25% (1/16), 1.7% (1/58), and 28.4% (21/74) of the datasets for HELM, MMLU, and BigBenchLite respectively, we achieve coverage levels of at least 95% (prowl). Additionally, using just these representative subsets, we can both preserve model ranks and predict performance on a held-out set of models with near zero mean-squared error (pounce). Taken together, SimBA can help model developers improve efficiency during model training and dataset creators validate whether their newly created dataset differs from existing datasets in a benchmark. Our code is open source, available at https://github.com/nishantsubramani/simba.
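The prowl phase can be pictured as greedy forward selection over the columns of a models-by-datasets performance matrix. An illustrative sketch; the criterion here (tracking each model's full-benchmark mean) is an assumption for exposition, not necessarily SimBA's exact coverage objective:

```python
def greedy_representative(scores, k):
    # scores[m][d] is model m's score on dataset d. Greedily grow a subset
    # of dataset columns whose per-model subset mean best tracks the
    # full-benchmark mean (worst case over models).
    n_data = len(scores[0])
    full_mean = [sum(row) / n_data for row in scores]
    chosen = []
    for _ in range(k):
        best, best_err = None, float("inf")
        for d in range(n_data):
            if d in chosen:
                continue
            cand = chosen + [d]
            err = max(abs(sum(row[j] for j in cand) / len(cand) - fm)
                      for row, fm in zip(scores, full_mean))
            if err < best_err:
                best, best_err = d, err
        chosen.append(best)
    return chosen
```

The returned subset can then serve the pounce phase: evaluating a new model only on the chosen datasets and using the subset mean as a cheap predictor of its full-benchmark score.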




Exploring State-Space-Model based Language Model in Music Generation

Lee, Wei-Jaw, Hsieh, Fang-Chih, Chen, Xuanjun, Tsai, Fang-Duo, Yang, Yi-Hsuan

arXiv.org Artificial Intelligence

The recent surge in State Space Models (SSMs) [8, 9], particularly the emergence of Mamba, has established them as strong alternatives or complementary modules to Transformers across diverse domains. In this work, we aim to explore the potential of Mamba-based architectures for text-to-music generation. We adopt discrete tokens of Residual Vector Quantization (RVQ) as the modeling representation and empirically find that a single-layer codebook can capture semantic information in music. Motivated by this observation, we focus on modeling a single-codebook representation and adapt SiMBA, originally designed as a Mamba-based encoder, to function as a decoder for sequence modeling. We compare its performance against a standard Transformer-based decoder. Our results suggest that, under limited-resource settings, SiMBA achieves much faster convergence and generates outputs closer to the ground truth. This demonstrates the promise of SSMs for efficient and expressive text-to-music generation. We put audio examples on GitHub.
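RVQ, the representation this entry builds on, can be sketched in a few lines: each stage quantizes the residual left by the previous stage, so the first codebook alone already yields a coarse reconstruction, consistent with the observation that a single codebook captures semantic content. The tiny codebooks below are toy assumptions for illustration:

```python
def rvq_encode(x, codebooks):
    # Residual Vector Quantization (illustrative sketch): stage i picks the
    # nearest codeword in codebooks[i] to the residual left by stage i-1.
    codes, residual = [], list(x)
    for cb in codebooks:
        idx = min(range(len(cb)),
                  key=lambda i: sum((r - c) ** 2
                                    for r, c in zip(residual, cb[i])))
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, cb[idx])]
    return codes

def rvq_decode(codes, codebooks):
    # Reconstruction is the sum of the selected codewords across stages.
    out = [0.0] * len(codebooks[0][0])
    for idx, cb in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, cb[idx])]
    return out
```

Modeling only the first code of each frame, as the paper does, keeps the sequence one token per step while retaining the coarse content; later codebooks refine acoustic detail.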


DeepFake Doctor: Diagnosing and Treating Audio-Video Fake Detection

Klemt, Marcel, Segna, Carlotta, Rohrbach, Anna

arXiv.org Artificial Intelligence

Generative AI advances rapidly, allowing the creation of very realistic manipulated video and audio. This progress presents a significant security and ethical threat, as malicious users can exploit DeepFake techniques to spread misinformation. Recent DeepFake detection approaches explore the multimodal (audio-video) threat scenario. However, there is a lack of reproducibility and there are critical issues with existing datasets, such as the recently uncovered silence shortcut in the widely used FakeAVCeleb dataset. Considering the importance of this topic, we aim to gain a deeper understanding of the key issues affecting benchmarking in audio-video DeepFake detection. We examine these challenges through the lens of the three core benchmarking pillars: datasets, detection methods, and evaluation protocols. To address these issues, we spotlight the recent DeepSpeak v1 dataset and are the first to propose an evaluation protocol and benchmark it using SOTA models. We introduce SImple Multimodal BAseline (SIMBA), a competitive yet minimalistic approach that enables the exploration of diverse design choices. We also deepen insights into the issue of audio shortcuts and present a promising mitigation strategy. Finally, we analyze and enhance the evaluation scheme on the widely used FakeAVCeleb dataset. Our findings offer a way forward in the complex area of audio-video DeepFake detection.
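A minimalistic multimodal baseline of the kind this entry names can be as small as late fusion of per-modality embeddings followed by one linear layer. The sketch below is a generic illustration of that pattern; the names, the fusion choice, and the linear head are assumptions, not SIMBA's actual architecture:

```python
def late_fusion_score(audio_emb, video_emb, w_a, w_v, bias):
    # Concatenate the audio and video embeddings and apply a single
    # linear layer to produce one real/fake logit. Keeping the fusion
    # this simple makes it easy to swap out either modality's encoder
    # when exploring design choices.
    feats = list(audio_emb) + list(video_emb)
    weights = list(w_a) + list(w_v)
    return sum(f * w for f, w in zip(feats, weights)) + bias
```

Because each modality contributes an additive term, zeroing one weight vector gives an immediate unimodal ablation, which is one way such a baseline can expose shortcuts like the silence artifact.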


Sparsified State-Space Models are Efficient Highway Networks

Song, Woomin, Tack, Jihoon, Mo, Sangwoo, Oh, Seunghyuk, Shin, Jinwoo

arXiv.org Artificial Intelligence

State-space models (SSMs) offer a promising architecture for sequence modeling, providing an alternative to Transformers by replacing expensive self-attention with linear recurrences. In this paper, we propose a simple yet effective trick to enhance SSMs within given computational budgets by sparsifying them. Our intuition is that tokens in SSMs are highly redundant due to gradual recurrent updates, and dense recurrence operations block the delivery of past information. In particular, we observe that upper layers of SSMs tend to be more redundant as they encode global information, while lower layers encode local information. Motivated by this, we introduce Simba, a hierarchical sparsification method for SSMs based on token pruning. Simba sparsifies upper layers more than lower layers, encouraging the upper layers to behave like highways. To achieve this, we propose a novel token pruning criterion for SSMs, measuring the global impact of tokens on the final output by accumulating local recurrences. We demonstrate that Simba outperforms the baseline model, Mamba, with the same FLOPS in various natural language tasks. Moreover, we illustrate the effect of highways, showing that Simba not only enhances efficiency but also improves the information flow across long sequences. Code is available at https://github.com/woominsong/Simba.
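The pruning criterion described above, measuring a token's global impact by accumulating local recurrences, can be made concrete for a scalar linear recurrence h_t = a_t h_{t-1} + b_t x_t: unrolling gives h_T = sum over t of (prod of a_s for s > t) * b_t * x_t, so each token's contribution magnitude falls out of one backward pass. An illustrative sketch, not the paper's exact criterion:

```python
def token_impacts(a, b, x):
    # Impact of each token on the final state of the scalar recurrence
    # h_t = a[t] * h_{t-1} + b[t] * x[t]. We accumulate the decay product
    # backward, so impacts[t] = |b[t] * x[t]| * prod(|a[s]| for s > t).
    T = len(a)
    impacts = [0.0] * T
    decay = 1.0  # running product of |a_s| for s > t
    for t in range(T - 1, -1, -1):
        impacts[t] = abs(b[t] * x[t]) * decay
        decay *= abs(a[t])
    return impacts
```

Tokens with the smallest accumulated impact are the natural pruning candidates, and because decay compounds across steps, early tokens in strongly decaying layers score low, matching the intuition that upper, more global layers tolerate heavier sparsification.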