Revisiting the Sliced Wasserstein Kernel for persistence diagrams: a Figalli-Gigli approach
The Sliced Wasserstein Kernel (SWK) for persistence diagrams was introduced by Carrière et al. (2017) as a powerful tool to implicitly embed persistence diagrams in a Hilbert space with reasonable distortion. This kernel is built on the intuition that the Figalli-Gigli distance, that is, the partial matching distance routinely used to compare persistence diagrams, resembles the Wasserstein distance used in the optimal transport literature, and that the latter could be sliced to define a positive definite kernel on the space of persistence diagrams. This efficient construction nonetheless relies on ad hoc tweaks to the Wasserstein distance to account for the peculiar geometry of the space of persistence diagrams. In this work, we propose to revisit this idea by directly using the Figalli-Gigli distance instead of the Wasserstein one as the building block of our kernel. On the theoretical side, our sliced Figalli-Gigli kernel (SFGK) shares most of the important properties of the SWK of Carrière et al., including distortion results on the induced embedding and its ease of computation, while being more faithful to the natural geometry of persistence diagrams. In particular, it can be directly used to handle infinite persistence diagrams and persistence measures. On the numerical side, we show that the SFGK performs as well as the SWK on benchmark applications.
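The SWK construction that the SFGK revisits can be sketched as follows. This is a minimal sketch of the Carrière et al. (2017) kernel, not of the SFGK, whose exact form is not spelled out here: each diagram is augmented with the diagonal projections of the other so that both point sets have the same size, both are projected onto evenly spaced directions in [-π/2, π/2), and the one-dimensional Wasserstein-1 distance is obtained by sorting. The number of directions and the bandwidth `sigma` are illustrative parameters, and the function names are hypothetical.

```python
import math

def project_to_diagonal(point):
    """Orthogonal projection of a diagram point (birth, death) onto the diagonal."""
    b, d = point
    m = (b + d) / 2.0
    return (m, m)

def sliced_wasserstein(diag1, diag2, num_directions=50):
    """Approximate sliced Wasserstein distance between two persistence diagrams."""
    # Augment each diagram with the diagonal projections of the other one,
    # so that both point clouds have the same cardinality.
    pts1 = list(diag1) + [project_to_diagonal(p) for p in diag2]
    pts2 = list(diag2) + [project_to_diagonal(p) for p in diag1]
    total = 0.0
    for i in range(num_directions):
        # Directions evenly spaced in [-pi/2, pi/2).
        theta = -math.pi / 2 + math.pi * i / num_directions
        c, s = math.cos(theta), math.sin(theta)
        proj1 = sorted(b * c + d * s for b, d in pts1)
        proj2 = sorted(b * c + d * s for b, d in pts2)
        # 1D Wasserstein-1 between equal-size point sets: match sorted values.
        total += sum(abs(u - v) for u, v in zip(proj1, proj2))
    # Average over directions approximates the normalized integral over the circle.
    return total / num_directions

def sliced_wasserstein_kernel(diag1, diag2, sigma=1.0, num_directions=50):
    """Kernel value exp(-SW(D1, D2) / (2 sigma^2))."""
    sw = sliced_wasserstein(diag1, diag2, num_directions)
    return math.exp(-sw / (2 * sigma ** 2))
```

Averaging over the half circle is enough because the directions θ and θ + π give the same one-dimensional distance, and the sorting step keeps each direction at O(n log n).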
stable-pretraining-v1: Foundation Model Research Made Simple
Balestriero, Randall, Van Assel, Hugues, BuGhanem, Sami, Maes, Lucas
Foundation models and self-supervised learning (SSL) have become central to modern AI, yet research in this area remains hindered by complex codebases, redundant re-implementations, and the heavy engineering burden of scaling experiments. We present stable-pretraining, a modular, extensible, and performance-optimized library built on top of PyTorch, Lightning, Hugging Face, and TorchMetrics. Unlike prior toolkits focused narrowly on reproducing state-of-the-art results, stable-pretraining is designed for flexibility and iteration speed: it unifies essential SSL utilities (including probes, collapse detection metrics, augmentation pipelines, and extensible evaluation routines) within a coherent and reliable framework. A central design principle is logging everything, enabling fine-grained visibility into training dynamics that makes debugging, monitoring, and reproducibility seamless. We validate the library by demonstrating its ability to generate new research insights with minimal overhead, including depthwise representation probing and the analysis of CLIP degradation under synthetic data finetuning. By lowering barriers to entry while remaining scalable to large experiments, stable-pretraining aims to accelerate discovery and expand the possibilities of foundation model research.
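Which collapse detection metrics stable-pretraining implements is not specified here. As one common example of such a metric, the sketch below computes an entropy-based effective rank of a batch of embeddings (the quantity behind RankMe). It is not the library's API, and `effective_rank` is a hypothetical name.

```python
import numpy as np

def effective_rank(embeddings, eps=1e-12):
    """Entropy-based effective rank of an (n_samples, dim) embedding matrix.

    Values near 1 signal dimensional collapse; values near min(n_samples, dim)
    mean the embedding spreads its energy over all directions.
    """
    # Singular values of the embedding matrix.
    s = np.linalg.svd(embeddings, compute_uv=False)
    # Normalize the singular values into a probability distribution.
    p = s / (s.sum() + eps) + eps
    # Exponential of the Shannon entropy = number of effectively used dimensions.
    return float(np.exp(-np.sum(p * np.log(p))))

# Usage: compare a healthy random embedding with a rank-1 (collapsed) one.
rng = np.random.default_rng(0)
healthy = rng.standard_normal((512, 32))
collapsed = np.outer(rng.standard_normal(512), rng.standard_normal(32))
print(effective_rank(healthy), effective_rank(collapsed))
```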
Single-Pixel Tactile Skin via Compressive Sampling
Slepyan, Ariel, Xing, Laura, Zhang, Rudy, Thakor, Nitish
Development of large-area, high-speed electronic skins is a grand challenge for robotics, prosthetics, and human-machine interfaces, but is fundamentally limited by wiring complexity and data bottlenecks. Here, we introduce Single-Pixel Tactile Skin (SPTS), a paradigm that uses compressive sampling to reconstruct rich tactile information from an entire sensor array via a single output channel. This is achieved through a direct circuit-level implementation where each sensing element, equipped with a miniature microcontroller, contributes a dynamically weighted analog signal to a global sum, performing distributed compressed sensing in hardware. Our flexible, daisy-chainable design simplifies wiring to a few input lines and one output, and significantly reduces measurement requirements compared to raster scanning methods. We demonstrate the system's performance by achieving object classification at an effective 3500 FPS and by capturing transient dynamics, resolving an 8 ms projectile impact into 23 frames. A key feature is the support for adaptive reconstruction, where sensing fidelity scales with measurement time. This allows for rapid contact localization using as little as 7% of total data, followed by progressive refinement to a high-fidelity image, a capability critical for responsive robotic systems. This work offers an efficient pathway towards large-scale tactile intelligence for robotics and human-machine interfaces.
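The single-output readout can be modelled as a linear measurement y = Φx, where x stacks the taxel readings and each row of Φ holds the weights applied during one measurement. The sketch below assumes random ±1 weights and uses Orthogonal Matching Pursuit as a stand-in sparse solver; neither is taken from the SPTS hardware or its reconstruction method, and `measure` and `omp` are hypothetical names. It illustrates the adaptive idea: a single contact is localized from a subset of the measurements, and the full set recovers the image exactly by least squares.

```python
import numpy as np

def measure(weights, tactile_image):
    """Single output channel: each row of weights yields one weighted global sum."""
    return weights @ tactile_image

def omp(weights, y, sparsity):
    """Orthogonal Matching Pursuit: greedily recover a sparse taxel vector."""
    if sparsity < 1:
        raise ValueError("sparsity must be at least 1")
    residual = y.copy()
    support = []
    for _ in range(sparsity):
        # Score each taxel by how well its weight pattern explains the residual.
        scores = np.abs(weights.T @ residual)
        if support:
            scores[support] = 0.0  # never pick the same taxel twice
        support.append(int(np.argmax(scores)))
        # Refit all amplitudes on the current support by least squares.
        coef, *_ = np.linalg.lstsq(weights[:, support], y, rcond=None)
        residual = y - weights[:, support] @ coef
    x_hat = np.zeros(weights.shape[1])
    x_hat[support] = coef
    return x_hat

# Usage: an 8x8 array (64 taxels) with one contact, read through one channel.
rng = np.random.default_rng(0)
n_taxels = 64
# Hypothetical weighting: a random +/-1 pattern per measurement.
weights = rng.choice([-1.0, 1.0], size=(n_taxels, n_taxels))
image = np.zeros(n_taxels)
image[27] = 0.8
# Coarse localization from the first 24 of 64 measurements.
x_coarse = omp(weights[:24], measure(weights[:24], image), sparsity=1)
# Refinement with the full set, where plain least squares inverts the sums.
x_full, *_ = np.linalg.lstsq(weights, measure(weights, image), rcond=None)
```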
A LoD of Gaussians: Unified Training and Rendering for Ultra-Large Scale Reconstruction with External Memory
Windisch, Felix, Köhler, Thomas, Radl, Lukas, Steiner, Michael, Schmalstieg, Dieter, Steinberger, Markus
Gaussian Splatting has emerged as a high-performance technique for novel view synthesis, enabling real-time rendering and high-quality reconstruction of small scenes. However, scaling to larger environments has so far relied on partitioning the scene into chunks, a strategy that introduces artifacts at chunk boundaries, complicates training across varying scales, and is poorly suited to unstructured scenarios such as city-scale flyovers combined with street-level views. Moreover, rendering remains fundamentally limited by GPU memory, as all visible chunks must reside in VRAM simultaneously. We introduce A LoD of Gaussians, a framework for training and rendering ultra-large-scale Gaussian scenes on a single consumer-grade GPU without partitioning. Our method stores the full scene out-of-core (e.g., in CPU memory) and trains a Level-of-Detail (LoD) representation directly, dynamically streaming only the relevant Gaussians. A hybrid data structure combining Gaussian hierarchies with Sequential Point Trees enables efficient, view-dependent LoD selection, while a lightweight caching and view scheduling system exploits temporal coherence to support real-time streaming and rendering. Together, these innovations enable seamless multi-scale reconstruction and interactive visualization of complex scenes, from broad aerial views to fine-grained ground-level details.
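View-dependent LoD selection over a Gaussian hierarchy, combined with a cache for out-of-core streaming, can be illustrated with a generic screen-space criterion. This is a minimal sketch under stated assumptions, not the authors' hybrid Gaussian-hierarchy/Sequential Point Tree structure or their view scheduler: a node is accepted once its projected size falls below a pixel threshold (or it is a leaf), and an LRU cache stands in for VRAM, counting one load per miss. `Node`, `select_lod`, `GaussianCache` and `max_px` are hypothetical.

```python
import math
from collections import OrderedDict
from dataclasses import dataclass, field

@dataclass
class Node:
    """A node of a hypothetical Gaussian hierarchy (coarse Gaussian or leaf)."""
    node_id: int
    center: tuple   # (x, y, z) world position
    size: float     # world-space extent of the Gaussians it represents
    children: list = field(default_factory=list)

def select_lod(root, camera, focal_px, max_px=2.0):
    """Return the nodes forming a view-dependent LoD cut of the hierarchy."""
    cut, stack = [], [root]
    while stack:
        node = stack.pop()
        dist = max(math.dist(node.center, camera), 1e-6)
        projected = node.size * focal_px / dist  # approximate size in pixels
        if projected <= max_px or not node.children:
            cut.append(node)            # coarse enough for this view, or a leaf
        else:
            stack.extend(node.children)  # too large on screen: refine
    return cut

class GaussianCache:
    """LRU cache standing in for VRAM; each miss models streaming from CPU memory."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.loads = 0

    def fetch(self, node_id):
        if node_id in self.entries:
            self.entries.move_to_end(node_id)  # hit: mark as most recently used
            return
        self.loads += 1                         # miss: stream the node in
        self.entries[node_id] = True
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)    # evict the least recently used node
```

Because consecutive frames usually select similar cuts, most `fetch` calls after the first frame hit the cache, which is the kind of temporal coherence a streaming cache can exploit.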
Supplementary Material for Certified Defense to Image Transformations via Randomized Smoothing
A. Proof of Theorem 3.2
We now proceed to prove Theorem 3.2. Next, we show that Eq. (15) holds. Does there exist a t such that both upper bounds coincide? We now show Theorem 3.2 (restated below): Setting Theorem 3.2 up to the last sentence, which in turn is a direct consequence of Lemma 2. In this section, we elaborate on the details of Step 2 in Section 6. Because we don't have any constraints for the pixel values. Here, we present the algorithm used to compute the inverse of a transformation.
Lower Bounds on Adversarial Robustness for Multiclass Classification with General Loss Functions
Trillos, Camilo Andrés García, Trillos, Nicolás García
We consider adversarially robust classification in a multiclass setting under arbitrary loss functions and derive dual and barycentric reformulations of the corresponding learner-agnostic robust risk minimization problem. We provide explicit characterizations for important cases such as the cross-entropy loss, loss functions with a power form, and the quadratic loss, extending in this way available results for the 0-1 loss. These reformulations enable efficient computation of sharp lower bounds for adversarial risks and facilitate the design of robust classifiers beyond the 0-1 loss setting. Our paper uncovers interesting connections between adversarial robustness, $\alpha$-fair packing problems, and generalized barycenter problems for arbitrary positive measures where Kullback-Leibler and Tsallis entropies are used as penalties. Our theoretical results are accompanied by illustrative numerical experiments where we obtain tighter lower bounds for adversarial risks with the cross-entropy loss function.
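For orientation on the losses named above, the sketch below computes the optimal risks with no adversary (a zero perturbation budget) for a discrete joint distribution. It does not implement the paper's dual or barycentric reformulations. Under the usual formulation, an adversary whose perturbation set contains the clean input can always leave it unchanged, so these values are a trivial lower bound on the corresponding robust risks. The quadratic loss is assumed to be $\sum_k (f_k - \mathbb{1}[y = k])^2$, and `bayes_risks` is a hypothetical name.

```python
import math

def bayes_risks(joint):
    """Optimal non-adversarial risks for a discrete joint distribution p(x, y).

    joint maps each input x to the list [p(x, y=0), p(x, y=1), ...].
    Returns the minimal expected 0-1, cross-entropy and quadratic losses.
    """
    zero_one = cross_entropy = quadratic = 0.0
    for probs in joint.values():
        px = sum(probs)
        if px == 0:
            continue
        post = [p / px for p in probs]  # posterior p(y | x)
        # 0-1 loss: predicting the most likely class is optimal.
        zero_one += px * (1.0 - max(post))
        # Cross-entropy: the posterior itself is optimal, giving H(Y | X = x).
        cross_entropy += px * -sum(q * math.log(q) for q in post if q > 0)
        # Quadratic loss: the posterior is optimal, giving 1 - sum_k p(k | x)^2.
        quadratic += px * (1.0 - sum(q * q for q in post))
    return zero_one, cross_entropy, quadratic

# Usage: x = "a" is ambiguous between the two classes, x = "b" is not.
joint = {"a": [0.3, 0.2], "b": [0.0, 0.5]}
print(bayes_risks(joint))
```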
Adapting Whisper for Parameter-efficient Code-Switching Speech Recognition via Soft Prompt Tuning
Yang, Hongli, Peng, Yizhou, Huang, Hao, Li, Sheng
Large-scale multilingual ASR models like Whisper excel in high-resource settings but face challenges in low-resource scenarios, such as rare languages and code-switching (CS), due to computational costs and catastrophic forgetting. We explore Soft Prompt Tuning (SPT), a parameter-efficient method to enhance CS ASR while preserving prior knowledge. We evaluate two strategies: (1) full fine-tuning (FFT) of both soft prompts and the entire Whisper model, demonstrating improved cross-lingual capabilities compared to traditional methods, and (2) adhering to SPT's original design by freezing model parameters and only training soft prompts. Additionally, we introduce SPT4ASR, a combination of different SPT variants. Experiments on the SEAME and ASRU2019 datasets show that deep prompt tuning is the most effective SPT approach, and our SPT4ASR methods achieve further error reductions in CS ASR, maintaining parameter efficiency similar to LoRA, without degrading performance on existing languages.
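Soft prompt tuning keeps the model weights frozen and learns only prompt vectors. The sketch below contrasts shallow insertion (one prompt at the input) with deep insertion (a fresh prompt before every layer) and compares trainable-parameter counts with a LoRA baseline. The toy `frozen_layers` stand in for frozen Whisper blocks; where prompts are inserted and which projections LoRA adapts (query and value here) are assumptions, and all function names are hypothetical.

```python
import numpy as np

def prepend_prompt(hidden, prompt):
    """Prepend P trainable prompt vectors (P, d) to a frozen hidden sequence (T, d)."""
    return np.concatenate([prompt, hidden], axis=0)

def deep_prompt_forward(hidden, frozen_layers, prompts):
    """Deep prompt tuning: insert a fresh trainable prompt before every frozen layer."""
    for layer, prompt in zip(frozen_layers, prompts):
        out = layer(prepend_prompt(hidden, prompt))
        # Drop the prompt positions; the next layer receives its own new prompt.
        hidden = out[prompt.shape[0]:]
    return hidden

def trainable_params(num_layers, d_model, prompt_len, lora_rank, deep=True):
    """Trainable-parameter counts: (soft prompts, LoRA on query and value projections)."""
    prompts = (num_layers if deep else 1) * prompt_len * d_model
    # LoRA: factors of shape (d x r) and (r x d) per adapted matrix, two matrices per layer.
    lora = num_layers * 2 * (2 * d_model * lora_rank)
    return prompts, lora
```

For instance, with hypothetical sizes of 12 layers, d = 768, 16 prompt vectors and LoRA rank 8, `trainable_params` returns 147,456 deep-prompt parameters versus 294,912 LoRA parameters; these sizes are illustrative, not the configuration used in the experiments.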