Country
pLSTM: parallelizable Linear Source Transition Mark networks Korbinian Pöppel 1,2 Richard Freinschlag 1 Thomas Schmied 1 Wei Lin
Modern recurrent architectures, such as xLSTM and Mamba, have recently challenged the Transformer in language modeling. However, their structure constrains their applicability to sequences only or requires processing multi-dimensional data structures, such as images or molecular graphs, in a pre-defined sequential order. In contrast, Multi-Dimensional RNNs (MDRNNs) are well suited for data with a higher level structure, like 2D grids, trees, and directed acyclic graphs (DAGs). In this work, we extend the notion of multi-dimensionality to linear RNNs. We introduce parallelizable Linear Source Transition Mark networks (pLSTMs) using Source, Transition, and Mark gates that act on the linegraph of a general DAG. This enables parallelization in analogy to parallel associative scans and the chunkwise-recurrent form of sequential linear RNNs, but for DAGs. For regular grids (1D and 2D), like images, this scheme can be efficiently implemented using einsum operations, concatenations, and padding in logarithmic time.
FraPPE: Fast and Efficient Preference-based Pure Exploration
Preference-based Pure Exploration (PrePEx) aims to identify with a given confidence level the set of Pareto optimal arms in a vector-valued (aka multi-objective) bandit, where the reward vectors are ordered via a (given) preference cone C. Though PrePEx and its variants are well-studied, there does not exist a computationally efficient algorithm that can optimally track the existing lower bound (Shukla and Basu, 2024) for arbitrary preference cones. We successfully fill this gap by efficiently solving the minimisation and maximisation problems in the lower bound. First, we derive three structural properties of the lower bound that yield a computationally tractable reduction of the minimisation problem. Then, we deploy a Frank-Wolfe optimiser to accelerate the maximisation problem in the lower bound. Together, these techniques solve the maxmin optimisation problem in O(KL2) time for a bandit instance with K arms and L dimensional reward, which is a significant acceleration over the literature. We further prove that our proposed PrePEx algorithm, FraPPE, asymptotically achieves the optimal sample complexity. Finally, we perform numerical experiments across synthetic and real datasets demonstrating that FraPPE achieves the lowest sample complexities to identify the exact Pareto set among the existing algorithms.
Generalization Bound of Gradient Flow through Training Trajectory and Data-dependent Kernel
Gradient-based optimization methods have shown remarkable empirical success, yet their theoretical generalization properties remain only partially understood. In this paper, we establish a generalization bound for gradient flow that aligns with the classical Rademacher complexity bounds for kernel methods-specifically those based on the RKHS norm and kernel trace-through a data-dependent kernel called the loss path kernel (LPK).
FBI says Russian hackers hijacked old Wi-Fi routers
This material may not be published, broadcast, rewritten, or redistributed. Quotes displayed in real-time or delayed by at least 15 minutes. Market data provided by Factset . Powered and implemented by FactSet Digital Solutions . Mutual Fund and ETF data provided by LSEG . Grandparents are identity theft's biggest payday Do not click fake'account recovery' Amazon email Is Apple Intelligence on your iPhone really secure? Americans need protection against'warrantless surveillance': Rep Chip Roy Spencer Pratt's use of AI to boost campaign sparks debate China approves world's first commercial brain chip Kurt Knutsson unveils his top Father's Day gift picks FBI releases list of'most wanted fraudsters' as crackdown continues Fox News Flash top headlines are here.
Accident Anticipation via Temporal Occurrence Prediction
Accident anticipation aims to predict potential collisions in an online manner, enabling timely alerts to enhance road safety. Existing methods typically predict frame-level risk scores as indicators of hazard. However, these approaches rely on ambiguous binary supervision--labeling all frames in accident videos as positive--despite the fact that risk varies continuously over time, leading to unreliable learning and false alarms. To address this, we propose a novel paradigm that shifts the prediction target from current-frame risk scoring to directly estimating accident scores at multiple future time steps (e.g., 0.1s-2.0s
ALMGuard: Safety Shortcuts and Where to Find Them as Guardrails for Audio-Language Models
Recent advances in Audio-Language Models (ALMs) have significantly improved multimodal understanding capabilities. However, the introduction of the audio modality also brings new and unique vulnerability vectors. Previous studies have proposed jailbreak attacks that specifically target ALMs, revealing that defenses directly transferred from traditional audio adversarial attacks or text-based Large Language Model (LLM) jailbreaks are largely ineffective against these ALM-specific threats. To address this issue, we propose ALMGuard, the first defense framework tailored to ALMs. Based on the assumption that safety-aligned shortcuts naturally exist in ALMs, we design a method to identify universal Shortcut Activation Perturbations (SAPs) that serve as triggers that activate the safety shortcuts to safeguard ALMs at inference time. To better sift out effective triggers while preserving the model's utility on benign tasks, we further propose Mel-Gradient Sparse Mask (M-GSM), which restricts perturbations to Mel-frequency bins that are sensitive to jailbreaks but insensitive to speech understanding. Both theoretical analyses and empirical results demonstrate the robustness of our method against both seen and unseen attacks. Overall, ALMGuard reduces the average success rate of advanced ALM-specific jailbreak attacks to 4.6% across four models, while maintaining comparable utility on benign benchmarks, establishing it as the new state of the art.
Hippocampal-like Sequential Editing for Continual Knowledge Updates in Large Language Models
Large language models (LLMs) are now pivotal in real-world applications. Model editing has emerged as a promising paradigm for efficiently modifying LLMs without full retraining. However, current editing approaches face significant limitations due to parameter drift, which stems from inconsistencies between newly edited knowledge and the model's existing knowledge. In sequential editing scenarios, cumulative drifts progressively lead to model collapse characterized by general capability degradation and balance between acquiring new knowledge and catastrophic forgetting of existing knowledge. Drawing inspiration from the hippocampal trisynaptic circuit for continual memorizing and forgetting, we propose a Hippocampal-like Sequential Editing (HSE) framework that designs the unlearning of obsolete knowledge, domain-specific knowledge update separation and replay for edited knowledge. Specifically, the HSE framework designs three core mechanisms: (1) Machine unlearning selectively erases outdated knowledge to facilitate integration of new information, (2) Fisher information matrix-guided parameter updates prevents cross-domain knowledge interference, and (3) Parameter replay consolidates long-term editing memory through lightweight and global replay of editing data in a parametric form. Theoretical analysis demonstrates that HSE achieves smaller generalization error bounds, more stable convergence and higher computational efficiency.
The Persistence of Neural Collapse Despite Low-Rank Bias
Neural collapse (NC) and its multi-layer variant, deep neural collapse (DNC), describe a structured geometry that occurs in the features and weights of trained deep networks. Recent theoretical work by Sukenik et al. using a deep unconstrained feature model (UFM) suggests that DNC is suboptimal under mean squared error (MSE) loss. They heuristically argue that this is due to low-rank bias induced by L2 regularization. In this work, we extend this result to deep UFMs trained with cross-entropy loss, showing that high-rank structures--including DNC--are not generally optimal. We characterize the associated low-rank bias, proving a fixed bound on the number of non-negligible singular values at global minima as network depth increases. We further analyze the loss surface, demonstrating that DNC is more prevalent in the landscape than other critical configurations, which we argue explains its frequent empirical appearance. Our results are validated through experiments in deep UFMs and deep neural networks.