
The Exponentially Weighted Signature

Bloch, Alexandre, Cohen, Samuel N., Lyons, Terry, Mouterde, Joël, Walker, Benjamin

arXiv.org Machine Learning

The signature is a canonical representation of a multidimensional path over an interval. However, it treats all historical information uniformly, offering no intrinsic mechanism for contextualising the relevance of the past. To address this, we introduce the Exponentially Weighted Signature (EWS), generalising the Exponentially Fading Memory (EFM) signature from diagonal to general bounded linear operators. These operators enable cross-channel coupling at the level of temporal weighting together with richer memory dynamics including oscillatory, growth, and regime-dependent behaviour, while preserving the algebraic strengths of the classical signature. We show that the EWS is the unique solution to a linear controlled differential equation on the tensor algebra, and that it generalises both state-space models and the Laplace and Fourier transforms of the path. The group-like structure of the EWS enables efficient computation and makes the framework amenable to gradient-based learning, with the full semigroup action parametrised by and learned through its generator. We use this framework to empirically demonstrate the expressivity gap between the EWS and both the signature and EFM on two SDE-based regression tasks.
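To make the weighting idea concrete: the first-level term of an exponentially weighted signature replaces the EFM's scalar decay kernel with an operator exponential, and it can be obtained as the solution of a linear controlled differential equation. The sketch below is illustrative only, assuming the simplest level-1 case with an Euler discretisation; the function name and step size are my own, not the paper's.

```python
import numpy as np

def ews_level1(path, A, dt):
    """First-level exponentially weighted signature term, computed as an
    Euler step of the linear CDE  dy = A y dt + dX  (illustrative sketch).
    path : (T, d) array of path samples; A : (d, d) weighting operator."""
    y = np.zeros(path.shape[1])
    for k in range(1, len(path)):
        dX = path[k] - path[k - 1]   # increment of the driving path
        y = y + dt * (A @ y) + dX    # decay/mix old memory, add new increment
    return y
```

With `A = 0` this reduces to the plain level-1 signature (the total increment); a diagonal negative `A` recovers EFM-style fading memory, while a non-diagonal `A` couples channels in the temporal weighting.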


Disney advert banned for showing 'disturbing' severed body

BBC News

A menacing Disney advert featuring a severed body has been banned by the advertising regulator, which said it was likely to frighten and cause distress to children. The Advertising Standards Authority (ASA) found the entertainment giant had broken its rules with its advert for the Predator Badlands film. Parents complained that the digital poster, which featured a large alien holding aloft the severed body of a smaller, human figure, was inappropriate and disturbing for young children. Disney said the severed body was actually that of a robot, and the fact it had been cut in two further emphasised its non-human nature. The advert, which was seen on the roadside in Giffnock, Glasgow, was promoting the Disney sci-fi film ahead of its release in November.



Nike, Superdry and Lacoste ads banned over misleading green claims

BBC News

Adverts for Nike, Superdry and Lacoste have been banned for making misleading claims about their green credentials. The UK's advertising watchdog challenged the brands over the use of the word "sustainable" in paid-for Google ads which were not backed up by evidence of their sustainability. The Advertising Standards Authority (ASA) identified three adverts from the retailers promising customers "sustainable materials", "sustainable style" and "sustainable clothing". The UK's advertising code states that the basis of claims about environmental sustainability must be clear and supported by a high level of substantiation. In each case, it asked the companies for evidence to back up the claims about the sustainability of the products.


Teaching According to Students' Aptitude: Personalized Mathematics Tutoring via Persona-, Memory-, and Forgetting-Aware LLMs

Wu, Yang, Yao, Rujing, Zhang, Tong, Shi, Yufei, Jiang, Zhuoren, Li, Zhushan, Liu, Xiaozhong

arXiv.org Artificial Intelligence

Large Language Models (LLMs) are increasingly integrated into intelligent tutoring systems to provide human-like and adaptive instruction. However, most existing approaches fail to capture how students' knowledge evolves dynamically across their proficiencies, conceptual gaps, and forgetting patterns. This challenge is particularly acute in mathematics tutoring, where effective instruction requires fine-grained scaffolding precisely calibrated to each student's mastery level and cognitive retention. To address this issue, we propose TASA (Teaching According to Students' Aptitude), a student-aware tutoring framework that integrates persona, memory, and forgetting dynamics for personalized mathematics learning. Specifically, TASA maintains a structured student persona capturing proficiency profiles and an event memory recording prior learning interactions. By incorporating a continuous forgetting curve with knowledge tracing, TASA dynamically updates each student's mastery state and generates contextually appropriate, difficulty-calibrated questions and explanations. Empirical results demonstrate that TASA achieves superior learning outcomes and more adaptive tutoring behavior compared to representative baselines, underscoring the importance of modeling temporal forgetting and learner profiles in LLM-based tutoring systems.
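The core mechanism, a continuous forgetting curve combined with a knowledge-tracing style update, can be sketched in a few lines. This is a minimal illustration assuming an Ebbinghaus-style exponential decay and a simple linear update rule; the function names and the `stability` and `lr` parameters are my assumptions, not details from the paper.

```python
import math

def recalled_mastery(mastery, elapsed_days, stability=5.0):
    """Continuous forgetting curve: estimated retention decays
    exponentially with time since the last review. `stability` is a
    hypothetical per-student forgetting parameter."""
    return mastery * math.exp(-elapsed_days / stability)

def update_after_practice(mastery, correct, lr=0.3):
    """Knowledge-tracing style update: move mastery toward 1.0 after a
    correct answer and toward 0.0 after an incorrect one."""
    target = 1.0 if correct else 0.0
    return mastery + lr * (target - mastery)
```

A tutoring loop would apply `recalled_mastery` before each session to get the current mastery estimate, then `update_after_practice` after each answered question, using the resulting state to calibrate question difficulty.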


Hotel adverts banned over misleadingly cheap rooms

BBC News

Adverts by four of Britain's biggest hotel and travel firms have been banned for stating misleading minimum prices for rooms. The Advertising Standards Authority (ASA) upheld complaints against the Hilton hotel group, Travelodge, Booking.com and Accor over their use of eye-catching so-called from prices. The watchdog found only a small number of rooms actually available to book at the promoted price and concluded the adverts overstated the deals. It said this was unfair on those looking for good deals or seeking to make informed choices about where to book. ASA operations manager Emily Henwood said: Advertised prices must match what's really available.


Mixture-of-Schedulers: An Adaptive Scheduling Agent as a Learned Router for Expert Policies

Wang, Xinbo, Jia, Shian, Huang, Ziyang, Cao, Jing, Song, Mingli

arXiv.org Artificial Intelligence

Modern operating system schedulers employ a single, static policy, which struggles to deliver optimal performance across the diverse and dynamic workloads of contemporary systems. This "one-policy-fits-all" approach leads to significant compromises in fairness, throughput, and latency, particularly with the rise of heterogeneous hardware and varied application architectures. This paper proposes a new paradigm: dynamically selecting the optimal policy from a portfolio of specialized schedulers rather than designing a single, monolithic one. We present the Adaptive Scheduling Agent (ASA), a lightweight framework that intelligently matches workloads to the most suitable "expert" scheduling policy at runtime. ASA's core is a novel, low-overhead offline/online approach. First, an offline process trains a universal, hardware-agnostic machine learning model to recognize abstract workload patterns from system behaviors. Second, at runtime, ASA continually processes the model's predictions using a time-weighted probability voting algorithm to identify the workload, then makes a scheduling decision by consulting a pre-configured, machine-specific mapping table to switch to the optimal scheduler via Linux's sched_ext framework. This decoupled architecture allows ASA to adapt to new hardware platforms rapidly without expensive retraining of the core recognition model. Our evaluation, based on a novel benchmark focused on user-experience metrics, demonstrates that ASA consistently outperforms the default Linux scheduler (EEVDF), achieving superior results in 86.4% of test scenarios. Furthermore, ASA's selections are near-optimal, ranking among the top three schedulers in 78.6% of all scenarios. This validates our approach as a practical path toward more intelligent, adaptive, and responsive operating system schedulers.
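The runtime decision path described above (time-weighted probability voting over the model's recent predictions, then a lookup in a machine-specific mapping table) can be sketched as follows. This is a hedged illustration: the half-life weighting, the workload labels, and the example mapping to sched_ext schedulers are my assumptions, not ASA's actual parameters.

```python
from collections import defaultdict

def time_weighted_vote(predictions, now, half_life=10.0):
    """predictions: list of (timestamp, workload_label, probability).
    Recent predictions count more via exponential decay; returns the
    workload label with the highest decayed score (illustrative)."""
    scores = defaultdict(float)
    for t, label, prob in predictions:
        weight = 0.5 ** ((now - t) / half_life)  # halves every `half_life`
        scores[label] += weight * prob
    return max(scores, key=scores.get)

# Hypothetical machine-specific mapping table from recognized workload
# to a sched_ext expert scheduler (entries are examples, not ASA's).
SCHEDULER_TABLE = {
    "latency_sensitive": "scx_lavd",
    "batch_throughput": "scx_rusty",
}
```

Given a voted workload label, the agent would simply switch the active policy to `SCHEDULER_TABLE[label]` through the sched_ext interface; only the table, not the recognition model, needs adjusting for new hardware.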


Optimizing Native Sparse Attention with Latent Attention and Local Global Alternating Strategies

Hu, Yuxuan, Tan, Jianchao, Zhang, Jiaqi, Zan, Wen, Sun, Pingwei, Lu, Yifan, Sun, Yerui, Xie, Yuchen, Cai, Xunliang, Zhang, Jing

arXiv.org Artificial Intelligence

In this work, we conduct a systematic analysis of Native Sparse Attention (NSA) and propose targeted improvements that enhance long-context modeling. A key insight is that alternating between local (sliding-window) and global (compression, selective) attention across layers, rather than using fixed patterns, enables more effective propagation of long-range dependencies and substantially boosts performance on long-sequence tasks. Meanwhile, we further refine NSA's branches with Latent Attention: the sliding-window branch is enhanced with Multi-head Latent Attention (MLA), while the compression and selective branches adopt Group-head Latent Attention (GLA). These changes reduce KV-cache memory by 50% versus NSA while improving the model's common-sense reasoning and long-text understanding capabilities. Experiments on models from 340M to 1.3B parameters (trained on 15B and 100B tokens) show our method matches or exceeds full attention and native sparse attention in both common-sense reasoning and long-context understanding tasks.
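The alternating local/global layer layout and the local branch's sliding-window mask are easy to picture with a small sketch. This is purely schematic, assuming a simple even/odd alternation and a causal window; the period and function names are mine, not the paper's exact schedule.

```python
import numpy as np

def layer_attention_pattern(num_layers, period=2):
    """Alternate local (sliding-window) and global (compression +
    selection) attention across layers instead of one fixed pattern.
    The even/odd period here is an assumed example schedule."""
    return ["local" if i % period == 0 else "global"
            for i in range(num_layers)]

def sliding_window_mask(n, window):
    """Boolean causal mask for the local branch: each query position
    attends to itself and the previous `window - 1` positions."""
    i = np.arange(n)[:, None]   # query index
    j = np.arange(n)[None, :]   # key index
    return (j <= i) & (j > i - window)
```

In a model, the `"global"` layers would use the compression and selection branches over the full context, while `"local"` layers apply the windowed mask above, interleaving the two so long-range information still propagates through depth.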



BLADE: Block-Sparse Attention Meets Step Distillation for Efficient Video Generation

Gu, Youping, Li, Xiaolong, Hu, Yuhao, Chen, Minqi, Zhuang, Bohan

arXiv.org Artificial Intelligence

Diffusion Transformers currently lead the field in high-quality video generation, but their slow iterative denoising process and prohibitive quadratic attention costs for long sequences create significant inference bottlenecks. While both step distillation and sparse attention mechanisms have shown promise as independent acceleration strategies, effectively combining these approaches presents critical challenges: training-free integration yields suboptimal results, while separately training sparse attention after step distillation requires prohibitively expensive high-quality video data. To overcome these limitations, we propose BLADE, an innovative data-free joint training framework that introduces: (1) an Adaptive Block-Sparse Attention (ASA) mechanism for dynamically generating content-aware sparsity masks to focus computation on salient spatiotemporal features, and (2) a sparsity-aware step distillation paradigm, built upon Trajectory Distribution Matching (TDM), that directly incorporates sparsity into the distillation process rather than treating it as a separate compression step, and features fast convergence. We validate BLADE on text-to-video models like CogVideoX-5B and Wan2.1-1.3B, and our framework demonstrates remarkable efficiency gains across different scales. BLADE achieves a 14.10× end-to-end inference acceleration over a 50-step baseline. Moreover, on models such as CogVideoX-5B with short video sequence lengths, our framework delivers a robust 8.89× speedup. Crucially, the acceleration is accompanied by a consistent quality improvement. Project is available at http://ziplab.co/BLADE-Homepage/. Diffusion models have emerged as the state-of-the-art for a wide array of generative tasks (Dhariwal & Nichol, 2021), achieving unprecedented quality in image synthesis (Cao et al., 2024; Esser et al., 2024; Labs et al., 2025) and now pushing the frontier in the complex domain of video generation (Blattmann et al., 2023; Xing et al., 2024).
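One common way to build a content-aware block-sparse mask, pooling a dense score map into blocks and keeping the top-scoring key blocks per query block, can be sketched as below. This is an illustrative stand-in, not the paper's exact ASA mechanism; the pooling choice, `block`, and `keep` parameters are my assumptions.

```python
import numpy as np

def block_sparse_mask(scores, block=4, keep=2):
    """Content-aware block sparsity sketch: mean-pool an (n, n) score
    map into (n/block, n/block) blocks, keep the top-`keep` key blocks
    per query block, then expand back to token resolution."""
    n = scores.shape[0]
    nb = n // block
    pooled = (scores[:nb * block, :nb * block]
              .reshape(nb, block, nb, block)
              .mean(axis=(1, 3)))           # block-level saliency scores
    mask = np.zeros((nb, nb), dtype=bool)
    topk = np.argsort(-pooled, axis=1)[:, :keep]  # most salient key blocks
    mask[np.arange(nb)[:, None], topk] = True
    return np.repeat(np.repeat(mask, block, 0), block, 1)
```

Attention then only needs to be computed inside the `True` blocks, which is where the quadratic cost saving comes from; a learned or distilled variant would replace the raw score map with a cheap saliency predictor.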