AITopics | sal

Collaborating Authors

sal

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

SALS: Sparse Attention in Latent Space for KV Cache Compression

Neural Information Processing SystemsJun-9-2026, 14:35:35 GMT

Large Language Models (LLMs) capable of handling extended contexts are in high demand, yet their inference remains challenging due to substantial Key-Value (KV) cache size and high memory bandwidth requirements. Previous research has demonstrated that KV cache exhibits low-rank characteristics within the hidden dimension, suggesting the potential for effective compression. However, due to the widely adopted Rotary Position Embedding (RoPE) mechanism in modern LLMs, naive low -rank compression suffers severe accuracy degradation or creates a new speed bottleneck, as the low-rank cache must first be reconstructed in order to apply RoPE. In this paper, we introduce two key insights: first, the application of RoPE to the key vectors increases their variance, which in turn results in a higher rank; second, after the key vectors are transformed into the latent space, they largely maintain their representation across most layers. Based on these insights, we propose the Sparse Attention in Latent Space (SALS) framework.

artificial intelligence, large language model, natural language, (8 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.38)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

SupplementaryMaterials: ImprovingDeepLearning InterpretabilitybySaliencyGuidedTraining

Neural Information Processing SystemsFeb-19-2026, 10:23:58 GMT

This would be particularly useful for large datasets like imagenet. Table 2 shows the area under accuracydrop curve(AUC) on MNIST Figure 4for gradient when training traditionally,training using saliencyguided procedure andfine-tuning (smaller AUCindicates better performance).

artificial intelligence, machine learning, trad, (14 more...)

Neural Information Processing Systems

Country: Oceania > Australia > New South Wales > Sydney (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.32)

Add feedback

Unlocking the Power of Boltzmann Machines by Parallelizable Sampler and Efficient Temperature Estimation

Kubo, Kentaro, Goto, Hayato

arXiv.org Machine LearningDec-3-2025

Boltzmann machines (BMs) are powerful energy-based generative models, but their heavy training cost has largely confined practical use to Restricted BMs (RBMs) trained with an efficient learning method called contrastive divergence. More accurate learning typically requires Markov chain Monte Carlo (MCMC) Boltzmann sampling, but it is time-consuming due to the difficulty of parallelization for more expressive models. To address this limitation, we first propose a new Boltzmann sampler inspired by a quantum-inspired combinatorial optimization called simulated bifurcation (SB). This SB-inspired approach, which we name Langevin SB (LSB), enables parallelized sampling while maintaining accuracy comparable to MCMC. Furthermore, this is applicable not only to RBMs but also to BMs with general couplings. However, LSB cannot control the inverse temperature of the output Boltzmann distribution, which hinders learning and degrades performance. To overcome this limitation, we also developed an efficient method for estimating the inverse temperature during the learning process, which we call conditional expectation matching (CEM). By combining LSB and CEM, we establish an efficient learning framework for BMs with greater expressive power than RBMs. We refer to this framework as sampler-adaptive learning (SAL). SAL opens new avenues for energy-based generative modeling beyond RBMs.

boltzmann machine, eff, srbm, (16 more...)

arXiv.org Machine Learning

2512.02323

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > California (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(4 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

SALS: Sparse Attention in Latent Space for KV cache Compression

Mu, Junlin, Huang, Hantao, Zhang, Jihang, Yu, Minghui, Wang, Tao, Li, Yidong

arXiv.org Artificial IntelligenceOct-29-2025

Large Language Models capable of handling extended contexts are in high demand, yet their inference remains challenging due to substantial Key-Value cache size and high memory bandwidth requirements. Previous research has demonstrated that KV cache exhibits low-rank characteristics within the hidden dimension, suggesting the potential for effective compression. However, due to the widely adopted Rotary Position Embedding mechanism in modern LLMs, naive low-rank compression suffers severe accuracy degradation or creates a new speed bottleneck, as the low-rank cache must first be reconstructed in order to apply RoPE. In this paper, we introduce two key insights: first, the application of RoPE to the key vectors increases their variance, which in turn results in a higher rank; second, after the key vectors are transformed into the latent space, they largely maintain their representation across most layers. Based on these insights, we propose the Sparse Attention in Latent Space framework. SALS projects the KV cache into a compact latent space via low-rank projection, and performs sparse token selection using RoPE-free query-key interactions in this space. By reconstructing only a small subset of important tokens, it avoids the overhead of full KV cache reconstruction. We comprehensively evaluate SALS on various tasks using two large-scale models: LLaMA2-7b-chat and Mistral-7b, and additionally verify its scalability on the RULER-128k benchmark with LLaMA3.1-8B-Instruct. Experimental results demonstrate that SALS achieves SOTA performance by maintaining competitive accuracy. Under different settings, SALS achieves 6.4-fold KV cache compression and 5.7-fold speed-up in the attention operator compared to FlashAttention2 on the 4K sequence. For the end-to-end throughput performance, we achieves 1.4-fold and 4.5-fold improvement compared to GPT-fast on 4k and 32K sequences, respectively.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2510.24273

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Add feedback

Soundness-Aware Level: A Microscopic Signature that Predicts LLM Reasoning Potential

Wu, Xuansheng, Pan, Xiaoman, Yao, Wenlin, Chen, Jianshu

arXiv.org Artificial IntelligenceOct-22-2025

Reinforcement learning with verifiable rewards (RLVR) can elicit strong reasoning in large language models (LLMs), while their performance after RLVR varies dramatically across different base models. This raises a fundamental question: what microscopic property of pre-trained models leads to this variation? To investigate, we formalize reasoning as chains of Horn clauses ("if-then" rules) built from features extracted from the LLM's latent space via cross-layer sparse autoencoders (SAEs). We estimate the transition probabilities between its features, and further categorize each rule by its semantic soundness level (e.g., strict, plausible, noisy) with an LLM. Our key discovery is that high-potential models are inherently soundness-aware: their internal probability distributions systematically shift across rules' soundness levels, becoming highly distinct for "strict" versus "noisy" rules. In contrast, weaker models are soundness-agnostic, collapsing to one distribution regardless of soundness levels. To quantify this, we introduce the Soundness-Aware Level (SAL), a microscopic metric using the Jensen-Shannon Divergence to measure the separation between these distributions. We show that SAL's predictions of post-RLVR reasoning performance follow a precise empirical law (R^2=0.87) across diverse model families (Qwen, Mistral, Llama, DeepSeek) and scales (0.5B-14B). This reveals that a model's reasoning potential is tied to its intrinsic, pre-trained ability to distinguish sound knowledge from unsound ones. These findings underscore the critical role of model pre-training in shaping reasoning and offer a practical metric grounded in the model's internal mechanisms for selecting/designing stronger base models.

arxiv preprint arxiv, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2510.15216

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

How Far Are We from True Unlearnability?

Ye, Kai, Su, Liangcai, Qian, Chenxiong

arXiv.org Artificial IntelligenceSep-11-2025

High-quality data plays an indispensable role in the era of large models, but the use of unauthorized data for model training greatly damages the interests of data owners. To overcome this threat, several unlearnable methods have been proposed, which generate unlearnable examples (UEs) by compromising the training availability of data. Clearly, due to unknown training purposes and the powerful representation learning capabilities of existing models, these data are expected to be unlearnable for models across multiple tasks, i.e., they will not help improve the model's performance. However, unexpectedly, we find that on the multi-task dataset Taskonomy, UEs still perform well in tasks such as semantic segmentation, failing to exhibit cross-task unlearnability. This phenomenon leads us to question: How far are we from attaining truly unlearnable examples? We attempt to answer this question from the perspective of model optimization. To this end, we observe the difference in the convergence process between clean and poisoned models using a simple model architecture. Subsequently, from the loss landscape we find that only a part of the critical parameter optimization paths show significant differences, implying a close relationship between the loss landscape and unlearnability. Consequently, we employ the loss landscape to explain the underlying reasons for UEs and propose Sharpness-Aware Learnability (SAL) to quantify the unlearnability of parameters based on this explanation. Furthermore, we propose an Unlearnable Distance (UD) to measure the unlearnability of data based on the SAL distribution of parameters in clean and poisoned models. Finally, we conduct benchmark tests on mainstream unlearnable methods using the proposed UD, aiming to promote community awareness of the capability boundaries of existing unlearnable methods.

artificial intelligence, machine learning, unlearnability, (15 more...)

arXiv.org Artificial Intelligence

2509.08058

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Vision (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback

HRS: Hybrid Representation Framework with Scheduling Awareness for Time Series Forecasting in Crowdsourced Cloud-Edge Platforms

Zhang, Tiancheng, Zhang, Cheng, Liu, Shuren, Wang, Xiaofei, Huang, Shaoyuan, Wang, Wenyu

arXiv.org Artificial IntelligenceAug-20-2025

With the rapid proliferation of streaming services, network load exhibits highly time-varying and bursty behavior, posing serious challenges for maintaining Quality of Service (QoS) in Crowdsourced Cloud-Edge Platforms (CCPs). While CCPs leverage Predict-then-Schedule architecture to improve QoS and profitability, accurate load forecasting remains challenging under traffic surges. Existing methods either minimize mean absolute error, resulting in underprovisioning and potential Service Level Agreement (SLA) violations during peak periods, or adopt conservative overprovision-ing strategies, which mitigate SLA risks at the expense of increased resource expenditure. To address this dilemma, we propose HRS, a H ybrid R epresentation framework with S cheduling awareness that integrates numerical and image-based representations to better capture extreme load dynamics. We further introduce a Scheduling-A ware Loss (SAL) that captures the asymmetric impact of prediction errors, guiding predictions that better support scheduling decisions. Extensive experiments on four real-world datasets demonstrate that HRS consistently outperforms ten baselines and achieves state-of-the-art performance, reducing SLA violation rates by 63.1% and total profit loss by 32.3%. Our code is available at [28].

artificial intelligence, data mining, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2508.12839

Country: Asia > China (0.46)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Communications > Social Media > Crowdsourcing (0.61)

Add feedback

NDF (cars) SAL[4 ] (cars) GT Ours GT Ours GT Ours Steps vs Error

Neural Information Processing SystemsAug-17-2025, 07:58:42 GMT

We thank all reviewers for their useful feedback. Reviewers acknowledge that NDFs are "well founded", "simple and general, which promises wider applicability"[ There is no evidence that it should work competitively on all classes together. We find that NDF outperforms all baselines. "Is there a test/train split, and are quantitative statistics provided on the test We follow the suggestion and added training time. We also transferred key information into the main paper.

artificial intelligence, machine learning, ndf, (15 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.36)

Add feedback

Stress-Aware Resilient Neural Training

Shakarami, Ashkan, Yeganeh, Yousef, Farshad, Azade, Nicole, Lorenzo, Ghidoni, Stefano, Navab, Nassir

arXiv.org Artificial IntelligenceAug-4-2025

This paper introduces Stress-Aware Learning, a resilient neural training paradigm in which deep neural networks dynamically adjust their optimization behavior - whether under stable training regimes or in settings with uncertain dynamics - based on the concept of Temporary (Elastic) and Permanent (Plastic) Deformation, inspired by structural fatigue in materials science. To instantiate this concept, we propose Plastic Deformation Optimizer, a stress-aware mechanism that injects adaptive noise into model parameters whenever an internal stress signal - reflecting stagnation in training loss and accuracy - indicates persistent optimization difficulty. This enables the model to escape sharp minima and converge toward flatter, more generalizable regions of the loss landscape. Experiments across six architectures, four optimizers, and seven vision benchmarks demonstrate improved robustness and generalization with minimal computational overhead. The code and 3D visuals will be available on GitHub: https://github.com/Stress-Aware-Learning/SAL.

intervention, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2508.00098

Country:

North America (0.28)
Europe (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Add feedback

Modal Logic for Stratified Becoming: Actualization Beyond Possible Worlds

Nepvou, Alexandre Le

arXiv.org Artificial IntelligenceJun-24-2025

This article develops a novel framework for modal logic based on the idea of stratified actualization, rather than the classical model of global possible worlds. Traditional Kripke semantics treat modal operators as quantification over fully determinate alternatives, neglecting the local, dynamic, and often asymmetric nature of actualization processes. We propose a system Stratified Actualization Logic (SAL) in which modalities are indexed by levels of ontological stability, interpreted as admissibility regimes. Each modality operates over a structured layer of possibility, grounded in the internal coherence of transitions between layers. We formally define the syntax and semantics of SAL, introduce its axioms, and prove soundness and completeness. Applications are discussed in connection with temporal becoming, quantum decoherence domains, and modal metaphysics. The result is a logic that captures the ontological structure of actualization without recourse to abstract possible worlds, offering a stratified alternative to standard modal realism.

artificial intelligence, logic & formal reasoning, logic programming, (17 more...)

arXiv.org Artificial Intelligence

2506.17276

Country: Europe (0.28)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)

Add feedback