AITopics | sparse policy

Fat-to-Thin Policy Optimization: Offline RL with Sparse Policies

Zhu, Lingwei, Wang, Han, Nagai, Yukie

arXiv.org Artificial Intelligence, Jan-24-2025

Sparse continuous policies are distributions that can choose some actions at random while assigning strictly zero probability to all other actions, which makes them radically different from Gaussian policies. They have important real-world implications, e.g. in modeling safety-critical tasks such as medicine. Combining offline reinforcement learning with sparse policies provides a novel paradigm in which a safety-aware sparse policy is learned entirely from logged datasets. However, sparse policies cause difficulty for existing offline algorithms, which require evaluating actions that fall outside the current support. In this paper, we propose the first offline policy optimization algorithm that tackles this challenge: Fat-to-Thin Policy Optimization (FtTPO). Specifically, we maintain a fat (heavy-tailed) proposal policy that learns effectively from the dataset and injects knowledge into a thin (sparse) policy, which is responsible for interacting with the environment. We instantiate FtTPO with the general $q$-Gaussian family, which encompasses both heavy-tailed and sparse policies, and verify that it performs favorably in a safety-critical treatment simulation and the standard MuJoCo suite. Our code is available at https://github.com/lingweizhu/fat2thin.
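
The contrast between thin and fat policies in the abstract above can be illustrated with the textbook Tsallis $q$-Gaussian. The sketch below is not taken from the paper or its repository: it assumes the standard unnormalized form exp_q(-beta * (a - mean)^2), and the function names and the `beta` scale parameter are hypothetical. It shows that q < 1 gives exactly zero density outside a bounded support, while q > 1 gives heavier tails than the Gaussian (q = 1).

```python
import math

def q_exp(x: float, q: float) -> float:
    """Tsallis q-exponential [1 + (1 - q) x]_+ ** (1 / (1 - q)); exp(x) at q = 1.

    Only called with x <= 0 in this sketch.
    """
    if abs(q - 1.0) < 1e-12:
        return math.exp(x)
    base = 1.0 + (1.0 - q) * x
    if base <= 0.0:
        return 0.0  # truncation: this creates the exact zeros when q < 1
    return base ** (1.0 / (1.0 - q))

def q_gaussian_unnormalized(action, mean, beta, q):
    """Unnormalized q-Gaussian density; beta > 0 is an assumed scale parameter."""
    return q_exp(-beta * (action - mean) ** 2, q)

def log_density(action, mean, beta, q):
    """Log of the unnormalized density; -inf outside the support."""
    p = q_gaussian_unnormalized(action, mean, beta, q)
    return math.log(p) if p > 0.0 else float("-inf")

# Evaluate the same logged action (2.0) under three members of the family.
thin = q_gaussian_unnormalized(2.0, 0.0, 1.0, q=0.0)   # sparse: support is |a| < 1
gauss = q_gaussian_unnormalized(2.0, 0.0, 1.0, q=1.0)  # ordinary Gaussian shape
fat = q_gaussian_unnormalized(2.0, 0.0, 1.0, q=2.0)    # heavy-tailed
```

Under these assumptions, a dataset action outside the thin policy's support has log-density `-inf`, which is one way to see why an offline objective that evaluates such actions becomes ill-defined, whereas the heavy-tailed member still assigns the same action positive density.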

machine learning, reinforcement learning, sparse policy

arXiv:2501.14373

Country:

North America > Canada > Alberta (0.14)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
North America > United States > New York (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Optimal Policy Sparsification and Low Rank Decomposition for Deep Reinforcement Learning

Goddla, Vikram

arXiv.org Artificial Intelligence, Mar-10-2024

Deep reinforcement learning (DRL) has shown significant promise in a wide range of applications, including computer games and robotics. Yet training DRL policies consumes extraordinary computing resources and results in dense policies that are prone to overfitting. Moreover, the cost of inference with dense DRL policies limits their practical applications, especially in edge computing. Techniques such as pruning and singular value decomposition have been used with deep learning models to achieve sparsification and model compression, limiting overfitting and reducing memory consumption. However, these techniques have resulted in sub-optimal performance with notable decay in rewards. $L_1$ and $L_2$ regularization techniques have been proposed for neural network sparsification and sparse auto-encoder development, but their implementation in DRL environments has not been apparent. We propose a novel $L_0$-norm-regularization technique that uses an optimal sparsity map to sparsify DRL policies and promote their decomposition to a lower rank without decay in rewards. We evaluated our $L_0$-norm-regularization technique across five different environments (Cartpole-v1, Acrobot-v1, LunarLander-v2, SuperMarioBros-7.1.v0 and Surgical Robot Learning) using several on-policy and off-policy algorithms. We demonstrated that the $L_0$-norm-regularized DRL policy in the SuperMarioBros environment achieved 93% sparsity and gained 70% compression when subjected to low-rank decomposition, while significantly outperforming the dense policy. Additionally, the $L_0$-norm-regularized DRL policy in the Surgical Robot Learning environment achieved 36% sparsification and gained 46% compression when decomposed to a lower rank, while remaining performant. The results suggest that our custom $L_0$-norm-regularization technique for sparsification of DRL policies is a promising avenue to reduce computational resources and limit overfitting.
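
The abstract above reports sparsity and compression percentages without defining them. The sketch below shows one plausible reading, under stated assumptions: sparsity is taken as the fraction of zero weights (one minus the $L_0$ norm over the total count), and compression as the fraction of parameters saved by storing a matrix as two rank-r factors. Plain magnitude thresholding stands in for the paper's $L_0$ regularization with an optimal sparsity map, which is not reproduced here; all function names are hypothetical.

```python
def l0_norm(weights):
    """Number of nonzero entries in a weight matrix given as a list of rows."""
    return sum(1 for row in weights for w in row if w != 0.0)

def sparsity(weights):
    """Fraction of entries that are exactly zero."""
    total = sum(len(row) for row in weights)
    return 1.0 - l0_norm(weights) / total

def magnitude_prune(weights, threshold):
    """Stand-in sparsifier: zero every weight whose magnitude is below threshold."""
    return [[w if abs(w) >= threshold else 0.0 for w in row] for row in weights]

def low_rank_compression(rows, cols, rank):
    """Fraction of parameters saved by replacing a (rows x cols) matrix
    with factors of shape (rows x rank) and (rank x cols)."""
    dense = rows * cols
    factored = rank * (rows + cols)
    return 1.0 - factored / dense

# Toy 2 x 4 layer, pruned at an arbitrary threshold of 0.1.
layer = [[0.9, -0.02, 0.0, 0.4],
         [0.01, 0.0, -0.7, 0.03]]
pruned = magnitude_prune(layer, 0.1)
pruned_sparsity = sparsity(pruned)             # 5 of 8 entries are zero
saving = low_rank_compression(64, 64, 8)       # 1024 stored values instead of 4096
```

Note that a rank-r factorization only saves parameters when r * (rows + cols) is smaller than rows * cols, so the achievable compression depends on how low the rank of the trained policy's weight matrices actually is.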

reinforcement, sparse policy, sparsification

arXiv:2403.06313

Country: North America > United States > California > Los Angeles County > Beverly Hills (0.04)

Genre: Research Report > New Finding (0.87)

Industry:

Health & Medicine (0.93)
Leisure & Entertainment > Games > Computer Games (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference

Dong, Harry, Yang, Xinyu, Zhang, Zhenyu, Wang, Zhangyang, Chi, Yuejie, Chen, Beidi

arXiv.org Artificial Intelligence, Feb-14-2024

Many computational factors limit broader deployment of large language models. In this paper, we focus on a memory bottleneck imposed by the key-value (KV) cache, a computational shortcut that requires storing previous KV pairs during decoding. While existing KV cache methods approach this problem by pruning or evicting large swaths of relatively less important KV pairs to dramatically reduce the memory footprint of the cache, they can have limited success in tasks that require recollecting a majority of previous tokens. To alleviate this issue, we propose LESS, a simple integration of a (nearly free) constant-sized cache with eviction-based cache methods, such that all tokens can be queried at later decoding steps. Its ability to retain information over time proves useful on a variety of tasks, where we demonstrate that LESS can help reduce the performance gap relative to caching everything, sometimes even matching it, all while remaining efficient.
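
The idea of pairing an eviction-based cache with a constant-sized one can be sketched as follows. This is an illustration of the general pattern only, not the LESS method: the abstract does not say how the constant-sized cache is built, so the running sum of evicted values used here is a hypothetical placeholder, as are the class name and the sliding-window eviction rule.

```python
from collections import deque

class WindowedKVCache:
    """Keeps the most recent `window` KV pairs verbatim and folds every
    evicted pair into a fixed-size summary instead of discarding it."""

    def __init__(self, window: int, dim: int):
        self.window = window
        self.recent = deque()          # exact KV pairs, at most `window` of them
        self.summary = [0.0] * dim     # constant-size state for evicted tokens
        self.evicted = 0

    def append(self, key, value):
        self.recent.append((key, value))
        if len(self.recent) > self.window:
            _, old_value = self.recent.popleft()
            # Placeholder merge rule (assumption): accumulate evicted values.
            self.summary = [s + v for s, v in zip(self.summary, old_value)]
            self.evicted += 1

    def memory_slots(self) -> int:
        """Stored entries: the exact pairs plus one summary slot."""
        return len(self.recent) + 1

# Decode five tokens with a window of two.
cache = WindowedKVCache(window=2, dim=2)
for t in range(5):
    cache.append([float(t), 0.0], [1.0, float(t)])
```

The memory footprint stays at `window + 1` slots no matter how many tokens are decoded, while information from evicted tokens remains reachable through the summary. How faithfully that summary can stand in for the evicted pairs is the hard part, and it is exactly what this placeholder does not address.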

arxiv preprint arxiv, cache, sparse policy

arXiv:2402.09398

Country:

North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)
