Cache


How squirrels actually find all their buried nuts

Popular Science

Every fall, squirrels stash hundreds of acorns to survive the colder winter months--and they use smell, memory, and even theft to get them back. As someone who routinely "hides" things from myself--car keys, receipts, even my phone while I'm actively talking on it--I felt instantly validated by Sarah Silverman's joke that squirrels forget where they bury 80% of their nuts. "And that's how trees are planted!"


Improving the Serving Performance of Multi-LoRA Large Language Models via Efficient LoRA and KV Cache Management

Zhang, Hang, Shi, Jiuchen, Wang, Yixiao, Chen, Quan, Shan, Yizhou, Guo, Minyi

arXiv.org Artificial Intelligence

Multiple Low-Rank Adapters (Multi-LoRAs) are gaining popularity for task-specific Large Language Model (LLM) applications. For multi-LoRA serving, caching hot KV caches and LoRA adapters in the high-bandwidth memory (HBM) of accelerators can improve inference performance. However, existing Multi-LoRA inference systems fail to optimize serving performance metrics such as Time-To-First-Token (TTFT), neglecting usage dependencies when caching LoRAs and KVs. We therefore propose FASTLIBRA, a Multi-LoRA caching system that optimizes serving performance. FASTLIBRA comprises a dependency-aware cache manager and a performance-driven cache swapper. The cache manager maintains the usage dependencies between LoRAs and KV caches during inference with a unified caching pool. The cache swapper determines the swap-in or swap-out of LoRAs and KV caches based on a unified cost model when the HBM is idle or busy, respectively. Experimental results show that FASTLIBRA reduces the TTFT by 63.4% on average compared to state-of-the-art works.
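
As a rough illustration of the dependency-aware idea, here is a minimal Python sketch; the class names, fields, and toy cost model below are assumptions for illustration, not FASTLIBRA's actual implementation. A single pool holds both LoRA adapters and KV-cache blocks, records which KV blocks were produced under which adapter, and evicts whatever is cheapest to re-fetch when memory runs short.

from collections import defaultdict

class UnifiedCachePool:
    # Toy unified pool for LoRA adapters and KV-cache blocks (illustrative only).

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.entries = {}                    # key -> {"size", "kind", "hotness"}
        self.kv_of_lora = defaultdict(set)   # usage dependency: LoRA key -> its KV block keys

    def put(self, key, size, kind, lora_id=None, hotness=1.0):
        while self.entries and self.used + size > self.capacity:
            self._evict_cheapest()
        self.entries[key] = {"size": size, "kind": kind, "hotness": hotness}
        self.used += size
        if kind == "kv" and lora_id is not None:
            self.kv_of_lora[lora_id].add(key)

    def _refetch_cost(self, key):
        entry = self.entries[key]
        cost = entry["hotness"] * entry["size"]
        if entry["kind"] == "lora":
            # Evicting an adapter also invalidates KV blocks computed under it,
            # so fold their reload cost into the adapter's eviction cost.
            cost += sum(self.entries[k]["size"]
                        for k in self.kv_of_lora.get(key, ()) if k in self.entries)
        return cost

    def _evict_cheapest(self):
        victim = min(self.entries, key=self._refetch_cost)
        entry = self.entries.pop(victim)
        self.used -= entry["size"]
        if entry["kind"] == "lora":
            for k in self.kv_of_lora.pop(victim, set()):
                kv = self.entries.pop(k, None)
                if kv is not None:
                    self.used -= kv["size"]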


CacheFocus: Dynamic Cache Re-Positioning for Efficient Retrieval-Augmented Generation

Lee, Kun-Hui, Park, Eunhwan, Han, Donghoon, Na, Seung-Hoon

arXiv.org Artificial Intelligence

Large Language Models (LLMs) excel across a variety of language tasks yet are constrained by limited input lengths and high computational costs. Existing approaches, such as relative positional encodings (e.g., RoPE, ALiBi) and sliding window mechanisms, partially alleviate these issues but often require additional training or suffer from performance degradation with longer inputs. In this paper, we introduce CacheFocus, a method that enhances length normalization and reduces inference latency without any further training. Our approach leverages query-independent, offline caching to efficiently reuse a Context KV Cache Store. We address the problem of amplified abnormal token distributions by re-positioning cached keys and introducing Layer-Adaptive Cache Pruning to discard low-relevance caches during pre-filling. Additionally, our Adaptive Positional Allocation Strategy dynamically reassigns cache positions to maximize the use of the available positional encoding range. Experiments on the Natural Questions and TriviaQA datasets demonstrate that CacheFocus outperforms alternative methods even when inputs exceed the 4K limit of the LLaMA-2 model, emphasizing its practical effectiveness for long-context LLMs. Moreover, even with the large maximum input length of Qwen2, CacheFocus maintains consistent performance as the number of documents increases, effectively managing long-text generation without degradation.
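
For intuition about re-positioning cached keys, the following is a small numpy sketch; the function names and the interleaved RoPE layout are assumptions, not the paper's code. Because rotary embeddings compose, keys that were cached offline at local positions 0..L-1 can be shifted to a new slot in the prompt by applying one extra rotation by the position offset.

import numpy as np

def rope_rotate(x, positions, base=10000.0):
    # Apply an interleaved rotary rotation by `positions` to vectors x of shape (L, d).
    d = x.shape[-1]
    inv_freq = 1.0 / (base ** (np.arange(0, d, 2) / d))   # (d/2,)
    angles = np.outer(positions, inv_freq)                # (L, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

def reposition_cached_keys(cached_keys_per_doc, start_positions):
    # Each document's keys were cached offline at local positions 0..L-1; shift
    # them to their new slot in the prompt by one extra rotation by the offset.
    repositioned = []
    for keys, new_start in zip(cached_keys_per_doc, start_positions):
        offsets = np.full(keys.shape[0], new_start)
        repositioned.append(rope_rotate(keys, offsets))
    return np.concatenate(repositioned, axis=0)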


Parallel Key-Value Cache Fusion for Position Invariant RAG

Oh, Philhoon, Shin, Jinwoo, Thorne, James

arXiv.org Artificial Intelligence

Recent advancements in Large Language Models (LLMs) underscore the necessity of Retrieval-Augmented Generation (RAG) to leverage external information. However, LLMs are sensitive to the position of relevant information within contexts and tend to generate incorrect responses when such information is placed in the middle, known as the 'Lost in the Middle' phenomenon. In this paper, we introduce a framework that generates consistent outputs for decoder-only models, irrespective of the input context order. Experimental results on three open-domain question answering tasks demonstrate position invariance, where the model is not sensitive to input context order, and superior robustness to irrelevant passages compared to prevailing approaches for RAG pipelines.
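
The order-invariance intuition can be illustrated with a toy numpy sketch; this is purely illustrative and not the paper's fusion mechanism. If each passage's KV cache is computed independently over the same positional range, attention over the concatenation of those caches gives the same output no matter how the passages are ordered.

import numpy as np

def attend(query, fused_keys, fused_values):
    # Single-query softmax attention over the fused (concatenated) caches.
    scores = fused_keys @ query / np.sqrt(query.shape[-1])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ fused_values

rng = np.random.default_rng(0)
d = 8
passages = [rng.normal(size=(5, d)) for _ in range(3)]   # toy per-passage caches (keys = values)
query = rng.normal(size=d)

order_a, order_b = [0, 1, 2], [2, 0, 1]
out_a = attend(query, np.concatenate([passages[i] for i in order_a]),
               np.concatenate([passages[i] for i in order_a]))
out_b = attend(query, np.concatenate([passages[i] for i in order_b]),
               np.concatenate([passages[i] for i in order_b]))
assert np.allclose(out_a, out_b)   # same answer regardless of passage order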


Compressed Sensor Caching and Collaborative Sparse Data Recovery with Anchor Alignment

Yang, Yi-Jen, Yang, Ming-Hsun, Wu, Jwo-Yuh, Hong, Y. -W. Peter

arXiv.org Artificial Intelligence

This work examines the compressed sensor caching problem in wireless sensor networks and devises efficient distributed sparse data recovery algorithms to enable collaboration among multiple caches. In this problem, each cache is only allowed to access measurements from a small subset of sensors within its vicinity to reduce both cache size and data acquisition overhead. To enable reliable data recovery with limited access to measurements, we propose a distributed sparse data recovery method, called the collaborative sparse recovery by anchor alignment (CoSR-AA) algorithm, where collaboration among caches is enabled by aligning their locally recovered data at a few anchor nodes. The proposed algorithm is based on the consensus alternating direction method of multipliers (ADMM) algorithm, but with message exchange reduced by the proposed anchor alignment strategy. Then, by deep unfolding of the ADMM iterations, we further propose the Deep CoSR-AA algorithm, which significantly reduces the number of iterations; the result is a graph neural network architecture in which message exchange is performed more efficiently by an embedded autoencoder. Simulations demonstrate the effectiveness of the proposed collaborative recovery algorithms in terms of improved reconstruction quality and reduced communication overhead due to anchor alignment.
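
A simplified Python sketch of the collaboration pattern is below; it is an assumption-heavy toy in which plain ISTA local updates stand in for the paper's consensus-ADMM steps and alignment is a simple average at the anchor indices. Each cache recovers the signal from its own measurements, and only the anchor entries are exchanged between caches, which is the message-reduction idea behind anchor alignment.

import numpy as np

def soft_threshold(x, tau):
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def collaborative_recovery(As, ys, anchors, lam=0.05, steps=200):
    n = As[0].shape[1]
    xs = [np.zeros(n) for _ in As]
    etas = [1.0 / np.linalg.norm(A, 2) ** 2 for A in As]   # per-cache ISTA step sizes
    for _ in range(steps):
        # Local proximal-gradient (ISTA) step on each cache's own measurements.
        xs = [soft_threshold(x - eta * A.T @ (A @ x - y), lam * eta)
              for x, A, y, eta in zip(xs, As, ys, etas)]
        # Anchor alignment: only the anchor entries are exchanged and averaged.
        anchor_mean = np.mean([x[anchors] for x in xs], axis=0)
        for x in xs:
            x[anchors] = anchor_mean
    return xs

# Usage: three caches observe compressed measurements of the same sparse signal.
rng = np.random.default_rng(1)
n, k = 60, 4
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.normal(size=k)
As = [rng.normal(size=(25, n)) / np.sqrt(25) for _ in range(3)]
ys = [A @ x_true for A in As]
estimates = collaborative_recovery(As, ys, anchors=np.arange(0, n, 10))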


You Only Cache Once: Decoder-Decoder Architectures for Language Models

Sun, Yutao, Dong, Li, Zhu, Yi, Huang, Shaohan, Wang, Wenhui, Ma, Shuming, Zhang, Quanlu, Wang, Jianyong, Wei, Furu

arXiv.org Artificial Intelligence

We introduce a decoder-decoder architecture, YOCO, for large language models, which only caches key-value pairs once. It consists of two components, i.e., a cross-decoder stacked upon a self-decoder. The self-decoder efficiently encodes global key-value (KV) caches that are reused by the cross-decoder via cross-attention. The overall model behaves like a decoder-only Transformer, although YOCO only caches once. The design substantially reduces GPU memory demands, yet retains global attention capability. Additionally, the computation flow enables prefilling to early exit without changing the final output, thereby significantly speeding up the prefill stage. Experimental results demonstrate that YOCO achieves favorable performance compared to Transformer in various settings of scaling up model size and number of training tokens. We also extend YOCO to 1M context length with near-perfect needle retrieval accuracy. The profiling results show that YOCO improves inference memory, prefill latency, and throughput by orders of magnitude across context lengths and model sizes. Code is available at https://aka.ms/YOCO.
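
A back-of-the-envelope sketch of the memory effect follows; the shapes and layer counts are placeholder assumptions, not YOCO's published configuration. A decoder-only Transformer keeps a KV cache per layer, while a YOCO-style decoder-decoder keeps one global KV cache that all cross-decoder layers reuse.

def kv_bytes(tokens, cached_layers, kv_heads, head_dim, bytes_per_elem=2):
    # Two tensors (K and V) per cached layer, fp16/bf16 elements by default.
    return 2 * tokens * cached_layers * kv_heads * head_dim * bytes_per_elem

tokens, layers, kv_heads, head_dim = 1_000_000, 32, 8, 128

# Decoder-only Transformer: every layer keeps its own KV cache for the full context.
decoder_only = kv_bytes(tokens, layers, kv_heads, head_dim)

# YOCO-style decoder-decoder: one global KV cache shared by all cross-decoder layers
# (the self-decoder's small local cache is ignored in this toy count).
yoco_like = kv_bytes(tokens, 1, kv_heads, head_dim)

print(f"decoder-only KV cache: {decoder_only / 2**30:.1f} GiB")
print(f"YOCO-style global KV cache: {yoco_like / 2**30:.1f} GiB")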


Alfresco Repository Caches Unfolded

#artificialintelligence

Optimisation of the Alfresco repository caches can have a significant impact on the performance of your Alfresco deployment. This post provides an overview of how the repository caches are implemented by Alfresco. The Alfresco repository both leverages and provides in-memory caches. Memory caching (often simply referred to as caching) is a technique in which computer applications temporarily store data in a computer's main memory (i.e., random access memory, or RAM) to enable fast retrieval of that data. The RAM used for this temporary storage is known as the cache.
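
As a generic illustration of the memory-caching pattern described above (not Alfresco's actual cache classes or API), a minimal Python LRU cache looks like the following: recently used values stay in RAM, and lookups only fall back to the slow backend on a miss.

from collections import OrderedDict

class LRUCache:
    # Keep recently used values in RAM; hit the slow backend only on a miss.
    def __init__(self, max_items=1024):
        self.max_items = max_items
        self._data = OrderedDict()

    def get(self, key, load_fn):
        if key in self._data:
            self._data.move_to_end(key)      # cache hit: mark as most recently used
            return self._data[key]
        value = load_fn(key)                 # cache miss: load from the repository/database
        self._data[key] = value
        if len(self._data) > self.max_items:
            self._data.popitem(last=False)   # evict the least recently used entry
        return value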


Research Paves the Way for Honey-Based Neuromorphic Computing

#artificialintelligence

Researchers at Washington State University have built a proof-of-concept device that includes one of the crucial circuits for neuromorphic computing, the memristor, made from an unlikely medium: honey. The researchers hope their work paves the way for biodegradable, sustainable, organic-based computing systems that are orders of magnitude more efficient than conventional computing architectures. To build the device, the researchers processed true, bee-sourced honey into a solid form held between two metal electrodes, much like how the synapses in your brain lie between pairs of neurons. The device was then tested for its ability to quickly switch on and off at speeds of 100 to 500 nanoseconds, comparable to its biological counterparts, and it succeeded. "This is a very small device with a simple structure, but it has very similar functionalities to a human neuron," said Feng Zhao, associate professor in WSU's School of Engineering and Computer Science, in the announcement.


Birds get angry when their favourite snacks are swapped in magic trick

New Scientist

Jays react angrily when shown a cup-and-balls-style magic trick in which their favourite snack is swapped for a less appealing one. Their responses show cognitive abilities that may come into play when they pilfer food caches hidden by other birds. Eurasian jays (Garrulus glandarius) have impressive memories and show some capacity for imagining the beliefs and intentions of others, known as theory of mind. As such, Alexandra Schnell and her colleagues at the University of Cambridge wondered whether jays would be sensitive to cognitive illusions designed to fool humans. First, they tested six birds to find out which food each one preferred from a choice of worms, cheese and peanuts.


Applying the Roofline model for Deep Learning performance optimizations

Czaja, Jacek, Gallus, Michal, Wozna, Joanna, Grygielski, Adam, Tao, Luo

arXiv.org Artificial Intelligence

In this paper, we present a methodology for automatically creating Roofline models for Non-Uniform Memory Access (NUMA) systems, using Intel Xeon processors as an example. Finally, we present an evaluation of highly efficient deep learning primitives as implemented in the Intel oneDNN library.
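
For readers unfamiliar with the model, here is a minimal Python sketch of the Roofline calculation; the peak-compute and bandwidth figures are placeholders, not measured Xeon or per-NUMA-node numbers. Attainable performance is the lesser of the machine's peak compute and what the memory system can feed at a kernel's arithmetic intensity.

def attainable_gflops(arithmetic_intensity, peak_gflops, bandwidth_gbs):
    # Roofline: performance is capped by compute or by memory bandwidth,
    # whichever bound binds at the kernel's arithmetic intensity (FLOPs/byte).
    return min(peak_gflops, arithmetic_intensity * bandwidth_gbs)

peak_gflops = 3000.0    # hypothetical per-socket peak compute
bandwidth_gbs = 100.0   # hypothetical DRAM bandwidth of one NUMA node

for name, ai in [("blocked convolution", 40.0), ("elementwise ReLU", 0.25)]:
    roof = attainable_gflops(ai, peak_gflops, bandwidth_gbs)
    bound = "compute" if ai * bandwidth_gbs >= peak_gflops else "memory"
    print(f"{name}: roofline {roof:.0f} GFLOP/s ({bound}-bound)")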