datastore
Scaling Retrieval-Based Language Models with a Trillion-Token Datastore
Scaling laws with respect to the amount of training data and the number of parameters allow us to predict the cost-benefit trade-offs of pretraining language models (LMs) in different configurations. In this paper, we consider another dimension of scaling: the amount of data available at inference time. Specifically, we find that increasing the size of the datastore used by a retrieval-based LM monotonically improves language modeling and several downstream tasks without obvious saturation, such that a smaller model augmented with a large datastore outperforms a larger LM-only model on knowledge-intensive tasks. By plotting compute-optimal scaling curves with varied datastore, model, and pretraining data sizes, we show that using larger datastores can significantly improve model performance for the same training compute budget. We carry out our study by constructing a 1.4 trillion-token datastore named MassiveDS, which is the largest and most diverse open-source datastore for retrieval-based LMs to date, and designing an efficient pipeline for studying datastore scaling in an accessible manner. Finally, we analyze the effect of improving the retriever, datastore quality filtering, and other design choices on our observed scaling trends. Overall, our results show that datastore size should be considered an integral part of LM efficiency and performance trade-offs.
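To ground the idea, here is a minimal kNN-LM-style sketch of how a datastore enters next-token prediction: retrieve the nearest stored contexts, turn their distances into weights over the recorded next tokens, and interpolate with the parametric LM. This illustrates the general retrieval-based setup the paper scales, not the MassiveDS pipeline itself; all names, sizes, and the mixing weight lam are stand-ins.

```python
import numpy as np

def knn_lm_next_token(lm_probs, datastore_keys, datastore_next_tokens,
                      query, k=4, temperature=1.0, lam=0.25):
    """Interpolate LM probabilities with a kNN distribution retrieved from
    a (context embedding -> next token) datastore. Illustrative only."""
    # Distance from the query context embedding to every stored key.
    dists = np.linalg.norm(datastore_keys - query, axis=1)
    nearest = np.argsort(dists)[:k]
    # Softmax over negative distances gives retrieval weights.
    w = np.exp(-dists[nearest] / temperature)
    w /= w.sum()
    # Scatter the weights onto the retrieved next tokens.
    knn_probs = np.zeros_like(lm_probs)
    for weight, tok in zip(w, datastore_next_tokens[nearest]):
        knn_probs[tok] += weight
    # Growing the datastore mainly improves the kNN term; lam trades the two off.
    return lam * knn_probs + (1.0 - lam) * lm_probs

# Toy usage: 8-token vocabulary, 100 stored (context, next-token) pairs.
rng = np.random.default_rng(0)
keys = rng.normal(size=(100, 16))
next_tokens = rng.integers(0, 8, size=100)
query = rng.normal(size=16)
print(knn_lm_next_token(np.full(8, 1 / 8), keys, next_tokens, query))
```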
Text-Only Training for Image Captioning with Retrieval Augmentation and Modality Gap Correction
Fonseca, Rui, Martins, Bruno, Rocha, Gil
Image captioning has drawn considerable attention from the natural language processing and computer vision fields. Aiming to reduce the reliance on curated data, several studies have explored image captioning without any human-annotated image-text pairs for training, although existing methods are still outperformed by fully supervised approaches. This paper proposes TOMCap, an improved text-only training method that performs captioning without the need for aligned image-caption pairs. The method is based on prompting a pre-trained language model decoder with information derived from a CLIP representation, after undergoing a process to reduce the modality gap. We specifically test the combined use of retrieved caption examples and latent vector representations to guide the generation process. Through extensive experiments, we show that TOMCap outperforms other training-free and text-only methods. We also analyze the impact of different choices regarding the configuration of the retrieval-augmentation and modality gap reduction components.
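The modality-gap step can be pictured with a correction that is common in text-only captioning work: shift an image embedding by the difference between the text and image centroids before retrieving captions. This sketches the general technique, not TOMCap's exact procedure; the centroids, dimensions, and data are synthetic stand-ins.

```python
import numpy as np

def correct_modality_gap(image_emb, text_mean, image_mean):
    """Shift an image embedding by the gap between modality centroids so it
    lands nearer the region where CLIP text embeddings live. Generic
    technique sketch, not TOMCap's exact correction."""
    shifted = image_emb - image_mean + text_mean
    return shifted / np.linalg.norm(shifted)  # keep unit norm, as in CLIP

def retrieve_captions(query_emb, caption_embs, captions, k=3):
    """Cosine-similarity retrieval over a text-only caption datastore."""
    sims = caption_embs @ query_emb
    return [captions[i] for i in np.argsort(-sims)[:k]]

# Toy usage: random unit vectors standing in for CLIP embeddings (dim 512),
# with image embeddings offset to simulate the modality gap.
rng = np.random.default_rng(1)
caption_embs = rng.normal(size=(50, 512))
caption_embs /= np.linalg.norm(caption_embs, axis=1, keepdims=True)
img_embs = rng.normal(loc=0.3, size=(5, 512))
img_embs /= np.linalg.norm(img_embs, axis=1, keepdims=True)
captions = [f"caption {i}" for i in range(50)]
query = correct_modality_gap(img_embs[0], caption_embs.mean(0), img_embs.mean(0))
print(retrieve_captions(query, caption_embs, captions))
```

The retrieved captions would then be placed in the decoder prompt alongside the corrected representation.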
READER: Retrieval-Assisted Drafter for Efficient LLM Inference
Divilkovskiy, Maxim, Malygin, Vitaly, Zlobin, Sergey, Ilyushin, Stanislav, Isali, Sultan, Kalugin, Vasily, Aitassova, Nuriza, Yi, Fei, Zeng, Weidi
Autoregressive Language Models instantiate a factorized likelihood over token sequences, yet their strictly sequential decoding process imposes an intrinsic lower bound on inference latency. This bottleneck has emerged as a central obstacle to the scalable deployment of large-scale generative models. Existing acceleration techniques partially mitigate token-level latency by relying on auxiliary draft models or introducing an additional training phase, but fail to address the dominant memory and communication costs. We present READER, a provably lossless speculative decoding framework that bypasses training an auxiliary draft model. READER formalizes speculative decoding as a stochastic tree construction problem and exploits the empirical redundancy structure of natural language to generate high-probability candidate continuations. Our method revisits the problem of constructing draft trees, establishing substantial statistical improvements over stochastic draft-tree methods and providing a complexity-theoretic analysis that characterizes the optimality frontier of speculative decoding under bounded computation and memory resources. Beyond the single-sequence regime traditionally considered in prior work, we introduce a memory-optimal key-value cache-serving strategy that guarantees amortized sublinear overhead in the batch dimension, allowing READER to scale to realistic inference workloads. Comprehensive experiments demonstrate up to 6.13x wall-clock speedup on single-prompt inference and up to 5.92x on batched inference, consistently surpassing prior speculative decoding baselines while preserving exact output equivalence, with even more pronounced gains in retrieval-augmented generation pipelines. Our results close a key gap between theoretical parallelism limits and practical LLM inference, suggesting a new standard for efficient deployment.
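The losslessness claim rests on the standard speculative-decoding accept/reject rule, sketched below for a single drafted sequence. READER's contributions are the draft-tree construction and batched KV-cache serving, which this toy deliberately omits; all distributions here are random stand-ins.

```python
import numpy as np

def verify_draft(draft_tokens, q_probs, p_probs, rng):
    """Lossless speculative-decoding verification for one drafted sequence.
    q_probs[i] / p_probs[i]: draft and target distributions at step i.
    Accepting token t with probability min(1, p(t)/q(t)) and resampling
    rejections from max(p - q, 0) leaves the target distribution intact."""
    out = []
    for i, tok in enumerate(draft_tokens):
        if rng.random() < min(1.0, p_probs[i][tok] / q_probs[i][tok]):
            out.append(tok)  # draft token accepted
        else:
            residual = np.maximum(p_probs[i] - q_probs[i], 0)
            residual /= residual.sum()
            out.append(int(rng.choice(len(residual), p=residual)))
            return out  # everything after a rejection is discarded
    return out  # full accept; a bonus token from p would follow in practice

# Toy usage: vocabulary of 10, three drafted tokens sampled from q.
rng = np.random.default_rng(0)
q = rng.dirichlet(np.ones(10), size=3)
p = rng.dirichlet(np.ones(10), size=3)
draft = [int(rng.choice(10, p=qi)) for qi in q]
print(verify_draft(draft, q, p, rng))
```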
Rethinking Reasoning in LLMs: Neuro-Symbolic Local RetoMaton Beyond ICL and CoT
Mamidala, Rushitha Santhoshi, Chhabra, Anshuman, Mali, Ankur
Prompt-based reasoning strategies such as Chain-of-Thought (CoT) and In-Context Learning (ICL) have become widely used for eliciting reasoning capabilities in large language models (LLMs). However, these methods rely on fragile, implicit mechanisms, often yielding inconsistent outputs across seeds, formats, or minor prompt variations, making them fundamentally unreliable for tasks requiring stable, interpretable reasoning. In contrast, automata-based neuro-symbolic frameworks like RetoMaton offer a more structured and trustworthy alternative by grounding retrieval in symbolic memory with deterministic transitions. In this work, we extend RetoMaton by replacing its global datastore with a local, task-adaptive Weighted Finite Automaton (WFA), constructed directly from external domain corpora. This local automaton structure promotes robust, context-aware retrieval while preserving symbolic traceability and low inference overhead. Unlike prompting, which entangles context and memory in opaque ways, our approach leverages the explicit structure of WFAs to provide verifiable and modular retrieval behavior, making it better suited for domain transfer and interoperability. We evaluate this local RetoMaton variant on two pretrained LLMs, LLaMA-3.2-1B and Gemma-3-1B-PT, across three reasoning tasks: TriviaQA (reading comprehension), GSM8K (multi-step math), and MMLU (domain knowledge). Compared to the base model and prompting-based methods, augmenting these setups with local RetoMaton consistently improves performance while enabling transparent and reproducible retrieval dynamics. Our results highlight a promising shift toward trustworthy, symbolic reasoning in modern LLMs via lightweight, automaton-guided memory.
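As a rough picture of the local-datastore idea, the sketch below builds a small weighted automaton from a domain corpus, with n-gram contexts as states and count-normalized transitions, so retrieval becomes a deterministic, inspectable walk. This conveys the flavor of a local WFA, not the paper's exact construction.

```python
from collections import defaultdict

def build_local_wfa(corpus_tokens, order=2):
    """Build a tiny weighted finite automaton from a domain corpus: states
    are `order`-token contexts, and transition weights are normalized
    successor counts. Illustrative, not the paper's construction."""
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(len(corpus_tokens) - order):
        state = tuple(corpus_tokens[i:i + order])
        counts[state][corpus_tokens[i + order]] += 1
    wfa = {}
    for state, successors in counts.items():
        total = sum(successors.values())
        # Each edge carries an explicit, traceable weight.
        wfa[state] = {tok: c / total for tok, c in successors.items()}
    return wfa

def retrieve(wfa, context, order=2):
    """Follow the automaton from the current context; an empty result means
    the local datastore has no evidence and the base LM decides alone."""
    return wfa.get(tuple(context[-order:]), {})

corpus = "the cat sat on the mat the cat ran on the mat".split()
wfa = build_local_wfa(corpus)
print(retrieve(wfa, ["the", "cat"]))  # {'sat': 0.5, 'ran': 0.5}
```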
CONCAP: Seeing Beyond English with Concepts Retrieval-Augmented Captioning
Ibrahim, George, Ramos, Rita, Kementchedjhieva, Yova
Multilingual vision-language models have made significant strides in image captioning, yet they still lag behind their English counterparts due to limited multilingual training data and costly large-scale model parameterization. Retrieval-augmented generation (RAG) offers a promising alternative by conditioning caption generation on retrieved examples in the target language, reducing the need for extensive multilingual training. However, multilingual RAG captioning models often depend on retrieved captions translated from English, which can introduce mismatches and linguistic biases relative to the source language. We introduce CONCAP, a multilingual image captioning model that integrates retrieved captions with image-specific concepts, enhancing the contextualization of the input image and grounding the captioning process across different languages. Experiments on the XM3600 dataset indicate that CONCAP enables strong performance on low- and mid-resource languages, with greatly reduced data requirements. Our findings highlight the effectiveness of concept-aware retrieval augmentation in bridging multilingual performance gaps.
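Concretely, the described inputs suggest a prompt that combines detected image concepts with captions retrieved in the target language; the template below is a plausible illustration, not CONCAP's actual format.

```python
def build_concap_prompt(concepts, retrieved_captions, language):
    """Assemble a captioning prompt that grounds generation in both
    image-level concepts and target-language retrieved captions.
    Hypothetical template for illustration only."""
    concept_str = ", ".join(concepts)
    examples = "\n".join(f"- {c}" for c in retrieved_captions)
    return (
        f"Image concepts: {concept_str}\n"
        f"Similar captions ({language}):\n{examples}\n"
        f"Caption in {language}:"
    )

print(build_concap_prompt(
    concepts=["dog", "beach", "ball"],
    retrieved_captions=["Un perro juega en la playa."],
    language="Spanish",
))
```

Retrieving captions natively in the target language, rather than translating English ones, is what avoids the translation mismatches the abstract describes.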
TokenShapley: Token Level Context Attribution with Shapley Value
Xiao, Yingtai, Zhu, Yuqing, Samyoun, Sirat, Zhang, Wanrong, Wang, Jiachen T., Du, Jian
Large language models (LLMs) demonstrate strong capabilities in in-context learning, but verifying the correctness of their generated responses remains a challenge. Prior work has explored attribution at the sentence level, but these methods fall short when users seek attribution for specific keywords within the response, such as numbers, years, or names. To address this limitation, we propose TokenShapley, a novel token-level attribution method that combines Shapley value-based data attribution with KNN-based retrieval techniques inspired by recent advances in KNN-augmented LLMs. By leveraging a precomputed datastore for contextual retrieval and computing Shapley values to quantify token importance, TokenShapley provides a fine-grained data attribution approach. Extensive evaluations on four benchmarks show that TokenShapley outperforms state-of-the-art baselines in token-level attribution, achieving an 11-23% improvement in accuracy.
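For intuition, the sketch below estimates token-level Shapley values with the standard permutation Monte Carlo estimator, where value_fn scores how well a token subset supports the generated keyword. This is the generic estimator, not TokenShapley's KNN-accelerated variant, and value_fn here is a toy stand-in.

```python
import numpy as np

def monte_carlo_token_shapley(tokens, value_fn, n_samples=200, seed=0):
    """Estimate per-token Shapley values by sampling random permutations
    and accumulating each token's marginal contribution to value_fn
    (which maps a set of token indices to a score). Generic estimator."""
    rng = np.random.default_rng(seed)
    n = len(tokens)
    shapley = np.zeros(n)
    for _ in range(n_samples):
        perm = rng.permutation(n)
        included = set()
        prev = value_fn(included)
        for idx in perm:
            included.add(idx)
            cur = value_fn(included)
            shapley[idx] += cur - prev  # marginal contribution of token idx
            prev = cur
    return shapley / n_samples

# Toy value function: "correctness" is driven entirely by tokens 1 and 3,
# so the exact Shapley values are [0, 0.6, 0, 0.4].
toy_value = lambda s: 0.6 * (1 in s) + 0.4 * (3 in s)
print(monte_carlo_token_shapley(["The", "2021", "report", "Paris"], toy_value))
```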