Kadekodi, Rohan
TeleRAG: Efficient Retrieval-Augmented Generation Inference with Lookahead Retrieval
Lin, Chien-Yu, Kamahori, Keisuke, Liu, Yiyu, Shi, Xiaoxiang, Kashyap, Madhav, Gu, Yile, Shao, Rulin, Ye, Zihao, Zhu, Kan, Wang, Stephanie, Krishnamurthy, Arvind, Kadekodi, Rohan, Ceze, Luis, Kasikci, Baris
Retrieval-augmented generation (RAG) extends large language models (LLMs) with external data sources to enhance factual correctness and domain coverage. Modern RAG pipelines rely on large datastores, leading to system challenges in latency-sensitive deployments, especially when limited GPU memory is available. To address these challenges, we propose TeleRAG, an efficient inference system that reduces RAG latency with minimal GPU memory requirements. The core innovation of TeleRAG is lookahead retrieval, a prefetching mechanism that anticipates required data and transfers it from CPU to GPU in parallel with LLM generation. By leveraging the modularity of RAG pipelines, the inverted file index (IVF) search algorithm, and similarities between queries, TeleRAG optimally overlaps data movement and computation. Experimental results show that TeleRAG reduces end-to-end RAG inference latency by up to 1.72x on average compared to state-of-the-art systems, enabling faster, more memory-efficient deployments of advanced RAG applications.
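To illustrate the lookahead-retrieval idea described in the abstract, the following is a minimal, hedged sketch (not the paper's actual implementation): a background thread copies the IVF clusters predicted from an early draft of the query from CPU memory to the GPU while the LLM is still generating. All names (predict_clusters, prefetch, read-side stores) are hypothetical stand-ins.

```python
import threading
import numpy as np

def predict_clusters(draft_query_vec, ivf_centroids, nprobe=32):
    # Rank IVF centroids by similarity to the draft query; these are the
    # clusters the final query is most likely to probe.
    scores = ivf_centroids @ draft_query_vec
    return np.argsort(-scores)[:nprobe]

def prefetch(cluster_ids, cpu_store, gpu_cache):
    # Stand-in for an async CPU-to-GPU copy of each predicted cluster's vectors.
    for cid in cluster_ids:
        gpu_cache[cid] = cpu_store[cid].copy()

def generate_then_retrieve(llm_generate, draft_query_vec, ivf_centroids, cpu_store):
    gpu_cache = {}
    cluster_ids = predict_clusters(draft_query_vec, ivf_centroids)
    worker = threading.Thread(target=prefetch, args=(cluster_ids, cpu_store, gpu_cache))
    worker.start()           # data movement ...
    output = llm_generate()  # ... overlapped with LLM generation
    worker.join()
    # Clusters already resident in gpu_cache are searched with no further
    # CPU-GPU transfer; any misses fall back to the CPU copy.
    return output, gpu_cache
```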
Tactic: Adaptive Sparse Attention with Clustering and Distribution Fitting for Long-Context LLMs
Zhu, Kan, Tang, Tian, Xu, Qinyu, Gu, Yile, Zeng, Zhichen, Kadekodi, Rohan, Zhao, Liangyu, Li, Ang, Krishnamurthy, Arvind, Kasikci, Baris
Long-context models are essential for many applications but face inefficiencies in loading large KV caches during decoding. Prior methods enforce fixed token budgets for sparse attention, assuming a set number of tokens can approximate full attention. However, these methods overlook variations in the importance of attention across heads, layers, and contexts. To address these limitations, we propose Tactic, a sparsity-adaptive and calibration-free sparse attention mechanism that dynamically selects tokens based on their cumulative attention scores rather than a fixed token budget. By setting a target fraction of total attention scores, Tactic ensures that token selection naturally adapts to variations in attention sparsity. To efficiently approximate this selection, Tactic leverages clustering-based sorting and distribution fitting, allowing it to accurately estimate token importance with minimal computational overhead. We show that Tactic outperforms existing sparse attention algorithms, achieving superior accuracy and up to 7.29x decode attention speedup. This improvement translates to an overall 1.58x end-to-end inference speedup, making Tactic a practical and effective solution for long-context LLM inference in accuracy-sensitive applications.
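The following is a minimal sketch of the cumulative-attention-score selection rule the abstract describes, under simplifying assumptions: it computes exact attention weights for one query and head, whereas Tactic approximates them with clustering-based sorting and distribution fitting. The function name and parameters are illustrative, not the paper's API.

```python
import numpy as np

def select_tokens_by_cumulative_attention(query, keys, target_fraction=0.9):
    # Raw attention logits and softmax attention mass for one query/head.
    scores = keys @ query / np.sqrt(query.shape[-1])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    order = np.argsort(-weights)                 # most important tokens first
    cumulative = np.cumsum(weights[order])
    # Number of tokens needed to cover the target attention mass; this budget
    # naturally varies with how sparse the attention distribution is.
    budget = int(np.searchsorted(cumulative, target_fraction) + 1)
    return order[:budget]

# Example: a head with peaked attention keeps few tokens, a flat head many.
rng = np.random.default_rng(0)
q = rng.standard_normal(64)
K = rng.standard_normal((4096, 64))
kept = select_tokens_by_cumulative_attention(q, K, target_fraction=0.9)
print(f"kept {len(kept)} of {K.shape[0]} tokens")
```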
Rand-NSG: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node
Subramanya, Suhas Jayaram, Devvrit, Fnu, Simhadri, Harsha Vardhan, Krishnaswamy, Ravishankar, Kadekodi, Rohan
Current state-of-the-art approximate nearest neighbor search (ANNS) algorithms generate indices that must be stored in main memory for fast high-recall search. This makes them expensive and limits the size of the dataset. We present a new graph-based indexing and search system called DiskANN that can index, store, and search a billion-point database on a single workstation with just 64GB RAM and an inexpensive solid-state drive (SSD). Contrary to current wisdom, we demonstrate that the SSD-based indices built by DiskANN can meet all three desiderata for large-scale ANNS: high recall, low query latency, and high density (points indexed per node). On the billion-point SIFT1B bigann dataset, DiskANN serves 5000 queries a second with 3ms mean latency and 95% 1-recall@1 on a 16-core machine, where state-of-the-art billion-point ANNS algorithms with similar memory footprint like FAISS and IVFOADC+G+P plateau at around 50% 1-recall@1.
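As a rough illustration of graph-based search over an SSD-resident index, the sketch below runs a greedy best-first traversal in which each node's vector and adjacency list are fetched on demand; read_node is a hypothetical stand-in for a single SSD read, and the parameter names are illustrative rather than DiskANN's actual interface.

```python
import heapq
import numpy as np

def greedy_search(query, entry_point, read_node, search_list_size=64, k=10):
    vec, nbrs = read_node(entry_point)
    d = float(np.linalg.norm(query - vec))
    frontier = [(d, entry_point, nbrs)]   # min-heap of unexpanded candidates
    best = [(d, entry_point)]             # bounded list of closest points seen
    visited = {entry_point}

    while frontier:
        d, node, nbrs = heapq.heappop(frontier)
        # Stop once the closest unexpanded candidate is farther than the
        # worst point kept in the bounded search list.
        if len(best) >= search_list_size and d > best[-1][0]:
            break
        for nbr in nbrs:
            if nbr in visited:
                continue
            visited.add(nbr)
            nvec, nnbrs = read_node(nbr)   # one on-demand SSD round trip
            nd = float(np.linalg.norm(query - nvec))
            heapq.heappush(frontier, (nd, nbr, nnbrs))
            best.append((nd, nbr))
        best.sort()
        best = best[:search_list_size]

    return [node for _, node in best[:k]]
```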