Ramchandran, Kannan
Why Do Multi-Agent LLM Systems Fail?
Cemri, Mert, Pan, Melissa Z., Yang, Shuyi, Agrawal, Lakshya A., Chopra, Bhavya, Tiwari, Rishabh, Keutzer, Kurt, Parameswaran, Aditya, Klein, Dan, Ramchandran, Kannan, Zaharia, Matei, Gonzalez, Joseph E., Stoica, Ion
Despite growing enthusiasm for Multi-Agent Systems (MAS), where multiple LLM agents collaborate to accomplish tasks, their performance gains across popular benchmarks remain minimal compared to single-agent frameworks. This gap highlights the need to analyze the challenges hindering MAS effectiveness. In this paper, we present the first comprehensive study of MAS challenges. We analyze five popular MAS frameworks across over 150 tasks, involving six expert human annotators. We identify 14 unique failure modes and organize them into the Multi-Agent System Failure Taxonomy (MASFT), a comprehensive taxonomy applicable to various MAS frameworks. The taxonomy was developed iteratively from agreements among three expert annotators, achieving a Cohen's Kappa score of 0.88. These fine-grained failure modes fall into three categories: (i) specification and system design failures, (ii) inter-agent misalignment, and (iii) task verification and termination. To support scalable evaluation, we integrate MASFT with an LLM-as-a-Judge pipeline. We also explore whether the identified failures could be easily prevented by proposing two interventions: improved specification of agent roles and enhanced orchestration strategies. Our findings reveal that the identified failures require more complex solutions, highlighting a clear roadmap for future research. We open-source our dataset and LLM annotator.
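The reported agreement statistic is straightforward to reproduce. Below is a minimal sketch of Cohen's Kappa between two annotators' failure-mode labels; the labels and category names are hypothetical, not drawn from the paper's dataset.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement between two annotators, corrected for chance."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical failure-mode labels on 8 MAS traces.
a = ["spec", "spec", "align", "verify", "align", "spec", "verify", "align"]
b = ["spec", "align", "align", "verify", "align", "spec", "verify", "align"]
kappa = cohens_kappa(a, b)  # 17/21, roughly 0.81 for this toy pair
```

Values above 0.8 are conventionally read as near-perfect agreement, which is why the 0.88 figure supports the taxonomy's reliability.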
Online Assortment and Price Optimization Under Contextual Choice Models
Erginbas, Yigit Efe, Courtade, Thomas A., Ramchandran, Kannan
In online marketplaces, dynamic assortment selection and pricing for sequentially arriving buyers presents a challenge for online learning. Since buyer preferences are varying and uncertain, adaptive strategies are essential to meet their needs and maximize the effectiveness of offers. To address this problem, we investigate the application of online learning techniques to contextual assortment selection and pricing. Assortment selection involves the seller choosing a subset of items from a vast catalog to present to buyers and dynamically assigning prices to the offered items. The overall goal is to maximize revenue over the course of repeated interactions. Dynamic assortment selection and pricing strategies are deployed in a variety of online sectors, including e-commerce (e.g., Amazon), food delivery (e.g., Uber Eats), and hospitality (e.g., Airbnb). With similar systems becoming ubiquitous in our daily lives, there is a growing opportunity to deliver tailored product recommendations and pricing adjustments. Therefore, it is crucial to consider data-driven approaches that can enhance user experiences and boost profitability in today's highly competitive digital industry.
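As a concrete illustration of the revenue objective, the multinomial logit (MNL) model is a standard contextual choice model; under it, the expected revenue of an offered assortment is computed as below. This is a generic sketch, not the paper's exact model, and the utilities and prices are made up.

```python
import math

def mnl_expected_revenue(utilities, prices):
    """Expected revenue of an assortment under a multinomial logit (MNL) choice model.
    The buyer picks item i with probability exp(u_i) / (1 + sum_j exp(u_j));
    the '1' in the denominator is the no-purchase option."""
    weights = [math.exp(u) for u in utilities]
    denom = 1.0 + sum(weights)
    return sum(p * w / denom for p, w in zip(prices, weights))

# Hypothetical 3-item assortment; utilities are assumed net of price effects.
rev = mnl_expected_revenue([1.0, 0.5, -0.2], [10.0, 8.0, 12.0])  # about 8.11
```

The seller's problem is then to pick the subset and prices maximizing this quantity while simultaneously learning the utilities from buyer feedback.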
SPEX: Scaling Feature Interaction Explanations for LLMs
Kang, Justin Singh, Butler, Landon, Agarwal, Abhineet, Erginbas, Yigit Efe, Pedarsani, Ramtin, Ramchandran, Kannan, Yu, Bin
Large language models (LLMs) have revolutionized machine learning due to their ability to capture complex interactions between input features. Popular post-hoc explanation methods like SHAP provide marginal feature attributions, while their extensions to interaction importances only scale to small input lengths ($\approx 20$). We propose Spectral Explainer (SPEX), a model-agnostic interaction attribution algorithm that efficiently scales to large input lengths ($\approx 1000$). SPEX exploits underlying natural sparsity among interactions -- common in real-world data -- and applies a sparse Fourier transform using a channel decoding algorithm to efficiently identify important interactions. We perform experiments across three difficult long-context datasets that require LLMs to utilize interactions between inputs to complete the task. For large inputs, SPEX outperforms marginal attribution methods by up to 20% in terms of faithfully reconstructing LLM outputs. Further, SPEX successfully identifies key features and interactions that strongly influence model output. For one of our datasets, HotpotQA, SPEX provides interactions that align with human annotations. Finally, we use our model-agnostic approach to generate explanations to demonstrate abstract reasoning in closed-source LLMs (GPT-4o mini) and compositional reasoning in vision-language models.
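For intuition about what SPEX recovers, the brute-force Boolean (Walsh-Hadamard) Fourier transform below makes the sparsity explicit on a toy 3-input function; SPEX's contribution is recovering such coefficients without the $2^n$ enumeration used here. The toy function is invented for illustration.

```python
from itertools import product

def fourier_coefficients(f, n):
    """Brute-force Boolean Fourier coefficients of f: {0,1}^n -> R.
    SPEX avoids this 2^n enumeration via sparse recovery; this is only a reference check."""
    coeffs = {}
    for subset in product([0, 1], repeat=n):
        total = 0.0
        for x in product([0, 1], repeat=n):
            sign = (-1) ** sum(s * xi for s, xi in zip(subset, x))
            total += sign * f(x)
        coeffs[subset] = total / 2 ** n
    return coeffs

# Toy "model": output depends on a pure interaction between inputs 0 and 2.
f = lambda x: 1.0 + 2.0 * ((-1) ** (x[0] + x[2]))
c = fourier_coefficients(f, 3)  # only the constant and the {0,2} coefficient are non-zero
```

The sparsity assumption in the abstract corresponds exactly to most entries of `c` being (near) zero, which is what makes sample-efficient recovery possible.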
From Markov to Laplace: How Mamba In-Context Learns Markov Chains
Bondaschi, Marco, Rajaraman, Nived, Wei, Xiuying, Ramchandran, Kannan, Pascanu, Razvan, Gulcehre, Caglar, Gastpar, Michael, Makkuva, Ashok Vardhan
While transformer-based language models have driven the AI revolution thus far, their computational complexity has spurred growing interest in viable alternatives, such as structured state space sequence models (SSMs) and Selective SSMs. Among these, Mamba (S6) and its variant Mamba-2 have shown remarkable inference speedups over transformers while achieving comparable or superior performance on complex language modeling tasks. However, despite these architectural innovations and empirical successes, the fundamental learning capabilities of Mamba remain poorly understood. In this paper, we address this gap by studying in-context learning (ICL) on Markov chains and uncovering a surprising phenomenon: unlike transformers, even a single-layer Mamba efficiently learns the in-context Laplacian smoothing estimator, which is both Bayes and minimax optimal, for all Markovian orders. To explain this, we theoretically characterize the representation capacity of Mamba and reveal the fundamental role of convolution in enabling it to represent the optimal Laplacian smoothing. These theoretical insights align strongly with empirical results and, to the best of our knowledge, represent the first formal connection between Mamba and optimal statistical estimators. Finally, we outline promising research directions inspired by these findings.
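The Laplacian smoothing estimator referenced above is standard add-$\beta$ smoothing of transition counts. A minimal first-order sketch follows (the paper covers general Markovian orders; the sequence and $\beta$ here are illustrative):

```python
from collections import Counter

def laplace_smoothed_next(seq, state, vocab, beta=1.0):
    """Add-beta (Laplacian) smoothing for a first-order Markov chain:
    P(s | state) = (count(state -> s) + beta) / (count(state -> *) + beta * |vocab|)."""
    trans = Counter(zip(seq, seq[1:]))
    total = sum(trans[(state, s)] for s in vocab)
    return {s: (trans[(state, s)] + beta) / (total + beta * len(vocab)) for s in vocab}

# In-context sequence "0010110": from state "0" we observed 0->0 once and 0->1 twice,
# so with beta=1 the smoothed estimate is P(0|0)=2/5, P(1|0)=3/5.
probs = laplace_smoothed_next("0010110", "0", vocab="01")
```

The paper's claim is that a single-layer Mamba, given the context, outputs next-symbol probabilities matching this estimator.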
VersaPRM: Multi-Domain Process Reward Model via Synthetic Reasoning Data
Zeng, Thomas, Zhang, Shuibai, Wu, Shutong, Classen, Christian, Chae, Daewon, Ewer, Ethan, Lee, Minjae, Kim, Heeju, Kang, Wonjun, Kunde, Jackson, Fan, Ying, Kim, Jungtaek, Koo, Hyung Il, Ramchandran, Kannan, Papailiopoulos, Dimitris, Lee, Kangwook
Process Reward Models (PRMs) have proven effective at enhancing mathematical reasoning for Large Language Models (LLMs) by leveraging increased inference-time computation. However, they are predominantly trained on mathematical data, and their generalizability to non-mathematical domains has not been rigorously studied. In response, this work first shows that current PRMs have poor performance in other domains. To address this limitation, we introduce VersaPRM, a multi-domain PRM trained on synthetic reasoning data generated using our novel data generation and annotation method.

Outcome Reward Models (ORMs) are used to provide supervision based solely on the correctness of the final outcome. However, ORMs fail to address errors in intermediate steps, limiting their effectiveness for complex, multi-step reasoning tasks (Luo et al., 2024; Lightman et al., 2024; Sun et al., 2024). Because ORMs suffer from this limitation, Process Reward Models (PRMs) have been proposed to offer fine-grained, step-by-step feedback on the correctness of each reasoning step (Lightman et al., 2024; Uesato et al., 2022). PRMs have proven highly effective during inference, improving the reranking of generated solutions and guiding LLMs through search-based algorithms (Wan et al., 2024; Wang et al., 2024a).
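One common way a PRM is used at inference time is weighted majority voting over sampled solutions, with each candidate weighted by its per-step PRM scores. The sketch below assumes product-of-step-scores aggregation, which is one common choice rather than necessarily the paper's; the candidates and scores are invented.

```python
from collections import defaultdict

def weighted_majority_vote(candidates):
    """Aggregate sampled solutions by PRM score. Each candidate is (final_answer,
    step_scores); its weight is the product of per-step scores, and the answer
    accumulating the largest total weight wins."""
    totals = defaultdict(float)
    for answer, step_scores in candidates:
        weight = 1.0
        for s in step_scores:
            weight *= s
        totals[answer] += weight
    return max(totals, key=totals.get)

# Hypothetical PRM step scores for four sampled chains of thought.
cands = [("A", [0.9, 0.8]), ("B", [0.99, 0.97]), ("A", [0.6, 0.5]), ("B", [0.9, 0.9])]
best = weighted_majority_vote(cands)  # "B": 0.96 + 0.81 beats 0.72 + 0.30
```

The multi-domain question is whether the step scores remain informative when the reasoning is legal or medical rather than mathematical.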
Quantifying Positional Biases in Text Embedding Models
Goel, Samarth, Lee, Reagan J., Ramchandran, Kannan
Embedding models are crucial for tasks in Information Retrieval (IR) and semantic similarity measurement, yet their handling of longer texts and the associated positional biases remains underexplored. In this study, we investigate the impact of content position and input size on text embeddings. Our experiments reveal that embedding models, irrespective of their positional encoding mechanisms, disproportionately prioritize the beginning of an input. Ablation studies demonstrate that inserting irrelevant text at, or removing text from, the start of a document reduces the cosine similarity between the altered and original embeddings by up to 12.3% more than the same ablations at the end. Regression analysis further confirms this bias, with sentence importance declining as position moves further from the start, even in content-agnostic settings. We hypothesize that this effect arises from pre-processing strategies and chosen positional encoding techniques. These findings quantify the sensitivity of retrieval systems and suggest a new lens on embedding model robustness.
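The ablation metric is plain cosine similarity between the original and perturbed embeddings. A self-contained sketch of the comparison follows; `embed()` stands in for any text-embedding model and is deliberately left as a comment rather than implemented.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Shape of the paper's ablation (embed() is a placeholder for an embedding model):
#   sim_start = cosine(embed(doc), embed(irrelevant + doc))
#   sim_end   = cosine(embed(doc), embed(doc + irrelevant))
# The reported finding is that sim_start drops by up to 12.3% more than sim_end.
```

Running this over many documents and perturbation positions yields the regression data behind the positional-importance curve.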
EmbedLLM: Learning Compact Representations of Large Language Models
Zhuang, Richard, Wu, Tianhao, Wen, Zhaojin, Li, Andrew, Jiao, Jiantao, Ramchandran, Kannan
With hundreds of thousands of language models available on Huggingface today, efficiently evaluating and utilizing these models across various downstream tasks has become increasingly critical. Many existing methods repeatedly learn task-specific representations of Large Language Models (LLMs), which leads to inefficiencies in both time and computational resources. To address this, we propose EmbedLLM, a framework designed to learn compact vector representations of LLMs that facilitate downstream applications involving many models, such as model routing. We introduce an encoder-decoder approach for learning such embeddings, along with a systematic framework to evaluate their effectiveness. Empirical results show that EmbedLLM outperforms prior methods in model routing in both accuracy and latency. Additionally, we demonstrate that our method can forecast a model's performance on multiple benchmarks without incurring additional inference cost. Extensive probing experiments validate that the learned embeddings capture key model characteristics, e.g., whether the model is specialized for coding tasks, even without being explicitly trained on them. We open-source our dataset, code and embedder to facilitate further research and application.
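Once per-model embeddings are learned, routing reduces to scoring each model's embedding against a query representation. The sketch below uses a plain dot-product "decoder" and made-up 3-dimensional embeddings purely for illustration; EmbedLLM's actual encoder-decoder is learned from model-question correctness data.

```python
def route(model_embs, question_emb):
    """Pick the model whose embedding scores highest against a question embedding.
    A stand-in for a learned decoder: here the score is just a dot product."""
    scores = {name: sum(m * q for m, q in zip(emb, question_emb))
              for name, emb in model_embs.items()}
    return max(scores, key=scores.get)

# Hypothetical 3-d embeddings: dimension 0 ~ coding skill, 1 ~ math, 2 ~ chat.
models = {
    "coder-7b": [0.9, 0.1, 0.2],
    "math-7b":  [0.1, 0.9, 0.2],
    "chat-7b":  [0.2, 0.2, 0.9],
}
assert route(models, [1.0, 0.0, 0.0]) == "coder-7b"  # coding query goes to the coder
```

The benchmark-forecasting claim follows the same pattern: score a model embedding against a benchmark representation instead of a single question.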
Looped Transformers for Length Generalization
Fan, Ying, Du, Yilun, Ramchandran, Kannan, Lee, Kangwook
Recent work has shown that Transformers trained from scratch can successfully solve various arithmetic and algorithmic tasks, such as adding numbers and computing parity. While these Transformers generalize well on unseen inputs of the same length, they struggle with length generalization, i.e., handling inputs of unseen lengths. In this work, we demonstrate that looped Transformers with an adaptive number of steps significantly improve length generalization. We focus on tasks with a known iterative solution, involving multiple iterations of a RASP-L operation - a length-generalizable operation that can be expressed by a finite-sized Transformer. We train looped Transformers using our proposed learning algorithm and observe that they learn highly length-generalizable solutions for various tasks.
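The idea of reusing one block for an adaptive number of steps can be illustrated on parity: a single fixed, length-generalizable step applied until the input collapses to one bit. This toy loop is an analogy for the looped architecture, not the paper's training algorithm.

```python
def loop_step(bits):
    """One 'loop iteration': XOR adjacent pairs, roughly halving the sequence.
    The same fixed operation works at every length (a RASP-L-style step)."""
    if len(bits) % 2:
        bits = bits + [0]  # pad odd lengths with the XOR identity
    return [a ^ b for a, b in zip(bits[0::2], bits[1::2])]

def parity(bits):
    """Adaptive step count: apply the same block until one bit remains,
    so longer inputs simply get more iterations of the identical step."""
    while len(bits) > 1:
        bits = loop_step(bits)
    return bits[0]
```

Because the per-step computation never depends on the input length, the loop generalizes to lengths never seen in training, which is the behavior the paper demonstrates for looped Transformers.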
Toward a Theory of Tokenization in LLMs
Rajaraman, Nived, Jiao, Jiantao, Ramchandran, Kannan
While there has been a large body of research attempting to circumvent tokenization for language modeling (Clark et al., 2022; Xue et al., 2022), the current consensus is that it is a necessary initial step for designing state-of-the-art performant language models. In this paper, we investigate tokenization from a theoretical point of view by studying the behavior of transformers on simple data generating processes. When trained on data drawn from certain simple $k^{\text{th}}$-order Markov processes for $k > 1$, transformers exhibit a surprising phenomenon - in the absence of tokenization, they empirically fail to learn the right distribution and predict characters according to a unigram model (Makkuva et al., 2024). With the addition of tokenization, however, we empirically observe that transformers break through this barrier and are able to model the probabilities of sequences drawn from the source near-optimally, achieving small cross-entropy loss. With this observation as a starting point, we study the end-to-end cross-entropy loss achieved by transformers with and without tokenization. With the appropriate tokenization, we show that even the simplest unigram models (over tokens) learnt by transformers are able to model the probability of sequences drawn from $k^{\text{th}}$-order Markov sources near-optimally. Our analysis provides a justification for the use of tokenization in practice through studying the behavior of transformers on Markovian data.
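The benefit of tokenization for a unigram learner is easy to see on a toy deterministic first-order Markov source: a character-level unigram model pays about 1 bit per character, while a unigram model over 2-character tokens captures the alternation essentially for free. The sequence and token choice below are illustrative, not the paper's exact construction.

```python
import math
from collections import Counter

def unigram_xent_per_char(units, chars_per_unit):
    """Empirical cross-entropy (bits per character) of a unigram model
    fit on the given sequence of units (characters or tokens)."""
    counts = Counter(units)
    n = len(units)
    xent_per_unit = -sum(c / n * math.log2(c / n) for c in counts.values())
    return xent_per_unit / chars_per_unit

# A source that strictly alternates: a character unigram sees p(0)=p(1)=0.5
# and pays 1 bit/char, while a unigram over "01" tokens is deterministic.
seq = "01" * 500
char_bits = unigram_xent_per_char(list(seq), 1)                        # 1.0 bit/char
token_bits = unigram_xent_per_char([seq[i:i + 2] for i in range(0, len(seq), 2)], 2)
```

With the right tokenizer, the token stream looks i.i.d. even though the character stream is strongly Markovian, which is the mechanism behind the paper's near-optimality result.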
Learning to Understand: Identifying Interactions via the Mobius Transform
Kang, Justin S., Erginbas, Yigit E., Butler, Landon, Pedarsani, Ramtin, Ramchandran, Kannan
One of the most fundamental problems in machine learning is finding interpretable representations of the functions we learn. The Mobius transform is a useful tool for this because its coefficients correspond to unique importance scores on sets of input variables. The Mobius transform is strongly related (and in some cases equivalent) to the concept of Shapley value, which is a widely used game-theoretic notion of importance. This work focuses on the (typical) regime where the fraction of non-zero Mobius coefficients (and thus interactions between inputs) is small compared to the set of all $2^n$ possible interactions between $n$ inputs. When there are $K = O(2^{n\delta})$ non-zero coefficients, with $\delta \leq \frac{1}{3}$, chosen uniformly at random, our algorithm exactly recovers the Mobius transform in $O(Kn)$ samples and $O(Kn^2)$ time with vanishing error as $K \rightarrow \infty$, the first non-adaptive algorithm to do so. We also uncover a surprising connection between group testing and the Mobius transform. In the case where all interactions are between at most $t = \Theta(n^{\alpha})$ inputs, for $\alpha < 0.409$, we are able to leverage results from group testing to provide the first algorithm that computes the Mobius transform in $O(Kt\log n)$ sample complexity and $O(K\mathrm{poly}(n))$ time with vanishing error as $K \rightarrow \infty$. Finally, we present a robust version of this algorithm that achieves the same sample and time complexity under some assumptions, but with a factor depending on noise variance. Our work is deeply interdisciplinary, drawing on tools spanning signal processing, algebra, information theory, learning theory, and group testing to address this important problem at the forefront of machine learning.
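For reference, the Mobius transform recovered here is $a(S) = \sum_{T \subseteq S} (-1)^{|S|-|T|} f(T)$. The brute-force check below computes it on a toy 3-input function with one pure interaction; the paper's algorithms avoid exactly this exponential enumeration, and the toy function is invented for illustration.

```python
from itertools import combinations

def mobius_transform(f, n):
    """Brute-force Mobius transform of f over subsets of {0, ..., n-1}:
    a(S) = sum over T subseteq S of (-1)^{|S|-|T|} f(T).
    Exponential in n; only a reference implementation for checking sparsity."""
    items = range(n)
    subsets = [frozenset(c) for r in range(n + 1) for c in combinations(items, r)]
    a = {}
    for S in subsets:
        a[S] = sum((-1) ** (len(S) - len(T)) * f(T)
                   for T in subsets if T <= S)
    return a

# Toy set function: a base value plus a pure interaction between inputs 0 and 1.
f = lambda T: 1.0 + (3.0 if {0, 1} <= T else 0.0)
a = mobius_transform(f, 3)  # only a(empty set) and a({0,1}) are non-zero
```

Only 2 of the 8 coefficients are non-zero here, which is precisely the sparse regime ($K \ll 2^n$) the paper's sample-efficient algorithms target.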