
 Parameswaran, Aditya


Why Do Multi-Agent LLM Systems Fail?

arXiv.org Artificial Intelligence

Despite growing enthusiasm for Multi-Agent Systems (MAS), where multiple LLM agents collaborate to accomplish tasks, their performance gains on popular benchmarks remain minimal compared to single-agent frameworks. This gap highlights the need to analyze the challenges hindering MAS effectiveness. In this paper, we present the first comprehensive study of MAS challenges. We analyze five popular MAS frameworks across more than 150 tasks, involving six expert human annotators. We identify 14 unique failure modes and propose a comprehensive taxonomy applicable to various MAS frameworks. The taxonomy emerges iteratively from agreements among three expert annotators per study, achieving a Cohen's Kappa score of 0.88. These fine-grained failure modes are organized into three categories: (i) specification and system design failures, (ii) inter-agent misalignment, and (iii) task verification and termination. To support scalable evaluation, we integrate our taxonomy, MASFT, with an LLM-as-a-Judge pipeline. We also explore whether the identified failures could be easily prevented by proposing two interventions: improved specification of agent roles and enhanced orchestration strategies. Our findings reveal that the identified failures require more complex solutions, highlighting a clear roadmap for future research. We open-source our dataset and LLM annotator.
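The taxonomy's reliability is reported as inter-annotator agreement. Below is a minimal sketch of how Cohen's Kappa is computed for two annotators; the failure-mode labels in the toy example are illustrative, not the paper's actual annotation data.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labeling the same items."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b.get(k, 0) for k in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Toy example: two annotators tagging five traces with failure categories.
a = ["spec", "misalign", "verify", "spec", "misalign"]
b = ["spec", "misalign", "verify", "spec", "verify"]
print(round(cohens_kappa(a, b), 2))  # 0.71
```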


LLM-Powered Proactive Data Systems

arXiv.org Artificial Intelligence

With the power of LLMs, we now have the ability to query data that was previously impossible to query, including text, images, and video. However, despite this enormous potential, most present-day data systems that leverage LLMs are reactive, reflecting our community's desire to map LLMs to known abstractions. Most data systems treat the LLM as an opaque black box that operates on user inputs and data as is, optimizing it much like any other approximate, expensive UDF, in conjunction with other relational operators. Such data systems do as they are told, but fail to understand and leverage what the LLM is being asked to do (i.e., the underlying operations, which may be error-prone), the data the LLM is operating on (e.g., long, complex documents), or what the user really needs. They do not take advantage of the characteristics of the operations and/or the data at hand, or ensure correctness of results when there are imprecisions and ambiguities. We argue that data systems instead need to be proactive: they need to be given more agency -- armed with the power of LLMs -- to understand and rework the user inputs and the data, and to make decisions about how the operations and the data should be represented and processed. By allowing the data system to parse, rewrite, and decompose user inputs and data, or to interact with the user in ways that go beyond the standard single-shot query-result paradigm, the data system is able to address user needs more efficiently and effectively. These new capabilities lead to a rich design space in which the data system takes more initiative: it is empowered to perform optimizations based on the transformation operations, data characteristics, and user intent. We discuss various successful examples of how this framework has been and can be applied in real-world tasks, and present future directions for this ambitious research agenda.
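As one concrete illustration of the proactive behaviors described above, the sketch below decomposes a long document and short-circuits a yes/no LLM predicate, rather than issuing a single opaque call over the whole input. The `llm_yes_no` callable is a hypothetical stand-in for any LLM client, and the fixed-size chunking is illustrative; the design space the paper describes is far richer.

```python
from typing import Callable

def proactive_filter(question: str, document: str,
                     llm_yes_no: Callable[[str, str], bool],
                     chunk_size: int = 2000) -> bool:
    """Evaluate a yes/no LLM predicate over a long document chunk by chunk,
    short-circuiting on the first satisfying chunk instead of sending the
    whole document in one monolithic, context-length-straining call."""
    chunks = (document[i:i + chunk_size]
              for i in range(0, len(document), chunk_size))
    return any(llm_yes_no(question, chunk) for chunk in chunks)
```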


Helix: Accelerating Human-in-the-loop Machine Learning

arXiv.org Machine Learning

Data application developers and data scientists spend an inordinate amount of time iterating on machine learning (ML) workflows -- by modifying the data pre-processing, model training, and post-processing steps -- via trial and error to achieve the desired model performance. Existing work on accelerating machine learning focuses on speeding up one-shot execution of workflows, failing to address the incremental and dynamic nature of typical ML development. We propose Helix, a declarative machine learning system that accelerates iterative development by optimizing workflow execution end-to-end and across iterations. Helix minimizes the runtime per iteration via program analysis and intelligent reuse of previous results, which are selectively materialized -- trading off the cost of materialization against potential future benefits -- to speed up future iterations. Additionally, Helix offers a graphical interface to visualize workflow DAGs and compare versions to facilitate iterative development. Through two ML applications, in classification and in structured prediction, attendees will experience the succinctness of the Helix programming interface and the speed and ease of iterative development using Helix. In our evaluations, Helix achieved up to an order-of-magnitude reduction in cumulative runtime compared to state-of-the-art machine learning tools.
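A minimal sketch of the reuse idea: cache each workflow node's output, keyed by a hash of its operator identity and its inputs' keys, so that an iteration which only edits a downstream step reuses upstream results. Helix's actual optimizer also weighs materialization cost against expected future benefit; this sketch shows only the cache-key mechanics and is not the Helix implementation.

```python
import hashlib
import pickle

_cache: dict = {}

def node_key(op_name: str, op_version: str, input_keys: tuple) -> str:
    """Identity of a node's output: which operator (and version of its
    code) ran on which upstream results."""
    payload = pickle.dumps((op_name, op_version, input_keys))
    return hashlib.sha256(payload).hexdigest()

def run_node(op, op_name: str, op_version: str, inputs: list,
             input_keys: tuple):
    """Recompute only if the operator code or any upstream input changed;
    otherwise serve the materialized result from the cache."""
    key = node_key(op_name, op_version, input_keys)
    if key not in _cache:
        _cache[key] = op(*inputs)
    return _cache[key], key
```

Editing only a post-processing step changes only that step's key, so across iterations the pre-processing and training nodes are served from the cache.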


How Developers Iterate on Machine Learning Workflows -- A Survey of the Applied Machine Learning Literature

arXiv.org Machine Learning

Machine learning workflow development is anecdotally regarded as an iterative process of trial and error with humans in the loop. However, we are not aware of quantitative evidence corroborating this popular belief. A quantitative characterization of iteration can serve as a benchmark for machine learning workflow development in practice, and can aid the development of human-in-the-loop machine learning systems. To this end, we conduct a small-scale survey of the applied machine learning literature from five distinct application domains. We collect and distill statistics on the role of iteration within machine learning workflow development, and report preliminary trends and insights from our investigation as a starting point towards this benchmark. Finally, based on our findings, we describe desiderata for effective and versatile human-in-the-loop machine learning systems that can cater to users in diverse domains.


Surpassing Humans and Computers with JELLYBEAN: Crowd-Vision-Hybrid Counting Algorithms

AAAI Conferences

Counting objects is a fundamental image processing primitive with many scientific, health, surveillance, security, and military applications. Existing supervised computer vision techniques typically require large quantities of labeled training data, and even then fail to return accurate results in all but the most stylized settings. Using vanilla crowdsourcing, on the other hand, can lead to significant errors, especially on images with many objects. In this paper, we present our JellyBean suite of algorithms, which combines the best of crowds and computer vision to count objects in images, and uses judicious decomposition of images to greatly improve accuracy at low cost. Our algorithms have several desirable properties: (i) they are theoretically optimal or near-optimal, in that they ask as few questions of humans as possible (under certain intuitively reasonable assumptions that we justify experimentally in our paper); (ii) they operate in stand-alone or hybrid modes, in that they can either work independently of computer vision algorithms or work in concert with them, depending on whether the computer vision techniques are available or useful for the given setting; (iii) they perform very well in practice, returning accurate counts on images that no individual worker or computer vision algorithm can count correctly, while not incurring a high cost.
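A minimal sketch of the decomposition idea: if a region likely contains more objects than a worker can count reliably, split it into quadrants and recurse; otherwise ask the crowd. The `estimate_count` and `ask_crowd` callables and the countability threshold are hypothetical stand-ins; the paper's algorithms choose decompositions and question budgets judiciously rather than with this fixed rule.

```python
def quadrants(box):
    """Split an (x0, y0, x1, y1) region into four equal quadrants."""
    x0, y0, x1, y1 = box
    mx, my = (x0 + x1) / 2, (y0 + y1) / 2
    return [(x0, y0, mx, my), (mx, y0, x1, my),
            (x0, my, mx, y1), (mx, my, x1, y1)]

def hybrid_count(box, estimate_count, ask_crowd, max_countable=10):
    """Recurse until a (rough) vision estimate says the region holds few
    enough objects for a worker to count accurately, then ask the crowd."""
    if estimate_count(box) <= max_countable:
        return ask_crowd(box)
    return sum(hybrid_count(q, estimate_count, ask_crowd, max_countable)
               for q in quadrants(box))
```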


Optimal Worker Quality and Answer Estimates in Crowd-Powered Filtering and Rating

AAAI Conferences

We consider the problem of optimally filtering (or rating) a set of items based on predicates (or scoring) requiring human evaluation. Filtering and rating are ubiquitous problems across crowdsourcing applications. We consider the setting where we are given a set of items and a set of worker responses for each item: yes/no in the case of filtering, and an integer value in the case of rating. We assume that items have a true inherent value that is unknown, and that workers draw their responses from a common, but hidden, error distribution. Our goal is to simultaneously assign a ground truth to the item set and estimate the worker error distribution. Previous work in this area has focused on heuristics such as Expectation Maximization (EM), which provide only local-optimum guarantees, whereas we have developed a general framework that finds a maximum-likelihood solution. Our approach extends to a number of variations on the filtering and rating problems.
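For context, below is a minimal sketch of the EM baseline the abstract contrasts with: alternately estimating each item's hidden true label and a single error rate shared across workers (matching the common hidden error distribution above). This is the local-optimum heuristic, not the paper's globally optimal maximum-likelihood framework.

```python
def em_filter(responses, iters=50):
    """EM for crowd filtering. responses: one list of 0/1 votes per item.
    Returns (posterior that each item's true label is 1, worker error rate)."""
    p_true = [sum(votes) / len(votes) for votes in responses]  # initial guess
    err = 0.2                                                  # initial error rate
    for _ in range(iters):
        # M-step: expected fraction of votes that disagree with the truth.
        num = den = 0.0
        for pt, votes in zip(p_true, responses):
            for v in votes:
                num += pt * (1 - v) + (1 - pt) * v  # P(this vote is an error)
                den += 1
        err = min(max(num / den, 1e-6), 0.5 - 1e-6)  # keep model identifiable
        # E-step: posterior per item under a uniform prior on the true label.
        p_true = []
        for votes in responses:
            ones = sum(votes)
            zeros = len(votes) - ones
            like1 = (1 - err) ** ones * err ** zeros   # truth = 1
            like0 = err ** ones * (1 - err) ** zeros   # truth = 0
            p_true.append(like1 / (like1 + like0))
    return p_true, err

# Example: three items with three yes/no votes each.
posteriors, error_rate = em_filter([[1, 1, 0], [0, 0, 0], [1, 1, 1]])
```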


DataSift: An Expressive and Accurate Crowd-Powered Search Toolkit

AAAI Conferences

Traditional information retrieval systems have limited functionality. For instance, they are not able to adequately support queries containing non-textual fragments such as images or videos, queries that are very long or ambiguous, or semantically rich queries over non-textual corpora. In this paper, we present DataSift, an expressive and accurate crowd-powered search toolkit that can connect to any corpus. We provide a number of alternative configurations for DataSift using crowdsourced and automated components, and demonstrate 2–3x gains in precision over traditional retrieval schemes in experiments on real corpora. We also present our results on determining suitable values for the parameters in those configurations, along with a number of interesting insights learned along the way.
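A minimal sketch of one possible crowd-plus-automated configuration: the crowd rewrites an ambiguous or non-textual query into keyword queries, an automated retriever fetches candidates, and the crowd filters them for precision. The `ask_crowd_rewrite`, `keyword_search`, and `ask_crowd_relevant` callables are hypothetical stand-ins; the paper evaluates several configurations, not necessarily this one.

```python
def crowd_powered_search(query, corpus, ask_crowd_rewrite, keyword_search,
                         ask_crowd_relevant, k=10):
    """One illustrative configuration: crowd rewrites the query, an
    automated retriever fetches candidates, the crowd prunes them."""
    candidates = []
    for kw in ask_crowd_rewrite(query):        # crowd: query -> keyword queries
        candidates.extend(keyword_search(kw, corpus))
    unique = list(dict.fromkeys(candidates))   # dedupe, preserving rank order
    relevant = [doc for doc in unique if ask_crowd_relevant(query, doc)]
    return relevant[:k]
```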


Minimizing Uncertainty in Pipelines

Neural Information Processing Systems

In this paper, we consider the problem of debugging large pipelines via human labeling. We represent the execution of a pipeline using a directed acyclic graph (DAG) of AND and OR nodes, where each node represents a data item produced by some operator in the pipeline. We assume that each operator assigns a confidence to each of its output data items. We want to reduce the uncertainty in the output by issuing queries to a human expert, where a query consists of checking whether a given data item is correct. We consider the problem of asking the optimal set of queries to minimize the resulting output uncertainty. We perform a detailed evaluation of the complexity of the problem for various classes of graphs. We give efficient algorithms for the problem on trees, and show that, for a general DAG, the problem is intractable.
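A minimal sketch for the tree case: propagate per-node confidences up an AND/OR tree (assuming independence) to obtain the probability that the output is correct, then greedily pick the single leaf whose verification minimizes expected output entropy. The paper's algorithms select an optimal set of queries; this greedy single-query heuristic is only illustrative.

```python
import math

# Nodes are lists so leaves can be updated in place after a human check:
#   ["leaf", p] | ["and", [children]] | ["or", [children]]

def p_correct(node):
    if node[0] == "leaf":
        return node[1]
    probs = [p_correct(child) for child in node[1]]
    if node[0] == "and":                            # correct iff all inputs are
        return math.prod(probs)
    return 1 - math.prod(1 - p for p in probs)      # OR: at least one correct

def entropy(p):
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def best_single_query(root, leaves):
    """Greedy: pick the leaf whose human check minimizes expected entropy."""
    def expected_h(leaf):
        p = leaf[1]
        leaf[1] = 1.0; h_yes = entropy(p_correct(root))  # expert says correct
        leaf[1] = 0.0; h_no = entropy(p_correct(root))   # expert says wrong
        leaf[1] = p                                      # restore
        return p * h_yes + (1 - p) * h_no
    return min(leaves, key=expected_h)

# Example: OR of two AND pairs, each leaf with confidence 0.8.
leaves = [["leaf", 0.8] for _ in range(4)]
root = ["or", [["and", leaves[:2]], ["and", leaves[2:]]]]
print(round(p_correct(root), 2))  # 0.87
```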