AITopics | Parameswaran, Aditya G.

Collaborating Authors

Parameswaran, Aditya G.

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing

Shankar, Shreya, Chambers, Tristan, Shah, Tarak, Parameswaran, Aditya G., Wu, Eugene

arXiv.org Artificial IntelligenceDec-8-2024

Analyzing unstructured data has been a persistent challenge in data processing. Large Language Models (LLMs) have shown promise in this regard, leading to recent proposals for declarative frameworks for LLM-powered processing of unstructured data. However, these frameworks focus on reducing cost when executing user-specified operations using LLMs, rather than improving accuracy, executing most operations as-is (in a single LLM call). This is problematic for complex tasks and data, where LLM outputs for user-defined operations are often inaccurate, even with optimized prompts. For example, an LLM may struggle to identify {\em all} instances of specific clauses, like force majeure or indemnification, in lengthy legal documents, requiring decomposition of the data, the task, or both. We present DocETL, a system that optimizes complex document processing pipelines, while accounting for LLM shortcomings. DocETL offers a declarative interface for users to define such pipelines and uses an agent-based approach to automatically optimize them, leveraging novel agent-based rewrites (that we call rewrite directives), as well as an optimization and evaluation framework. We introduce (i) logical rewriting of pipelines, tailored for LLM-based tasks, (ii) an agent-guided plan evaluation mechanism that synthesizes and orchestrates task-specific validation prompts, and (iii) an optimization algorithm that efficiently finds promising plans, considering the latencies of agent-based plan generation and evaluation. Our evaluation on four different unstructured document analysis tasks demonstrates that DocETL finds plans with outputs that are 25 to 80% more accurate than well-engineered baselines, addressing a critical gap in unstructured data analysis. DocETL is open-source at docetl.org, and as of November 2024, has amassed over 1.3k GitHub Stars, with users spanning a variety of domains.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2410.12189

Genre: Research Report (1.00)

Industry:

Law (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.93)
Health & Medicine (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences

Shankar, Shreya, Zamfirescu-Pereira, J. D., Hartmann, Björn, Parameswaran, Aditya G., Arawjo, Ian

arXiv.org Artificial IntelligenceApr-18-2024

Due to the cumbersome nature of human evaluation and limitations of code-based evaluation, Large Language Models (LLMs) are increasingly being used to assist humans in evaluating LLM outputs. Yet LLM-generated evaluators simply inherit all the problems of the LLMs they evaluate, requiring further human validation. We present a mixed-initiative approach to ``validate the validators'' -- aligning LLM-generated evaluation functions (be it prompts or code) with human requirements. Our interface, EvalGen, provides automated assistance to users in generating evaluation criteria and implementing assertions. While generating candidate implementations (Python functions, LLM grader prompts), EvalGen asks humans to grade a subset of LLM outputs; this feedback is used to select implementations that better align with user grades. A qualitative study finds overall support for EvalGen but underscores the subjectivity and iterative process of alignment. In particular, we identify a phenomenon we dub \emph{criteria drift}: users need criteria to grade outputs, but grading outputs helps users define criteria. What is more, some criteria appears \emph{dependent} on the specific LLM outputs observed (rather than independent criteria that can be defined \emph{a priori}), raising serious questions for approaches that assume the independence of evaluation from observation of model outputs. We present our interface and implementation details, a comparison of our algorithm with a baseline approach, and implications for the design of future LLM evaluation assistants.

assertion, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2404.12272

Country: North America > United States > New York > New York County > New York City (0.14)

Genre:

Research Report > New Finding (1.00)
Questionnaire & Opinion Survey (0.93)

Industry:

Health & Medicine (0.67)
Law (0.67)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Revisiting Prompt Engineering via Declarative Crowdsourcing

Parameswaran, Aditya G., Shankar, Shreya, Asawa, Parth, Jain, Naman, Wang, Yujie

arXiv.org Artificial IntelligenceAug-7-2023

Large language models (LLMs) are incredibly powerful at comprehending and generating data in the form of text, but are brittle and error-prone. There has been an advent of toolkits and recipes centered around so-called prompt engineering-the process of asking an LLM to do something via a series of prompts. However, for LLM-powered data processing workflows, in particular, optimizing for quality, while keeping cost bounded, is a tedious, manual process. We put forth a vision for declarative prompt engineering. We view LLMs like crowd workers and leverage ideas from the declarative crowdsourcing literature-including leveraging multiple prompting strategies, ensuring internal consistency, and exploring hybrid-LLM-non-LLM approaches-to make prompt engineering a more principled process. Preliminary case studies on sorting, entity resolution, and imputation demonstrate the promise of our approach

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2308.03854

Country: North America > United States (0.28)

Genre: Research Report (0.64)

Industry: Information Technology (0.35)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback