Srivastava, Saurabh
Instruction-Tuning LLMs for Event Extraction with Annotation Guidelines
Srivastava, Saurabh, Pati, Sweta, Yao, Ziyu
In this work, we study the effect of annotation guidelines -- textual descriptions of event types and arguments -- when instruction-tuning large language models for event extraction. We conducted a series of experiments with both human-provided and machine-generated guidelines in both full- and low-data settings. Our results demonstrate the promise of annotation guidelines when a decent amount of training data is available and highlight their effectiveness in improving cross-schema generalization and low-frequency event-type performance.
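A minimal sketch of the core idea of pairing an extraction instruction with guideline text for each event type; the function, field names, and example guideline below are illustrative assumptions, not the paper's exact data format.

```python
# Illustrative sketch: build an instruction-tuning input that includes the
# annotation guideline for one event type. Names and wording are assumptions.
def build_instruction(event_type: str, guideline: str, text: str) -> str:
    return (
        f"Extract instances of the event type '{event_type}'.\n"
        f"Annotation guideline: {guideline}\n"
        f"Text: {text}\n"
        "Answer with the trigger span and its arguments."
    )

example = build_instruction(
    event_type="Attack",  # hypothetical event type
    guideline="An Attack event is a violent physical act causing harm; "
              "arguments include Attacker, Target, Instrument, and Place.",
    text="Rebels shelled the city with artillery on Tuesday.",
)
print(example)
```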
Functional Benchmarks for Robust Evaluation of Reasoning Performance, and the Reasoning Gap
Srivastava, Saurabh, B, Annarose M, P, Anto V, Menon, Shashank, Sukumar, Ajay, T, Adwaith Samod, Philipose, Alan, Prince, Stevin, Thomas, Sooraj
We propose a framework for robust evaluation of the reasoning capabilities of language models, using functional variants of benchmarks. Models that solve a reasoning test should exhibit no difference in performance between the static version of a problem and a snapshot of its functional variant. We have rewritten the relevant fragment of the MATH benchmark into its functional variant MATH(), with functionalization of other benchmarks to follow. When evaluating current state-of-the-art models over snapshots of MATH(), we find a reasoning gap -- the percentage difference between the static and functional accuracies. We find reasoning gaps from 58.35% to 80.31% among state-of-the-art closed and open-weights models that perform well on static benchmarks, with the caveat that the gaps are likely to be smaller with more sophisticated prompting strategies. Here we show that models which anecdotally have good reasoning performance over real-world tasks have quantifiably lower gaps, motivating the open problem of building "gap 0" models. Code for evaluation and the new evaluation datasets (three MATH() snapshots) are publicly available at https://github.com/consequentai/fneval/.
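A minimal sketch of the reasoning gap as defined above, i.e., the percentage difference between static accuracy and accuracy over functional snapshots; the averaging over snapshots and the numbers below are illustrative assumptions, not results or code from the paper.

```python
# Illustrative sketch: reasoning gap as the percentage difference between
# static-benchmark accuracy and accuracy averaged over functional snapshots.
def reasoning_gap(static_acc: float, snapshot_accs: list[float]) -> float:
    functional_acc = sum(snapshot_accs) / len(snapshot_accs)
    return 100.0 * (static_acc - functional_acc) / static_acc

# Made-up numbers for illustration only (not figures from the paper).
print(reasoning_gap(0.80, [0.30, 0.28, 0.32]))  # ~62.5% gap
```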
MAILEX: Email Event and Argument Extraction
Srivastava, Saurabh, Singh, Gaurav, Matsumoto, Shou, Raz, Ali, Costa, Paulo, Poore, Joshua, Yao, Ziyu
In this work, we present the first dataset, MailEx, for performing event extraction from conversational email threads. To this end, we first propose a new taxonomy covering 10 event types and 76 arguments in the email domain. Our final dataset includes 1.5K email threads and ~4K emails, annotated with a total of ~8K event instances. To understand the task challenges, we conducted a series of experiments comparing three types of approaches, i.e., fine-tuned sequence labeling, fine-tuned generative extraction, and few-shot in-context learning. Our results showed that the task of email event extraction is far from being solved, owing to challenges such as extracting non-continuous, shared trigger spans, extracting non-named-entity arguments, and modeling the email conversational history. Our work thus calls for further investigation into this domain-specific event extraction task.
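A minimal sketch of how an email event instance with a non-continuous trigger span and role-filled arguments could be represented; the class, event-type name, and roles below are hypothetical and not the MailEx annotation format.

```python
# Illustrative sketch: one event instance with a possibly non-continuous
# trigger and role -> text arguments. Names are assumptions for illustration.
from dataclasses import dataclass, field

@dataclass
class EventInstance:
    event_type: str                         # one of the email event types
    trigger_spans: list[tuple[int, int]]    # possibly non-continuous (start, end) token spans
    arguments: dict[str, str] = field(default_factory=dict)  # role -> argument text

instance = EventInstance(
    event_type="Request-Action",            # hypothetical type name
    trigger_spans=[(3, 4), (7, 8)],         # non-continuous, shared trigger
    arguments={"Requester": "Alice", "Action": "send the Q3 report"},
)
print(instance)
```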
Instance Needs More Care: Rewriting Prompts for Instances Yields Better Zero-Shot Performance
Srivastava, Saurabh, Huang, Chengyue, Fan, Weiguo, Yao, Ziyu
Enabling large language models (LLMs) to perform tasks zero-shot has been an appealing goal owing to its labor savings (i.e., requiring no task-specific annotations); as such, zero-shot prompting approaches also enjoy better task generalizability. To improve LLMs' zero-shot performance, prior work has focused on devising more effective task instructions (e.g., "let's think step by step"). However, we argue that individual test instances need more carefully designed and customized instructions for an LLM to solve them correctly in zero-shot. To this end, we propose PRoMPTd, an approach that rewrites the task prompt for each individual test input to be more specific, unambiguous, and complete, so as to provide better guidance to the task LLM. We evaluated PRoMPTd on eight datasets covering tasks including arithmetic, logical reasoning, and code generation, using GPT-4 as the task LLM. Notably, PRoMPTd achieves an absolute improvement of around 10% on the complex MATH dataset and 5% on the code generation task on HumanEval, outperforming conventional zero-shot methods. In addition, we show that the rewritten prompt provides better interpretability of how the LLM resolves each test instance, which can potentially be leveraged as a defense mechanism against adversarial prompting. The source code and dataset can be obtained from https://github.com/salokr/PRoMPTd.
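A minimal sketch of the per-instance rewrite-then-solve idea described above; `call_llm`, the template wording, and the function names are placeholders and not part of the released code.

```python
# Illustrative sketch: rewrite the prompt for each test input, then solve with
# the rewritten prompt. `call_llm` stands in for any chat-completion client.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

REWRITE_TEMPLATE = (
    "Rewrite the following task prompt so that it is specific, unambiguous, "
    "and complete for this exact input, adding any needed clarifications:\n{prompt}"
)

def solve_with_rewriting(task_prompt: str) -> str:
    rewritten = call_llm(REWRITE_TEMPLATE.format(prompt=task_prompt))  # step 1: per-instance rewrite
    return call_llm(rewritten)                                         # step 2: solve with the rewritten prompt
```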
Learning to Simulate Natural Language Feedback for Interactive Semantic Parsing
Yan, Hao, Srivastava, Saurabh, Tai, Yintao, Wang, Sida I., Yih, Wen-tau, Yao, Ziyu
Interactive semantic parsing based on natural language (NL) feedback, where users provide feedback to correct parser mistakes, has emerged as a more practical scenario than traditional one-shot semantic parsing. However, prior work has relied heavily on human-annotated feedback data to train the interactive semantic parser, which is prohibitively expensive and not scalable. In this work, we propose a new task of simulating NL feedback for interactive semantic parsing. We accompany the task with a novel feedback evaluator, specifically designed to assess the quality of the simulated feedback, based on which we select the best feedback simulator from our proposed variants. On a text-to-SQL dataset, we show that our feedback simulator can generate high-quality NL feedback that boosts the error correction ability of a specific parser. In low-data settings, our feedback simulator can help achieve error correction performance comparable to that obtained with the costly, full set of human annotations.
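A minimal sketch of the interaction loop this task targets: a parser produces a candidate parse, a feedback simulator comments on its mistakes in NL, and the parser revises. All components and argument names below are placeholders for trained models, not the paper's implementation.

```python
# Illustrative sketch of the feedback-driven correction loop. `parser` and
# `simulator` are placeholder callables standing in for trained models.
def interactive_parse(question: str, parser, simulator, max_turns: int = 3) -> str:
    parse = parser(question)                         # initial candidate (e.g., SQL)
    for _ in range(max_turns):
        feedback = simulator(question, parse)        # simulated NL feedback on mistakes
        if feedback is None:                         # simulator finds nothing to correct
            break
        parse = parser(question, feedback=feedback)  # error-correction turn
    return parse
```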
PUTWorkbench: Analysing Privacy in AI-intensive Systems
Srivastava, Saurabh, Namboodiri, Vinay P., Prabhakar, T. V.
AI-intensive systems that operate on user data face the challenge of balancing data utility with privacy concerns. We propose the idea, and present a prototype, of an open-source tool called Privacy Utility Trade-off (PUT) Workbench, which seeks to aid software practitioners in making such crucial decisions. We pick a simple privacy model that does not require any background in data science and show that even this can achieve significant results on standard and real-life datasets. The tool and its source code are made freely available for extension and use.
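A minimal, illustrative sketch of measuring a privacy-utility trade-off in the spirit described above: suppress quasi-identifier columns and compare a toy utility score before and after. This is not the PUT Workbench API; the column names and metric are assumptions.

```python
# Illustrative sketch: compare a toy utility score before and after
# suppressing quasi-identifier columns. Data and metric are made up.
import pandas as pd

def utility(df: pd.DataFrame) -> float:
    # Toy utility: fraction of non-null cells remaining in the table.
    return df.notna().to_numpy().mean()

def privacy_utility_tradeoff(df: pd.DataFrame, quasi_identifiers: list[str]):
    anonymized = df.copy()
    anonymized[quasi_identifiers] = None   # suppress quasi-identifiers
    return utility(df), utility(anonymized)

df = pd.DataFrame({"age": [34, 29], "zip": ["22030", "22031"], "diagnosis": ["A", "B"]})
print(privacy_utility_tradeoff(df, ["age", "zip"]))  # (utility before, utility after)
```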
QART: A System for Real-Time Holistic Quality Assurance for Contact Center Dialogues
Roy, Shourya (Xerox Research Centre India) | Mariappan, Ragunathan (Xerox Research Centre India) | Dandapat, Sandipan (Xerox Research Centre India) | Srivastava, Saurabh (Xerox Research Centre India) | Galhotra, Sainyam (University of Massachusetts, Amherst) | Peddamuthu, Balaji (Xerox Research Centre India)
Quality assurance (QA) and customer satisfaction (C-Sat) analysis are two commonly used practices to measure the quality of dialogues between agents and customers in contact centers. The practices, however, have a few shortcomings. First, QA puts sole emphasis on agents' organizational compliance, whereas C-Sat attempts to measure customers' satisfaction based only on post-dialogue surveys. As a result, the outcomes of independent QA and C-Sat analyses may not always correspond. Second, both processes are retrospective in nature; hence, evidence of bad past dialogues (and consequently bad customer experiences) can only be found after hours, days, or weeks, depending on their periodicity. Finally, the human-intensive nature of these practices leads to time and cost overhead while allowing only a small fraction of dialogues to be analyzed. In this paper, we introduce QART (pronounced "cart"), an automatic real-time quality assurance system for contact centers. QART performs multi-faceted analysis on dialogue utterances, as they happen, using sophisticated statistical and rule-based natural language processing (NLP) techniques. It covers various aspects inspired by today's QA and C-Sat practices and introduces a novel incremental dialogue summarization capability. The QART front-end is an interactive dashboard providing views of ongoing dialogues at different granularities, enabling agents' supervisors to monitor and take corrective action as needed. We demonstrate the effectiveness of the individual back-end modules as well as the overall system through experimental results on a real-life contact center chat dataset.
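A minimal, illustrative sketch of scoring utterances as they arrive by combining a rule-based compliance check with a statistical (placeholder) sentiment score; the rule, function names, and stream format are assumptions, and the actual QART modules are considerably more sophisticated.

```python
# Illustrative sketch: per-utterance scoring on a streaming dialogue, mixing a
# rule-based QA facet with a statistical C-Sat facet. All names are made up.
import re

GREETING_RULE = re.compile(r"\b(thank you for contacting|how may i help)\b", re.I)

def score_utterance(speaker: str, text: str, sentiment_model) -> dict:
    # Rule-based compliance check applies to agent turns only.
    compliance = bool(GREETING_RULE.search(text)) if speaker == "agent" else None
    return {
        "speaker": speaker,
        "compliance_greeting": compliance,   # rule-based QA facet
        "sentiment": sentiment_model(text),  # statistical C-Sat facet (placeholder model)
    }

def monitor(dialogue_stream, sentiment_model):
    # Score utterances as they happen; a dashboard could consume this stream.
    for speaker, text in dialogue_stream:
        yield score_utterance(speaker, text, sentiment_model)
```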