
Collaborating Authors: Mashetty, Santosh


LogicBench: Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models

arXiv.org Artificial Intelligence

Recently developed large language models (LLMs) have been shown to perform remarkably well on a wide range of language understanding tasks. But can they really "reason" over natural language? This question has received significant research attention, and many reasoning skills, such as commonsense, numerical, and qualitative reasoning, have been studied. However, the crucial skill of logical reasoning has remained underexplored. Existing work investigating this ability of LLMs has focused on only a couple of inference rules (such as modus ponens and modus tollens) of propositional and first-order logic. Addressing this limitation, we comprehensively evaluate the logical reasoning ability of LLMs on 25 different reasoning patterns spanning propositional, first-order, and non-monotonic logics. To enable systematic evaluation, we introduce LogicBench, a natural language question-answering dataset in which each instance focuses on the use of a single inference rule. We conduct a detailed analysis with a range of LLMs such as GPT-4, ChatGPT, Gemini, Llama-2, and Mistral using chain-of-thought prompting. Experimental results show that existing LLMs do not fare well on LogicBench; in particular, they struggle with instances involving complex reasoning and negations. Furthermore, they sometimes overlook the contextual information necessary to arrive at the correct conclusion. We believe our work and findings will facilitate future research on evaluating and enhancing the logical reasoning ability of LLMs. Data and code are available at https://github.com/Mihir3009/LogicBench.
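
To make the single-inference-rule setup concrete, here is a minimal sketch of how one yes/no instance built around modus ponens (from "if p then q" and "p", conclude "q") could be scored with chain-of-thought prompting. The instance fields, the query_model() stub, and the answer-extraction heuristic are illustrative assumptions, not LogicBench's actual schema or evaluation code.

```python
# Toy yes/no check for a single inference rule (modus ponens).
# Everything here is a placeholder sketch, not the dataset's real format.

def query_model(prompt: str) -> str:
    """Placeholder for a call to an LLM of your choice."""
    raise NotImplementedError

instance = {
    "context": "If it rains, the picnic is cancelled. It is raining.",
    "question": "Does the context imply that the picnic is cancelled?",
    "answer": "yes",
}

prompt = (
    f"{instance['context']}\n{instance['question']}\n"
    "Let's think step by step, then finish with 'yes' or 'no'."
)

def is_correct(model_output: str, gold: str) -> bool:
    # Take the last 'yes'/'no' token in the chain-of-thought output.
    tokens = [t.strip(".,!'\"").lower() for t in model_output.split()]
    preds = [t for t in tokens if t in ("yes", "no")]
    return bool(preds) and preds[-1] == gold

# Accuracy over a set of such instances:
# correct = sum(is_correct(query_model(p), a) for p, a in eval_set)
```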


Context-NER: Contextual Phrase Generation at Scale

arXiv.org Artificial Intelligence

Named Entity Recognition (NER) has seen significant progress in recent years, with numerous state-of-the-art (SOTA) models achieving high performance. However, very few studies have focused on generating the context of entities. In this paper, we introduce CONTEXT-NER, a task that aims to generate the relevant context for entities in a sentence, where the context is a phrase describing the entity but not necessarily present in the sentence. To facilitate research on this task, we also present the EDGAR10-Q dataset, which consists of annual and quarterly reports from the top 1,500 publicly traded companies. The dataset is the largest of its kind, containing 1M sentences, 2.8M entities, and an average of 35 tokens per sentence, making it challenging. We propose a baseline approach that combines a phrase generation algorithm with inference using a 220M-parameter language model, achieving a ROUGE-L score of 27% on the test split. Additionally, we perform one-shot inference with ChatGPT, which obtains a ROUGE-L of 30%, highlighting the difficulty of the dataset. We also evaluate models such as T5 and BART, which achieve a maximum ROUGE-L of 49% after supervised finetuning on EDGAR10-Q. We further find that T5-large, when pre-finetuned on EDGAR10-Q, achieves SOTA results on downstream finance tasks such as Headline, FPB, and FiQA SA, outperforming the vanilla version by 10.81 points. To our surprise, this 66x smaller pre-finetuned model also surpasses the finance-specific LLM BloombergGPT-50B by 15 points. We hope that our dataset and generated artifacts will encourage further research in this direction, leading to more sophisticated language models for financial text analysis.
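
Since the CONTEXT-NER results above are reported in ROUGE-L, the sketch below shows a small from-scratch LCS-based ROUGE-L score; it is only an illustration of the metric (using a plain F1 weighting), not the paper's evaluation script, and the example strings are made up.

```python
# From-scratch ROUGE-L (LCS-based, plain F1 weighting) for scoring a generated
# context phrase against a reference. Real ROUGE implementations differ in
# details (stemming, recall weighting); this is only an illustration.

def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l_f1(candidate: str, reference: str) -> float:
    cand, ref = candidate.lower().split(), reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)

# Made-up example: a 5-token candidate fully covering a 2-token reference.
print(round(rouge_l_f1("net revenue for the quarter", "net revenue"), 2))  # 0.57
```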


Instruction Tuned Models are Quick Learners

arXiv.org Artificial Intelligence

Instruction tuning of language models has demonstrated the ability to enhance model generalization to unseen tasks via in-context learning using a few examples. However, typical supervised learning still requires a plethora of downstream training data for finetuning. In real-world situations, there is often a scarcity of data available for finetuning, falling somewhere between few-shot inference and fully supervised finetuning. In this work, we demonstrate the sample efficiency of instruction-tuned models over various tasks by estimating the minimal downstream training data they require to perform transfer learning and match the performance of state-of-the-art (SOTA) supervised models. We conduct experiments on 119 tasks from Super Natural Instructions (SuperNI) in both the single-task learning (STL) and multi-task learning (MTL) settings. Our findings reveal that, in the STL setting, instruction-tuned models equipped with 25% of the downstream training data surpass SOTA performance on the downstream tasks. In the MTL setting, an instruction-tuned model trained on only 6% of the downstream training data achieves SOTA, while using 100% of the training data yields a 3.69-point improvement (ROUGE-L 74.68) over the previous SOTA. We conduct an analysis of T5 vs. Tk-Instruct, developing several baselines to demonstrate that instruction tuning aids both sample efficiency and transfer learning. Additionally, we observe a consistent ~4% performance increase in both settings when pre-finetuning is performed with instructions. Finally, we conduct a categorical study and find that, contrary to previous results, tasks in the question rewriting and title generation categories suffer from instruction tuning.
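
The low-resource setup described above amounts to finetuning an instruction-tuned model on a small random fraction of a task's training data (e.g., 6% or 25%). The sketch below illustrates only that subsampling step; the example record, the subsample() helper, and the commented-out finetune() call are hypothetical stand-ins for the SuperNI data pipeline and trainer, not the paper's code.

```python
# Subsampling a fixed fraction of a task's training set, as in the
# low-resource experiments described above. Placeholder sketch only.

import random

def subsample(examples: list[dict], fraction: float, seed: int = 42) -> list[dict]:
    """Return a random `fraction` of the training examples (at least one)."""
    rng = random.Random(seed)
    k = max(1, int(len(examples) * fraction))
    return rng.sample(examples, k)

# SuperNI-style record: an instruction plus an input/output pair (made up here).
train = [
    {"instruction": "Rewrite the question so it stands on its own.",
     "input": "What was it called before?",
     "output": "What was the company called before its rebranding?"},
    # ... thousands more examples in a real task ...
]

small_train = subsample(train, fraction=0.06)
# finetune(model="tk-instruct-3b", data=small_train)  # placeholder trainer call
```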