AITopics | Mukherjee, Sagnik

Mukherjee, Sagnik

Premise-Augmented Reasoning Chains Improve Error Identification in Math reasoning with LLMs

Mukherjee, Sagnik, Chinta, Abhinav, Kim, Takyoung, Sharma, Tarun Anoop, Hakkani-Tür, Dilek

arXiv.org Artificial Intelligence, Feb-12-2025

Chain-of-Thought (CoT) prompting enhances mathematical reasoning in large language models (LLMs) by enabling detailed step-by-step solutions. However, due to the verbosity of LLMs, the resulting reasoning chains can be long, making it harder to verify the reasoning steps and trace issues resulting from dependencies between the steps that may be farther away in the sequence of steps. Importantly, mathematical reasoning allows each step to be derived from a small set of premises, which are a subset of the preceding steps in the reasoning chain. In this paper, we present a framework that identifies the premises for each step, to improve the evaluation of reasoning. We restructure conventional linear reasoning chains into Premise Augmented Reasoning Chains (PARC) by introducing premise links, resulting in a directed acyclic graph where the nodes are the steps and the edges are the premise links. Through experiments with a PARC-based dataset that we built, namely PERL (Premises and ERrors identification in LLMs), we demonstrate that LLMs can reliably identify premises within complex reasoning chains. In particular, even open-source LLMs achieve 90% recall in premise identification. We also show that PARC helps to identify errors in reasoning chains more reliably. The accuracy of error identification improves by 6% to 16% absolute when step-by-step verification is carried out in PARC under the premises. Our findings highlight the utility of premise-centric representations in addressing complex problem-solving tasks and open new avenues for improving the reliability of LLM-based reasoning evaluations.
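
The premise-link structure described in the abstract can be pictured with a small data structure. The following is a minimal sketch, not the authors' implementation: the `Step` class, `validate_chain`, `verify_chain`, and the toy checker are all hypothetical names introduced for illustration, and a real system would replace the toy checker with an LLM-based verifier. It assumes each step may only cite earlier steps as premises, which is what makes the graph acyclic.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    """One reasoning step plus the indices of the earlier steps it relies on."""
    text: str
    premises: List[int] = field(default_factory=list)


def validate_chain(steps: List[Step]) -> None:
    """Require every premise link to point to an earlier step.

    Edges that only point backwards cannot form a cycle, so a chain that
    passes this check is a directed acyclic graph.
    """
    for i, step in enumerate(steps):
        for p in step.premises:
            if not 0 <= p < i:
                raise ValueError(f"step {i} has invalid premise index {p}")


def verify_chain(
    steps: List[Step],
    check_step: Callable[[str, List[str]], bool],
) -> List[bool]:
    """Verify each step using only the text of its own premises as context."""
    validate_chain(steps)
    results = []
    for step in steps:
        context = [steps[p].text for p in step.premises]
        results.append(check_step(step.text, context))
    return results


def toy_checker(step_text: str, premise_texts: List[str]) -> bool:
    """Stand-in for a real verifier (assumption): flags one known-bad sum."""
    return "3 + 4 = 8" not in step_text


chain = [
    Step("Alice has 3 apples."),
    Step("Bob has 4 apples."),
    Step("Together they have 3 + 4 = 8 apples.", premises=[0, 1]),
]
print(verify_chain(chain, toy_checker))  # [True, True, False]
```

The point of the sketch is that the verifier sees only the cited premises for each step rather than the whole preceding chain.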

large language model, machine learning, natural language, (14 more...)

arXiv:2502.02362

Country:

North America > United States (0.14)
North America > Canada (0.14)
Asia > Thailand (0.14)

Genre:

Workflow (1.00)
Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Infogent: An Agent-Based Framework for Web Information Aggregation

Reddy, Revanth Gangi, Mukherjee, Sagnik, Kim, Jeonghwan, Wang, Zhenhailong, Hakkani-Tur, Dilek, Ji, Heng

arXiv.org Artificial Intelligence, Oct-24-2024

Despite seemingly performant web agents on task-completion benchmarks, most existing methods evaluate the agents based on a presupposition: the web navigation task consists of a linear sequence of actions with an end state that marks task completion. In contrast, our work focuses on web navigation for information aggregation, wherein the agent must explore different websites to gather information for a complex query. We consider web information aggregation from two different perspectives: (i) Direct API-Driven Access relies on a text-only view of the Web, leveraging external tools such as the Google Search API to navigate the web and a scraper to extract website contents. (ii) Interactive Visual Access uses screenshots of the webpages and requires interaction with the browser to navigate and access information. Motivated by these diverse information access settings, we introduce Infogent, a novel modular framework for web information aggregation involving three distinct components: Navigator, Extractor and Aggregator. Experiments on different information access settings demonstrate that Infogent beats an existing SOTA multi-agent search framework by 7% under Direct API-Driven Access on FRAMES, and improves over an existing information-seeking web agent by 4.3% under Interactive Visual Access on AssistantBench.
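
The three-component split named in the abstract can be illustrated with a toy pipeline. This is only a sketch under stated assumptions: the abstract does not specify how Navigator, Extractor and Aggregator interact, so the control loop, the function names (`run_pipeline`, `toy_navigate`, `toy_extract`, `toy_aggregate`), and the in-memory `FAKE_WEB` stand-in for real web access are all hypothetical.

```python
from typing import Callable, Iterable, List

# In-memory stand-in for the web (assumption: no real search or browsing).
FAKE_WEB = {
    "page-a": "PARC adds premise links. Unrelated sentence.",
    "page-b": "Infogent has three components. PARC adds premise links.",
}


def toy_navigate(query: str) -> Iterable[str]:
    """Navigator stand-in: yields page texts in a fixed order."""
    return FAKE_WEB.values()


def toy_extract(query: str, page_text: str) -> List[str]:
    """Extractor stand-in: keeps sentences that mention the query keyword."""
    keyword = query.lower()
    return [s.strip() for s in page_text.split(".") if keyword in s.lower()]


def toy_aggregate(collected: List[str], snippets: List[str]) -> List[str]:
    """Aggregator stand-in: appends snippets that were not collected before."""
    return collected + [s for s in snippets if s not in collected]


def run_pipeline(
    query: str,
    navigate: Callable[[str], Iterable[str]],
    extract: Callable[[str, str], List[str]],
    aggregate: Callable[[List[str], List[str]], List[str]],
    max_pages: int = 10,
) -> List[str]:
    """Visit up to max_pages pages, extracting and aggregating as we go."""
    collected: List[str] = []
    for i, page_text in enumerate(navigate(query)):
        if i >= max_pages:
            break
        collected = aggregate(collected, extract(query, page_text))
    return collected


print(run_pipeline("parc", toy_navigate, toy_extract, toy_aggregate))
# ['PARC adds premise links']
```

Passing the three components as callables keeps the loop independent of how pages are reached, which loosely mirrors the two access settings the abstract describes.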

information, large language model, machine learning, (19 more...)

arXiv:2410.19054

Country: North America > United States > Illinois (0.14)

Genre:

Workflow (0.93)
Research Report (0.64)

Industry:

Information Technology > Security & Privacy (0.46)
Information Technology > Services (0.34)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Communications > Web (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
(2 more...)

Cultural Conditioning or Placebo? On the Effectiveness of Socio-Demographic Prompting

Mukherjee, Sagnik, Adilazuarda, Muhammad Farid, Sitaram, Sunayana, Bali, Kalika, Aji, Alham Fikri, Choudhury, Monojit

arXiv.org Artificial Intelligence, Jun-20-2024

Socio-demographic prompting is a commonly employed approach to study cultural biases in LLMs as well as for aligning models to certain cultures. In this paper, we systematically probe four LLMs (Llama 3, Mistral v0.2, GPT-3.5 Turbo and GPT-4) with prompts that are conditioned on culturally sensitive and non-sensitive cues, on datasets that are supposed to be culturally sensitive (EtiCor and CALI) or neutral (MMLU and ETHICS). We observe that all models except GPT-4 show significant variations in their responses on both kinds of datasets for both kinds of prompts, casting doubt on the robustness of culturally conditioned prompting as a method for eliciting cultural bias in models or as an alignment strategy. The work also calls for rethinking the control experiment design to tease apart the cultural conditioning of responses from the "placebo effect", i.e., random perturbations of model responses due to arbitrary tokens in the prompt.

large language model, machine learning, mistral-7b-instruct-v0, (16 more...)

arXiv:2406.11661

Country:

Asia (1.00)
Africa (0.68)
North America > United States (0.28)
Europe > Middle East > Malta (0.14)

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.92)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Epidemiology (0.92)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Towards Measuring and Modeling "Culture" in LLMs: A Survey

Adilazuarda, Muhammad Farid, Mukherjee, Sagnik, Lavania, Pradhyumna, Singh, Siddhant, Aji, Alham Fikri, O'Neill, Jacki, Modi, Ashutosh, Choudhury, Monojit

arXiv.org Artificial Intelligence, Jun-19-2024

We present a survey of more than 90 recent papers that aim to study cultural representation and inclusion in large language models (LLMs). We observe that none of the studies explicitly define "culture," which is a complex, multifaceted concept; instead, they probe the models on some specially designed datasets which represent certain aspects of "culture". We call these aspects the proxies of culture, and organize them across two dimensions: demographic and semantic proxies. We also categorize the probing methods employed. Our analysis indicates that only certain aspects of "culture," such as values and objectives, have been studied, leaving several other interesting and important facets, especially the multitude of semantic domains (Thompson et al., 2020) and aboutness (Hershcovich et al., 2022), unexplored. Two other crucial gaps are the lack of robustness of probing techniques and the lack of situated studies on the impact of cultural mis- and under-representation in LLM-based applications.

computational linguistic, large language model, machine learning, (19 more...)

arXiv:2403.15412

Country:

North America > United States (0.93)
Europe > Middle East > Malta (0.14)
Asia > Middle East > UAE (0.14)

Genre: Overview (1.00)

Industry:

Media (0.67)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)