AITopics | Murugesan, Keerthiram

Collaborating Authors

Murugesan, Keerthiram


Cross-Examiner: Evaluating Consistency of Large Language Model-Generated Explanations

Villa, Danielle, Chang, Maria, Murugesan, Keerthiram, Uceda-Sosa, Rosario, Ramamurthy, Karthikeyan Natesan

arXiv.org Artificial Intelligence, Mar-11-2025

Large Language Models (LLMs) are often asked to explain their outputs to enhance accuracy and transparency. However, evidence suggests that these explanations can misrepresent the models' true reasoning processes. One effective way to identify inaccuracies or omissions in these explanations is through consistency checking, which typically involves asking follow-up questions. This paper introduces cross-examiner, a new method for generating follow-up questions based on a model's explanation of an initial question. Our method combines symbolic information extraction with language model-driven question generation, resulting in better follow-up questions than those produced by LLMs alone. Additionally, this approach is more flexible than other methods and can generate a wider variety of follow-up questions.

artificial intelligence, large language model, natural language, (19 more...)

arXiv.org Artificial Intelligence

2503.08815

Country: Europe > Ireland (0.14)

Genre: Research Report (0.50)

Industry: Education (0.68)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)


Protecting Users From Themselves: Safeguarding Contextual Privacy in Interactions with Conversational Agents

Ngong, Ivoline, Kadhe, Swanand, Wang, Hao, Murugesan, Keerthiram, Weisz, Justin D., Dhurandhar, Amit, Ramamurthy, Karthikeyan Natesan

arXiv.org Artificial Intelligence, Feb-22-2025

Conversational agents are increasingly woven into individuals' personal lives, yet users often underestimate the privacy risks involved. The moment users share information with these agents (e.g., LLMs), their private information becomes vulnerable to exposure. In this paper, we characterize the notion of contextual privacy for user interactions with LLMs. It aims to minimize privacy risks by ensuring that users (senders) disclose only information that is both relevant and necessary for achieving their intended goals when interacting with LLMs (untrusted receivers). Through a formative design user study, we observe how even "privacy-conscious" users inadvertently reveal sensitive information through indirect disclosures. Based on insights from this study, we propose a locally deployable framework that operates between users and LLMs and that identifies and reformulates out-of-context information in user prompts. Our evaluation using examples from ShareGPT shows that lightweight models can effectively implement this framework, achieving strong gains in contextual privacy while preserving the user's intended interaction goals through different approaches to classify information relevant to the intended goals.

information, large language model, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2502.18509

Country:

North America > United States (0.28)
Europe (0.28)

Genre:

Research Report (0.81)
Questionnaire & Opinion Survey (0.69)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)


NGQA: A Nutritional Graph Question Answering Benchmark for Personalized Health-aware Nutritional Reasoning

Zhang, Zheyuan, Li, Yiyang, Le, Nhi Ha Lan, Wang, Zehong, Ma, Tianyi, Galassi, Vincent, Murugesan, Keerthiram, Moniz, Nuno, Geyer, Werner, Chawla, Nitesh V, Zhang, Chuxu, Ye, Yanfang

arXiv.org Artificial Intelligence, Dec-19-2024

Diet plays a critical role in human health, yet tailoring dietary reasoning to individual health conditions remains a major challenge. Nutrition Question Answering (QA) has emerged as a popular method for addressing this problem. However, current research faces two critical limitations. On one hand, the absence of datasets involving user-specific medical information severely limits personalization. This challenge is further compounded by the wide variability in individual health needs. On the other hand, while large language models (LLMs), a popular solution for this task, demonstrate strong reasoning abilities, they struggle with the domain-specific complexities of personalized healthy dietary reasoning, and existing benchmarks fail to capture these challenges. To address these gaps, we introduce the Nutritional Graph Question Answering (NGQA) benchmark, the first graph question answering dataset designed for personalized nutritional health reasoning. NGQA leverages data from the National Health and Nutrition Examination Survey (NHANES) and the Food and Nutrient Database for Dietary Studies (FNDDS) to evaluate whether a food is healthy for a specific user, supported by explanations of the key contributing nutrients. The benchmark incorporates three question complexity settings and evaluates reasoning across three downstream tasks. Extensive experiments with LLM backbones and baseline models demonstrate that the NGQA benchmark effectively challenges existing models. In sum, NGQA addresses a critical real-world problem while advancing GraphQA research with a novel domain-specific benchmark.

large language model, machine learning, question answering, (21 more...)

arXiv.org Artificial Intelligence

2412.15547

Country: North America > United States (0.46)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Endocrinology (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)


Granite Guardian

Padhi, Inkit, Nagireddy, Manish, Cornacchia, Giandomenico, Chaudhury, Subhajit, Pedapati, Tejaswini, Dognin, Pierre, Murugesan, Keerthiram, Miehling, Erik, Cooper, Martín Santillán, Fraser, Kieran, Zizzo, Giulio, Hameed, Muhammad Zaid, Purcell, Mark, Desmond, Michael, Pan, Qian, Ashktorab, Zahra, Vejsbjerg, Inge, Daly, Elizabeth M., Hind, Michael, Geyer, Werner, Rawat, Ambrish, Varshney, Kush R., Sattigeri, Prasanna

arXiv.org Artificial Intelligence, Dec-16-2024

We introduce the Granite Guardian models, a suite of safeguards designed to provide risk detection for prompts and responses, enabling safe and responsible use in combination with any large language model (LLM). These models offer comprehensive coverage across multiple risk dimensions, including social bias, profanity, violence, sexual content, unethical behavior, jailbreaking, and hallucination-related risks such as context relevance, groundedness, and answer relevance for retrieval-augmented generation (RAG). Trained on a unique dataset combining human annotations from diverse sources and synthetic data, Granite Guardian models address risks typically overlooked by traditional risk detection models, such as jailbreaks and RAG-specific issues. With AUC scores of 0.871 and 0.854 on harmful content and RAG-hallucination-related benchmarks respectively, Granite Guardian is the most generalizable and competitive model available in the space. Released as open-source, Granite Guardian aims to promote responsible AI development across the community.
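The AUC figures quoted in this abstract are area-under-the-ROC-curve scores. As a minimal sketch of how such a score is computed for a binary risk detector, the following uses the standard pairwise definition (the probability that a randomly chosen harmful example scores higher than a randomly chosen benign one, with ties counted as one half). The function name `roc_auc` and the toy labels and scores are illustrative assumptions; they are not taken from the Granite Guardian paper or its evaluation code.

```python
def roc_auc(labels, scores):
    """Pairwise ROC AUC for binary labels (1 = harmful, 0 = benign).

    Illustrative helper, not part of any Granite Guardian release.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    # Count positive/negative pairs ranked correctly; a tie counts as half.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical detector scores for four prompts.
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

The pairwise form is quadratic in the number of examples, so it is only suitable for small illustrations; a rank-based computation gives the same value on large benchmarks.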

granite guardian, large language model, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2412.07724

Country:

Europe (0.93)
North America > United States > Minnesota (0.28)

Genre: Research Report (0.82)

Industry:

Media (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Law (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


MOPI-HFRS: A Multi-objective Personalized Health-aware Food Recommendation System with LLM-enhanced Interpretation

Zhang, Zheyuan, Wang, Zehong, Ma, Tianyi, Taneja, Varun Sameer, Nelson, Sofia, Le, Nhi Ha Lan, Murugesan, Keerthiram, Ju, Mingxuan, Chawla, Nitesh V, Zhang, Chuxu, Ye, Yanfang

arXiv.org Artificial Intelligence, Dec-11-2024

The prevalence of unhealthy eating habits has become an increasingly concerning issue in the United States. However, major food recommendation platforms (e.g., Yelp) continue to prioritize users' dietary preferences over the healthiness of their choices. Although efforts have been made to develop health-aware food recommendation systems, the personalization of such systems based on users' specific health conditions remains under-explored. In addition, few studies focus on the interpretability of these systems, which hinders users from assessing the reliability of recommendations and impedes the practical deployment of these systems. In response to this gap, we first establish, as a first attempt, two large-scale personalized health-aware food recommendation benchmarks. We then develop a novel framework, Multi-Objective Personalized Interpretable Health-aware Food Recommendation System (MOPI-HFRS), which provides food recommendations by jointly optimizing three objectives (user preference, personalized healthiness, and nutritional diversity), along with a large language model (LLM)-enhanced reasoning module that promotes healthy dietary knowledge through the interpretation of recommended results. Specifically, this holistic graph learning framework first utilizes two structure learning modules and a structure pooling module to leverage both descriptive features and health data. Then it employs Pareto optimization to achieve the designed multi-faceted objectives. Finally, to further promote healthy dietary knowledge and awareness, we exploit an LLM through knowledge infusion, prompting it with knowledge obtained from the recommendation model for interpretation.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2412.08847

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.48)

Industry:

Health & Medicine > Consumer Health (1.00)
Information Technology (0.93)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.46)
Health & Medicine > Therapeutic Area > Endocrinology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)


Combinatorial Multi-armed Bandits: Arm Selection via Group Testing

Mukherjee, Arpan, Ubaru, Shashanka, Murugesan, Keerthiram, Shanmugam, Karthikeyan, Tajer, Ali

arXiv.org Machine Learning, Oct-14-2024

This paper considers the problem of combinatorial multi-armed bandits with semi-bandit feedback and a cardinality constraint on the super-arm size. Existing algorithms for solving this problem typically involve two key sub-routines: (1) a parameter estimation routine that sequentially estimates a set of base-arm parameters, and (2) a super-arm selection policy for selecting a subset of base arms deemed optimal based on these parameters. State-of-the-art algorithms assume access to an exact oracle for super-arm selection with unbounded computational power. At each instance, this oracle evaluates a list of score functions, the number of which grows as low as linearly and as high as exponentially with the number of arms. This can be prohibitive in the regime of a large number of arms. This paper introduces a novel realistic alternative to the perfect oracle. This algorithm uses a combination of group-testing for selecting the super arms and quantized Thompson sampling for parameter estimation. Under a general separability assumption on the reward function, the proposed algorithm reduces the complexity of the super-arm-selection oracle to be logarithmic in the number of base arms while achieving the same regret order as the state-of-the-art algorithms that use exact oracles. This translates to at least an exponential reduction in complexity compared to the oracle-based approaches.
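For orientation, the two sub-routines described in this abstract (parameter estimation and super-arm selection under a cardinality constraint) can be sketched with a textbook combinatorial Thompson sampling loop. This is not the paper's algorithm: it assumes Bernoulli base arms, a reward equal to the sum of the selected arms' outcomes, and an exact top-k oracle that sorts all sampled means, which is the kind of oracle the paper aims to replace with group testing. All names (`select_super_arm`, `run_cts`) and the example means are hypothetical.

```python
import random


def select_super_arm(samples, k):
    """Exact top-k oracle: indices of the k largest sampled means."""
    order = sorted(range(len(samples)), key=lambda i: samples[i], reverse=True)
    return order[:k]


def run_cts(true_means, k, rounds, seed=0):
    """Combinatorial Thompson sampling with semi-bandit feedback.

    Returns how many times each base arm was played.
    """
    rng = random.Random(seed)
    n = len(true_means)
    alpha = [1.0] * n  # Beta posterior: 1 + observed successes
    beta = [1.0] * n   # Beta posterior: 1 + observed failures
    pulls = [0] * n
    for _ in range(rounds):
        # Parameter estimation: one posterior sample per base arm.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(n)]
        # Super-arm selection under the cardinality constraint k.
        for i in select_super_arm(samples, k):
            # Semi-bandit feedback: each played arm reveals its own outcome.
            reward = 1 if rng.random() < true_means[i] else 0
            alpha[i] += reward
            beta[i] += 1 - reward
            pulls[i] += 1
    return pulls


if __name__ == "__main__":
    print(run_cts([0.9, 0.8, 0.2, 0.1], k=2, rounds=2000))
```

The sort inside `select_super_arm` touches every base arm on every round; that per-round oracle cost is the quantity the abstract says the proposed method reduces to logarithmic in the number of base arms.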

artificial intelligence, data mining, machine learning, (20 more...)

arXiv.org Machine Learning

2410.10679

Country:

Asia (0.67)
North America > United States (0.28)
Europe > United Kingdom > Scotland (0.14)
North America > Canada > Quebec (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Data Science > Data Mining > Big Data (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.54)


CTBench: A Comprehensive Benchmark for Evaluating Language Model Capabilities in Clinical Trial Design

Neehal, Nafis, Wang, Bowen, Debopadhaya, Shayom, Dan, Soham, Murugesan, Keerthiram, Anand, Vibha, Bennett, Kristin P.

arXiv.org Artificial Intelligence, Jun-25-2024

CTBench is introduced as a benchmark to assess language models (LMs) in aiding clinical study design. Given study-specific metadata, CTBench evaluates AI models' ability to determine the baseline features of a clinical trial (CT), which include demographic and relevant features collected at the trial's start from all participants. These baseline features, typically presented in CT publications (often as Table 1), are crucial for characterizing study cohorts and validating results. Baseline features, including confounders and covariates, are also necessary for accurate treatment effect estimation in studies involving observational data. CTBench consists of two datasets: "CT-Repo," containing baseline features from 1,690 clinical trials sourced from clinicaltrials.gov, and "CT-Pub," a subset of 100 trials with more comprehensive baseline features gathered from relevant publications. Two LM-based evaluation methods are developed to compare the actual baseline feature lists against LM-generated responses. "ListMatch-LM" and "ListMatch-BERT" use GPT-4o and BERT scores (at various thresholds), respectively, for evaluation. To establish baseline results, advanced prompt engineering techniques using LLaMa3-70B-Instruct and GPT-4o in zero-shot and three-shot learning settings are applied to generate potential baseline features. The performance of GPT-4o as an evaluator is validated through human-in-the-loop evaluations on the CT-Pub dataset, where clinical experts confirm matches between actual and LM-generated features. The results highlight a promising direction with significant potential for improvement, positioning CTBench as a useful tool for advancing research on AI in CT design and potentially enhancing the efficacy and robustness of CTs.
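The idea of comparing an actual baseline feature list against a generated one at a similarity threshold can be illustrated with a small sketch. It is an assumption-laden stand-in rather than the paper's ListMatch-BERT: token-level Jaccard overlap replaces BERT scores, matching is greedy one-to-one, and the names `jaccard` and `list_match` as well as the sample feature lists are hypothetical.

```python
def jaccard(a, b):
    """Token-set overlap between two feature names (stand-in for a BERT score)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def list_match(reference, candidate, threshold=0.5, sim=jaccard):
    """Greedy one-to-one matching of candidate features to reference features.

    Returns (matches, precision, recall, f1).
    """
    pairs = sorted(
        ((sim(r, c), i, j)
         for i, r in enumerate(reference)
         for j, c in enumerate(candidate)),
        reverse=True,
    )
    used_ref, used_cand, matches = set(), set(), []
    for score, i, j in pairs:
        if score < threshold:
            break  # remaining pairs are all below the threshold
        if i in used_ref or j in used_cand:
            continue  # each feature may be matched at most once
        used_ref.add(i)
        used_cand.add(j)
        matches.append((reference[i], candidate[j]))
    precision = len(matches) / len(candidate) if candidate else 0.0
    recall = len(matches) / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return matches, precision, recall, f1


if __name__ == "__main__":
    actual = ["age", "body mass index", "sex"]
    generated = ["age", "mass index", "smoking status"]
    print(list_match(actual, generated, threshold=0.5))
```

Raising the threshold makes matching stricter and typically lowers recall, which is presumably why an evaluation of this kind would be reported at several thresholds.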

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2406.17888

Country: North America > United States > New York (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Detectors for Safe and Reliable LLMs: Implementations, Uses, and Limitations

Achintalwar, Swapnaja, Garcia, Adriana Alvarado, Anaby-Tavor, Ateret, Baldini, Ioana, Berger, Sara E., Bhattacharjee, Bishwaranjan, Bouneffouf, Djallel, Chaudhury, Subhajit, Chen, Pin-Yu, Chiazor, Lamogha, Daly, Elizabeth M., DB, Kirushikesh, de Paula, Rogério Abreu, Dognin, Pierre, Farchi, Eitan, Ghosh, Soumya, Hind, Michael, Horesh, Raya, Kour, George, Lee, Ja Young, Madaan, Nishtha, Mehta, Sameep, Miehling, Erik, Murugesan, Keerthiram, Nagireddy, Manish, Padhi, Inkit, Piorkowski, David, Rawat, Ambrish, Raz, Orna, Sattigeri, Prasanna, Strobelt, Hendrik, Swaminathan, Sarathkrishna, Tillmann, Christoph, Trivedi, Aashka, Varshney, Kush R., Wei, Dennis, Witherspoon, Shalisha, Zalmanovici, Marcel

arXiv.org Artificial Intelligence, Jun-13-2024

Large language models (LLMs) are susceptible to a variety of risks, from non-faithful output to biased and toxic generations. Due to several limiting factors surrounding LLMs (training cost, API access, data availability, etc.), it may not always be feasible to impose direct safety constraints on a deployed model. Therefore, an efficient and reliable alternative is required. To this end, we present our ongoing efforts to create and deploy a library of detectors: compact and easy-to-build classification models that provide labels for various harms. In addition to the detectors themselves, we discuss a wide range of uses for these detector models - from acting as guardrails to enabling effective AI governance. We also deep dive into inherent challenges in their development and discuss future work aimed at making the detectors more reliable and broadening their scope.

detector, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2403.06009

Country:

Asia (1.00)
North America > United States > Washington > King County > Seattle (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
(2 more...)

Genre:

Overview (0.46)
Research Report (0.40)

Industry:

Information Technology (0.69)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)


STARLING: Self-supervised Training of Text-based Reinforcement Learning Agent with Large Language Models

Basavatia, Shreyas, Murugesan, Keerthiram, Ratnakar, Shivam

arXiv.org Artificial Intelligence, Jun-9-2024

Interactive fiction games have emerged as an important application to improve the generalization capabilities of language-based reinforcement learning (RL) agents. Existing environments for interactive fiction games are domain-specific or time-consuming to generate and do not train the RL agents to master a specific set of skills. In this work, we introduce STARLING, an interactive environment for self-supervised RL in text-based games. It bootstraps text-based RL agents with automatically generated games (based on a seed set of game ideas) to boost their performance and their ability to generalize when pursuing a goal in the target environment. These games let the agent hone its skills on a predefined set of tasks. We create and test an environment with 100 games, generated using this automated framework, which uses large language models (GPT-3) and an interactive fiction game engine (based on Inform7) so that users can generate more games under minimal human supervision. Experimental results based on both human participants and baseline text-based RL agents reveal that current state-of-the-art text-based RL agents cannot use previously learned skills in new situations at the level humans can. These results reinforce STARLING's potential to serve as a sandbox environment for further research in self-supervised text-based RL.

machine learning, natural language, reinforcement learning, (21 more...)

arXiv.org Artificial Intelligence

2406.05872

Country: Europe > Sweden (0.14)

Genre: Research Report (0.82)

Industry:

Education (1.00)
Leisure & Entertainment > Games > Computer Games (0.88)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)


Facilitating Human-LLM Collaboration through Factuality Scores and Source Attributions

Do, Hyo Jin, Ostrand, Rachel, Weisz, Justin D., Dugan, Casey, Sattigeri, Prasanna, Wei, Dennis, Murugesan, Keerthiram, Geyer, Werner

arXiv.org Artificial Intelligence, May-30-2024

While humans increasingly rely on large language models (LLMs), these models are susceptible to generating inaccurate or false information, also known as "hallucinations". Technical advancements have been made in algorithms that detect hallucinated content by assessing the factuality of the model's responses and attributing sections of those responses to specific source documents. However, there is limited research on how to effectively communicate this information to users in ways that will help them appropriately calibrate their trust toward LLMs. To address this issue, we conducted a scenario-based study (N=104) to systematically compare the impact of various design strategies for communicating factuality and source attribution on participants' ratings of trust, preferences, and ease in validating response accuracy. Our findings reveal that participants preferred a design in which phrases within a response were color-coded based on the computed factuality scores. Additionally, participants increased their trust ratings when relevant sections of the source material were highlighted or responses were annotated with reference numbers corresponding to those sources, compared to when they received no annotation in the source material. Our study offers practical design guidelines to facilitate human-LLM collaboration, and it promotes a new human role: carefully evaluating, and taking responsibility for, one's use of LLM outputs.

artificial intelligence, large language model, natural language, (13 more...)

arXiv.org Artificial Intelligence

2405.20434

Genre:

Research Report > New Finding (1.00)
Questionnaire & Opinion Survey (1.00)
Research Report > Experimental Study (0.93)

Industry: Media > News (0.93)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
