AITopics | Das, Rocktim Jyoti

Collaborating Authors

Das, Rocktim Jyoti

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

MALT: Improving Reasoning with Multi-Agent LLM Training

Motwani, Sumeet Ramesh, Smith, Chandler, Das, Rocktim Jyoti, Rybchuk, Markian, Torr, Philip H. S., Laptev, Ivan, Pizzati, Fabio, Clark, Ronald, de Witt, Christian Schroeder

arXiv.org Artificial IntelligenceDec-2-2024

Enabling effective collaboration among LLMs is a crucial step toward developing autonomous systems capable of solving complex problems. While LLMs are typically used as single-model generators, where humans critique and refine their outputs, the potential for jointly-trained collaborative models remains largely unexplored. Despite promising results in multi-agent communication and debate settings, little progress has been made in training models to work together on tasks. In this paper, we present a first step toward "Multi-agent LLM training" (MALT) on reasoning problems. Our approach employs a sequential multi-agent setup with heterogeneous LLMs assigned specialized roles: a generator, verifier, and refinement model iteratively solving problems. We propose a trajectory-expansion-based synthetic data generation process and a credit assignment strategy driven by joint outcome based rewards. This enables our post-training setup to utilize both positive and negative trajectories to autonomously improve each model's specialized capabilities as part of a joint sequential system. We evaluate our approach across MATH, GSM8k, and CQA, where MALT on Llama 3.1 8B models achieves relative improvements of 14.14%, 7.12%, and 9.40% respectively over the same baseline model. This demonstrates an early advance in multi-agent cooperative capabilities for performance on mathematical and common sense reasoning questions. More generally, our work provides a concrete direction for research around multi-agent LLM training approaches.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2412.01928

Country: Asia (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

MALMM: Multi-Agent Large Language Models for Zero-Shot Robotics Manipulation

Singh, Harsh, Das, Rocktim Jyoti, Han, Mingfei, Nakov, Preslav, Laptev, Ivan

arXiv.org Artificial IntelligenceNov-26-2024

Large Language Models (LLMs) have demonstrated remarkable planning abilities across various domains, including robotics manipulation and navigation. While recent efforts in robotics have leveraged LLMs both for high-level and low-level planning, these approaches often face significant challenges, such as hallucinations in long-horizon tasks and limited adaptability due to the generation of plans in a single pass without real-time feedback. To address these limitations, we propose a novel multi-agent LLM framework, Multi-Agent Large Language Model for Manipulation (MALMM) that distributes high-level planning and low-level control code generation across specialized LLM agents, supervised by an additional agent that dynamically manages transitions. By incorporating observations from the environment after each step, our framework effectively handles intermediate failures and enables adaptive re-planning. Unlike existing methods, our approach does not rely on pre-trained skill policies or in-context learning examples and generalizes to a variety of new tasks. We evaluate our approach on nine RLBench tasks, including long-horizon tasks, and demonstrate its ability to solve robotics manipulation in a zero-shot setting, thereby overcoming key limitations of existing LLM-based manipulation methods.

container, large language model, natural language, (18 more...)

arXiv.org Artificial Intelligence

2411.17636

Genre: Workflow (0.93)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

MediTOD: An English Dialogue Dataset for Medical History Taking with Comprehensive Annotations

Saley, Vishal Vivek, Saha, Goonjan, Das, Rocktim Jyoti, Raghu, Dinesh, Mausam, null

arXiv.org Artificial IntelligenceOct-18-2024

Medical task-oriented dialogue systems can assist doctors by collecting patient medical history, aiding in diagnosis, or guiding treatment selection, thereby reducing doctor burnout and expanding access to medical services. However, doctor-patient dialogue datasets are not readily available, primarily due to privacy regulations. Moreover, existing datasets lack comprehensive annotations involving medical slots and their different attributes, such as symptoms and their onset, progression, and severity. These comprehensive annotations are crucial for accurate diagnosis. Finally, most existing datasets are non-English, limiting their utility for the larger research community. In response, we introduce MediTOD, a new dataset of doctor-patient dialogues in English for the medical history-taking task. Collaborating with doctors, we devise a questionnaire-based labeling scheme tailored to the medical domain. Then, medical professionals create the dataset with high-quality comprehensive annotations, capturing medical slots and their attributes. We establish benchmarks in supervised and few-shot settings on MediTOD for natural language understanding, policy learning, and natural language generation subtasks, evaluating models from both TOD and biomedical domains. We make MediTOD publicly available for future research.

large language model, machine learning, natural language, (23 more...)

arXiv.org Artificial Intelligence

2410.14204

Country:

Europe (0.28)
Asia > India (0.28)

Genre: Research Report (0.63)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Consumer Health (0.93)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

Add feedback

Synergizing In-context Learning with Hints for End-to-end Task-oriented Dialog Systems

Saley, Vishal Vivek, Das, Rocktim Jyoti, Raghu, Dinesh, Mausam, null

arXiv.org Artificial IntelligenceJul-3-2024

End-to-end Task-Oriented Dialog (TOD) systems typically require extensive training datasets to perform well. In contrast, large language model (LLM) based TOD systems can excel even with limited data due to their ability to learn tasks through in-context exemplars. However, these models lack alignment with the style of responses in training data and often generate comprehensive responses, making it difficult for users to grasp the information quickly. In response, we propose SyncTOD that synergizes LLMs with task-specific hints to improve alignment in low-data settings. SyncTOD employs small auxiliary models to provide hints and select exemplars for in-context prompts. With ChatGPT, SyncTOD achieves superior performance compared to LLM-based baselines and SoTA models in low-data settings, while retaining competitive performance in full-data settings.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2405.15585

Country:

Asia > China > Hong Kong (0.14)
Asia > India > NCT (0.14)

Genre: Research Report (0.64)

Industry:

Consumer Products & Services > Food, Beverage, Tobacco & Cannabis (0.94)
Consumer Products & Services > Restaurants (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Can a Multichoice Dataset be Repurposed for Extractive Question Answering?

Lynn, Teresa, Altakrori, Malik H., Magdy, Samar Mohamed, Das, Rocktim Jyoti, Lyu, Chenyang, Nasr, Mohamed, Samih, Younes, Aji, Alham Fikri, Nakov, Preslav, Godbole, Shantanu, Roukos, Salim, Florian, Radu, Habash, Nizar

arXiv.org Artificial IntelligenceApr-26-2024

The rapid evolution of Natural Language Processing (NLP) has favored major languages such as English, leaving a significant gap for many others due to limited resources. This is especially evident in the context of data annotation, a task whose importance cannot be underestimated, but which is time-consuming and costly. Thus, any dataset for resource-poor languages is precious, in particular when it is task-specific. Here, we explore the feasibility of repurposing existing datasets for a new NLP task: we repurposed the Belebele dataset (Bandarkar et al., 2023), which was designed for multiple-choice question answering (MCQA), to enable extractive QA (EQA) in the style of machine reading comprehension. We present annotation guidelines and a parallel EQA dataset for English and Modern Standard Arabic (MSA). We also present QA evaluation results for several monolingual and cross-lingual QA pairs including English, MSA, and five Arabic dialects. Our aim is to enable others to adapt our approach for the 120+ other language variants in Belebele, many of which are deemed under-resourced. We also conduct a thorough analysis and share our insights from the process, which we hope will contribute to a deeper understanding of the challenges and the opportunities associated with task reformulation in NLP research.

artificial intelligence, natural language, question answering, (19 more...)

arXiv.org Artificial Intelligence

2404.17342

Country:

North America > United States (1.00)
Europe (1.00)
Asia > Pakistan (0.67)
Africa > Middle East > Libya (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine (1.00)
Education (1.00)
Leisure & Entertainment (0.67)
Government > Regional Government > Asia Government > Pakistan Government (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)

Add feedback

EXAMS-V: A Multi-Discipline Multilingual Multimodal Exam Benchmark for Evaluating Vision Language Models

Das, Rocktim Jyoti, Hristov, Simeon Emilov, Li, Haonan, Dimitrov, Dimitar Iliyanov, Koychev, Ivan, Nakov, Preslav

arXiv.org Artificial IntelligenceMar-15-2024

We introduce EXAMS-V, a new challenging multi-discipline multimodal multilingual exam benchmark for evaluating vision language models. It consists of 20,932 multiple-choice questions across 20 school disciplines covering natural science, social science, and other miscellaneous studies, e.g., religion, fine arts, business, etc. EXAMS-V includes a variety of multimodal features such as text, images, tables, figures, diagrams, maps, scientific symbols, and equations. The questions come in 11 languages from 7 language families. Unlike existing benchmarks, EXAMS-V is uniquely curated by gathering school exam questions from various countries, with a variety of education systems. This distinctive approach calls for intricate reasoning across diverse languages and relies on region-specific knowledge. Solving the problems in the dataset requires advanced perception and joint reasoning over the text and the visual content of the image. Our evaluation results demonstrate that this is a challenging dataset, which is difficult even for advanced vision-text models such as GPT-4V and Gemini; this underscores the inherent complexity of the dataset and its significance as a future benchmark.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2403.10378

Country: Asia (0.68)

Genre: Research Report > New Finding (0.48)

Industry:

Education (1.00)
Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

Factuality of Large Language Models in the Year 2024

Wang, Yuxia, Wang, Minghan, Manzoor, Muhammad Arslan, Liu, Fei, Georgiev, Georgi, Das, Rocktim Jyoti, Nakov, Preslav

arXiv.org Artificial IntelligenceFeb-9-2024

Large language models (LLMs), especially when instruction-tuned for chat, have become part of our daily lives, freeing people from the process of searching, extracting, and integrating information from multiple sources by offering a straightforward answer to a variety of questions in a single place. Unfortunately, in many cases, LLM responses are factually incorrect, which limits their applicability in real-world scenarios. As a result, research on evaluating and improving the factuality of LLMs has attracted a lot of research attention recently. In this survey, we critically analyze existing work with the aim to identify the major challenges and their associated causes, pointing out to potential solutions for improving the factuality of LLMs, and analyzing the obstacles to automated factuality evaluation for open-ended text generation. We further offer an outlook on where future research should go.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2402.0242

Genre:

Overview (1.00)
Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

Exploring Distributional Shifts in Large Language Models for Code Analysis

Arakelyan, Shushan, Das, Rocktim Jyoti, Mao, Yi, Ren, Xiang

arXiv.org Artificial IntelligenceDec-5-2023

We systematically study how three large language models with code capabilities - CodeT5, Codex, and ChatGPT - generalize to out-of-domain data. We consider two fundamental applications - code summarization, and code generation. We split data into domains following its natural boundaries - by an organization, by a project, and by a module within the software project. We establish that samples from each new domain present all the models with a significant challenge of distribution shift. We study how established methods adapt models to better generalize to new domains. Our experiments show that while multitask learning alone is a reasonable baseline, combining it with few-shot finetuning on examples retrieved from training data can achieve very strong performance. Moreover, this solution can outperform direct finetuning for very low-data scenarios. Finally, we consider variations of this approach to create a more broadly applicable method to adapt to multiple domains at once. We find that for code generation, a model adapted to multiple domains simultaneously performs on par with those adapted to a single domain

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2303.09128

Country:

Europe (1.00)
North America > United States > California (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Government (0.46)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Beyond Size: How Gradients Shape Pruning Decisions in Large Language Models

Das, Rocktim Jyoti, Ma, Liqun, Shen, Zhiqiang

arXiv.org Artificial IntelligenceNov-8-2023

Large Language Models (LLMs) with a billion or more parameters are prime targets for network pruning, which aims to reduce a portion of the network weights without compromising performance. Prior approaches such as Weights Magnitude, SparseGPT, and Wanda, either concentrated solely on weights or integrated weights with activations for sparsity. However, they overlooked the informative gradients derived from pretrained large language models. In this paper, we present a novel sparsity-centric pruning method for pretrained LLMs, termed Gradient-based Language Model Pruner (GBLM-Pruner). GBLM-Pruner leverages the first-order term of the Taylor expansion, operating in a training-free manner by harnessing properly normalized gradients from a few calibration samples to determine the importance pruning score, and substantially outperforms competitive counterparts like SparseGPT and Wanda in multiple benchmarks. Intriguing, after incorporating gradients, the unstructured pruning method tends to reveal some structural patterns post-pruning, which mirrors the geometric interdependence inherent in the LLMs' parameter structure. Additionally, GBLM-Pruner functions without any subsequent retraining or weight updates to maintain its simplicity as other counterparts. Extensive evaluations on LLaMA-1 and LLaMA-2 across various language benchmarks and perplexity show that GBLM-Pruner surpasses magnitude pruning, Wanda (weights+activations) and SparseGPT (weights+activations+weight update) by significant margins. Our code and models are available at https://github.com/RocktimJyotiDas/GBLM-Pruner.

artificial intelligence, large language model, natural language, (2 more...)

arXiv.org Artificial Intelligence

2311.04902

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

DKAF: KB Arbitration for Learning Task-Oriented Dialog Systems with Dialog-KB Inconsistencies

Saley, Vishal Vivek, Das, Rocktim Jyoti, Raghu, Dinesh, Mausam, null

arXiv.org Artificial IntelligenceMay-26-2023

Task-oriented dialog (TOD) agents often ground their responses on external knowledge bases (KBs). These KBs can be dynamic and may be updated frequently. Existing approaches for learning TOD agents assume the KB snapshot contemporary to each individual dialog is available during training. However, in real-world scenarios, only the latest KB snapshot is available during training and as a result, the train dialogs may contain facts conflicting with the latest KB. These dialog-KB inconsistencies in the training data may potentially confuse the TOD agent learning algorithm. In this work, we define the novel problem of learning a TOD agent with dialog-KB inconsistencies in the training data. We propose a Dialog-KB Arbitration Framework (DKAF) which reduces the dialog-KB inconsistencies by predicting the contemporary KB snapshot for each train dialog. These predicted KB snapshots are then used for training downstream TOD agents. As there are no existing datasets with dialog-KB inconsistencies, we systematically introduce inconsistencies in two publicly available dialog datasets. We show that TOD agents trained with DKAF perform better than existing baselines on both these datasets

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2305.16697

Country:

Asia > China > Hong Kong (0.14)
Asia > India > NCT (0.14)
North America > United States > Pennsylvania (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (1.00)

Industry:

Law > Alternative Dispute Resolution (0.61)
Consumer Products & Services > Restaurants (0.46)
Consumer Products & Services > Hotels (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.83)

Add feedback