AITopics | Dutta, Abhishek

Collaborating Authors

Dutta, Abhishek

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Unveiling Reasoning Thresholds in Language Models: Scaling, Fine-Tuning, and Interpretability through Attention Maps

Hsiao, Yen-Che, Dutta, Abhishek

arXiv.org Artificial IntelligenceFeb-20-2025

This study investigates the in-context learning capabilities of various decoder-only transformer-based language models with different model sizes and training data, including GPT2, SmolLM2, OpenELM, TinyLlama, Stable LM, and Gemma 2. We identify a critical parameter threshold (~1.6 billion), beyond which reasoning performance improves significantly in tasks such as commonsense reasoning in multiple-choice question answering and deductive reasoning. Specifically, models above this threshold achieve better success rates in chain-of-thought (CoT) prompting for deductive reasoning tasks, especially those requiring longer reasoning chains, such as proof by contradiction and disjunction elimination. To address limitations in sub-threshold models, we demonstrate that fine-tuning with task-specific exemplars substantially enhances reasoning performance, enabling accurate CoT generation even without additional exemplars in the prompt for tasks with shorter reasoning chains. Finally, our analysis of attention maps reveals that models capable of generating correct CoTs exhibit higher token-level attention scores on subsequent correct tokens and the correct parts of speech, providing interpretability insights into reasoning processes. These findings collectively advance understanding of reasoning capabilities in decoder-only transformer-based models. The code is available at: https://github.com/AnnonymousForPapers/CoT_Reasoning_Test.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2502.1512

Country: North America > United States > Connecticut > Tolland County > Storrs (0.14)

Genre: Research Report (0.65)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Adaptive Reasoning and Acting in Medical Language Agents

Dutta, Abhishek, Hsiao, Yen-Che

arXiv.org Artificial IntelligenceOct-13-2024

This paper presents an innovative large language model (LLM) agent framework for enhancing diagnostic accuracy in simulated clinical environments using the AgentClinic benchmark. The proposed automatic correction enables doctor agents to iteratively refine their reasoning and actions following incorrect diagnoses, fostering improved decision-making over time. Experiments show that the implementation of the adaptive LLM-based doctor agents achieve correct diagnoses through dynamic interactions with simulated patients. The evaluations highlight the capacity of autonomous agents to adapt and improve in complex medical scenarios. Future enhancements will focus on refining the algorithm and expanding its applicability across a wider range of tasks and different large language models.

artificial intelligence, large language model, natural language, (16 more...)

arXiv.org Artificial Intelligence

2410.1002

Country: North America > United States > Connecticut (0.29)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Diagnostic Medicine (1.00)
Health & Medicine > Therapeutic Area > Neurology (0.70)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.69)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Efficient transformer with reinforced position embedding for language models

Hsiao, Yen-Che, Dutta, Abhishek

arXiv.org Artificial IntelligenceOct-6-2024

In this paper, we propose an efficient transformer architecture that uses reinforced positional embedding to obtain superior performance with half the number of encoder decoder layers. We demonstrate that concatenating positional encoding with trainable token embeddings, normalizing columns in the token embedding matrix, and using the normalized token embedding matrix as the value of the attention layer improve the training and validation loss and the training time in an encoder-decoder Transformer model for a Portuguese-English translation task with 10 epochs or 12 hours of training across 10 trials. Our method, with roughly a threefold parameter reduction compared to the baseline model, yields a mean training loss of 1.21, a mean validation loss of 1.51, and an average training time of 1352.27 Additionally, we evaluated our proposed architecture and the baseline across 14 diverse translation datasets from TensorFlow. The results indicate that our method consistently achieves lower or comparable training and validation losses, suggesting enhanced learning efficiency.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2410.04731

Country: North America > United States > Connecticut > Tolland County > Storrs (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Machine Learning: Algorithms, Models, and Applications

Sen, Jaydip, Mehtab, Sidra, Sen, Rajdeep, Dutta, Abhishek, Kherwa, Pooja, Ahmed, Saheel, Berry, Pranay, Khurana, Sahil, Singh, Sonali, Cadotte, David W. W, Anderson, David W., Ost, Kalum J., Akinbo, Racheal S., Daramola, Oladunni A., Lainjo, Bongs

arXiv.org Artificial IntelligenceJan-6-2022

Recent times are witnessing rapid development in machine learning algorithm systems, especially in reinforcement learning, natural language processing, computer and robot vision, image processing, speech, and emotional processing and understanding. In tune with the increasing importance and relevance of machine learning models, algorithms, and their applications, and with the emergence of more innovative uses cases of deep learning and artificial intelligence, the current volume presents a few innovative research works and their applications in real world, such as stock trading, medical and healthcare systems, and software automation. The chapters in the book illustrate how machine learning and deep learning algorithms and models are designed, optimized, and deployed. The volume will be useful for advanced graduate and doctoral students, researchers, faculty members of universities, practicing data scientists and data engineers, professionals, and consultants working on the broad areas of machine learning, deep learning, and artificial intelligence.

consumer health, diagnostic medicine, machine learning, (30 more...)

arXiv.org Artificial Intelligence

doi: 10.5772/intechopen.94615

2201.01943

Country:

Africa (1.00)
Asia > India (0.93)
Asia > China (0.92)
(4 more...)

Genre:

Summary/Review (1.00)
Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
(3 more...)

Industry:

Transportation (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)
(15 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback