AITopics | Kumar, Shanu

Collaborating Authors

Kumar, Shanu

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

READ: Reinforcement-based Adversarial Learning for Text Classification with Limited Labeled Data

Sharma, Rohit, Kumar, Shanu, Kumar, Avinash

arXiv.org Artificial IntelligenceJan-14-2025

Pre-trained transformer models such as BERT have shown massive gains across many text classification tasks. However, these models usually need enormous labeled data to achieve impressive performances. Obtaining labeled data is often expensive and time-consuming, whereas collecting unlabeled data using some heuristics is relatively much cheaper for any task. Therefore, this paper proposes a method that encapsulates reinforcement learning-based text generation and semi-supervised adversarial learning approaches in a novel way to improve the model's performance. Our method READ, Reinforcement-based Adversarial learning, utilizes an unlabeled dataset to generate diverse synthetic text through reinforcement learning, improving the model's generalization capability using adversarial learning. Our experimental results show that READ outperforms the existing state-of-art methods on multiple datasets.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

arXiv.org Artificial Intelligence

2501.08035

Country:

Asia (0.28)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Navigating the Cultural Kaleidoscope: A Hitchhiker's Guide to Sensitivity in Large Language Models

Banerjee, Somnath, Layek, Sayan, Shrawgi, Hari, Mandal, Rajarshi, Halder, Avik, Kumar, Shanu, Basu, Sagnik, Agrawal, Parag, Hazra, Rima, Mukherjee, Animesh

arXiv.org Artificial IntelligenceDec-23-2024

As LLMs are increasingly deployed in global applications, the importance of cultural sensitivity becomes paramount, ensuring that users from diverse backgrounds feel respected and understood. Cultural harm can arise when these models fail to align with specific cultural norms, resulting in misrepresentations or violations of cultural values. This work addresses the challenges of ensuring cultural sensitivity in LLMs, especially in small-parameter models that often lack the extensive training data needed to capture global cultural nuances. We present two key contributions: (1) A cultural harm test dataset, created to assess model outputs across different cultural contexts through scenarios that expose potential cultural insensitivities, and (2) A culturally aligned preference dataset, aimed at restoring cultural sensitivity through fine-tuning based on feedback from diverse annotators. These datasets facilitate the evaluation and enhancement of LLMs, ensuring their ethical and safe deployment across different cultural landscapes. Our results show that integrating culturally aligned feedback leads to a marked improvement in model behavior, significantly reducing the likelihood of generating culturally insensitive or harmful content. Ultimately, this work paves the way for more inclusive and respectful AI systems, fostering a future where LLMs can safely and ethically navigate the complexities of diverse cultural landscapes.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2410.1288

Country:

Asia > Japan (1.00)
Asia > China (0.93)
Europe > Spain (0.68)
North America > United States > California (0.27)

Genre:

Research Report > New Finding (1.00)
Personal > Interview (0.93)

Industry:

Media > News (1.00)
Law > Criminal Law (1.00)
Law > Civil Rights & Constitutional Law (1.00)
(8 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Socio-Culturally Aware Evaluation Framework for LLM-Based Content Moderation

Kumar, Shanu, Kholkar, Gauri, Mendke, Saish, Sadana, Anubhav, Agrawal, Parag, Dandapat, Sandipan

arXiv.org Artificial IntelligenceDec-18-2024

With the growth of social media and large language models, content moderation has become crucial. Many existing datasets lack adequate representation of different groups, resulting in unreliable assessments. To tackle this, we propose a socio-culturally aware evaluation framework for LLM-driven content moderation and introduce a scalable method for creating diverse datasets using persona-based generation. Our analysis reveals that these datasets provide broader perspectives and pose greater challenges for LLMs than diversity-focused generation methods without personas. This challenge is especially pronounced in smaller LLMs, emphasizing the difficulties they encounter in moderating such diverse content.

large language model, machine learning, persona, (20 more...)

arXiv.org Artificial Intelligence

2412.13578

Country:

North America > United States (1.00)
Europe (1.00)
Africa (0.68)
Asia > Middle East > Iraq (0.28)

Genre: Research Report (0.63)

Industry:

Media > News (1.00)
Leisure & Entertainment (1.00)
Law > Civil Rights & Constitutional Law (1.00)
(11 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Enhancing Zero-shot Chain of Thought Prompting via Uncertainty-Guided Strategy Selection

Kumar, Shanu, Mendke, Saish, Rahman, Karody Lubna Abdul, Kurasa, Santosh, Agrawal, Parag, Dandapat, Sandipan

arXiv.org Artificial IntelligenceDec-6-2024

Chain-of-thought (CoT) prompting has significantly enhanced the capability of large language models (LLMs) by structuring their reasoning processes. However, existing methods face critical limitations: handcrafted demonstrations require extensive human expertise, while trigger phrases are prone to inaccuracies. In this paper, we propose the Zero-shot Uncertainty-based Selection (ZEUS) method, a novel approach that improves CoT prompting by utilizing uncertainty estimates to select effective demonstrations without needing access to model parameters. Unlike traditional methods, ZEUS offers high sensitivity in distinguishing between helpful and ineffective questions, ensuring more precise and reliable selection. Our extensive evaluation shows that ZEUS consistently outperforms existing CoT strategies across four challenging reasoning benchmarks, demonstrating its robustness and scalability.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2412.00353

Country:

Asia (1.00)
North America > Mexico (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.73)

Add feedback

SCULPT: Systematic Tuning of Long Prompts

Kumar, Shanu, Venkata, Akhila Yesantarao, Khandelwal, Shubhanshu, Santra, Bishal, Agrawal, Parag, Gupta, Manish

arXiv.org Artificial IntelligenceOct-28-2024

As large language models become increasingly central to solving complex tasks, the challenge of optimizing long, unstructured prompts has become critical. Existing optimization techniques often struggle to effectively handle such prompts, leading to suboptimal performance. We introduce SCULPT (Systematic Tuning of Long Prompts), a novel framework that systematically refines long prompts by structuring them hierarchically and applying an iterative actor-critic mechanism. To enhance robustness and generalizability, SCULPT utilizes two complementary feedback mechanisms: Preliminary Assessment, which assesses the prompt's structure before execution, and Error Assessment, which diagnoses and addresses errors post-execution. By aggregating feedback from these mechanisms, SCULPT avoids overfitting and ensures consistent improvements in performance. Our experimental results demonstrate significant accuracy gains and enhanced robustness, particularly in handling erroneous and misaligned prompts. SCULPT consistently outperforms existing approaches, establishing itself as a scalable solution for optimizing long prompts across diverse and real-world tasks.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2410.20788

Country:

Asia (0.68)
North America > United States (0.67)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

SafeInfer: Context Adaptive Decoding Time Safety Alignment for Large Language Models

Banerjee, Somnath, Tripathy, Soham, Layek, Sayan, Kumar, Shanu, Mukherjee, Animesh, Hazra, Rima

arXiv.org Artificial IntelligenceJun-18-2024

Safety-aligned language models often exhibit fragile and imbalanced safety mechanisms, increasing the likelihood of generating unsafe content. In addition, incorporating new knowledge through editing techniques to language models can further compromise safety. To address these issues, we propose SafeInfer, a context-adaptive, decoding-time safety alignment strategy for generating safe responses to user queries. SafeInfer comprises two phases: the safety amplification phase, which employs safe demonstration examples to adjust the model's hidden states and increase the likelihood of safer outputs, and the safety-guided decoding phase, which influences token selection based on safety-optimized distributions, ensuring the generated content complies with ethical guidelines. Further, we present HarmEval, a novel benchmark for extensive safety evaluations, designed to address potential misuse scenarios in accordance with the policies of leading AI tech giants.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2406.12274

Country:

North America > Canada (0.14)
Asia > India (0.14)
North America > United States (0.14)

Genre: Research Report (1.00)

Industry:

Law (1.00)
Health & Medicine (1.00)
Government (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)

Add feedback

DiTTO: A Feature Representation Imitation Approach for Improving Cross-Lingual Transfer

Kumar, Shanu, Soujanya, Abbaraju, Dandapat, Sandipan, Sitaram, Sunayana, Choudhury, Monojit

arXiv.org Artificial IntelligenceMar-4-2023

Zero-shot cross-lingual transfer is promising, however has been shown to be sub-optimal, with inferior transfer performance across low-resource languages. In this work, we envision languages as domains for improving zero-shot transfer by jointly reducing the feature incongruity between the source and the target language and increasing the generalization capabilities of pre-trained multilingual transformers. We show that our approach, DiTTO, significantly outperforms the standard zero-shot fine-tuning method on multiple datasets across all languages using solely unlabeled instances in the target language. Empirical results show that jointly reducing feature incongruity for multiple target languages is vital for successful cross-lingual transfer. Moreover, our model enables better cross-lingual transfer than standard fine-tuning methods, even in the few-shot setting.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2303.02357

Country:

Asia (0.93)
North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.77)

Add feedback

Attending to Discriminative Certainty for Domain Adaptation

Kurmi, Vinod Kumar, Kumar, Shanu, Namboodiri, Vinay P

arXiv.org Machine LearningJun-8-2019

In this paper, we aim to solve for unsupervised domain adaptation of classifiers where we have access to label information for the source domain while these are not available for a target domain. While various methods have been proposed for solving these including adversarial discriminator based methods, most approaches have focused on the entire image based domain adaptation. In an image, there would be regions that can be adapted better, for instance, the foreground object may be similar in nature. To obtain such regions, we propose methods that consider the probabilistic certainty estimate of various regions and specify focus on these during classification for adaptation. We observe that just by incorporating the probabilistic certainty of the discriminator while training the classifier, we are able to obtain state of the art results on various datasets as compared against all the recent methods. We provide a thorough empirical analysis of the method by providing ablation analysis, statistical significance test, and visualization of the attention maps and t-SNE embeddings. These evaluations convincingly demonstrate the effectiveness of the proposed approach.

artificial intelligence, discriminator, neural network, (19 more...)

arXiv.org Machine Learning

1906.03502

Country: Asia > India (0.14)

Genre: Research Report > Experimental Study (0.55)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Vision (0.96)
Information Technology > Artificial Intelligence > Natural Language (0.93)
(3 more...)

Add feedback

Adversarial Adaptation of Scene Graph Models for Understanding Civic Issues

Kumar, Shanu, Atreja, Shubham, Singh, Anjali, Jain, Mohit

arXiv.org Artificial IntelligenceJan-29-2019

Citizen engagement and technology usage are two emerging trends driven by smart city initiatives. Governments around the world are adopting technology for faster resolution of civic issues. Typically, citizens report issues, such as broken roads, garbage dumps, etc. through web portals and mobile apps, in order for the government authorities to take appropriate actions. Several mediums -- text, image, audio, video -- are used to report these issues. Through a user study with 13 citizens and 3 authorities, we found that image is the most preferred medium to report civic issues. However, analyzing civic issue related images is challenging for the authorities as it requires manual effort. Moreover, previous works have been limited to identifying a specific set of issues from images. In this work, given an image, we propose to generate a Civic Issue Graph consisting of a set of objects and the semantic relations between them, which are representative of the underlying civic issue. We also release two multi-modal (text and images) datasets, that can help in further analysis of civic issues from images. We present a novel approach for adversarial training of existing scene graph models that enables the use of scene graphs for new applications in the absence of any labelled training data. We conduct several experiments to analyze the efficacy of our approach, and using human evaluation, we establish the appropriateness of our model at representing different civic issues.

civic issue, deep learning, neural network, (20 more...)

arXiv.org Artificial Intelligence

1901.10124

Country:

North America > United States (0.70)
Asia (0.46)

Genre:

Questionnaire & Opinion Survey (1.00)
Research Report > Experimental Study (0.67)
Research Report > New Finding (0.46)

Industry: Government (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Add feedback