Sharma, Kartik
A Thousand Words or An Image: Studying the Influence of Persona Modality in Multimodal LLMs
Broomfield, Julius, Sharma, Kartik, Kumar, Srijan
Large language models (LLMs) have recently demonstrated remarkable advancements in embodying diverse personas, enhancing their effectiveness as conversational agents and virtual assistants. Concurrently, LLMs have made significant strides in processing and integrating multimodal information. However, even though human personas can be expressed in both text and image, the extent to which a persona's modality affects its embodiment by the LLM remains largely unexplored. In this paper, we investigate how different modalities influence the expressiveness of personas in multimodal LLMs. To this end, we create a novel modality-parallel dataset of 40 diverse personas varying in age, gender, occupation, and location. The dataset represents each persona equivalently in four modalities: image-only, text-only, a combination of an image and short text, and typographical images, where text is visually stylized to convey persona-related attributes. We then create a systematic evaluation framework with 60 questions and corresponding metrics to assess how well LLMs embody each persona across its attributes and scenarios. Comprehensive experiments on 5 multimodal LLMs show that personas represented by detailed text elicit more persona-consistent linguistic habits, while typographical images often yield greater consistency with the persona. Our results reveal that LLMs often overlook persona-specific details conveyed through images, highlighting underlying limitations and paving the way for future research to bridge this gap. We release the data and code at https://github.com/claws-lab/persona-modality.
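To make the modality-parallel setup concrete, here is a minimal Python sketch of how the four representations of one persona could be assembled; the persona description, the placeholder images, and the query_mllm() client are hypothetical stand-ins, not the released dataset or code.

```python
# A hedged sketch: assembling the four modality-parallel persona
# conditions. persona_text, the stand-in images, and query_mllm()
# are all hypothetical; they illustrate the setup, not the dataset.
from PIL import Image, ImageDraw

persona_text = ("A 34-year-old nurse from Madrid who enjoys hiking "
                "and volunteers at an animal shelter on weekends.")
persona_image = Image.new("RGB", (512, 512), "gray")  # stand-in for a photo

def typographical_image(text: str) -> Image.Image:
    """Render the persona description as an image of stylized text."""
    img = Image.new("RGB", (512, 256), "white")
    ImageDraw.Draw(img).text((16, 16), text, fill="black")
    return img

# The four parallel representations of the same persona.
conditions = {
    "text_only":   {"text": persona_text, "image": None},
    "image_only":  {"text": None, "image": persona_image},
    "image+text":  {"text": persona_text[:60], "image": persona_image},
    "typographic": {"text": None, "image": typographical_image(persona_text)},
}

question = "Staying in character, describe your typical weekend."
for name, payload in conditions.items():
    # response = query_mllm(question, **payload)  # placeholder MLLM call
    print(name, "->", {k: v is not None for k, v in payload.items()})
```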
Personalized Layer Selection for Graph Neural Networks
Sharma, Kartik, Mohan, Vineeth Rakesh, Dou, Yingtong, Kumar, Srijan, Das, Mahashweta
Graph Neural Networks (GNNs) combine node attributes over a fixed granularity of the local graph structure around a node to predict its label. However, different nodes may relate to a node-level property through different granularities of their local neighborhood, and using the same level of smoothing for all nodes can be detrimental to their classification. In this work, we challenge the common assumption that a single GNN layer suffices to classify all nodes of a graph by training GNNs with a distinct personalized layer for each node. Inspired by metric learning, we propose a novel algorithm, MetSelect1, to select the optimal representation layer for classifying each node. In particular, we compute a prototype representation of each class at every transformed GNN layer and then classify each node using the layer at which its distance to a class prototype, normalized by that layer's variance, is smallest. Results on 10 datasets and 3 different GNNs show that we significantly improve the node classification accuracy of GNNs in a plug-and-play manner. We also find that using variable layers for prediction enables GNNs to be deeper and more robust to poisoning attacks. We hope this work can inspire future work on learning more adaptive and personalized graph representations.
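A minimal NumPy sketch of the prototype-based layer selection idea described above; the per-layer standard-deviation normalization shown here is an assumption standing in for the paper's learned transformation and training procedure.

```python
# A hedged sketch of prototype-based layer selection: classify each
# node at the layer where its variance-normalized distance to a class
# prototype is smallest. Not MetSelect1's exact algorithm.
import numpy as np

def prototype_layer_classify(layer_embs, train_embs, train_labels):
    """layer_embs[l]: (n, d_l) embeddings of all nodes at layer l;
    train_embs[l]: (m, d_l) embeddings of labeled nodes at layer l."""
    n = layer_embs[0].shape[0]
    n_classes = int(train_labels.max()) + 1
    best_dist = np.full(n, np.inf)
    best_cls = np.zeros(n, dtype=int)
    for H, H_tr in zip(layer_embs, train_embs):
        sigma = H_tr.std() + 1e-8  # per-layer scale (assumed normalizer)
        protos = np.stack([H_tr[train_labels == c].mean(axis=0)
                           for c in range(n_classes)])  # class prototypes
        dist = np.linalg.norm(H[:, None, :] - protos[None], axis=-1) / sigma
        cls, dmin = dist.argmin(axis=1), dist.min(axis=1)
        better = dmin < best_dist  # keep the closest layer seen so far
        best_dist[better], best_cls[better] = dmin[better], cls[better]
    return best_cls

rng = np.random.default_rng(0)
embs = [rng.normal(size=(6, 4)) for _ in range(3)]  # 3 layers, 6 nodes
labels = np.array([0, 0, 1, 1, 0, 1])
print(prototype_layer_classify(embs, embs, labels))
```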
OG-RAG: Ontology-Grounded Retrieval-Augmented Generation For Large Language Models
Sharma, Kartik, Kumar, Peeyush, Li, Yunqing
This paper presents OG-RAG, an Ontology-Grounded Retrieval-Augmented Generation method designed to enhance LLM-generated responses by anchoring retrieval processes in domain-specific ontologies. While LLMs are widely used for tasks like question answering and search, they struggle to adapt to specialized knowledge, such as industrial workflows or knowledge work, without expensive fine-tuning or sub-optimal retrieval methods. Existing retrieval-augmented generation (RAG) approaches offer improvements but fail to account for structured domain knowledge, leading to suboptimal context generation. Ontologies, which conceptually organize domain knowledge by defining entities and their interrelationships, offer a structured representation to address this gap. OG-RAG constructs a hypergraph representation of domain documents, where each hyperedge encapsulates a cluster of factual knowledge grounded in a domain-specific ontology. An optimization algorithm then retrieves the minimal set of hyperedges that together construct a precise, conceptually grounded context for the LLM. This method enables efficient retrieval while preserving the complex relationships between entities. OG-RAG applies to domains where fact-based reasoning is essential, particularly in tasks that require workflows or decision-making steps to follow predefined rules and procedures. These include industrial workflows in the healthcare, legal, and agricultural sectors, as well as knowledge-driven tasks such as news journalism, investigative research, consulting, and more. Our evaluations demonstrate that OG-RAG increases the recall of accurate facts by 55% and improves response correctness by 40% across four different LLMs. Additionally, OG-RAG enables 30% faster attribution of responses to context and boosts fact-based reasoning accuracy by 27% compared to baseline methods.
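The minimal-hyperedge retrieval step can be read as a set-cover-style optimization. Below is an illustrative greedy sketch under that reading; the hyperedge store, the concept-set matching, and the greedy rule are simplifying assumptions, not OG-RAG's actual algorithm.

```python
# A hedged sketch of ontology-grounded retrieval as greedy set cover:
# pick the fewest hyperedges (clusters of grounded facts) covering the
# query's ontology concepts. Illustrative, not OG-RAG's optimizer.
def retrieve_hyperedges(query_concepts: set, hyperedges: dict) -> list:
    """hyperedges: id -> set of ontology concepts that hyperedge covers."""
    uncovered, chosen = set(query_concepts), []
    while uncovered:
        # greedily take the hyperedge covering the most remaining concepts
        best = max(hyperedges, key=lambda h: len(hyperedges[h] & uncovered))
        gain = hyperedges[best] & uncovered
        if not gain:
            break  # nothing covers the remaining concepts
        chosen.append(best)
        uncovered -= gain
    return chosen

edges = {"e1": {"crop", "soil"}, "e2": {"soil", "irrigation"},
         "e3": {"crop", "harvest", "irrigation"}}
print(retrieve_hyperedges({"crop", "irrigation"}, edges))  # ['e3']
```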
Who Speaks Matters: Analysing the Influence of the Speaker's Ethnicity on Hate Classification
Malik, Ananya, Sharma, Kartik, Ng, Lynnette Hui Xian, Bhatt, Shaily
Large Language Models (LLMs) hold considerable promise for scalable content moderation, including hate speech detection. However, they are also known to be brittle and biased against marginalised communities and dialects. Their application to high-stakes tasks like hate speech detection therefore requires critical scrutiny. In this work, we investigate the robustness of hate speech classification using LLMs, particularly when explicit and implicit markers of the speaker's ethnicity are injected into the input. For the explicit markers, we inject a phrase that mentions the speaker's identity. For the implicit markers, we inject dialectal features. By analysing how frequently model outputs flip in the presence of these markers, we reveal varying degrees of brittleness across 4 popular LLMs and 5 ethnicities. We find that the presence of implicit dialect markers in inputs causes model outputs to flip more than the presence of explicit markers, and that the percentage of flips varies across ethnicities. Finally, we find that larger models are more robust. Our findings indicate the need for caution in deploying LLMs for high-stakes tasks like hate speech detection.
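A minimal sketch of the flip-rate analysis: classify each post with and without an injected identity marker and count label changes. The classify() function is a placeholder for any LLM-based hate classifier, and the toy classifier below is purely illustrative.

```python
# A hedged sketch of the flip-rate metric: how often does the label
# change when a speaker-identity marker is injected into the input?
def flip_rate(posts, marker, classify):
    flips = 0
    for post in posts:
        base = classify(post)
        marked = classify(f"{marker} {post}")  # explicit identity marker
        flips += int(base != marked)
    return flips / len(posts)

# Toy stand-in classifier: flags posts containing the word "hate".
toy = lambda t: "hateful" if "hate" in t.lower() else "benign"
print(flip_rate(["I hate rainy days", "Nice weather today"],
                "As a member of X community,", toy))
```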
Mysterious Projections: Multimodal LLMs Gain Domain-Specific Visual Capabilities Without Richer Cross-Modal Projections
Verma, Gaurav, Choi, Minje, Sharma, Kartik, Watson-Daniels, Jamelle, Oh, Sejoon, Kumar, Srijan
Multimodal large language models (MLLMs) like LLaVA and GPT-4(V) enable general-purpose, language-based conversations about images. As off-the-shelf MLLMs may have limited capabilities on images from domains like dermatology and agriculture, they must be fine-tuned to unlock domain-specific applications. The prevalent architecture of current open-source MLLMs comprises two major modules: an image-language (cross-modal) projection network and a large language model. Understanding the roles these two modules play in modeling domain-specific visual attributes can inform the design of future models and streamline interpretability efforts on current ones. To this end, via experiments on 4 datasets and under 2 fine-tuning settings, we find that as the MLLM is fine-tuned, it indeed gains domain-specific visual capabilities, but the updates do not lead the projection to extract relevant domain-specific visual attributes. Our results indicate that the domain-specific visual attributes are modeled by the LLM, even when only the projection is fine-tuned. Through this study, we offer a potential reinterpretation of the role of cross-modal projections in MLLM architectures. Project webpage: https://claws-lab.github.io/projection-in-MLLMs/
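A minimal PyTorch sketch of the two fine-tuning settings contrasted here: updating only the cross-modal projection versus only the LLM. The module names and sizes are generic stand-ins, not LLaVA's actual classes.

```python
# A hedged sketch of selective fine-tuning in an MLLM: freeze all
# parameters except one named submodule. ToyMLLM is a stand-in
# architecture, not a real MLLM implementation.
import torch.nn as nn

class ToyMLLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.vision_encoder = nn.Linear(768, 768)  # frozen in both settings
        self.projection = nn.Linear(768, 4096)     # image -> LLM token space
        self.llm = nn.Linear(4096, 4096)           # stand-in for the LLM

def freeze_all_but(model: nn.Module, trainable_prefix: str):
    for name, p in model.named_parameters():
        p.requires_grad = name.startswith(trainable_prefix)

model = ToyMLLM()
freeze_all_but(model, "projection")  # setting 1: projection-only fine-tuning
# freeze_all_but(model, "llm")       # setting 2: LLM-only fine-tuning
print([n for n, p in model.named_parameters() if p.requires_grad])
```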
A Survey on Explainability of Graph Neural Networks
Kakkad, Jaykumar, Jannu, Jaspal, Sharma, Kartik, Aggarwal, Charu, Medya, Sourav
Graph neural networks (GNNs) are powerful graph-based deep-learning models that have gained significant attention and demonstrated remarkable performance in various domains, including natural language processing, drug discovery, and recommendation systems. However, combining feature information with combinatorial graph structure has led to complex, non-linear GNN models. Consequently, this has increased the challenges of understanding the workings of GNNs and the underlying reasons behind their predictions. To address this, numerous explainability methods have been proposed to shed light on the inner mechanisms of GNNs. Explainability improves the security of GNNs and enhances trust in their predictions. This survey aims to provide a comprehensive overview of the existing explainability techniques for GNNs. We create a novel taxonomy and hierarchy to categorize these methods based on their objective and methodology. We also discuss the strengths, limitations, and application scenarios of each category. Furthermore, we highlight the key evaluation metrics and datasets commonly used to assess the explainability of GNNs. We hope this survey assists researchers and practitioners in understanding the existing landscape of explainability methods, identifying gaps, and fostering further advancements in interpretable graph-based machine learning.
NoisyTwins: Class-Consistent and Diverse Image Generation through StyleGANs
Rangwani, Harsh, Bansal, Lavish, Sharma, Kartik, Karmali, Tejan, Jampani, Varun, Babu, R. Venkatesh
StyleGANs are at the forefront of controllable image generation, as they produce a semantically disentangled latent space that is well suited to image editing and manipulation. However, the performance of StyleGANs severely degrades when trained via class-conditioning on large-scale long-tailed datasets. We find that one reason for this degradation is the collapse of latents for each class in the $\mathcal{W}$ latent space. NoisyTwins first introduces an effective and inexpensive augmentation strategy for class embeddings and then decorrelates the latents via self-supervision in the $\mathcal{W}$ space. This decorrelation mitigates collapse, ensuring that our method preserves intra-class diversity along with class-consistency in image generation. We show the effectiveness of our approach on the large-scale real-world long-tailed datasets ImageNet-LT and iNaturalist 2019, where our method outperforms other methods by $\sim 19\%$ on FID, establishing a new state-of-the-art.
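A minimal PyTorch sketch of the idea as described: perturb a class embedding with cheap noise to obtain "twin" latents, then apply a Barlow-Twins-style redundancy-reduction loss in the $\mathcal{W}$ space so per-class latents decorrelate instead of collapsing. The shapes, noise scale, and loss weighting below are illustrative assumptions, not the paper's exact configuration.

```python
# A hedged sketch of a NoisyTwins-style objective: twin W latents from
# two noise draws of the same class embedding, plus a Barlow-Twins-like
# cross-correlation loss that prevents latent collapse.
import torch

def noisy_twins_loss(w1, w2, lam=5e-3):
    """w1, w2: (batch, d) twin W-space latents."""
    z1 = (w1 - w1.mean(0)) / (w1.std(0) + 1e-8)
    z2 = (w2 - w2.mean(0)) / (w2.std(0) + 1e-8)
    c = (z1.T @ z2) / w1.shape[0]                    # cross-correlation matrix
    on_diag = ((torch.diagonal(c) - 1) ** 2).sum()   # pull twins together
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()
    return on_diag + lam * off_diag                  # decorrelate dimensions

# Twin latents: the same class embedding under two independent noise draws.
class_emb = torch.randn(32, 128)
mapping = torch.nn.Linear(128, 128)  # stand-in for StyleGAN's mapping network
w1 = mapping(class_emb + 0.05 * torch.randn_like(class_emb))
w2 = mapping(class_emb + 0.05 * torch.randn_like(class_emb))
print(noisy_twins_loss(w1, w2).item())
```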
Representation Learning in Continuous-Time Dynamic Signed Networks
Sharma, Kartik, Raghavendra, Mohit, Lee, Yeon Chang, M, Anand Kumar, Kumar, Srijan
Signed networks allow us to model conflicting relationships and interactions, such as friend/enemy and support/oppose. These signed interactions happen in real time. Modeling such dynamics of signed networks is crucial to understanding the evolution of polarization in the network and to effectively predicting the future signed structure (i.e., link signs and signed weights). However, existing works have modeled either (static) signed networks or dynamic (unsigned) networks, but not dynamic signed networks. Since sign and dynamics inform the graph structure in different ways, combining the two is non-trivial. In this work, we propose a new Graph Neural Network (GNN)-based approach to model dynamic signed networks, named SEMBA: Signed link's Evolution using Memory modules and Balanced Aggregation. The idea is to incorporate the signs of temporal interactions using separate modules guided by balance theory and to evolve the embeddings from a higher-order neighborhood. Experiments on 4 real-world datasets and 4 different tasks demonstrate that SEMBA consistently and significantly outperforms the baselines by up to $80\%$ on the task of predicting the signs of future links, while matching state-of-the-art performance on predicting the existence of these links. We find that this improvement is due specifically to the superior performance of SEMBA on the minority negative class.
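A minimal sketch of balance-theory-guided aggregation in the spirit of SEMBA: each node keeps separate positive and negative memories, and an incoming signed edge routes messages by the friend-of-friend/friend-of-enemy rules. The simple averaging updates below stand in for SEMBA's learned memory modules and higher-order embedding evolution.

```python
# A hedged sketch of balanced aggregation over a signed edge stream.
# Balance theory: a friend's friends stay friends and a friend's
# enemies stay enemies; an enemy's friends become enemies, and vice
# versa. Averages stand in for learned update functions.
import numpy as np

d = 8
pos_mem = {u: np.zeros(d) for u in range(4)}  # friendly context per node
neg_mem = {u: np.zeros(d) for u in range(4)}  # hostile context per node

def update(u, v, sign, msg):
    """Process a timestamped signed edge (u, v, sign) carrying msg."""
    if sign > 0:  # v is a friend: inherit v's friends and enemies as-is
        pos_mem[u] = 0.5 * (pos_mem[u] + pos_mem[v] + msg)
        neg_mem[u] = 0.5 * (neg_mem[u] + neg_mem[v])
    else:         # v is an enemy: v's friends become enemies, and vice versa
        neg_mem[u] = 0.5 * (neg_mem[u] + pos_mem[v] + msg)
        pos_mem[u] = 0.5 * (pos_mem[u] + neg_mem[v])

update(0, 1, +1, np.ones(d))
update(0, 2, -1, np.ones(d))
print(pos_mem[0][:3], neg_mem[0][:3])
```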
A Survey of Graph Neural Networks for Social Recommender Systems
Sharma, Kartik, Lee, Yeon-Chang, Nambi, Sivagami, Salian, Aditya, Shah, Shlok, Kim, Sang-Wook, Kumar, Srijan
Exploiting social relations in recommendation works well because of the effects of social homophily [61] and social influence [60]: (1) social homophily indicates that a user tends to connect to other users with similar attributes and preferences, and (2) social influence indicates that users with direct or indirect relations tend to influence each other and thereby become more similar. Accordingly, social recommender systems (SocialRS) can effectively mitigate the data sparsity problem by exploiting social neighbors to capture the preferences of a sparsely interacting user. The literature has shown that SocialRS can be applied successfully in various recommendation domains (e.g., product [101, 103], music [116-118], location [39, 72, 100], and image [86, 99, 102]), thereby improving user satisfaction. Furthermore, techniques and insights from SocialRS can also be exploited in real-world applications beyond recommendation. For instance, García-Sánchez et al. [20] leveraged SocialRS to design a decision-making system for marketing (e.g., advertisement), while Gasparetti et al. [21] analyzed SocialRS in terms of community detection. Motivated by such wide applicability, there has been increasing interest in developing accurate SocialRS models. In the early days, research focused on matrix factorization (MF) techniques [28, 54-57, 84, 112].
Task and Model Agnostic Adversarial Attack on Graph Neural Networks
Sharma, Kartik, Verma, Samidha, Medya, Sourav, Bhattacharya, Arnab, Ranu, Sayan
Adversarial attacks on Graph Neural Networks (GNNs) reveal their security vulnerabilities, limiting their adoption in safety-critical applications. However, existing attack strategies rely on knowledge of either the GNN model being used or the predictive task being attacked. Is this knowledge necessary? For example, a graph may be used for multiple downstream tasks unknown to a practical attacker. It is thus important to test the vulnerability of GNNs to adversarial perturbations in a model- and task-agnostic setting. In this work, we study this problem and show that GNNs remain vulnerable even when the downstream task and model are unknown. The proposed algorithm, TANDIS (Targeted Attack via Neighborhood DIStortion), shows that distortion of node neighborhoods is effective in drastically compromising prediction performance. Although neighborhood distortion is an NP-hard problem, TANDIS designs an effective heuristic through a novel combination of a Graph Isomorphism Network with deep Q-learning. Extensive experiments on real datasets and state-of-the-art models show that, on average, TANDIS is up to 50% more effective than state-of-the-art techniques, while being more than 1,000 times faster.
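A minimal sketch of the neighborhood-distortion objective: score a candidate edge flip by how far it moves the target's neighborhood embedding under a task-agnostic encoder. TANDIS trains a Graph Isomorphism Network and selects perturbations with deep Q-learning; the one-hop mean aggregation and exhaustive scoring below are crude stand-ins for both.

```python
# A hedged sketch of neighborhood distortion: no knowledge of the
# downstream task or victim model is used, only an embedding shift.
import numpy as np

def embed(adj, feats, node):
    """One-hop mean aggregation -- a crude stand-in for a trained GIN."""
    nbrs = np.flatnonzero(adj[node])
    return feats[node] if nbrs.size == 0 else feats[nbrs].mean(axis=0)

def best_perturbation(adj, feats, target, candidates):
    """Pick the single edge flip that maximally distorts the target's
    neighborhood embedding."""
    base = embed(adj, feats, target)
    def distortion(edge):
        u, v = edge
        pert = adj.copy()
        pert[u, v] = pert[v, u] = 1 - pert[u, v]  # flip the candidate edge
        return np.linalg.norm(embed(pert, feats, target) - base)
    return max(candidates, key=distortion)

adj = np.zeros((4, 4), dtype=int); adj[0, 1] = adj[1, 0] = 1
feats = np.random.rand(4, 5)
print(best_perturbation(adj, feats, 0, [(0, 2), (0, 3), (1, 2)]))
```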