
Collaborating Authors

 Chakraborty, Manojit


Exploring Large Language Models for Code Explanation

arXiv.org Artificial Intelligence

Automating code documentation through explanatory text can prove highly beneficial in code understanding. Large Language Models (LLMs) have made remarkable strides in Natural Language Processing, especially within software engineering tasks such as code generation and code summarization. This study specifically delves into the task of generating natural-language summaries for code snippets, using various LLMs. The findings indicate that Code LLMs outperform their generic counterparts, and zero-shot methods yield superior results when dealing with datasets with dissimilar distributions between training and testing sets.


Vec2GC -- A Graph Based Clustering Method for Text Representations

arXiv.org Artificial Intelligence

NLP pipelines with limited or no labeled data rely on unsupervised methods for document processing. Unsupervised approaches typically depend on clustering of terms or documents. In this paper, we introduce a novel clustering algorithm, Vec2GC (Vector to Graph Communities), an end-to-end pipeline to cluster terms or documents for any given text corpus. Our method uses community detection on a weighted graph of the terms or documents, created using text representation learning. The Vec2GC clustering algorithm is a density-based approach that also supports hierarchical clustering.
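The pipeline described above (embed items, build a weighted similarity graph, detect communities) can be sketched roughly as follows. This is only an illustrative sketch, not the Vec2GC implementation: the abstract does not say which community detection algorithm or similarity threshold is used, so a simple weighted label propagation and a cosine threshold of 0.8 are swapped in as assumptions, and the two-dimensional vectors are toy data. The hierarchical step is not shown.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_graph(vectors, threshold):
    """Weighted graph: an edge joins two items whose similarity >= threshold."""
    graph = defaultdict(dict)
    names = list(vectors)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            sim = cosine(vectors[a], vectors[b])
            if sim >= threshold:
                graph[a][b] = sim
                graph[b][a] = sim
    return graph


def label_propagation(graph, nodes, max_rounds=20):
    """Stand-in community detection (assumption, not the paper's method).

    Each node repeatedly adopts the label with the largest total edge
    weight among its neighbours; ties go to the smallest label.
    """
    labels = {n: n for n in nodes}
    for _ in range(max_rounds):
        changed = False
        for n in sorted(nodes):
            if not graph.get(n):
                continue  # isolated node keeps its own label
            weight = defaultdict(float)
            for nbr, w in graph[n].items():
                weight[labels[nbr]] += w
            best = min(weight, key=lambda lab: (-weight[lab], lab))
            if best != labels[n]:
                labels[n] = best
                changed = True
        if not changed:
            break
    clusters = defaultdict(list)
    for n, lab in labels.items():
        clusters[lab].append(n)
    return sorted(sorted(c) for c in clusters.values())


def cluster(vectors, threshold=0.8):
    """Embeddings -> similarity graph -> communities."""
    return label_propagation(build_graph(vectors, threshold), list(vectors))


# Toy embeddings (hypothetical); real use would take learned text vectors.
vectors = {
    "cat": [1.0, 0.1],
    "dog": [0.9, 0.2],
    "car": [0.1, 1.0],
    "truck": [0.2, 0.9],
}
print(cluster(vectors))
```

On the toy data, the two animal terms and the two vehicle terms end up in separate clusters. Raising the threshold yields a sparser graph and smaller, denser communities.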


Detection of Fake Users in SMPs Using NLP and Graph Embeddings

arXiv.org Artificial Intelligence

Social Media Platforms (SMPs) like Facebook, Twitter, Instagram, etc. have a large user base all around the world that generates a huge amount of data every second. This includes a lot of posts by fake and spam users, typically used by many organisations around the globe to have a competitive edge over others. In this work, we aim at detecting such user accounts in Twitter using a novel approach.

Daouadi et al. [5] used deep learning methods on features based on the amount of interaction to and from each Twitter account, along with another set of features used previously for fake user detection. Abu-El-Rub and Mueen [1] used trending hashtags to detect bots interested in political trends. Graph-based techniques are used to cluster the collected bots, and those are fed to supervised learning to detect user's agreement/disagreement to ...