AITopics | Kane, Aditya

Collaborating Authors

Kane, Aditya

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

SenTest: Evaluating Robustness of Sentence Encoders

Chavan, Tanmay, Patankar, Shantanu, Kane, Aditya, Gokhale, Omkar, Kale, Geetanjali, Joshi, Raviraj

arXiv.org Artificial IntelligenceNov-29-2023

Contrastive learning has proven to be an effective method for pre-training models using weakly labeled data in the vision domain. Sentence transformers are the NLP counterparts to this architecture, and have been growing in popularity due to their rich and effective sentence representations. Having effective sentence representations is paramount in multiple tasks, such as information retrieval, retrieval augmented generation (RAG), and sentence comparison. Keeping in mind the deployability factor of transformers, evaluating the robustness of sentence transformers is of utmost importance. This work focuses on evaluating the robustness of the sentence encoders. We employ several adversarial attacks to evaluate its robustness. This system uses character-level attacks in the form of random character substitution, word-level attacks in the form of synonym replacement, and sentence-level attacks in the form of intra-sentence word order shuffling. The results of the experiments strongly undermine the robustness of sentence encoders. The models produce significantly different predictions as well as embeddings on perturbed datasets. The accuracy of the models can fall up to 15 percent on perturbed datasets as compared to unperturbed datasets. Furthermore, the experiments demonstrate that these embeddings does capture the semantic and syntactic structure (sentence order) of sentences. However, existing supervised classification strategies fail to leverage this information, and merely function as n-gram detectors.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2311.17722

Country:

Asia (0.48)
Europe (0.46)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (0.37)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)

Add feedback

Task Arithmetic with LoRA for Continual Learning

Chitale, Rajas, Vaidya, Ankit, Kane, Aditya, Ghotkar, Archana

arXiv.org Artificial IntelligenceNov-4-2023

Continual learning refers to the problem where the training data is available in sequential chunks, termed "tasks". The majority of progress in continual learning has been stunted by the problem of catastrophic forgetting, which is caused by sequential training of the model on streams of data. Moreover, it becomes computationally expensive to sequentially train large models multiple times. To mitigate both of these problems at once, we propose a novel method to continually train transformer-based vision models using low-rank adaptation and task arithmetic. Our method completely bypasses the problem of catastrophic forgetting, as well as reducing the computational requirement for training models on each task. When aided with a small memory of 10 samples per class, our method achieves performance close to full-set finetuning. We present rigorous ablations to support the prowess of our method.

artificial intelligence, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2311.02428

Country:

North America > United States (0.14)
Europe > United Kingdom (0.14)

Genre: Research Report > Promising Solution (0.88)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback

My Boli: Code-mixed Marathi-English Corpora, Pretrained Language Models and Evaluation Benchmarks

Chavan, Tanmay, Gokhale, Omkar, Kane, Aditya, Patankar, Shantanu, Joshi, Raviraj

arXiv.org Artificial IntelligenceJul-20-2023

The research on code-mixed data is limited due to the unavailability of dedicated code-mixed datasets and pre-trained language models. In this work, we focus on the low-resource Indian language Marathi which lacks any prior work in code-mixing. We present L3Cube-MeCorpus, a large code-mixed Marathi-English (Mr-En) corpus with 10 million social media sentences for pretraining. We also release L3Cube-MeBERT and MeRoBERTa, code-mixed BERT-based transformer models pre-trained on MeCorpus. Furthermore, for benchmarking, we present three supervised datasets MeHate, MeSent, and MeLID for downstream tasks like code-mixed Mr-En hate speech detection, sentiment analysis, and language identification respectively. These evaluation datasets individually consist of manually annotated \url{~}12,000 Marathi-English code-mixed tweets. Ablations show that the models trained on this novel corpus significantly outperform the existing state-of-the-art BERT models. This is the first work that presents artifacts for code-mixed Marathi research. All datasets and models are publicly released at https://github.com/l3cube-pune/MarathiNLP .

artificial intelligence, natural language, social media, (13 more...)

arXiv.org Artificial Intelligence

2306.1403

Country:

Asia > India (0.14)
Europe > Ukraine (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.35)

Add feedback

Two-stage Pipeline for Multilingual Dialect Detection

Vaidya, Ankit, Kane, Aditya

arXiv.org Artificial IntelligenceMar-28-2023

Dialect Identification is a crucial task for localizing various Large Language Models. This paper outlines our approach to the VarDial 2023 shared task. Here we have to identify three or two dialects from three languages each which results in a 9-way classification for Track-1 and 6-way classification for Track-2 respectively. Our proposed approach consists of a two-stage system and outperforms other participants' systems and previous works in this domain. We achieve a score of 58.54% for Track-1 and 85.61% for Track-2. Our codebase is available publicly (https://github.com/ankit-vaidya19/EACL_VarDial2023).

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2303.03487

Country:

Europe (0.94)
North America > United States > Minnesota (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.50)

Add feedback

A Twitter BERT Approach for Offensive Language Detection in Marathi

Chavan, Tanmay, Patankar, Shantanu, Kane, Aditya, Gokhale, Omkar, Joshi, Raviraj

arXiv.org Artificial IntelligenceDec-20-2022

Automated offensive language detection is essential in combating the spread of hate speech, particularly in social media. This paper describes our work on Offensive Language Identification in low resource Indic language Marathi. The problem is formulated as a text classification task to identify a tweet as offensive or non-offensive. We evaluate different mono-lingual and multi-lingual BERT models on this classification task, focusing on BERT models pre-trained with social media datasets. We compare the performance of MuRIL, MahaTweetBERT, MahaTweetBERT-Hateful, and MahaBERT on the HASOC 2022 test set. We also explore external data augmentation from other existing Marathi hate speech corpus HASOC 2021 and L3Cube-MahaHate. The MahaTweetBERT, a BERT model, pre-trained on Marathi tweets when fine-tuned on the combined dataset (HASOC 2021 + HASOC 2022 + MahaHate), outperforms all models with an F1 score of 98.43 on the HASOC 2022 test set. With this, we also provide a new state-of-the-art result on HASOC 2022 / MOLD v2 test set.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2212.10039

Country:

Europe (0.46)
Asia > India (0.28)

Genre: Research Report (0.64)

Industry: Information Technology (0.69)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

Spread Love Not Hate: Undermining the Importance of Hateful Pre-training for Hate Speech Detection

Gokhale, Omkar, Kane, Aditya, Patankar, Shantanu, Chavan, Tanmay, Joshi, Raviraj

arXiv.org Artificial IntelligenceDec-11-2022

Pre-training large neural language models, such as BERT, has led to impressive gains on many natural language processing (NLP) tasks. Although this method has proven to be effective for many domains, it might not always provide desirable benefits. In this paper, we study the effects of hateful pre-training on low-resource hate speech classification tasks. While previous studies on the English language have emphasized its importance, we aim to augment their observations with some non-obvious insights. We evaluate different variations of tweet-based BERT models pre-trained on hateful, non-hateful, and mixed subsets of a 40M tweet dataset. This evaluation is carried out for the Indian languages Hindi and Marathi. This paper is empirical evidence that hateful pre-training is not the best pre-training option for hate speech detection. We show that pre-training on non-hateful text from the target domain provides similar or better results. Further, we introduce HindTweetBERT and MahaTweetBERT, the first publicly available BERT models pre-trained on Hindi and Marathi tweets, respectively. We show that they provide state-of-the-art performance on hate speech classification tasks. We also release hateful BERT for the two languages and a gold hate speech evaluation benchmark HateEval-Hi and HateEval-Mr consisting of manually labeled 2000 tweets each. The models and data are available at https://github.com/l3cube-pune/MarathiNLP .

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2210.04267

Genre: Research Report (1.00)

Industry:

Law (0.68)
Health & Medicine (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Temporal Word Meaning Disambiguation using TimeLMs

Godbole, Mihir, Dandavate, Parth, Kane, Aditya

arXiv.org Artificial IntelligenceNov-17-2022

Meaning of words constantly changes given the events in modern civilization. Large Language Models use word embeddings, which are often static and thus cannot cope with this semantic change. Thus,it is important to resolve ambiguity in word meanings. This paper is an effort in this direction, where we explore methods for word sense disambiguation for the EvoNLP shared task. We conduct rigorous ablations for two solutions to this problem. We see that an approach using time-aware language models helps this task. Furthermore, we explore possible future directions to this problem.

artificial intelligence, computational linguistic, natural language, (17 more...)

arXiv.org Artificial Intelligence

2210.08207

Country: North America > United States > Minnesota (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.71)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.49)

Add feedback

Large Language Models for Multi-label Propaganda Detection

Chavan, Tanmay, Kane, Aditya

arXiv.org Artificial IntelligenceOct-20-2022

The spread of propaganda through the internet has increased drastically over the past years. Lately, propaganda detection has started gaining importance because of the negative impact it has on society. In this work, we describe our approach for the WANLP 2022 shared task which handles the task of propaganda detection in a multi-label setting. The task demands the model to label the given text as having one or more types of propaganda techniques. There are a total of 21 propaganda techniques to be detected. We show that an ensemble of five models performs the best on the task, scoring a micro-F1 score of 59.73%. We also conduct comprehensive ablations and propose various future directions for this work.

artificial intelligence, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2210.08209

Country:

Europe (0.69)
Asia (0.47)

Genre: Research Report (0.50)

Industry:

Government (1.00)
Media > News (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.51)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback