AITopics | Kumar, Amit

Collaborating Authors

Kumar, Amit

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Machine Translation by Projecting Text into the Same Phonetic-Orthographic Space Using a Common Encoding

Kumar, Amit, Parida, Shantipriya, Pratap, Ajay, Singh, Anil Kumar

arXiv.org Artificial IntelligenceMay-21-2023

The use of subword embedding has proved to be a major innovation in Neural Machine Translation (NMT). It helps NMT to learn better context vectors for Low Resource Languages (LRLs) so as to predict the target words by better modelling the morphologies of the two languages and also the morphosyntax transfer. Even so, their performance for translation in Indian language to Indian language scenario is still not as good as for resource-rich languages. One reason for this is the relative morphological richness of Indian languages, while another is that most of them fall into the extremely low resource or zero-shot categories. Since most major Indian languages use Indic or Brahmi origin scripts, the text written in them is highly phonetic in nature and phonetically similar in terms of abstract letters and their arrangements. We use these characteristics of Indian languages and their scripts to propose an approach based on common multilingual Latin-based encodings (WX notation) that take advantage of language similarity while addressing the morphological complexity issue in NMT. These multilingual Latin-based encodings in NMT, together with Byte Pair Embedding (BPE) allow us to better exploit their phonetic and orthographic as well as lexical similarities to improve the translation quality by projecting different but similar languages on the same orthographic-phonetic character space. We verify the proposed approach by demonstrating experiments on similar language pairs (Gujarati-Hindi, Marathi-Hindi, Nepali-Hindi, Maithili-Hindi, Punjabi-Hindi, and Urdu-Hindi) under low resource conditions. The proposed approach shows an improvement in a majority of cases, in one case as much as ~10 BLEU points compared to baseline techniques for similar language pairs. We also get up to ~1 BLEU points improvement on distant and zero-shot language pairs.

artificial intelligence, machine translation, natural language, (16 more...)

arXiv.org Artificial Intelligence

2305.12371

Country:

Europe (1.00)
Asia > India (0.69)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Exploiting Multilingualism in Low-resource Neural Machine Translation via Adversarial Learning

Kumar, Amit, Pratap, Ajay, Singh, Anil Kumar

arXiv.org Artificial IntelligenceMar-31-2023

Generative Adversarial Networks (GAN) offer a promising approach for Neural Machine Translation (NMT). However, feeding multiple morphologically languages into a single model during training reduces the NMT's performance. In GAN, similar to bilingual models, multilingual NMT only considers one reference translation for each sentence during model training. This single reference translation limits the GAN model from learning sufficient information about the source sentence representation. Thus, in this article, we propose Denoising Adversarial Auto-encoder-based Sentence Interpolation (DAASI) approach to perform sentence interpolation by learning the intermediate latent representation of the source and target sentences of multilingual language pairs. Apart from latent representation, we also use the Wasserstein-GAN approach for the multilingual NMT model by incorporating the model generated sentences of multiple languages for reward computation. This computed reward optimizes the performance of the GAN-based multilingual model in an effective manner. We demonstrate the experiments on low-resource language pairs and find that our approach outperforms the existing state-of-the-art approaches for multilingual NMT with a performance gain of up to 4 BLEU points. Moreover, we use our trained model on zero-shot language pairs under an unsupervised scenario and show the robustness of the proposed approach.

machine learning, natural language, translation, (16 more...)

arXiv.org Artificial Intelligence

2303.18011

Country:

Europe (1.00)
Asia (1.00)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > Promising Solution (0.54)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

Exploiting Language Relatedness in Machine Translation Through Domain Adaptation Techniques

Kumar, Amit, Baruah, Rupjyoti, Pratap, Ajay, Swarnkar, Mayank, Singh, Anil Kumar

arXiv.org Artificial IntelligenceMar-3-2023

One of the significant challenges of Machine Translation (MT) is the scarcity of large amounts of data, mainly parallel sentence aligned corpora. If the evaluation is as rigorous as resource-rich languages, both Neural Machine Translation (NMT) and Statistical Machine Translation (SMT) can produce good results with such large amounts of data. However, it is challenging to improve the quality of MT output for low resource languages, especially in NMT and SMT. In order to tackle the challenges faced by MT, we present a novel approach of using a scaled similarity score of sentences, especially for related languages based on a 5-gram KenLM language model with Kneser-ney smoothing technique for filtering in-domain data from out-of-domain corpora that boost the translation quality of MT. Furthermore, we employ other domain adaptation techniques such as multi-domain, fine-tuning and iterative back-translation approach to compare our novel approach on the Hindi-Nepali language pair for NMT and SMT. Our approach succeeds in increasing ~2 BLEU point on multi-domain approach, ~3 BLEU point on fine-tuning for NMT and ~2 BLEU point on iterative back-translation approach.

artificial intelligence, computational linguistic, natural language, (16 more...)

arXiv.org Artificial Intelligence

2303.01793

Country:

Europe (1.00)
Asia (1.00)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre:

Research Report > Promising Solution (0.86)
Overview > Innovation (0.54)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback

Advaita: Bug Duplicity Detection System

Kumar, Amit, Madanu, Manohar, Prakash, Hari, Jonnavithula, Lalitha, Aravilli, Srinivasa Rao

arXiv.org Artificial IntelligenceJan-23-2020

Bugs are prevalent in software development. To improve software quality, bugs are filed using a bug tracking system. Properties of a reported bug would consist of a headline, description, project, product, component that is affected by the bug and the severity of the bug. Duplicate bugs rate (% of duplicate bugs) are in the range from single digit (1 to 9%) to double digits (40%) based on the product maturity , size of the code and number of engineers working on the project. Duplicate bugs range are between 9% to 39% in some of the open source projects like Eclipse, Firefox etc. Detection of duplicity deals with identifying whether any two bugs convey the same meaning. This detection of duplicates helps in de-duplication. Detecting duplicate bugs help reduce triaging efforts and saves time for developers in fixing the issues. Traditional natural language processing techniques are less accurate in identifying similarity between sentences. Using the bug data present in a bug tracking system, various approaches were explored including several machine learning algorithms, to obtain a viable approach that can identify duplicate bugs, given a pair of sentences(i.e. the respective bug descriptions). This approach considers multiple sets of features viz. basic text statistical features, semantic features and contextual features. These features are extracted from the headline, description and component and are subsequently used to train a classification algorithm.

artificial intelligence, bug report, text processing, (19 more...)

arXiv.org Artificial Intelligence

2001.10376

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.47)

Add feedback