Kumar, Rahul
ORI: O Routing Intelligence
Shadid, Ahmad, Kumar, Rahul, Mayank, Mohit
A single large language model (LLM) often falls short when faced with the ever-growing range of tasks it may be asked to perform. We address this challenge by proposing ORI (O Routing Intelligence), a dynamic framework that leverages a set of LLMs. By intelligently routing each incoming query to the most suitable model, ORI improves task-specific accuracy while maintaining efficiency. Comprehensive evaluations across diverse benchmarks demonstrate consistent accuracy gains with controlled computational overhead: ORI outperforms the strongest individual models by up to 2.7 points on MMLU and 1.8 points on MuSR, and matches the top performance on both ARC and BBH. These results underscore the benefits of a multi-model strategy and show how ORI's adaptive architecture handles diverse tasks more effectively, offering a scalable, high-performance solution for multi-model LLM systems.
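The abstract does not detail ORI's internal routing mechanism, so the following is only a minimal sketch of the general idea: a lightweight classifier assigns each query a category, and a per-model suitability table picks the model that answers it. All model names, scores, and the keyword heuristic here are hypothetical.

```python
from typing import Callable, Dict

# Hypothetical per-model suitability scores by coarse query type; a real
# router would learn these from benchmark performance or embeddings.
SUITABILITY = {
    "reasoning-model": {"math": 0.9, "general": 0.4},
    "general-model":   {"math": 0.3, "general": 0.8},
}

def classify(query: str) -> str:
    # Placeholder classifier: a crude keyword heuristic.
    keywords = ("solve", "prove", "sum", "integral")
    return "math" if any(k in query.lower() for k in keywords) else "general"

def route(query: str, models: Dict[str, Callable[[str], str]]) -> str:
    # Send the query to the model with the highest suitability score.
    kind = classify(query)
    best = max(models, key=lambda name: SUITABILITY[name][kind])
    return models[best](query)

if __name__ == "__main__":
    models = {
        "reasoning-model": lambda q: f"[reasoning-model] {q}",
        "general-model":   lambda q: f"[general-model] {q}",
    }
    print(route("Solve for x: 2x + 3 = 7", models))  # -> reasoning-model
```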
Preset-Voice Matching for Privacy Regulated Speech-to-Speech Translation Systems
Platnick, Daniel, Abdelnour, Bishoy, Earl, Eamon, Kumar, Rahul, Rezaei, Zahra, Tsangaris, Thomas, Lagum, Faraj
In recent years, demand for speech-to-speech translation (S2ST) systems in industry settings has increased. Although successfully commercialized, cloning-based S2ST systems expose their distributors to liability when misused by individuals and can infringe on personality rights when exploited by media organizations. This work proposes a regulated S2ST framework called Preset-Voice Matching (PVM). PVM removes cross-lingual voice cloning from S2ST by first matching the input voice to a similar, previously consenting speaker voice in the target language. Because the input speaker is never cloned, PVM systems comply with regulations and reduce the risk of misuse. Our results demonstrate that PVM can significantly improve S2ST run-time in multi-speaker settings as well as the naturalness of the synthesized speech. To our knowledge, PVM is the first explicitly regulated S2ST framework to leverage similarity-matched preset voices for dynamic S2ST tasks.
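The matching step can be pictured as a nearest-neighbor search over speaker embeddings. The sketch below assumes cosine similarity over fixed-size embeddings, a common choice for speaker similarity; the paper's actual similarity measure and embedding model are not specified here, and all names are illustrative.

```python
from typing import Dict
import numpy as np

def match_preset_voice(input_emb: np.ndarray,
                       presets: Dict[str, np.ndarray]) -> str:
    """Return the consenting preset speaker whose embedding is most
    similar (by cosine) to the input speaker's embedding."""
    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(presets, key=lambda name: cosine(input_emb, presets[name]))

# Toy usage: random vectors stand in for real speaker embeddings.
rng = np.random.default_rng(0)
presets = {f"consenting_speaker_{i}": rng.normal(size=192) for i in range(3)}
print(match_preset_voice(rng.normal(size=192), presets))
```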
Pretraining Data and Tokenizer for Indic LLM
Kumar, Rahul, Kakde, Shubham, Rajput, Divyansh, Ibrahim, Daud, Nahata, Rishabh, Sowjanya, Pidathala, Kumar, Deepak
We present a novel approach to data preparation for developing a multilingual Indic large language model. Our meticulous data acquisition spans open-source and proprietary sources, including Common Crawl, Indic books, news articles, and Wikipedia, ensuring a diverse and rich linguistic representation. For each Indic language, we design a custom preprocessing pipeline that effectively eliminates redundant and low-quality text. We also deduplicate the Common Crawl data, addressing the redundancy present in 70% of the crawled web pages. This study focuses on curating high-quality data and optimizing tokenization of our multilingual dataset for Indic large language models with 3B and 7B parameters, engineered for superior performance in Indic languages. We introduce a novel multilingual tokenizer training strategy and demonstrate that our custom-trained Indic tokenizer outperforms the state-of-the-art OpenAI Tiktoken tokenizer, achieving a superior token-to-word ratio for Indic languages.
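A token-to-word ratio (sometimes called tokenizer fertility) can be computed as tokens produced per whitespace-delimited word, where a lower value indicates a more efficient tokenizer for a language. The helper below is a generic sketch of that metric, not the paper's exact evaluation code; the character-level tokenizer in the example is only a stand-in.

```python
from typing import Callable, Iterable, List

def token_to_word_ratio(texts: Iterable[str],
                        tokenize: Callable[[str], List[str]]) -> float:
    """Average number of tokens per whitespace-delimited word.
    Lower is better: the tokenizer splits words into fewer pieces."""
    n_tokens = n_words = 0
    for text in texts:
        n_tokens += len(tokenize(text))
        n_words += len(text.split())
    return n_tokens / max(n_words, 1)

# Example: a character-level "tokenizer" gives a deliberately poor ratio.
print(token_to_word_ratio(["नमस्ते दुनिया"], tokenize=list))
```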
BookSQL: A Large Scale Text-to-SQL Dataset for Accounting Domain
Kumar, Rahul, Dibbu, Amar Raja, Harsola, Shrutendra, Subrahmaniam, Vignesh, Modi, Ashutosh
Several large-scale datasets (e.g., WikiSQL, Spider) for developing natural language interfaces to databases have recently been proposed. These datasets cover a wide breadth of domains but fall short in some essential domains, such as finance and accounting. Given that accounting databases are used worldwide, particularly by non-technical people, there is an imminent need for models that can help extract information from accounting databases via natural language queries. In this resource paper, we aim to fill this gap by proposing a new large-scale Text-to-SQL dataset for the accounting and financial domain: BookSQL. The dataset consists of 100k natural language query-SQL pairs and accounting databases comprising 1 million records. We experiment with and analyze existing state-of-the-art models (including GPT-4) for the Text-to-SQL task on BookSQL. We find significant performance gaps, pointing towards the need for more focused models in this domain.
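For readers unfamiliar with the task, a Text-to-SQL pair maps a natural-language question to an executable query over the database schema. The record below is a hypothetical, BookSQL-flavored illustration; the table and column names are invented, not taken from the released dataset.

```python
import sqlite3

# Hypothetical accounting schema and question-SQL pair (illustrative only).
example = {
    "question": "What was the total amount invoiced to Acme Corp in March 2022?",
    "sql": (
        "SELECT SUM(amount) FROM transactions "
        "WHERE customer = 'Acme Corp' "
        "AND transaction_type = 'invoice' "
        "AND strftime('%Y-%m', txn_date) = '2022-03';"
    ),
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions "
             "(customer TEXT, transaction_type TEXT, txn_date TEXT, amount REAL)")
conn.execute("INSERT INTO transactions VALUES "
             "('Acme Corp', 'invoice', '2022-03-15', 1250.0)")
print(conn.execute(example["sql"]).fetchone())  # (1250.0,)
```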
meSch: Multi-Agent Energy-Aware Scheduling for Task Persistence
Naveed, Kaleb Ben, Dang, An, Kumar, Rahul, Panagou, Dimitra
This paper develops a scheduling protocol for a team of autonomous robots engaged in long-term persistent tasks. The proposed framework, called meSch, accounts for the robots' limited battery capacity and the presence of a single charging station, and makes the following contributions. First, it guarantees exclusive use of the charging station by one robot at a time; the approach is online, applies to general nonlinear robot models, does not require robots to be deployed at different times, and handles robots with different discharge rates. Second, we consider the scenario in which the charging station is mobile and its position is subject to uncertainty, and we ensure that the robots can still rendezvous with it. Finally, we evaluate the efficacy of meSch in simulation and in experimental case studies.
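As rough intuition for the scheduling problem, consider a greedy rule that grants the single charging station to whichever robot would deplete first. The sketch below is only that intuition in code, not meSch itself, which additionally provides formal exclusivity guarantees, supports nonlinear dynamics, and handles a mobile station with positional uncertainty.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Robot:
    name: str
    battery: float         # remaining charge (e.g., Wh)
    discharge_rate: float  # consumption while on task (e.g., Wh/min)

def next_to_charge(robots: List[Robot]) -> Robot:
    """Greedy stand-in: pick the robot with the least time-to-empty."""
    return min(robots, key=lambda r: r.battery / r.discharge_rate)

fleet = [Robot("r1", 40.0, 1.0),   # 40 min to empty
         Robot("r2", 30.0, 1.5),   # 20 min to empty
         Robot("r3", 50.0, 0.8)]   # 62.5 min to empty
print(next_to_charge(fleet).name)  # -> r2
```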
Generative AI-Based Text Generation Methods Using Pre-Trained GPT-2 Model
Pandey, Rohit, Waghela, Hetvi, Rakshit, Sneha, Rangari, Aparna, Singh, Anjali, Kumar, Rahul, Ghosal, Ratnadeep, Sen, Jaydip
A text generation model is a machine learning model that uses neural networks, especially the transformer architecture, to generate contextually relevant text based on linguistic patterns learned from extensive corpora. These models are trained on large amounts of textual data so that they can capture the complexities of a language, such as its grammar, vocabulary, phrasing, and style. Text generation models can increase human productivity in existing business processes: they already automate content creation across industries, generating reports, summaries, and emails, among other documents. They also enable a greater level of personalization in communications between businesses and their customers.
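One common way to run a pre-trained GPT-2 for generation is through the Hugging Face Transformers library, as sketched below; the prompt and sampling settings are arbitrary, and the paper's exact generation configuration may differ.

```python
# Requires: pip install transformers torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

inputs = tokenizer("The quarterly report shows", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=40,
    do_sample=True,                       # sample instead of greedy decoding
    top_p=0.95,                           # nucleus sampling
    temperature=0.8,
    pad_token_id=tokenizer.eos_token_id,  # silence the padding warning
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```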
Much Easier Said Than Done: Falsifying the Causal Relevance of Linear Decoding Methods
Hayne, Lucas, Suresh, Abhijit, Jain, Hunar, Kumar, Rahul, Carter, R. McKell
Linear classifier probes are frequently used to better understand how neural networks function. Researchers have approached the problem of determining unit importance in neural networks by probing their learned internal representations. Linear classifier probes identify highly selective units as the most important for network function. Whether a network actually relies on highly selective units can be tested by removing them from the network via ablation. Surprisingly, ablating highly selective units produces only small performance deficits, and even then only in some cases. Despite the absence of ablation effects for selective neurons, linear decoding methods remain effective for interpreting network function, which leaves the source of their effectiveness a mystery. To falsify the exclusive role of selectivity in network function and resolve this contradiction, we systematically ablate groups of units in subregions of activation space. We find a weak relationship between the neurons identified by probes and those identified by ablation. More specifically, we find that an interaction between a unit's selectivity and its average activity better predicts ablation performance deficits for groups of units in AlexNet, VGG16, MobileNetV2, and ResNet101. Linear decoders are likely somewhat effective because they overlap with the units that are causally important for network function. Interpretability methods could therefore be improved by focusing on causally important units.
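The probe-then-ablate comparison can be illustrated on synthetic data: fit a linear probe on unit activations, rank units by probe weight, zero them out, and measure the performance drop. The toy below uses probe accuracy as the performance measure, whereas the paper ablates units inside real networks (AlexNet, VGG16, etc.) and re-evaluates the task; all data here are synthetic.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Synthetic "activations": 500 inputs x 64 units; only the first
# 4 units actually carry the class signal.
acts = rng.normal(size=(500, 64))
labels = (acts[:, :4].sum(axis=1) > 0).astype(int)

# Linear probe: weight magnitudes rank units by apparent importance.
probe = LogisticRegression(max_iter=1000).fit(acts, labels)
top_units = np.argsort(np.abs(probe.coef_[0]))[-4:]

# Ablation: zero the top-ranked units and measure the accuracy drop.
ablated = acts.copy()
ablated[:, top_units] = 0.0
print("full accuracy:   ", probe.score(acts, labels))
print("ablated accuracy:", probe.score(ablated, labels))
```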
Many Hands Make Light Work: Using Essay Traits to Automatically Score Essays
Kumar, Rahul, Mathias, Sandeep, Saha, Sriparna, Bhattacharyya, Pushpak
Most research on automatic essay grading (AEG) is geared towards scoring an essay holistically, though some work has also addressed scoring individual essay traits. In this paper, we describe a way to score essays holistically using a multi-task learning (MTL) approach, in which scoring the essay holistically is the primary task and scoring the essay traits is the auxiliary task. We compare our results with a single-task learning (STL) approach, using both LSTMs and BiLSTMs, and compare our auxiliary-task results with trait-scoring results from other AEG systems. To find out which traits work best for different types of essays, we conduct ablation tests for each essay trait. We also report the runtime and number of training parameters for each system. We find that the MTL-based BiLSTM system gives the best results for scoring the essay holistically, while also performing well on scoring the essay traits.
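A minimal PyTorch sketch of the multi-task setup: a shared BiLSTM encoder feeding a primary head for the holistic score and an auxiliary head for trait scores. The layer sizes, vocabulary, and number of traits are illustrative, not the paper's actual hyperparameters; in training, the two losses would typically be combined as a weighted sum.

```python
import torch
import torch.nn as nn

class MTLEssayScorer(nn.Module):
    """Shared BiLSTM encoder with a holistic-score head (primary task)
    and a trait-score head (auxiliary task)."""
    def __init__(self, vocab_size=10000, emb_dim=100, hidden=128, n_traits=5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden,
                               batch_first=True, bidirectional=True)
        self.holistic_head = nn.Linear(2 * hidden, 1)      # primary
        self.trait_head = nn.Linear(2 * hidden, n_traits)  # auxiliary

    def forward(self, token_ids):
        emb = self.embed(token_ids)
        _, (h, _) = self.encoder(emb)
        feat = torch.cat([h[0], h[1]], dim=-1)  # final fwd + bwd states
        return self.holistic_head(feat).squeeze(-1), self.trait_head(feat)

model = MTLEssayScorer()
tokens = torch.randint(0, 10000, (2, 50))  # batch of 2 essays, 50 tokens
holistic, traits = model(tokens)
print(holistic.shape, traits.shape)  # torch.Size([2]) torch.Size([2, 5])
```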