Kumar, Anuj
CRAG -- Comprehensive RAG Benchmark
Yang, Xiao, Sun, Kai, Xin, Hao, Sun, Yushi, Bhalla, Nikita, Chen, Xiangsen, Choudhary, Sajal, Gui, Rongze Daniel, Jiang, Ziran Will, Jiang, Ziyu, Kong, Lingkun, Moran, Brian, Wang, Jiaqi, Xu, Yifan Ethan, Yan, An, Yang, Chenyu, Yuan, Eting, Zha, Hanwen, Tang, Nan, Chen, Lei, Scheffer, Nicolas, Liu, Yue, Shah, Nirav, Wanga, Rakesh, Kumar, Anuj, Yih, Wen-tau, Dong, Xin Luna
Retrieval-Augmented Generation (RAG) has recently emerged as a promising solution to alleviate Large Language Models' (LLMs') lack of knowledge. Existing RAG datasets, however, do not adequately represent the diverse and dynamic nature of real-world Question Answering (QA) tasks. To bridge this gap, we introduce the Comprehensive RAG Benchmark (CRAG), a factual question answering benchmark of 4,409 question-answer pairs and mock APIs to simulate web and Knowledge Graph (KG) search. CRAG is designed to encapsulate a diverse array of questions across five domains and eight question categories, reflecting varied entity popularity from popular to long-tail, and temporal dynamism ranging from years to seconds. Our evaluation on this benchmark highlights the gap to fully trustworthy QA. Whereas most advanced LLMs achieve <=34% accuracy on CRAG, adding RAG in a straightforward manner improves accuracy to only 44%. State-of-the-art industry RAG solutions answer only 63% of questions without any hallucination. CRAG also reveals much lower accuracy in answering questions about facts with higher dynamism, lower popularity, or higher complexity, suggesting future research directions. The CRAG benchmark has laid the groundwork for a KDD Cup 2024 challenge, attracting thousands of participants and submissions within the first 50 days of the competition. We commit to maintaining CRAG to serve research communities in advancing RAG solutions and general QA solutions.
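
The abstract contrasts plain accuracy with answering "without any hallucination". A minimal sketch of that distinction, with illustrative names and an exact-match check standing in for whatever grading CRAG actually uses, is shown below; it is not the CRAG evaluation harness.

```python
# Minimal sketch (not the CRAG harness) of a three-way scoring scheme that separates
# correct answers, abstentions ("I don't know"), and hallucinated answers.
# All names and the exact-match check are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class Example:
    question: str
    ground_truth: str
    prediction: str

def score(example: Example) -> int:
    """Return +1 for a correct answer, 0 for an abstention, -1 for a hallucination."""
    pred = example.prediction.strip().lower()
    if pred in {"i don't know", "i dont know", "unsure"}:
        return 0        # missing: the model declined to answer
    if pred == example.ground_truth.strip().lower():
        return 1        # correct (exact match used only for illustration)
    return -1           # wrong answer counts as a hallucination

examples = [
    Example("Who wrote Hamlet?", "William Shakespeare", "William Shakespeare"),
    Example("Capital of Australia?", "Canberra", "Sydney"),
    Example("CEO of Acme in 2031?", "unknown", "I don't know"),
]
print(sum(score(e) for e in examples) / len(examples))  # average over the set
```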
Lumos: Empowering Multimodal LLMs with Scene Text Recognition
Shenoy, Ashish, Lu, Yichao, Jayakumar, Srihari, Chatterjee, Debojeet, Moslehpour, Mohsen, Chuang, Pierce, Harpale, Abhay, Bhardwaj, Vikas, Xu, Di, Zhao, Shicong, Zhao, Longfang, Ramchandani, Ankit, Dong, Xin Luna, Kumar, Anuj
We introduce Lumos, the first end-to-end multimodal question-answering system with text understanding capabilities. At the core of Lumos is a Scene Text Recognition (STR) component that extracts text from first-person point-of-view images, the output of which is used to augment the input to a Multimodal Large Language Model (MM-LLM). While building Lumos, we encountered numerous challenges related to STR quality, overall latency, and model inference. In this paper, we delve into those challenges and discuss the system architecture, design choices, and modeling techniques employed to overcome these obstacles. We also provide a comprehensive evaluation for each component, showcasing high quality and efficiency.
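
A minimal sketch of the pipeline described above: scene text is extracted first and prepended to the question before the multimodal model is queried. The helper functions (`run_str`, `query_mm_llm`) are placeholders I am assuming for illustration, not Lumos APIs.

```python
# Sketch of an STR-augmented multimodal QA flow; the two helpers are stubs.

from typing import List

def run_str(image_path: str) -> List[str]:
    # Placeholder: a real system would run text detection + recognition on the image.
    return ["OPEN 9AM-5PM", "CAFE LUNA"]

def query_mm_llm(prompt: str, image_path: str) -> str:
    # Placeholder for the multimodal LLM call.
    return f"(model answer conditioned on image {image_path} and prompt)"

def answer(question: str, image_path: str) -> str:
    scene_text = run_str(image_path)
    # Augment the model input with the recognized text, as described in the abstract.
    prompt = f"Scene text: {' | '.join(scene_text)}\nQuestion: {question}"
    return query_mm_llm(prompt, image_path)

print(answer("When does this cafe open?", "photo.jpg"))
```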
A Posteriori Evaluation of a Physics-Constrained Neural Ordinary Differential Equations Approach Coupled with CFD Solver for Modeling Stiff Chemical Kinetics
Kumar, Tadbhagya, Kumar, Anuj, Pal, Pinaki
The high computational cost associated with solving for detailed chemistry poses a significant challenge for predictive computational fluid dynamics (CFD) simulations of turbulent reacting flows. These models often require solving a system of coupled stiff ordinary differential equations (ODEs). While deep learning techniques have been explored to develop faster surrogate models, they often fail to integrate reliably with CFD solvers. This instability arises because deep learning methods optimize for training error without ensuring compatibility with ODE solvers, leading to accumulation of errors over time. Recently, NeuralODE-based techniques have offered a promising solution by effectively modeling chemical kinetics. In this study, we extend the NeuralODE framework for stiff chemical kinetics by incorporating mass conservation constraints directly into the loss function during training. This ensures that the total mass and the elemental mass are conserved, a critical requirement for reliable downstream integration with CFD solvers. Proof-of-concept studies are performed with the physics-constrained NeuralODE (PC-NODE) approach for homogeneous autoignition of a hydrogen-air mixture over a range of composition and thermodynamic conditions. Our results demonstrate that this enhancement not only improves physical consistency with respect to the mass conservation criteria but also ensures better robustness. Lastly, a posteriori studies are performed wherein the trained PC-NODE model is coupled with a 3D CFD solver for computing the chemical source terms. PC-NODE is shown to be more accurate than the purely data-driven NeuralODE approach. Moreover, PC-NODE exhibits robustness and generalizability to unseen initial conditions from within (interpolative capability) as well as outside (extrapolative capability) the training regime.
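
A minimal sketch of the kind of physics-constrained loss the abstract describes: a data-fit term plus penalties that keep predicted mass fractions summing to one and elemental composition conserved. The shapes, penalty weights, and element matrix below are illustrative assumptions, not the paper's formulation.

```python
# Sketch of a mass-conservation-constrained training loss for a chemistry surrogate.

import numpy as np

def pc_loss(y_pred, y_true, elem_matrix, w_mass=1.0, w_elem=1.0):
    """y_pred, y_true: (batch, n_species) mass fractions; elem_matrix: (n_elements, n_species)."""
    data_term = np.mean((y_pred - y_true) ** 2)
    # Total-mass constraint: species mass fractions should sum to 1.
    mass_term = np.mean((y_pred.sum(axis=1) - 1.0) ** 2)
    # Elemental-mass constraint: elemental composition should match the target state.
    elem_pred = y_pred @ elem_matrix.T
    elem_true = y_true @ elem_matrix.T
    elem_term = np.mean((elem_pred - elem_true) ** 2)
    return data_term + w_mass * mass_term + w_elem * elem_term

rng = np.random.default_rng(0)
y_true = rng.dirichlet(np.ones(9), size=4)                 # 9 species, rows sum to 1
y_pred = y_true + 0.01 * rng.standard_normal(y_true.shape)
elem_matrix = rng.random((3, 9))                           # toy element-weight matrix
print(pc_loss(y_pred, y_true, elem_matrix))
```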
AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model
Moon, Seungwhan, Madotto, Andrea, Lin, Zhaojiang, Nagarajan, Tushar, Smith, Matt, Jain, Shashank, Yeh, Chun-Fu, Murugesan, Prakash, Heidari, Peyman, Liu, Yue, Srinet, Kavya, Damavandi, Babak, Kumar, Anuj
We present Any-Modality Augmented Language Model (AnyMAL), a unified model that reasons over diverse input modality signals (i.e., text, image, video, audio, and IMU motion sensor data) and generates textual responses. AnyMAL inherits the powerful text-based reasoning abilities of state-of-the-art LLMs, including LLaMA-2 (70B), and converts modality-specific signals to the joint textual space through a pre-trained aligner module. To further strengthen the multimodal LLM's capabilities, we fine-tune the model with a multimodal instruction set manually collected to cover diverse topics and tasks beyond simple QAs. We conduct comprehensive empirical analysis comprising both human and automatic evaluations, and demonstrate state-of-the-art performance on various multimodal tasks.
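
A minimal sketch of the aligner idea mentioned above: a learned projection maps a frozen modality encoder's output into the language model's token-embedding space so it can be concatenated with text embeddings. The dimensions and the single linear projection are assumptions made for illustration.

```python
# Sketch of projecting modality features into an LLM embedding space (toy sizes).

import numpy as np

rng = np.random.default_rng(0)
d_modality, d_llm, n_tokens = 256, 512, 8

image_features = rng.standard_normal((1, d_modality))                  # frozen encoder output
W_align = rng.standard_normal((d_modality, n_tokens * d_llm)) * 0.01   # trainable aligner weights

# Project into n_tokens "soft prompt" vectors living in the LLM embedding space.
modality_tokens = (image_features @ W_align).reshape(1, n_tokens, d_llm)

text_embeddings = rng.standard_normal((1, 8, d_llm))                   # embedded text prompt
llm_input = np.concatenate([modality_tokens, text_embeddings], axis=1)
print(llm_input.shape)  # (1, 16, 512): modality tokens followed by text tokens
```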
A Framework for Combustion Chemistry Acceleration with DeepONets
Kumar, Anuj, Echekki, Tarek
A combustion chemistry acceleration scheme is developed based on deep operator networks (DeepONets). The scheme identifies combustion reaction dynamics through a modified DeepONet architecture such that the solutions of thermochemical scalars are projected to new solutions in small and flexible time increments. The approach is designed to implement chemistry acceleration efficiently, without the need for computationally expensive integration of stiff chemistry. An additional framework for latent-space dynamics identification with the modified DeepONet is also proposed, which enhances computational efficiency and widens the applicability of the scheme. The scheme is demonstrated on chemical kinetics ranging from simple hydrogen oxidation to the more complex high- and low-temperature oxidation of n-dodecane. The proposed framework accurately learns the chemical kinetics and efficiently reproduces species and temperature temporal profiles for each application. In addition, a very large speed-up and strong extrapolation capability are observed with the proposed scheme.
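
A minimal sketch of the general DeepONet structure referenced above: a branch network encodes the current thermochemical state, a trunk network encodes the time increment, and their inner product yields the advanced state. The layer widths and the tiny random MLP are assumptions for illustration only, not the paper's modified architecture.

```python
# Toy DeepONet-style forward pass: branch(state) . trunk(dt) -> state after dt.

import numpy as np

rng = np.random.default_rng(0)

def mlp(x, widths):
    # Randomly initialized MLP, tanh on hidden layers (illustration only, no training).
    for i, w in enumerate(widths):
        W = rng.standard_normal((x.shape[-1], w)) * 0.1
        x = np.tanh(x @ W) if i < len(widths) - 1 else x @ W
    return x

n_scalars, p = 10, 64                        # thermochemical scalars, latent width
state = rng.standard_normal((1, n_scalars))  # current temperature + species vector
dt = np.array([[1.0e-6]])                    # flexible time increment (trunk input)

branch = mlp(state, [128, 128, p * n_scalars]).reshape(1, n_scalars, p)
trunk = mlp(dt, [128, 128, p])               # trunk features for this time increment

next_state = np.einsum("bsp,bp->bs", branch, trunk)  # advanced scalars after dt
print(next_state.shape)  # (1, 10)
```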
Extended Feature Space-Based Automatic Melanoma Detection System
Kumar, Shakti, Kumar, Anuj
Melanoma is the deadliest form of skin cancer, arising from the uncontrolled growth of melanocytes, and its incidence has grown rapidly over the last few decades. In recent years, the detection of melanoma using image processing techniques has become a dominant research field. An Automatic Melanoma Detection System (AMDS) detects melanoma from images of the affected skin area using image processing techniques. A single lesion image is a source of multiple features, so it is crucial to select the appropriate features from the lesion image to increase the accuracy of the AMDS. Not all extracted features are important for melanoma detection; some are complex and require more computation, which impacts the classification accuracy of the AMDS. The feature extraction phase of an AMDS exhibits the most variability, so it is important to study the behaviour of the AMDS using individual and extended feature extraction approaches. A novel algorithm, ExtFvAMDS, is proposed for the calculation of the extended feature vector space. Among the six models in the comparative study, the HSV feature vector space with an Ensemble Bagged Tree classifier on the Med-Node dataset provided 99% AUC, 95.30% accuracy, 94.23% sensitivity, and 96.96% specificity.
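
A minimal sketch of the kind of pipeline the abstract points to: per-lesion colour statistics computed in HSV space feeding a bagging ensemble of decision trees. The feature definition (channel means and standard deviations) and the synthetic data are my assumptions; this is not the ExtFvAMDS algorithm.

```python
# HSV colour features + bagged decision trees on synthetic stand-in data.

import numpy as np
from matplotlib.colors import rgb_to_hsv
from sklearn.ensemble import BaggingClassifier

rng = np.random.default_rng(0)

def hsv_features(rgb_image):
    """Mean and std of the H, S, V channels as a 6-dimensional feature vector."""
    hsv = rgb_to_hsv(rgb_image)   # rgb_image in [0, 1], shape (H, W, 3)
    return np.concatenate([hsv.mean(axis=(0, 1)), hsv.std(axis=(0, 1))])

# Synthetic stand-ins for lesion images and labels (0 = benign, 1 = melanoma).
images = rng.random((60, 32, 32, 3))
labels = rng.integers(0, 2, size=60)

X = np.stack([hsv_features(img) for img in images])
clf = BaggingClassifier(n_estimators=50, random_state=0).fit(X, labels)  # bagged trees
print(clf.predict(X[:5]))
```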
El Volumen Louder Por Favor: Code-switching in Task-oriented Semantic Parsing
Einolghozati, Arash, Arora, Abhinav, Lecanda, Lorena Sainz-Maza, Kumar, Anuj, Gupta, Sonal
Being able to parse code-switched (CS) utterances, such as Spanish+English or Hindi+English, is essential to democratize task-oriented semantic parsing systems for certain locales. In this work, we focus on Spanglish (Spanish+English) and release a dataset, CSTOP, containing 5800 CS utterances alongside their semantic parses. We examine the CS generalizability of various cross-lingual (XL) models and exhibit the advantage of pre-trained XL language models when data for only one language is present. As such, we focus on improving the pre-trained models for the case when only an English corpus, alongside either zero or a few CS training instances, is available. We propose two data augmentation methods for the zero-shot and few-shot settings: fine-tuning using translate-and-align, and augmentation using a generation model followed by match-and-filter. Combining the few-shot setting with the above improvements decreases the initial 30-point accuracy gap between the zero-shot and full-data settings by two-thirds.
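
A minimal sketch of the two augmentation ideas named above, with placeholder helpers standing in for the translation and generation models: produce code-switched variants of an English utterance, then keep only candidates that still match the original parse's slots. Every function here is an illustrative stand-in, not the paper's implementation.

```python
# Toy translate-and-align plus generate/match-and-filter augmentation flow.

from typing import List

def translate_spans(utterance: str, spans: List[str]) -> str:
    # Placeholder: translate selected spans to Spanish and splice them back in,
    # keeping slot annotations aligned to the original parse.
    replacements = {"the volume": "el volumen", "please": "por favor"}
    for span in spans:
        utterance = utterance.replace(span, replacements.get(span, span))
    return utterance

def generate_candidates(utterance: str) -> List[str]:
    # Placeholder for a generation model producing code-switched paraphrases.
    return [translate_spans(utterance, ["the volume"]),
            translate_spans(utterance, ["please"])]

def match_and_filter(candidates: List[str], required_spans: List[str]) -> List[str]:
    # Keep only candidates that still contain every required span verbatim.
    return [c for c in candidates if all(span in c for span in required_spans)]

utterance = "turn up the volume please"
augmented = match_and_filter(generate_candidates(utterance), ["volume", "por favor"])
print(augmented)
```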
Federated User Representation Learning
Bui, Duc, Malik, Kshitiz, Goetz, Jack, Liu, Honglei, Moon, Seungwhan, Kumar, Anuj, Shin, Kang G.
Collaborative personalization, such as through learned user representations (embeddings), can improve the prediction accuracy of neural-network-based models significantly. We propose Federated User Representation Learning (FURL), a simple, scalable, privacy-preserving and resource-efficient way to utilize existing neural personalization techniques in the Federated Learning (FL) setting. FURL divides model parameters into federated and private parameters. Private parameters, such as private user embeddings, are trained locally, but unlike federated parameters, they are not transferred to or averaged on the server. We show theoretically that this parameter split does not affect training for most model personalization approaches. Storing user embeddings locally not only preserves user privacy, but also improves memory locality of personalization compared to on-server training. We evaluate FURL on two datasets, demonstrating a significant improvement in model quality with 8% and 51% performance increases, and approximately the same level of performance as centralized training with only 0% and 4% reductions. Furthermore, we show that user embeddings learned in FL and the centralized setting have a very similar structure, indicating that FURL can learn collaboratively through the shared parameters while preserving user privacy.
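
A minimal sketch of the federated/private parameter split described above: each client holds a private user embedding that never leaves the device, while only the shared parameters are uploaded and averaged by the server. Names, shapes, and the toy local update are illustrative assumptions.

```python
# Toy round of FL with a federated/private parameter partition.

import numpy as np

rng = np.random.default_rng(0)
n_clients, d_shared, d_embed = 4, 16, 8

clients = [{
    "federated": rng.standard_normal(d_shared),  # uploaded and averaged
    "private": rng.standard_normal(d_embed),     # user embedding, never leaves the device
} for _ in range(n_clients)]

def local_update(params):
    # Placeholder for local SGD on both parameter groups.
    return {k: v - 0.01 * rng.standard_normal(v.shape) for k, v in params.items()}

clients = [local_update(c) for c in clients]

# Server step: average only the federated parameters across clients.
global_shared = np.mean([c["federated"] for c in clients], axis=0)
for c in clients:
    c["federated"] = global_shared.copy()        # broadcast; private parameters untouched
print(global_shared[:4])
```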
Active Federated Learning
Goetz, Jack, Malik, Kshitiz, Bui, Duc, Moon, Seungwhan, Liu, Honglei, Kumar, Anuj
Federated Learning allows population-level models to be trained without centralizing client data by transmitting the global model to clients, calculating gradients locally, and then averaging the gradients. Downloading models and uploading gradients uses the client's bandwidth, so minimizing these transmission costs is important. The data on each client is highly variable, so the benefit of training on different clients may differ dramatically. To exploit this, we propose Active Federated Learning, where in each round clients are selected not uniformly at random, but with a probability conditioned on the current model and the data on the client, to maximize efficiency. We propose a cheap, simple and intuitive sampling scheme which reduces the number of required training iterations by 20-70% while maintaining the same model accuracy, and which mimics well known resampling techniques under certain conditions.
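
A minimal sketch of the non-uniform client selection idea described above, using each client's current local loss as a toy proxy for its expected value. The exact valuation and sampling scheme from the paper are not reproduced here; this is only an illustration of selecting clients with probability increasing in such a value.

```python
# Value-weighted client sampling for one federated round (illustrative only).

import numpy as np

rng = np.random.default_rng(0)
local_losses = rng.gamma(shape=2.0, scale=1.0, size=100)  # one proxy value per client

def sample_clients(values, k=10, temperature=1.0):
    """Sample k distinct clients with probability increasing in their value."""
    logits = values / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return np.random.default_rng(1).choice(len(values), size=k, replace=False, p=probs)

selected = sample_clients(local_losses)
print(sorted(selected.tolist()))
```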
Explore-Exploit: A Framework for Interactive and Online Learning
Liu, Honglei, Kumar, Anuj, Yang, Wenhai, Dumoulin, Benoit
Interactive user interfaces need to continuously evolve based on the interactions that a user has (or does not have) with the system. This may require constant exploration of the various options the system can offer the user, and obtaining signals of user preference for those options. However, such exploration, especially when the set of available options itself changes frequently, can lead to sub-optimal user experiences. We present Explore-Exploit: a framework designed to collect and utilize user feedback in an interactive and online setting while minimizing regressions in end-user experience. This framework provides a suite of online learning operators for various tasks such as personalization ranking, candidate selection, and active learning. We demonstrate how to integrate this framework with run-time services to leverage online and interactive machine learning out-of-the-box. We also present results demonstrating the efficiencies that can be achieved using the Explore-Exploit framework.
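
A minimal sketch of one online-learning operator of the kind described above: an epsilon-greedy candidate selector that mostly exploits the best-known option while continuing to explore alternatives. This is a generic bandit illustration, not the framework's API.

```python
# Epsilon-greedy candidate selection with online reward updates.

import random

class EpsilonGreedySelector:
    def __init__(self, candidates, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = {c: 0 for c in candidates}
        self.rewards = {c: 0.0 for c in candidates}

    def select(self):
        if random.random() < self.epsilon:
            return random.choice(list(self.counts))  # explore a random candidate
        # Exploit: pick the candidate with the highest average observed reward.
        return max(self.counts, key=lambda c: self.rewards[c] / max(self.counts[c], 1))

    def update(self, candidate, reward):
        self.counts[candidate] += 1
        self.rewards[candidate] += reward

random.seed(0)
selector = EpsilonGreedySelector(["layout_a", "layout_b", "layout_c"])
for _ in range(100):
    choice = selector.select()
    selector.update(choice, reward=1.0 if choice == "layout_b" else 0.2)
print(max(selector.counts, key=selector.counts.get))  # most-served candidate
```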