Wertheimer, Davis
Accelerating Production LLMs with Combined Token/Embedding Speculators
Wertheimer, Davis, Rosenkranz, Joshua, Parnell, Thomas, Suneja, Sahil, Ranganathan, Pavithra, Ganti, Raghu, Srivatsa, Mudhakar
This technical report describes the design and training of novel speculative decoding draft models, for accelerating the inference speeds of large language models in a production environment. By conditioning draft predictions on both context vectors and sampled tokens, we can train our speculators to efficiently predict high-quality n-grams, which the base model then accepts or rejects. This allows us to effectively predict multiple tokens per inference forward pass, accelerating wall-clock inference speeds of highly optimized base model implementations by a factor of 2-3x. We explore these initial results and describe next steps for further improvements.
One approach to squaring this circle is speculative decoding, where a smaller draft model or speculator is trained to predict multiple tokens given a sequence of input. These speculative tokens are produced with low cost, and lower accuracy than the base LLM. However, we can leverage GPU parallelism during the LLM forward pass to evaluate the output for each of these new tokens with minimal additional overhead. Then, by comparing the outputs to the speculated inputs, we can accept all the predicted tokens that match the output of the base model, while rejecting all those that don't. In this way we can predict multiple tokens per LLM forward pass at minimal extra cost. A deeper explanation of speculative decoding can be found in [3, 6].
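The draft-then-verify loop described above can be sketched in a few lines of Python. The `base_model` and `draft_model` below are illustrative stand-ins, not the paper's architecture, and verification is written sequentially here where a real implementation would score all drafted positions in one batched forward pass:

```python
def base_model(prefix):
    """Stand-in for one forward pass of the base LLM: deterministically
    (greedily) returns the next token given a prefix."""
    vocab = "the cat sat on the mat".split()
    return vocab[len(prefix) % len(vocab)]

def draft_model(prefix, k):
    """Stand-in speculator: cheaply proposes k tokens, imperfectly."""
    out = []
    for _ in range(k):
        guess = base_model(prefix + out)
        if len(prefix + out) == 3:
            guess = "dog"  # inject an occasional wrong guess
        out.append(guess)
    return out

def speculative_step(prefix, k=4):
    """One speculative decoding step: draft k tokens, verify them against
    the base model (in parallel on real hardware), and accept the longest
    matching run plus one token from the base model itself."""
    draft = draft_model(prefix, k)
    accepted = []
    for tok in draft:
        target = base_model(prefix + accepted)  # base model's own choice
        if tok == target:
            accepted.append(tok)                # speculation confirmed
        else:
            accepted.append(target)             # reject; keep base token
            break
    else:
        # All k drafts matched: the verifying pass yields one bonus token.
        accepted.append(base_model(prefix + accepted))
    return accepted
```

With an accurate speculator, each step emits up to k+1 tokens for roughly the cost of a single base-model forward pass; a rejected draft still yields at least one correct token.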
INDUS: Effective and Efficient Language Models for Scientific Applications
Bhattacharjee, Bishwaranjan, Trivedi, Aashka, Muraoka, Masayasu, Ramasubramanian, Muthukumaran, Udagawa, Takuma, Gurung, Iksha, Zhang, Rong, Dandala, Bharath, Ramachandran, Rahul, Maskey, Manil, Bugbee, Kaylin, Little, Mike, Fancher, Elizabeth, Sanders, Lauren, Costes, Sylvain, Blanco-Cuaresma, Sergi, Lockhart, Kelly, Allen, Thomas, Grezes, Felix, Ansdell, Megan, Accomazzi, Alberto, El-Kurdi, Yousef, Wertheimer, Davis, Pfitzmann, Birgit, Ramis, Cesar Berrospi, Dolfi, Michele, de Lima, Rafael Teixeira, Vagenas, Panagiotis, Mukkavilli, S. Karthik, Staar, Peter, Vahidinia, Sanaz, McGranaghan, Ryan, Mehrabian, Armin, Lee, Tsendgar
Large language models (LLMs) trained on general domain corpora have shown remarkable results on natural language processing (NLP) tasks. However, previous research demonstrated that LLMs trained using domain-focused corpora perform better on specialized tasks. Inspired by this pivotal insight, we developed INDUS, a comprehensive suite of LLMs tailored for the Earth science, biology, physics, heliophysics, planetary sciences and astrophysics domains and trained using curated scientific corpora drawn from diverse data sources. The suite includes: (1) an encoder model trained using domain-specific vocabulary and corpora to address natural language understanding tasks, (2) a contrastive-learning-based general text embedding model trained using a diverse set of datasets drawn from multiple sources to address information retrieval tasks and (3) smaller versions of these models created using knowledge distillation techniques to address applications which have latency or resource constraints. We also created three new scientific benchmark datasets, namely CLIMATE-CHANGE-NER (entity recognition), NASA-QA (extractive QA) and NASA-IR (information retrieval), to accelerate research in these multi-disciplinary fields. Finally, we show that our models outperform both general-purpose encoders (RoBERTa) and existing domain-specific encoders (SciBERT) on these new tasks as well as existing benchmark tasks in the domains of interest.
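The knowledge distillation mentioned for the smaller models is, in its generic form, a cross-entropy between temperature-softened teacher and student output distributions. The sketch below shows that generic objective only, not INDUS's exact training recipe; the temperature value is an assumption:

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits into a probability distribution; a higher
    temperature flattens the distribution, exposing 'dark knowledge'."""
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Generic distillation objective: cross-entropy between the
    temperature-softened teacher and student distributions. Minimized
    when the student matches the teacher's softened outputs."""
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher, student))
```

The loss reaches its minimum (the teacher's entropy) when the student reproduces the teacher's distribution, so the student inherits the teacher's relative preferences over classes rather than only its top-1 labels.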
SudokuSens: Enhancing Deep Learning Robustness for IoT Sensing Applications using a Generative Approach
Wang, Tianshi, Li, Jinyang, Wang, Ruijie, Kara, Denizhan, Liu, Shengzhong, Wertheimer, Davis, Viros-i-Martin, Antoni, Ganti, Raghu, Srivatsa, Mudhakar, Abdelzaher, Tarek
This paper introduces SudokuSens, a generative framework for automated generation of training data in machine-learning-based Internet-of-Things (IoT) applications, such that the generated synthetic data mimic experimental configurations not encountered during actual sensor data collection. The framework improves the robustness of resulting deep learning models, and is intended for IoT applications where data collection is expensive. The work is motivated by the fact that IoT time-series data entangle the signatures of observed objects with the confounding intrinsic properties of the surrounding environment and the dynamic environmental disturbances experienced. To incorporate sufficient diversity into the IoT training data, one therefore needs to consider a combinatorial explosion of training cases that are multiplicative in the number of objects considered and the possible environmental conditions in which such objects may be encountered. Our framework substantially reduces these multiplicative training needs. To decouple object signatures from environmental conditions, we employ a Conditional Variational Autoencoder (CVAE) that allows us to reduce data collection needs from multiplicative to (nearly) linear, while synthetically generating (data for) the missing conditions. To obtain robustness with respect to dynamic disturbances, a session-aware temporal contrastive learning approach is taken. Integrating the aforementioned two approaches, SudokuSens significantly improves the robustness of deep learning for IoT applications. We explore the degree to which SudokuSens benefits downstream inference tasks in different data sets and discuss conditions under which the approach is particularly effective.
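The multiplicative-to-(nearly-)linear reduction claimed above can be illustrated with simple counting. The coverage scheme below (every object recorded under one reference condition, plus one object swept across the remaining conditions, with a conditional generator such as a CVAE synthesizing the missing pairs) is a hypothetical illustration of the idea, not the paper's actual collection protocol:

```python
def sessions_exhaustive(n_objects, n_conditions):
    """Naive coverage: record every object under every environmental
    condition -- collection cost grows multiplicatively."""
    return n_objects * n_conditions

def sessions_decoupled(n_objects, n_conditions):
    """Decoupled coverage (hypothetical): each object recorded once under
    a reference condition, plus one reference object recorded under each
    remaining condition. A conditional generator is then asked to
    synthesize the missing object/condition combinations."""
    return n_objects + (n_conditions - 1)
```

For 50 objects and 20 environmental conditions, exhaustive coverage needs 1000 collection sessions while the decoupled scheme needs 69, and the gap widens as either factor grows.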
Few-Shot Learning with Localization in Realistic Settings
Wertheimer, Davis, Hariharan, Bharath
Traditional recognition methods typically require large, artificially-balanced training classes, while few-shot learning methods are tested on artificially small ones. In contrast to both extremes, real world recognition problems exhibit heavy-tailed class distributions, with cluttered scenes and a mix of coarse and fine-grained class distinctions. We show that prior methods designed for few-shot learning do not work out of the box in these challenging conditions, based on a new "meta-iNat" benchmark. We introduce three parameter-free improvements: (a) better training procedures based on adapting cross-validation to meta-learning, (b) novel architectures that localize objects using limited bounding box annotations before classification, and (c) simple parameter-free expansions of the feature space based on bilinear pooling. Together, these improvements double the accuracy of state-of-the-art models on meta-iNat while generalizing to prior benchmarks, complex neural architectures, and settings with substantial domain shift.
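The bilinear-pooling expansion in (c) amounts to averaging outer products of the per-location feature vectors, squaring the channel dimension without adding parameters. A minimal pure-Python sketch (the tiny 2-channel input in the test is illustrative, not the paper's network features):

```python
def bilinear_pool(features):
    """Bilinear pooling of a spatial feature map.

    `features` is a list of per-location feature vectors, each of length
    C. Returns the flattened C*C matrix of outer products averaged over
    locations -- a parameter-free expansion of the feature space from
    C to C*C dimensions."""
    n = len(features)
    c = len(features[0])
    pooled = [0.0] * (c * c)
    for vec in features:
        for i in range(c):
            for j in range(c):
                pooled[i * c + j] += vec[i] * vec[j] / n
    return pooled
```

Because the expansion captures channel co-occurrence statistics, it can separate fine-grained classes that first-order pooled features conflate, at the cost of a quadratically larger descriptor.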