Leveraging External Knowledge Resources to Enable Domain-Specific Comprehension
Sengupta, Saptarshi, Heaton, Connor, Mitra, Prasenjit, Sarkar, Soumalya
Machine Reading Comprehension (MRC) has been a long-standing problem in NLP and, with the recent introduction of the BERT family of transformer-based language models, it has come a long way toward being solved. Unfortunately, when BERT variants trained on general text corpora are applied to domain-specific text, their performance inevitably degrades on account of the domain shift, i.e., the genre/subject-matter discrepancy between the training and downstream application data. Knowledge graphs act as reservoirs of either open- or closed-domain information, and prior studies have shown that they can be used to improve the performance of general-purpose transformers in domain-specific applications. Building on existing work, we introduce a method that uses Multi-Layer Perceptrons (MLPs) to align and integrate embeddings extracted from knowledge graphs with the embedding spaces of pre-trained language models (LMs). We fuse the aligned embeddings with the open-domain LMs BERT and RoBERTa and fine-tune them for two MRC tasks, namely span detection (COVID-QA) and multiple-choice question answering (PubMedQA). On the COVID-QA dataset, our approach allows these models to perform on par with their domain-specific counterparts, Bio/Sci-BERT, as evidenced by the Exact Match (EM) metric. On PubMedQA, we observe an overall improvement in accuracy, while F1 stays roughly level with the domain-specific models. MRC is defined as a class of supervised question answering (QA) problems wherein a system learns a function to answer a question given one or more associated passages, i.e., given a question and a context text, select the answer to the question from within the context. Mathematically, MRC: f(C, Q) → A, where C is the relevant context, Q is the question, and A is the answer space to be learned (Liu et al., 2019). Reading comprehension is one of the most challenging areas of NLP, since a system needs to contend with multiple facets of language (identifying entities, supporting facts in the context, the intent of the question, etc.) to answer correctly. Fortunately, with the introduction of the Transformer (Vaswani et al., 2017) and the subsequent BERT (Devlin et al., 2019) family of models (Rogers et al., 2020), the state of the art in MRC has moved forward by leaps and bounds.
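To make the alignment step concrete, below is a minimal sketch (not the authors' released implementation) of projecting knowledge-graph entity embeddings into an LM's hidden space with an MLP and fusing them with the LM's token embeddings. The dimensions, layer sizes, and additive fusion are illustrative assumptions.

```python
# Minimal sketch: align KG entity embeddings with an LM embedding space via an
# MLP, then fuse with token embeddings. All dimensions are assumed for
# illustration (e.g., 200-d KG vectors, 768-d BERT hidden size).
import torch
import torch.nn as nn

class KGAligner(nn.Module):
    """MLP that maps KG entity embeddings into the LM's hidden space."""
    def __init__(self, kg_dim: int = 200, lm_dim: int = 768, hidden: int = 512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(kg_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, lm_dim),
        )

    def forward(self, kg_emb: torch.Tensor) -> torch.Tensor:
        return self.mlp(kg_emb)

aligner = KGAligner()
kg_emb = torch.randn(4, 16, 200)     # (batch, seq_len, kg_dim) entity vectors
token_emb = torch.randn(4, 16, 768)  # LM token embeddings for the same span
fused = token_emb + aligner(kg_emb)  # one simple fusion choice: additive
```

The fused representations would then feed the downstream MRC heads (span-start/end logits for COVID-QA, answer classification for PubMedQA) during fine-tuning.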
LLMs for Multi-Modal Knowledge Extraction and Analysis in Intelligence/Safety-Critical Applications
Israelsen, Brett, Sarkar, Soumalya
Large Language Models (LLMs) have seen rapid progress in capability in recent years; this progress has been accelerating, and their capabilities, as measured by various benchmarks, are beginning to approach those of humans. There is strong demand to use such models in a wide variety of applications but, due to unresolved vulnerabilities and limitations, great care must be taken before applying them to intelligence and safety-critical applications. This paper reviews recent literature on LLM assessment and vulnerabilities to synthesize the current research landscape and to help identify which advances are most critical to enabling the use of these technologies in intelligence and safety-critical applications. The vulnerabilities are broken down into ten high-level categories and overlaid onto a high-level life cycle of an LLM. Some general categories of mitigations are also reviewed.
Scalable Bayesian optimization with high-dimensional outputs using randomized prior networks
Bhouri, Mohamed Aziz, Joly, Michael, Yu, Robert, Sarkar, Soumalya, Perdikaris, Paris
Several fundamental problems in science and engineering consist of global optimization tasks involving unknown high-dimensional (black-box) functions that map a set of controllable variables to the outcomes of an expensive experiment. Bayesian Optimization (BO) techniques are known to be effective in tackling global optimization problems using a relatively small number of objective function evaluations, but their performance suffers when dealing with high-dimensional outputs. To overcome this major challenge of dimensionality, here we propose a deep learning framework for BO and sequential decision making based on bootstrapped ensembles of neural architectures with randomized priors. Using appropriate architecture choices, we show that the proposed framework can approximate functional relationships between design variables and quantities of interest, even in cases where the latter take values in high-dimensional vector spaces or even infinite-dimensional function spaces. In the context of BO, we augment the proposed probabilistic surrogates with re-parameterized Monte Carlo approximations of multiple-point (parallel) acquisition functions, as well as methodological extensions for accommodating black-box constraints and multi-fidelity information sources. We test the proposed framework against state-of-the-art methods for BO and demonstrate superior performance across several challenging tasks with high-dimensional outputs, including a constrained multi-fidelity optimization task involving the shape optimization of rotor blades in turbomachinery.
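For intuition, here is a minimal sketch of a bootstrapped ensemble with randomized priors of the kind such a surrogate builds on: each member adds a frozen, randomly initialized prior network to a trainable network and is fit on its own bootstrap resample. The network sizes, prior scale, and training loop are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch: bootstrapped ensemble of randomized-prior networks as a
# BO surrogate. Ensemble spread over candidate designs provides the epistemic
# uncertainty that an acquisition function (e.g., parallel Monte Carlo EI)
# would consume. Hyperparameters below are assumed for illustration.
import torch
import torch.nn as nn

def make_net(in_dim: int, out_dim: int) -> nn.Module:
    return nn.Sequential(nn.Linear(in_dim, 64), nn.Tanh(), nn.Linear(64, out_dim))

class RandomizedPriorNet(nn.Module):
    """Trainable network plus a fixed, randomly initialized prior network."""
    def __init__(self, in_dim: int, out_dim: int, beta: float = 1.0):
        super().__init__()
        self.trainable = make_net(in_dim, out_dim)
        self.prior = make_net(in_dim, out_dim)
        for p in self.prior.parameters():  # the prior stays frozen
            p.requires_grad_(False)
        self.beta = beta

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.trainable(x) + self.beta * self.prior(x)

def fit_ensemble(X: torch.Tensor, Y: torch.Tensor,
                 n_members: int = 8, epochs: int = 500) -> list[nn.Module]:
    """Train each member on a bootstrap resample of the observed data."""
    members, n = [], X.shape[0]
    for _ in range(n_members):
        idx = torch.randint(0, n, (n,))  # bootstrap indices (with replacement)
        net = RandomizedPriorNet(X.shape[1], Y.shape[1])
        opt = torch.optim.Adam(net.trainable.parameters(), lr=1e-3)
        for _ in range(epochs):
            opt.zero_grad()
            loss = ((net(X[idx]) - Y[idx]) ** 2).mean()  # MSE on the resample
            loss.backward()
            opt.step()
        members.append(net)
    return members
```

Because Y may be high-dimensional (e.g., a discretized field over a rotor-blade surface), the same ensemble machinery applies unchanged by widening the output layer, which is the dimensionality-handling property the abstract highlights.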