AITopics | Sil, Avirup

Collaborating Authors

Sil, Avirup

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

PrimeQA: The Prime Repository for State-of-the-Art Multilingual Question Answering Research and Development

Sil, Avirup, Sen, Jaydeep, Iyer, Bhavani, Franz, Martin, Fadnis, Kshitij, Bornea, Mihaela, Rosenthal, Sara, McCarley, Scott, Zhang, Rong, Kumar, Vishwajeet, Li, Yulong, Sultan, Md Arafat, Bhat, Riyaz, Florian, Radu, Roukos, Salim

arXiv.org Artificial IntelligenceJan-25-2023

The field of Question Answering (QA) has made remarkable progress in recent years, thanks to the advent of large pre-trained language models, newer realistic benchmark datasets with leaderboards, and novel algorithms for key components such as retrievers and readers. In this paper, we introduce PRIMEQA: a one-stop and open-source QA repository with an aim to democratize QA re-search and facilitate easy replication of state-of-the-art (SOTA) QA methods. PRIMEQA supports core QA functionalities like retrieval and reading comprehension as well as auxiliary capabilities such as question generation.It has been designed as an end-to-end toolkit for various use cases: building front-end applications, replicating SOTA methods on pub-lic benchmarks, and expanding pre-existing methods. PRIMEQA is available at : https://github.com/primeqa.

computational linguistic, natural language, question answering, (17 more...)

arXiv.org Artificial Intelligence

2301.09715

Country:

Europe (0.68)
North America > United States > Minnesota (0.28)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)

Add feedback

Moving Beyond Downstream Task Accuracy for Information Retrieval Benchmarking

Santhanam, Keshav, Saad-Falcon, Jon, Franz, Martin, Khattab, Omar, Sil, Avirup, Florian, Radu, Sultan, Md Arafat, Roukos, Salim, Zaharia, Matei, Potts, Christopher

arXiv.org Artificial IntelligenceDec-2-2022

Neural information retrieval (IR) systems have progressed rapidly in recent years, in large part due to the release of publicly available benchmarking tasks. Unfortunately, some dimensions of this progress are illusory: the majority of the popular IR benchmarks today focus exclusively on downstream task accuracy and thus conceal the costs incurred by systems that trade away efficiency for quality. Latency, hardware cost, and other efficiency considerations are paramount to the deployment of IR systems in user-facing settings. We propose that IR benchmarks structure their evaluation methodology to include not only metrics of accuracy, but also efficiency considerations such as a query latency and the corresponding cost budget for a reproducible hardware setting. For the popular IR benchmarks MS MARCO and XOR-TyDi, we show how the best choice of IR system varies according to how these efficiency considerations are chosen and weighed. We hope that future benchmarks will adopt these guidelines toward more holistic IR evaluation.

artificial intelligence, information retrieval, natural language, (16 more...)

arXiv.org Artificial Intelligence

2212.0134

Country: North America > United States (0.46)

Genre: Research Report (1.00)

Industry: Information Technology (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)

Add feedback

SPARTAN: Sparse Hierarchical Memory for Parameter-Efficient Transformers

Deshpande, Ameet, Sultan, Md Arafat, Ferritto, Anthony, Kalyan, Ashwin, Narasimhan, Karthik, Sil, Avirup

arXiv.org Artificial IntelligenceNov-29-2022

Fine-tuning pre-trained language models (PLMs) achieves impressive performance on a range of downstream tasks, and their sizes have consequently been getting bigger. Since a different copy of the model is required for each task, this paradigm is infeasible for storage-constrained edge devices like mobile phones. In this paper, we propose SPARTAN, a parameter efficient (PE) and computationally fast architecture for edge devices that adds hierarchically organized sparse memory after each Transformer layer. SPARTAN freezes the PLM parameters and fine-tunes only its memory, thus significantly reducing storage costs by re-using the PLM backbone for different tasks. SPARTAN contains two levels of memory, with only a sparse subset of parents being chosen in the first level for each input, and children cells corresponding to those parents being used to compute an output representation. This sparsity combined with other architecture optimizations improves SPARTAN's throughput by over 90% during inference on a Raspberry Pi 4 when compared to PE baselines (adapters) while also outperforming the latter by 0.1 points on the GLUE benchmark. Further, it can be trained 34% faster in a few-shot setting, while performing within 0.9 points of adapters. Qualitative analysis shows that different parent cells in SPARTAN specialize in different topics, thus dividing responsibility efficiently.

machine learning, natural language, spartan, (18 more...)

arXiv.org Artificial Intelligence

2211.16634

Country: North America > United States > Minnesota (0.28)

Genre: Research Report (0.40)

Industry: Information Technology > Hardware (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

Zero-Shot Dynamic Quantization for Transformer Inference

El-Kurdi, Yousef, Quinn, Jerry, Sil, Avirup

arXiv.org Artificial IntelligenceNov-17-2022

We introduce a novel run-time method for significantly reducing the accuracy loss associated with quantizing BERT-like models to 8-bit integers. Existing methods for quantizing models either modify the training procedure,or they require an additional calibration step to adjust parameters that also requires a selected held-out dataset. Our method permits taking advantage of quantization without the need for these adjustments. We present results on several NLP tasks demonstrating the usefulness of this technique.

machine learning, natural language, quantization, (15 more...)

arXiv.org Artificial Intelligence

2211.09744

Country: North America > United States (1.00)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

Add feedback

Learning Cross-Lingual IR from an English Retriever

Li, Yulong, Franz, Martin, Sultan, Md Arafat, Iyer, Bhavani, Lee, Young-Suk, Sil, Avirup

arXiv.org Artificial IntelligenceDec-15-2021

We present a new cross-lingual information retrieval (CLIR) model trained using multi-stage knowledge distillation (KD). The teacher and the student are heterogeneous systems-the former is a pipeline that relies on machine translation and monolingual IR, while the latter executes a single CLIR operation. We show that the student can learn both multilingual representations and CLIR by optimizing two corresponding KD objectives. Learning multilingual representations from an English-only retriever is accomplished using a novel cross-lingual alignment algorithm that greedily re-positions the teacher tokens for alignment. Evaluation on the XOR-TyDi benchmark shows that the proposed model is far more effective than the existing approach of fine-tuning with cross-lingual labeled IR data, with a gain in accuracy of 25.4 Recall@5kt.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2112.08185

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.47)

Add feedback

Towards Robust Neural Retrieval Models with Synthetic Pre-Training

Reddy, Revanth Gangi, Yadav, Vikas, Sultan, Md Arafat, Franz, Martin, Castelli, Vittorio, Ji, Heng, Sil, Avirup

arXiv.org Artificial IntelligenceApr-15-2021

Recent work has shown that commonly available machine reading comprehension (MRC) datasets can be used to train high-performance neural information retrieval (IR) systems. However, the evaluation of neural IR has so far been limited to standard supervised learning settings, where they have outperformed traditional term matching baselines. We conduct in-domain and out-of-domain evaluations of neural IR, and seek to improve its robustness across different scenarios, including zero-shot settings. We show that synthetic training examples generated using a sequence-to-sequence generator can be effective towards this goal: in our experiments, pre-training with synthetic examples improves retrieval performance in both in-domain and out-of-domain evaluation on five different test sets.

artificial intelligence, dataset, inductive learning, (18 more...)

arXiv.org Artificial Intelligence

2104.078

Country:

North America > United States (0.68)
Africa > Tanzania (0.48)

Genre: Research Report (0.70)

Industry: Education > Assessment & Standards > Student Performance (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.70)

Add feedback

Answer Span Correction in Machine Reading Comprehension

Reddy, Revanth Gangi, Sultan, Md Arafat, Kayi, Efsun Sarioglu, Zhang, Rong, Castelli, Vittorio, Sil, Avirup

arXiv.org Artificial IntelligenceNov-6-2020

Answer validation in machine reading comprehension (MRC) consists of verifying an extracted answer against an input context and question pair. Previous work has looked at re-assessing the "answerability" of the question given the extracted answer. Here we address a different problem: the tendency of existing MRC systems to produce partially correct answers when presented with answerable questions. We explore the nature of such errors and propose a post-processing correction method that yields statistically significant performance improvements over state-of-the-art MRC systems in both monolingual and multilingual evaluation.

artificial intelligence, natural language, prediction, (13 more...)

arXiv.org Artificial Intelligence

2011.03435

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.46)

Industry: Education > Assessment & Standards > Student Performance (0.62)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Span Selection Pre-training for Question Answering

Glass, Michael, Gliozzo, Alfio, Chakravarti, Rishav, Ferritto, Anthony, Pan, Lin, Bhargav, G P Shrivatsa, Garg, Dinesh, Sil, Avirup

arXiv.org Artificial IntelligenceSep-9-2019

BERT (Bidirectional Encoder Representations from Transformers) and related pre-trained Transformers have provided large gains across many language understanding tasks, achieving a new state-of-the-art (SOTA). BERT is pre-trained on two auxiliary tasks: Masked Language Model and Next Sentence Prediction. In this paper we introduce a new pre-training task inspired by reading comprehension and an effort to avoid encoding general knowledge in the transformer network itself. We find significant and consistent improvements over both BERT-BASE and BERT-LARGE on multiple reading comprehension (MRC) and paraphrasing datasets. Specifically, our proposed model has strong empirical evidence as it obtains SOTA results on Natural Questions, a new benchmark MRC dataset, outperforming BERT-LARGE by 3 F1 points on short answer prediction. We also establish a new SOTA in HotpotQA, improving answer prediction F1 by 4 F1 points and supporting fact prediction by 1 F1 point. Moreover, we show that our pre-training approach is particularly effective when training data is limited, improving the learning curve by a large amount.

span selection, survey article, us government, (19 more...)

arXiv.org Artificial Intelligence

1909.0412

Country:

North America > United States > Florida > Brevard County (0.14)
North America > United States > California > San Bernardino County (0.14)

Genre:

Overview (0.93)
Research Report (0.65)

Industry:

Government > Regional Government > North America Government > United States Government (0.93)
Education (0.70)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.54)

Add feedback

Neural Cross-Lingual Entity Linking

Sil, Avirup (IBM Research AI) | Kundu, Gourab (IBM) | Florian, Radu (IBM) | Hamza, Wael (IBM)

AAAI ConferencesFeb-8-2018

A major challenge in Entity Linking (EL) is making effective use of contextual information to disambiguate mentions to Wikipedia that might refer to different entities in different contexts. The problem exacerbates with cross-lingual EL which involves linking mentions written in non-English documents to entries in the English Wikipedia: to compare textual clues across languages we need to compute similarity between textual fragments across languages. In this paper, we propose a neural EL model that trains fine-grained similarities and dissimilarities between the query and candidate document from multiple perspectives, combined with convolution and tensor networks. Further, we show that this English-trained system can be applied, in zero-shot learning, to other languages by making surprisingly effective use of multi-lingual embeddings. The proposed system has strong empirical evidence yielding state-of-the-art results in English as well as cross-lingual: Spanish and Chinese TAC 2015 datasets.

deep learning, neural network, wikipedia, (20 more...)

AAAI Conferences

Thirty-Second AAAI Conference on Artificial Intelligence

Country:

North America > United States > Missouri > Jackson County > Kansas City (0.14)
North America > United States > Florida (0.14)

Industry: Leisure & Entertainment > Sports > Football (1.00)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

Extracting Action and Event Semantics from Web Text

Sil, Avirup (Temple University) | Huang, Fei (Temple University) | Yates, Alexander (Temple University)

AAAI ConferencesNov-5-2010

Most information extraction research identifies the state of the world in text, including the entities and the relationships that exist between them. Much less attention has been paid to the understanding of dynamics, or how the state of the world changes over time. Because intelligent behavior seeks to change the state of the world in rational and utility-maximizing ways, common-sense knowledge about dynamics is essential for intelligent agents. In this paper, we describe a novel system, Prepost , that tackles the problem of extracting the preconditions and effects of actions and events, two important kinds of knowledge for connecting world state and the actions that affect it. In experiments on Web text, Prepost is able to improve by 79% over a baseline technique for identifying the effects of actions (64% improvement for preconditions).

artificial intelligence, planning & scheduling, postcondition, (19 more...)

AAAI Conferences

2010 AAAI Fall Symposium Series

Country: North America > United States (0.47)

Genre: Research Report (0.68)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.70)

Add feedback