AITopics | Singh, Shruti

Collaborating Authors

Singh, Shruti

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Efficient Distributed Training through Gradient Compression with Sparsification and Quantization Techniques

Singh, Shruti, Kumar, Shantanu

arXiv.org Artificial IntelligenceDec-7-2024

This study investigates the impact of gradient compression on distributed training performance, focusing on sparsification and quantization techniques, including top-k, DGC, and QSGD. In baseline experiments, random-k compression results in severe performance degradation, highlighting its inefficacy. In contrast, using top-k and DGC at 50 times compression yields performance improvements, reducing perplexity by up to 0.06 compared to baseline. Experiments across 1, 2, and 4 workers demonstrate that conservative sparsification can have a regularizing effect, especially for smaller models, while compression ratios above 5000 times impair performance, particularly for DGC. Communication times are reduced across all compression methods, with top-k and DGC decreasing communication to negligible levels at high compression ratios. However, increased computation times offset this efficiency for top-k due to sorting demands, making it less scalable than DGC or QSGD. In convergence tests, sparsification techniques show accelerated convergence, requiring fewer epochs than the baseline, which has implications for computational savings. Although precision trade-offs emerge, floating point errors are mitigated by compression. This study's findings underscore the need to tune hyperparameters specifically for each compression technique to achieve optimal model performance, especially in distributed training systems.

cloud computing, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2502.07634

Country: North America > United States > Washington (0.14)

Genre:

Overview (1.00)
Research Report > New Finding (0.88)

Industry: Information Technology (0.93)

Technology:

Information Technology > Cloud Computing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Add feedback

SciDQA: A Deep Reading Comprehension Dataset over Scientific Papers

Singh, Shruti, Sarkar, Nandan, Cohan, Arman

arXiv.org Artificial IntelligenceNov-8-2024

Scientific literature is typically dense, requiring significant background knowledge and deep comprehension for effective engagement. We introduce SciDQA, a new dataset for reading comprehension that challenges LLMs for a deep understanding of scientific articles, consisting of 2,937 QA pairs. Unlike other scientific QA datasets, SciDQA sources questions from peer reviews by domain experts and answers by paper authors, ensuring a thorough examination of the literature. We enhance the dataset's quality through a process that carefully filters out lower quality questions, decontextualizes the content, tracks the source document across different versions, and incorporates a bibliography for multi-document question-answering. Questions in SciDQA necessitate reasoning across figures, tables, equations, appendices, and supplementary materials, and require multi-document reasoning. We evaluate several open-source and proprietary LLMs across various configurations to explore their capabilities in generating relevant and factual responses. Our comprehensive evaluation, based on metrics for surface-level similarity and LLM judgements, highlights notable performance discrepancies. SciDQA represents a rigorously curated, naturally derived scientific QA dataset, designed to facilitate research on complex scientific text understanding.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2411.05338

Country:

Asia (0.67)
North America > United States (0.46)

Genre: Research Report > New Finding (1.00)

Industry:

Education > Assessment & Standards > Student Performance (0.61)
Energy > Oil & Gas > Upstream (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature

Wadden, David, Shi, Kejian, Morrison, Jacob, Naik, Aakanksha, Singh, Shruti, Barzilay, Nitzan, Lo, Kyle, Hope, Tom, Soldaini, Luca, Shen, Shannon Zejiang, Downey, Doug, Hajishirzi, Hannaneh, Cohan, Arman

arXiv.org Artificial IntelligenceJun-18-2024

We present SciRIFF (Scientific Resource for Instruction-Following and Finetuning), a dataset of 137K instruction-following demonstrations for 54 tasks covering five essential scientific literature understanding capabilities: information extraction, summarization, question answering, claim verification, and classification. SciRIFF demonstrations are notable for their long input contexts, detailed task specifications, and complex structured outputs. While instruction-following resources are available in specific domains such as clinical medicine and chemistry, SciRIFF is the first dataset focused on extracting and synthesizing information from research literature across a wide range of scientific fields. To demonstrate the utility of SciRIFF, we develop a sample-efficient strategy to adapt a general instruction-following model for science by performing additional finetuning on a mix of general-domain and SciRIFF demonstrations. In evaluations on nine held-out scientific tasks, our model -- called SciTulu -- improves over a strong LLM baseline by 28.1% and 6.5% at the 7B and 70B scales respectively, while maintaining general instruction-following performance within 2% of the baseline. We are optimistic that SciRIFF will facilitate the development and evaluation of LLMs to help researchers navigate the ever-growing body of scientific literature. We release our dataset, model checkpoints, and data processing and evaluation code to enable further research.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2406.07835

Country:

North America > United States (1.00)
Asia > Middle East > UAE (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Consumer Health (1.00)
Health & Medicine > Therapeutic Area > Immunology (0.93)
Government > Regional Government > North America Government > United States Government > FDA (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

Sparse Graph Representations for Procedural Instructional Documents

Singh, Shruti, Gupta, Rishabh

arXiv.org Artificial IntelligenceFeb-6-2024

Computation of document similarity is a critical task in various NLP domains that has applications in deduplication, matching, and recommendation. Traditional approaches for document similarity computation include learning representations of documents and employing a similarity or a distance function over the embeddings. However, pairwise similarities and differences are not efficiently captured by individual representations. Graph representations such as Joint Concept Interaction Graph (JCIG) represent a pair of documents as a joint undirected weighted graph. JCIGs facilitate an interpretable representation of document pairs as a graph. However, JCIGs are undirected, and don't consider the sequential flow of sentences in documents. We propose two approaches to model document similarity by representing document pairs as a directed and sparse JCIG that incorporates sequential information. We propose two algorithms inspired by Supergenome Sorting and Hamiltonian Path that replace the undirected edges with directed edges. Our approach also sparsifies the graph to $O(n)$ edges from JCIG's worst case of $O(n^2)$. We show that our sparse directed graph model architecture consisting of a Siamese encoder and GCN achieves comparable results to the baseline on datasets not containing sequential information and beats the baseline by ten points on an instructional documents dataset containing sequential information.

machine learning, natural language, sequential information, (16 more...)

arXiv.org Artificial Intelligence

2402.03957

Country:

Europe (1.00)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Add feedback

LEGOBench: Leaderboard Generation Benchmark for Scientific Models

Singh, Shruti, Alam, Shoaib, Singh, Mayank

arXiv.org Artificial IntelligenceJan-11-2024

The ever-increasing volume of paper submissions makes it difficult to stay informed about the latest state-of-the-art research. To address this challenge, we introduce LEGOBench, a benchmark for evaluating systems that generate leaderboards. LEGOBench is curated from 22 years of preprint submission data in arXiv and more than 11,000 machine learning leaderboards in the PapersWithCode portal. We evaluate the performance of four traditional graph-based ranking variants and three recently proposed large language models. Our preliminary results show significant performance gaps in automatic leaderboard generation. The code is available on https://github.com/lingo-iitgn/LEGOBench and the dataset is hosted on https://osf.io/9v2py/?view_only=6f91b0b510df498ba01595f8f278f94c .

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2401.06233

Country:

Europe (1.00)
Asia (1.00)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (0.88)

Industry: Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.93)

Add feedback

Unlocking Model Insights: A Dataset for Automated Model Card Generation

Singh, Shruti, Lodwal, Hitesh, Malwat, Husain, Thakur, Rakesh, Singh, Mayank

arXiv.org Artificial IntelligenceSep-22-2023

Language models (LMs) are no longer restricted to ML community, and instruction-tuned LMs have led to a rise in autonomous AI agents. As the accessibility of LMs grows, it is imperative that an understanding of their capabilities, intended usage, and development cycle also improves. Model cards are a popular practice for documenting detailed information about an ML model. To automate model card generation, we introduce a dataset of 500 question-answer pairs for 25 ML models that cover crucial aspects of the model, such as its training configurations, datasets, biases, architecture details, and training resources. We employ annotators to extract the answers from the original paper. Further, we explore the capabilities of LMs in generating model cards by answering questions. Our initial experiments with ChatGPT-3.5, LLaMa, and Galactica showcase a significant gap in the understanding of research papers by these aforementioned LMs as well as generating factual textual responses. We posit that our dataset can be used to train models to automate the generation of model cards from paper text and reduce human effort in the model card curation process. The complete dataset is available on https://osf.io/hqt7p/?view_only=3b9114e3904c4443bcd9f5c270158d37

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2309.12616

Country: Asia > India > Gujarat (0.14)

Genre:

Research Report (0.50)
Overview (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

Understanding Attention: In Minds and Machines

Sawant, Shriraj P., Singh, Shruti

arXiv.org Artificial IntelligenceDec-4-2020

Attention is a complex and broad concept, studied across multiple disciplines spanning artificial intelligence, cognitive science, psychology, neuroscience, and related fields. Although many of the ideas regarding attention do not significantly overlap among these fields, there is a common theme of adaptive control of limited resources. In this work, we review the concept and variants of attention in artificial neural networks (ANNs). We also discuss the origin of attention from the neuroscience point of view parallel to that of ANNs. Instead of having seemingly disconnected dialogues between varied disciplines, we suggest grounding the ideas on common conceptual frameworks for a systematic analysis of attention and towards possible unification of ideas in AI and Neuroscience.

deep learning, neural network, survey article, (20 more...)

arXiv.org Artificial Intelligence

2012.02659

Country: North America > United States > California (0.14)

Genre:

Research Report (0.64)
Overview (0.48)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)

Add feedback

Event Representations for Automated Story Generation with Deep Neural Nets

Martin, Lara J. (Georgia Institute of Technology) | Ammanabrolu, Prithviraj (Georgia Institute of Technology) | Wang, Xinyu (Georgia Institute of Technology) | Hancock, William (Georgia Institute of Technology) | Singh, Shruti (Georgia Institute of Technology) | Harrison, Brent (Georgia Institute of Technology) | Riedl, Mark O. (Georgia Institute of Technology)

AAAI ConferencesFeb-8-2018

Automated story generation is the problem of automatically selecting a sequence of events, actions, or words that can be told as a story. We seek to develop a system that can generate stories by learning everything it needs to know from textual story corpora. To date, recurrent neural networks that learn language models at character, word, or sentence levels have had little success generating coherent stories. We explore the question of event representations that provide a mid-level of abstraction between words and sentences in order to retain the semantic information of the original data while minimizing event sparsity. We present a technique for preprocessing textual story data into event sequences. We then present a technique for automated story generation whereby we decompose the problem into the generation of successive events (event2event) and the generation of natural language sentences from events (event2sentence). We give empirical results comparing different event representations and their effects on event successor generation and the translation of events to natural language.

deep learning, neural network, representation, (19 more...)

AAAI Conferences

Thirty-Second AAAI Conference on Artificial Intelligence

Country: North America > United States (0.68)

Technology: