AITopics | Jain, Arnav

Data-Efficient Contrastive Language-Image Pretraining: Prioritizing Data Quality over Quantity

Joshi, Siddharth, Jain, Arnav, Payani, Ali, Mirzasoleiman, Baharan

arXiv.org Artificial Intelligence, Mar-19-2024

Contrastive Language-Image Pre-training (CLIP) on large-scale image-caption datasets learns representations that can achieve remarkable zero-shot generalization. However, such models require a massive amount of pre-training data. Improving the quality of the pre-training data has been shown to be much more effective in improving CLIP's performance than increasing its volume. Nevertheless, finding small subsets of training data that provably generalize the best has remained an open question. In this work, we propose the first theoretically rigorous data selection method for CLIP. We show that subsets that closely preserve the cross-covariance of the images and captions of the full data provably achieve superior generalization performance. Our extensive experiments on ConceptualCaptions3M and ConceptualCaptions12M demonstrate that subsets found by our method achieve over 2.7x and 1.4x the accuracy of the next best baseline on ImageNet and its shifted versions. Moreover, we show that our subsets obtain 1.5x the average accuracy of the next best baseline across 11 downstream datasets. The code is available at: https://github.com/BigML-CS-UCLA/clipcov-data-efficient-clip.
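The idea of a subset "preserving the cross-covariance" of the full data can be illustrated with a small sketch. This is not the authors' selection method: it only measures how far the image-caption cross-covariance of a candidate subset is from that of the full data. The function names, the use of mean-centered embeddings, and the Frobenius norm as the distance are all assumptions made for illustration.

```python
import math

def cross_covariance(images, captions):
    """Mean-centered cross-covariance between paired embedding lists.

    images:   list of n image embeddings (each a list of floats)
    captions: list of n caption embeddings, paired with images by index
    Centering is an assumption of this sketch.
    """
    n = len(images)
    d_img, d_txt = len(images[0]), len(captions[0])
    mean_img = [sum(v[k] for v in images) / n for k in range(d_img)]
    mean_txt = [sum(v[k] for v in captions) / n for k in range(d_txt)]
    cov = [[0.0] * d_txt for _ in range(d_img)]
    for x, y in zip(images, captions):
        for a in range(d_img):
            for b in range(d_txt):
                cov[a][b] += (x[a] - mean_img[a]) * (y[b] - mean_txt[b]) / n
    return cov

def frobenius_gap(m1, m2):
    """Frobenius norm of the difference between two equal-shaped matrices."""
    return math.sqrt(sum((p - q) ** 2
                         for row1, row2 in zip(m1, m2)
                         for p, q in zip(row1, row2)))

def subset_gap(images, captions, indices):
    """Distance between the subset's cross-covariance and the full data's."""
    full = cross_covariance(images, captions)
    sub = cross_covariance([images[i] for i in indices],
                           [captions[i] for i in indices])
    return frobenius_gap(full, sub)

# Toy paired embeddings (hypothetical data): two "concepts", two pairs each.
toy_images = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
toy_captions = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]

balanced_gap = subset_gap(toy_images, toy_captions, [0, 1])  # one pair per concept
skewed_gap = subset_gap(toy_images, toy_captions, [0, 2])    # a single concept only
```

On this toy data, a subset covering both concepts has a smaller gap than one drawn from a single concept, which is the intuition behind preferring subsets with a small gap.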

artificial intelligence, machine learning, subset, (17 more...)

2403.12267

Country: Europe (0.28)

Genre: Research Report > New Finding (0.67)

Industry: Information Technology (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

LB-SimTSC: An Efficient Similarity-Aware Graph Neural Network for Semi-Supervised Time Series Classification

Xi, Wenjie, Jain, Arnav, Zhang, Li, Lin, Jessica

arXiv.org Artificial Intelligence, Sep-5-2023

Time series classification is an important data mining task that has received a lot of interest in the past two decades. Because labels are scarce in practice, semi-supervised time series classification with only a few labeled samples has become popular. Similarity-aware Time Series Classification (SimTSC) was recently proposed to address this problem by using a graph neural network classification model on the graph generated from the pairwise Dynamic Time Warping (DTW) distances of batch data. It shows excellent accuracy and outperforms state-of-the-art deep learning models in several few-label settings. However, since SimTSC relies on pairwise DTW distances, the quadratic complexity of DTW limits its usability to only reasonably sized datasets. To address this challenge, we propose a new efficient semi-supervised time series classification technique, LB-SimTSC, with a new graph construction module. Instead of using DTW, we propose to utilize a lower bound of DTW, LB_Keogh, to approximate the dissimilarity between instances in linear time, while retaining the relative proximity relationships one would have obtained by computing DTW. We construct the pairwise distance matrix using LB_Keogh and build a graph for the graph neural network. We apply this approach to the ten largest datasets from the well-known UCR time series classification archive. The results demonstrate that this approach can be up to 104x faster than SimTSC when constructing the graph on large datasets without significantly decreasing classification accuracy.
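A minimal sketch of the LB_Keogh lower bound mentioned in the abstract, alongside a band-constrained DTW for comparison. It assumes equal-length series, a squared pointwise cost with a final square root, and a warping window `r`; the function names are illustrative and not taken from the LB-SimTSC code. For brevity the envelope is built with slices, which costs O(n·r); a monotonic-deque sliding window would compute it in O(n).

```python
import math

def envelope(series, r):
    """Upper and lower envelope of a series within a warping window r."""
    n = len(series)
    upper = [max(series[max(0, i - r): i + r + 1]) for i in range(n)]
    lower = [min(series[max(0, i - r): i + r + 1]) for i in range(n)]
    return upper, lower

def lb_keogh(query, candidate, r):
    """LB_Keogh: distance from the query to the candidate's envelope."""
    upper, lower = envelope(candidate, r)
    total = 0.0
    for q, u, l in zip(query, upper, lower):
        if q > u:            # query point lies above the envelope
            total += (q - u) ** 2
        elif q < l:          # query point lies below the envelope
            total += (q - l) ** 2
    return math.sqrt(total)

def dtw(a, b, r):
    """DTW restricted to a Sakoe-Chiba band of half-width r (quadratic baseline)."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - r), min(m, i + r) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])

# Toy series (hypothetical data): the second is the first shifted by one step.
s1 = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0, 0.0]
s2 = [0.0, 0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0]
s3 = [5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0]

lb_near, dtw_near = lb_keogh(s1, s2, 1), dtw(s1, s2, 1)
lb_far, dtw_far = lb_keogh(s3, s2, 1), dtw(s3, s2, 1)
```

With the same window `r`, the bound never exceeds the band-constrained DTW distance, so a pairwise matrix of LB_Keogh values can stand in for the DTW matrix when only relative proximity matters.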

artificial intelligence, deep learning, machine learning, (3 more...)

2301.04838

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.53)

Confidence-Calibrated Ensemble Dense Phrase Retrieval

Yang, William, Bergam, Noah, Jain, Arnav, Sheikhoslami, Nima

arXiv.org Artificial Intelligence, Jun-28-2023

The passage retrieval problem, which is of central importance in search engine optimization and text analytics, entails the following: given a set of documents and a query, determine which document best ... The principal limitation to this approach is its dependence on explicit term matches between the query and the context. In many cases, the correct context-query pair may have no words in common.

information retrieval, machine learning, natural language, (17 more...)

2306.15917

Country: North America > United States (0.30)

Genre: Research Report > New Finding (0.69)

Industry:

Law (0.49)
Government (0.30)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Information Management (0.87)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.35)
