Jain, Nishant
Training-efficient density quantum machine learning
Coyle, Brian, Cherrat, El Amine, Jain, Nishant, Mathur, Natansh, Raj, Snehal, Kazdaghli, Skander, Kerenidis, Iordanis
Quantum machine learning requires powerful, flexible and efficiently trainable models to be successful in solving challenging problems. In this work, we present density quantum neural networks, a learning model incorporating randomisation over a set of trainable unitaries. These models generalise quantum neural networks based on parameterised quantum circuits, and allow a trade-off between expressibility and efficient trainability, particularly on quantum hardware. We demonstrate the flexibility of the formalism by applying it to two recently proposed model families. The first are commuting-block quantum neural networks (QNNs), which are efficiently trainable but may be limited in expressibility. The second are orthogonal (Hamming-weight preserving) quantum neural networks, which provide well-defined and interpretable transformations on data but are challenging to train at scale on quantum devices. Density commuting QNNs improve capacity with minimal overhead in gradient complexity, and density orthogonal neural networks admit a quadratic-to-constant gradient query advantage with minimal to no loss in performance. We conduct numerical experiments on synthetic translationally invariant data and MNIST image data with hyperparameter optimisation to support our findings. Finally, we discuss connections to post-variational quantum neural networks, measurement-based quantum machine learning and the dropout mechanism.
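As a rough sketch of the mixture construction (not the paper's implementation), the model's output can be read as a convex combination of expectation values from independently parameterised sub-unitaries; the single-qubit rotations, the softmax mixture weights and the Pauli-Z observable below are illustrative assumptions.

```python
import numpy as np

def rz(theta):
    """Single-qubit Z-rotation unitary."""
    return np.diag([np.exp(-1j * theta / 2), np.exp(1j * theta / 2)])

def rx(theta):
    """Single-qubit X-rotation unitary."""
    c, s = np.cos(theta / 2), -1j * np.sin(theta / 2)
    return np.array([[c, s], [s, c]])

def density_qnn_expectation(thetas, logits, state, obs):
    """Expectation of `obs` under a density QNN: a convex mixture of
    expectations from independently parameterised sub-unitaries."""
    alphas = np.exp(logits) / np.exp(logits).sum()   # mixture weights
    total = 0.0
    for theta, alpha in zip(thetas, alphas):
        U = rx(theta[0]) @ rz(theta[1])              # one sub-unitary
        psi = U @ state
        total += alpha * np.real(psi.conj() @ obs @ psi)
    return total

state = np.array([1.0, 0.0], dtype=complex)          # |0>
Z = np.diag([1.0, -1.0])
thetas = np.random.randn(3, 2)                       # 3 sub-unitaries
logits = np.zeros(3)                                 # uniform mixture
print(density_qnn_expectation(thetas, logits, state, Z))
```

On hardware, the same expectation can be estimated by sampling one sub-unitary per shot according to the mixture weights, which is what makes the randomised model cheap to train.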
Improving Generalization via Meta-Learning on Hard Samples
Jain, Nishant, Suggala, Arun S., Shenoy, Pradeep
Learned reweighting (LRW) approaches to supervised learning use an optimization criterion to assign weights to training instances, in order to maximize performance on a representative validation dataset. We pose and formalize the problem of optimized selection of the validation set used in LRW training, to improve classifier generalization. In particular, we show that using hard-to-classify instances in the validation set has both a theoretical connection to, and strong empirical evidence of, improved generalization. We provide an efficient algorithm for training this meta-optimized model, as well as a simple train-twice heuristic for careful comparative study. We demonstrate that LRW with easy validation data performs consistently worse than LRW with hard validation data, establishing the validity of our meta-optimization problem. Our proposed algorithm outperforms a wide range of baselines on a range of datasets and domain shift challenges (Imagenet-1K, CIFAR-100, Clothing-1M, CAMELYON, WILDS, etc.), with ~1% gains using ViT-B on Imagenet. We also show that using naturally hard examples for validation (Imagenet-R / Imagenet-A) in LRW training for Imagenet improves performance on both clean and naturally hard test instances by 1-2%. Secondary analyses show that using hard validation data in an LRW framework improves margins on test data, hinting at the mechanism underlying our empirical gains. We believe this work opens up new research directions for the meta-optimization of meta-learning in a supervised learning context.
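A minimal PyTorch sketch of one LRW meta-step, assuming `x_val`/`y_val` already hold hard validation instances (e.g., selected via the train-twice heuristic); `weight_net` is a hypothetical auxiliary scorer, and the inner update is simplified to a single differentiable SGD step.

```python
import torch

def lrw_meta_step(model, weight_net, x_tr, y_tr, x_val, y_val, lr=0.1):
    """One learned-reweighting meta-step (sketch): instance weights are
    tuned so a weighted training update minimises hard-validation loss."""
    loss_fn = torch.nn.functional.cross_entropy
    # Per-instance weights from the auxiliary network (assumed output: (B, 1)).
    w = torch.softmax(weight_net(x_tr).squeeze(-1), dim=0)
    # Weighted training loss and a differentiable inner update.
    losses = loss_fn(model(x_tr), y_tr, reduction='none')
    inner_loss = (w * losses).sum()
    grads = torch.autograd.grad(inner_loss, list(model.parameters()),
                                create_graph=True)
    updated = [p - lr * g for p, g in zip(model.parameters(), grads)]
    # Evaluate the updated parameters on the hard validation set.
    val_logits = torch.func.functional_call(
        model,
        {n: u for (n, _), u in zip(model.named_parameters(), updated)},
        (x_val,))
    outer_loss = loss_fn(val_logits, y_val)
    outer_loss.backward()  # gradients flow back into weight_net
    return outer_loss
```

The outer loss differentiates through the inner update, so the scorer learns which training instances most help the model on the hard validation data.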
Selective classification using a robust meta-learning approach
Jain, Nishant, Shanmugam, Karthikeyan, Shenoy, Pradeep
Predictive uncertainty, a model's self-awareness regarding its accuracy on an input, is key both for building robust models via training interventions and for test-time applications such as selective classification. We propose a novel instance-conditioned reweighting approach that captures predictive uncertainty using an auxiliary network and unifies these train- and test-time applications. The auxiliary network is trained using a meta-objective in a bilevel optimization framework. A key contribution of our proposal is the meta-objective of minimizing dropout variance, an approximation of Bayesian predictive uncertainty. We show in controlled experiments that this meta-objective effectively captures diverse specific notions of uncertainty, whereas previous approaches capture only certain aspects. These results translate to significant gains in real-world settings (selective classification, label noise, domain adaptation, calibration) and across datasets (Imagenet, CIFAR-100, diabetic retinopathy, Camelyon, WILDS, Imagenet-C/-A/-R, Clothing1M, etc.). For diabetic retinopathy, we see up to 3.4%/3.3% gains in accuracy and AUC over SOTA in selective classification. We also improve upon large-scale pretrained models such as PLEX.
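A minimal sketch of the dropout-variance quantity that the meta-objective minimises, under the usual Monte-Carlo-dropout reading of Bayesian predictive uncertainty; the sample count and the reduction over classes are illustrative choices.

```python
import torch

def dropout_variance(model, x, n_samples=8):
    """Per-instance Monte-Carlo dropout variance: the spread of class
    probabilities across stochastic forward passes with dropout active,
    used as a proxy for Bayesian predictive uncertainty (sketch)."""
    model.train()  # keep dropout layers stochastic
    probs = torch.stack([torch.softmax(model(x), dim=-1)
                         for _ in range(n_samples)])   # (S, B, C)
    return probs.var(dim=0).sum(dim=-1)                # (B,)
```

In the bilevel setup, the auxiliary network's reweighting is optimised so that training the main model under those weights drives this quantity down.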
Instance-Conditional Timescales of Decay for Non-Stationary Learning
Jain, Nishant, Shenoy, Pradeep
Slow concept drift is a ubiquitous, yet under-studied problem in practical machine learning systems. In such settings, although recent data is more indicative of future data, naively prioritizing recent instances runs the risk of losing valuable information from the past. We propose an optimization-driven approach to balancing instance importance over large training windows. First, we model instance relevance using a mixture of multiple timescales of decay, allowing us to capture rich temporal trends. Second, we learn an auxiliary scorer model that recovers the appropriate mixture of timescales as a function of the instance itself. Finally, we propose a nested optimization objective for learning the scorer, such that it maximizes forward transfer for the learned model. Experiments on a large real-world dataset of 39M photos over a 9-year period show up to 15% relative gains in accuracy compared to other robust learning baselines. We replicate our gains on two collections of real-world datasets for non-stationary learning, and extend our work to continual learning settings, where we also beat SOTA methods by large margins.
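A minimal sketch of the instance-conditional mixture of decay timescales, assuming a linear scorer head and illustrative timescale values; the paper's exact parameterisation may differ.

```python
import torch

class TimescaleScorer(torch.nn.Module):
    """Scores instance relevance as a mixture of exponential decays,
    with per-instance mixture weights predicted by a scorer (sketch)."""
    def __init__(self, feat_dim, timescales=(1.0, 10.0, 100.0)):
        super().__init__()
        self.register_buffer('taus', torch.tensor(timescales))
        self.mix = torch.nn.Linear(feat_dim, len(timescales))

    def forward(self, feats, age):
        # age: time elapsed since each instance was collected, shape (B,)
        pi = torch.softmax(self.mix(feats), dim=-1)        # (B, K)
        decay = torch.exp(-age.unsqueeze(-1) / self.taus)  # (B, K)
        return (pi * decay).sum(-1)                        # relevance (B,)
```

The returned relevance scores would weight the training loss, and the scorer itself is learned through the nested objective described above.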
MTCNET: Multi-task Learning Paradigm for Crowd Count Estimation
Kumar, Abhay, Jain, Nishant, Tripathi, Suraj, Singh, Chirag, Krishna, Kamal
We propose a deep neural network architecture based on the Multi-Task Learning (MTL) paradigm, called MTCNet (Multi-Task Crowd Network), for crowd density and count estimation. Crowd count estimation is challenging due to the non-uniform scale variations and the arbitrary perspective of an individual image. The proposed model has two related tasks, with Crowd Density Estimation as the main task and Crowd-Count Group Classification as the auxiliary task. The auxiliary task helps in capturing the relevant scale-related information to improve the performance of the main task. The main task model comprises two blocks: a VGG-16 front-end for feature extraction and a dilated Convolutional Neural Network for density map generation. The auxiliary task model shares the same front-end as the main task, followed by a CNN classifier. Our proposed network achieves 5.8% and 14.9% lower Mean Absolute Error (MAE) than state-of-the-art methods on the ShanghaiTech dataset without using any data augmentation. Our model also achieves 10.5% lower MAE on the UCF_CC_50 dataset.
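A structural sketch of the two-task layout described above: a shared VGG-16 front-end, a dilated back-end for density maps, and an auxiliary count-group classifier. Channel sizes, layer counts and the number of count groups are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torchvision

class MTCNet(torch.nn.Module):
    """Sketch of the shared-front-end, two-head MTCNet layout."""
    def __init__(self, n_groups=10):
        super().__init__()
        vgg = torchvision.models.vgg16(weights=None)
        # Shared VGG-16 front-end (through conv4_3, 512 channels).
        self.frontend = torch.nn.Sequential(*list(vgg.features)[:23])
        # Main task: dilated convolutions producing a density map.
        self.backend = torch.nn.Sequential(
            torch.nn.Conv2d(512, 256, 3, padding=2, dilation=2),
            torch.nn.ReLU(inplace=True),
            torch.nn.Conv2d(256, 128, 3, padding=2, dilation=2),
            torch.nn.ReLU(inplace=True),
            torch.nn.Conv2d(128, 1, 1))
        # Auxiliary task: crowd-count group classifier.
        self.classifier = torch.nn.Sequential(
            torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten(),
            torch.nn.Linear(512, n_groups))

    def forward(self, x):
        feats = self.frontend(x)
        return self.backend(feats), self.classifier(feats)
```

Training would combine a pixelwise density-map loss with a classification loss on the auxiliary head, so the shared features pick up the scale cues the classifier needs.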
From Fully Supervised to Zero Shot Settings for Twitter Hashtag Recommendation
Kumar, Abhay, Jain, Nishant, Tripathi, Suraj, Singh, Chirag
We propose a comprehensive end-to-end pipeline for a Twitter hashtag recommendation system, covering data collection, a supervised training setting and a zero-shot training setting. In the supervised setting, we propose and compare the performance of various deep learning architectures, namely the Convolutional Neural Network (CNN), Recurrent Neural Network (RNN) and Transformer Network. However, it is not feasible to collect data for all possible hashtag labels and train a classifier model on them. To overcome this limitation, we propose a Zero-Shot Learning (ZSL) paradigm for predicting unseen hashtag labels by learning the relationship between the semantic space of tweets and the embedding space of hashtag labels. We evaluate various state-of-the-art ZSL methods, namely Convex combination of Semantic Embeddings (ConSE), Embarrassingly Simple Zero-Shot Learning (ESZSL) and the Deep Embedding Model for Zero-Shot Learning (DEM-ZSL), for the hashtag recommendation task. We demonstrate the effectiveness and scalability of ZSL methods for the recommendation of unseen hashtags. To the best of our knowledge, this is the first quantitative evaluation of ZSL methods to date for unseen hashtag recommendation from tweet text.
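As one concrete example of the evaluated methods, a minimal sketch of ConSE-style inference: a tweet's semantic embedding is the probability-weighted convex combination of the embeddings of its top predicted seen hashtags, and unseen hashtags are then ranked by cosine similarity. Array shapes and the top-T truncation are assumptions.

```python
import numpy as np

def conse_predict(probs, seen_emb, unseen_emb, top_t=5):
    """ConSE sketch: rank unseen hashtags for one tweet.

    probs:      (N_seen,)       classifier probabilities over seen hashtags
    seen_emb:   (N_seen, d)     embeddings of seen hashtag labels
    unseen_emb: (N_unseen, d)   embeddings of unseen hashtag labels
    """
    top = np.argsort(probs)[::-1][:top_t]
    w = probs[top] / probs[top].sum()
    z = (w[:, None] * seen_emb[top]).sum(0)   # convex combination
    z /= np.linalg.norm(z)
    sims = unseen_emb @ z / np.linalg.norm(unseen_emb, axis=1)
    return np.argsort(sims)[::-1]             # ranked unseen hashtags
```

Because no parameters are learned for the unseen labels themselves, the same trained classifier scales to any new hashtag that has an embedding.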
Exploiting SIFT Descriptor for Rotation Invariant Convolutional Neural Network
Kumar, Abhay, Jain, Nishant, Singh, Chirag, Tripathi, Suraj
This paper presents a novel approach to exploiting distinctive invariant features in convolutional neural networks. The proposed CNN model uses a Scale Invariant Feature Transform (SIFT) descriptor in place of the max-pooling layer. The max-pooling layer discards pose, i.e., the translational and rotational relationships between low-level features, and hence cannot capture the spatial hierarchies between low- and high-level features. The SIFT descriptor layer captures the orientation and the spatial relationships of the features extracted by the convolutional layers. The proposed SIFT Descriptor CNN therefore combines the feature extraction capabilities of a CNN with the rotation invariance of the SIFT descriptor. Experimental results on the MNIST and Fashion-MNIST datasets indicate reasonable improvements over conventional methods available in the literature.
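A loose illustration of SIFT-style pooling on a single feature map, assuming histogram-of-oriented-gradients cells in place of max cells; this is a simplified reading of the descriptor layer, not the paper's exact operator.

```python
import numpy as np

def sift_descriptor_pool(fmap, cell=2, n_bins=8):
    """Summarise each pooling cell by a gradient-orientation histogram
    instead of a max, preserving orientation information (sketch)."""
    gy, gx = np.gradient(fmap)
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)
    H, W = fmap.shape
    out = np.zeros((H // cell, W // cell, n_bins))
    for i in range(H // cell):
        for j in range(W // cell):
            m = mag[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            a = ang[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            bins = (a / (2 * np.pi) * n_bins).astype(int) % n_bins
            np.add.at(out[i, j], bins, m)   # magnitude-weighted histogram
    return out
```

Unlike a max over each cell, the histogram retains how strongly each orientation is present, which is what gives the layer its rotational sensitivity.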