AITopics | Mummadi, Chaithanya Kumar

Collaborating Authors

Mummadi, Chaithanya Kumar

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Text-driven Prompt Generation for Vision-Language Models in Federated Learning

Qiu, Chen, Li, Xingyu, Mummadi, Chaithanya Kumar, Ganesh, Madan Ravi, Li, Zhenzhen, Peng, Lu, Lin, Wan-Yi

arXiv.org Artificial IntelligenceOct-9-2023

Prompt learning for vision-language models, e.g., CoOp, has shown great success in adapting CLIP to different downstream tasks, making it a promising solution for federated learning due to computational reasons. Existing prompt learning techniques replace hand-crafted text prompts with learned vectors that offer improvements on seen classes, but struggle to generalize to unseen classes. Our work addresses this challenge by proposing Federated Text-driven Prompt Generation (FedTPG), which learns a unified prompt generation network across multiple remote clients in a scalable manner. The prompt generation network is conditioned on task-related text input, thus is context-aware, making it suitable to generalize for both seen and unseen classes. Our comprehensive empirical evaluations on nine diverse image classification datasets show that our method is superior to existing federated prompt learning methods, that achieve overall better generalization on both seen and unseen classes and is also generalizable to unseen datasets. Vision-language models have recently emerged as a transformative technology for machine learning applications. Seminal contributions like Contrastive Language-Image Pretraining (CLIP) Radford et al. (2021) have demonstrated unprecedented capabilities in diverse image classification tasks. Different classification methods often leverage manually-engineered text prompts, such as "a photo of a [class]," to utilize CLIP's rich semantic features (Jia et al., 2021). CLIP has shown its robustness and versatility in handling a wide range of image distributions.

artificial intelligence, machine learning, prompt vector, (15 more...)

arXiv.org Artificial Intelligence

2310.06123

Country: Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.55)

Add feedback

More Context, Less Distraction: Zero-shot Visual Classification by Inferring and Conditioning on Contextual Attributes

An, Bang, Zhu, Sicheng, Panaitescu-Liess, Michael-Andrei, Mummadi, Chaithanya Kumar, Huang, Furong

arXiv.org Artificial IntelligenceOct-8-2023

Vision-language models like CLIP are widely used in zero-shot image classification due to their ability to understand various visual concepts and natural language descriptions. However, how to fully leverage CLIP's unprecedented human-like understanding capabilities to achieve better performance is still an open question. This paper draws inspiration from the human visual perception process: when classifying an object, humans first infer contextual attributes (e.g., background and orientation) which help separate the foreground object from the background, and then classify the object based on this information. Inspired by it, we observe that providing CLIP with contextual attributes improves zero-shot image classification and mitigates reliance on spurious features. We also observe that CLIP itself can reasonably infer the attributes from an image. With these observations, we propose a training-free, two-step zero-shot classification method PerceptionCLIP. Given an image, it first infers contextual attributes (e.g., background) and then performs object classification conditioning on them. Our experiments show that PerceptionCLIP achieves better generalization, group robustness, and interpretability. For example, PerceptionCLIP with ViT-L/14 improves the worst group accuracy by 16.5% on the Waterbirds dataset and by 3.5% on CelebA.

contextual, large language model, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2308.01313

Country:

North America > United States (1.00)
Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Government > Regional Government > North America Government > United States Government (0.67)
Government > Military (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

AutoCLIP: Auto-tuning Zero-Shot Classifiers for Vision-Language Models

Metzen, Jan Hendrik, Saranrittichai, Piyapat, Mummadi, Chaithanya Kumar

arXiv.org Artificial IntelligenceSep-29-2023

Classifiers built upon vision-language models such as CLIP have shown remarkable zero-shot performance across a broad range of image classification tasks. Prior work has studied different ways of automatically creating descriptor sets for every class based on prompt templates, ranging from manually engineered templates over templates obtained from a large language model to templates built from random words and characters. Up until now, deriving zero-shot classifiers from the respective encoded class descriptors has remained nearly unchanged, i.e., classify to the class that maximizes cosine similarity between its averaged encoded class descriptors and the image encoding. However, weighing all class descriptors equally can be suboptimal when certain descriptors match visual clues on a given image better than others. In this work, we propose AutoCLIP, a method for auto-tuning zero-shot classifiers. AutoCLIP tunes per-image weights to each prompt template at inference time, based on statistics of class descriptor-image similarities. AutoCLIP is fully unsupervised, has very low computational overhead, and can be easily implemented in few lines of code. We show that AutoCLIP outperforms baselines across a broad range of vision-language models, datasets, and prompt templates consistently and by up to 3 percent point accuracy.

artificial intelligence, large language model, natural language, (17 more...)

arXiv.org Artificial Intelligence

2309.16414

Country: Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Test-Time Adaptation to Distribution Shift by Confidence Maximization and Input Transformation

Mummadi, Chaithanya Kumar, Hutmacher, Robin, Rambach, Kilian, Levinkov, Evgeny, Brox, Thomas, Metzen, Jan Hendrik

arXiv.org Machine LearningJun-28-2021

Deep neural networks often exhibit poor performance on data that is unlikely under the train-time data distribution, for instance data affected by corruptions. Previous works demonstrate that test-time adaptation to data shift, for instance using entropy minimization, effectively improves performance on such shifted distributions. This paper focuses on the fully test-time adaptation setting, where only unlabeled data from the target distribution is required. This allows adapting arbitrary pretrained networks. Specifically, we propose a novel loss that improves test-time adaptation by addressing both premature convergence and instability of entropy minimization. This is achieved by replacing the entropy by a non-saturating surrogate and adding a diversity regularizer based on batch-wise entropy maximization that prevents convergence to trivial collapsed solutions. Moreover, we propose to prepend an input transformation module to the network that can partially undo test-time distribution shifts. Surprisingly, this preprocessing can be learned solely using the fully test-time adaptation loss in an end-to-end fashion without any target domain labels or source domain data. We show that our approach outperforms previous work in improving the robustness of publicly available pretrained image classifiers to common corruptions on such challenging benchmarks as ImageNet-C.

artificial intelligence, machine learning, neural network, (7 more...)

arXiv.org Machine Learning

2106.14999

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.53)

Add feedback

SELF: Learning to Filter Noisy Labels with Self-Ensembling

Nguyen, Duc Tam, Mummadi, Chaithanya Kumar, Ngo, Thi Phuong Nhung, Nguyen, Thi Hoai Phuong, Beggel, Laura, Brox, Thomas

arXiv.org Machine LearningOct-4-2019

Deep neural networks (DNNs) have been shown to over-fit a dataset when being trained with noisy labels for a long enough time. To overcome this problem, we present a simple and effective method self-ensemble label filtering (SELF) to progressively filter out the wrong labels during training. Our method improves the task performance by gradually allowing supervision only from the potentially non-noisy (clean) labels and stops learning on the filtered noisy labels. For the filtering, we form running averages of predictions over the entire training dataset using the network output at different training epochs. We show that these ensemble estimates yield more accurate identification of inconsistent predictions throughout training than the single estimates of the network at the most recent training epoch. While filtered samples are removed entirely from the supervised training loss, we dynamically leverage them via semi-supervised learning in the unsupervised loss. We demonstrate the positive effect of such an approach on various image classification tasks under both symmetric and asymmetric label noise and at different noise ratios. It substantially outperforms all previous works on noise-aware learning across different datasets and can be applied to a broad set of network architectures. The acquisition of large quantities of a high-quality human annotation is a frequent bottleneck in applying DNNs. There are two cheap but imperfect alternatives to collect annotation at large scale: crowdsourcing from non-experts and web annotations, particularly for image data where the tags and online query keywords are treated as valid labels. Both these alternatives typically introduce noisy (wrong) labels. While Rolnick et al. (2017) empirically demonstrated that DNNs can be surprisingly robust to label noise under certain conditions, Zhang et al. (2017) has shown that DNNs have the capacity to memorize the data and will do so eventually when being confronted with too many noisy labels. Consequently, training DNNs with traditional learning procedures on noisy data strongly deteriorates their ability to generalize - a severe problem.

deep learning, prediction, self study, (19 more...)

arXiv.org Machine Learning

1910.01842

Country: Europe > Germany > Baden-Württemberg (0.14)

Genre: Research Report (0.40)

Industry: Education (0.88)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Group Pruning using a Bounded-Lp norm for Group Gating and Regularization

Mummadi, Chaithanya Kumar, Genewein, Tim, Zhang, Dan, Brox, Thomas, Fischer, Volker

arXiv.org Machine LearningAug-9-2019

Deep neural networks achieve state-of-the-art results on several tasks while increasing in complexity. It has been shown that neural networks can be pruned during training by imposing sparsity inducing regularizers. In this paper, we investigate two techniques for group-wise pruning during training in order to improve network efficiency. We propose a gating factor after every convolutional layer to induce channel level sparsity, encouraging insignificant channels to become exactly zero. Further, we introduce and analyse a bounded variant of the L1 regularizer, which interpolates between L1 and L0-norms to retain performance of the network at higher pruning rates. To underline effectiveness of the proposed methods,we show that the number of parameters of ResNet-164, DenseNet-40 and MobileNetV2 can be reduced down by 30%, 69% and 75% on CIFAR100 respectively without a significant drop in accuracy. We achieve state-of-the-art pruning results for ResNet-50 with higher accuracy on ImageNet. Furthermore, we show that the light weight MobileNetV2 can further be compressed on ImageNet without a significant drop in performance.

deep learning, neural network, null 1, (22 more...)

arXiv.org Machine Learning

1908.03463

Country:

Europe > Germany (0.14)
North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Defending against Universal Perturbations with Shared Adversarial Training

Mummadi, Chaithanya Kumar, Brox, Thomas, Metzen, Jan Hendrik

arXiv.org Machine LearningDec-10-2018

Classifiers such as deep neural networks have been shown to be vulnerable against adversarial perturbations on problems with high-dimensional input space. While adversarial training improves the robustness of image classifiers against such adversarial perturbations, it leaves them sensitive to perturbations on a non-negligible fraction of the inputs. In this work, we show that adversarial training is more effective in preventing universal perturbations, where the same perturbation needs to fool a classifier on many inputs. Moreover, we investigate the trade-off between robustness against universal perturbations and performance on unperturbed data and propose an extension of adversarial training that handles this trade-off more gracefully. We present results for image classification and semantic segmentation to showcase that universal perturbations that fool a model hardened with adversarial training become clearly perceptible and show patterns of the target scene.

deep learning, neural network, perturbation, (19 more...)

arXiv.org Machine Learning

1812.03705

Country:

North America > United States > Nevada (0.14)
North America > United States > Hawaii (0.14)
North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback