Sethi, Amit


WavePaint: Resource-efficient Token-mixer for Self-supervised Inpainting

arXiv.org Artificial Intelligence

Image inpainting, which refers to the synthesis of missing regions in an image, can help restore occluded or degraded areas and also serve as a precursor task for self-supervision. The current state-of-the-art models for image inpainting are computationally heavy, as they are based on transformer or CNN backbones trained in adversarial or diffusion settings. This paper diverges from vision transformers by using a computationally efficient, WaveMix-based, fully convolutional architecture -- WavePaint. It uses a 2D discrete wavelet transform (2D-DWT) for spatial and multi-resolution token mixing along with convolutional layers. The proposed model outperforms the current state-of-the-art models for image inpainting in reconstruction quality while using less than half the parameter count and considerably lower training and evaluation times. Our model even outperforms current GAN-based architectures on the CelebA-HQ dataset without using an adversarially trainable discriminator. Our work suggests that neural architectures modeled after natural image priors require fewer parameters and computations than transformers to achieve comparable generalization.
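The training objective above can be sketched briefly. The following is a minimal, hypothetical version of a self-supervised inpainting step in PyTorch: model stands in for any image-to-image network (WavePaint's internals are not reproduced), and the square-hole masking and L1 loss are illustrative assumptions rather than the paper's exact recipe.

import torch
import torch.nn.functional as F

def random_mask(batch, hole_frac=0.25):
    # Zero out one random square region per image (illustrative masking scheme).
    b, c, h, w = batch.shape
    mask = torch.ones(b, 1, h, w, device=batch.device)
    side = max(1, int(h * hole_frac))
    for i in range(b):
        y = int(torch.randint(0, h - side + 1, (1,)))
        x = int(torch.randint(0, w - side + 1, (1,)))
        mask[i, :, y:y + side, x:x + side] = 0.0
    return mask

def inpainting_step(model, images, optimizer):
    # One self-supervised step: reconstruct the hole; no discriminator needed.
    mask = random_mask(images)
    prediction = model(images * mask)
    loss = F.l1_loss(prediction * (1 - mask), images * (1 - mask))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()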


WaveMix: A Resource-efficient Neural Network for Image Analysis

arXiv.org Artificial Intelligence

We propose WaveMix -- a novel neural architecture for computer vision that is resource-efficient yet generalizable and scalable. WaveMix networks achieve comparable or better accuracy than state-of-the-art convolutional neural networks, vision transformers, and token mixers for several tasks, establishing new benchmarks for segmentation on Cityscapes and for classification on Places-365, five EMNIST datasets, and iNAT-mini. Remarkably, WaveMix architectures require fewer parameters to achieve these benchmarks than the previous state of the art. Moreover, when controlled for the number of parameters, WaveMix requires less GPU RAM, which translates into savings in time, cost, and energy. To achieve these gains we used a multi-level two-dimensional discrete wavelet transform (2D-DWT) in WaveMix blocks, which has the following advantages: (1) it reorganizes spatial information based on three strong image priors -- scale-invariance, shift-invariance, and sparseness of edges; (2) it does so losslessly and without adding parameters; (3) it reduces the spatial size of feature maps, which cuts the memory and time required for forward and backward passes; and (4) it expands the receptive field faster than convolutions do. The whole architecture is a stack of self-similar, resolution-preserving WaveMix blocks, which allows architectural flexibility for various tasks and levels of resource availability. Our code and trained models are publicly available.
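The DWT-based token mixing lends itself to a short sketch. Below is a hypothetical single-level WaveMix-style block in PyTorch: a fixed, parameter-free Haar DWT halves the feature map, a pointwise MLP mixes the four subbands, and upsampling restores the resolution for a residual connection. The specific layer choices (nearest-neighbor upsampling, GELU, BatchNorm) are assumptions; the published block may differ in detail.

import torch
import torch.nn as nn
import torch.nn.functional as F

class HaarDWT(nn.Module):
    # Single-level 2D Haar DWT as a fixed, non-trainable strided convolution.
    def __init__(self, channels):
        super().__init__()
        ll = torch.tensor([[0.5, 0.5], [0.5, 0.5]])
        lh = torch.tensor([[0.5, 0.5], [-0.5, -0.5]])
        hl = torch.tensor([[0.5, -0.5], [0.5, -0.5]])
        hh = torch.tensor([[0.5, -0.5], [-0.5, 0.5]])
        bank = torch.stack([ll, lh, hl, hh]).unsqueeze(1)  # (4, 1, 2, 2)
        self.register_buffer("weight", bank.repeat(channels, 1, 1, 1))
        self.channels = channels

    def forward(self, x):  # (B, C, H, W) -> (B, 4C, H/2, W/2); H, W must be even
        return F.conv2d(x, self.weight, stride=2, groups=self.channels)

class WaveMixBlock(nn.Module):
    # Resolution-preserving block: the DWT shrinks the map, an MLP mixes the
    # subbands, and upsampling restores the input size for a residual add.
    def __init__(self, channels, mult=2):
        super().__init__()
        self.dwt = HaarDWT(channels)
        self.mix = nn.Sequential(
            nn.Conv2d(4 * channels, mult * channels, 1),
            nn.GELU(),
            nn.Conv2d(mult * channels, channels, 1),
        )
        self.up = nn.Upsample(scale_factor=2, mode="nearest")
        self.norm = nn.BatchNorm2d(channels)

    def forward(self, x):
        return self.norm(x + self.up(self.mix(self.dwt(x))))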


EGFR Mutation Prediction of Lung Biopsy Images using Deep Learning

arXiv.org Artificial Intelligence

The standard diagnostic procedures for targeted therapies in lung cancer treatment involve histological subtyping and subsequent detection of key driver mutations, such as EGFR. Even though molecular profiling can uncover the driver mutation, the process is often expensive and time-consuming. Deep learning-oriented image analysis offers a more economical alternative for discovering driver mutations directly from whole slide images (WSIs). In this work, we used customized deep learning pipelines with weak supervision to identify the morphological correlates of EGFR mutation from hematoxylin and eosin-stained WSIs, in addition to detecting tumors and histologically subtyping them. We demonstrate the effectiveness of our pipeline through rigorous experiments and ablation studies on two lung cancer datasets: TCGA and a private dataset from India. With our pipeline, we achieved an average area under the curve (AUC) of 0.964 for tumor detection and 0.942 for histological subtyping between adenocarcinoma and squamous cell carcinoma on the TCGA dataset. For EGFR detection, we achieved an average AUC of 0.864 on the TCGA dataset and 0.783 on the dataset from India. Our key learning points are as follows. First, there is no particular advantage to using feature-extractor layers trained on histology if one is going to fine-tune the feature extractor on the target dataset. Second, selecting patches with high cellularity, presumably capturing tumor regions, is not always helpful, as the sign of a disease class may be present in the tumor-adjacent stroma.
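The abstract does not spell out the weak-supervision mechanism, but a common way to learn from slide-level labels is attention-based multiple-instance learning (MIL) over patch features. The sketch below shows that generic pattern, not this paper's exact pipeline; the feature dimension and two-class head are assumptions.

import torch
import torch.nn as nn

class AttentionMIL(nn.Module):
    # A WSI is a bag of patch features; the slide-level label supervises the
    # attention-weighted aggregate, so no patch-level annotations are needed.
    def __init__(self, feat_dim=512, hidden=128, n_classes=2):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Linear(feat_dim, hidden),
            nn.Tanh(),
            nn.Linear(hidden, 1),
        )
        self.classifier = nn.Linear(feat_dim, n_classes)

    def forward(self, patch_feats):  # (num_patches, feat_dim)
        weights = torch.softmax(self.attn(patch_feats), dim=0)  # (num_patches, 1)
        slide_feat = (weights * patch_feats).sum(dim=0)         # (feat_dim,)
        return self.classifier(slide_feat)                      # slide-level logits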


Vision Xformers: Efficient Attention for Image Classification

arXiv.org Artificial Intelligence

We propose three improvements to vision transformers (ViT) to reduce the number of trainable parameters without compromising classification accuracy. We address two shortcomings of the early ViT architectures: the quadratic bottleneck of the attention mechanism and the lack of an inductive bias in architectures that rely on unrolling the two-dimensional image structure. Linear attention mechanisms overcome the quadratic-complexity bottleneck that restricts the application of transformer models in vision tasks. We modify the ViT architecture to work on longer sequences by replacing the quadratic attention with efficient transformers of linear complexity, such as Performer, Linformer, and Nyströmformer, creating Vision X-formers (ViX). We show that all three versions of ViX may be more accurate than ViT for image classification while using far fewer parameters and computational resources. We also compare their performance with FNet and the multi-layer perceptron (MLP) mixer. We further show that replacing the initial linear embedding layer with convolutional layers in ViX increases their performance further. Furthermore, our tests on recent vision transformer models, such as LeViT, Convolutional vision Transformer (CvT), Compact Convolutional Transformer (CCT), and Pooling-based Vision Transformer (PiT), show that replacing the attention with Nyströmformer or Performer reduces GPU usage and memory without degrading classification accuracy. We also show that replacing the standard learnable 1D position embeddings in ViT with Rotary Position Embedding (RoPE) gives further improvements in accuracy. Incorporating these changes can democratize transformers by making them accessible to those with limited data and computing resources.
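To make the quadratic-versus-linear point concrete, here is one simple kernelized linear attention (the elu(x) + 1 feature map of Katharopoulos et al.), shown as a stand-in rather than the exact Performer, Linformer, or Nyströmformer approximations used in the paper. By associating (phi(Q) phi(K)^T) V as phi(Q) (phi(K)^T V), the cost drops from O(N^2) to O(N) in the token count N, and no N x N attention matrix is ever materialized.

import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    # q, k, v: (batch, heads, tokens, head_dim)
    q = F.elu(q) + 1.0  # positive feature map phi(.)
    k = F.elu(k) + 1.0
    kv = torch.einsum("bhnd,bhne->bhde", k, v)                        # phi(K)^T V
    z = 1.0 / (torch.einsum("bhnd,bhd->bhn", q, k.sum(dim=2)) + eps)  # row normalizer
    return torch.einsum("bhnd,bhde,bhn->bhne", q, kv, z)

q = k = v = torch.randn(2, 4, 1024, 64)  # 1024 tokens, yet memory stays linear in N
out = linear_attention(q, k, v)          # (2, 4, 1024, 64)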


Activation Functions: Do They Represent A Trade-Off Between Modular Nature of Neural Networks And Task Performance?

arXiv.org Machine Learning

Current research suggests that the key factors in designing neural network architectures involve choosing the number of filters for each convolutional layer, the number of hidden neurons for each fully connected layer, and dropout and pruning settings. The default activation function in most cases is ReLU, as it has empirically shown faster training convergence. We explore whether ReLU remains the best choice when one also aims for better modular structure within a neural network.


Functional Space Variational Inference for Uncertainty Estimation in Computer Aided Diagnosis

arXiv.org Artificial Intelligence

Deep neural networks have revolutionized medical image analysis and disease diagnosis. Despite their impressive performance, it is difficult to generate well-calibrated probabilistic outputs for such networks, which makes them uninterpretable black boxes. Bayesian neural networks provide a principled approach for modelling uncertainty and increasing patient safety, but they have a large computational overhead and provide limited improvement in calibration. In this work, by taking skin lesion classification as an example task, we show that by shifting Bayesian inference to the functional space we can craft meaningful priors that give better calibrated uncertainty estimates at a much lower computational cost.
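The claim about calibration can be made concrete on the evaluation side. A standard metric is the expected calibration error (ECE); the sketch below computes it, assuming softmax outputs, and is only a way to test such claims. It does not reproduce the functional-space variational inference itself.

import torch

def expected_calibration_error(probs, labels, n_bins=10):
    # ECE: bin predictions by confidence, then average |accuracy - confidence|
    # over bins, weighted by the fraction of samples in each bin.
    confidences, predictions = probs.max(dim=1)
    accuracies = predictions.eq(labels).float()
    edges = torch.linspace(0.0, 1.0, n_bins + 1)
    ece = torch.zeros(1)
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = (accuracies[in_bin].mean() - confidences[in_bin].mean()).abs()
            ece += in_bin.float().mean() * gap
    return ece.item()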


Sports Video Classification from Multimodal Information Using Deep Neural Networks

AAAI Conferences

This work presents a methodology for classifying sports videos using both audio and visual information with deep learning algorithms. We show how to combine multiple deep learning architectures through their higher layers. Our method learns two separate models trained on the audio and visual parts of the data. We train the model for the audio part of the multimedia input using two stacked layers of convolutional restricted Boltzmann machines (CRBMs), forming a convolutional deep belief network (CDBN). We also train a two-layer independent subspace analysis (ISA) network to extract features from the video part of the data. We then train a deep stacked autoencoder over both audio and visual features with discriminative fine-tuning. Our results show that combining audio and visual features yields better accuracy than either type of feature alone.
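A minimal sketch of the fusion stage: a joint autoencoder over concatenated audio and visual features with a classification head for discriminative fine-tuning. The CDBN and ISA feature extractors are not reproduced, and the feature dimensions, single hidden layer, and class count are assumptions (the paper stacks more layers).

import torch
import torch.nn as nn

class FusionAutoencoder(nn.Module):
    # Joint autoencoder over concatenated audio + visual features, plus a
    # classifier head used during discriminative fine-tuning.
    def __init__(self, audio_dim=256, video_dim=256, hidden=128, n_classes=5):
        super().__init__()
        d = audio_dim + video_dim
        self.encoder = nn.Sequential(nn.Linear(d, hidden), nn.ReLU())
        self.decoder = nn.Linear(hidden, d)
        self.classifier = nn.Linear(hidden, n_classes)

    def forward(self, audio_feat, video_feat):
        x = torch.cat([audio_feat, video_feat], dim=1)
        h = self.encoder(x)
        return self.decoder(h), self.classifier(h)

# Stage 1: minimize reconstruction loss between the decoder output and x (no labels).
# Stage 2: fine-tune with cross-entropy on the classifier logits.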