AITopics | Desai, Karan

Desai, Karan

Hyperbolic Image-Text Representations

Desai, Karan, Nickel, Maximilian, Rajpurohit, Tanmay, Johnson, Justin, Vedantam, Ramakrishna

arXiv.org Artificial Intelligence, Jun-5-2023

Visual and linguistic concepts naturally organize themselves in a hierarchy, where a textual concept "dog" entails all images that contain dogs. Despite being intuitive, current large-scale vision and language models such as CLIP do not explicitly capture such hierarchy. We propose MERU, a contrastive model that yields hyperbolic representations of images and text. Hyperbolic spaces have suitable geometric properties to embed tree-like data, so MERU can better capture the underlying hierarchy in image-text datasets. Our results show that MERU learns a highly interpretable and structured representation space while being competitive with CLIP's performance on standard multi-modal tasks like image classification and image-text retrieval.
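To make the geometric claim concrete, here is a minimal sketch of a distance computation in hyperbolic space. It assumes the Lorentz (hyperboloid) model with curvature -curv, which is one common choice; the abstract above does not say which model is used. The function names `lift_to_hyperboloid`, `lorentz_inner`, and `lorentz_distance` are hypothetical and are not taken from the paper's code.

```python
import math


def lift_to_hyperboloid(v, curv=1.0):
    """Place a Euclidean vector v on the hyperboloid <x, x>_L = -1/curv.

    The space components are v itself; the time component is solved from
    the hyperboloid constraint. (Assumed parameterization, for illustration.)
    """
    sq_norm = sum(x * x for x in v)
    return [math.sqrt(1.0 / curv + sq_norm)] + list(v)


def lorentz_inner(x, y):
    """Lorentzian inner product: negative time term plus Euclidean space term."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))


def lorentz_distance(x, y, curv=1.0):
    """Geodesic distance between two points on the hyperboloid."""
    # For points on the hyperboloid, -curv * <x, y>_L >= 1; clamp against rounding.
    arg = max(1.0, -curv * lorentz_inner(x, y))
    return math.acosh(arg) / math.sqrt(curv)


if __name__ == "__main__":
    origin = lift_to_hyperboloid([0.0, 0.0])
    a = lift_to_hyperboloid([3.0, 0.0])
    b = lift_to_hyperboloid([0.0, 3.0])
    print("origin to a:", lorentz_distance(origin, a))
    print("a to b:", lorentz_distance(a, b))
```

In this sketch, points far from the origin are much farther from each other than their Euclidean separation suggests, which is the property that makes such spaces convenient for tree-like data: generic concepts can sit near the origin and specific ones farther out.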

artificial intelligence, machine learning, natural language

arXiv:2304.09172

Country:

Asia (0.67)
Europe > United Kingdom > England (0.46)
North America > United States > Alaska (0.28)

Genre: Research Report > New Finding (0.86)

Industry:

Leisure & Entertainment (1.00)
Consumer Products & Services (1.00)
Energy (0.67)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

CASTing Your Model: Learning to Localize Improves Self-Supervised Representations

Selvaraju, Ramprasaath R., Desai, Karan, Johnson, Justin, Naik, Nikhil

arXiv.org Artificial Intelligence, Dec-8-2020

Recent advances in self-supervised learning (SSL) have largely closed the gap with supervised ImageNet pretraining. Despite their success, these methods have been primarily applied to unlabeled ImageNet images, and show marginal gains when trained on larger sets of uncurated images. We hypothesize that current SSL methods perform best on iconic images, and struggle on complex scene images with many objects. Analyzing contrastive SSL methods shows that they have poor visual grounding and receive a poor supervisory signal when trained on scene images. We propose Contrastive Attention-Supervised Tuning (CAST) to overcome these limitations. CAST uses unsupervised saliency maps to intelligently sample crops, and to provide grounding supervision via a Grad-CAM attention loss. Experiments on COCO show that CAST significantly improves the features learned by SSL methods on scene images, and further experiments show that CAST-trained models are more robust to changes in backgrounds.
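As an illustration of what saliency-guided crop sampling could look like, the sketch below draws random crops and accepts the first one that covers a minimum fraction of the salient pixels. This is only one plausible reading: the abstract does not state the sampling rule, and the function name `sample_salient_crop`, the `min_cover` threshold, and the retry budget are all hypothetical.

```python
import random


def sample_salient_crop(mask, crop_h, crop_w, min_cover=0.2, max_tries=100, rng=None):
    """Sample a crop whose window covers at least min_cover of the salient area.

    mask is a 2D list of 0/1 values (1 = salient pixel). Returns
    (top, left, covered_fraction). If no crop meets the threshold within
    max_tries, the best crop seen so far is returned instead.
    The acceptance rule is an assumption made for illustration.
    """
    rng = rng or random.Random()
    height, width = len(mask), len(mask[0])
    total = sum(sum(row) for row in mask)
    best = None
    for _ in range(max_tries):
        top = rng.randint(0, height - crop_h)
        left = rng.randint(0, width - crop_w)
        covered = sum(
            sum(mask[r][left:left + crop_w]) for r in range(top, top + crop_h)
        )
        frac = covered / total if total else 0.0
        if best is None or frac > best[2]:
            best = (top, left, frac)
        if frac >= min_cover:
            return top, left, frac
    return best


if __name__ == "__main__":
    # A 6x6 image with a 2x2 salient object near the centre.
    mask = [[0] * 6 for _ in range(6)]
    for r in (2, 3):
        for c in (2, 3):
            mask[r][c] = 1
    print(sample_salient_crop(mask, 4, 4, min_cover=0.5, rng=random.Random(0)))
```

Constraining crops this way keeps the two augmented views of a scene image anchored on the same object, rather than pairing an object crop with a pure background crop.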

deep learning, neural network, representation

arXiv:2012.0463

Country: North America > United States (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Continual Reinforcement Learning in 3D Non-stationary Environments

Lomonaco, Vincenzo, Desai, Karan, Culurciello, Eugenio, Maltoni, Davide

arXiv.org Machine Learning, May-24-2019

High-dimensional, always-changing environments constitute a hard challenge for current reinforcement learning techniques. Artificial agents, nowadays, are often trained off-line in very static and controlled conditions in simulation such that training observations can be thought of as sampled i.i.d. from the entire observation space. However, in real-world settings, the environment is often non-stationary and subject to unpredictable, frequent changes. In this paper we propose and openly release CRLMaze, a new benchmark for learning continually through reinforcement in a complex 3D non-stationary task based on ViZDoom and subject to several environmental changes. Then, we introduce an end-to-end model-free continual reinforcement learning strategy showing competitive results with respect to four different baselines and not requiring any access to additional supervised signals, previously encountered environmental conditions or observations.

deep learning, neural network, scenario

arXiv:1905.10112

Country: North America > United States (0.68)

Genre: Research Report (0.65)

Industry:

Health & Medicine (0.68)
Education (0.68)
Banking & Finance (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Probabilistic Neural-symbolic Models for Interpretable Visual Question Answering

Vedantam, Ramakrishna, Desai, Karan, Lee, Stefan, Rohrbach, Marcus, Batra, Dhruv, Parikh, Devi

arXiv.org Machine Learning, Feb-20-2019

We propose a new class of probabilistic neural-symbolic models that have symbolic functional programs as a latent, stochastic variable. Instantiated in the context of visual question answering, our probabilistic formulation offers two key conceptual advantages over prior neural-symbolic models for VQA. Firstly, the programs generated by our model are more understandable while requiring fewer teaching examples. Secondly, we show that one can pose counterfactual scenarios to the model, to probe its beliefs on the programs that could lead to a specified answer given an image. Our results on the CLEVR and SHAPES datasets verify our hypotheses, showing that the model gets better program (and answer) prediction accuracy even in the low data regime, and allows one to probe the coherence and consistency of the reasoning performed.

artificial intelligence, deep learning, neural network

arXiv:1902.07864

Country: North America > United States (0.93)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Model-Based Reasoning (0.82)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

nocaps: novel object captioning at scale

Agrawal, Harsh, Desai, Karan, Chen, Xinlei, Jain, Rishabh, Batra, Dhruv, Parikh, Devi, Lee, Stefan, Anderson, Peter

arXiv.org Artificial Intelligence, Dec-20-2018

Image captioning models have achieved impressive results on datasets containing limited visual concepts and large amounts of paired image-caption training data. However, if these models are ever to function in the wild, a much larger variety of visual concepts must be learned, ideally from less supervision. To encourage the development of image captioning models that can learn visual concepts from alternative data sources, such as object detection datasets, we present the first large-scale benchmark for this task. Dubbed 'nocaps', for novel object captioning at scale, our benchmark consists of 166,100 human-generated captions describing 15,100 images from the Open Images validation and test sets. The associated training data consists of COCO image-caption pairs, plus Open Images image-level labels and object bounding boxes. Since Open Images contains many more classes than COCO, more than 500 object classes seen in test images have no training captions (hence, nocaps). We evaluate several existing approaches to novel object captioning on our challenging benchmark. In automatic evaluations these approaches show modest improvements over a strong baseline trained only on image-caption data. However, even when using ground-truth object detections, the results are significantly weaker than our human baseline, indicating substantial room for improvement.

caption, deep learning, neural network

arXiv:1812.08658

Country: North America > United States (0.46)

Genre: Research Report (1.00)

Industry:

Transportation > Ground > Road (1.00)
Leisure & Entertainment > Sports (1.00)
Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)