Chaplot, Devendra Singh
Navigating to Objects Specified by Images
Krantz, Jacob, Gervet, Theophile, Yadav, Karmesh, Wang, Austin, Paxton, Chris, Mottaghi, Roozbeh, Batra, Dhruv, Malik, Jitendra, Lee, Stefan, Chaplot, Devendra Singh
Images are a convenient way to specify which particular object instance an embodied agent should navigate to. Solving this task requires semantic visual reasoning and exploration of unknown environments. We present a system that can perform this task in both simulation and the real world. Our modular method solves sub-tasks of exploration, goal instance re-identification, goal localization, and local navigation. We re-identify the goal instance in egocentric vision using feature-matching and localize the goal instance by projecting matched features to a map. Each sub-task is solved using off-the-shelf components requiring zero fine-tuning. On the HM3D InstanceImageNav benchmark, this system outperforms a baseline end-to-end RL policy 7x and a state-of-the-art ImageNav model 2.3x (56% vs 25% success). We deploy this system to a mobile robot platform and demonstrate effective real-world performance, achieving an 88% success rate across a home and an office environment.
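As a rough illustration of the goal-localization step, here is a minimal sketch (assuming a pinhole camera, a depth image in meters, and a placeholder keypoint matcher that already produced matched pixel coordinates; the function name and map parameters are hypothetical) that back-projects matched features and votes them into a top-down grid to pick a goal cell.

```python
# A minimal sketch of projecting matched goal features onto a top-down map.
import numpy as np

def project_matches_to_map(matched_px, depth, K, cam_to_world,
                           map_size=480, cell_size=0.05):
    """Back-project matched pixels to 3D and vote into a top-down grid."""
    u, v = matched_px[:, 0], matched_px[:, 1]          # matched pixel coords (N, 2)
    z = depth[v.astype(int), u.astype(int)]            # depth at each match
    valid = z > 0
    u, v, z = u[valid], v[valid], z[valid]

    # Pinhole back-projection into the camera frame.
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=0)       # (4, N)

    # Transform to the world frame and drop the homogeneous coordinate.
    pts_world = (cam_to_world @ pts_cam)[:3].T                   # (N, 3)

    # Vote matched points into map cells; the densest cell is the goal estimate.
    goal_map = np.zeros((map_size, map_size), dtype=np.int32)
    cols = np.clip((pts_world[:, 0] / cell_size).astype(int) + map_size // 2,
                   0, map_size - 1)
    rows = np.clip((pts_world[:, 2] / cell_size).astype(int) + map_size // 2,
                   0, map_size - 1)
    np.add.at(goal_map, (rows, cols), 1)
    return np.unravel_index(goal_map.argmax(), goal_map.shape)
```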
Navigating to Objects in the Real World
Gervet, Theophile, Chintala, Soumith, Batra, Dhruv, Malik, Jitendra, Chaplot, Devendra Singh
Semantic navigation is necessary to deploy mobile robots in uncontrolled environments like our homes, schools, and hospitals. Many learning-based approaches have been proposed in response to the lack of semantic understanding of the classical pipeline for spatial navigation, which builds a geometric map using depth sensors and plans to reach point goals. Broadly, end-to-end learning approaches reactively map sensor inputs to actions with deep neural networks, while modular learning approaches enrich the classical pipeline with learning-based semantic sensing and exploration. But learned visual navigation policies have predominantly been evaluated in simulation. How well do different classes of methods work on a robot? We present a large-scale empirical study of semantic visual navigation methods comparing representative methods from classical, modular, and end-to-end learning approaches across six homes with no prior experience, maps, or instrumentation. We find that modular learning works well in the real world, attaining a 90% success rate. In contrast, end-to-end learning does not, dropping from 77% simulation to 23% real-world success rate due to a large image domain gap between simulation and reality. For practitioners, we show that modular learning is a reliable approach to navigate to objects: modularity and abstraction in policy design enable Sim-to-Real transfer. For researchers, we identify two key issues that prevent today's simulators from being reliable evaluation benchmarks - (A) a large Sim-to-Real gap in images and (B) a disconnect between simulation and real-world error modes - and propose concrete steps forward.
PONI: Potential Functions for ObjectGoal Navigation with Interaction-free Learning
Ramakrishnan, Santhosh Kumar, Chaplot, Devendra Singh, Al-Halah, Ziad, Malik, Jitendra, Grauman, Kristen
State-of-the-art approaches to ObjectGoal navigation rely on reinforcement learning and typically require significant computational resources and time for learning. We propose Potential functions for ObjectGoal Navigation with Interaction-free learning (PONI), a modular approach that disentangles the skills of 'where to look?' for an object and 'how to navigate to (x, y)?'. Our key insight is that 'where to look?' can be treated purely as a perception problem, and learned without environment interactions. To address this, we propose a network that predicts two complementary potential functions conditioned on a semantic map and uses them to decide where to look for an unseen object. We train the potential function network using supervised learning on a passive dataset of top-down semantic maps, and integrate it into a modular framework to perform ObjectGoal navigation.
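For intuition only, the sketch below is not the paper's architecture: a small encoder-decoder maps a top-down semantic map to two potential functions whose sum selects a long-term goal cell. Channel counts, layer sizes, and the helper function are assumptions.

```python
# A minimal sketch of a potential-function network over a semantic map.
import torch
import torch.nn as nn

class PotentialFunctionNet(nn.Module):
    def __init__(self, num_sem_channels=16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(num_sem_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Two heads: an "area" potential (unexplored area) and an "object"
        # potential (likelihood the target object is nearby).
        self.area_head = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )
        self.object_head = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, semantic_map):
        feats = self.encoder(semantic_map)
        return self.area_head(feats), self.object_head(feats)

def select_long_term_goal(net, semantic_map):
    """Pick the map cell with the highest combined potential (batch of 1)."""
    area_pot, obj_pot = net(semantic_map)
    combined = (area_pot + obj_pot).squeeze(0).squeeze(0)   # (H, W)
    idx = int(combined.flatten().argmax())
    return divmod(idx, combined.shape[1])                   # (row, col)
```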
Differentiable Spatial Planning using Transformers
Chaplot, Devendra Singh, Pathak, Deepak, Malik, Jitendra
We consider the problem of spatial path planning. In contrast to classical solutions, which optimize a new plan from scratch and assume access to the full map with ground-truth obstacle locations, we learn a planner from data in a differentiable manner, allowing us to leverage statistical regularities from past data. We propose Spatial Planning Transformers (SPT), which, given an obstacle map, learn to generate actions by planning over long-range spatial dependencies, unlike prior data-driven planners that propagate information locally via convolutional structure in an iterative manner. In the setting where the ground-truth map is not known to the agent, we leverage pre-trained SPTs in an end-to-end framework that has the structure of mapper and planner built into it, which allows seamless generalization to out-of-distribution maps and goals. SPTs outperform prior state-of-the-art differentiable planners across all setups for both manipulation and navigation tasks, leading to an absolute improvement of 7-19%.
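A rough sketch of the idea in PyTorch, not the SPT architecture itself: map cells become tokens, self-attention propagates goal information across the entire map, and each token predicts a planning value for its cell. The token features, layer sizes, and output format are assumptions.

```python
# A minimal sketch of a transformer-style spatial planner over map tokens.
import torch
import torch.nn as nn

class SpatialPlannerSketch(nn.Module):
    def __init__(self, map_size=15, d_model=64, nhead=4, num_layers=4):
        super().__init__()
        self.map_size = map_size
        # Each cell carries two scalars: obstacle occupancy and goal indicator.
        self.token_embed = nn.Linear(2, d_model)
        self.pos_embed = nn.Parameter(torch.zeros(map_size * map_size, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.value_head = nn.Linear(d_model, 1)   # predicted planning value per cell

    def forward(self, obstacle_map, goal_map):
        # obstacle_map, goal_map: (B, H, W) float grids of occupancy / goal indicator.
        b = obstacle_map.shape[0]
        cells = torch.stack([obstacle_map, goal_map], dim=-1)     # (B, H, W, 2)
        tokens = self.token_embed(cells.flatten(1, 2))            # (B, HW, D)
        tokens = tokens + self.pos_embed                          # add positions
        feats = self.encoder(tokens)                              # global self-attention
        values = self.value_head(feats).squeeze(-1)               # (B, HW)
        return values.view(b, self.map_size, self.map_size)
```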
SEAL: Self-supervised Embodied Active Learning using Exploration and 3D Consistency
Chaplot, Devendra Singh, Dalal, Murtaza, Gupta, Saurabh, Malik, Jitendra, Salakhutdinov, Ruslan
In this paper, we explore how we can build upon the data and models of Internet images and use them to adapt to robot vision without requiring any extra labels. We present a framework called Self-supervised Embodied Active Learning (SEAL). It utilizes perception models trained on internet images to learn an active exploration policy. The observations gathered by this exploration policy are labelled using 3D consistency and used to improve the perception model. We build and utilize 3D semantic maps to learn both action and perception in a completely self-supervised manner. The semantic map is used to compute an intrinsic motivation reward for training the exploration policy and for labelling the agent observations using spatio-temporal 3D consistency and label propagation. We demonstrate that the SEAL framework can be used to close the action-perception loop: it improves object detection and instance segmentation performance of a pretrained perception model by just moving around in training environments and the improved perception model can be used to improve Object Goal Navigation.
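The 3D-consistency labelling idea can be sketched as follows; the voxel size, the frame format, and both helper functions are assumptions rather than the SEAL implementation. Per-frame predicted labels are voted into a voxel map, and the majority label per voxel is propagated back to every frame that observes it.

```python
# A minimal sketch of label aggregation and propagation via 3D consistency.
import numpy as np
from collections import defaultdict

def aggregate_votes(frames, voxel_size=0.1):
    """frames: list of (points_world (N, 3), predicted_labels (N,)) pairs."""
    votes = defaultdict(lambda: defaultdict(int))
    for pts, labels in frames:
        keys = np.floor(pts / voxel_size).astype(int)
        for key, label in zip(map(tuple, keys), labels):
            votes[key][int(label)] += 1
    # Majority vote per voxel = the 3D-consistent label.
    return {k: max(v, key=v.get) for k, v in votes.items()}

def propagate_labels(frames, voxel_labels, voxel_size=0.1):
    """Relabel every frame's points from the aggregated voxel map."""
    relabeled = []
    for pts, labels in frames:
        keys = map(tuple, np.floor(pts / voxel_size).astype(int))
        relabeled.append(np.array([voxel_labels.get(k, l)
                                   for k, l in zip(keys, labels)]))
    return relabeled
```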
Building Intelligent Autonomous Navigation Agents
Chaplot, Devendra Singh
Breakthroughs in machine learning in the last decade have led to 'digital intelligence', i.e. machine learning models capable of learning from vast amounts of labeled data to perform several digital tasks such as speech recognition, face recognition, machine translation and so on. The goal of this thesis is to make progress towards designing algorithms capable of 'physical intelligence', i.e. building intelligent autonomous navigation agents capable of learning to perform complex navigation tasks in the physical world involving visual perception, natural language understanding, reasoning, planning, and sequential decision making. Despite several advances in classical navigation methods in the last few decades, current navigation agents struggle at long-term semantic navigation tasks. In the first part of the thesis, we discuss our work on short-term navigation using end-to-end reinforcement learning to tackle challenges such as obstacle avoidance, semantic perception, language grounding, and reasoning. In the second part, we present a new class of navigation methods based on modular learning and structured explicit map representations, which leverage the strengths of both classical and end-to-end learning methods, to tackle long-term navigation tasks. We show that these methods are able to effectively tackle challenges such as localization, mapping, long-term planning, exploration and learning semantic priors. These modular learning methods are capable of long-term spatial and semantic understanding and achieve state-of-the-art results on various navigation tasks.
Planning with Submodular Objective Functions
Wang, Ruosong, Zhang, Hanrui, Chaplot, Devendra Singh, Garagić, Denis, Salakhutdinov, Ruslan
Modern reinforcement learning and planning algorithms have achieved tremendous successes on various tasks [Mnih et al., 2015, Silver et al., 2017]. However, most of these algorithms work in the standard Markov decision process (MDP) framework where the goal is to maximize the cumulative reward and thus it can be difficult to apply them to various practical sequential decision-making problems. In this paper, we study planning in generalized MDPs, where instead of maximizing the cumulative reward, the goal is to maximize the objective value induced by a submodular function. To motivate our approach, let us consider the following scenario: a company manufactures cars, and as part of its customer service, continuously monitors the status of all cars produced by the company. Each car is equipped with a number of sensors, each of which constantly produces noisy measurements of some attribute of the car, e.g., speed, location, temperature, etc. Due to bandwidth constraints, at any moment, each car may choose to transmit data generated by a single sensor to the company. The goal is to combine the statistics collected over a fixed period of time, presumably from multiple sensors, to gather as much information about the car as possible. Perhaps one seemingly natural strategy is to transmit only data generated by the most "informative" sensor. However, as the output of a sensor remains the same between two samples, it is pointless to transmit the same data multiple times. One may alternatively try to order sensors by their "informativity" and always choose the most informative sensor that has not yet transmitted data since the last sample was generated.
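A toy illustration of such a submodular, coverage-style objective (the sensor names and coverage sets below are invented) shows why repeatedly transmitting the single most informative sensor plateaus while a diverse transmission plan keeps adding value:

```python
# A toy submodular objective: the value of a plan is the size of the union
# of attributes covered, so repeated selections add nothing.
def coverage_value(selected, info_sets):
    """Submodular objective: number of distinct attributes covered."""
    covered = set()
    for sensor in selected:
        covered |= info_sets[sensor]
    return len(covered)

info_sets = {
    "gps":   {"location", "speed"},
    "imu":   {"speed", "heading"},
    "therm": {"temperature"},
}

# Always transmitting the single "most informative" sensor plateaus ...
print(coverage_value(["gps", "gps", "gps"], info_sets))        # 2
# ... while a diverse plan covers strictly more attributes.
print(coverage_value(["gps", "imu", "therm"], info_sets))      # 4
```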
Semantic Curiosity for Active Visual Learning
Chaplot, Devendra Singh, Jiang, Helen, Gupta, Saurabh, Gupta, Abhinav
In this paper, we study the task of embodied interactive learning for object detection. Given a set of environments (and some labeling budget), our goal is to learn an object detector by having an agent select what data to obtain labels for. How should an exploration policy decide which trajectory should be labeled? One possibility is to use a trained object detector's failure cases as an external reward. However, this would require labeling the millions of frames needed to train RL policies, which is infeasible. Instead, we explore a self-supervised approach for training our exploration policy by introducing a notion of semantic curiosity. Our semantic curiosity policy is based on a simple observation: the detection outputs should be consistent. Therefore, our semantic curiosity rewards trajectories with inconsistent labeling behavior and encourages the exploration policy to explore such areas. The exploration policy trained via semantic curiosity generalizes to novel scenes and helps train an object detector that outperforms baselines trained with alternatives such as random exploration, prediction-error curiosity, and coverage-maximizing exploration.
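One way to sketch such a reward (the tracking and map-projection machinery is abstracted away, and the function names are hypothetical) is to score the label entropy a detector accumulates at each spatial location over a trajectory:

```python
# A minimal sketch of a semantic-curiosity style reward from label inconsistency.
import numpy as np
from collections import Counter

def label_entropy(labels):
    """Entropy of the label distribution observed at one location."""
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log(p + 1e-12)).sum())

def semantic_curiosity_reward(labels_per_location):
    """labels_per_location: dict mapping a map cell to detector labels seen there."""
    return sum(label_entropy(labels) for labels in labels_per_location.values())

# A cell consistently labeled "chair" contributes no reward; a cell flipping
# between "chair" and "sofa" does.
print(semantic_curiosity_reward({(3, 4): ["chair", "chair", "chair"],
                                 (7, 1): ["chair", "sofa", "chair", "sofa"]}))
```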
Neural Topological SLAM for Visual Navigation
Chaplot, Devendra Singh, Salakhutdinov, Ruslan, Gupta, Abhinav, Gupta, Saurabh
This paper studies the problem of image-goal navigation which involves navigating to the location indicated by a goal image in a novel previously unseen environment. To tackle this problem, we design topological representations for space that effectively leverage semantics and afford approximate geometric reasoning. At the heart of our representations are nodes with associated semantic features, that are interconnected using coarse geometric information. We describe supervised learning-based algorithms that can build, maintain and use such representations under noisy actuation. Experimental study in visually and physically realistic simulation suggests that our method builds effective representations that capture structural regularities and efficiently solve long-horizon navigation problems. We observe a relative improvement of more than 50% over existing methods that study this task.
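A minimal sketch of such a topological representation, with the feature extractor, similarity threshold, and class interface as assumptions rather than the paper's implementation: nodes store semantic features for places, edges store coarse relative poses, and localization is nearest-neighbor matching in feature space.

```python
# A minimal sketch of a topological map with semantic node features.
import numpy as np

class TopologicalMap:
    def __init__(self, match_threshold=0.8):
        self.features = []            # one feature vector per node (place)
        self.edges = {}               # (i, j) -> approximate relative pose
        self.match_threshold = match_threshold

    def localize(self, feature):
        """Return the index of the best-matching node, or None if no match."""
        if not self.features:
            return None
        sims = [float(np.dot(f, feature) /
                      (np.linalg.norm(f) * np.linalg.norm(feature) + 1e-8))
                for f in self.features]
        best = int(np.argmax(sims))
        return best if sims[best] >= self.match_threshold else None

    def add_node(self, feature, prev_node=None, rel_pose=None):
        """Add a new place node, optionally linked to the previous node."""
        self.features.append(feature)
        node = len(self.features) - 1
        if prev_node is not None:
            # Coarse geometric information connecting neighboring places.
            self.edges[(prev_node, node)] = rel_pose
        return node
```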
Embodied Multimodal Multitask Learning
Chaplot, Devendra Singh, Lee, Lisa, Salakhutdinov, Ruslan, Parikh, Devi, Batra, Dhruv
Recent efforts on training visual navigation agents conditioned on language using deep reinforcement learning have been successful in learning policies for different multimodal tasks, such as semantic goal navigation and embodied question answering. In this paper, we propose a multitask model capable of jointly learning these multimodal tasks, and transferring knowledge of words and their grounding in visual objects across the tasks. The proposed model uses a novel Dual-Attention unit to disentangle the knowledge of words in the textual representations and visual concepts in the visual representations, and align them with each other. This disentangled task-invariant alignment of representations facilitates grounding and knowledge transfer across both tasks. We show that the proposed model outperforms a range of baselines on both tasks in simulated 3D environments. We also show that this disentanglement of representations makes our model modular, interpretable, and allows for transfer to instructions containing new words by leveraging object detectors.
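As a simplified stand-in for word/visual alignment via gating (this is not the paper's Dual-Attention unit; dimensions and the gating scheme are assumptions), each instruction produces per-channel gates over convolutional visual features so that channels aligned with mentioned words are emphasized.

```python
# A minimal sketch of word-conditioned gated attention over visual features.
import torch
import torch.nn as nn

class GatedAttentionSketch(nn.Module):
    def __init__(self, vocab_size=100, embed_dim=32, vis_channels=64):
        super().__init__()
        self.word_embed = nn.Embedding(vocab_size, embed_dim)
        self.gate = nn.Linear(embed_dim, vis_channels)

    def forward(self, word_ids, visual_feats):
        # word_ids: (B, L) token ids; visual_feats: (B, C, H, W) conv features.
        words = self.word_embed(word_ids).mean(dim=1)        # (B, E) instruction rep
        gates = torch.sigmoid(self.gate(words))              # (B, C) channel gates
        return visual_feats * gates[:, :, None, None]        # gated visual features
```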