AITopics | Paxton, Chris

Plotting

Paxton, Chris

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

GraphEQA: Using 3D Semantic Scene Graphs for Real-time Embodied Question Answering

Saxena, Saumya, Buchanan, Blake, Paxton, Chris, Chen, Bingqing, Vaskevicius, Narunas, Palmieri, Luigi, Francis, Jonathan, Kroemer, Oliver

arXiv.org Artificial IntelligenceDec-18-2024

For example, to answer explore and develop a semantic understanding of an unseen the question "How many chairs are there at the dining environment in order to answer a situated question table?", the agent might rely on commonsense knowledge with confidence. This remains a challenging problem in to understand that dining tables are often associated with robotics, due to the difficulties in obtaining useful semantic dining rooms and dining rooms are usually near the kitchen representations, updating these representations online, and towards the back of a home. A reasonable navigation strategy leveraging prior world knowledge for efficient exploration would involve navigating to the back of the house to and planning. Aiming to address these limitations, we propose locate a kitchen. To ground this search in the current environment, GraphEQA, a novel approach that utilizes real-time however, requires the agent to continually maintain 3D metric-semantic scene graphs (3DSGs) and task relevant an understanding of where it is, memory of where it images as multi-modal memory for grounding Vision-has been, and what further exploratory actions will lead it Language Models (VLMs) to perform EQA tasks in unseen to relevant regions. Finally, the agent needs to observe the environments. We employ a hierarchical planning approach target object(s) and perform visual grounding, in order to that exploits the hierarchical nature of 3DSGs for structured reason about the number of chairs around the dining table, planning and semantic-guided exploration. Through experiments and confidently answer the question correctly.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2412.1448

Country: North America > United States (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Add feedback

DynaMem: Online Dynamic Spatio-Semantic Memory for Open World Mobile Manipulation

Liu, Peiqi, Guo, Zhanqiu, Warke, Mohit, Chintala, Soumith, Paxton, Chris, Shafiullah, Nur Muhammad Mahi, Pinto, Lerrel

arXiv.org Artificial IntelligenceNov-7-2024

Significant progress has been made in open-vocabulary mobile manipulation, where the goal is for a robot to perform tasks in any environment given a natural language description. However, most current systems assume a static environment, which limits the system's applicability in real-world scenarios where environments frequently change due to human intervention or the robot's own actions. In this work, we present DynaMem, a new approach to open-world mobile manipulation that uses a dynamic spatio-semantic memory to represent a robot's environment. DynaMem constructs a 3D data structure to maintain a dynamic memory of point clouds, and answers open-vocabulary object localization queries using multimodal LLMs or open-vocabulary features generated by state-of-the-art vision-language models. Powered by DynaMem, our robots can explore novel environments, search for objects not found in memory, and continuously update the memory as objects move, appear, or disappear in the scene. We run extensive experiments on the Stretch SE3 robots in three real and nine offline scenes, and achieve an average pick-and-drop success rate of 70% on non-stationary objects, which is more than a 2x improvement over state-of-the-art static systems. Our code as well as our experiment and deployment videos are open sourced and can be found on our project website: https://dynamem.github.io/

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2411.04999

Genre: Research Report (0.66)

Industry: Health & Medicine > Consumer Health (0.64)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

DegustaBot: Zero-Shot Visual Preference Estimation for Personalized Multi-Object Rearrangement

Newman, Benjamin A., Gupta, Pranay, Kitani, Kris, Bisk, Yonatan, Admoni, Henny, Paxton, Chris

arXiv.org Artificial IntelligenceJul-11-2024

De gustibus non est disputandum ("there is no accounting for others' tastes") is a common Latin maxim describing how many solutions in life are determined by people's personal preferences. Many household tasks, in particular, can only be considered fully successful when they account for personal preferences such as the visual aesthetic of the scene. For example, setting a table could be optimized by arranging utensils according to traditional rules of Western table setting decorum, without considering the color, shape, or material of each object, but this may not be a completely satisfying solution for a given person. Toward this end, we present DegustaBot, an algorithm for visual preference learning that solves household multi-object rearrangement tasks according to personal preference. To do this, we use internet-scale pre-trained vision-and-language foundation models (VLMs) with novel zero-shot visual prompting techniques. To evaluate our method, we collect a large dataset of naturalistic personal preferences in a simulated table-setting task, and conduct a user study in order to develop two novel metrics for determining success based on personal preference. This is a challenging problem and we find that 50% of our model's predictions are likely to be found acceptable by at least 20% of people.

artificial intelligence, enim, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2407.08876

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.60)
Information Technology > Artificial Intelligence > Machine Learning (0.39)

Add feedback

HACMan++: Spatially-Grounded Motion Primitives for Manipulation

Jiang, Bowen, Wu, Yilin, Zhou, Wenxuan, Paxton, Chris, Held, David

arXiv.org Artificial IntelligenceJul-11-2024

Although end-to-end robot learning has shown some success for robot manipulation, the learned policies are often not sufficiently robust to variations in object pose or geometry. To improve the policy generalization, we introduce spatially-grounded parameterized motion primitives in our method HACMan++. Specifically, we propose an action representation consisting of three components: what primitive type (such as grasp or push) to execute, where the primitive will be grounded (e.g. where the gripper will make contact with the world), and how the primitive motion is executed, such as parameters specifying the push direction or grasp orientation. These three components define a novel discrete-continuous action space for reinforcement learning. Our framework enables robot agents to learn to chain diverse motion primitives together and select appropriate primitive parameters to complete long-horizon manipulation tasks. By grounding the primitives on a spatial location in the environment, our method is able to effectively generalize across object shape and pose variations. Our approach significantly outperforms existing methods, particularly in complex scenarios demanding both high-level sequential reasoning and object generalization. With zero-shot sim-to-real transfer, our policy succeeds in challenging real-world manipulation tasks, with generalization to unseen objects. Videos can be found on the project website: https://sgmp-rss2024.github.io.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

2407.08585

Genre:

Workflow (0.93)
Research Report (0.81)

Industry: Leisure & Entertainment (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)
Information Technology > Artificial Intelligence > Robots > Manipulation (0.48)

Add feedback

Towards Open-World Mobile Manipulation in Homes: Lessons from the Neurips 2023 HomeRobot Open Vocabulary Mobile Manipulation Challenge

Yenamandra, Sriram, Ramachandran, Arun, Khanna, Mukul, Yadav, Karmesh, Vakil, Jay, Melnik, Andrew, Büttner, Michael, Harz, Leon, Brown, Lyon, Nandi, Gora Chand, PS, Arjun, Yadav, Gaurav Kumar, Kala, Rahul, Haschke, Robert, Luo, Yang, Zhu, Jinxin, Han, Yansen, Lu, Bingyi, Gu, Xuan, Liu, Qinyuan, Zhao, Yaping, Ye, Qiting, Dou, Chenxiao, Chua, Yansong, Kuzma, Volodymyr, Humennyy, Vladyslav, Partsey, Ruslan, Francis, Jonathan, Chaplot, Devendra Singh, Chhablani, Gunjan, Clegg, Alexander, Gervet, Theophile, Jain, Vidhi, Ramrakhya, Ram, Szot, Andrew, Wang, Austin, Yang, Tsung-Yen, Edsinger, Aaron, Kemp, Charlie, Shah, Binit, Kira, Zsolt, Batra, Dhruv, Mottaghi, Roozbeh, Bisk, Yonatan, Paxton, Chris

arXiv.org Artificial IntelligenceJul-9-2024

In order to develop robots that can effectively serve as versatile and capable home assistants, it is crucial for them to reliably perceive and interact with a wide variety of objects across diverse environments. To this end, we proposed Open Vocabulary Mobile Manipulation as a key benchmark task for robotics: finding any object in a novel environment and placing it on any receptacle surface within that environment. We organized a NeurIPS 2023 competition featuring both simulation and real-world components to evaluate solutions to this task. Our baselines on the most challenging version of this task, using real perception in simulation, achieved only an 0.8% success rate; by the end of the competition, the best participants achieved an 10.8\% success rate, a 13x improvement. We observed that the most successful teams employed a variety of methods, yet two common threads emerged among the best solutions: enhancing error detection and recovery, and improving the integration of perception with decision-making processes. In this paper, we detail the results and methodologies used, both in simulation and real-world settings. We discuss the lessons learned and their implications for future research. Additionally, we compare performance in real and simulated environments, emphasizing the necessity for robust generalization to novel settings.

artificial intelligence, machine learning, receptacle, (16 more...)

arXiv.org Artificial Intelligence

2407.06939

Country:

North America > United States (0.28)
Asia > Middle East > Israel (0.14)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Sports (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Add feedback

Bootstrapping Linear Models for Fast Online Adaptation in Human-Agent Collaboration

Newman, Benjamin A, Paxton, Chris, Kitani, Kris, Admoni, Henny

arXiv.org Artificial IntelligenceApr-16-2024

Agents that assist people need to have well-initialized policies that can adapt quickly to align with their partners' reward functions. Initializing policies to maximize performance with unknown partners can be achieved by bootstrapping nonlinear models using imitation learning over large, offline datasets. Such policies can require prohibitive computation to fine-tune in-situ and therefore may miss critical run-time information about a partner's reward function as expressed through their immediate behavior. In contrast, online logistic regression using low-capacity models performs rapid inference and fine-tuning updates and thus can make effective use of immediate in-task behavior for reward function alignment. However, these low-capacity models cannot be bootstrapped as effectively by offline datasets and thus have poor initializations. We propose BLR-HAC, Bootstrapped Logistic Regression for Human Agent Collaboration, which bootstraps large nonlinear models to learn the parameters of a low-capacity model which then uses online logistic regression for updates during collaboration. We test BLR-HAC in a simulated surface rearrangement task and demonstrate that it achieves higher zero-shot accuracy than shallow methods and takes far less computation to adapt online while still achieving similar performance to fine-tuned, large nonlinear models. For code, please see our project page https://sites.google.com/view/blr-hac.

artificial intelligence, collaboration, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2404.10733

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)

Genre:

Research Report > New Finding (0.89)
Research Report > Experimental Study (0.75)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.60)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Real-World Robot Applications of Foundation Models: A Review

Kawaharazuka, Kento, Matsushima, Tatsuya, Gambardella, Andrew, Guo, Jiaxian, Paxton, Chris, Zeng, Andy

arXiv.org Artificial IntelligenceFeb-8-2024

Recent developments in foundation models, like Large Language Models (LLMs) and Vision-Language Models (VLMs), trained on extensive data, facilitate flexible application across different tasks and modalities. Their impact spans various fields, including healthcare, education, and robotics. This paper provides an overview of the practical application of foundation models in real-world robotics, with a primary emphasis on the replacement of specific components within existing robot systems. The summary encompasses the perspective of input-output relationships in foundation models, as well as their role in perception, motion planning, and control within the field of robotics. This paper concludes with a discussion of future challenges and implications for practical robot applications.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2402.05741

Country:

Asia > Japan > Honshū (0.14)
Asia > Middle East > Israel (0.14)

Genre:

Research Report (1.00)
Overview (1.00)

Industry: Health & Medicine (0.87)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (0.88)

Add feedback

OK-Robot: What Really Matters in Integrating Open-Knowledge Models for Robotics

Liu, Peiqi, Orru, Yaswanth, Paxton, Chris, Shafiullah, Nur Muhammad Mahi, Pinto, Lerrel

arXiv.org Artificial IntelligenceJan-22-2024

Remarkable progress has been made in recent years in the fields of vision, language, and robotics. We now have vision models capable of recognizing objects based on language queries, navigation systems that can effectively control mobile systems, and grasping models that can handle a wide range of objects. Despite these advancements, general-purpose applications of robotics still lag behind, even though they rely on these fundamental capabilities of recognition, navigation, and grasping. In this paper, we adopt a systems-first approach to develop a new Open Knowledge-based robotics framework called OK-Robot. By combining Vision-Language Models (VLMs) for object detection, navigation primitives for movement, and grasping primitives for object manipulation, OK-Robot offers a integrated solution for pick-and-drop operations without requiring any training. To evaluate its performance, we run OK-Robot in 10 real-world home environments. The results demonstrate that OK-Robot achieves a 58.5% success rate in open-ended pick-and-drop tasks, representing a new state-of-the-art in Open Vocabulary Mobile Manipulation (OVMM) with nearly 1.8x the performance of prior work. On cleaner, uncluttered environments, OK-Robot's performance increases to 82%. However, the most important insight gained from OK-Robot is the critical role of nuanced details when combining Open Knowledge systems like VLMs with robotic modules. Videos of our experiments are available on our website: https://ok-robot.github.io

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2401.12202

Country:

North America > United States (0.14)
Asia > Japan (0.14)

Genre: Research Report > New Finding (0.66)

Industry:

Health & Medicine > Consumer Health (1.00)
Consumer Products & Services > Food, Beverage, Tobacco & Cannabis (0.94)
Consumer Products & Services > Personal Products > Beauty Care Products (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.54)
Information Technology > Artificial Intelligence > Robots > Manipulation (0.46)

Add feedback

HomeRobot: Open-Vocabulary Mobile Manipulation

Yenamandra, Sriram, Ramachandran, Arun, Yadav, Karmesh, Wang, Austin, Khanna, Mukul, Gervet, Theophile, Yang, Tsung-Yen, Jain, Vidhi, Clegg, Alexander William, Turner, John, Kira, Zsolt, Savva, Manolis, Chang, Angel, Chaplot, Devendra Singh, Batra, Dhruv, Mottaghi, Roozbeh, Bisk, Yonatan, Paxton, Chris

arXiv.org Artificial IntelligenceJan-10-2024

HomeRobot (noun): An affordable compliant robot that navigates homes and manipulates a wide range of objects in order to complete everyday tasks. Open-Vocabulary Mobile Manipulation (OVMM) is the problem of picking any object in any unseen environment, and placing it in a commanded location. This is a foundational challenge for robots to be useful assistants in human environments, because it involves tackling sub-problems from across robotics: perception, language understanding, navigation, and manipulation are all essential to OVMM. In addition, integration of the solutions to these sub-problems poses its own substantial challenges. To drive research in this area, we introduce the HomeRobot OVMM benchmark, where an agent navigates household environments to grasp novel objects and place them on target receptacles. HomeRobot has two components: a simulation component, which uses a large and diverse curated object set in new, high-quality multi-room home environments; and a real-world component, providing a software stack for the low-cost Hello Robot Stretch to encourage replication of real-world experiments across labs. We implement both reinforcement learning and heuristic (model-based) baselines and show evidence of sim-to-real transfer. Our baselines achieve a 20% success rate in the real world; our experiments identify ways future research work improve performance. See videos on our website: https://ovmm.github.io/.

machine learning, receptacle, reinforcement learning, (21 more...)

arXiv.org Artificial Intelligence

2306.11565

Country: North America > United States (0.46)

Genre: Research Report (0.81)

Industry: Leisure & Entertainment > Games (0.93)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.34)

Add feedback

GOAT: GO to Any Thing

Chang, Matthew, Gervet, Theophile, Khanna, Mukul, Yenamandra, Sriram, Shah, Dhruv, Min, So Yeon, Shah, Kavit, Paxton, Chris, Gupta, Saurabh, Batra, Dhruv, Mottaghi, Roozbeh, Malik, Jitendra, Chaplot, Devendra Singh

arXiv.org Artificial IntelligenceNov-10-2023

In deployment scenarios such as homes and warehouses, mobile robots are expected to autonomously navigate for extended periods, seamlessly executing tasks articulated in terms that are intuitively understandable by human operators. We present GO To Any Thing (GOAT), a universal navigation system capable of tackling these requirements with three key features: a) Multimodal: it can tackle goals specified via category labels, target images, and language descriptions, b) Lifelong: it benefits from its past experience in the same environment, and c) Platform Agnostic: it can be quickly deployed on robots with different embodiments. GOAT is made possible through a modular system design and a continually augmented instance-aware semantic memory that keeps track of the appearance of objects from different viewpoints in addition to category-level semantics. This enables GOAT to distinguish between different instances of the same category to enable navigation to targets specified by images and language descriptions. In experimental comparisons spanning over 90 hours in 9 different homes consisting of 675 goals selected across 200+ different object instances, we find GOAT achieves an overall success rate of 83%, surpassing previous methods and ablations by 32% (absolute improvement). GOAT improves with experience in the environment, from a 60% success rate at the first goal to a 90% success after exploration. In addition, we demonstrate that GOAT can readily be applied to downstream tasks such as pick and place and social navigation.

artificial intelligence, goat, thing

arXiv.org Artificial Intelligence

2311.0643

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Robots (1.00)

Add feedback