AITopics | saycan

saycan

ConceptBot: Enhancing Robot's Autonomy through Task Decomposition with Large Language Models and Knowledge Graph

Leanza, Alessandro, Moroncelli, Angelo, Vizzari, Giuseppe, Braghin, Francesco, Roveda, Loris, Spahiu, Blerina

arXiv.org Artificial Intelligence, Sep-3-2025

ConceptBot is a modular robotic planning framework that combines Large Language Models and Knowledge Graphs to generate feasible and risk-aware plans despite ambiguities in natural language instructions and to correctly analyze the objects present in the environment, two challenges that typically arise from a lack of commonsense reasoning. To do that, ConceptBot integrates (i) an Object Property Extraction (OPE) module that enriches scene understanding with semantic concepts from ConceptNet, (ii) a User Request Processing (URP) module that disambiguates and structures instructions, and (iii) a Planner that generates context-aware, feasible pick-and-place policies. In comparative evaluations against Google SayCan, ConceptBot achieved 100% success on explicit tasks, maintained 87% accuracy on implicit tasks (versus 31% for SayCan), reached 76% on risk-aware tasks (versus 15%), and outperformed SayCan in application-specific scenarios, including material classification (70% vs. 20%) and toxicity detection (86% vs. 36%). On SafeAgentBench, ConceptBot achieved an overall score of 80% (versus 46% for the next-best baseline). These results, validated in both simulation and laboratory experiments, demonstrate ConceptBot's ability to generalize without domain-specific training and to significantly improve the reliability of robotic policies in unstructured environments.

Advances in recent decades in robotic core capabilities, i.e., perception, control, and manipulation, have increased demand for autonomous systems in fields ranging from manufacturing and healthcare to logistics and home care. These capabilities are deeply interconnected with the planning phase [1], as successful planning depends on a robot's ability to perceive its environment accurately, execute precise control, and perform effective manipulation. Despite significant progress, planning in robotic systems continues to face challenges, particularly in unstructured environments [2].

A key element in achieving effective planning is task decomposition [3], which involves breaking complex objectives into smaller, manageable actions. This process is essential for simplifying execution and ensuring flexibility in diverse environments. Traditional task decomposition approaches, however, often rely on rigid, pre-programmed templates or static models, which struggle to adapt to unfamiliar or dynamic conditions [4]-[7]. Recently, advancements in Large Language Models (LLMs) have introduced a more dynamic alternative. LLMs enable robots to process natural language instructions, understand contextual nuances, and dynamically decompose tasks into actionable steps [8]-[10]. However, directly employing pre-trained LLMs often leads to non-executable or ineffective plans, as these models struggle to account for domain-specific constraints and real-world feasibility [11]-[13].

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2509.0057

Country: Europe > Italy (0.28)

Genre:

Workflow (1.00)
Research Report (1.00)

Industry:

Materials > Containers & Packaging (0.93)
Health & Medicine (0.86)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

CAPE: Corrective Actions from Precondition Errors using Large Language Models

Raman, Shreyas Sundara, Cohen, Vanya, Paulius, David, Idrees, Ifrah, Rosen, Eric, Mooney, Ray, Tellex, Stefanie

arXiv.org Artificial Intelligence, Oct-22-2023

Extracting commonsense knowledge from a large language model (LLM) offers a path to designing intelligent robots. Existing approaches that leverage LLMs for planning are unable to recover when an action fails and often resort to retrying failed actions without resolving the error's underlying cause. We propose a novel approach (CAPE) that proposes corrective actions to resolve precondition errors during planning. CAPE improves the quality of generated plans by leveraging few-shot reasoning from action preconditions. Our approach enables embodied agents to execute more tasks than baseline methods while ensuring semantic correctness and minimizing re-prompting. In VirtualHome, CAPE generates executable plans while improving a human-annotated plan correctness metric from 28.89% to 49.63% over SayCan. Our improvements transfer to a Boston Dynamics Spot robot initialized with a set of skills (specified in language) and associated preconditions, where CAPE improves the correctness metric of the executed task plans by 76.49% compared to SayCan. Our approach enables the robot to follow natural language commands and robustly recover from failures, which baseline approaches largely cannot resolve or can address only inefficiently.
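The recovery loop described in this abstract can be illustrated with a toy sketch. This is not CAPE's implementation: the skill table, the `propose_fix` function (a plain lookup standing in for the LLM re-prompt), and the `max_fixes` limit are all hypothetical names introduced here for illustration. The sketch only shows the general idea of inserting a corrective action when a precondition is violated, instead of blindly retrying the failed action.

```python
from typing import Dict, List, Set

# Hypothetical skills: "pre" lists facts that must hold, "add" lists facts the skill establishes.
SKILLS: Dict[str, Dict[str, Set[str]]] = {
    "open fridge": {"pre": set(), "add": {"fridge open"}},
    "grab milk": {"pre": {"fridge open"}, "add": {"holding milk"}},
}


def propose_fix(missing: str) -> str:
    """Stand-in for an LLM re-prompt: return a skill that establishes the missing fact."""
    for name, spec in SKILLS.items():
        if missing in spec["add"]:
            return name
    raise ValueError(f"no skill establishes {missing!r}")


def execute_with_correction(plan: List[str], max_fixes: int = 5) -> List[str]:
    """Run the plan step by step, inserting a corrective action before any
    step whose preconditions are not satisfied in the current state."""
    state: Set[str] = set()
    executed: List[str] = []
    queue = list(plan)
    fixes = 0
    while queue:
        action = queue[0]
        missing = SKILLS[action]["pre"] - state
        if missing:
            if fixes >= max_fixes:
                raise RuntimeError("too many corrections")
            # Address the violated precondition first, then retry the action.
            queue.insert(0, propose_fix(sorted(missing)[0]))
            fixes += 1
            continue
        state |= SKILLS[action]["add"]
        executed.append(queue.pop(0))
    return executed


print(execute_with_correction(["grab milk"]))  # ['open fridge', 'grab milk']
```

In this toy setting the single-step plan fails its precondition check once, a corrective step is prepended, and the original step then succeeds.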

cape, precondition, proceedings, (16 more...)

arXiv.org Artificial Intelligence

2211.09935

Country:

North America > United States > Texas > Travis County > Austin (0.14)
North America > United States > Rhode Island > Providence County > Providence (0.04)
North America > United States > Louisiana (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

A Picture is Worth a Thousand Words: Language Models Plan from Pixels

Liu, Anthony Z., Logeswaran, Lajanugen, Sohn, Sungryull, Lee, Honglak

arXiv.org Artificial Intelligence, Mar-15-2023

Planning is an important capability of artificial agents that perform long-horizon tasks in real-world environments. In this work, we explore the use of pre-trained language models (PLMs) to reason about plan sequences from text instructions in embodied visual environments. Prior PLM-based approaches for planning either assume observations are available in the form of text (e.g., provided by a captioning model), reason about plans from the instruction alone, or incorporate information about the visual environment in limited ways (such as a pre-trained affordance function). In contrast, we show that PLMs can accurately plan even when observations are directly encoded as input prompts for the PLM. We show that this simple approach outperforms prior approaches in experiments on the ALFWorld and VirtualHome benchmarks.

large language model, machine learning, plm, (19 more...)

arXiv.org Artificial Intelligence

2303.09031

Country:

North America > United States > Washington > King County > Seattle (0.04)
North America > Dominican Republic (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)

RT-1: Robotics Transformer for Real-World Control at Scale – Google AI Blog

#artificialintelligence, Dec-24-2022, 23:11:29 GMT

Major recent advances in multiple subfields of machine learning (ML) research, such as computer vision and natural language processing, have been enabled by a shared approach that leverages large, diverse datasets and expressive models that can absorb all of the data effectively. Although there have been various attempts to apply this approach to robotics, robotics has not yet leveraged highly capable models as well as other subfields have. Several factors contribute to this challenge. First, there's the lack of large-scale and diverse robotic data, which limits a model's ability to absorb a broad set of robotic experiences. Data collection is particularly expensive and challenging for robotics because dataset curation requires engineering-heavy autonomous operation, or demonstrations collected using human teleoperation. To address these challenges, we propose the Robotics Transformer 1 (RT-1), a multi-task model that tokenizes robot inputs and output actions (e.g., camera images, task instructions, and motor commands) to enable efficient inference at runtime, which makes real-time control feasible.

robot, rt-1, saycan, (15 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Robots (1.00)

Open-vocabulary Queryable Scene Representations for Real World Planning

Chen, Boyuan, Xia, Fei, Ichter, Brian, Rao, Kanishka, Gopalakrishnan, Keerthana, Ryoo, Michael S., Stone, Austin, Kappler, Daniel

arXiv.org Artificial Intelligence, Oct-15-2022

Large language models (LLMs) have unlocked new capabilities of task planning from human instructions. NLMap first establishes a natural language queryable scene representation with Visual Language models (VLMs). An LLM-based object proposal module parses instructions and proposes involved objects to query the scene representation for object availability and location. An LLM planner then plans with such information about the scene. We propose an open-vocabulary and queryable scene representation for real-world planning. The returned object presence and location are used for LLM-based planning.

Recent progress in large language models (LLMs) has shown impressive few-shot performance in language comprehension, semantic understanding, and reasoning, as well as application to robotics problems like planning [5]-[7] and instruction following [8]. Using such models in embodied settings can provide significant challenges, most critically because …

It has to first identify relevant objects and locations within the scene (e.g., the watering can, the sink, and each potential plant) and then plan over these objects in sequential order (get the watering can, then go to the sink, and then fill it up), conditioning on its affordances (e.g., can it carry a full watering can), and conditioning on the scene (e.g., how many plants there are, and where they are).

artificial intelligence, large language model, natural language, (17 more...)

arXiv.org Artificial Intelligence

2209.09874

Country: Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)

Genre: Research Report (0.82)

Industry: Consumer Products & Services > Food, Beverage, Tobacco & Cannabis (0.68)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Do As I Can, Not As I Say: Grounding Language in Robotic Affordances

Ahn, Michael, Brohan, Anthony, Brown, Noah, Chebotar, Yevgen, Cortes, Omar, David, Byron, Finn, Chelsea, Fu, Chuyuan, Gopalakrishnan, Keerthana, Hausman, Karol, Herzog, Alex, Ho, Daniel, Hsu, Jasmine, Ibarz, Julian, Ichter, Brian, Irpan, Alex, Jang, Eric, Ruano, Rosario Jauregui, Jeffrey, Kyle, Jesmonth, Sally, Joshi, Nikhil J, Julian, Ryan, Kalashnikov, Dmitry, Kuang, Yuheng, Lee, Kuang-Huei, Levine, Sergey, Lu, Yao, Luu, Linda, Parada, Carolina, Pastor, Peter, Quiambao, Jornell, Rao, Kanishka, Rettinghouse, Jarek, Reyes, Diego, Sermanet, Pierre, Sievers, Nicolas, Tan, Clayton, Toshev, Alexander, Vanhoucke, Vincent, Xia, Fei, Xiao, Ted, Xu, Peng, Xu, Sichun, Yan, Mengyuan, Zeng, Andy

arXiv.org Artificial Intelligence, Aug-16-2022

Such knowledge could be extremely useful to robots aiming to act upon high-level, temporally extended instructions expressed in natural language. However, a significant weakness of language models is that they lack real-world experience, which makes it difficult to leverage them for decision making within a given embodiment. For example, asking a language model to describe how to clean a spill might result in a reasonable narrative, but it may not be applicable to a particular agent, such as a robot, that needs to perform this task in a particular environment. We propose to provide real-world grounding by means of pretrained skills, which are used to constrain the model to propose natural language actions that are both feasible and contextually appropriate. The robot can act as the language model's "hands and eyes," while the language model supplies high-level semantic knowledge about the task. We show how low-level skills can be combined with large language models so that the language model provides high-level knowledge about the procedures for performing complex and temporally extended instructions, while value functions associated with these skills provide the grounding necessary to connect this knowledge to a particular physical environment. We evaluate our method on a number of real-world robotic tasks, where we show the need for real-world grounding and that this approach is capable of completing long-horizon, abstract, natural language instructions on a mobile manipulator. The project's website, the video, and open-sourced code in a tabletop domain can be found at say-can.github.io.

Figure 1: LLMs have not interacted with their environment and observed the outcome of their responses, and thus are not grounded in the world. SayCan grounds LLMs via value functions of pretrained skills, allowing them to execute real-world, abstract, long-horizon commands on robots.
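The grounding idea in this abstract can be sketched as a small scoring function. This is an illustrative sketch, not the project's code: the function name `select_skill`, the skill strings, and all numbers are made up here, and the sketch assumes that the language model's score for a skill and the skill's value-function estimate are combined by simple multiplication before picking the best skill.

```python
from typing import Dict


def select_skill(llm_scores: Dict[str, float],
                 affordance_values: Dict[str, float]) -> str:
    """Pick the skill that is both useful for the instruction (LLM score)
    and feasible in the current state (value-function estimate).

    Assumption: the two scores are combined by multiplication; skills with
    no affordance estimate are treated as infeasible (value 0.0).
    """
    combined = {
        skill: llm_scores[skill] * affordance_values.get(skill, 0.0)
        for skill in llm_scores
    }
    return max(combined, key=combined.get)


# Hypothetical scores for the instruction "clean up the spill".
llm_scores = {"pick up the sponge": 0.6, "find a sponge": 0.4}
# Hypothetical value estimates: no sponge is in view, so picking one up is unlikely to succeed.
affordance_values = {"pick up the sponge": 0.1, "find a sponge": 0.9}

print(select_skill(llm_scores, affordance_values))  # find a sponge
```

Here the language model alone would prefer "pick up the sponge", but the low feasibility estimate shifts the choice to "find a sponge" (0.4 × 0.9 = 0.36 versus 0.6 × 0.1 = 0.06).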

instruction, language model, robot, (14 more...)

arXiv.org Artificial Intelligence

2204.01691

Country: North America > United States > California > San Mateo County > Menlo Park (0.04)

Genre:

Workflow (0.93)
Research Report (0.81)

Industry: Consumer Products & Services > Food, Beverage, Tobacco & Cannabis (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)

Deep Science: Vision plus language could yield capable AI – TechCrunch

#artificialintelligence, Apr-10-2022, 17:18:00 GMT

Depending on the theory of intelligence to which you subscribe, achieving "human-level" AI will require a system that can leverage multiple modalities (e.g., sound, vision and text) to reason about the world. For example, when shown an image of a toppled truck and a police cruiser on a snowy freeway, a human-level AI might infer that dangerous road conditions caused an accident. Or, running on a robot, when asked to grab a can of soda from the refrigerator, it would navigate around people, furniture and pets to retrieve the can and place it within reach of the requester. But new research shows signs of encouraging progress, from robots that can figure out steps to satisfy basic commands (e.g., "get a water bottle") to text-producing systems that learn from explanations. In this revived edition of Deep Science, our weekly series about the latest developments in AI and the broader scientific field, we're covering work out of DeepMind, Google and OpenAI that makes strides toward systems that can, if not perfectly understand the world, solve narrow tasks like generating images with impressive robustness.

dall-e 2, deep science, robot, (14 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.65)
