
Collaborating Authors

 Hsu, David


FUNCTO: Function-Centric One-Shot Imitation Learning for Tool Manipulation

arXiv.org Artificial Intelligence

Abstract--Learning tool use from a single human demonstration video offers a highly intuitive and efficient approach to robot teaching. While humans can effortlessly generalize a demonstrated tool manipulation skill to diverse tools that support the same function (e.g., pouring with a mug versus a teapot), current one-shot imitation learning (OSIL) methods struggle to achieve this. A key challenge lies in establishing functional correspondences between demonstration and test tools, considering significant geometric variations among tools with the same function (i.e., intra-function variations). To address this challenge, we propose FUNCTO (Function-Centric OSIL for Tool Manipulation), an OSIL method that establishes function-centric correspondences with a 3D functional keypoint representation, enabling robots to generalize tool manipulation skills from a single human demonstration video to novel tools with the same function despite significant intra-function variations. We evaluate FUNCTO against existing modular OSIL methods and end-to-end behavioral cloning methods through real-robot experiments on diverse tool manipulation tasks. The results demonstrate the superiority of FUNCTO when generalizing to novel tools with intra-function geometric variations. More details are available at https://sites.google.com/view/functo. The ability to use tools has long been recognized as a hallmark of human intelligence [1]. Endowing robots with the same capability holds the promise of unlocking a wide range of downstream tasks and applications [2, 3, 4]. As a step towards this goal, we tackle the problem of one-shot imitation learning (OSIL) for tool manipulation, which involves teaching robots a tool manipulation skill with a single human demonstration video. While humans generalize such skills effortlessly, as noted above, it remains a non-trivial challenge for robots due to significant geometric variations (e.g., shape, size, topology) among tools with the same function. Previous OSIL methods [4, 5, 6, 7, 8, 9, 10] assume that tools supporting the same function share highly similar shapes or appearances.
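The functional-keypoint idea above lends itself to a simple geometric intuition: if a few corresponding 3D keypoints (e.g., tool tip, opening centre, grasp point) can be located on both the demonstration tool and the test tool, a rigid transform between them re-targets demonstrated waypoints to the new tool. The sketch below illustrates this with the Kabsch algorithm on hand-picked keypoints; the keypoint choices and numbers are illustrative assumptions, not FUNCTO's actual pipeline.

```python
# Minimal sketch (not FUNCTO's pipeline): align a few 3D functional keypoints
# of a test tool to those of the demonstration tool with the Kabsch algorithm,
# then re-target a demonstrated end-effector waypoint to the test tool.
import numpy as np

def kabsch(src: np.ndarray, dst: np.ndarray):
    """Rigid transform (R, t) that maps src keypoints onto dst keypoints."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # fix a possible reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

# Hypothetical functional keypoints (tip, opening centre, handle) in metres.
demo_kp = np.array([[0.00, 0.00, 0.10],   # pouring tip of the demo mug
                    [0.00, 0.00, 0.00],   # opening centre
                    [0.08, 0.00, 0.02]])  # handle
test_kp = np.array([[0.00, 0.00, 0.18],   # corresponding points on a teapot
                    [0.00, 0.00, 0.00],
                    [0.12, 0.00, 0.03]])

R, t = kabsch(demo_kp, test_kp)
demo_waypoint = np.array([0.00, 0.05, 0.12])   # a pose taken from the demo video
print("re-targeted waypoint:", R @ demo_waypoint + t)
```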


Robi Butler: Remote Multimodal Interactions with Household Robot Assistant

arXiv.org Artificial Intelligence

In this paper, we introduce Robi Butler, a novel household robotic system that enables multimodal interactions with remote users. Building on advanced communication interfaces, Robi Butler allows users to monitor the robot's status, send text or voice instructions, and select target objects by hand pointing. At the core of our system is a high-level behavior module, powered by Large Language Models (LLMs), that interprets multimodal instructions to generate action plans. These plans are composed of a set of open-vocabulary primitives supported by Vision Language Models (VLMs) that handle both text and pointing queries. The integration of these components allows Robi Butler to ground remote multimodal instructions in a real-world home environment in a zero-shot manner. We demonstrate the effectiveness and efficiency of the system on a variety of daily household tasks in which remote users give multimodal instructions. Additionally, we conducted a user study to analyze how multimodal interactions affect efficiency and user experience during remote human-robot interaction, and we discuss potential improvements.
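To make the architecture concrete, the following sketch shows one way an LLM-generated plan could be dispatched onto open-vocabulary primitives. The primitive names (goto, pick, place) and the plan format are assumptions for illustration, not Robi Butler's actual interface.

```python
# A minimal sketch (not Robi Butler's implementation) of dispatching an
# LLM-generated plan onto open-vocabulary primitives; names and plan format
# below are illustrative assumptions.
from typing import Callable, Dict, List, Tuple

def goto(target: str) -> None:
    print(f"[goto] navigating to '{target}'")

def pick(target: str) -> None:
    print(f"[pick] grasping '{target}' (a VLM resolves the text or pointing query)")

def place(target: str) -> None:
    print(f"[place] placing held object at '{target}'")

PRIMITIVES: Dict[str, Callable[[str], None]] = {"goto": goto, "pick": pick, "place": place}

def execute(plan: List[Tuple[str, str]]) -> None:
    """Run a plan: a list of (primitive name, open-vocabulary argument) pairs."""
    for name, arg in plan:
        PRIMITIVES[name](arg)

# A plan the high-level module might produce for "bring me the cup I am pointing at".
execute([("goto", "kitchen counter"),
         ("pick", "the cup the user pointed at"),
         ("goto", "user"),
         ("place", "user's hand")])
```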


Stable Object Placement Under Geometric Uncertainty via Differentiable Contact Dynamics

arXiv.org Artificial Intelligence

From serving a cup of coffee to carefully rearranging delicate items, stable object placement is a crucial skill for future robots. This skill is challenging due to the required accuracy, which is difficult to achieve under geometric uncertainty. We leverage differentiable contact dynamics to develop a principled method for stable object placement under geometric uncertainty. We estimate the geometric uncertainty by minimizing the discrepancy between the force-torque sensor readings and the model predictions through gradient descent. We further keep track of a belief over multiple possible geometric parameters to mitigate the gradient-based method's sensitivity to the initialization. We verify our approach in the real world on various geometric uncertainties, including the in-hand pose uncertainty of the grasped object, the object's shape uncertainty, and the environment's shape uncertainty.
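The core loop can be pictured with a toy one-dimensional example: predict the contact force from a candidate geometric parameter, descend the gradient of the squared discrepancy with the force-torque measurement, and keep a weighted belief over several initialisations. The penalty-based contact model, learning rate, and numbers below are assumptions; the paper uses a full differentiable contact dynamics model.

```python
# Toy sketch of the idea (not the paper's contact model): estimate a single
# geometric parameter (an in-hand height offset) by gradient descent on the
# squared force discrepancy, while keeping a belief over several hypotheses.
import numpy as np

STIFFNESS = 500.0          # assumed contact stiffness [N/m]
TABLE_Z   = 0.0            # assumed table height [m]

def predicted_force(height_offset: float, commanded_z: float) -> float:
    """Penalty-style normal force if the object bottom penetrates the table."""
    penetration = max(0.0, TABLE_Z - (commanded_z - height_offset))
    return STIFFNESS * penetration

def loss_and_grad(h, commanded_z, measured_f):
    f = predicted_force(h, commanded_z)
    residual = f - measured_f
    dfdh = STIFFNESS if (TABLE_Z - (commanded_z - h)) > 0 else 0.0   # d f / d h
    return residual ** 2, 2.0 * residual * dfdh

# Synthetic measurement generated with a "true" offset of 0.03 m.
commanded_z, measured_f = 0.02, predicted_force(0.03, 0.02)

hypotheses = np.array([0.0, 0.05, 0.10])     # belief support over the offset
weights = np.ones_like(hypotheses) / len(hypotheses)
for _ in range(200):                         # gradient refinement per hypothesis
    losses = []
    for i, h in enumerate(hypotheses):
        loss, grad = loss_and_grad(h, commanded_z, measured_f)
        hypotheses[i] = h - 1e-6 * grad
        losses.append(loss)
    weights *= np.exp(-np.array(losses))     # re-weight hypotheses by data fit
    weights /= weights.sum()

print("posterior mean offset:", float(weights @ hypotheses))
```

Note that the hypothesis started at 0.0 never makes contact, so its gradient is zero and it cannot improve; this is exactly the initialization sensitivity that motivates tracking a belief over multiple parameters.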


IntentionNet: Map-Lite Visual Navigation at the Kilometre Scale

arXiv.org Artificial Intelligence

How can a robot navigate through diverse environments to distant goals? This remains an open challenge due to the complexity and difficulty of designing a robot that can generalise over environments, tolerate significant mapping and positioning inaccuracies, and recover from inevitable navigation errors. While many works tackle robot navigation, few systems capable of long-range, kilometre-scale navigation exist. Classical robot systems capable of long-range navigation, like Montemerlo et al. (2008) and Kümmerle et al. (2013), use explicit maps and find paths over them using classical planning algorithms (Siegwart et al. 2011), allowing them to reach arbitrarily distant goals in principle. Inspired by modern data-driven approaches, the lower level of our system design is a neural network-based controller that maps observations directly to velocity commands, and which is learned end-to-end from real-world experience. Neural networks have the flexibility to accept a wide variety of input types, and we find that the design space for the signals used by the system's upper level to guide the lower level is large. We exploit this property to design several different types of guidance signals, which we call intentions. We find that designing the appropriate intention imbues the navigation system with specific abilities, such as the ability to tolerate significant mapping and positioning inaccuracies.
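The two-level design can be summarised as an interface: the upper level emits an intention, and the lower level maps the current observation plus that intention to a velocity command. The sketch below uses a discrete-direction intention and a hand-coded stand-in for the learned controller; both are illustrative assumptions rather than IntentionNet's implementation.

```python
# Assumed interface sketch of the two-level design: the upper level issues an
# "intention" at each step; the lower level maps (observation, intention) to a
# velocity command. The controller here is a stub for a learned policy.
from dataclasses import dataclass
from typing import List
import numpy as np

@dataclass
class VelocityCommand:
    linear: float    # m/s
    angular: float   # rad/s

INTENTIONS = ["forward", "turn_left", "turn_right"]   # one possible intention type

def lower_level_controller(image: np.ndarray, intention: str) -> VelocityCommand:
    """Stand-in for the learned policy; a real system runs a neural network."""
    turn = {"forward": 0.0, "turn_left": 0.6, "turn_right": -0.6}[intention]
    return VelocityCommand(linear=0.5, angular=turn)

def upper_level(route: List[str], step: int) -> str:
    """Stand-in planner: read the next intention off a coarse, map-lite route."""
    return route[min(step, len(route) - 1)]

route = ["forward", "forward", "turn_left", "forward"]   # longer routes scale to kilometres
for t in range(4):
    obs = np.zeros((96, 96, 3), dtype=np.uint8)          # placeholder camera frame
    cmd = lower_level_controller(obs, upper_level(route, t))
    print(t, cmd)
```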


Open Scene Graphs for Open World Object-Goal Navigation

arXiv.org Artificial Intelligence

How can we build robots for open-world semantic navigation tasks, like searching for target objects in novel scenes? While foundation models have the rich knowledge and generalisation needed for these tasks, a suitable scene representation is needed to connect them into a complete robot system. We address this with Open Scene Graphs (OSGs), a topo-semantic representation that retains and organises open-set scene information for these models, and has a structure that can be configured for different environment types. We integrate foundation models and OSGs into the OpenSearch system for Open World Object-Goal Navigation, which is capable of searching for open-set objects specified in natural language, while generalising zero-shot across diverse environments and embodiments. Our OSGs enhance reasoning with Large Language Models (LLMs), enabling robust object-goal navigation that outperforms existing LLM-based approaches. Through simulation and real-world experiments, we validate OpenSearch's generalisation across varied environments, robots, and novel instructions.
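As a rough picture of what such a representation might hold, the sketch below stores place nodes with open-vocabulary object lists and connectivity, and serialises the graph into text that an LLM could reason over. The schema and helper names are assumptions, not the actual OSG data structure.

```python
# A hand-rolled sketch of what an Open Scene Graph might store (schema assumed):
# place nodes, open-set object labels, topological connectivity, and a
# serialiser that turns the graph into text for LLM-based reasoning.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class PlaceNode:
    name: str                                             # e.g. "kitchen"
    objects: List[str] = field(default_factory=list)      # open-vocabulary labels
    connected_to: List[str] = field(default_factory=list)

@dataclass
class OpenSceneGraph:
    places: Dict[str, PlaceNode] = field(default_factory=dict)

    def add_place(self, name: str) -> None:
        self.places.setdefault(name, PlaceNode(name))

    def connect(self, a: str, b: str) -> None:
        self.places[a].connected_to.append(b)
        self.places[b].connected_to.append(a)

    def to_prompt(self) -> str:
        """Flatten the graph into text for object-goal reasoning with an LLM."""
        lines = [f"{p.name}: objects={p.objects}, doors to={p.connected_to}"
                 for p in self.places.values()]
        return "\n".join(lines)

osg = OpenSceneGraph()
for room in ("hallway", "kitchen", "living room"):
    osg.add_place(room)
osg.connect("hallway", "kitchen")
osg.connect("hallway", "living room")
osg.places["kitchen"].objects += ["espresso machine", "fruit bowl"]
print(osg.to_prompt())   # this text would be inserted into the LLM query
```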


"Set It Up!": Functional Object Arrangement with Compositional Generative Models

arXiv.org Artificial Intelligence

This paper studies the challenge of developing robots capable of understanding under-specified instructions for creating functional object arrangements, such as "set up a dining table for two"; previous arrangement approaches have focused on much more explicit instructions, such as "put object A on the table." We introduce a framework, SetItUp, for learning to interpret under-specified instructions. SetItUp takes a small number of training examples and a human-crafted program sketch to uncover arrangement rules for specific scene types. By leveraging an intermediate graph-like representation of abstract spatial relationships among objects, SetItUp decomposes the arrangement problem into two subproblems: i) learning the arrangement patterns from limited data and ii) grounding these abstract relationships into object poses. SetItUp leverages large language models (LLMs) to propose the abstract spatial relationships among objects in novel scenes as the constraints to be satisfied; then, it composes a library of diffusion models associated with these abstract relationships to find object poses that satisfy the constraints. We validate our framework on a dataset comprising study desks, dining tables, and coffee tables, with the results showing superior performance in generating physically plausible, functional, and aesthetically pleasing object arrangements compared to existing models.
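A toy stand-in for the grounding stage is to treat each abstract relation as a cost over object poses and minimise the composed cost, in place of composing diffusion models. The relations, table dimensions, and optimiser below are illustrative assumptions.

```python
# Toy stand-in for SetItUp's grounding stage: each abstract relation becomes a
# cost over 2D object positions, and we minimise their sum to ground a layout.
import numpy as np
from scipy.optimize import minimize

TABLE = np.array([0.0, 0.0])        # table centre in metres (assumed frame)

def left_of(a, b, margin=0.15):     # "a is left of b"
    return max(0.0, a[0] - b[0] + margin) ** 2

def near(a, b, dist=0.20):          # "a is near b"
    return (np.linalg.norm(a - b) - dist) ** 2

def on_table(a, half=0.30):         # keep an object within the table extent
    return np.sum(np.maximum(0.0, np.abs(a - TABLE) - half) ** 2)

def total_cost(x):
    plate, fork = x[:2], x[2:]      # flat decision vector -> two object positions
    # Abstract relations an LLM might propose for "set the table for one".
    return left_of(fork, plate) + near(fork, plate) + on_table(plate) + on_table(fork)

x0 = np.random.uniform(-0.3, 0.3, size=4)            # random initial layout
res = minimize(total_cost, x0, method="Nelder-Mead")
plate, fork = res.x[:2], res.x[2:]
print("plate at", np.round(plate, 3), "fork at", np.round(fork, 3))
```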


Neural Randomized Planning for Whole Body Robot Motion

arXiv.org Artificial Intelligence

Robot motion planning has made vast advances over the past decades, but the challenge remains: robot mobile manipulators struggle to plan long-range whole-body motion in common household environments in real time, because of high-dimensional robot configuration spaces and complex environment geometry. To tackle the challenge, this paper proposes the Neural Randomized Planner (NRP), which combines a global sampling-based motion planning (SBMP) algorithm with a local neural sampler. Intuitively, NRP uses the search structure inside the global planner to stitch together learned local sampling distributions and form a global sampling distribution adaptively. It benefits from both learning and planning. Locally, it tackles high dimensionality by learning to sample in promising regions from data, with a rich neural network representation. Globally, it composes the local sampling distributions through planning and exploits local geometric similarity to scale up to complex environments. Experiments both in simulation and on a real robot show that NRP yields superior performance compared to some of the best classical and learning-enhanced SBMP algorithms. Further, despite being trained in simulation, NRP demonstrates zero-shot transfer to a real robot operating in novel household environments, without any fine-tuning or manual adaptation.
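The stitching idea can be sketched as an RRT-style loop in which, with some probability, the next sample is drawn from a local distribution around an existing tree node (a stand-in for the learned sampler) instead of uniformly from the whole space. Everything below, from the 2D configuration space to the mixing probability, is an assumption for illustration.

```python
# Schematic sketch (not the NRP code) of mixing a learned local sampler into a
# global sampling-based planner.
import numpy as np

rng = np.random.default_rng(0)
BOUNDS = np.array([[0.0, 0.0], [10.0, 10.0]])     # 2D stand-in for config space

def local_neural_sampler(anchor: np.ndarray) -> np.ndarray:
    """Stub for the learned sampler; a real one conditions on local geometry."""
    return anchor + rng.normal(scale=0.5, size=2)

def collision_free(q: np.ndarray) -> bool:
    return True                                    # placeholder collision check

start, goal = np.array([0.5, 0.5]), np.array([9.5, 9.5])
tree = [start]
for _ in range(500):                               # RRT-style tree growth
    if rng.random() < 0.5:                         # learned local proposal
        q = local_neural_sampler(tree[rng.integers(len(tree))])
    else:                                          # uniform global proposal
        q = rng.uniform(BOUNDS[0], BOUNDS[1])
    nearest = min(tree, key=lambda n: np.linalg.norm(n - q))
    direction = q - nearest
    dist = np.linalg.norm(direction)
    step = nearest + min(0.5, dist) * direction / max(dist, 1e-9)
    if collision_free(step):
        tree.append(step)
    if np.linalg.norm(step - goal) < 0.5:
        print("reached the goal region after", len(tree), "nodes")
        break
```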


Scene Action Maps: Behavioural Maps for Navigation without Metric Information

arXiv.org Artificial Intelligence

Humans are remarkable in their ability to navigate without metric information. We can read abstract 2D maps, such as floor-plans or hand-drawn sketches, and use them to navigate in unseen rich 3D environments, without requiring prior traversals to map out these scenes in detail. We posit that this is enabled by the ability to represent the environment abstractly as interconnected navigational behaviours, e.g., "follow the corridor" or "turn right", while avoiding detailed, accurate spatial information at the metric level. We introduce the Scene Action Map (SAM), a behavioural topological graph, and propose a learnable map-reading method, which parses a variety of 2D maps into SAMs. Map-reading extracts salient information about navigational behaviours from the overlooked wealth of pre-existing, abstract and inaccurate maps, ranging from floor-plans to sketches. We evaluate the performance of SAMs for navigation by building and deploying a behavioural navigation stack on a quadrupedal robot. Videos and more information are available at: https://scene-action-maps.github.io.
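A SAM can be pictured as a graph whose edges carry behaviour labels; planning then amounts to finding a path and reading off its behaviours. The sketch below uses a hand-written graph and breadth-first search; the node names and behaviours are illustrative assumptions.

```python
# Small sketch of a Scene Action Map (exact schema assumed): nodes are places,
# edges carry navigational behaviours, and a route is a behaviour sequence.
from collections import deque
from typing import Dict, List, Tuple

# adjacency: node -> list of (neighbour, behaviour to execute to get there)
SAM: Dict[str, List[Tuple[str, str]]] = {
    "lobby":      [("corridor_A", "follow the corridor")],
    "corridor_A": [("junction_1", "follow the corridor")],
    "junction_1": [("corridor_B", "turn right"), ("stairs", "turn left")],
    "corridor_B": [("office_12", "follow the corridor")],
}

def behaviour_plan(start: str, goal: str) -> List[str]:
    """Breadth-first search over the behavioural graph; returns the behaviours."""
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        node, plan = frontier.popleft()
        if node == goal:
            return plan
        for neighbour, behaviour in SAM.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                frontier.append((neighbour, plan + [behaviour]))
    return []

print(behaviour_plan("lobby", "office_12"))
# ['follow the corridor', 'follow the corridor', 'turn right', 'follow the corridor']
```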


Differentiable Particles for General-Purpose Deformable Object Manipulation

arXiv.org Artificial Intelligence

Deformable object manipulation is a long-standing challenge in robotics. While existing approaches often focus narrowly on a specific type of object, we seek a general-purpose algorithm capable of manipulating many different types of objects: beans, rope, cloth, liquid, and so on. One key difficulty is a suitable representation, rich enough to capture object shape and dynamics for manipulation, yet simple enough to be acquired effectively from sensor data. Specifically, we propose Differentiable Particles (DiPac), a new algorithm for deformable object manipulation. DiPac represents a deformable object as a set of particles and uses a differentiable particle dynamics simulator to reason about robot manipulation. To find the best manipulation action, DiPac combines learning, planning, and trajectory optimization through differentiable trajectory tree optimization. Differentiable dynamics provides significant benefits and enables DiPac to (i) estimate the dynamics parameters efficiently, thereby narrowing the sim-to-real gap, and (ii) choose the best action by backpropagating the gradient along sampled trajectories. Both simulation and real-robot experiments show promising results. DiPac handles a variety of object types. By combining planning and learning, DiPac outperforms both pure model-based planning methods and pure data-driven learning methods. In addition, DiPac is robust and adapts to changes in dynamics, thereby enabling the transfer of an expert policy from one object to another with different physical properties, e.g., from a rigid rod to a deformable rope.
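The benefit of differentiable dynamics can be illustrated with a toy rollout: push a set of 1D particles, measure how far their mean lands from a target, and improve the push by backpropagating through the rollout. Here JAX stands in for DiPac's differentiable particle simulator, and the dynamics are deliberately simplistic.

```python
# Toy illustration (not DiPac's simulator): roll particles forward under a
# pushing action and improve the action by backpropagating through the rollout.
import jax
import jax.numpy as jnp

TARGET = 1.0          # desired mean particle position [m]
DT, STEPS = 0.05, 20  # rollout discretisation

def rollout_cost(push_velocity, particles):
    """Simple damped dynamics: every step each particle moves with the push."""
    def step(positions, _):
        positions = positions + DT * push_velocity * jnp.exp(-0.1 * positions)
        return positions, None
    final, _ = jax.lax.scan(step, particles, None, length=STEPS)
    return (jnp.mean(final) - TARGET) ** 2

particles = jnp.linspace(-0.2, 0.2, 16)      # a 1D "bean pile" of particles
push = 0.1
for _ in range(100):                         # gradient descent on the action
    g = jax.grad(rollout_cost)(push, particles)
    push = push - 0.5 * g
print("optimised push velocity:", float(push))
```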


INVIGORATE: Interactive Visual Grounding and Grasping in Clutter

arXiv.org Artificial Intelligence

This paper presents INVIGORATE, a robot system that interacts with humans through natural language and grasps a specified object in clutter. The objects may occlude, obstruct, or even stack on top of one another. INVIGORATE addresses several challenges: (i) inferring the target object among other occluding objects, from input language expressions and RGB images, (ii) inferring object blocking relationships (OBRs) from the images, and (iii) synthesizing a multi-step plan to ask questions that disambiguate the target object and to grasp it successfully. We train separate neural networks for object detection, for visual grounding, for question generation, and for OBR detection and grasping. They allow for unrestricted object categories and language expressions, subject to the training datasets. However, errors in visual perception and ambiguity in human language are inevitable and negatively impact the robot's performance. To overcome these uncertainties, we build a partially observable Markov decision process (POMDP) that integrates the learned neural network modules. Through approximate POMDP planning, the robot tracks the history of observations and asks disambiguation questions in order to achieve a near-optimal sequence of actions that identify and grasp the target object. INVIGORATE combines the benefits of model-based POMDP planning and data-driven deep learning. Preliminary experiments with INVIGORATE on a Fetch robot show significant benefits of this integrated approach to object grasping in clutter with natural language interactions. A demonstration video is available at https://youtu.be/zYakh80SGcU.
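A stripped-down version of the decision loop reads as follows: keep a belief over which detected object the user means, ask a disambiguation question while the belief is uncertain, and grasp once it is confident. The confidence threshold and the answer model below are assumptions, not INVIGORATE's learned models or its POMDP solver.

```python
# Stripped-down sketch of the ask-or-grasp loop: Bayes-update a belief over
# candidate target objects from yes/no answers, grasp when confident enough.
import numpy as np

candidates = ["red mug", "blue mug", "bowl"]
belief = np.array([0.45, 0.40, 0.15])        # e.g. from visual grounding scores
true_target = 1                              # the user actually means the blue mug

def user_answers_yes(question_idx: int) -> bool:
    """Simulated, perfectly truthful user; a real system models noisy answers."""
    return question_idx == true_target

while belief.max() < 0.8:                    # uncertain -> ask, confident -> grasp
    q = int(np.argmax(belief))               # greedy question about the top candidate
    yes = user_answers_yes(q)
    print(f"Robot: do you mean the {candidates[q]}?  User: {'yes' if yes else 'no'}")
    p_yes = np.where(np.arange(len(belief)) == q, 0.9, 0.1)   # answer model
    belief *= p_yes if yes else (1.0 - p_yes)                 # Bayes update
    belief /= belief.sum()

print(f"Robot grasps the {candidates[int(np.argmax(belief))]}")
```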