Collaborating Authors

Song, Daeun


AutoSpatial: Visual-Language Reasoning for Social Robot Navigation through Efficient Spatial Reasoning Learning

arXiv.org Artificial Intelligence

We present AutoSpatial, an efficient approach with structured spatial grounding that enhances VLMs' spatial reasoning. By combining minimal manual supervision with large-scale auto-labeling of Visual Question-Answering (VQA) pairs, our approach tackles the challenge of VLMs' limited spatial understanding in social navigation tasks. By applying a hierarchical two-round VQA strategy during training, AutoSpatial achieves both a global and a detailed understanding of scenarios, demonstrating more accurate spatial perception, movement prediction, Chain-of-Thought (CoT) reasoning, final actions, and explanations than other state-of-the-art (SOTA) approaches. These five components are essential for comprehensive social navigation reasoning. Our approach was evaluated using both expert systems (GPT-4o, Gemini 2.0 Flash, and Claude 3.5 Sonnet), which provided cross-validation scores, and human evaluators, who assigned relative rankings to compare model performance across four key aspects. Augmented by the enhanced spatial reasoning capabilities, AutoSpatial demonstrates substantial improvements in the averaged cross-validation scores from the expert systems: perception & prediction (up to 10.71%), reasoning (up to 16.26%), action (up to 20.50%), and explanation (up to 18.73%), compared to baseline models trained only on manually annotated data.
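The abstract does not include code; the sketch below illustrates what a hierarchical two-round VQA loop of this kind might look like. `query_vlm`, the question wording, and the per-component prompts are illustrative assumptions, not the paper's actual interface.

```python
# Minimal sketch of a hierarchical two-round VQA loop, as described in the
# abstract. `query_vlm` is a hypothetical stand-in for the fine-tuned model;
# the round-1/round-2 question wording is illustrative, not from the paper.

def query_vlm(image, question: str) -> str:
    """Placeholder for a call to the fine-tuned VLM."""
    return f"<answer to: {question}>"

def two_round_vqa(image):
    # Round 1: coarse, global understanding of the social scene.
    global_answer = query_vlm(image, "Describe the overall scene and the pedestrians.")

    # Round 2: detailed follow-ups conditioned on the global answer,
    # covering the five reasoning components named in the abstract.
    components = ["perception", "movement prediction", "CoT reasoning",
                  "final action", "explanation"]
    detailed = {
        c: query_vlm(image, f"Given: {global_answer}\nProvide the {c} for navigation.")
        for c in components
    }
    return global_answer, detailed

print(two_round_vqa(None))
```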


Social-LLaVA: Enhancing Robot Navigation through Human-Language Reasoning in Social Spaces

arXiv.org Artificial Intelligence

Most existing social robot navigation techniques leverage either hand-crafted rules or human demonstrations to connect robot perception to socially compliant actions. However, there remains a significant gap in effectively translating perception into socially compliant actions, much like the reasoning humans naturally perform in dynamic environments. Considering the recent success of Vision-Language Models (VLMs), we propose using language to bridge this gap in human-like reasoning between perception and socially aware robot actions. We create a vision-language dataset, Social robot Navigation via Explainable Interactions (SNEI), featuring 40K human-annotated Visual Question Answers (VQAs) based on 2K human-robot social interactions in unstructured, crowded public spaces, spanning perception, prediction, chain-of-thought reasoning, action, and explanation. We fine-tune a VLM, Social-LLaVA, on SNEI to demonstrate the practical application of our dataset. Social-LLaVA outperforms state-of-the-art models such as GPT-4V and Gemini, based on the average of fifteen different human-judge scores across 50 VQAs. Deployed onboard a mobile robot, Social-LLaVA enables human-like reasoning, marking a promising step toward socially compliant robot navigation in dynamic public spaces through language reasoning.
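To make the dataset structure concrete, here is a hypothetical sketch of what a single SNEI sample covering the five annotation categories could look like; the field names and schema are assumptions, not the released format.

```python
import json

# Hypothetical structure of one SNEI sample; field names are assumptions
# based on the five annotation categories named in the abstract, not the
# dataset's actual schema.
sample = {
    "interaction_id": "snei_000001",
    "image": "frames/000001.jpg",
    "vqa": {
        "perception":  {"q": "Who is near the robot?",           "a": "Two pedestrians ..."},
        "prediction":  {"q": "Where will they move?",            "a": "They will cross ..."},
        "reasoning":   {"q": "How should the robot reason?",     "a": "Step 1: ..."},
        "action":      {"q": "What should the robot do?",        "a": "Slow down and ..."},
        "explanation": {"q": "Why is this socially compliant?",  "a": "It yields ..."},
    },
}
print(json.dumps(sample, indent=2))
```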


GND: Global Navigation Dataset with Multi-Modal Perception and Multi-Category Traversability in Outdoor Campus Environments

arXiv.org Artificial Intelligence

Navigating large-scale outdoor environments requires complex reasoning about geometric structures, environmental semantics, and terrain characteristics, which are typically captured by onboard sensors such as LiDAR and cameras. While current mobile robots can navigate such environments using pre-defined, high-precision maps and hand-crafted rules tailored to the specific environment, they lack the commonsense reasoning capabilities that most humans possess when navigating unknown outdoor spaces. To address this gap, we introduce the Global Navigation Dataset (GND), a large-scale dataset that integrates multi-modal sensory data, including 3D LiDAR point clouds and RGB and 360-degree images, as well as multi-category traversability maps (pedestrian walkways, vehicle roadways, stairs, off-road terrain, and obstacles) from ten university campuses. These environments encompass a variety of parks, urban settings, elevation changes, and campus layouts of different scales. The dataset covers approximately 2.7 km² and includes at least 350 buildings in total. We also present a set of novel applications of GND that showcase its utility for global robot navigation, such as map-based global navigation, mapless navigation, and global place recognition.
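As a rough illustration of how a multi-category traversability map might be consumed, the sketch below converts a category-labeled grid into a per-cell cost map; the label encoding and the costs are assumptions, not GND's released format.

```python
import numpy as np

# Illustrative handling of a multi-category traversability map; the label
# encoding and per-category costs below are assumptions, not GND's format.
CATEGORIES = {0: "pedestrian walkway", 1: "vehicle roadway", 2: "stairs",
              3: "off-road terrain", 4: "obstacle"}
COST = {0: 1.0, 1: 3.0, 2: 5.0, 3: 2.0, 4: np.inf}  # per-cell traversal cost

trav_map = np.random.randint(0, 5, size=(100, 100))       # stand-in label grid
cost_map = np.vectorize(COST.get)(trav_map).astype(float)  # lookup per cell

print(cost_map[0, :5])  # first few cell costs; inf marks obstacles
```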


VLM-Social-Nav: Socially Aware Robot Navigation through Scoring using Vision-Language Models

arXiv.org Artificial Intelligence

We propose VLM-Social-Nav, a novel Vision-Language Model (VLM) based navigation approach to compute a robot's motion in human-centered environments. Our goal is to make real-time decisions on robot actions that are socially compliant with human expectations. We utilize a perception model to detect important social entities and prompt a VLM to generate guidance for socially compliant robot behavior. VLM-Social-Nav uses a VLM-based scoring module that computes a cost term used by the underlying planner to produce socially appropriate and effective robot actions. Our overall approach reduces reliance on large training datasets and enhances adaptability in decision-making. In practice, it results in improved socially compliant navigation in human-shared environments. We demonstrate and evaluate our system in four different real-world social navigation scenarios with a Turtlebot robot. We observe at least a 27.38% improvement in the average success rate and a 19.05% improvement in the average collision rate across the four social navigation scenarios. Our user study scores show that VLM-Social-Nav generates the most socially compliant navigation behavior.
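The core scoring idea, augmenting the planner's objective with a VLM-derived social cost term, might be sketched as follows; `social_cost_from_vlm`, `planner_cost`, the action set, and the weight are all hypothetical placeholders, not the paper's implementation.

```python
# Minimal sketch of the scoring idea: the planner's objective is augmented
# with a VLM-derived social cost, and the lowest total-cost action wins.

def social_cost_from_vlm(action: str) -> float:
    """Placeholder: 0 = fully socially compliant, 1 = non-compliant."""
    return {"pass_left": 0.1, "pass_right": 0.7, "stop": 0.3}[action]

def planner_cost(action: str) -> float:
    """Placeholder geometric/progress cost from the underlying planner."""
    return {"pass_left": 0.5, "pass_right": 0.4, "stop": 1.0}[action]

LAMBDA = 1.0  # weight of the social term (assumed)
actions = ["pass_left", "pass_right", "stop"]
best = min(actions, key=lambda a: planner_cost(a) + LAMBDA * social_cost_from_vlm(a))
print(best)  # -> "pass_left"
```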


DTG: Diffusion-based Trajectory Generation for Mapless Global Navigation

arXiv.org Artificial Intelligence

We present DTG, a novel end-to-end diffusion-based trajectory generation method for mapless global navigation in challenging outdoor scenarios with occlusions and unstructured off-road features such as grass, buildings, and bushes. Given a distant goal, our approach computes a trajectory that (1) minimizes the travel distance to the goal and (2) maximizes traversability by choosing paths that do not lie in undesirable areas. Specifically, we present a novel Conditional RNN (CRNN) for diffusion models to efficiently generate trajectories. Furthermore, we propose an adaptive training method that ensures the diffusion model generates more traversable trajectories. We evaluate our method in various outdoor scenes and compare its performance with other global navigation algorithms on a Husky robot. In practice, we observe at least a 15% improvement in traveling distance and around a 7% improvement in traversability.
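For intuition, the toy sketch below runs a DDPM-style reverse (denoising) process over a waypoint trajectory conditioned on a goal; the `denoiser` stub stands in for the paper's CRNN, and the noise schedule and shapes are assumptions.

```python
import numpy as np

# Toy DDPM-style reverse process for trajectory generation: start from
# Gaussian noise and iteratively denoise into a 2D waypoint trajectory,
# conditioned on a distant goal. All schedules and shapes are assumptions.

T, H = 50, 16                        # diffusion steps, trajectory waypoints
betas = np.linspace(1e-4, 0.02, T)   # noise schedule
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)

def denoiser(traj, t, cond):
    """Stand-in for the CRNN noise predictor eps_theta(x_t, t, cond)."""
    return 0.1 * traj  # placeholder prediction

cond = {"goal": np.array([10.0, 5.0])}  # conditioning (e.g., distant goal)
x = np.random.randn(H, 2)               # start from Gaussian noise
for t in reversed(range(T)):
    eps = denoiser(x, t, cond)
    # Standard DDPM posterior mean, plus noise except at the final step.
    x = (x - betas[t] / np.sqrt(1 - alpha_bar[t]) * eps) / np.sqrt(alphas[t])
    if t > 0:
        x += np.sqrt(betas[t]) * np.random.randn(H, 2)

print(x.shape)  # (16, 2): a 2D waypoint trajectory
```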


TSP-Bot: Robotic TSP Pen Art using High-DoF Manipulators

arXiv.org Artificial Intelligence

TSP art is an art form that draws an image using piecewise-continuous line segments. This paper presents a robotic pen drawing system capable of creating complicated TSP pen art on a planar surface using multiple colors. The system begins by converting a colored raster image into a set of points that represent the image's tone, which can be controlled by adjusting the point density. Next, the system finds a piecewise-continuous linear path that visits each point exactly once, which is equivalent to solving a Traveling Salesman Problem (TSP). The path is then simplified to fewer points using bounded approximation, and smoothed and optimized using Bézier spline curves with bounded curvature. Our robotic drawing system, consisting of one or two manipulators with fingered grippers and a mobile platform, performs the drawing task by following the resulting complex and sophisticated path composed of thousands of TSP sites. As a result, our system can draw complicated and visually pleasing TSP pen art.
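The path-finding step might be approximated as follows: given stipple points, a greedy nearest-neighbor tour stands in for a proper TSP solver (point sampling from tone, bounded simplification, and Bézier smoothing are omitted).

```python
import numpy as np

# Toy version of the pipeline's middle step: find a single path visiting
# each stipple point once. A greedy nearest-neighbor heuristic stands in
# for the paper's TSP solver.

rng = np.random.default_rng(0)
pts = rng.random((200, 2))        # stand-in stipple points in [0, 1]^2

unvisited = set(range(len(pts)))
tour = [unvisited.pop()]          # arbitrary starting point
while unvisited:
    last = pts[tour[-1]]
    nxt = min(unvisited, key=lambda i: np.linalg.norm(pts[i] - last))
    unvisited.remove(nxt)
    tour.append(nxt)

print(tour[:10])  # first few indices of the drawing order
```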


Stroke-based Rendering and Planning for Robotic Performance of Artistic Drawing

arXiv.org Artificial Intelligence

We present a new robotic drawing system based on stroke-based rendering (SBR). Our motivation is the artistic quality of the whole performance: not only should the generated strokes in the final drawing resemble the input image, but the stroke sequence should also exhibit a human artist's planning process. Thus, when a robot executes the drawing task, both the drawing result and the robot's execution look artistic. Our SBR system is based on image segmentation and depth estimation. It generates the drawing strokes in an order that allows the intended shape to be perceived quickly and its detailed features to be filled in and emerge gradually as observed by a human. This ordering represents a stroke plan that the drawing robot should follow to create an artistic rendering of images. We experimentally demonstrate that our SBR-based drawing produces visually pleasing artistic images and that our robotic system can replicate the result with a proper stroke sequence.
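A minimal sketch of the coarse-to-fine ordering idea, assuming each stroke carries a region area, a depth estimate, and a detail flag; the fields and the sort key are illustrative, not the paper's actual planner.

```python
# Illustrative stroke ordering in the spirit of the abstract: draw large,
# shape-defining strokes first so the intended shape is perceived quickly,
# then fill in detail. The stroke fields and sort key are assumptions.

strokes = [
    {"id": 0, "region_area": 1200, "depth": 0.8, "is_detail": False},
    {"id": 1, "region_area": 90,   "depth": 0.2, "is_detail": True},
    {"id": 2, "region_area": 700,  "depth": 0.5, "is_detail": False},
    {"id": 3, "region_area": 40,   "depth": 0.2, "is_detail": True},
]

# Background (far, large) strokes first; detail strokes last.
plan = sorted(strokes, key=lambda s: (s["is_detail"], -s["depth"], -s["region_area"]))
print([s["id"] for s in plan])  # -> [0, 2, 1, 3]
```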