Touchdown


A Limitations

Neural Information Processing Systems

This work studies how language descriptions in unlabeled demonstrations benefit learning from observations. The environments used in this work are simulations. Despite variety across grounding challenges, performance on these environments does not necessarily transfer to other applications such as robotic control. A promising direction for future work is to investigate whether dynamics modeling on language observations shows similar benefits in other applications. The methodology in this work is based on reinforcement learning, which may learn uninterpretable policies that achieve the objective in surprising ways (e.g. a robot that bumps along the cabinet while fetching dishes to clean).


Improving Policy Learning via Language Dynamics Distillation

Neural Information Processing Systems

Recent work has shown that augmenting environments with language descriptions improves policy learning. However, for environments with complex language abstractions, learning how to ground language to observations is difficult due to sparse, delayed rewards. We propose Language Dynamics Distillation (LDD), which pretrains a model to predict environment dynamics given demonstrations with language descriptions, and then fine-tunes these language-aware pretrained representations via reinforcement learning (RL). In this way, the model is trained to both maximize expected reward and retain knowledge about how language relates to environment dynamics. On SILG, a benchmark of five tasks with language descriptions that evaluate distinct generalization challenges on unseen environments (NetHack, ALFWorld, RTFM, Messenger, and Touchdown), LDD outperforms tabula-rasa RL, VAE pretraining, and methods that learn from unlabeled demonstrations in inverse RL and reward shaping with pretrained experts. In our analyses, we show that language descriptions in demonstrations improve sample-efficiency and generalization across environments, and that dynamics modeling with expert demonstrations is more effective than with non-experts.
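
To make the two-phase recipe concrete, here is a minimal PyTorch sketch of the idea; the module sizes, the MSE dynamics objective, the distillation term, and the mixing weight `beta` are all illustrative assumptions, not the paper's actual architecture or losses.

```python
# Hypothetical sketch of the LDD recipe: pretrain a language-aware dynamics
# model on demonstrations, then fine-tune with RL plus a distillation term.
import torch
import torch.nn as nn

class LanguageDynamicsModel(nn.Module):
    """Encodes (observation, text) pairs; heads for dynamics and policy."""
    def __init__(self, obs_dim=128, text_dim=64, hidden=256, n_actions=8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim + text_dim, hidden), nn.ReLU())
        self.dynamics_head = nn.Linear(hidden, obs_dim)  # pretraining only
        self.policy_head = nn.Linear(hidden, n_actions)  # RL fine-tuning

    def forward(self, obs, text):
        return self.encoder(torch.cat([obs, text], dim=-1))

# Phase 1: predict next observations from unlabeled demonstrations.
def pretrain_step(model, opt, obs, text, next_obs):
    loss = nn.functional.mse_loss(model.dynamics_head(model(obs, text)), next_obs)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Phase 2: RL fine-tuning with a distillation term toward a frozen copy of
# the pretrained encoder, so language-dynamics knowledge is retained.
def finetune_loss(model, frozen, obs, text, rl_loss, beta=0.1):
    distill = nn.functional.mse_loss(model(obs, text), frozen(obs, text).detach())
    return rl_loss + beta * distill
```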


1 Details about Extended Touchdown Dataset

Neural Information Processing Systems

First, we choose panorama IDs from the test split of the Touchdown dataset and download the panoramas in equirectangular projection. Then we slice each panorama into eight images and reproject them to perspective projection. Next, we place touchdown markers at the target locations in the panoramas and write language descriptions instructing people to find them. We then ask volunteers to double-check the annotations by locating the target using the language we annotated. All of these data are collected from Street View panoramas of New York City.
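
The slicing step can be illustrated with a rough NumPy sketch: sample eight perspective views at 45-degree yaw intervals from one equirectangular panorama. The 90-degree field of view, output resolution, and nearest-neighbour sampling below are our assumptions, not the authors' exact settings.

```python
# Rough sketch: slice an equirectangular panorama into 8 perspective views.
import numpy as np

def perspective_slice(pano, yaw_deg, fov_deg=90.0, out_size=400):
    """pano: HxWx3 equirectangular image; returns one perspective view."""
    H, W = pano.shape[:2]
    f = (out_size / 2) / np.tan(np.radians(fov_deg) / 2)  # pinhole focal length
    u, v = np.meshgrid(np.arange(out_size), np.arange(out_size))
    # Ray directions in the camera frame (x right, y down, z forward).
    x, y = u - out_size / 2, v - out_size / 2
    z = np.full_like(u, f, dtype=float)
    yaw = np.radians(yaw_deg)                             # rotate about vertical axis
    xr = x * np.cos(yaw) + z * np.sin(yaw)
    zr = -x * np.sin(yaw) + z * np.cos(yaw)
    lon = np.arctan2(xr, zr)                              # longitude in [-pi, pi]
    lat = np.arcsin(y / np.sqrt(x**2 + y**2 + z**2))      # latitude in [-pi/2, pi/2]
    px = ((lon / np.pi + 1) / 2 * (W - 1)).astype(int)    # nearest-neighbour lookup
    py = ((lat / (np.pi / 2) + 1) / 2 * (H - 1)).astype(int)
    return pano[py, px]

views = [perspective_slice(np.zeros((1000, 2000, 3)), yaw)
         for yaw in range(0, 360, 45)]                    # eight slices per panorama
```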


SIRI: Spatial Relation Induced Network For Spatial Description Resolution

Neural Information Processing Systems

Spatial Description Resolution is a language-guided localization task: given a language description, the target location must be found in a panoramic street view. Explicitly characterizing object-level relationships while distilling spatial relationships is currently absent from existing methods but crucial to this task. Mimicking humans, who sequentially traverse spatial relationship words and objects with a first-person view to locate their target, we propose a novel Spatial Relation Induced (SIRI) network. Specifically, visual features are first correlated at an implicit object level in a projected latent space; they are then distilled by each spatial relationship word, yielding a differently activated feature for each spatial relationship. Further, we introduce global position priors to compensate for the absence of positional information, which can otherwise cause global positional reasoning ambiguities. The linguistic and visual features are concatenated to finalize the target localization. Experimental results on Touchdown show that our method outperforms the state of the art by around 24% in accuracy, measured within an 80-pixel radius. Our method also generalizes well on our proposed extended dataset, collected under the same settings as Touchdown.
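
As a loose illustration of the distillation-by-relation-word idea, the sketch below gates a visual feature map once per spatial relationship word and appends a positional prior before concatenation. All module names, the sigmoid-gating choice, and the dimensions are hypothetical, not SIRI's actual design.

```python
# Hypothetical sketch: per-relation-word feature distillation + position prior.
import torch
import torch.nn as nn

class SpatialDistillBlock(nn.Module):
    def __init__(self, vis_dim=256, word_dim=256):
        super().__init__()
        self.gate = nn.Linear(word_dim, vis_dim)

    def forward(self, vis_feats, relation_word):
        # vis_feats: (B, vis_dim, H, W); relation_word: (B, word_dim)
        g = torch.sigmoid(self.gate(relation_word))[:, :, None, None]
        return vis_feats * g   # one differently activated map per relation word

B, D, H, W = 2, 256, 50, 100
vis = torch.randn(B, D, H, W)
block = SpatialDistillBlock()
relations = [torch.randn(B, 256) for _ in ("left of", "behind")]  # word embeddings
maps = [block(vis, w) for w in relations]            # per-relation activations
xs = torch.linspace(-1, 1, W).expand(B, 1, H, W)     # global position prior (x axis)
fused = torch.cat(maps + [xs], dim=1)                # concat for localization head
```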



SILG: The Multi-environment Symbolic Interactive Language Grounding Benchmark

Neural Information Processing Systems

How do we build unified models that apply across multiple environments? We propose the multi-environment Symbolic Interactive Language Grounding benchmark (SILG), which unifies a collection of diverse grounded language learning environments under a common interface. SILG consists of grid-world environments that require generalization to new dynamics, entities, and partially observed worlds (RTFM, Messenger, NetHack), as well as symbolic counterparts of visual worlds that require interpreting rich natural language with respect to complex scenes (ALFWorld, Touchdown). Together, these environments provide diverse grounding challenges in richness of observation space, action space, language specification, and plan complexity. In addition, we propose the first shared model architecture for RL on these environments, and use SILG to evaluate recent advances such as egocentric local convolution, recurrent state-tracking, entity-centric attention, and pretrained language models.
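
One way to picture the "common interface" claim is the sketch below, where every environment emits a symbolic grid plus text fields so that a single model and training loop can serve all five environments. The class and field names are illustrative, not SILG's actual API.

```python
# Hypothetical sketch of a common interface across symbolic grounding envs.
from dataclasses import dataclass
from typing import Protocol
import numpy as np

@dataclass
class SymbolicObs:
    grid: np.ndarray       # (H, W, layers) of symbol ids
    text: str              # task / manual / instruction text
    position: tuple        # agent (row, col)

class GroundingEnv(Protocol):
    def reset(self) -> SymbolicObs: ...
    def step(self, action: int) -> tuple[SymbolicObs, float, bool]: ...

def run_episode(env: GroundingEnv, policy) -> float:
    """One shared RL loop works for any environment behind the interface."""
    obs, total, done = env.reset(), 0.0, False
    while not done:
        obs, reward, done = env.step(policy(obs))
        total += reward
    return total
```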




Realtime Limb Trajectory Optimization for Humanoid Running Through Centroidal Angular Momentum Dynamics

arXiv.org Artificial Intelligence

One of the essential aspects of humanoid robot running is determining the limb-swinging trajectories. During the flight phases, where ground reaction forces are not available for regulation, the limb-swinging trajectories are critical for the stability of the next stance phase. Due to the conservation of angular momentum, improper leg and arm swinging results in highly tilted and unsustainable body configurations at the next stance-phase landing. In such cases, the robotic system fails to maintain locomotion regardless of the stability of the center-of-mass trajectories. This problem is more apparent for fast trajectories with long flight times. This paper proposes a real-time nonlinear limb trajectory optimization problem for humanoid running. The optimization problem is tested on two different humanoid robot models, and the generated trajectories are verified using a running algorithm for both robots in a simulation environment.
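
A toy 1-D caricature of the underlying principle may help: total angular momentum is conserved in flight, so the swing profile is chosen to absorb it and leave the trunk near upright at touchdown. Every constant and the cubic velocity parameterization below are invented for illustration and are not the paper's formulation.

```python
# Toy sketch: pick a swing-leg velocity profile that zeroes trunk pitch drift.
import numpy as np
from scipy.optimize import minimize

L_TOTAL = 2.0    # conserved total angular momentum at take-off (kg m^2/s), assumed
I_TRUNK = 8.0    # trunk inertia about the pitch axis (kg m^2), assumed
I_LEG = 1.5      # effective swing-leg inertia (kg m^2), assumed
T, N = 0.4, 21   # flight time (s) and number of samples
t = np.linspace(0.0, T, N)

def landing_pitch(coeffs):
    """Trunk pitch accumulated over flight for a cubic swing-velocity profile."""
    leg_vel = np.polyval(coeffs, t)                     # swing-leg angular velocity
    trunk_rate = (L_TOTAL - I_LEG * leg_vel) / I_TRUNK  # momentum bookkeeping
    return np.mean(trunk_rate) * T                      # crude time integral

# Minimize landing pitch plus a small effort penalty on the swing profile.
cost = lambda c: landing_pitch(c) ** 2 + 1e-3 * np.mean(np.polyval(c, t) ** 2)
res = minimize(cost, x0=np.zeros(4))
print("landing trunk pitch (rad):", landing_pitch(res.x))
```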


Model predictive control-based trajectory generation for agile landing of unmanned aerial vehicle on a moving boat

arXiv.org Artificial Intelligence

This paper proposes a novel trajectory generation method based on Model Predictive Control (MPC) for agile landing of an Unmanned Aerial Vehicle (UAV) onto the deck of an Unmanned Surface Vehicle (USV) in harsh conditions. The trajectory generation exploits state predictions of the USV to create periodically updated trajectories, allowing a multirotor UAV to land precisely on the deck of a moving USV even when the deck's inclination is continuously changing. We use an MPC-based scheme to create trajectories that consider both the UAV dynamics and the predicted states of the USV up to the first derivative of position and orientation. Compared to existing approaches, our method dynamically modifies the penalization matrices to precisely follow the relevant states in each flight phase. In particular, during the landing maneuver the UAV synchronizes its attitude with the USV's, allowing for fast landing on a tilted deck. Simulations show the method's reliability in various sea conditions up to rough sea (wave height 4 m), outperforming state-of-the-art methods in landing speed and accuracy, with twice the precision on average. Finally, real-world experiments validate the simulation results, demonstrating robust landings on a moving USV, with all computations performed in real time onboard the UAV.
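
The phase-dependent penalization idea can be sketched with a toy 1-D tracking problem in cvxpy: the state penalty grows toward touchdown, so tracking of the predicted deck motion tightens during the landing phase. The dynamics, horizon, weight schedule, and deck prediction below are assumptions for illustration, not the paper's UAV model.

```python
# Toy sketch: MPC with a time-varying state penalty that tightens at touchdown.
import numpy as np
import cvxpy as cp

N, dt = 30, 0.1
usv_pred = 0.5 * np.sin(0.3 * np.arange(N + 1))   # predicted deck height (assumed)
x = cp.Variable(N + 1)                            # UAV height over the horizon
u = cp.Variable(N)                                # vertical velocity command
Q = np.linspace(1.0, 50.0, N + 1)                 # penalty grows toward touchdown

cost = cp.sum(cp.multiply(Q, cp.square(x - usv_pred))) + 0.1 * cp.sum_squares(u)
constraints = [x[0] == 2.0]                       # initial UAV height (assumed)
constraints += [x[k + 1] == x[k] + dt * u[k] for k in range(N)]  # integrator model
constraints += [cp.abs(u) <= 3.0]                 # actuator limit (assumed)
cp.Problem(cp.Minimize(cost), constraints).solve()
print("terminal gap to deck (m):", x.value[-1] - usv_pred[-1])
```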