attention space
RoboSeek: You Need to Interact with Your Objects
Peng, Yibo, Yang, Jiahao, Yan, Shenhao, Huang, Ziyu, Li, Shuang, Cui, Shuguang, Zhao, Yiming, Han, Yatong
Optimizing and refining action execution through exploration and interaction is a promising approach to robotic manipulation. However, practical approaches to interaction-driven robotic learning are still underexplored, particularly for long-horizon tasks where sequential decision-making, physical constraints, and perceptual uncertainties pose significant challenges. Motivated by embodied cognition theory, we propose RoboSeek, a framework for embodied action execution that leverages interactive experience to accomplish manipulation tasks. RoboSeek optimizes prior knowledge from high-level perception models through closed-loop training in simulation and achieves robust real-world execution via a real2sim2real transfer pipeline. Specifically, we first replicate real-world environments in simulation using 3D reconstruction to provide visually and physically consistent environments. We then train policies in simulation using reinforcement learning and the cross-entropy method, leveraging visual priors. The learned policies are subsequently deployed on real robotic platforms for execution. RoboSeek is hardware-agnostic and is evaluated on multiple robotic platforms across eight long-horizon manipulation tasks involving sequential interactions, tool use, and object handling. Our approach achieves an average success rate of 79%, significantly outperforming baselines whose success rates remain below 50%, highlighting its generalization and robustness across tasks and platforms. Experimental results validate the effectiveness of our training framework in complex, dynamic real-world settings and demonstrate the stability of the proposed real2sim2real transfer mechanism, paving the way for more generalizable embodied robotic learning. Project Page: https://russderrick.github.io/Roboseek/
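The abstract names the cross-entropy method without describing how it is configured, so the following is only a generic sketch of that technique, not RoboSeek's implementation. It assumes a diagonal Gaussian sampler whose initial mean stands in for a prior guess; `cross_entropy_method`, `toy_objective`, and every parameter value are hypothetical and chosen for illustration.

```python
import random
import statistics


def cross_entropy_method(objective, mean, std, n_samples=64, n_elite=8,
                         n_iters=30, seed=0):
    """Maximize `objective` over real vectors with a diagonal Gaussian sampler.

    `mean` and `std` describe the initial sampling distribution. In this
    sketch the initial mean plays the role of a prior guess (an assumption,
    not something the abstract specifies).
    """
    rng = random.Random(seed)
    mean, std = list(mean), list(std)
    for _ in range(n_iters):
        # Draw candidate parameter vectors around the current mean.
        samples = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                   for _ in range(n_samples)]
        # Keep the highest-scoring candidates (the "elite" set).
        elites = sorted(samples, key=objective, reverse=True)[:n_elite]
        # Refit the sampling distribution to the elite set, per dimension.
        for d in range(len(mean)):
            column = [e[d] for e in elites]
            mean[d] = statistics.fmean(column)
            std[d] = statistics.pstdev(column) + 1e-6  # avoid zero spread
    return mean


def toy_objective(x):
    # Hypothetical stand-in for a task reward: highest at (1.0, -2.0).
    return -((x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2)


best = cross_entropy_method(toy_objective, mean=[0.0, 0.0], std=[2.0, 2.0])
```

In a setting like the one the abstract describes, the objective would presumably be a simulated task return rather than a closed-form function, and the starting mean would come from a perception model rather than a fixed guess.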
Selective Exploration and Information Gathering in Search and Rescue Using Hierarchical Learning Guided by Natural Language Input
Panagopoulos, Dimitrios, Perrusquia, Adolfo, Guo, Weisi
In recent years, robots and autonomous systems have become increasingly integral to our daily lives, offering solutions to complex problems across various domains. Their application in search and rescue (SAR) operations, however, presents unique challenges. Comprehensively exploring the disaster-stricken area is often infeasible due to the vastness of the terrain, transformed environment, and the time constraints involved. Traditional robotic systems typically operate on predefined search patterns and lack the ability to incorporate and exploit ground truths provided by human stakeholders, which can be the key to speeding up the learning process and enhancing triage. Addressing this gap, we introduce a system that integrates social interaction via large language models (LLMs) with a hierarchical reinforcement learning (HRL) framework. The proposed system is designed to translate verbal inputs from human stakeholders into actionable RL insights and adjust its search strategy. By leveraging human-provided information through LLMs and structuring task execution through HRL, our approach not only bridges the gap between autonomous capabilities and human intelligence but also significantly improves the agent's learning efficiency and decision-making process in environments characterised by long horizons and sparse rewards.
AttViz: Online exploration of self-attention for transparent neural language modeling
Škrlj, Blaž, Eržen, Nika, Sheehan, Shane, Luz, Saturnino, Robnik-Šikonja, Marko, Pollak, Senja
Neural language models are becoming the prevailing methodology for the tasks of query answering, text classification, disambiguation, completion and translation. Commonly comprising hundreds of millions of parameters, these neural network models offer state-of-the-art performance at the cost of interpretability; humans are no longer capable of tracing and understanding how decisions are being made. The attention mechanism, introduced initially for the task of translation, has been successfully adopted for other language-related tasks. We propose AttViz, an online toolkit for exploration of self-attention, that is, the real values associated with individual text tokens. We show how existing deep learning pipelines can produce outputs suitable for AttViz, offering novel visualizations of the attention heads and their aggregations with minimal effort, online. We show on examples of news segments how the proposed system can be used to inspect and potentially better understand what a model has learned (or emphasized).
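To make "attention heads and their aggregations" concrete, here is a minimal sketch that assumes each head contributes one real value per token and aggregates across heads by mean and by maximum. The function `aggregate_heads`, the data layout, and the example numbers are all hypothetical; the abstract does not specify AttViz's input format or which aggregations it offers.

```python
def aggregate_heads(head_scores):
    """Aggregate per-token attention values across heads.

    `head_scores` is a list with one entry per head; each entry is a list
    holding one real value per token (an assumed layout, not AttViz's own).
    """
    n_tokens = len(head_scores[0])
    if any(len(head) != n_tokens for head in head_scores):
        raise ValueError("every head must score the same number of tokens")
    # Transpose so each tuple holds all heads' values for a single token.
    per_token = list(zip(*head_scores))
    return {
        "mean": [sum(values) / len(values) for values in per_token],
        "max": [max(values) for values in per_token],
    }


# Made-up values for a three-token input and two heads.
tokens = ["the", "match", "ended"]
head_scores = [
    [0.1, 0.7, 0.2],  # head 0
    [0.3, 0.5, 0.2],  # head 1
]
aggregated = aggregate_heads(head_scores)
```

Pairing each entry of `aggregated["mean"]` or `aggregated["max"]` with the matching entry of `tokens` gives one value per token, which is the kind of per-token signal a visualization could map to color intensity.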
Machine Learning Recognition & Implications For Our AI Velociraptor And Us - CleanTechnica
Plastic Dinosaur approached the irregularly shaped object, a platform with four stems reaching to the floor with a projection upward in the rear. He was skittish as he approached, not knowing what it was, his amygdalanet sending out warnings while curiousnet sent out its desire to approach. It didn't react or move or make noise, so he sniffed it, nudged it and then walked away to charge. Plastic Dinosaur approached the irregularly shaped object, a platform with four stems reaching to the floor with a projection upward on the right. He was skittish as he approached, not knowing what it was, his amygdalanet sending out warnings while curiousnet sent out its desire to approach.
How Would A Robotic Machine Learning Velociraptor Learn To Play Goalie? - CleanTechnica
The 1.5 meter, silvery gray velociraptor lunges forward, interrupting the flight of the tennis ball with its head before the ball can get to the soccer net at the end of the gym. Its tail stretches out, stopping another ball. It pivots, somewhat clumsily, and runs three steps in the other direction to intercept a third ball. It's been doing this for an hour, running back and forth as a trio of tennis ball machines toss yellow balls in various loopy ways toward the net. It's a game that its creators have invented to rapidly improve its coordination. But then it stops trying to intercept the balls, although it still twitches toward them.