Unsupervised Learning of Object Keypoints for Perception and Control
The study of object representations in computer vision has primarily focused on developing representations that are useful for image classification, object detection, or semantic segmentation as downstream tasks. In this work we aim to learn object representations that are useful for control and reinforcement learning (RL). To this end, we introduce Transporter, a neural network architecture for discovering concise geometric object representations in terms of keypoints or image-space coordinates. Our method learns from raw video frames in a fully unsupervised manner, by transporting learnt image features between video frames using a keypoint bottleneck. The discovered keypoints track objects and object parts across long time-horizons more accurately than recent similar methods. Furthermore, consistent long-term tracking enables two notable results in control domains -- (1) using the keypoint coordinates and corresponding image features as inputs enables highly sample-efficient reinforcement learning; (2) learning to explore by controlling keypoint locations drastically reduces the search space, enabling deep exploration (leading to states unreachable through random action exploration) without any extrinsic rewards.
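The "transporting" step the abstract mentions can be sketched numerically: render keypoints as Gaussian heatmaps, suppress source-frame features at both keypoint sets, and paste in target-frame features at the target keypoints. This is a minimal numpy sketch of that suppress-and-paste idea; the shapes, sigma, and single-channel heatmap combination are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

def gaussian_heatmap(keypoints, height, width, sigma=2.0):
    """Render keypoints (in pixel coordinates) as one combined Gaussian heatmap."""
    ys, xs = np.mgrid[0:height, 0:width]
    maps = [np.exp(-((xs - kx) ** 2 + (ys - ky) ** 2) / (2 * sigma ** 2))
            for kx, ky in keypoints]
    return np.clip(np.sum(maps, axis=0), 0.0, 1.0)

def transport(features_src, features_tgt, heat_src, heat_tgt):
    """Suppress source features at both keypoint sets, then paste in
    target-frame features at the target keypoints (broadcast over channels)."""
    h_s = heat_src[..., None]   # (H, W, 1)
    h_t = heat_tgt[..., None]
    return (1.0 - h_s) * (1.0 - h_t) * features_src + h_t * features_tgt

# Toy example: 16x16 feature maps with 3 channels, one keypoint that moves.
H = W = 16
phi_s = np.random.rand(H, W, 3)
phi_t = np.random.rand(H, W, 3)
heat_s = gaussian_heatmap([(4, 4)], H, W)
heat_t = gaussian_heatmap([(12, 12)], H, W)
transported = transport(phi_s, phi_t, heat_s, heat_t)
# At the target keypoint the output equals the target-frame features.
print(np.allclose(transported[12, 12], phi_t[12, 12]))  # True
```

A decoder trained to reconstruct the target frame from `transported` is what forces the keypoints to land on the moving objects, since everything that changed between frames must flow through them.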
Best of Sim and Real: Decoupled Visuomotor Manipulation via Learning Control in Simulation and Perception in Real
Huang, Jialei, Yin, Zhaoheng, Hu, Yingdong, Wang, Shuo, Lin, Xingyu, Gao, Yang
Abstract-- Sim-to-real transfer remains a fundamental challenge in robot manipulation due to the entanglement of perception and control in end-to-end learning. We present a decoupled framework that learns each component where it is most reliable: control policies are trained in simulation with privileged state to master spatial layouts and manipulation dynamics, while perception is adapted only at deployment to bridge real observations to the frozen control policy. Our key insight is that control strategies and action patterns are universal across environments and can be learned in simulation through systematic randomization, while perception is inherently domain-specific and must be learned where visual observations are authentic. Unlike existing end-to-end approaches that require extensive real-world data, our method achieves strong performance with only 10-20 real demonstrations by reducing the complex sim-to-real problem to a structured perception alignment task. We validate our approach on tabletop manipulation tasks, demonstrating superior data efficiency and out-of-distribution generalization compared to end-to-end baselines. The learned policies successfully handle object positions and scales beyond the training distribution, confirming that decoupling perception from control fundamentally improves sim-to-real transfer.
I. INTRODUCTION
Simulation environments provide a safe, scalable, and cost-effective platform for robot learning [6]. We can run thousands of robots in parallel, automatically reset environments, and access perfect state information in simulation, making large-scale interactive learning feasible.
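The decoupling can be illustrated with deliberately simple stand-ins: a frozen linear "policy" trained on privileged state, and a perception adapter fit on a handful of demonstrations to map real observations into the state space that policy expects. Everything here (linear maps, 8-D observations, 15 demos) is a hypothetical toy, not the paper's networks; only the two-stage structure is the point.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stage 1 (simulation): train control on privileged state. Here the
# finished policy is just a frozen linear map, state -> action.
W_policy = rng.normal(size=(2, 4))           # 4-D object state -> 2-D action
def control_policy(state):
    return W_policy @ state                  # frozen at deployment

# Stage 2 (real world): adapt perception only. Real observations (8-D
# feature vectors standing in for images) must be mapped into the state
# space the frozen policy expects, using only a few demonstrations.
W_true = rng.normal(size=(4, 8))             # unknown obs -> state relation
demos_obs = rng.normal(size=(15, 8))         # 15 "real" demonstrations
demos_state = demos_obs @ W_true.T           # privileged labels for demos

# Fit the perception adapter by least squares on the demos alone.
W_perc, *_ = np.linalg.lstsq(demos_obs, demos_state, rcond=None)

def deployed_policy(obs):
    return control_policy(obs @ W_perc)      # perception feeds frozen control

obs = rng.normal(size=(8,))                  # an unseen observation
print(np.allclose(deployed_policy(obs), control_policy(W_true @ obs)))  # True
```

Because control is frozen, the real-world learning problem shrinks to aligning perception outputs with a fixed interface, which is why so few demonstrations suffice in this toy setting.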
Agile in the Face of Delay: Asynchronous End-to-End Learning for Real-World Aerial Navigation
Li, Yude, Zhou, Zhexuan, Li, Huizhe, Gong, Youmin, Mei, Jie
Robust autonomous navigation for Autonomous Aerial Vehicles (AAVs) in complex environments is a critical capability. However, modern end-to-end navigation faces a key challenge: the high-frequency control loop needed for agile flight conflicts with low-frequency perception streams, which are limited by sensor update rates and significant computational cost. This mismatch forces conventional synchronous models into undesirably low control rates. To resolve this, we propose an asynchronous reinforcement learning framework that decouples perception and control, enabling a high-frequency policy to act on the latest IMU state for immediate reactivity, while incorporating perception features asynchronously. To manage the resulting data staleness, we introduce a theoretically-grounded Temporal Encoding Module (TEM) that explicitly conditions the policy on perception delays, a strategy complemented by a two-stage curriculum to ensure stable and efficient training. Validated in extensive simulations, our method was successfully deployed in zero-shot sim-to-real transfer on an onboard NUC, where it sustains a 100 Hz control rate and demonstrates robust, agile navigation in cluttered real-world environments. Our source code will be released for community reference.
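The asynchronous pattern described here (a fast control loop consuming cached, possibly stale perception, conditioned on how stale it is) can be sketched as follows. The sinusoidal form of the delay encoding and all dimensions are assumptions for illustration, not the paper's exact TEM.

```python
import numpy as np

def temporal_encoding(delay, dim=8, base=100.0):
    """Sinusoidal encoding of perception staleness (seconds). The
    transformer-style form and constants are illustrative assumptions."""
    half = dim // 2
    rates = base ** (-np.arange(half) / half)
    return np.concatenate([np.sin(delay * rates), np.cos(delay * rates)])

class AsyncPolicy:
    """High-rate control on fresh IMU state plus the latest (possibly
    stale) perception features, conditioned on their delay."""
    def __init__(self, imu_dim=6, feat_dim=16, act_dim=4, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.1, size=(act_dim, imu_dim + feat_dim + 8))
        self.features = np.zeros(feat_dim)
        self.feature_stamp = 0.0

    def on_perception(self, features, stamp):
        # Low-rate callback: cache features together with their timestamp.
        self.features, self.feature_stamp = features, stamp

    def act(self, imu_state, now):
        # 100 Hz loop: the delay code tells the policy how stale the cache is.
        enc = temporal_encoding(now - self.feature_stamp)
        x = np.concatenate([imu_state, self.features, enc])
        return np.tanh(self.W @ x)

pol = AsyncPolicy()
rng = np.random.default_rng(1)
for step in range(100):                      # 1 s of control at 100 Hz
    t = step / 100.0
    if step % 5 == 0:                        # perception lands at only 20 Hz
        pol.on_perception(rng.normal(size=16), t)
    action = pol.act(rng.normal(size=6), t)
print(action.shape)  # (2,)-free check: prints (4,)
```

The design point is that `act` never blocks on perception; staleness becomes an input the policy can compensate for rather than a constraint on the control rate.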
Reviews: Unsupervised Learning of Object Keypoints for Perception and Control
They also evaluate keypoint prediction accuracy on an object tracking task against ground-truth coordinates. Lastly, they use these keypoints effectively in two downstream RL tasks: model-free RL (neural fitted Q-iteration) with keypoint-indexed features as input, and sample-efficient exploration by defining an intrinsic reward based on maximizing each keypoint's movement along the +x, -x, +y, and -y directions.
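The intrinsic reward the review describes is simple to write down: the signed displacement of one discovered keypoint along one image axis between consecutive frames. A minimal sketch, with array shapes and the direction table as assumptions:

```python
import numpy as np

# Each exploration option picks one keypoint and one of four directions.
DIRECTIONS = {"+x": (0, 1.0), "-x": (0, -1.0), "+y": (1, 1.0), "-y": (1, -1.0)}

def keypoint_reward(kps_prev, kps_next, kp_index, direction):
    """Signed movement of one keypoint along one image axis between two
    consecutive frames; kps_* are (K, 2) arrays of (x, y) coordinates."""
    axis, sign = DIRECTIONS[direction]
    return sign * (kps_next[kp_index, axis] - kps_prev[kp_index, axis])

prev = np.array([[10.0, 20.0], [40.0, 40.0]])
nxt = np.array([[13.0, 20.0], [40.0, 37.0]])
r_right = keypoint_reward(prev, nxt, 0, "+x")  # keypoint 0 moved 3 px right
r_up = keypoint_reward(prev, nxt, 1, "-y")     # keypoint 1 moved 3 px toward y=0
print(r_right, r_up)  # 3.0 3.0
```

Because the search space is a handful of keypoints times four directions rather than raw pixels, maximizing these rewards yields the "deep exploration" behaviour the abstract claims.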
Unsupervised Learning of Object Keypoints for Perception and Control
Kulkarni, Tejas D., Gupta, Ankush, Ionescu, Catalin, Borgeaud, Sebastian, Reynolds, Malcolm, Zisserman, Andrew, Mnih, Volodymyr
Evolution of learning and plastic neural networks for perception and control at Loughborough University
A funded PhD position is available at the Computer Science Department, School of Science, Loughborough University, UK, on the topic of the evolution of lifelong learning in neural networks. The aim is to develop new neuroevolution algorithms for lifelong learning. The objectives are to devise machine learning systems that autonomously adapt to changing conditions such as variation of the data distribution, variation of the problem domain or parameters, with minimal human intervention. The approach will use neuroevolution, neuromodulation, and other methodologies to continuously discover and update learning strategies, implement selective plasticity, and achieve continual learning. Application areas include a variety of automation and machine learning problems, e.g.
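One common formulation of the plastic connections this project mentions is a neuromodulated Hebbian trace (in the spirit of differentiable-plasticity work): the effective weight blends a fixed part with a trace that tracks pre/post correlations, and a modulatory signal gates when the trace may change. This is a generic sketch under assumed constants, not a description of the project's algorithm:

```python
import numpy as np

def plastic_step(w_fixed, hebb, pre, post, modulation, alpha=0.5, eta=0.1):
    """One update of a neuromodulated plastic connection: the Hebbian
    trace moves toward the pre/post correlation, gated by a scalar
    modulatory signal; the effective weight blends the fixed weight
    with the trace. alpha and eta are illustrative constants."""
    hebb = hebb + eta * modulation * (np.outer(post, pre) - hebb)
    w_eff = w_fixed + alpha * hebb
    return w_eff, hebb

rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=(3, 4))
hebb = np.zeros((3, 4))
pre, post = np.ones(4), np.ones(3)

# With modulation on, the trace moves toward the correlation...
_, hebb = plastic_step(w, hebb, pre, post, modulation=1.0)
# ...and with modulation off, plasticity is selectively frozen.
_, hebb_frozen = plastic_step(w, hebb, pre, post, modulation=0.0)
print(hebb[0, 0], np.array_equal(hebb, hebb_frozen))
```

In a neuroevolution setting, the evolved genome would fix `w_fixed`, `alpha`, and `eta` per connection, while `hebb` changes within a lifetime; gating via `modulation` is one route to the selective plasticity the ad mentions.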
Video Friday: Security Robot as a Service, Robotic Mining, and Saved by a Drone
Video Friday is your weekly selection of awesome robotics videos, collected by your Automaton bloggers. We'll also be posting a weekly calendar of upcoming robotics events for the next few months; here's what we have so far (send us your events!): Let us know if you have suggestions for next week, and enjoy today's videos. Nothing is more secure than a workplace protected by prowling robots. But are the fish okay?
iclr2016:main
The problem of building an autonomous robot has traditionally been viewed as one of integration: connecting together modular components, each one designed to handle some portion of the perception and decision making process. For example, a vision system might be connected to a planner that might in turn provide commands to a low-level controller that drives the robot's motors. In this talk, I will discuss how ideas from deep learning can allow us to build robotic control mechanisms that combine both perception and control into a single system. This system can then be trained end-to-end on the task at hand. I will show how this end-to-end approach actually simplifies the perception and control problems, by allowing the perception and control mechanisms to adapt to one another and to the task.
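The single-system idea can be sketched as one forward pass from pixels to motor commands. With random weights and no training loop shown, this toy only illustrates the structural point that a single task loss could backpropagate through both the perception and control halves; all sizes are arbitrary assumptions.

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Naive valid-mode 2-D convolution (no padding, stride 1)."""
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
kernel = rng.normal(scale=0.1, size=(3, 3))        # perception parameters
image = rng.normal(size=(16, 16))                  # raw pixels in
features = np.maximum(conv2d_valid(image, kernel), 0.0).ravel()   # ReLU conv
W_ctrl = rng.normal(scale=0.01, size=(2, features.size))          # control head
motor_command = W_ctrl @ features                  # motor commands out
print(motor_command.shape)  # (2,)
```

Contrast this with the modular pipeline the talk describes, where the vision system and the low-level controller would be trained and tuned separately against hand-designed interfaces.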