Unsupervised Learning of Object Keypoints for Perception and Control
The study of object representations in computer vision has primarily focused on developing representations that are useful for image classification, object detection, or semantic segmentation as downstream tasks. In this work we aim to learn object representations that are useful for control and reinforcement learning (RL). To this end, we introduce Transporter, a neural network architecture for discovering concise geometric object representations in terms of keypoints or image-space coordinates. Our method learns from raw video frames in a fully unsupervised manner, by transporting learnt image features between video frames using a keypoint bottleneck. The discovered keypoints track objects and object parts across long time-horizons more accurately than recent similar methods. Furthermore, consistent long-term tracking enables two notable results in control domains -- (1) using the keypoint coordinates and corresponding image features as inputs enables highly sample-efficient reinforcement learning; (2) learning to explore by controlling keypoint locations drastically reduces the search space, enabling deep exploration (leading to states unreachable through random action exploration) without any extrinsic rewards.
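The "transporting" step the abstract mentions can be sketched numerically: render keypoints as Gaussian heatmaps, suppress source-frame features at both keypoint sets, and paste in target-frame features at the target keypoints. This is a minimal numpy sketch of that suppress-and-paste idea; the shapes, sigma, and single-channel heatmap combination are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

def gaussian_heatmap(keypoints, height, width, sigma=2.0):
    """Render keypoints (in pixel coordinates) as one combined Gaussian heatmap."""
    ys, xs = np.mgrid[0:height, 0:width]
    maps = [np.exp(-((xs - kx) ** 2 + (ys - ky) ** 2) / (2 * sigma ** 2))
            for kx, ky in keypoints]
    return np.clip(np.sum(maps, axis=0), 0.0, 1.0)

def transport(features_src, features_tgt, heat_src, heat_tgt):
    """Suppress source features at both keypoint sets, then paste in
    target-frame features at the target keypoints (broadcast over channels)."""
    h_s = heat_src[..., None]   # (H, W, 1)
    h_t = heat_tgt[..., None]
    return (1.0 - h_s) * (1.0 - h_t) * features_src + h_t * features_tgt

# Toy example: 16x16 feature maps with 3 channels, one keypoint that moves.
H = W = 16
phi_s = np.random.rand(H, W, 3)
phi_t = np.random.rand(H, W, 3)
heat_s = gaussian_heatmap([(4, 4)], H, W)
heat_t = gaussian_heatmap([(12, 12)], H, W)
transported = transport(phi_s, phi_t, heat_s, heat_t)
# At the target keypoint the output equals the target-frame features.
print(np.allclose(transported[12, 12], phi_t[12, 12]))  # True
```

A decoder trained to reconstruct the target frame from `transported` is what forces the keypoints to land on the moving objects, since everything that changed between frames must flow through them.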
Best of Sim and Real: Decoupled Visuomotor Manipulation via Learning Control in Simulation and Perception in Real
Huang, Jialei, Yin, Zhaoheng, Hu, Yingdong, Wang, Shuo, Lin, Xingyu, Gao, Yang
Abstract-- Sim-to-real transfer remains a fundamental challenge in robot manipulation due to the entanglement of perception and control in end-to-end learning. We present a decoupled framework that learns each component where it is most reliable: control policies are trained in simulation with privileged state to master spatial layouts and manipulation dynamics, while perception is adapted only at deployment to bridge real observations to the frozen control policy. Our key insight is that control strategies and action patterns are universal across environments and can be learned in simulation through systematic randomization, while perception is inherently domain-specific and must be learned where visual observations are authentic. Unlike existing end-to-end approaches that require extensive real-world data, our method achieves strong performance with only 10-20 real demonstrations by reducing the complex sim-to-real problem to a structured perception alignment task. We validate our approach on tabletop manipulation tasks, demonstrating superior data efficiency and out-of-distribution generalization compared to end-to-end baselines. The learned policies successfully handle object positions and scales beyond the training distribution, confirming that decoupling perception from control fundamentally improves sim-to-real transfer.
I. INTRODUCTION
Simulation environments provide a safe, scalable, and cost-effective platform for robot learning [6]. We can run thousands of robots in parallel, automatically reset environments, and access perfect state information in simulation, making large-scale interactive learning feasible.
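The decoupling can be illustrated with deliberately simple stand-ins: a frozen linear "policy" trained on privileged state, and a perception adapter fit on a handful of demonstrations to map real observations into the state space that policy expects. Everything here (linear maps, 8-D observations, 15 demos) is a hypothetical toy, not the paper's networks; only the two-stage structure is the point.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stage 1 (simulation): train control on privileged state. Here the
# finished policy is just a frozen linear map, state -> action.
W_policy = rng.normal(size=(2, 4))           # 4-D object state -> 2-D action
def control_policy(state):
    return W_policy @ state                  # frozen at deployment

# Stage 2 (real world): adapt perception only. Real observations (8-D
# feature vectors standing in for images) must be mapped into the state
# space the frozen policy expects, using only a few demonstrations.
W_true = rng.normal(size=(4, 8))             # unknown obs -> state relation
demos_obs = rng.normal(size=(15, 8))         # 15 "real" demonstrations
demos_state = demos_obs @ W_true.T           # privileged labels for demos

# Fit the perception adapter by least squares on the demos alone.
W_perc, *_ = np.linalg.lstsq(demos_obs, demos_state, rcond=None)

def deployed_policy(obs):
    return control_policy(obs @ W_perc)      # perception feeds frozen control

obs = rng.normal(size=(8,))                  # an unseen observation
print(np.allclose(deployed_policy(obs), control_policy(W_true @ obs)))  # True
```

Because control is frozen, the real-world learning problem shrinks to aligning perception outputs with a fixed interface, which is why so few demonstrations suffice in this toy setting.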
Agile in the Face of Delay: Asynchronous End-to-End Learning for Real-World Aerial Navigation
Li, Yude, Zhou, Zhexuan, Li, Huizhe, Gong, Youmin, Mei, Jie
Robust autonomous navigation for Autonomous Aerial Vehicles (AAVs) in complex environments is a critical capability. However, modern end-to-end navigation faces a key challenge: the high-frequency control loop needed for agile flight conflicts with low-frequency perception streams, which are limited by sensor update rates and significant computational cost. This mismatch forces conventional synchronous models into undesirably low control rates. To resolve this, we propose an asynchronous reinforcement learning framework that decouples perception and control, enabling a high-frequency policy to act on the latest IMU state for immediate reactivity, while incorporating perception features asynchronously. To manage the resulting data staleness, we introduce a theoretically-grounded Temporal Encoding Module (TEM) that explicitly conditions the policy on perception delays, a strategy complemented by a two-stage curriculum to ensure stable and efficient training. Validated in extensive simulations, our method was successfully deployed in zero-shot sim-to-real transfer on an onboard NUC, where it sustains a 100 Hz control rate and demonstrates robust, agile navigation in cluttered real-world environments. Our source code will be released for community reference.
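The asynchronous pattern described here (a fast control loop consuming cached, possibly stale perception, conditioned on how stale it is) can be sketched as follows. The sinusoidal form of the delay encoding and all dimensions are assumptions for illustration, not the paper's exact TEM.

```python
import numpy as np

def temporal_encoding(delay, dim=8, base=100.0):
    """Sinusoidal encoding of perception staleness (seconds). The
    transformer-style form and constants are illustrative assumptions."""
    half = dim // 2
    rates = base ** (-np.arange(half) / half)
    return np.concatenate([np.sin(delay * rates), np.cos(delay * rates)])

class AsyncPolicy:
    """High-rate control on fresh IMU state plus the latest (possibly
    stale) perception features, conditioned on their delay."""
    def __init__(self, imu_dim=6, feat_dim=16, act_dim=4, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.1, size=(act_dim, imu_dim + feat_dim + 8))
        self.features = np.zeros(feat_dim)
        self.feature_stamp = 0.0

    def on_perception(self, features, stamp):
        # Low-rate callback: cache features together with their timestamp.
        self.features, self.feature_stamp = features, stamp

    def act(self, imu_state, now):
        # 100 Hz loop: the delay code tells the policy how stale the cache is.
        enc = temporal_encoding(now - self.feature_stamp)
        x = np.concatenate([imu_state, self.features, enc])
        return np.tanh(self.W @ x)

pol = AsyncPolicy()
rng = np.random.default_rng(1)
for step in range(100):                      # 1 s of control at 100 Hz
    t = step / 100.0
    if step % 5 == 0:                        # perception lands at only 20 Hz
        pol.on_perception(rng.normal(size=16), t)
    action = pol.act(rng.normal(size=6), t)
print(action.shape)  # (2,)-free check: prints (4,)
```

The design point is that `act` never blocks on perception; staleness becomes an input the policy can compensate for rather than a constraint on the control rate.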
Reviews: Unsupervised Learning of Object Keypoints for Perception and Control
They also evaluate keypoint prediction accuracy on an object tracking task against ground-truth coordinates. Lastly, they use these keypoints effectively in two downstream RL tasks: model-free RL (neural fitted Q-iteration) with keypoint-indexed features as input, and sample-efficient exploration by defining an intrinsic reward based on maximizing each keypoint's movement along the +x, -x, +y, and -y directions.
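The intrinsic reward the review describes is simple to write down: the signed displacement of one discovered keypoint along one image axis between consecutive frames. A minimal sketch, with array shapes and the direction table as assumptions:

```python
import numpy as np

# Each exploration option picks one keypoint and one of four directions.
DIRECTIONS = {"+x": (0, 1.0), "-x": (0, -1.0), "+y": (1, 1.0), "-y": (1, -1.0)}

def keypoint_reward(kps_prev, kps_next, kp_index, direction):
    """Signed movement of one keypoint along one image axis between two
    consecutive frames; kps_* are (K, 2) arrays of (x, y) coordinates."""
    axis, sign = DIRECTIONS[direction]
    return sign * (kps_next[kp_index, axis] - kps_prev[kp_index, axis])

prev = np.array([[10.0, 20.0], [40.0, 40.0]])
nxt = np.array([[13.0, 20.0], [40.0, 37.0]])
r_right = keypoint_reward(prev, nxt, 0, "+x")  # keypoint 0 moved 3 px right
r_up = keypoint_reward(prev, nxt, 1, "-y")     # keypoint 1 moved 3 px toward y=0
print(r_right, r_up)  # 3.0 3.0
```

Because the search space is a handful of keypoints times four directions rather than raw pixels, maximizing these rewards yields the "deep exploration" behaviour the abstract claims.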
Unsupervised Learning of Object Keypoints for Perception and Control
Kulkarni, Tejas D., Gupta, Ankush, Ionescu, Catalin, Borgeaud, Sebastian, Reynolds, Malcolm, Zisserman, Andrew, Mnih, Volodymyr
Evolution of learning and plastic neural networks for perception and control at Loughborough University
A funded PhD position is available at the Computer Science Department, School of Science, Loughborough University, UK, on the topic of the evolution of lifelong learning in neural networks. The aim is to develop new neuroevolution algorithms for lifelong learning. The objectives are to devise machine learning systems that autonomously adapt to changing conditions such as variation of the data distribution, variation of the problem domain or parameters, with minimal human intervention. The approach will use neuroevolution, neuromodulation, and other methodologies to continuously discover and update learning strategies, implement selective plasticity, and achieve continual learning. Application areas include a variety of automation and machine learning problems, e.g.
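One common formulation of the plastic connections this project mentions is a neuromodulated Hebbian trace (in the spirit of differentiable-plasticity work): the effective weight blends a fixed part with a trace that tracks pre/post correlations, and a modulatory signal gates when the trace may change. This is a generic sketch under assumed constants, not a description of the project's algorithm:

```python
import numpy as np

def plastic_step(w_fixed, hebb, pre, post, modulation, alpha=0.5, eta=0.1):
    """One update of a neuromodulated plastic connection: the Hebbian
    trace moves toward the pre/post correlation, gated by a scalar
    modulatory signal; the effective weight blends the fixed weight
    with the trace. alpha and eta are illustrative constants."""
    hebb = hebb + eta * modulation * (np.outer(post, pre) - hebb)
    w_eff = w_fixed + alpha * hebb
    return w_eff, hebb

rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=(3, 4))
hebb = np.zeros((3, 4))
pre, post = np.ones(4), np.ones(3)

# With modulation on, the trace moves toward the correlation...
_, hebb = plastic_step(w, hebb, pre, post, modulation=1.0)
# ...and with modulation off, plasticity is selectively frozen.
_, hebb_frozen = plastic_step(w, hebb, pre, post, modulation=0.0)
print(hebb[0, 0], np.array_equal(hebb, hebb_frozen))
```

In a neuroevolution setting, the evolved genome would fix `w_fixed`, `alpha`, and `eta` per connection, while `hebb` changes within a lifetime; gating via `modulation` is one route to the selective plasticity the ad mentions.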
Video Friday: Security Robot as a Service, Robotic Mining, and Saved by a Drone
Video Friday is your weekly selection of awesome robotics videos, collected by your Automaton bloggers. We'll also be posting a weekly calendar of upcoming robotics events for the next few months; here's what we have so far (send us your events!): Let us know if you have suggestions for next week, and enjoy today's videos. Nothing is more secure than a workplace protected by prowling robots. But are the fish okay?
iclr2016:main
The problem of building an autonomous robot has traditionally been viewed as one of integration: connecting together modular components, each one designed to handle some portion of the perception and decision making process. For example, a vision system might be connected to a planner that might in turn provide commands to a low-level controller that drives the robot's motors. In this talk, I will discuss how ideas from deep learning can allow us to build robotic control mechanisms that combine both perception and control into a single system. This system can then be trained end-to-end on the task at hand. I will show how this end-to-end approach actually simplifies the perception and control problems, by allowing the perception and control mechanisms to adapt to one another and to the task.
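The single-system idea can be sketched as one forward pass from pixels to motor commands. With random weights and no training loop shown, this toy only illustrates the structural point that a single task loss could backpropagate through both the perception and control halves; all sizes are arbitrary assumptions.

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Naive valid-mode 2-D convolution (no padding, stride 1)."""
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
kernel = rng.normal(scale=0.1, size=(3, 3))        # perception parameters
image = rng.normal(size=(16, 16))                  # raw pixels in
features = np.maximum(conv2d_valid(image, kernel), 0.0).ravel()   # ReLU conv
W_ctrl = rng.normal(scale=0.01, size=(2, features.size))          # control head
motor_command = W_ctrl @ features                  # motor commands out
print(motor_command.shape)  # (2,)
```

Contrast this with the modular pipeline the talk describes, where the vision system and the low-level controller would be trained and tuned separately against hand-designed interfaces.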