AITopics

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence (1.00)

Neural Information Processing SystemsFeb-11-2026, 02:46:31 GMT

MoVie: Visual Model-Based Policy Adaptation for View Generalization Sizhe Y ang

Visual Reinforcement Learning (RL) agents trained on limited views face significant challenges in generalizing their learned abilities to unseen views.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

Country:

Asia > China > Shanghai > Shanghai (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Neural Information Processing SystemsOct-8-2025, 13:50:04 GMT

43b77cef2a83a25aa27d3271d209e4fd-Supplemental-Conference.pdf

artificial intelligence, machine learning, stride, (18 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.51)

Neural Information Processing SystemsOct-8-2025, 13:50:01 GMT

43b77cef2a83a25aa27d3271d209e4fd-Paper-Conference.pdf

encoder, generalization, movie, (16 more...)

Country:

Asia > China > Shanghai > Shanghai (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
(2 more...)

arXiv.org Artificial IntelligenceDec-16-2024

Equivariant Action Sampling for Reinforcement Learning and Planning

Zhao, Linfeng, Howell, Owen, Zhu, Xupeng, Park, Jung Yeon, Zhang, Zhewen, Walters, Robin, Wong, Lawson L. S.

Reinforcement learning (RL) algorithms for continuous control tasks require accurate sampling-based action selection. Many tasks, such as robotic manipulation, contain inherent problem symmetries. However, correctly incorporating symmetry into sampling-based approaches remains a challenge. This work addresses the challenge of preserving symmetry in sampling-based planning and control, a key component for enhancing decision-making efficiency in RL. We introduce an action sampling approach that enforces the desired symmetry. We apply our proposed method to a coordinate regression problem and show that the symmetry aware sampling method drastically outperforms the naive sampling approach. We furthermore develop a general framework for sampling-based model-based planning with Model Predictive Path Integral (MPPI). We compare our MPPI approach with standard sampling methods on several continuous control tasks.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

2412.12237

Country:

North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Finland > Uusimaa > Helsinki (0.04)
Asia > Singapore (0.04)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Shimizu, Yutaka, Tomizuka, Masayoshi

Bisimulation metric for Model Predictive Control

arXiv.org Artificial IntelligenceOct-6-2024

Model-based reinforcement learning has shown promise for improving sample efficiency and decision-making in complex environments. However, existing methods face challenges in training stability, robustness to noise, and computational efficiency. In this paper, we propose Bisimulation Metric for Model Predictive Control (BS-MPC), a novel approach that incorporates bisimulation metric loss in its objective function to directly optimize the encoder. This time-step-wise direct optimization enables the learned encoder to extract intrinsic information from the original state space while discarding irrelevant details and preventing the gradients and errors from diverging. BS-MPC improves training stability, robustness against input noise, and computational efficiency by reducing training time. We evaluate BS-MPC on both continuous control and image-based tasks from the DeepMind Control Suite, demonstrating superior performance and robustness compared to state-of-the-art baseline methods.

machine learning, reinforcement learning, td-mpc, (18 more...)

2410.04553

Country:

North America > United States > Virginia (0.14)
North America > United States > California > Alameda County > Berkeley (0.14)
Europe > Sweden (0.14)
Europe > France (0.14)

Genre:

Research Report > New Finding (0.46)
Research Report > Promising Solution (0.34)
Overview > Innovation (0.34)

Industry: Energy > Oil & Gas > Upstream (0.61)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Olivares, David, Fournier, Pierre, Vasishta, Pavan, Marzat, Julien

Model-Free versus Model-Based Reinforcement Learning for Fixed-Wing UAV Attitude Control Under Varying Wind Conditions

arXiv.org Artificial IntelligenceSep-26-2024

This paper evaluates and compares the performance of model-free and model-based reinforcement learning for the attitude control of fixed-wing unmanned aerial vehicles using PID as a reference point. The comparison focuses on their ability to handle varying flight dynamics and wind disturbances in a simulated environment. Our results show that the Temporal Difference Model Predictive Control agent outperforms both the PID controller and other model-free reinforcement learning methods in terms of tracking accuracy and robustness over different reference difficulties, particularly in nonlinear flight regimes. Furthermore, we introduce actuation fluctuation as a key metric to assess energy efficiency and actuator wear, and we test two different approaches from the literature: action variation penalty and conditioning for action policy smoothness. We also evaluate all control methods when subject to stochastic turbulence and gusts separately, so as to measure their effects on tracking performance, observe their limitations and outline their implications on the Markov decision process formalism.

controller, machine learning, reinforcement learning, (14 more...)

2409.17896

Country: Europe (0.14)

Genre: Research Report > New Finding (0.68)

Industry:

Aerospace & Defense > Aircraft (1.00)
Energy > Oil & Gas > Upstream (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.88)

Asri, Zakariae El, Sigaud, Olivier, Thome, Nicolas

Physics-Informed Model and Hybrid Planning for Efficient Dyna-Style Reinforcement Learning

arXiv.org Artificial IntelligenceJul-2-2024

Applying reinforcement learning (RL) to real-world applications requires addressing a trade-off between asymptotic performance, sample efficiency, and inference time. In this work, we demonstrate how to address this triple challenge by leveraging partial physical knowledge about the system dynamics. Our approach involves learning a physics-informed model to boost sample efficiency and generating imaginary trajectories from this model to learn a model-free policy and Q-function. Furthermore, we propose a hybrid planning strategy, combining the learned policy and Q-function with the learned model to enhance time efficiency in planning. Through practical demonstrations, we illustrate that our method improves the compromise between sample efficiency, time efficiency, and performance over state-of-the-art methods. Code is available at https://github.com/elasriz/PHIHP/

efficiency, sample efficiency, trajectory, (15 more...)

2407.02217

Country:

Europe > France > Île-de-France > Paris > Paris (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report > Promising Solution (0.48)
Research Report > Experimental Study (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Artificial IntelligenceFeb-8-2024

DiffTOP: Differentiable Trajectory Optimization for Deep Reinforcement and Imitation Learning

Wan, Weikang, Wang, Yufei, Erickson, Zackory, Held, David

This paper introduces DiffTOP, which utilizes Differentiable Trajectory OPtimization as the policy representation to generate actions for deep reinforcement and imitation learning. Trajectory optimization is a powerful and widely used algorithm in control, parameterized by a cost and a dynamics function. The key to our approach is to leverage the recent progress in differentiable trajectory optimization, which enables computing the gradients of the loss with respect to the parameters of trajectory optimization. As a result, the cost and dynamics functions of trajectory optimization can be learned end-to-end. DiffTOP addresses the ``objective mismatch'' issue of prior model-based RL algorithms, as the dynamics model in DiffTOP is learned to directly maximize task performance by differentiating the policy gradient loss through the trajectory optimization process. We further benchmark DiffTOP for imitation learning on standard robotic manipulation task suites with high-dimensional sensory observations and compare our method to feed-forward policy classes as well as Energy-Based Models (EBM) and Diffusion. Across 15 model-based RL tasks and 13 imitation learning tasks with high-dimensional image and point cloud inputs, DiffTOP outperforms prior state-of-the-art methods in both domains.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

2402.05421

Genre: Research Report > Promising Solution (0.34)

Industry: Energy > Oil & Gas (0.31)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Feng, Yunhai, Hansen, Nicklas, Xiong, Ziyan, Rajagopalan, Chandramouli, Wang, Xiaolong

Finetuning Offline World Models in the Real World

arXiv.org Artificial IntelligenceOct-24-2023

Reinforcement Learning (RL) is notoriously data-inefficient, which makes training on a real robot difficult. While model-based RL algorithms (world models) improve data-efficiency to some extent, they still require hours or days of interaction to learn skills. Recently, offline RL has been proposed as a framework for training RL policies on pre-existing datasets without any online interaction. However, constraining an algorithm to a fixed dataset induces a state-action distribution shift between training and inference, and limits its applicability to new tasks. In this work, we seek to get the best of both worlds: we consider the problem of pretraining a world model with offline data collected on a real robot, and then finetuning the model on online data collected by planning with the learned model. To mitigate extrapolation errors during online interaction, we propose to regularize the planner at test-time by balancing estimated returns and (epistemic) model uncertainty. We evaluate our method on a variety of visuo-motor control tasks in simulation and on a real robot, and find that our method enables few-shot finetuning to seen and unseen tasks even when offline data is limited. Videos, code, and data are available at https://yunhaifeng.com/FOWM .

environment step, machine learning, reinforcement learning, (17 more...)

2310.16029

Country: North America > United States > California (0.14)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment (0.46)
Energy > Oil & Gas (0.32)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)