AITopics | single policy

Collaborating Authors

single policy

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Agility Meets Stability: Versatile Humanoid Control with Heterogeneous Data

Pan, Yixuan, Qiao, Ruoyi, Chen, Li, Chitta, Kashyap, Pan, Liang, Mai, Haoguang, Bu, Qingwen, Zhao, Hao, Zheng, Cunyuan, Luo, Ping, Li, Hongyang

arXiv.org Artificial IntelligenceNov-25-2025

Humanoid robots are envisioned to perform a wide range of tasks in human-centered environments, requiring controllers that combine agility with robust balance. Recent advances in locomotion and whole-body tracking have enabled impressive progress in either agile dynamic skills or stability-critical behaviors, but existing methods remain specialized, focusing on one capability while compromising the other. In this work, we introduce AMS (Agility Meets Stability), the first framework that unifies both dynamic motion tracking and extreme balance maintenance in a single policy. Our key insight is to leverage heterogeneous data sources: human motion capture datasets that provide rich, agile behaviors, and physically constrained synthetic balance motions that capture stability configurations. To reconcile the divergent optimization goals of agility and stability, we design a hybrid reward scheme that applies general tracking objectives across all data while injecting balance-specific priors only into synthetic motions. Further, an adaptive learning strategy with performance-driven sampling and motion-specific reward shaping enables efficient training across diverse motion distributions. We validate AMS extensively in simulation and on a real Unitree G1 humanoid. Experiments demonstrate that a single policy can execute agile skills such as dancing and running, while also performing zero-shot extreme balance motions like Ip Man's Squat, highlighting AMS as a versatile control paradigm for future humanoid applications.

artificial intelligence, balance motion, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2511.17373

Country: Asia (0.46)

Genre: Research Report (0.40)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Robots > Humanoid Robots (0.36)
Information Technology > Artificial Intelligence > Vision > Video Understanding (0.35)

Add feedback

Learning to Act Through Contact: A Unified View of Multi-Task Robot Learning

Omar, Shafeef, Khadiv, Majid

arXiv.org Artificial IntelligenceOct-7-2025

We present a unified framework for multi-task locomotion and manipulation policy learning grounded in a contact-explicit representation. Instead of designing different policies for different tasks, our approach unifies the definition of a task through a sequence of contact goals-desired contact positions, timings, and active end-effectors. This enables leveraging the shared structure across diverse contact-rich tasks, leading to a single policy that can perform a wide range of tasks. In particular, we train a goal-conditioned reinforcement learning (RL) policy to realise given contact plans. We validate our framework on multiple robotic embodiments and tasks: a quadruped performing multiple gaits, a humanoid performing multiple biped and quadrupedal gaits, and a humanoid executing different bimanual object manipulation tasks. Each of these scenarios is controlled by a single policy trained to execute different tasks grounded in contacts, demonstrating versatile and robust behaviours across morphologically distinct systems. Our results show that explicit contact reasoning significantly improves generalisation to unseen scenarios, positioning contact-explicit policy learning as a promising foundation for scalable loco-manipulation.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

2510.03599

Country: Europe > Germany (0.14)

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.50)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.34)

Add feedback

Prescribe-then-Select: Adaptive Policy Selection for Contextual Stochastic Optimization

Iglesias, Caio de Prospero, Carballo, Kimberly Villalobos, Bertsimas, Dimitris

arXiv.org Machine LearningSep-11-2025

We address the problem of policy selection in contextual stochastic optimization (CSO), where covariates are available as contextual information and decisions must satisfy hard feasibility constraints. In many CSO settings, multiple candidate policies--arising from different modeling paradigms--exhibit heterogeneous performance across the covariate space, with no single policy uniformly dominating. We propose Prescribe-then-Select (PS), a modular framework that first constructs a library of feasible candidate policies and then learns a meta-policy to select the best policy for the observed covariates. We implement the meta-policy using ensembles of Optimal Policy Trees trained via cross-validation on the training set, making policy choice entirely data-driven. Across two benchmark CSO problems--single-stage newsvendor and two-stage shipment planning--PS consistently outperforms the best single policy in heterogeneous regimes of the covariate space and converges to the dominant policy when such heterogeneity is absent. All the code to reproduce the results can be found at https://anonymous.4open.science/r/Prescribe-then-Select-TMLR.

candidate policy, covariate space, optimization, (15 more...)

arXiv.org Machine Learning

2509.08194

Country:

North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Energy (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

$\texttt{FORM}$: Learning Expressive and Transferable First-Order Logic Reward Machines

Ardon, Leo, Furelos-Blanco, Daniel, Parać, Roko, Russo, Alessandra

arXiv.org Artificial IntelligenceDec-31-2024

Reward machines (RMs) are an effective approach for addressing non-Markovian rewards in reinforcement learning (RL) through finite-state machines. Traditional RMs, which label edges with propositional logic formulae, inherit the limited expressivity of propositional logic. This limitation hinders the learnability and transferability of RMs since complex tasks will require numerous states and edges. To overcome these challenges, we propose First-Order Reward Machines ($\texttt{FORM}$s), which use first-order logic to label edges, resulting in more compact and transferable RMs. We introduce a novel method for $\textbf{learning}$ $\texttt{FORM}$s and a multi-agent formulation for $\textbf{exploiting}$ them and facilitate their transferability, where multiple agents collaboratively learn policies for a shared $\texttt{FORM}$. Our experimental results demonstrate the scalability of $\texttt{FORM}$s with respect to traditional RMs. Specifically, we show that $\texttt{FORM}$s can be effectively learnt for tasks where traditional RM learning approaches fail. We also show significant improvements in learning speed and task transferability thanks to the multi-agent learning framework and the abstraction provided by the first-order language.

checkpoint, ground atom, learning, (15 more...)

arXiv.org Artificial Intelligence

2501.00364

Country:

North America > United States > Michigan > Wayne County > Detroit (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Optimizing adaptive sampling via Policy Ranking

Nadeem, Hassan, Shukla, Diwakar

arXiv.org Machine LearningOct-19-2024

Efficient sampling in biomolecular simulations is critical for accurately capturing the complex dynamical behaviors of biological systems. Adaptive sampling techniques aim to improve efficiency by focusing computational resources on the most relevant regions of phase space. In this work, we present a framework for identifying the optimal sampling policy through metric driven ranking. Our approach systematically evaluates the policy ensemble and ranks the policies based on their ability to explore the conformational space effectively. Through a series of biomolecular simulation case studies, we demonstrate that choice of a different adaptive sampling policy at each round significantly outperforms single policy sampling, leading to faster convergence and improved sampling performance. This approach takes an ensemble of adaptive sampling policies and identifies the optimal policy for the next round based on current data. Beyond presenting this ensemble view of adaptive sampling, we also propose two sampling algorithms that approximate this ranking framework on the fly. The modularity of this framework allows incorporation of any adaptive sampling policy making it versatile and suitable as a comprehensive adaptive sampling scheme.

artificial intelligence, machine learning, simulation, (16 more...)

arXiv.org Machine Learning

2410.15259

Country: North America > United States > Illinois > Champaign County > Urbana (0.04)

Genre: Research Report > New Finding (0.93)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.67)
Government > Regional Government > North America Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.46)

Add feedback

Improving the Generalization of Unseen Crowd Behaviors for Reinforcement Learning based Local Motion Planners

Ng, Wen Zheng Terence, Chen, Jianda, Pan, Sinno Jialin, Zhang, Tianwei

arXiv.org Artificial IntelligenceOct-16-2024

Deploying a safe mobile robot policy in scenarios with human pedestrians is challenging due to their unpredictable movements. Current Reinforcement Learning-based motion planners rely on a single policy to simulate pedestrian movements and could suffer from the over-fitting issue. Alternatively, framing the collision avoidance problem as a multi-agent framework, where agents generate dynamic movements while learning to reach their goals, can lead to conflicts with human pedestrians due to their homogeneity. To tackle this problem, we introduce an efficient method that enhances agent diversity within a single policy by maximizing an information-theoretic objective. This diversity enriches each agent's experiences, improving its adaptability to unseen crowd behaviors. In assessing an agent's robustness against unseen crowds, we propose diverse scenarios inspired by pedestrian crowd behaviors. Our behavior-conditioned policies outperform existing works in these challenging scenes, reducing potential collisions without additional time or travel.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/ICRA57147.2024.10610641

2410.12232

Country:

Asia > Singapore (0.04)
Asia > Middle East > Jordan (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Game and Reference: Policy Combination Synthesis for Epidemic Prevention and Control

Tan, Zhiyi, Bao, Bingkun

arXiv.org Artificial IntelligenceMar-15-2024

In recent years, epidemic policy-making models are increasingly being used to provide reference for governors on prevention and control policies against catastrophic epidemics such as SARS, H1N1 and COVID-19. Existing studies are currently constrained by two issues: First, previous methods develop policies based on effect evaluation, since few of factors in real-world decision-making can be modeled, the output policies will then easily become extreme. Second, the subjectivity and cognitive limitation of human make the historical policies not always optimal for the training of decision models. To these ends, we present a novel Policy Combination Synthesis (PCS) model for epidemic policy-making. Specially, to prevent extreme decisions, we introduce adversarial learning between the model-made policies and the real policies to force the output policies to be more human-liked. On the other hand, to minimize the impact of sub-optimal historical policies, we employ contrastive learning to let the model draw on experience from the best historical policies under similar scenarios. Both adversarial and contrastive learning are adaptive based on the comprehensive effects of real policies to ensure the model always learns useful information. Extensive experiments on real-world data prove the effectiveness of the proposed model.

adversarial module, learning, module, (16 more...)

arXiv.org Artificial Intelligence

2403.10744

Country:

North America > United States > Minnesota (0.05)
North America > United States > Indiana (0.05)
North America > United States > Oklahoma (0.05)
(6 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Epidemiology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Curriculum-Based Reinforcement Learning for Quadrupedal Jumping: A Reference-free Design

Atanassov, Vassil, Ding, Jiatao, Kober, Jens, Havoutis, Ioannis, Della Santina, Cosimo

arXiv.org Artificial IntelligenceJan-29-2024

Deep reinforcement learning (DRL) has emerged as a promising solution to mastering explosive and versatile quadrupedal jumping skills. However, current DRL-based frameworks usually rely on well-defined reference trajectories, which are obtained by capturing animal motions or transferring experience from existing controllers. This work explores the possibility of learning dynamic jumping without imitating a reference trajectory. To this end, we incorporate a curriculum design into DRL so as to accomplish challenging tasks progressively. Starting from a vertical in-place jump, we then generalize the learned policy to forward and diagonal jumps and, finally, learn to jump across obstacles. Conditioned on the desired landing location, orientation, and obstacle dimensions, the proposed approach contributes to a wide range of jumping motions, including omnidirectional jumping and robust jumping, alleviating the effort to extract references in advance. Particularly, without constraints from the reference motion, a 90cm forward jump is achieved, exceeding previous records for similar robots reported in the existing literature. Additionally, continuous jumping on the soft grassy floor is accomplished, even when it is not encountered in the training stage. A supplementary video showing our results can be found at https://youtu.be/nRaMCrwU5X8 .

forward jump, obstacle, robot, (14 more...)

arXiv.org Artificial Intelligence

2401.16337

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Europe > Netherlands > South Holland > Delft (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)
Europe > Germany (0.04)

Genre: Research Report (1.00)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Iterative Option Discovery for Planning, by Planning

Young, Kenny, Sutton, Richard S.

arXiv.org Artificial IntelligenceDec-22-2023

Discovering useful temporal abstractions, in the form of options, is widely thought to be key to applying reinforcement learning and planning to increasingly complex domains. Building on the empirical success of the Expert Iteration approach to policy learning used in AlphaZero, we propose Option Iteration, an analogous approach to option discovery. Rather than learning a single strong policy that is trained to match the search results everywhere, Option Iteration learns a set of option policies trained such that for each state encountered, at least one policy in the set matches the search results for some horizon into the future. Intuitively, this may be significantly easier as it allows the algorithm to hedge its bets compared to learning a single globally strong policy, which may have complex dependencies on the details of the current state. Having learned such a set of locally strong policies, we can use them to guide the search algorithm resulting in a virtuous cycle where better options lead to better search results which allows for training of better options. We demonstrate experimentally that planning using options learned with Option Iteration leads to a significant benefit in challenging planning environments compared to an analogous planning algorithm operating in the space of primitive actions and learning a single rollout policy with Expert Iteration.

iteration, optit, single policy, (17 more...)

arXiv.org Artificial Intelligence

2310.01569

Country: North America > Canada > Alberta (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.88)

Add feedback

ManyQuadrupeds: Learning a Single Locomotion Policy for Diverse Quadruped Robots

Shafiee, Milad, Bellegarda, Guillaume, Ijspeert, Auke

arXiv.org Artificial IntelligenceOct-16-2023

Learning a locomotion policy for quadruped robots has traditionally been constrained to specific robot morphology, mass, and size. The learning process must usually be repeated for every new robot, where hyperparameters and reward function weights must be re-tuned to maximize performance for each new system. Alternatively, attempting to train a single policy to accommodate different robot sizes, while maintaining the same degrees of freedom (DoF) and morphology, requires either complex learning frameworks, or mass, inertia, and dimension randomization, which leads to prolonged training periods. In our study, we show that drawing inspiration from animal motor control allows us to effectively train a single locomotion policy capable of controlling a diverse range of quadruped robots. These differences encompass a variable number of DoFs, (i.e. 12 or 16 joints), three distinct morphologies, a broad mass range spanning from 2 kg to 200 kg, and nominal standing heights ranging from 16 cm to 100 cm. Our policy modulates a representation of the Central Pattern Generator (CPG) in the spinal cord, effectively coordinating both frequencies and amplitudes of the CPG to produce rhythmic output (Rhythm Generation), which is then mapped to a Pattern Formation (PF) layer. Across different robots, the only varying component is the PF layer, which adjusts the scaling parameters for the stride height and length. Subsequently, we evaluate the sim-to-real transfer by testing the single policy on both the Unitree Go1 and A1 robots. Remarkably, we observe robust performance, even when adding a 15 kg load, equivalent to 125% of the A1 robot's nominal mass.

locomotion, morphology, robot, (16 more...)

arXiv.org Artificial Intelligence

2310.10486

Country:

Europe > Switzerland > Vaud > Lausanne (0.04)
Asia > Japan > Kyūshū & Okinawa > Kyūshū > Fukuoka Prefecture > Fukuoka (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.66)

Technology: Information Technology > Artificial Intelligence > Robots > Locomotion (1.00)

Add feedback