

Diagnosing Non-Markovian Observations in Reinforcement Learning via Prediction-Based Violation Scoring

Mysore, Naveen

arXiv.org Machine Learning

Reinforcement learning algorithms assume that observations satisfy the Markov property, yet real-world sensors frequently violate this assumption through correlated noise, latency, or partial observability. Standard performance metrics conflate Markov breakdowns with other sources of suboptimality, leaving practitioners without diagnostic tools for such violations. This paper introduces a prediction-based scoring method that quantifies non-Markovian structure in observation trajectories. A random forest first removes nonlinear Markov-compliant dynamics; ridge regression then tests whether historical observations reduce prediction error on the residuals beyond what the current observation provides. The resulting score is bounded in [0, 1] and requires no causal graph construction. Evaluation spans six environments (CartPole, Pendulum, Acrobot, HalfCheetah, Hopper, Walker2d), three algorithms (PPO, A2C, SAC), controlled AR(1) noise at six intensity levels, and 10 seeds per condition. In post-hoc detection, 7 of 16 environment-algorithm pairs, primarily high-dimensional locomotion tasks, show significant positive monotonicity between noise intensity and the violation score (Spearman rho up to 0.78, confirmed under repeated-measures analysis); under training-time noise, 13 of 16 pairs exhibit statistically significant reward degradation. An inversion phenomenon is documented in low-dimensional environments where the random forest absorbs the noise signal, causing the score to decrease as true violations grow, a failure mode analyzed in detail. A practical utility experiment demonstrates that the proposed score correctly identifies partial observability and guides architecture selection, fully recovering performance lost to non-Markovian observations. Source code to reproduce all results is provided at https://github.com/NAVEENMN/Markovianes.
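The abstract's two-stage estimator can be sketched concretely. The paper's exact details (history length, split scheme, hyperparameters) are not given here, so the following is a minimal illustration of the idea for 1-D observation trajectories, with the function name, lag-1 history, single train/test splits, and forest/ridge settings all chosen as assumptions: a random forest first fits the Markov-compliant next-step dynamics from the current observation alone, and a ridge regression then checks whether adding the previous observation reduces held-out error on the residuals.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge

def violation_score(obs, seed=0):
    """Score in [0, 1]: how much the previous observation improves
    next-step prediction beyond the current observation alone.
    `obs` is an array of shape (T, 1); names and splits are illustrative."""
    prev, cur, nxt = obs[:-2], obs[1:-1], obs[2:]
    n = len(cur)
    split = n // 2
    # Stage 1: a random forest absorbs nonlinear Markov-compliant dynamics.
    rf = RandomForestRegressor(n_estimators=50, random_state=seed)
    rf.fit(cur[:split], nxt[:split].ravel())
    resid = nxt[split:].ravel() - rf.predict(cur[split:])
    # Stage 2: does history reduce held-out error on the residuals?
    Xc = cur[split:]                          # current observation only
    Xh = np.hstack([cur[split:], prev[split:]])  # current + one-step history
    m = len(resid) // 2
    mse = []
    for X in (Xc, Xh):
        r = Ridge(alpha=1.0).fit(X[:m], resid[:m])
        mse.append(np.mean((resid[m:] - r.predict(X[m:])) ** 2))
    # Relative error reduction from history, clipped into [0, 1].
    return float(np.clip((mse[0] - mse[1]) / mse[0], 0.0, 1.0))

# Usage sketch: an AR(2)-style trajectory (history matters) versus
# i.i.d. noise (history is useless).
rng = np.random.default_rng(0)
T = 3000
x = np.zeros(T)
for t in range(2, T):
    x[t] = 0.4 * x[t - 1] + 0.4 * x[t - 2] + 0.1 * rng.standard_normal()
s_hist = violation_score(x.reshape(-1, 1))
s_iid = violation_score(rng.standard_normal((T, 1)))
```

On the history-dependent trajectory the score should come out clearly positive, while on i.i.d. observations it should sit near zero after clipping.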





A Hyperparameter Settings of RD

Neural Information Processing Systems

In this section, we describe the hyperparameter settings of RD. For SAC-N-Unc and TD3-N-Unc, M is set to 1/10 of the total training steps. To ensure a fair comparison, all algorithms employing RD are implemented on top of the CORL repository [54]. We derive these backbone algorithms by modifying the original SAC/TD3 algorithms to employ a critic ensemble of size N and to incorporate an uncertainty regularization term in the policy update. Additionally, RD with fewer Q ensembles can achieve similar or even better results than the backbone methods using more Q ensembles, indicating its potential to reduce computing resource consumption.
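The appendix does not spell out the uncertainty regularization term. In SAC-N/EDAC-style methods it is commonly the standard deviation across the critic ensemble, subtracted from the ensemble mean in the actor objective. A minimal NumPy sketch under that assumption, with `beta` as an assumed penalty coefficient:

```python
import numpy as np

def uncertainty_regularized_objective(q_values, beta=1.0):
    """q_values: array of shape (N, batch), one row per ensemble critic
    evaluated at (s, pi(s)). Returns the per-sample actor objective:
    ensemble mean minus beta times ensemble standard deviation.
    (A common form, assumed here; not necessarily RD's exact term.)"""
    mean = q_values.mean(axis=0)
    std = q_values.std(axis=0)
    return mean - beta * std

# Usage: two critics, two states; disagreement on the first state is penalized.
obj = uncertainty_regularized_objective(np.array([[1.0, 2.0],
                                                  [3.0, 2.0]]), beta=1.0)
# → array([1., 2.])
```

The design intent is that actions where the critics disagree (high ensemble std) are down-weighted during the policy update, which is also why fewer critics can suffice when the penalty is well calibrated.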





Appendices: A Sketch of Theoretical Analyses

Neural Information Processing Systems

Theorem B.1 (Performance difference bound for model-based RL). Let ϵ_{M_i} denote the inconsistency between the learned dynamics P_{M_i} and the true dynamics. For L1–L3, with the performance-gap approximation of M_1 and π_1, we apply Lemma C.2. Here, d^π_{M_i} denotes the distribution of state-action pairs induced by policy π under the dynamical model M_i. Theorem B.3 (Refined bound with constraints). Let µ and ν be two probability distributions on the configuration space X; then, according to Lemma C.1, we have a bound on D_TV(µ, ν). Under these definitions, we can obtain the following intermediate outcome by applying the results from B.2 and B.1. Here, we take the time-varying linear quadratic regulator as an instance to illustrate the rationality of our assumption on α.
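The equations of Theorem B.1 are truncated in this excerpt. For orientation, a standard form of such a performance-difference (simulation-lemma-style) bound, stated here under the usual assumptions of discount factor γ and rewards bounded by R_max, and not necessarily the appendix's exact statement, reads:

```latex
\left| J_{M_1}(\pi) - J_{M_2}(\pi) \right|
  \le \frac{2\gamma R_{\max}}{(1-\gamma)^2}\,
      \mathbb{E}_{(s,a)\sim d^{\pi}_{M_1}}
      \left[ D_{\mathrm{TV}}\!\left( P_{M_1}(\cdot \mid s,a),\,
             P_{M_2}(\cdot \mid s,a) \right) \right]
```

That is, the return gap between two models under the same policy is controlled by the expected total-variation distance between their transition kernels, which is the role the inconsistency term ϵ_{M_i} plays above.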


Supplementary Material for BAIL: Best-Action Imitation Learning for Batch Deep Reinforcement Learning

Neural Information Processing Systems

Note that ˆφ is feasible for the constrained optimization problem. We refer to it as an "early stopping scheme" because the key idea is to return to the parameter values which gave the lowest validation error (see Section 7.8 of Goodfellow et al. [3]). In our implementation, we initialize two upper envelope networks with parameters φ and φ0, where φ is trained using the penalty loss, and φ0 records the parameters with the lowest validation error encountered so far. If Lφ > Lφ0, we count the number of consecutive times this occurs. Not only is this not standard practice, but to make a fair comparison across all algorithms, this would require, for each of the five algorithms, performing a separate hyper-parameter search for each of the five environments.
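The two-parameter-set scheme described above can be sketched generically. The function and callback names below are illustrative assumptions, not BAIL's actual implementation: one parameter set is trained with the penalty loss while a second records the best-so-far validation parameters, and training stops after a run of consecutive non-improving validation checks.

```python
import copy

def train_with_early_stopping(params, loss_fn, val_loss_fn, update_fn,
                              steps, patience=20):
    """Early stopping with two parameter sets (names are illustrative):
    `params` plays the role of phi, trained via `update_fn` on the penalty
    loss; `best` plays the role of phi0, the parameters with the lowest
    validation loss seen so far. Stop after `patience` consecutive steps
    with L_phi > L_phi0 and return `best`."""
    best = copy.deepcopy(params)
    best_val = val_loss_fn(best)
    bad = 0  # consecutive non-improving validation checks
    for _ in range(steps):
        params = update_fn(params, loss_fn)
        v = val_loss_fn(params)
        if v < best_val:
            best, best_val, bad = copy.deepcopy(params), v, 0
        else:
            bad += 1
            if bad >= patience:
                break
    return best
```

A toy usage: with a scalar "parameter" that is halved each step and validation loss `abs(p - 2)`, training overshoots the optimum and the scheme returns the best intermediate value rather than the final one.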