AITopics | cartpole

We thank the reviewers for their comments. Please find our responses below, with reference indices consistent with the paper. Q3-5: Meaning of the learned divergence? We agree that BC minimizes the policy KL divergence, as we noted in Sec. 4 (line 200). This is consistent with the literature, e.g., Sec. 2 in [Yu et al. arXiv:1909.09314].
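
For readers unfamiliar with the claim that BC minimizes a policy KL divergence, the identity below is a brief sketch of the standard argument. It assumes the usual maximum-likelihood form of behavior cloning; the symbols $\pi_E$ (expert policy), $\pi_\theta$ (learned policy), $d_E$ (expert state distribution), and $H$ (entropy) are illustrative notation and are not taken from the paper.

$$
\mathbb{E}_{s\sim d_E}\big[\mathrm{KL}\big(\pi_E(\cdot\mid s)\,\|\,\pi_\theta(\cdot\mid s)\big)\big]
= -\,\mathbb{E}_{s\sim d_E,\;a\sim\pi_E(\cdot\mid s)}\big[\log \pi_\theta(a\mid s)\big]
\;-\;\mathbb{E}_{s\sim d_E}\big[H\big(\pi_E(\cdot\mid s)\big)\big]
$$

The entropy term does not depend on $\theta$, so under these assumptions maximizing the log-likelihood of expert actions is equivalent to minimizing the expected forward KL divergence from the expert policy to the learned policy.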

artificial intelligence, different sample size, divergence, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.33)

Gradient Informed Proximal Policy Optimization

Neural Information Processing Systems, Feb-8-2026, 13:35:45 GMT

We introduce a novel policy learning method that integrates analytical gradients from differentiable environments with the Proximal Policy Optimization (PPO) algorithm.

artificial intelligence, machine learning, optimization problem, (16 more...)

Neural Information Processing Systems

Country: North America > United States > Maryland > Prince George's County > College Park (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Off-Policy Evaluation via the Regularized Lagrangian

Neural Information Processing Systems, Feb-8-2026, 07:36:40 GMT

Although there are many commonalities between the various DICE estimators, their derivations are distinct and seemingly incompatible.

artificial intelligence, arxivpreprintarxiv, machine learning, (17 more...)

Neural Information Processing Systems

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.95)
Information Technology > Artificial Intelligence > Machine Learning (0.70)

1e0f65eb20acbfb27ee05ddc000b50ec-Supplemental.pdf

ruihanyang

Neural Information Processing Systems, Feb-7-2026, 18:24:57 GMT

artificial intelligence, cartpole, machine learning, (13 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.49)

Quantifying Memory Use in Reinforcement Learning with Temporal Range

Lafuente-Mercado, Rodney, Rus, Daniela, Rusch, T. Konstantin

arXiv.org Artificial Intelligence, Dec-9-2025

How much does a trained RL policy actually use its past observations? We propose Temporal Range, a model-agnostic metric that treats first-order sensitivities of multiple vector outputs across a temporal window to the input sequence as a temporal influence profile and summarizes it by the magnitude-weighted average lag. Temporal Range is computed via reverse-mode automatic differentiation from the Jacobian blocks $\partial y_s/\partial x_t\in\mathbb{R}^{c\times d}$ averaged over final timesteps $s\in\{t+1,\dots,T\}$ and is well-characterized in the linear setting by a small set of natural axioms. Across diagnostic and control tasks (POPGym; flicker/occlusion; Copy-$k$) and architectures (MLPs, RNNs, SSMs), Temporal Range (i) remains small in fully observed control, (ii) scales with the task's ground-truth lag in Copy-$k$, and (iii) aligns with the minimum history window required for near-optimal return as confirmed by window ablations. We also report Temporal Range for a compact Long Expressive Memory (LEM) policy trained on the task, using it as a proxy readout of task-level memory. Our axiomatic treatment draws on recent work on range measures, specialized here to temporal lag and extended to vector-valued outputs in the RL setting. Temporal Range thus offers a practical per-sequence readout of memory dependence for comparing agents and environments and for selecting the shortest sufficient context.
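
The "magnitude-weighted average lag" in the abstract can be illustrated with a minimal sketch. The code below is not the authors' implementation: it assumes the Jacobian blocks have already been computed and grouped by lag, it uses the Frobenius norm as the magnitude (the abstract does not say which norm is used), and it skips the averaging over final timesteps. The names `frobenius_norm`, `temporal_range`, and `blocks_by_lag` are hypothetical.

```python
import math


def frobenius_norm(block):
    """Frobenius norm of a c x d matrix given as a list of rows."""
    return math.sqrt(sum(v * v for row in block for v in row))


def temporal_range(blocks_by_lag):
    """Magnitude-weighted average lag of an influence profile.

    blocks_by_lag[l] is assumed to hold the Jacobian block of an output
    with respect to the input l steps earlier (a c x d list of rows).
    The Frobenius norm as the weight is an assumption of this sketch.
    """
    weights = [frobenius_norm(block) for block in blocks_by_lag]
    total = sum(weights)
    if total == 0.0:
        # No measurable influence at any lag: report zero range.
        return 0.0
    return sum(lag * w for lag, w in enumerate(weights)) / total


if __name__ == "__main__":
    # All influence sits one step back, so the weighted average lag is 1.
    profile = [[[0.0, 0.0]], [[3.0, 4.0]], [[0.0, 0.0]]]
    print(temporal_range(profile))
```

In this toy profile only the lag-1 block is non-zero, so the weighted average lag is 1. A policy whose output depends mostly on distant inputs would place more weight on larger lags and score higher.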

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2512.06204

Genre: Research Report (0.64)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.32)

1bd8cfc0e4c53869b7f1d0ed4b1e78e1-Paper-Conference.pdf

1e0f65eb20acbfb27ee05ddc000b50ec-Supplemental.pdf

Mask-based Latent Reconstruction for Reinforcement Learning

dc89a0709f213fd0ac4b1172719b2c38-Paper-Conference.pdf

Rob Ev

967990de5b3eac7b87d49a13c6834978-AuthorFeedback.pdf

Gradient Informed Proximal Policy Optimization

Off-Policy Evaluation via the Regularized Lagrangian

Quantifying Memory Use in Reinforcement Learning with Temporal Range