pilco
Data-Efficient Reinforcement Learning in Continuous State-Action Gaussian-POMDPs
We present a data-efficient reinforcement learning method for continuous state-action systems under significant observation noise. Data-efficient solutions under small noise exist, such as PILCO, which learns the cartpole swing-up task in 30s. PILCO evaluates policies by planning state trajectories using a dynamics model. However, PILCO applies policies to the observed state, thereby planning in observation space. We extend PILCO with filtering to instead plan in belief space, consistent with partially observable Markov decision process (POMDP) planning. This enables data-efficient learning under significant observation noise, outperforming more naive methods such as post-hoc application of a filter to policies optimised by the original (unfiltered) PILCO algorithm. We test our method on the cartpole swing-up task, which involves nonlinear dynamics and requires nonlinear control.
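The core idea, rolling the belief forward through a filter so the policy acts on the filtered estimate rather than the raw observation, can be sketched as follows. This is a minimal illustration assuming a linear-Gaussian observation model and an extended-Kalman-style update; `policy`, `dynamics_mean`, and `dynamics_jac` are hypothetical placeholders, not the paper's actual moment-matching machinery.

```python
import numpy as np

def filtered_step(mu, Sigma, policy, dynamics_mean, dynamics_jac, Q, R):
    """Propagate a Gaussian belief N(mu, Sigma) through one planning step.

    The policy acts on the filtered belief mean rather than on the raw
    noisy observation, which is the difference from unfiltered PILCO.
    """
    u = policy(mu)                            # act on the belief, not the observation
    mu_pred = dynamics_mean(mu, u)            # predictive mean of the next state
    A = dynamics_jac(mu, u)                   # local linearisation of the model
    S_pred = A @ Sigma @ A.T + Q              # predictive covariance (Q: process noise)

    # Measurement update for y = x + v, v ~ N(0, R). During planning the
    # expected innovation is zero, but the update still contracts the
    # covariance, which is what distinguishes belief-space planning.
    K = S_pred @ np.linalg.inv(S_pred + R)    # Kalman gain, identity observation map
    mu_new = mu_pred                          # zero expected innovation
    S_new = (np.eye(len(mu)) - K) @ S_pred
    return mu_new, S_new
```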
- Asia > Japan > Kyūshū & Okinawa > Okinawa (0.04)
- North America > Canada (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.50)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.41)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Robots (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)
Reviews, Discussions, Author Feedback and Meta-Reviews
The proposed approach, while straightforward, quite elegantly handles the problem at hand. What prevents this paper from being a clear-cut acceptance is the lack of adequate experimental validation.

Typos: line 47: draw -> drawn.

A more thorough discussion of noise in the exploration step of Algorithm 1 (step 8) would be appreciated; this issue is also not discussed in the experiments section (how much noise was used?). I also had a few issues with some of the claimed advantages in the paper. Specifically: (1) the claim that PDDP has an advantage over PILCO because it does not have to solve non-convex optimization problems seems suspect, given the non-convexity of the optimization problem solved in the hyper-parameter tuning step.
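The non-convexity the reviewer points to is visible in the standard GP training objective itself: the log marginal likelihood has multiple local optima in the kernel hyperparameters, which is why it is usually optimised from several random restarts. A minimal sketch below illustrates this with an RBF kernel on toy data (illustrative, not from either paper).

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_marginal_likelihood(log_params, X, y):
    """Negative log marginal likelihood of a 1-D GP with an RBF kernel."""
    ell, sf, sn = np.exp(log_params)          # lengthscale, signal std, noise std
    d2 = (X[:, None] - X[None, :]) ** 2
    K = sf**2 * np.exp(-0.5 * d2 / ell**2) + sn**2 * np.eye(len(X))
    try:
        L = np.linalg.cholesky(K)
    except np.linalg.LinAlgError:
        return 1e10                           # penalise numerically bad regions
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return 0.5 * y @ alpha + np.log(np.diag(L)).sum() + 0.5 * len(y) * np.log(2 * np.pi)

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, 20)
y = np.sin(X) + 0.1 * rng.standard_normal(20)

# The objective is non-convex: different random starts can converge to
# different local optima (e.g. one that explains the data as pure noise),
# hence the usual multi-restart strategy.
results = [minimize(neg_log_marginal_likelihood, rng.standard_normal(3), args=(X, y))
           for _ in range(5)]
best = min(results, key=lambda r: r.fun)
print(np.exp(best.x))                         # (lengthscale, signal std, noise std)
```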
Author Feedback
We thank all reviewers for their constructive and helpful comments, which will allow us to better shape this paper. We will increase the plot sizes in the final version if the paper is accepted. Thank you also for raising the point about real-world experiments: we plan to take our approach to robotics in the future, and we believe self-driving cars present an ideal test-bed for our algorithm.
Probabilistic Differential Dynamic Programming
Yunpeng Pan, Evangelos Theodorou
We present a data-driven, probabilistic trajectory optimization framework for systems with unknown dynamics, called Probabilistic Differential Dynamic Programming (PDDP). PDDP explicitly accounts for uncertainty in the dynamics model using Gaussian processes (GPs). Based on a second-order local approximation of the value function, PDDP performs Dynamic Programming around a nominal trajectory in Gaussian belief spaces. Unlike typical gradient-based policy search methods, PDDP does not require a policy parameterization and learns a locally optimal, time-varying control policy. We demonstrate the effectiveness and efficiency of the proposed algorithm on two nontrivial tasks. Compared with classical DDP and a state-of-the-art GP-based policy search method, PDDP offers a superior combination of data-efficiency, learning speed, and applicability.
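For readers unfamiliar with DDP, the heart of such methods is a backward recursion on a locally quadratic value function, which directly yields a time-varying affine controller with no policy parameterization. Below is a generic iLQR-style sketch of that recursion; it is our illustration of the general technique, not PDDP's actual GP-based derivatives.

```python
import numpy as np

def ddp_backward_pass(fx, fu, lx, lu, lxx, luu, lux):
    """Backward sweep over a length-T nominal trajectory.

    fx, fu, lu, luu, lux are length-T lists of dynamics/cost derivatives;
    lx and lxx have T+1 entries, the last being the terminal cost expansion.
    Returns feedforward terms k[t] and feedback gains K[t] defining the
    time-varying controller u[t] = u_nominal[t] + k[t] + K[t] @ dx[t].
    """
    T = len(fx)
    Vx, Vxx = lx[-1], lxx[-1]                 # terminal value expansion
    k, K = [None] * T, [None] * T
    for t in reversed(range(T)):
        Qx  = lx[t]  + fx[t].T @ Vx
        Qu  = lu[t]  + fu[t].T @ Vx
        Qxx = lxx[t] + fx[t].T @ Vxx @ fx[t]
        Quu = luu[t] + fu[t].T @ Vxx @ fu[t]
        Qux = lux[t] + fu[t].T @ Vxx @ fx[t]
        Quu_inv = np.linalg.inv(Quu)          # assumes Quu > 0; regularise otherwise
        k[t] = -Quu_inv @ Qu                  # feedforward correction
        K[t] = -Quu_inv @ Qux                 # time-varying feedback gain
        Vx  = Qx  + K[t].T @ Quu @ k[t] + K[t].T @ Qu + Qux.T @ k[t]
        Vxx = Qxx + K[t].T @ Quu @ K[t] + K[t].T @ Qux + Qux.T @ K[t]
    return k, K
```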
Reviews: Total stochastic gradient algorithms and applications in reinforcement learning
This paper provides another formalism for gradient estimation in probabilistic computation graphs. Using pathwise derivative and likelihood-ratio estimators, existing and well-known policy gradient theorems are cast into the proposed formalism. This intuition is then used to propose two new methods for gradient estimation that can be used in a model-based RL framework. Experiments demonstrate performance comparable to PILCO on the cart-pole task. Quality: the idea in this work is interesting, and the proposed framework and methods may prove useful in RL settings.
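As a concrete illustration of the two estimator families the review mentions, consider estimating d/dtheta E_{x~N(theta,1)}[x^2], whose true value is 2*theta, both ways. This toy example is ours, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, n = 1.5, 100_000

# Pathwise derivative (reparameterisation): write x = theta + eps with
# eps ~ N(0, 1), then d(x^2)/dtheta = 2x.
x = theta + rng.standard_normal(n)
pathwise = (2 * x).mean()

# Likelihood ratio (score function): grad_theta log N(x; theta, 1) = x - theta,
# so the estimator is x^2 * (x - theta).
x = theta + rng.standard_normal(n)
score = (x**2 * (x - theta)).mean()

print(pathwise, score)  # both approximate 2*theta = 3.0; pathwise has lower variance
```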
- Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.40)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.40)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)