AITopics | episode reward

episode reward

f0fae49cdfab57c41c30c9b0244093cb-Supplemental-Conference.pdf

Neural Information Processing Systems | Feb-12-2026, 19:33:35 GMT

Finally, based on the feedback of one reviewer, we considered a state of the art meta-learning algorithm, PEARL [1].

artificial intelligence, encodingsize 4 4 4 4, machine learning, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.90)

2df45244f09369e16ea3f9117ca45157-Supplemental.pdf

Neural Information Processing Systems | Feb-7-2026, 23:13:58 GMT

agent, demonstration, trajectory, (16 more...)

Neural Information Processing Systems

Country: North America > Canada (0.04)

Genre: Workflow (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Robots (0.69)

2df45244f09369e16ea3f9117ca45157-Paper.pdf

Neural Information Processing Systems | Feb-7-2026, 23:13:50 GMT

agent, demonstration, trajectory, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > Michigan (0.04)
North America > Canada (0.04)

Industry: Leisure & Entertainment > Games (0.47)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Quantum-Enhanced Reinforcement Learning for Accelerating Newton-Raphson Convergence with Ising Machines: A Case Study for Power Flow Analysis

Kaseb, Zeynab, Moller, Matthias, Spoor, Lindsay, Guo, Jerry J., Xiang, Yu, Palensky, Peter, Vergara, Pedro P.

arXiv.org Artificial Intelligence | Nov-26-2025

The Newton-Raphson (NR) method is widely used for solving power flow (PF) equations due to its quadratic convergence. However, its performance deteriorates under poor initialization or extreme operating scenarios, e.g., high levels of renewable energy penetration. Traditional NR initialization strategies often fail to address these challenges, resulting in slow convergence or even divergence. We propose using reinforcement learning (RL) to optimize the initialization of NR. To mitigate the significant computational cost of evaluating power system states over a combinatorially large action space at each RL timestep, we introduce a novel quantum-enhanced RL environment update mechanism that formulates the voltage adjustment task as a quadratic unconstrained binary optimization problem. Specifically, quantum/digital annealers are integrated into the RL environment update to evaluate state transitions using a problem Hamiltonian designed for PF. Results demonstrate significant improvements in convergence speed, a reduction in NR iteration counts, and enhanced robustness under different operating conditions.
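
The sensitivity of NR to its starting point, which motivates the abstract above, can be illustrated on a scalar toy problem. This is a minimal sketch and not the power flow formulation from the paper: the function name `newton_raphson`, the test equation atan(x) = 0, the tolerance, and the divergence guard `blowup` are all assumptions chosen for illustration.

```python
import math


def newton_raphson(x0, tol=1e-10, max_iter=50, blowup=1e6):
    """Run Newton-Raphson on the toy equation atan(x) = 0.

    Returns (x, iterations, converged). This is an illustrative scalar
    example, not the power flow formulation from the paper.
    """
    x = x0
    for iteration in range(max_iter + 1):
        if abs(math.atan(x)) < tol:
            return x, iteration, True
        if abs(x) > blowup:
            # Iterates are running away from the root: report divergence.
            return x, iteration, False
        # Newton step: x - f(x) / f'(x), with f'(x) = 1 / (1 + x^2).
        x = x - math.atan(x) * (1.0 + x * x)
    return x, max_iter, False


for start in (0.5, 1.0, 2.0):
    root, iterations, converged = newton_raphson(start)
    print(f"x0={start}: converged={converged}, iterations={iterations}")
```

In this toy setting, starting points near the root converge in a handful of steps, while a start of 2.0 diverges. A learned initializer, as the abstract proposes, would aim to supply starting points of the first kind.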

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2511.20237

Country: Europe > Netherlands > South Holland (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Energy > Renewable (1.00)
Energy > Power Industry (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)

Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation

Yuhuai Wu, Elman Mansimov, Roger B. Grosse, Shun Liao, Jimmy Ba

Neural Information Processing Systems | Nov-21-2025, 07:08:24 GMT

In this work, we propose to apply trust region optimization to deep reinforcement learning using a recently proposed Kronecker-factored approximation to the curvature.

approximation, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country:

North America > Canada > Ontario > Toronto (0.15)
Asia > Middle East > Jordan (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Industry: Leisure & Entertainment > Games (0.49)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.30)

Memory Based Trajectory-conditioned Policies for Learning from Sparse Rewards

Neural Information Processing Systems | Oct-2-2025, 13:58:05 GMT

When the index of the agent's last visited state embedding in the demonstration

agent, artificial intelligence, machine learning, (18 more...)

Neural Information Processing Systems

Genre: Workflow (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Robots (0.69)

Memory Based Trajectory-conditioned Policies for Learning from Sparse Rewards

Neural Information Processing Systems | Oct-2-2025, 13:57:57 GMT

Recent work demonstrated that using a memory buffer of previous successful trajectories can result in more effective policies. However, existing methods may overly exploit past successful experiences, which can encourage the agent to adopt sub-optimal and myopic behaviors.

machine learning, reinforcement learning, trajectory, (15 more...)

Neural Information Processing Systems

Industry: Leisure & Entertainment > Games (0.47)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

One Net to Rule Them All: Domain Randomization in Quadcopter Racing Across Different Platforms

Ferede, Robin, Blaha, Till, Lucassen, Erin, De Wagter, Christophe, de Croon, Guido C. H. E.

arXiv.org Artificial Intelligence | May-1-2025

In high-speed quadcopter racing, finding a single controller that works well across different platforms remains challenging. This work presents the first neural network controller for drone racing that generalizes across physically distinct quadcopters. We demonstrate that a single network, trained with domain randomization, can robustly control various types of quadcopters. The network relies solely on the current state to directly compute motor commands. The effectiveness of this generalized controller is validated through real-world tests on two substantially different craft (3-inch and 5-inch race quadcopters). We further compare the performance of this generalized controller with controllers specifically trained for the 3-inch and 5-inch drones, using their identified model parameters with varying levels of domain randomization (0%, 10%, 20%, 30%). While the generalized controller is slightly slower than the fine-tuned models, it excels in adaptability across different platforms. Our results show that sim-to-real transfer fails without randomization, while increasing randomization improves robustness but reduces speed. Despite this trade-off, our findings highlight the potential of domain randomization for generalizing controllers, paving the way for universal AI controllers that can adapt to any platform.
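
The randomization levels mentioned in the abstract (0%, 10%, 20%, 30%) can be pictured as perturbing each model parameter around its identified value before every training episode. The sketch below is only an assumption about how such sampling might look: the function `randomize_parameters`, the uniform distribution, and the parameter names and values are hypothetical and are not taken from the paper.

```python
import random


def randomize_parameters(nominal, level, rng):
    """Sample each parameter uniformly within +/- level of its nominal value.

    `level` is a fraction (0.1 means 10%). Uniform sampling is an
    assumption of this sketch, not a detail confirmed by the abstract.
    """
    return {
        name: value * (1.0 + rng.uniform(-level, level))
        for name, value in nominal.items()
    }


# Hypothetical nominal quadcopter parameters, for illustration only.
nominal = {"mass_kg": 0.4, "thrust_coeff": 1.0e-6, "arm_length_m": 0.1}
rng = random.Random(0)

for level in (0.0, 0.1, 0.2, 0.3):
    sample = randomize_parameters(nominal, level, rng)
    print(level, sample)
```

With a level of 0.0 the sampled parameters equal the nominal ones, which corresponds to the no-randomization case in the abstract. Larger levels widen the range of simulated platforms the policy sees during training.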

artificial intelligence, machine learning, randomization, (14 more...)

arXiv.org Artificial Intelligence

2504.21586

Country: Europe > Netherlands (0.14)

Genre: Research Report > New Finding (0.88)

Industry: Transportation > Air (0.90)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.90)

CurricuVLM: Towards Safe Autonomous Driving via Personalized Safety-Critical Curriculum Learning with Vision-Language Models

Sheng, Zihao, Huang, Zilin, Qu, Yansong, Leng, Yue, Bhavanam, Sruthi, Chen, Sikai

arXiv.org Artificial Intelligence | Feb-20-2025

Ensuring safety in autonomous driving systems remains a critical challenge, particularly in handling rare but potentially catastrophic safety-critical scenarios. While existing research has explored generating safety-critical scenarios for autonomous vehicle (AV) testing, there is limited work on effectively incorporating these scenarios into policy learning to enhance safety. Furthermore, developing training curricula that adapt to an AV's evolving behavioral patterns and performance bottlenecks remains largely unexplored. To address these challenges, we propose CurricuVLM, a novel framework that leverages Vision-Language Models (VLMs) to enable personalized curriculum learning for autonomous driving agents. Our approach uniquely exploits VLMs' multimodal understanding capabilities to analyze agent behavior, identify performance weaknesses, and dynamically generate tailored training scenarios for curriculum adaptation. Through comprehensive analysis of unsafe driving situations with narrative descriptions, CurricuVLM performs in-depth reasoning to evaluate the AV's capabilities and identify critical behavioral patterns. The framework then synthesizes customized training scenarios targeting these identified limitations, enabling effective and personalized curriculum learning. Extensive experiments on the Waymo Open Motion Dataset show that CurricuVLM outperforms state-of-the-art baselines across both regular and safety-critical scenarios, achieving superior performance in terms of navigation success, driving efficiency, and safety metrics. Further analysis reveals that CurricuVLM serves as a general approach that can be integrated with various RL algorithms to enhance autonomous driving systems. The code and demo video are available at: https://zihaosheng.github.io/CurricuVLM/.

agent, safety-critical scenario, scenario, (15 more...)

arXiv.org Artificial Intelligence

2502.15119

Country:

North America > United States > Wisconsin > Dane County > Madison (0.14)
North America > United States > Indiana > Tippecanoe County > Lafayette (0.04)
North America > United States > California > Santa Clara County > Sunnyvale (0.04)

Genre:

Research Report > New Finding (0.92)
Instructional Material > Course Syllabus & Notes (0.66)

Industry:

Transportation > Ground > Road (1.00)
Information Technology (1.00)
Education (1.00)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

A Appendix

Neural Information Processing Systems | Feb-10-2025, 20:16:31 GMT

The numbers in bold denote a statistically significant difference between the two methods (p-value < 0.001, paired t-test). We also list the IID (Table T6) and OOD (Tables T7, T8 and T9) test results of all the agents trained for this work. Some negative values should not surprise the reader, as some agents, when tested far outside of the training distribution, fail to walk, collecting more penalties (e.g., due to undesired contact force or excessive energy expenditure) than positive reward. We also show the graphs of the reward as a function of perturbation intensity for the end-to-end trained Oracle, DMAP and TCN (Figure F2). Generally, DMAP performs similarly to the Oracle, while the TCN has lower performance, especially for more challenging morphologies (Ant, Walker).
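
For readers unfamiliar with the paired t-test mentioned above, the test statistic can be computed as follows. This is a minimal sketch: the function `paired_t_statistic` and the reward values are made up for illustration and are not results from the paper. Turning the statistic into a p-value requires the Student's t distribution, which a library routine such as SciPy's `scipy.stats.ttest_rel` provides.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Return the paired t statistic and degrees of freedom."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean_diff = statistics.mean(diffs)
    std_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t_stat = mean_diff / (std_diff / math.sqrt(len(diffs)))
    return t_stat, len(diffs) - 1


# Made-up episode rewards for two methods on the same four test conditions.
method_a = [1.0, 2.0, 3.0, 4.0]
method_b = [0.0, 1.0, 1.0, 2.0]
t_stat, dof = paired_t_statistic(method_a, method_b)
print(f"t = {t_stat:.3f}, degrees of freedom = {dof}")
```

The pairing matters because both methods are evaluated on the same conditions, so the test looks at per-condition differences rather than at the two samples independently. Note that this sketch does not handle the degenerate case where all differences are identical, since the standard deviation is then zero.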

artificial intelligence, machine learning, morphology, (17 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (0.88)

Industry: Education (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)
