AITopics | rpg

Country: North America > United States (0.06)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.41)

Neural Information Processing Systems | Feb-16-2026, 19:26:39 GMT

Policy Gradient for Rectangular Robust Markov Decision Processes

However, they do not account for transition uncertainty, whereas learning robust policies can be computationally expensive.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Country:

North America > United States (0.06)
North America > Canada > Quebec > Montreal (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.41)

Neural Information Processing Systems | Feb-7-2026, 07:47:04 GMT

040ace837dd270a87055bb10dd7c0392-Paper-Conference.pdf

conference, pruning, sparsity, (16 more...)

Country:

Europe > Italy (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > Nevada (0.04)
(14 more...)

Industry: Information Technology (0.46)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

arXiv.org Artificial Intelligence | Oct-21-2025

RPG: A Repository Planning Graph for Unified and Scalable Codebase Generation

Luo, Jane, Zhang, Xin, Liu, Steven, Wu, Jie, Liu, Jianfeng, Huang, Yiming, Huang, Yangyu, Yin, Chengyu, Xin, Ying, Zhan, Yuefeng, Sun, Hao, Chen, Qi, Li, Scarlett, Yang, Mao

Large language models excel at generating individual functions or single files of code, yet generating complete repositories from scratch remains a fundamental challenge. This capability is key to building coherent software systems from high-level specifications and realizing the full potential of automated code generation. The process requires planning at two levels: deciding what features and modules to build (proposal stage) and defining their implementation details (implementation stage). Current approaches rely on natural language planning, which often produces unclear specifications, misaligned components, and brittle designs due to its inherent ambiguity and lack of structure. To address these limitations, we introduce the Repository Planning Graph (RPG), a structured representation that encodes capabilities, file structures, data flows, and functions in a unified graph. By replacing free-form natural language with an explicit blueprint, RPG enables consistent long-horizon planning for repository generation. Building on RPG, we develop ZeroRepo, a graph-driven framework that operates in three stages: proposal-level planning, implementation-level construction, and graph-guided code generation with test validation. For evaluation, we construct RepoCraft, a benchmark of six real-world projects with 1,052 tasks. On RepoCraft, ZeroRepo produces nearly 36K code lines and 445K code tokens, on average 3.9× larger than the strongest baseline (Claude Code) and 68× larger than other baselines. It achieves 81.5% coverage and 69.7% test accuracy, improving over Claude Code by 27.3 and 35.8 percentage points, respectively. Further analysis shows that RPG models complex dependencies, enables more sophisticated planning through near-linear scaling, and improves agent understanding of repositories, thus accelerating localization.
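To make the idea of a graph-shaped plan concrete, here is a minimal sketch of a planning graph in the spirit of RPG: nodes carry a hierarchy (capability, then file, then function) and edges record data flow, so a topological sort yields an order in which each function can be generated after everything it consumes. The classes `PlanNode` and `PlanGraph`, their fields, and the node kinds are hypothetical illustrations, not ZeroRepo's actual schema or API.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class PlanNode:
    """Hypothetical node: a capability, a file, or a function."""
    name: str
    kind: str          # assumed kinds: "capability", "file", "function"
    parent: str = ""   # file a function lives in, or capability a file serves


@dataclass
class PlanGraph:
    """Hypothetical plan: hierarchy via `parent`, data flow via edges."""
    nodes: dict = field(default_factory=dict)
    # inputs[name] = names of nodes whose outputs `name` consumes
    inputs: dict = field(default_factory=dict)

    def add(self, node: PlanNode) -> None:
        self.nodes[node.name] = node
        self.inputs.setdefault(node.name, set())

    def flows(self, producer: str, consumer: str) -> None:
        """Record that data produced by `producer` is consumed by `consumer`."""
        self.inputs[consumer].add(producer)

    def children(self, parent: str) -> list:
        return [n.name for n in self.nodes.values() if n.parent == parent]

    def function_order(self) -> list:
        """Order functions so every producer comes before its consumers."""
        funcs = {
            name: {d for d in deps if self.nodes[d].kind == "function"}
            for name, deps in self.inputs.items()
            if self.nodes[name].kind == "function"
        }
        return list(TopologicalSorter(funcs).static_order())


# Hypothetical three-function plan inside one capability and two files.
g = PlanGraph()
g.add(PlanNode("data_loading", "capability"))
g.add(PlanNode("io.py", "file", parent="data_loading"))
g.add(PlanNode("load_csv", "function", parent="io.py"))
g.add(PlanNode("clean_rows", "function", parent="io.py"))
g.add(PlanNode("train_model", "function", parent="train.py"))
g.flows("load_csv", "clean_rows")
g.flows("clean_rows", "train_model")
order = g.function_order()  # ['load_csv', 'clean_rows', 'train_model']
```

The explicit structure is what a free-form text plan lacks: file membership and generation order can be queried directly rather than re-inferred from prose.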

large language model, machine learning, natural language, (19 more...)

2509.16198

Country:

North America > United States > California > San Diego County > San Diego (0.04)
North America > Puerto Rico > Peñuelas > Peñuelas (0.04)
Asia > Japan (0.04)
Africa (0.04)

Genre:

Research Report (1.00)
Workflow (0.93)

Industry: Information Technology (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing Systems | Oct-9-2025, 05:57:31 GMT

Policy Gradient for Rectangular Robust Markov Decision Processes

We provide a closed-form expression for the worst occupation measure.

artificial intelligence, machine learning, optimization problem, (18 more...)

Country: North America > United States (0.06)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.41)

Neural Information Processing Systems | Oct-8-2025, 00:11:24 GMT

040ace837dd270a87055bb10dd7c0392-Paper-Conference.pdf

artificial intelligence, machine learning, pruning, (19 more...)

Country:

Europe > Italy (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > Nevada (0.04)
(14 more...)

Industry: Information Technology (0.46)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)

arXiv.org Artificial Intelligence | Sep-26-2025

Reparameterization Proximal Policy Optimization

Zhong, Hai, Wang, Xun, Li, Zhuoran, Huang, Longbo

Reparameterization policy gradient (RPG) is promising for improving sample efficiency by leveraging differentiable dynamics. However, a critical barrier is its training instability, where high-variance gradients can destabilize the learning process. To address this, we draw inspiration from Proximal Policy Optimization (PPO), which uses a surrogate objective to enable stable sample reuse in the model-free setting. We first establish a connection between this surrogate objective and RPG, a link that has been largely unexplored and is non-trivial: we show that the reparameterization gradient of a PPO-like surrogate objective can be computed efficiently using backpropagation through time. Based on this key insight, we propose Reparameterization Proximal Policy Optimization (RPO), a stable and sample-efficient RPG-based method. RPO enables stable sample reuse over multiple epochs by employing a policy gradient clipping mechanism tailored for RPG. It is further stabilized by Kullback-Leibler (KL) divergence regularization and remains fully compatible with existing variance reduction methods. We evaluate RPO on a suite of challenging locomotion and manipulation tasks, where experiments demonstrate that our method achieves superior sample efficiency and strong performance.
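For reference, the PPO surrogate that RPO draws on is sketched below in its standard clipped form: the probability ratio between new and old policies is clipped to [1 - eps, 1 + eps], and the smaller of the clipped and unclipped terms is kept. This is only the familiar model-free PPO objective; RPO's RPG-specific clipping mechanism and its backpropagation-through-time gradient are not described in the abstract and are not reproduced here. The function name and `eps=0.2` default are illustrative assumptions.

```python
import math


def clipped_surrogate(logp_new, logp_old, advantages, eps=0.2):
    """Standard PPO clipped surrogate objective (to be maximized), averaged over samples."""
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)            # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        # Pessimistic choice: the smaller of the unclipped and clipped terms,
        # so pushing the ratio outside [1 - eps, 1 + eps] earns no extra credit.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# Hypothetical numbers: two samples collected under the old policy,
# then re-evaluated after an update (ratios 1.5 and 0.5).
logp_old = [math.log(0.4), math.log(0.4)]
logp_new = [math.log(0.6), math.log(0.2)]
advantages = [1.0, -1.0]
value = clipped_surrogate(logp_new, logp_old, advantages)
```

Here the first sample's ratio is capped at 1.2 and the second's is held at 0.8 (its advantage is negative), so `value` comes out at about 0.2; the clipping is what makes reusing the same samples over several epochs safe.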

artificial intelligence, gradient, machine learning, (13 more...)

2508.06214

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > France > Hauts-de-France > Nord > Lille (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.66)

arXiv.org Artificial Intelligence | Jul-9-2025

From General Relation Patterns to Task-Specific Decision-Making in Continual Multi-Agent Coordination

Yao, Chang, Lin, Youfang, Song, Shoucheng, Wu, Hao, Ma, Yuqing, Han, Shang, Lv, Kai

Continual Multi-Agent Reinforcement Learning (Co-MARL) requires agents to address catastrophic forgetting while learning new coordination policies with a dynamic team. In this paper, we examine the core of Co-MARL, namely relation patterns, which refer to agents' general understanding of interactions. In addition to being general, relation patterns exhibit task-specificity when mapped to different action spaces. To this end, we propose a novel method called General Relation Patterns-Guided Task-Specific Decision-Maker (RPG). In RPG, agents extract relation patterns from dynamic observation spaces using a relation capturer. These task-agnostic relation patterns are then mapped to different action spaces via a task-specific decision-maker generated by a conditional hypernetwork. To combat forgetting, we further introduce regularization terms on both the relation capturer and the conditional hypernetwork. Results on SMAC (StarCraft Multi-Agent Challenge) and LBF (Level-Based Foraging) demonstrate that RPG effectively prevents catastrophic forgetting when learning new tasks and achieves zero-shot generalization to unseen tasks.
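The phrase "a task-specific decision-maker generated by a conditional hypernetwork" can be illustrated with a toy sketch: a hypernetwork maps a task embedding to the weights of a linear head, and that head turns shared, task-agnostic features into one value per action of the current task. Everything here is an assumption for illustration (a fixed random linear hypernetwork, truncation to handle different action counts, the names `make_hypernetwork` and `action_values`); it omits the relation capturer and the anti-forgetting regularizers and is not the authors' architecture.

```python
import random


def make_hypernetwork(task_dim, feat_dim, max_actions, seed=0):
    """Hypothetical conditional hypernetwork: a fixed random linear map from a
    task embedding to the weights of a linear decision-maker head."""
    rng = random.Random(seed)
    # One row of hypernetwork weights per generated head weight.
    hyper = [[rng.gauss(0.0, 0.1) for _ in range(task_dim)]
             for _ in range(max_actions * feat_dim)]

    def generate_head(task_embedding, n_actions):
        flat = [sum(h * z for h, z in zip(row, task_embedding)) for row in hyper]
        head = [flat[a * feat_dim:(a + 1) * feat_dim] for a in range(max_actions)]
        return head[:n_actions]  # assumed: keep rows only for this task's actions

    return generate_head


def action_values(head, relation_features):
    """Map shared (task-agnostic) features to one value per task-specific action."""
    return [sum(w * f for w, f in zip(row, relation_features)) for row in head]


generate_head = make_hypernetwork(task_dim=3, feat_dim=4, max_actions=8)
features = [0.5, -0.2, 0.1, 0.3]  # stand-in for a relation capturer's output
q_task_a = action_values(generate_head([1.0, 0.0, 0.0], n_actions=5), features)
q_task_b = action_values(generate_head([0.0, 1.0, 0.0], n_actions=7), features)
```

The same features yield a 5-action output for one task and a 7-action output for another, which is the separation the abstract describes: general patterns are shared, while the decision layer is generated per task.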

artificial intelligence, machine learning, relation pattern, (13 more...)

2507.06004

Country:

Asia > China > Beijing > Beijing (0.05)
Europe > Austria (0.04)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

arXiv.org Artificial Intelligence | Jun-9-2025

Reusing Trajectories in Policy Gradients Enables Fast Convergence

Montenegro, Alessandro, Mansutti, Federico, Mussi, Marco, Papini, Matteo, Metelli, Alberto Maria

Policy gradient (PG) methods are a class of effective reinforcement learning algorithms, particularly for continuous control problems. These methods learn the parameters of parametric policies via stochastic gradient ascent, typically using on-policy trajectory data to estimate the policy gradient. However, such reliance on fresh data makes them sample-inefficient. Indeed, vanilla PG methods require $O(\varepsilon^{-2})$ trajectories to reach an $\varepsilon$-approximate stationary point. A common strategy to improve efficiency is to reuse off-policy information from past iterations, such as previous gradients or trajectories. While gradient reuse has received substantial theoretical attention, leading to improved rates of $O(\varepsilon^{-3/2})$, the reuse of past trajectories remains largely unexplored from a theoretical perspective. In this work, we provide the first rigorous theoretical evidence that extensive reuse of past off-policy trajectories can significantly accelerate convergence in PG methods. We introduce a power mean correction to the multiple importance weighting estimator and propose RPG (Retrospective Policy Gradient), a PG algorithm that combines old and new trajectories for policy updates. Through a novel analysis, we show that, under established assumptions, RPG achieves a sample complexity of $\widetilde{O}(\varepsilon^{-1})$, the best known rate in the literature. We further validate our approach empirically against PG methods with state-of-the-art rates.
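As background on the estimator that RPG modifies, the sketch below computes plain multiple importance weights with the balance heuristic: each reused trajectory is weighted by its likelihood under the current policy divided by the count-weighted mixture of its likelihoods under the past policies that could have produced it. Only policy log-likelihoods are needed, since the shared environment dynamics cancel in the ratio. The paper's power mean correction is not implemented here, and the function names, scalar gradient terms, and numbers are hypothetical.

```python
import math


def balance_heuristic_weights(logp_target, logp_behaviors, counts):
    """Multiple importance weights via the balance heuristic (no power mean correction).

    logp_target[i]       : log-likelihood of trajectory i under the current policy
    logp_behaviors[k][i] : log-likelihood of trajectory i under past policy k
    counts[k]            : number of trajectories collected with past policy k
    """
    n_total = sum(counts)
    weights = []
    for i, lp in enumerate(logp_target):
        # log of the mixture sum_k (n_k / N) * q_k(tau_i), via log-sum-exp for stability
        terms = [math.log(n_k / n_total) + logp_behaviors[k][i]
                 for k, n_k in enumerate(counts)]
        m = max(terms)
        log_mix = m + math.log(sum(math.exp(t - m) for t in terms))
        weights.append(math.exp(lp - log_mix))
    return weights


def mis_gradient(weights, grad_terms):
    """Average of importance-weighted per-trajectory gradient terms (scalars for brevity)."""
    return sum(w * g for w, g in zip(weights, grad_terms)) / len(weights)


# Hypothetical likelihoods: two reused trajectories, two past policies, one trajectory each.
logp_target = [math.log(0.4), math.log(0.4)]
logp_behaviors = [[math.log(0.2), math.log(0.8)],   # past policy 1
                  [math.log(0.6), math.log(0.8)]]   # past policy 2
weights = balance_heuristic_weights(logp_target, logp_behaviors, counts=[1, 1])
grad = mis_gradient(weights, grad_terms=[1.0, -1.0])
```

With these numbers the mixture likelihoods are 0.4 and 0.8, so the weights are about 1.0 and 0.5 and `grad` is about 0.25. Because the denominator is a mixture rather than a single past policy, one badly matched old policy cannot blow up a weight on its own.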

machine learning, reinforcement learning, trajectory, (18 more...)