AITopics

2505.18433

Country: North America > United States > Colorado (0.28)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence Aug-14-2025

The First Differentiable Transfer-Based Algorithm for Discrete MicroLED Repair

Lue, Ning-Yuan

Laser-enabled selective transfer, a key process in high-throughput microLED fabrication, requires computational models that can plan shift sequences to minimize motion of XY stages and adapt to varying optimization objectives across the substrate. We propose the first repair algorithm based on a differentiable transfer module designed to model discrete shifts of transfer platforms, while remaining trainable via gradient-based optimization. Compared to local proximity searching algorithms, our approach achieves superior repair performance and enables more flexible objective designs, such as minimizing the number of steps. Unlike reinforcement learning (RL)-based approaches, our method eliminates the need for handcrafted feature extractors and trains significantly faster, allowing scalability to large arrays. Experiments show a 50% reduction in transfer steps and sub-2-minute planning time on 2000x2000 arrays. This method provides a practical and adaptable solution for accelerating microLED repair in AR/VR and next-generation display fabrication.

artificial intelligence, machine learning, reinforcement learning, (12 more...)

2508.09206

Genre: Research Report (0.41)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)

Neural Information Processing Systems Aug-13-2025, 21:44:43 GMT

DiffPhyCon: A Generative Approach to Control Complex Physical Systems

Long Wei

Classical techniques suffer from limited applicability or huge computational costs.

artificial intelligence, machine learning, reinforcement learning, (21 more...)

Country:

North America > United States (0.27)
Europe > Germany (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.92)

Industry:

Energy > Oil & Gas (0.46)
Information Technology (0.46)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
(4 more...)

Neural Information Processing Systems Aug-13-2025, 17:58:11 GMT

0b96d81f0494fde5428c7aea243c9157-AuthorFeedback.pdf

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Industry: Energy > Oil & Gas (0.62)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.37)

Neural Information Processing Systems Aug-13-2025, 16:23:36 GMT

A Discussion on Hyperparameter Tuning

Contextual bandits are a class of online learning problems that can be viewed as a simple reinforcement learning problem without transitions. For a complete understanding of contextual bandit problems, we refer readers to Chapter 4 of [Bubeck et al., 2012]. Here we include the main idea for completeness. In contextual bandit problems, the agent needs to find the best action given some observed context (a.k.a. the optimal policy in reinforcement learning). Formally, we define S as the context set and K as the number of actions.
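The setting described above can be illustrated with a minimal sketch. It assumes a finite context set S, Bernoulli rewards, and an epsilon-greedy learner; none of these choices come from the excerpt, and the names `run_contextual_bandit` and `reward_means` are hypothetical.

```python
import random


def run_contextual_bandit(num_contexts, num_actions, reward_means,
                          steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy learner over contexts S = {0..num_contexts-1} and K = num_actions actions.

    reward_means[s][a] is the (assumed) Bernoulli reward probability of
    action a in context s. Returns the greedy action learned for each context.
    """
    rng = random.Random(seed)
    counts = [[0] * num_actions for _ in range(num_contexts)]
    values = [[0.0] * num_actions for _ in range(num_contexts)]

    for _ in range(steps):
        # No transition: the next context does not depend on the chosen action.
        s = rng.randrange(num_contexts)
        if rng.random() < epsilon:
            a = rng.randrange(num_actions)  # explore
        else:
            a = max(range(num_actions), key=lambda k: values[s][k])  # exploit
        r = 1.0 if rng.random() < reward_means[s][a] else 0.0
        counts[s][a] += 1
        # Incremental mean of the observed rewards for (context, action).
        values[s][a] += (r - values[s][a]) / counts[s][a]

    return [max(range(num_actions), key=lambda k: values[s][k])
            for s in range(num_contexts)]


if __name__ == "__main__":
    # Two contexts, two actions; the best action differs per context.
    print(run_contextual_bandit(2, 2, [[0.9, 0.1], [0.1, 0.9]]))
```

The returned list plays the role of a policy: one action per context, chosen from the per-context reward estimates.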

machine learning, reinforcement learning, srld, (16 more...)

Industry:

Energy > Oil & Gas > Upstream (0.46)
Education > Focused Education > Special Education (0.44)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.44)

Neural Information Processing Systems Aug-13-2025, 15:47:16 GMT

Value Prediction Network

Junhyuk Oh, Satinder Singh, Honglak Lee

Neural Information Processing Systems http://nips.cc/

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Country: North America > United States (0.68)

Genre: Research Report (0.46)

Industry:

Energy > Oil & Gas (0.68)
Leisure & Entertainment > Games > Computer Games (0.31)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.99)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Neural Information Processing Systems Aug-13-2025, 00:13:48 GMT

ASPiRe: Adaptive Skill Priors for Reinforcement Learning

We introduce ASPiRe (Adaptive Skill Prior for RL), a new approach that leverages prior experience to accelerate reinforcement learning. Unlike existing methods that learn a single skill prior from a large and diverse dataset, our framework learns a library of distinct skill priors (i.e., behavior priors) from a collection of specialized datasets, and learns how to combine them to solve a new task. This formulation allows the algorithm to acquire a set of specialized skill priors that are more reusable for downstream tasks; however, it also raises the additional challenge of how to effectively combine these unstructured sets of skill priors to form a new prior for new tasks. Specifically, it requires the agent not only to identify which skill prior(s) to use but also to determine how to combine them (either sequentially or concurrently) to form a new prior. To achieve this goal, ASPiRe includes an Adaptive Weight Module (AWM) that learns to infer an adaptive weight assignment between different skill priors and uses these weights to guide policy learning for downstream tasks via weighted Kullback-Leibler divergences.
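As a rough illustration of a weighted Kullback-Leibler term, the sketch below computes a weighted sum of KL divergences between a policy distribution and several prior distributions. It is not ASPiRe's implementation. The discrete distributions, the KL direction (policy relative to prior), the weight normalization, and the function names are all assumptions made for this example, and the weights are given by hand rather than inferred by a learned module.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; assumes q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def weighted_kl_penalty(policy, priors, weights):
    """Weighted sum of KL(policy || prior_i), with weights normalized to sum to 1."""
    if len(priors) != len(weights):
        raise ValueError("need exactly one weight per prior")
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("weights must have a positive sum")
    normalized = [w / total for w in weights]
    return sum(w * kl_divergence(policy, prior)
               for w, prior in zip(normalized, priors))


if __name__ == "__main__":
    policy = [0.5, 0.5]
    priors = [[0.5, 0.5], [0.9, 0.1]]
    # Shifting weight toward the second prior increases the penalty,
    # because the policy differs from that prior.
    print(weighted_kl_penalty(policy, priors, [1.0, 0.0]))
    print(weighted_kl_penalty(policy, priors, [0.5, 0.5]))
```

In this toy form, a weight near 1 for one prior pulls the policy toward that prior alone, while spreading the weights regularizes toward several priors at once.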

artificial intelligence, aspire, reinforcement learning, (4 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.64)

arXiv.org Artificial Intelligence Aug-13-2025

CPO: Addressing Reward Ambiguity in Role-playing Dialogue via Comparative Policy Optimization

Ye, Xinge, Wang, Rui, Wu, Yuchuan, Ma, Victor, Fang, Feiteng, Huang, Fei, Li, Yongbin

Reinforcement Learning Fine-Tuning (RLFT) has achieved notable success in tasks with objectively verifiable answers (e.g., code generation, mathematical reasoning), yet struggles with open-ended subjective tasks like role-playing dialogue. Traditional reward modeling approaches, which rely on independent sample-wise scoring, face dual challenges: subjective evaluation criteria and unstable reward signals. Motivated by the insight that human evaluation inherently combines explicit criteria with implicit comparative judgments, we propose Comparative Policy Optimization (CPO). CPO redefines the reward evaluation paradigm by shifting from sample-wise scoring to comparative group-wise scoring. Building on the same principle, we introduce the CharacterArena evaluation framework, which comprises two stages: (1) Contextualized Multi-turn Role-playing Simulation, and (2) Trajectory-level Comparative Evaluation. By operationalizing subjective scoring via objective trajectory comparisons, CharacterArena minimizes contextual bias and enables more robust and fair performance evaluation. Empirical results on CharacterEval, CharacterBench, and CharacterArena confirm that CPO effectively mitigates reward ambiguity and leads to substantial improvements in dialogue quality.

large language model, machine learning, reinforcement learning, (20 more...)

2508.09074

Country:

Asia > Thailand > Bangkok > Bangkok (0.04)
North America > United States > Florida > Miami-Dade County > Miami (0.04)
Asia > Singapore (0.04)
(3 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)
(2 more...)

arXiv.org Artificial Intelligence Aug-13-2025

Post-Completion Learning for Language Models

Fei, Xiang, Wang, Siqi, Wei, Shu, Nie, Yuxiang, Shi, Wei, Feng, Hao, Feng, Chao, Huang, Can

Current language model training paradigms typically terminate learning upon reaching the end-of-sequence token, overlooking the potential learning opportunities in the post-completion space. We propose Post-Completion Learning (PCL), a novel training framework that systematically utilizes the sequence space after model output completion to enhance both reasoning and self-evaluation abilities. PCL enables models to continue generating self-assessments and reward predictions during training, while maintaining efficient inference by stopping at the completion point. To fully utilize this post-completion space, we design a white-box reinforcement learning method: the model evaluates the output content according to the reward rules, and the resulting score is then calculated and aligned with the reward functions for supervision. We implement dual-track SFT to optimize both reasoning and evaluation capabilities, and mix it with RL training to achieve multi-objective hybrid optimization. Experimental results on different datasets and models demonstrate consistent improvements over traditional SFT and RL methods. Our method provides a new technical path for language model training that enhances output quality while preserving deployment efficiency.

2507.20252

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.53)

arXiv.org Artificial Intelligence Aug-13-2025

Unsupervised Skill Discovery as Exploration for Learning Agile Locomotion

Rho, Seungeun, Garg, Kartik, Byrd, Morgan, Ha, Sehoon

Exploration is crucial for enabling legged robots to learn agile locomotion behaviors that can overcome diverse obstacles. However, such exploration is inherently challenging, and we often rely on extensive reward engineering, expert demonstrations, or curriculum learning - all of which limit generalizability. In this work, we propose Skill Discovery as Exploration (SDAX), a novel learning framework that significantly reduces human engineering effort. SDAX leverages unsupervised skill discovery to autonomously acquire a diverse repertoire of skills for overcoming obstacles. To dynamically regulate the level of exploration during training, SDAX employs a bi-level optimization process that autonomously adjusts the degree of exploration. We demonstrate that SDAX enables quadrupedal robots to acquire highly agile behaviors including crawling, climbing, leaping, and executing complex maneuvers such as jumping off vertical walls. Finally, we deploy the learned policy on real hardware, validating its successful transfer to the real world.

arxiv preprint arxiv, machine learning, reinforcement learning, (13 more...)

2508.08982

Genre: Research Report (0.64)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots > Locomotion (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.48)