AITopics | td3

Collaborating Authors

td3

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

6191ab7080c840f67eaf5dff7d5edfcb-Supplemental-Conference.pdf

Neural Information Processing SystemsFeb-12-2026, 16:10:47 GMT

Diversity in equally-performing policies.We show that different neighborhoods correspond to different post-update return distributions and agent behaviors. We discover that at equal average returns, different policies obtained by the same deep RL algorithm may in fact have substantially different distributional profiles, as measured by statistics of the post-update return distribution.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > Louisiana (0.04)
North America > Canada > Quebec (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.92)

Add feedback

29ef811e72b2b97cf18dd5d866b0f472-Supplemental-Conference.pdf

Neural Information Processing SystemsFeb-9-2026, 08:24:49 GMT

algorithm, equation, lnss, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Arizona (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Games (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Appendix for Softmax Deep Double Deterministic Policy Gradients Ling Pan

Neural Information Processing SystemsFeb-9-2026, 06:33:38 GMT

We demonstrate the smoothing effect of SD3 on the optimization landscape in this section, where experimental setup is the same as in Section 4.1 in the text for the comparative study of SD2 and Experimental details can be found in Section B.2. The performance comparison of SD3 and TD3 is shown in Figure 1(a), where SD3 significantly outperforms TD3. So far, we have demonstrated the smoothing effect of SD3 over TD3. Hyperparameters of DDPG and SD2 are summarized in Table 1. Assume that the actor is a local maximizer with respect to the critic.

artificial intelligence, machine learning, sd3, (16 more...)

Neural Information Processing Systems

Country:

North America > Canada (0.04)
Asia > Middle East > Jordan (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.61)

Add feedback

884d247c6f65a96a7da4d1105d584ddd-Paper.pdf

Neural Information Processing SystemsFeb-9-2026, 06:33:31 GMT

DDPG [24]extends Q-learning to continuous control based on the Deterministic Policy Gradient [31] algorithm, which learns a deterministic policyπ(s;φ) parameterized byφto maximize the Q-function to approximate themaxoperator.

artificial intelligence, machine learning, reinforcement learning, (20 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.05)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.72)

Add feedback

Truly Deterministic Policy Optimization

Neural Information Processing SystemsFeb-8-2026, 08:25:59 GMT

The P is asanoperators0 P(s,).

artificial intelligence, machine learning, optimization, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Illinois > Champaign County > Urbana (0.04)
North America > United States > Florida > Broward County > Fort Lauderdale (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > France > Occitanie > Haute-Garonne > Toulouse (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.47)

Add feedback

Learning When to Ask: Simulation-Trained Humanoids for Mental-Health Diagnosis

Cenacchi, Filippo, Richards, Deborah, Cao, Longbing

arXiv.org Artificial IntelligenceDec-11-2025

Testing humanoid robots with users is slow, causes wear, and limits iteration and diversity. Yet screening agents must master conversational timing, prosody, backchannels, and what to attend to in faces and speech for Depression and PTSD. Most simulators omit policy learning with nonverbal dynamics; many controllers chase task accuracy while underweighting trust, pacing, and rapport. We virtualise the humanoid as a conversational agent to train without hardware burden. Our agent-centred, simulation-first pipeline turns interview data into 276 Unreal Engine MetaHuman patients with synchronised speech, gaze/face, and head-torso poses, plus PHQ-8 and PCL-C flows. A perception-fusion-policy loop decides what and when to speak, when to backchannel, and how to avoid interruptions, under a safety shield. Training uses counterfactual replay (bounded nonverbal perturbations) and an uncertainty-aware turn manager that probes to reduce diagnostic ambiguity. Results are simulation-only; the humanoid is the transfer target. In comparing three controllers, a custom TD3 (Twin Delayed DDPG) outperformed PPO and CEM, achieving near-ceiling coverage with steadier pace at comparable rewards. Decision-quality analyses show negligible turn overlap, aligned cut timing, fewer clarification prompts, and shorter waits. Performance stays stable under modality dropout and a renderer swap, and rankings hold on a held-out patient split. Contributions: (1) an agent-centred simulator that turns interviews into 276 interactive patients with bounded nonverbal counterfactuals; (2) a safe learning loop that treats timing and rapport as first-class control variables; (3) a comparative study (TD3 vs PPO/CEM) with clear gains in completeness and social timing; and (4) ablations and robustness analyses explaining the gains and enabling clinician-supervised humanoid pilots.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2512.08952

Country:

Europe > Middle East > Cyprus (0.16)
Oceania > Australia (0.14)

Genre: Research Report (0.50)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Add feedback

L2T-Tune:LLM-Guided Hybrid Database Tuning with LHS and TD3

Yang, Xinyue, Zheng, Chen, Hou, Yaoyang, Zhang, Renhao, Zhang, Yinyan, Wu, Yanjun, Zhang, Heng

arXiv.org Artificial IntelligenceNov-10-2025

Configuration tuning is critical for database performance. Although recent advancements in database tuning have shown promising results in throughput and latency improvement, challenges remain. First, the vast knob space makes direct optimization unstable and slow to converge. Second, reinforcement learning pipelines often lack effective warm-start guidance and require long offline training. Third, transferability is limited: when hardware or workloads change, existing models typically require substantial retraining to recover performance. To address these limitations, we propose L2T-Tune, a new LLM-guided hybrid database tuning framework that features a three-stage pipeline: Stage one performs a warm start that simultaneously generates uniform samples across the knob space and logs them into a shared pool; Stage two leverages a large language model to mine and prioritize tuning hints from manuals and community documents for rapid convergence. Stage three uses the warm-start sample pool to reduce the dimensionality of knobs and state features, then fine-tunes the configuration with the Twin Delayed Deep Deterministic Policy Gradient algorithm. We conduct experiments on L2T-Tune and the state-of-the-art models. Compared with the best-performing alternative, our approach improves performance by an average of 37.1% across all workloads, and by up to 73% on TPC-C. Compared with models trained with reinforcement learning, it achieves rapid convergence in the offline tuning stage on a single server. Moreover, during the online tuning stage, it only takes 30 steps to achieve best results.

large language model, machine learning, reinforcement learning, (19 more...)

arXiv.org Artificial Intelligence

2511.01602

Country: Asia > China (0.48)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Actor-Free Continuous Control via Structurally Maximizable Q-Functions

Korkmaz, Yigit, Bhuwania, Urvi, Jain, Ayush, Bıyık, Erdem

arXiv.org Machine LearningOct-23-2025

Value-based algorithms are a cornerstone of off-policy reinforcement learning due to their simplicity and training stability. However, their use has traditionally been restricted to discrete action spaces, as they rely on estimating Q-values for individual state-action pairs. In continuous action spaces, evaluating the Q-value over the entire action space becomes computationally infeasible. To address this, actor-critic methods are typically employed, where a critic is trained on off-policy data to estimate Q-values, and an actor is trained to maximize the critic's output. Despite their popularity, these methods often suffer from instability during training. In this work, we propose a purely value-based framework for continuous control that revisits structural maximization of Q-functions, introducing a set of key architectural and algorithmic choices to enable efficient and stable learning. We evaluate the proposed actor-free Q-learning approach on a range of standard simulation tasks, demonstrating performance and sample efficiency on par with state-of-the-art baselines, without the cost of learning a separate actor. Particularly, in environments with constrained action spaces, where the value functions are typically non-smooth, our method with structural maximization outperforms traditional actor-critic methods with gradient-based maximization. We have released our code at https://github.com/USC-Lira/Q3C.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Machine Learning

2510.18828

Genre: Research Report (0.64)

Industry: Education (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

A Primer on SO(3) Action Representations in Deep Reinforcement Learning

Schuck, Martin, Samy, Sherif, Schoellig, Angela P.

arXiv.org Artificial IntelligenceOct-14-2025

Many robotic control tasks require policies to act on orientations, yet the geometry of SO(3) makes this nontrivial. Because SO(3) admits no global, smooth, minimal parameterization, common representations such as Euler angles, quaternions, rotation matrices, and Lie algebra coordinates introduce distinct constraints and failure modes. While these trade-offs are well studied for supervised learning, their implications for actions in reinforcement learning remain unclear. We systematically evaluate SO(3) action representations across three standard continuous control algorithms, PPO, SAC, and TD3, under dense and sparse rewards. We compare how representations shape exploration, interact with entropy regularization, and affect training stability through empirical studies and analyze the implications of different projections for obtaining valid rotations from Euclidean network outputs. Across a suite of robotics benchmarks, we quantify the practical impact of these choices and distill simple, implementation-ready guidelines for selecting and using rotation actions. Our results highlight that representation-induced geometry strongly influences exploration and optimization and show that representing actions as tangent vectors in the local frame yields the most reliable results across algorithms. Accurate reasoning over 3D rotations is a core requirement for machine learning algorithms applied in computer graphics, state estimation and control. In robotics and embodied intelligence, the problem extends to controlling physical orientations through learned actions, e.g., in manipulation policies that command full task-space poses or aerial vehicles that regulate attitude. These tasks rely on trained policies with action spaces including rotations in SO(3). This restriction has led to multiple parameterizations, each with its own tradeoffs (Macdonald, 2011; Barfoot, 2017). Euler angles are minimal and intuitive but suffer from order dependence, angle wrapping, and gimbal-lock singularities. Quaternions are smooth and numerically robust with a simple unit-norm constraint, but double-cover SO(3). Rotation matrices are a smooth and unique mapping, but are heavily over-parameterized and require orthonormalization. Viewing SO(3) as a Lie group, one can use tangent spaces, i.e., the Lie algebra m of skew-symmetric matrices, together with the exponential and logarithm maps to represent orientations. Tangent spaces are locally smooth, but globally exhibit singularities at large angles (Sol ` a et al., 2018). Irrespective of the choice of parameterization, any minimal 3-parameter chart must incur singularities, and global parameterizations that avoid singularities are necessarily redundant and constrained. Applications in deep learning that require reasoning over rotations and orientations have renewed interest in this topic by adding another perspective: irrespective of any mathematical properties, what is the best representation to learn from data in SO(3)?

artificial intelligence, machine learning, reinforcement learning, (19 more...)

arXiv.org Artificial Intelligence

2510.11103

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Add feedback

A Long N-step Surrogate Stage Reward for Deep Reinforcement Learning Junmin Zhong Arizona State University Ruofan Wu Arizona State University Jennie Si Arizona State University

Neural Information Processing SystemsOct-8-2025, 08:19:09 GMT

DDPG (TD3) [7], have demonstrated their great potential. Contributions . 1) We introduce a new, simple yet effective surrogate reward They usually proceed as follows.

algorithm, equation, lnss, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Arizona (1.00)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Games (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback