AITopics | stac

A Self-Tuning Actor-Critic Algorithm

Neural Information Processing SystemsDec-24-2025, 21:06:01 GMT

Reinforcement learning algorithms are highly sensitive to the choice of hyperparameters, typically requiring significant manual effort to identify hyperparameters that perform well on a new domain. In this paper, we take a step towards addressing this issue by using metagradients to automatically adapt hyperparameters online by meta-gradient descent (Xu et al., 2018). We apply our algorithm, Self-Tuning Actor-Critic (STAC), to self-tune all the differentiable hyperparameters of an actor-critic loss function, to discover auxiliary tasks, and to improve off-policy learning using a novel leaky V-trace operator. STAC is simple to use, sample efficient and does not require a significant increase in compute. Ablative studies show that the overall performance of STAC improved as we adapt more hyperparameters. When applied to the Arcade Learning Environment (Bellemare et al. 2012), STAC improved the median human normalized score in 200M steps from 243% to 364%. When applied to the DM Control suite (Tassa et al., 2018), STAC improved the mean score in 30M steps from 217 to 389 when learning with features, from 108 to 202 when learning from pixels, and from 195 to 295 in the Real-World Reinforcement Learning Challenge (Dulac-Arnold et al., 2020).

hyperparameter, name change, self-tuning actor-critic algorithm, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.83)

Add feedback

A Self-Tuning Actor-Critic Algorithm

Neural Information Processing SystemsMay-27-2025, 16:11:45 GMT

Reinforcement learning algorithms are highly sensitive to the choice of hyperparameters, typically requiring significant manual effort to identify hyperparameters that perform well on a new domain. In this paper, we take a step towards addressing this issue by using metagradients to automatically adapt hyperparameters online by meta-gradient descent (Xu et al., 2018). We apply our algorithm, Self-Tuning Actor-Critic (STAC), to self-tune all the differentiable hyperparameters of an actor-critic loss function, to discover auxiliary tasks, and to improve off-policy learning using a novel leaky V-trace operator. STAC is simple to use, sample efficient and does not require a significant increase in compute. Ablative studies show that the overall performance of STAC improved as we adapt more hyperparameters.

artificial intelligence, reinforcement learning, self-tuning actor-critic algorithm, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.66)

Add feedback

A Self-Tuning Actor-Critic Algorithm

Neural Information Processing SystemsJan-14-2025, 22:24:31 GMT

Reinforcement learning algorithms are highly sensitive to the choice of hyperparameters, typically requiring significant manual effort to identify hyperparameters that perform well on a new domain. In this paper, we take a step towards addressing this issue by using metagradients to automatically adapt hyperparameters online by meta-gradient descent (Xu et al., 2018). We apply our algorithm, Self-Tuning Actor-Critic (STAC), to self-tune all the differentiable hyperparameters of an actor-critic loss function, to discover auxiliary tasks, and to improve off-policy learning using a novel leaky V-trace operator. STAC is simple to use, sample efficient and does not require a significant increase in compute. Ablative studies show that the overall performance of STAC improved as we adapt more hyperparameters.

hyperparameter, self-tuning actor-critic algorithm, stac

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.66)

Add feedback

Efficient Multi-Robot Motion Planning for Manifold-Constrained Manipulators by Randomized Scheduling and Informed Path Generation

Guo, Weihang, Kingston, Zachary, Hang, Kaiyu, Kavraki, Lydia E.

arXiv.org Artificial IntelligenceNov-30-2024

Multi-robot motion planning for high degree-of-freedom manipulators in shared, constrained, and narrow spaces is a complex problem and essential for many scenarios such as construction, surgery, and more. Traditional coupled and decoupled methods either scale poorly or lack completeness, and hybrid methods that compose paths from individual robots together require the enumeration of many paths before they can find valid composite solutions. This paper introduces Scheduling to Avoid Collisions (StAC), a hybrid approach that more effectively composes paths from individual robots by scheduling (adding random stops and coordination motion along each path) and generates paths that are more likely to be feasible by using bidirectional feedback between the scheduler and motion planner for informed sampling. StAC uses 10 to 100 times fewer paths from the low-level planner than state-of-the-art baselines on challenging problems in manipulator cases.

artificial intelligence, collision, robot, (15 more...)

arXiv.org Artificial Intelligence

2412.00366

Country:

North America > United States > Texas > Harris County > Houston (0.04)
North America > United States > Indiana > Tippecanoe County > West Lafayette (0.04)
North America > United States > Indiana > Tippecanoe County > Lafayette (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (1.00)

Add feedback

Unpacking Failure Modes of Generative Policies: Runtime Monitoring of Consistency and Progress

Agia, Christopher, Sinha, Rohan, Yang, Jingyun, Cao, Zi-ang, Antonova, Rika, Pavone, Marco, Bohg, Jeannette

arXiv.org Artificial IntelligenceOct-10-2024

Robot behavior policies trained via imitation learning are prone to failure under conditions that deviate from their training data. Thus, algorithms that monitor learned policies at test time and provide early warnings of failure are necessary to facilitate scalable deployment. We propose Sentinel, a runtime monitoring framework that splits the detection of failures into two complementary categories: 1) Erratic failures, which we detect using statistical measures of temporal action consistency, and 2) task progression failures, where we use Vision Language Models (VLMs) to detect when the policy confidently and consistently takes actions that do not solve the task. Our approach has two key strengths. First, because learned policies exhibit diverse failure modes, combining complementary detectors leads to significantly higher accuracy at failure detection. Second, using a statistical temporal action consistency measure ensures that we quickly detect when multimodal, generative policies exhibit erratic behavior at negligible computational cost. In contrast, we only use VLMs to detect failure modes that are less time-sensitive. We demonstrate our approach in the context of diffusion policies trained on robotic mobile manipulation domains in both simulation and the real world. By unifying temporal consistency detection and VLM runtime monitoring, Sentinel detects 18% more failures than using either of the two detectors alone and significantly outperforms baselines, thus highlighting the importance of assigning specialized detectors to complementary categories of failure. Qualitative results are made available at https://sites.google.com/stanford.edu/sentinel.

detector, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2410.0464

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.24)
North America > United States > New York > New York County > New York City (0.14)
Asia > South Korea > Daegu > Daegu (0.04)
(8 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Add feedback

Discourse Structure Extraction from Pre-Trained and Fine-Tuned Language Models in Dialogues

Li, Chuyuan, Huber, Patrick, Xiao, Wen, Amblard, Maxime, Braud, Chloé, Carenini, Giuseppe

arXiv.org Artificial IntelligenceJun-25-2023

Discourse processing suffers from data sparsity, especially for dialogues. As a result, we explore approaches to build discourse structures for dialogues, based on attention matrices from Pre-trained Language Models (PLMs). We investigate multiple tasks for fine-tuning and show that the dialogue-tailored Sentence Ordering task performs best. To locate and exploit discourse information in PLMs, we propose an unsupervised and a semi-supervised method. Our proposals achieve encouraging results on the STAC corpus, with F1 scores of 57.2 and 59.3 for unsupervised and semi-supervised methods, respectively. When restricted to projective trees, our scores improved to 63.3 and 68.1.

computational linguistic, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2302.05895

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Italy > Tuscany > Florence (0.04)
Europe > France > Occitanie > Haute-Garonne > Toulouse (0.04)
(17 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

Instant-Teaching: An End-to-End Semi-Supervised Object Detection Framework

Zhou, Qiang, Yu, Chaohui, Wang, Zhibin, Qian, Qi, Li, Hao

arXiv.org Artificial IntelligenceMar-21-2021

Supervised learning based object detection frameworks demand plenty of laborious manual annotations, which may not be practical in real applications. Semi-supervised object detection (SSOD) can effectively leverage unlabeled data to improve the model performance, which is of great significance for the application of object detection models. In this paper, we revisit SSOD and propose Instant-Teaching, a completely end-to-end and effective SSOD framework, which uses instant pseudo labeling with extended weak-strong data augmentations for teaching during each training iteration. To alleviate the confirmation bias problem and improve the quality of pseudo annotations, we further propose a co-rectify scheme based on Instant-Teaching, denoted as Instant-Teaching$^*$. Extensive experiments on both MS-COCO and PASCAL VOC datasets substantiate the superiority of our framework. Specifically, our method surpasses state-of-the-art methods by 4.2 mAP on MS-COCO when using $2\%$ labeled data. Even with full supervised information of MS-COCO, the proposed method still outperforms state-of-the-art methods by about 1.0 mAP. On PASCAL VOC, we can achieve more than 5 mAP improvement by applying VOC07 as labeled data and VOC12 as unlabeled data.

instant-teaching, pseudo annotation, unlabeled data, (14 more...)

arXiv.org Artificial Intelligence

2103.11402

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre:

Research Report > New Finding (0.68)
Research Report > Promising Solution (0.54)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Filters

Collaborating Authors

stac

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

A Self-Tuning Actor-Critic Algorithm

A Self-Tuning Actor-Critic Algorithm

A Self-Tuning Actor-Critic Algorithm

Efficient Multi-Robot Motion Planning for Manifold-Constrained Manipulators by Randomized Scheduling and Informed Path Generation

Unpacking Failure Modes of Generative Policies: Runtime Monitoring of Consistency and Progress

Discourse Structure Extraction from Pre-Trained and Fine-Tuned Language Models in Dialogues

Instant-Teaching: An End-to-End Semi-Supervised Object Detection Framework