David Bertoin

Neural Information Processing Systems

Deep reinforcement learning policies, despite their outstanding efficiency in simulated visual control tasks, have shown a disappointing ability to generalize across disturbances in the input training images. Changes in image statistics or distracting background elements are pitfalls that prevent generalization and real-world applicability of such control policies. We elaborate on the intuition that a good visual policy should be able to identify which pixels are important for its decision, and preserve this identification of important sources of information across images. This implies that training a policy with a small generalization gap should focus on such important pixels and ignore the others. This leads to the introduction of saliency-guided Q-networks (SGQN), a generic method for visual reinforcement learning that is compatible with any value function learning method. SGQN vastly improves the generalization capability of Soft Actor-Critic agents and outperforms existing state-of-the-art methods on the DeepMind Control Generalization benchmark, setting a new reference in terms of training efficiency, generalization gap, and policy interpretability.
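
To make the intuition concrete, here is a minimal, hypothetical sketch (in PyTorch) of saliency-guided value learning: a gradient-based saliency mask is derived from the Q-network, and the critic is penalized when masking out non-salient pixels changes its value estimates. The function names, the top-quantile threshold, and the consistency term are illustrative assumptions, not the authors' actual SGQN objective.

# Illustrative sketch only -- not the official SGQN implementation.
# Assumptions: q_net maps an image batch (B, C, H, W) to per-action Q-values (B, A),
# critic_loss is the usual value-learning loss, top_quantile picks the "salient" pixels.
import torch

def saliency_mask(q_net, obs, top_quantile=0.95):
    obs = obs.clone().requires_grad_(True)
    q = q_net(obs).max(dim=1).values.sum()          # scalar so we can backpropagate
    grad = torch.autograd.grad(q, obs)[0]
    sal = grad.abs().sum(dim=1, keepdim=True)       # per-pixel saliency (B, 1, H, W)
    thresh = torch.quantile(sal.flatten(1), top_quantile, dim=1)
    return (sal >= thresh.view(-1, 1, 1, 1)).float()

def saliency_guided_loss(q_net, obs, critic_loss, weight=0.1):
    mask = saliency_mask(q_net, obs)
    masked_obs = obs * mask                          # keep only the "important" pixels
    consistency = ((q_net(obs) - q_net(masked_obs)) ** 2).mean()
    return critic_loss + weight * consistency        # focus the critic on salient pixels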


Localize, Understand, Collaborate: Semantic-Aware Dragging via Intention Reasoner

Neural Information Processing Systems

Flexible and accurate drag-based editing is a challenging task that has recently garnered significant attention. Current methods typically model this problem as automatically learning "how to drag" through point dragging and often produce one deterministic estimation, which presents two key limitations: 1) Overlooking the inherently ill-posed nature of drag-based editing, where multiple results may correspond to a given input, as illustrated in Figure 1; 2) Ignoring the constraint of image quality, which may lead to unexpected distortion. To alleviate this, we propose LucidDrag, which shifts the focus from "how to drag" to a "what-then-how" paradigm. LucidDrag comprises an intention reasoner and a collaborative guidance sampling mechanism. The former infers several optimal editing strategies, identifying what content should be edited and in what semantic direction. Based on these intentions, the latter addresses "how to drag" by collaboratively integrating existing editing guidance with the newly proposed semantic guidance and quality guidance. Specifically, semantic guidance is derived by establishing a semantic editing direction based on reasoned intentions, while quality guidance is achieved through classifier guidance using an image fidelity discriminator. Both qualitative and quantitative comparisons demonstrate the superiority of LucidDrag over previous methods.
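
As a loose illustration of how several guidance signals can be combined during diffusion sampling, a classifier-guidance-style update might look like the sketch below; the names edit_guidance, semantic_guidance, and quality_guidance and the weights are hypothetical placeholders, not LucidDrag's actual code.

# Hypothetical sketch of combining editing, semantic, and quality guidance
# in a single denoising step, in the spirit of classifier guidance.
import torch

def combined_guidance(eps_pred, latents, guidance_fns, weights):
    # Each guidance_fn returns the gradient of its log-objective w.r.t. the latents;
    # subtracting the weighted gradients steers the predicted noise.
    steered = eps_pred
    for grad_fn, w in zip(guidance_fns, weights):
        steered = steered - w * grad_fn(latents)
    return steered

# Usage (all three gradient functions are placeholders):
# eps = combined_guidance(eps, z_t,
#                         [edit_guidance, semantic_guidance, quality_guidance],
#                         weights=[1.0, 0.5, 0.3])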


Save over $100 on Sony XM4 headphones ahead of Memorial Day

Mashable

SAVE $120: As of May 23, Sony WH-1000XM4 headphones are on sale for $228 at Amazon. If you're looking for a seriously high-quality pair of headphones, you won't want to miss this great deal on Sony XM4s. With premium noise cancellation, stellar sound quality, and Alexa voice control, these headphones are next level. And as of May 23, you can get them for less. At Amazon, they are currently on sale for $228, saving you $120 on the list price.


Forget Cocomelon--this kids' app won't rot their brains

Popular Science

If your child loves their tablet, but you struggle with finding appropriate games, try Pok Pok, a learning app for kids aged 2-8 that doesn't feel like learning. It features a collection of calming, open-ended digital toys that help children explore STEM, problem-solving, creativity, and more without ads, in-app purchases, or overstimulation. Built by parents in collaboration with early childhood experts, Pok Pok offers a Montessori-inspired experience that supports healthy screen time and lifelong learning. Kids using Pok Pok build foundational skills in STEM, problem-solving, language, numbers, cause and effect, and emotional development. Each game is open-ended, so there's no "winning" or "losing."


#ICRA2025 social media round-up

AIHub

The 2025 IEEE International Conference on Robotics & Automation (ICRA) took place from 19–23 May in Atlanta, USA. The event featured plenary and keynote sessions, tutorials and workshops, forums, and a community day. Find out what the participants got up to during the conference. Check out what's happening at the #ICRA2025 Welcome Reception! The excitement is real -- #ICRA2025 is already buzzing!


#ICRA2025 social media round-up

Robohub

The 2025 IEEE International Conference on Robotics & Automation (ICRA) took place from 19–23 May in Atlanta, USA. The event featured plenary and keynote sessions, tutorials and workshops, forums, and a community day. Find out what the participants got up to during the conference. Check out what's happening at the #ICRA2025 Welcome Reception! The excitement is real -- #ICRA2025 is already buzzing!


Supplementary Material: A Derivations and Further Technical Details, A.1 Proof of Proposition 1

Neural Information Processing Systems

Following Haarnoja et al. [13], we can now rewrite Equation (A.4) accordingly.

A.3 Regularized Maximum Likelihood Estimation

To address the collapse in predictive variance away from the offline dataset under MLE training seen in Figure 1, Wu et al. [51] in practice augment the usual MLE loss for the behavioral policy π with an entropy bonus. Whilst entropy regularization partially mitigates the collapse of predictive variance away from the expert demonstrations, we still observe the wrong trend shown in Figure 1, with predictive variances high near the expert demonstrations and low on unseen data. The variance surface also becomes more poorly behaved, with "islands" of high predictive variance appearing away from the data. Figure 12 shows the predictive variances of behavioral policies trained on expert demonstrations for the "door-binary-v0" environment with varying Tikhonov regularization coefficients λ. Similarly, Tikhonov regularization does not resolve the issue with the calibration of uncertainties. We also observe that too high a regularization strength causes the model to underfit the variances of the data.
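
For concreteness, a minimal sketch of the kind of entropy-regularized MLE loss discussed above, assuming a Gaussian behavioral policy implemented as a PyTorch module, with an optional Tikhonov (L2) term; the names and coefficient values are assumptions rather than the exact loss of Wu et al. [51].

# Sketch: behavioral-cloning MLE loss with an entropy bonus and optional Tikhonov regularization.
import torch

def regularized_mle_loss(policy, states, actions, entropy_coef=0.01, tikhonov_coef=0.0):
    dist = policy(states)                               # assumed to return a torch.distributions.Normal
    nll = -dist.log_prob(actions).sum(-1).mean()        # standard MLE (negative log-likelihood) term
    entropy_bonus = dist.entropy().sum(-1).mean()       # discourages collapsing predictive variance
    l2 = sum((p ** 2).sum() for p in policy.parameters())  # Tikhonov (weight-decay) regularizer
    return nll - entropy_coef * entropy_bonus + tikhonov_coef * l2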


On Pathologies in KL-Regularized Reinforcement Learning from Expert Demonstrations

Neural Information Processing Systems

KL-regularized reinforcement learning from expert demonstrations has proved successful in improving the sample efficiency of deep reinforcement learning algorithms, allowing them to be applied to challenging physical real-world tasks. However, we show that KL-regularized reinforcement learning with behavioral reference policies derived from expert demonstrations can suffer from pathological training dynamics that can lead to slow, unstable, and suboptimal online learning. We show empirically that the pathology occurs for commonly chosen behavioral policy classes and demonstrate its impact on sample efficiency and online policy performance. Finally, we show that the pathology can be remedied by non-parametric behavioral reference policies and that this allows KL-regularized reinforcement learning to significantly outperform state-of-the-art approaches on a variety of challenging locomotion and dexterous hand manipulation tasks.
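
For reference, the KL-regularized objective underlying this setting augments the expected return with a penalty on divergence from a behavioral reference policy \pi_0 derived from the demonstrations; in generic notation (not copied from the paper):

\max_{\pi}\; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\Big( r(s_t, a_t) \;-\; \alpha\, D_{\mathrm{KL}}\big(\pi(\cdot \mid s_t)\,\|\,\pi_0(\cdot \mid s_t)\big) \Big)\right],

where \alpha controls how strongly the learned policy is pulled toward the reference policy.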


Supplementary Material for: Parametrized Quantum Policies for Reinforcement Learning

Neural Information Processing Systems

Outline: The Supplementary Material is organized as follows. In Appendix D, we give a specification of the environments considered in our numerical simulations, as well as the hyperparameters we used to train all RL agents. In Appendix E, we present additional plots and numerical simulations that aid our understanding and visualization of PQC policies. In Appendix F, we give a succinct description of the DLP classification task of Liu et al. In Appendices G to I, we prove our main Theorem 1 on learning separations in DLP environments.


CoSy: Evaluating Textual Explanations of Neurons

Neural Information Processing Systems

A crucial aspect of understanding the complex nature of Deep Neural Networks (DNNs) is the ability to explain learned concepts within their latent representations. While methods exist to connect neurons to human-understandable textual descriptions, evaluating the quality of these explanations is challenging due to the lack of a unified quantitative approach.