Localize, Understand, Collaborate: Semantic-Aware Dragging via Intention Reasoner

Neural Information Processing Systems

Flexible and accurate drag-based editing is a challenging task that has recently garnered significant attention. Current methods typically model this problem as automatically learning "how to drag" through point dragging and often produce a single deterministic estimation, which presents two key limitations: 1) overlooking the inherently ill-posed nature of drag-based editing, where multiple results may correspond to a given input, as illustrated in Figure 1; 2) ignoring the constraint of image quality, which may lead to unexpected distortion. To alleviate this, we propose LucidDrag, which shifts the focus from "how to drag" to a "what-then-how" paradigm. LucidDrag comprises an intention reasoner and a collaborative guidance sampling mechanism. The former infers several optimal editing strategies, identifying what content to edit and along what semantic direction. Building on these inferred intentions, the latter addresses "how to drag" by collaboratively integrating existing editing guidance with the newly proposed semantic guidance and quality guidance. Specifically, semantic guidance is derived by establishing a semantic editing direction based on reasoned intentions, while quality guidance is achieved through classifier guidance using an image fidelity discriminator. Both qualitative and quantitative comparisons demonstrate the superiority of LucidDrag over previous methods.
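The collaborative guidance idea, combining a base editing direction with separate semantic and quality terms, can be illustrated with a minimal toy sketch. This is not the paper's method: the function names, weights, and the 2-D toy objective below are all hypothetical stand-ins for the diffusion score, semantic editing direction, and fidelity gradient.

```python
import numpy as np

def collaborative_guidance_step(x, base_score, semantic_grad, quality_grad,
                                w_sem=1.0, w_qual=0.5, step=0.05):
    """One guided update: move x along the base (editing) score plus
    weighted semantic and quality guidance directions."""
    direction = base_score(x) + w_sem * semantic_grad(x) + w_qual * quality_grad(x)
    return x + step * direction

# Toy 2-D example: pull x toward a "semantic target" t while a
# "quality" term keeps it close to the original x0.
t, x0 = np.array([1.0, 0.0]), np.array([0.0, 0.0])
base = lambda x: np.zeros_like(x)   # stand-in for the diffusion score
sem = lambda x: t - x               # stand-in semantic editing direction
qual = lambda x: -(x - x0)          # stand-in fidelity (quality) pull

x = x0.copy()
for _ in range(200):
    x = collaborative_guidance_step(x, base, sem, qual)
# x settles where the semantic pull and the quality pull balance.
```

The fixed point here lies between the semantic target and the original image, with the balance controlled by the two guidance weights; raising `w_qual` biases the result toward fidelity.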


Supplementary Material: A Derivations and Further Technical Details, A.1 Proof of Proposition 1

Neural Information Processing Systems

Following Haarnoja et al. [13], we can now rewrite Equation (A.4) accordingly.

A.3 Regularized Maximum Likelihood Estimation

To address the collapse in predictive variance away from the offline dataset under MLE training seen in Figure 1, Wu et al. [51] in practice augment the usual MLE loss with an entropy bonus on the policy π. Whilst entropy regularization partially mitigates the collapse of predictive variance away from the expert demonstrations, we still observe the wrong trend, similar to Figure 1, with predictive variances high near the expert demonstrations and low on unseen data. The variance surface also becomes more poorly behaved, with "islands" of high predictive variance appearing away from the data. Figure 12 shows the predictive variances of behavioral policies trained on expert demonstrations for the "door-binary-v0" environment with varying Tikhonov regularization coefficients λ. Similarly, Tikhonov regularization does not resolve the issue with calibration of uncertainties. We also observe that too high a regularization strength causes the model to underfit to the variances of the data.
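An entropy-augmented MLE objective of the kind described can be sketched as follows for a diagonal Gaussian behavioral policy. This is a generic sketch, not the exact loss of Wu et al. [51]; the weight `alpha` and the per-dimension formulation are assumptions for illustration.

```python
import numpy as np

def entropy_regularized_mle_loss(mu, log_std, actions, alpha=0.1):
    """Per-dimension negative log-likelihood of `actions` under a diagonal
    Gaussian policy N(mu, exp(log_std)^2), minus an entropy bonus weighted
    by alpha. The bonus counteracts the collapse of predictive variance."""
    std = np.exp(log_std)
    nll = (0.5 * np.mean(((actions - mu) / std) ** 2)
           + np.mean(log_std) + 0.5 * np.log(2 * np.pi))
    # Differential entropy of a 1-D Gaussian: log(std) + 0.5 * (1 + log(2*pi))
    entropy = np.mean(log_std) + 0.5 * (1 + np.log(2 * np.pi))
    return nll - alpha * entropy
```

As the text notes, this bonus only partially helps: it raises variance uniformly rather than specifically away from the demonstrations, so the miscalibrated trend persists.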


On Pathologies in KL-Regularized Reinforcement Learning from Expert Demonstrations

Neural Information Processing Systems

KL-regularized reinforcement learning from expert demonstrations has proved successful in improving the sample efficiency of deep reinforcement learning algorithms, allowing them to be applied to challenging physical real-world tasks. However, we show that KL-regularized reinforcement learning with behavioral reference policies derived from expert demonstrations can suffer from pathological training dynamics that can lead to slow, unstable, and suboptimal online learning. We show empirically that the pathology occurs for commonly chosen behavioral policy classes and demonstrate its impact on sample efficiency and online policy performance. Finally, we show that the pathology can be remedied by non-parametric behavioral reference policies and that this allows KL-regularized reinforcement learning to significantly outperform state-of-the-art approaches on a variety of challenging locomotion and dexterous hand manipulation tasks.
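The objective in question is commonly written in the following standard form (generic notation, not necessarily the paper's own symbols): the agent maximizes reward while a KL penalty keeps its policy $\pi$ close to a behavioral reference policy $\pi_0$ derived from demonstrations,

$$
J(\pi) \;=\; \mathbb{E}_{\pi}\!\left[\sum_{t} \gamma^{t}\Big( r(s_t, a_t) \;-\; \alpha\, D_{\mathrm{KL}}\big(\pi(\cdot \mid s_t)\,\|\,\pi_0(\cdot \mid s_t)\big) \Big)\right],
$$

where $\alpha > 0$ sets the regularization strength. The pathology described arises when $\pi_0$ is a parametric model whose predictive variance collapses away from the demonstration data, making the KL term ill-behaved precisely in the states the agent must explore.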


Supplementary Material for: Parametrized Quantum Policies for Reinforcement Learning

Neural Information Processing Systems

Outline The Supplementary Material is organized as follows. In Appendix D, we give a specification of the environments considered in our numerical simulations, as well as the hyperparameters we used to train all RL agents. In Appendix E, we present additional plots and numerical simulations that support our understanding and visualization of PQC policies. In Appendix F, we give a succinct description of the DLP classification task of Liu et al. In Appendices G to I, we prove our main Theorem 1 on learning separations in DLP environments.


CoSy: Evaluating Textual Explanations of Neurons

Neural Information Processing Systems

A crucial aspect of understanding the complex nature of Deep Neural Networks (DNNs) is the ability to explain learned concepts within their latent representations. While methods exist to connect neurons to human-understandable textual descriptions, evaluating the quality of these explanations is challenging due to the lack of a unified quantitative approach.


Fair Sequential Selection Using Supervised Learning Models

Neural Information Processing Systems

We consider a selection problem where sequentially arrived applicants apply for a limited number of positions/jobs. At each time step, a decision maker accepts or rejects the given applicant using a pre-trained supervised learning model until all the vacant positions are filled. In this paper, we discuss whether the fairness notions (e.g., equal opportunity, statistical parity, etc.) that are commonly used in classification problems are suitable for the sequential selection problems. In particular, we show that even with a pre-trained model that satisfies the common fairness notions, the selection outcomes may still be biased against certain demographic groups. This observation implies that the fairness notions used in classification problems are not suitable for a selection problem where the applicants compete for a limited number of positions. We introduce a new fairness notion, "Equal Selection (ES)," suitable for sequential selection problems and propose a post-processing approach to satisfy the ES fairness notion. We also consider a setting where the applicants have privacy concerns, and the decision maker only has access to the noisy version of sensitive attributes. In this setting, we can show that the perfect ES fairness can still be attained under certain conditions.
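The core observation, that a classifier which treats groups identically can still produce biased selection outcomes once positions are limited, can be reproduced with a toy simulation. This is an illustrative sketch, not the paper's model: the arrival order, group sizes, acceptance probabilities, and capacity below are all hypothetical.

```python
import numpy as np

def sequential_selection(groups, accept_prob, capacity, rng):
    """Accept each arriving applicant with probability accept_prob[group]
    until `capacity` positions are filled; return per-group selection rates."""
    selected, counts, filled = {0: 0, 1: 0}, {0: 0, 1: 0}, 0
    for g in groups:
        counts[g] += 1
        if filled < capacity and rng.random() < accept_prob[g]:
            selected[g] += 1
            filled += 1
    return {g: selected[g] / counts[g] for g in counts}

rng = np.random.default_rng(0)
# Group 0 applicants happen to arrive before group 1 applicants.
groups = [0] * 50 + [1] * 50
# The classifier itself treats both groups identically (same acceptance
# probability), yet group 0 fills the positions before group 1 arrives.
rates = sequential_selection(groups, {0: 0.5, 1: 0.5}, capacity=10, rng=rng)
```

Because all positions are typically taken before the later-arriving group is seen, the realized selection rates diverge sharply even though the underlying decision rule is group-blind, which is exactly the gap the Equal Selection notion is designed to close.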


Logical Activation Functions: Logit-space Equivalents of Probabilistic Boolean Operators, by Jason d'Eon

Neural Information Processing Systems

The choice of activation functions and their motivation is a long-standing issue within the neural network community. Neuronal representations within artificial neural networks are commonly understood as logits, representing the log-odds score of the presence of features within the stimulus. We derive logit-space operators equivalent to the probabilistic Boolean logic gates AND, OR, and XNOR for independent probabilities. Such theories are important to formalize more complex dendritic operations in real neurons, and these operations can be used as activation functions within a neural network, introducing probabilistic Boolean logic as the core operation of the neural network. Since these functions involve taking multiple exponents and logarithms, they are computationally expensive and not well suited to direct use within neural networks.
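The exact logit-space AND and OR for independent probabilities follow directly from the definition and can be sketched as below. These are the exact (expensive) forms the abstract alludes to, not the paper's efficient approximations, and the function names are my own.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def logit(p):
    return np.log(p) - np.log1p(-p)

def logit_and(x, y):
    """Exact logit-space AND for independent probabilities:
    logit(sigmoid(x) * sigmoid(y))."""
    return logit(sigmoid(x) * sigmoid(y))

def logit_or(x, y):
    """Exact logit-space OR via De Morgan:
    logit(1 - (1 - sigmoid(x)) * (1 - sigmoid(y)))."""
    return logit(1.0 - (1.0 - sigmoid(x)) * (1.0 - sigmoid(y)))
```

Note the chain of exponentials and logarithms in each call, which is precisely why the exact forms are costly and motivate cheaper surrogates for use as activation functions.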


Robot Talk Episode 122 – Bio-inspired flying robots, with Jane Pauline Ramos Ramirez

Robohub

Claire chatted to Jane Pauline Ramos Ramirez from Delft University of Technology about drones that can move on land and in the air. Jane Pauline Ramos Ramirez is a licensed engineer with a multidisciplinary background in bionics, mechanical, and aerospace engineering, and international research experience. Her life's work is rooted in designing inclusive, socially accessible systems that work in synergy with nature and create meaningful impact in communities. As part of this mission, she has been developing nature-inspired drones that can move both on land and in the air, blending her appreciation for nature, design, and the mechanics of how things work.


This Google Chrome update could change the fundamentals of browsing - here's who gets to try it first

ZDNet

Google's Chrome browser for MacOS and Windows is receiving an infusion of new Gemini-powered capabilities, including an AI browsing assistant contextually sensitized to a user's browsing activities. Google made the announcement this week at Google I/O 2025. Dubbed Gemini-in-Chrome, the feature will be available May 21 to Google AI Pro and Google AI Ultra subscribers in the US as well as Chrome Beta, Dev, and Canary users. The general idea behind Gemini-in-Chrome is to reorganize, aggregate, and then more sensibly redisplay the data found on one or more browser tabs while also embellishing the final output with additional but relevant Gemini-generated information. For example, during a pre-event press briefing attended by ZDNET, Google director of Chrome product management Charmaine D'Silva demonstrated how Gemini-in-Chrome could not only organize a head-to-head feature comparison chart of individual sleeping bags -- to which multiple Chrome tabs (one tab per sleeping bag) were pointing -- but could respond to text prompts about each bag's suitability to the expected temperatures for an upcoming camping trip in Maine.


Improving Few-Shot Generalization by Exploring and Exploiting Auxiliary Data

Neural Information Processing Systems

Few-shot learning is valuable in many real-world applications, but learning a generalizable model without overfitting to the few labeled datapoints is challenging. In this work, we focus on Few-shot Learning with Auxiliary Data (FLAD), a training paradigm that assumes access to auxiliary data during few-shot learning in hopes of improving generalization. Previous works have proposed automated methods for mixing auxiliary and target data, but these methods typically scale linearly (or worse) with the number of auxiliary datasets, limiting their practicality. In this work we relate FLAD to the explore-exploit dilemma that is central to the multi-armed bandit setting and derive algorithms whose computational complexity is independent of the number of auxiliary datasets, allowing us to scale to 100× more auxiliary datasets than prior methods. We propose two algorithms, EXP3-FLAD and UCB1-FLAD, and compare them with prior FLAD methods that either explore or exploit, finding that the combination of exploration and exploitation is crucial. Through extensive experimentation we find that our methods outperform all pre-existing FLAD methods by 4% and lead to the first 3 billion parameter language models that outperform the 175 billion parameter GPT-3. Overall, our work suggests that the discovery of better, more efficient mixing strategies for FLAD may provide a viable path towards substantially improving generalization in few-shot learning. All of our code is available at github.com/alon-albalak/FLAD.
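The EXP3 side of the bandit framing can be sketched generically: each auxiliary dataset is an arm, and one round of sampling plus importance-weighted update looks like the following. This is textbook EXP3, not the paper's EXP3-FLAD; the reward function (e.g. a gradient-alignment signal) and the mixing parameter `gamma` are stand-ins.

```python
import numpy as np

def exp3_sample_and_update(weights, arm_reward, gamma=0.1, rng=None):
    """One EXP3 round: sample an arm (auxiliary dataset) from the
    exponential-weights distribution mixed with uniform exploration,
    observe a reward in [0, 1], and apply an importance-weighted update
    to the chosen arm's weight."""
    rng = rng or np.random.default_rng()
    K = len(weights)
    probs = (1 - gamma) * weights / weights.sum() + gamma / K
    arm = rng.choice(K, p=probs)
    r = arm_reward(arm)  # e.g. alignment of the auxiliary gradient with the target gradient
    weights[arm] *= np.exp(gamma * r / (probs[arm] * K))
    return arm, weights

# Toy use: arm 2 always pays 1, others pay 0; its weight should dominate.
rng = np.random.default_rng(0)
w = np.ones(5)
for _ in range(500):
    _, w = exp3_sample_and_update(w, lambda a: 1.0 if a == 2 else 0.0,
                                  gamma=0.1, rng=rng)
```

The key property for FLAD is that each round touches only the sampled arm, so the per-step cost does not grow with the number of auxiliary datasets.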